DB buffer pool
Terminology:
- Hardware page: the largest block of data that the storage device can guarantee to write atomically. A write of one hardware page either completes entirely or does not happen at all.
- Database heap: heap file organization is one of the ways a DBMS organizes its pages. It lets the DBMS find the location on disk of the page it wants. A heap file consists of unordered pages in which tuples are stored in no particular order.
- Tuple: a sequence of bytes, contiguous or not, that the DBMS interprets as attribute types and values.
Storage should provide the illusion of virtually sufficient memory, maximize sequential access (exploiting the principle of locality), and avoid stalls caused by disk I/O. A DBMS manages its own storage because the operating system is not aware of the file layout or of the differences between queries.
The DBMS stores a database as files on disk, organized in pages. The storage manager organizes files as a collection of pages and keeps track of the data read from and written to pages, as well as the available space. "Page" has three meanings in a DBMS context: hardware page (usually 4 KB), OS page (usually 4 KB), and database page (1-16 KB). A database page is always fixed-size to avoid the engineering overhead of supporting variable page sizes. If a database page is larger than a hardware page, the DBMS itself must guarantee write atomicity, because the device only guarantees it for a single hardware page.
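One consequence of fixed-size pages is that locating a page in a file is plain arithmetic. The following is a minimal sketch, assuming a 4 KB database page and a single heap file in which pages are stored back to back; `PAGE_SIZE`, `page_offset`, and `read_page` are illustrative names, not part of any particular DBMS.

```python
import io

PAGE_SIZE = 4096  # assumed database page size in bytes


def page_offset(page_id: int, page_size: int = PAGE_SIZE) -> int:
    """Byte offset of a page inside a file of fixed-size pages."""
    return page_id * page_size


def read_page(f, page_id: int, page_size: int = PAGE_SIZE) -> bytes:
    """Seek to the page and read exactly one page worth of bytes."""
    f.seek(page_offset(page_id, page_size))
    return f.read(page_size)


# Usage: an in-memory file with three pages filled with 0x00, 0x01, 0x02.
heap_file = io.BytesIO(b"".join(bytes([i]) * PAGE_SIZE for i in range(3)))
page = read_page(heap_file, 1)
```

With variable-size pages the DBMS would need an extra directory lookup just to find where a page starts.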
There are two main approaches to laying out data in pages: slotted pages and log-structured storage. A slotted page keeps a slot array that maps each slot to the offset of its tuple within the page. Log-structured storage instead stores log records of the changes made to tuples. The log-structured model assumes no in-place overwrites and only allows the creation of new pages, which addresses several problems of slotted pages, such as fragmentation, useless disk I/O, and random disk I/O.
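To make the slot-to-offset mapping concrete, here is a toy slotted page. It is a sketch under assumptions rather than any real DBMS layout: the header and slot formats (two unsigned 16-bit integers each) and the class name `SlottedPage` are invented for illustration, and deletion and compaction within the page are left out.

```python
import struct


class SlottedPage:
    """Toy slotted page: the slot array grows from the front of the page,
    tuple data grows from the back, and free space sits in between."""

    HEADER = struct.Struct("<HH")  # (slot count, offset where tuple data starts)
    SLOT = struct.Struct("<HH")    # (tuple offset, tuple length)

    def __init__(self, size: int = 4096):
        self.size = size
        self.buf = bytearray(size)
        self.HEADER.pack_into(self.buf, 0, 0, size)

    def insert(self, data: bytes) -> int:
        """Store a tuple and return its slot number."""
        count, data_start = self.HEADER.unpack_from(self.buf, 0)
        slot_pos = self.HEADER.size + count * self.SLOT.size
        new_start = data_start - len(data)
        # The new slot entry and the new tuple must not overlap.
        if new_start < slot_pos + self.SLOT.size:
            raise ValueError("page full")
        self.buf[new_start:data_start] = data
        self.SLOT.pack_into(self.buf, slot_pos, new_start, len(data))
        self.HEADER.pack_into(self.buf, 0, count + 1, new_start)
        return count

    def get(self, slot: int) -> bytes:
        """Follow the slot entry to the tuple bytes."""
        count, _ = self.HEADER.unpack_from(self.buf, 0)
        if not 0 <= slot < count:
            raise IndexError("no such slot")
        pos = self.HEADER.size + slot * self.SLOT.size
        offset, length = self.SLOT.unpack_from(self.buf, pos)
        return bytes(self.buf[offset:offset + length])
```

Because callers refer to a tuple by slot number, the page can move tuple bytes around internally and only needs to update the slot entry.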
Log-structured storage builds on log-structured file systems and the LSM tree. To store a log entry, the DBMS does the following:
- Apply the change to an in-memory data structure (MemTable).
- Write the changes out sequentially to disk (SSTable).
- Sort each SSTable by key before writing it out.
The read path checks the MemTable first and then the SSTables at each level, from newest to oldest. To avoid brute-force scans of the SSTables, the DBMS maintains an in-memory SummaryTable that tracks metadata about them, such as the minimum and maximum key of each SSTable.
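The write and read paths above can be sketched in a few lines. This is a simplified illustration, not how a production LSM engine is built: SSTables are plain in-memory sorted lists standing in for on-disk files, the flush threshold `memtable_limit` is an invented parameter, and the first and last key of each list play the role of the SummaryTable's min/max metadata.

```python
import bisect


class TinyLSM:
    def __init__(self, memtable_limit: int = 2):
        self.memtable = {}                # in-memory MemTable: key -> value
        self.memtable_limit = memtable_limit
        self.sstables = []                # oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        """Write path: update the MemTable, flush when it is full."""
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Sort the MemTable by key and append it as a new immutable SSTable."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Read path: MemTable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            # Min/max check: skip tables whose key range cannot contain the key.
            if not (table[0][0] <= key <= table[-1][0]):
                continue
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None
```

Searching newest to oldest guarantees that the first match is the most recent version of the key.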
In a write-heavy workload, the DBMS accumulates a large number of SSTables. It therefore periodically compacts the log by keeping only the most recent update for each key. Log compaction reduces wasted space and speeds up reads. There are three major approaches to log compaction: universal, level, and tiering compaction.
The trade-offs of log-structured storage can be analyzed along several dimensions: writes, reads, compaction, and write amplification.
In index-organized storage, the DBMS stores a table's tuples as the values of an index, with the index keys as their keys. Tables are otherwise inherently unsorted, so the storage layer relies on indexes to find a particular tuple.
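The core idea of compaction, keeping only the newest value per key, can be sketched as follows. This is an assumption-laden simplification: it merges everything in memory through a dictionary, ignores deletion markers, and does not model the universal, level, or tiering strategies; the function name `compact` is illustrative.

```python
def compact(sstables):
    """Merge SSTables (ordered oldest to newest) into one sorted run,
    keeping only the most recent value for each key."""
    latest = {}
    for table in sstables:           # newer tables are visited later...
        for key, value in table:
            latest[key] = value      # ...so their values overwrite older ones
    return sorted(latest.items())


# Usage: two runs containing two versions of key "a".
old_run = [("a", 1), ("b", 2)]
new_run = [("a", 3), ("c", 4)]
merged = compact([old_run, new_run])
```

After compaction a read has one run to search instead of two, and the stale version of "a" no longer takes up space.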
Buffer Pool Manager
Terminology:
- Buffer pool: an in-memory cache of pages read from disk, sitting between the rest of the DBMS and disk storage.
- Write-back cache: dirty pages are buffered in memory and not written to disk immediately when they are modified.
- Write-through cache: every change is propagated to disk immediately.
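The difference between the two write policies shows up in a small sketch. It assumes a dictionary standing in for the disk and has no eviction policy or pinning; `BufferPool` and its methods are illustrative names only.

```python
class BufferPool:
    def __init__(self, disk: dict, write_through: bool = False):
        self.disk = disk              # stand-in for disk: page_id -> bytes
        self.frames = {}              # cached pages: page_id -> bytes
        self.dirty = set()            # pages modified in memory but not yet on disk
        self.write_through = write_through

    def read(self, page_id):
        """Serve from the cache, loading from disk on a miss."""
        if page_id not in self.frames:
            self.frames[page_id] = self.disk[page_id]
        return self.frames[page_id]

    def write(self, page_id, data: bytes):
        self.frames[page_id] = data
        if self.write_through:
            self.disk[page_id] = data      # propagate immediately
        else:
            self.dirty.add(page_id)        # write-back: defer until flush

    def flush(self):
        """Write all dirty pages back to disk."""
        for page_id in sorted(self.dirty):
            self.disk[page_id] = self.frames[page_id]
        self.dirty.clear()
```

With write-back, several updates to the same page cost a single disk write at flush time, at the price of the disk being stale until then.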
mmap vs. buffer pool
Disadvantages of mmap:
- mmap must align mappings to the page size.
- The memory-mapped file's contents must fit into the calling process's virtual address space.
- Performance issues: cache coherence.
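The alignment constraint is visible even from Python's standard `mmap` module, where the mapping offset must be a multiple of `mmap.ALLOCATIONGRANULARITY`. The sketch below shows one way a caller might cope: round the offset down to an aligned boundary and skip the extra leading bytes. The helper `read_via_mmap` is a hypothetical name, and it assumes the requested range lies inside the file.

```python
import mmap
import os
import tempfile


def read_via_mmap(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` through a read-only memory mapping."""
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = (offset // gran) * gran      # round down to an allowed offset
    delta = offset - aligned               # bytes to skip inside the mapping
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=aligned) as m:
            return m[delta:delta + length]


# Usage: a temporary file with one granule of zeros followed by bytes 0..255.
fd, tmp_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"\0" * mmap.ALLOCATIONGRANULARITY + bytes(range(256)))
chunk = read_via_mmap(tmp_path, mmap.ALLOCATIONGRANULARITY + 10, 5)
os.remove(tmp_path)
```

A buffer pool avoids this bookkeeping on the caller's side because it reads whole database pages at offsets it chooses itself.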
The two differ in who manages the contents of memory: the OS or the database. A database that uses its own buffer pool can bypass the OS page cache and manage in-memory data according to its own needs.
Paper: Are You Sure You Want to Use MMAP in Your Database Management System?