db memory management

storage: Goals:

Provide the illusion that the DBMS has more memory than it really does, avoid large stalls due to expensive disk access, and maximize sequential access by exploiting the principle of locality.

Problem: why not use the os for moving the files’ pages in and out of memory?

Answer: the OS treats a file as opaque bytes, while the DBMS knows its own queries and access patterns, so it can make better decisions about prefetching, eviction, and write ordering.

Disk manager

Problem: how does the DBMS represent the database in files on disk?

Where to store data?

Fixed-size db pages.

Storage manager:

  • represents each file as a collection of pages
  • keeps track of data read from / written to pages, and of available space
  • hardware page

Fact: the storage device guarantees an atomic write only at the granularity of the hardware page.

A hardware page is the largest block of data that the storage device can guarantee to write atomically. If a database page is larger than the hardware page, the DBMS has to take extra measures to ensure the data gets written out safely, since the system can crash while the program is only partway through writing a database page to disk.

  • Database heap file: an unordered collection of pages, with tuples stored in random order.

Problem: how to locate a page on disk given page_id?

  1. Linked list: locating a page requires a sequential scan

  2. Page directory: maintains a translation from page ids to page locations
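The difference between the two approaches is lookup cost. A minimal sketch of a page-directory lookup (all names and sizes here are illustrative, not from any real DBMS):

```python
# A page directory maps page_id -> byte offset in the database file,
# so a page can be located in O(1) instead of an O(n) linked-list walk.

PAGE_SIZE = 4096  # hypothetical fixed page size

class PageDirectory:
    def __init__(self):
        self._offsets = {}  # page_id -> byte offset in the file

    def register(self, page_id: int, offset: int) -> None:
        self._offsets[page_id] = offset

    def locate(self, page_id: int) -> int:
        # direct translation; no scan needed
        return self._offsets[page_id]

directory = PageDirectory()
directory.register(page_id=7, offset=7 * PAGE_SIZE)
print(directory.locate(7))  # 28672
```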

how to organize the data inside of the page?

  1. Slotted-pages

    Header + slot array + tuples

    To add a tuple, the slot array will grow from the beginning to the end, and the data of the tuples will grow from end to the beginning. The page is considered full when the slot array and the tuple data meet.

  2. Log-structured

    Append-only log records instead of in-place updates: fast writes, slow reads (a read may have to replay the log to reconstruct a tuple).

    • Optimizations:

      • build index

      • Compact the log

        Concerns: write amplification
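The slotted-page layout (approach 1 above) can be sketched as follows. This is a simplified illustration with hypothetical header/slot sizes, not a real page format:

```python
# Simplified slotted page: the slot array grows from the front,
# tuple data grows from the back; the page is full when they meet.

PAGE_SIZE = 256  # tiny page just for the example

class SlottedPage:
    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.slots = []            # slot array: byte offsets of tuples
        self.free_end = PAGE_SIZE  # tuple data grows down from here

    def insert(self, tup: bytes) -> int:
        # assume an 8-byte header and 4 bytes per slot entry
        slot_array_end = 8 + len(self.slots) * 4
        if self.free_end - len(tup) < slot_array_end + 4:
            raise ValueError("page full: slot array met tuple data")
        self.free_end -= len(tup)
        self.data[self.free_end:self.free_end + len(tup)] = tup
        self.slots.append(self.free_end)
        return len(self.slots) - 1  # the slot number identifies the tuple

    def get(self, slot: int) -> bytes:
        start = self.slots[slot]
        # each tuple ends where the previously inserted one begins
        end = self.slots[slot - 1] if slot > 0 else PAGE_SIZE
        return bytes(self.data[start:end])

page = SlottedPage()
s = page.insert(b"alice,25")
print(page.get(s))  # b'alice,25'
```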

what is the data?

System catalog tracks the schema.

> tuple layout

Header: meta-data about the tuple, followed by the attribute data. Each tuple is identified by a unique record id (page_id + offset/slot).

Denormalized tuple data: when two tables are frequently joined, they can be pre-joined and stored as one table; this makes reads faster at the cost of more expensive updates.

> data representations

The data in a tuple is essentially just byte arrays. It is up to the DBMS to know how to interpret those bytes to derive the values for attributes. A data representation scheme is how a DBMS stores the bytes for a value.

Five high level datatypes in tuples:

  1. Integer

    • Examples: INTEGER, BIGINT, SMALLINT, TINYINT
  2. variable precision numbers

    Upside: operations on variable-precision numbers are faster than on arbitrary-precision numbers.

    Downside: rounding errors

    • Examples: FLOAT, REAL
  3. Fixed-point precision numbers

    Upside: exact values

    Downside: slower, since the arithmetic cannot use native floating-point hardware

    • Examples: NUMERIC, DECIMAL
  4. Variable-length data

    problem: large values exceeding the page

    Solutions:

      1. use separate overflow storage pages

      2. store data in external files; downside: the DBMS loses control of this file (no durability or transaction guarantees)

    • Examples: VARCHAR, VARBINARY, TEXT, BLOB.
  5. Dates and times

    Examples: TIME, DATE, TIMESTAMP
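The variable- vs. fixed-point trade-off shows up directly in Python, whose `float` is a binary variable-precision type and whose `decimal` module behaves like NUMERIC/DECIMAL:

```python
from decimal import Decimal

# Variable precision (binary float): fast, but subject to rounding error.
print(0.1 + 0.2)                      # 0.30000000000000004
print(0.1 + 0.2 == 0.3)               # False

# Fixed-point precision: exact, but the arithmetic is done in software.
print(Decimal("0.1") + Decimal("0.2"))                    # 0.3
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```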

workloads types

workloads: general nature of requests a system will have to handle.

  1. OLTP: online transaction processing: simple queries that read/update small amounts of data; more writes than reads

  2. OLAP: online analytical processing: complex queries that read large portions of the data; more reads than writes

  3. HTAP: hybrid transaction + analytical processing

data storage models

assume n-ary storage model in class.

  1. N-ary storage model (NSM): stores all attributes of a single tuple
    contiguously in a page -> row store

    • pro: fast inserts/updates/deletes; good for queries that need the entire tuple
    • con: not good for scanning large portions of the table or only a subset of the attributes
  2. Decomposition storage model (DSM): stores the values of a single attribute for all tuples contiguously in a page -> column store

    • pro: scans only the attributes it needs; better query processing and data compression
    • con: slow for point queries, inserts, updates, and deletes because of tuple splitting/stitching
    • tuple identification: putting tuples back together in a column store

      • Fixed-length offsets

      • Embedded tuple ids
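With fixed-length offsets, the value at the same position in every column belongs to the same tuple, so tuple i can be stitched back together by indexing each column at i. A sketch with illustrative data:

```python
# Column store with fixed-length offsets: each attribute lives in its own
# array, and offset i in every array belongs to the same logical tuple.

ids   = [1, 2, 3]
names = ["alice", "bob", "carol"]
ages  = [25, 31, 47]

def fetch_tuple(i: int) -> tuple:
    # tuple stitching: read the same offset from each column
    return (ids[i], names[i], ages[i])

print(fetch_tuple(1))  # (2, 'bob', 31)

# Scanning one attribute touches only that column's data (the DSM "pro"):
print(sum(ages))  # 103
```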

Buffer pool manager

Problem: how does the DBMS manage its memory and move data back and forth from disk?

Two goals of db storage are spatial control and temporal control.

  • spatial control: keep pages that are often used together as physically close as possible on disk.
  • temporal control: minimize the number of stalls from having to read data from disk.

What is buffer pool?

The buffer pool is an in-memory cache of pages read from disk. It is essentially a large memory region allocated inside of the database to store pages that are fetched from disk.

  1. memory organization.

    The buffer pool’s region of memory is organized as an array of fixed-size pages. Each array entry is called a frame.

  2. Meta-data

    • page table: an in-memory hash table that keeps track of pages that are currently in memory.

      (key: page id, value: frame location)

    • Dirty-flag: set by a thread whenever it modifies a page.

      • Indicate: the page must be written back to disk
    • Pin/reference counter

      • tracks the number of threads that are currently accessing that page (either reading or modifying it).
      • if count > 0, then evicting that page from mem is not allowed.
  3. allocation policies

    • global view: best in all transactions
    • local view: make a single query/transaction run faster
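The metadata above can be sketched as a tiny buffer pool manager. All structure here is hypothetical: a page table keyed by page id, plus a dirty flag and pin count per frame:

```python
# Toy sketch of buffer pool metadata (not a real implementation):
# page table keyed by page id; each frame carries a dirty flag and pin count.

class Frame:
    def __init__(self, page_id: int):
        self.page_id = page_id
        self.dirty = False    # set when a thread modifies the page
        self.pin_count = 0    # threads currently reading/modifying the page

class BufferPool:
    def __init__(self):
        self.frames = []      # the in-memory frames
        self.page_table = {}  # page_id -> frame index

    def pin(self, page_id: int) -> Frame:
        if page_id not in self.page_table:
            # stand-in for fetching the page from disk into a free frame
            self.frames.append(Frame(page_id))
            self.page_table[page_id] = len(self.frames) - 1
        frame = self.frames[self.page_table[page_id]]
        frame.pin_count += 1
        return frame

    def unpin(self, page_id: int, dirtied: bool) -> None:
        frame = self.frames[self.page_table[page_id]]
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirtied

    def evictable(self, page_id: int) -> bool:
        # a page with pin_count > 0 must not be evicted
        return self.frames[self.page_table[page_id]].pin_count == 0

pool = BufferPool()
f = pool.pin(42)
print(pool.evictable(42))           # False: a thread holds a pin
pool.unpin(42, dirtied=True)
print(pool.evictable(42), f.dirty)  # True True: unpinned, must be flushed first
```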

buffer pool optimizations

  1. multiple buffer pools

    • approach 1: Object ID: mapping from objects to specific buffer pools

      <objectID, pageID, slotNum>

    • approach 2: Hashing: Hash the page id to select which buffer pool to access.

  2. Pre-fetching: prefetch the pages a query will need next, based on its query plan.

    • Scenario: accessing many pages sequentially

    • approach 1: sequential scans

    • approach 2: index scans

  3. Scan-sharing: scanning the same data

    Q2 attaches to Q1’s cursor and scans along with it. When Q1 finishes, Q2 continues scanning until it reaches the point where it attached.

  4. Buffer pool bypass: read pages from disk directly, without going through the buffer pool

    • pro: avoids buffer pool overhead when fetching a large sequence of pages that are contiguous on disk; known as light scans
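Among the optimizations above, the hashing approach to multiple buffer pools is easy to sketch: the page id alone selects the pool, so no extra metadata is needed in record ids. Illustrative code:

```python
# Route each page to one of several buffer pools by hashing its page id.
# The pools are stand-in dicts; a real pool holds frames and a page table.

NUM_POOLS = 4
pools = [dict() for _ in range(NUM_POOLS)]

def pool_for(page_id: int) -> dict:
    # hashing an int is deterministic, so a page id always maps
    # to the same pool, keeping lookups consistent
    return pools[hash(page_id) % NUM_POOLS]

print(pool_for(123) is pool_for(123))  # True
```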

os page cache

Another way to access data: the OS maintains its own file-system page cache. Relying on it leads to redundant copies of pages and leaves eviction decisions to the OS, so most DBMSs bypass it.

Buffer Replacement Policies

Problem: When the DBMS needs to free up a frame to make room for a new page, it must decide which page to evict from the buffer pool.

  1. LRU: least-recently used

  2. CLOCK: approximates LRU with a single reference bit per page instead of timestamps

Con: both are susceptible to sequential flooding: a sequential scan pollutes the buffer pool with pages that are read once and never needed again.
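CLOCK can be sketched with a "hand" sweeping the frames: a set reference bit gets a second chance (the bit is cleared), a clear bit marks the victim. The structure below is hypothetical:

```python
# CLOCK replacement sketch: one reference bit per frame instead of
# per-access timestamps as in LRU.

class Clock:
    def __init__(self, num_frames: int):
        self.ref = [False] * num_frames  # reference bit per frame
        self.hand = 0

    def access(self, frame: int) -> None:
        self.ref[frame] = True  # page referenced: set its bit

    def evict(self) -> int:
        while True:
            if self.ref[self.hand]:
                self.ref[self.hand] = False  # second chance: clear the bit
                self.hand = (self.hand + 1) % len(self.ref)
            else:
                victim = self.hand
                self.hand = (self.hand + 1) % len(self.ref)
                return victim

clock = Clock(3)
clock.access(0)
clock.access(1)
print(clock.evict())  # 2: the only frame whose reference bit was never set
```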

Approaches:

  1. LRU-K: tracks the last K references as timestamps
  2. localization per query: evict on a per transaction/query basis.
  3. Priority hints: based on the context of each page during query execution

dirty pages

Approaches:

  1. Fast evictions: evict only clean (non-dirty) pages, which can simply be dropped
  2. Slow evictions: write a dirty page back to disk before evicting it
  3. background writing: the DBMS can periodically walk through the page table and write dirty pages to disk. When a dirty page is safely written, the DBMS can either evict the page or just unset the dirty flag
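The background-writing approach can be sketched as a periodic sweep over the page table; the structures and names here are hypothetical:

```python
# Background writer sketch: walk the page table, flush dirty pages,
# then clear their dirty flags so they become cheap to evict.

page_table = {
    1: {"dirty": True,  "data": b"page 1 bytes"},
    2: {"dirty": False, "data": b"page 2 bytes"},
    3: {"dirty": True,  "data": b"page 3 bytes"},
}

def flush_to_disk(page_id: int, data: bytes) -> None:
    pass  # stand-in for the actual disk write

def background_write() -> int:
    written = 0
    for page_id, entry in page_table.items():
        if entry["dirty"]:
            flush_to_disk(page_id, entry["data"])
            entry["dirty"] = False  # safely written: flag can be unset
            written += 1
    return written

print(background_write())  # 2: two dirty pages flushed
print(background_write())  # 0: nothing left dirty
```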

other memory pools

  1. Sorting + join buffers
  2. query caches
  3. maintenance buffers
  4. log buffers
  5. Dictionary caches