CMU15-445

Jimmy_kiet included in Database

2022-06-24 3841 words 8 minutes

前言

雖然project的部分有點窒礙難行，不過既然都聽課了還是把筆記做好吧！

CS CMU 15-445/645, Fall 2019/2021

related resources

gradescope newtest_2021

my repo

reference

Relational Model

logical layer separate from physical layer

tuple/record/row mean same thing table/relation mean same thing value in tuple should be atomic

Primary key：identify a specific tuple in current relation(table)
Foreign key：identify a specific tuple in another relation(table)

Relational Algebra：Each operator takes one or more relations as its inputs and outputs a new relation.
join：more flexible intersection

Advanced SQL

Aggregate

Aggregate functions can only be used in the SELECT output list
Output of other columns outside of an aggregate is undefined. -> aggregate function only output one value

Group By

Aggregates records by group
if output clause have one aggregate function, then other non-aggregate attribute should in Group By clause

Having

you can not use aggregation resulit in where clause because you don’t have that result yet

Like

%：match for substring
_：match for character

Output Redirection

store output of previous query into new/existing table, schema should be match

Output Control

order by
limit

Nested Queries

only inner query can reference information in outer query

Window Funtion

aggregation(row_number) + group by(over)
in last example, maybe OVER() remain the same, and changeing RANK() to ROW_NUMBER() will get same result (?

Common Table Expression

WITH statement is much more clear than nested query IMO.
WITH RECURSIVE：WTF

HomeWork 1(2019)

Q4也太折磨…不過看來integer也能用Strign operation的like 好吧應該就是一點數學的trick… 2010s|2789741 好像錯了？全部總合也沒那麼多啊…

Database Storage I

Architecture

bottom to top(lower level to higher level)：

disk manager (database itself)
file system (OS provide)

Storage Manager：organizes the files as a collection of pages
1. represent data on disk
2. move data between disk and memory
DataBase Pages：A page is a fixed-size block of data

Manage Pages

Heap File：an unordered collection of pages
- Linked List：suck
- Page Directory：
Sequential/Sorted File：
Hashing File：

Data Layout

Page：Header + Data

Tuple-oriented
- SLOTTED PAGES：slotted array(grow top to bottom) + fixed/variable length tuple of same table(grow bottom to top)
- tuple is assigned a unique record identifier(page_id + offset/slot)
log-oriented：read operation will be slow(recreate tuple), need index(jump to locations in the log)

Tuple Layout

Denormalize：Store data from different tables into same page

Database Storage II

Tuple Storage

DBMS interpret bytes into tables attribute and its value

Data Type

FLOAT, REAL/DOUBLE：fast but not precise
NUMERIC, DECIMAL：slow but precise, use meta-data to get arbitrary precision

Large Values：each tuple size should not greater than page size
1. overflow/TOAST page：has protection
2. BLOB data type：DBMS can’t midify the content of this file, but others can -> no protection

System Catalogs

meta-data about database
store catalogs as table

Database Workloads

OLTP(On-Line Transaction Processing)：query access only small portions of database(Write operations)
OLAP(On-Line Analytical Processing)：query access large portions of database(Read operations)

Data Storage Model

N-ary Storage Model (NSM, row)
1. store single tuple’s attributes in a page
2. good for OLTP：queries that access small number of entire tuples
3. not good for OLAP：some attribues that bring from disk to memory become useless in query execution
Decomposition Storage Model (DSM, column)
1. store same attribute value of all tuples in a page
2. problems：how to track each tuple’s attribute？
  1. Fixed-length Offsets：offsets
  2. Embedded Tuple Ids：use tuple identifier for each attribute

Projects Setup

照著做沒煩惱，repo的部分參考官方的做法。

origin docker engine content, 這個應該不用動？

{
  "features": {
    "buildkit": true
  },
  "builder": {
    "gc": {
      "enabled": true,
      "defaultKeepStorage": "20GB"
    }
  },
  "experimental": false
}

在bustub文件中执行这一条命令

應該是要在docker裡的bustub操作

1.3 配置本地目录挂载

太方便啦

docker container run -it -v /Users/jimmy/documents/CMU-DBMS/bustub-private:/bustub --name=bustub_env bustub /bin/bash

重新執行

docker start 8dbdbd03dfed
docker exec -it 8dbdbd03dfed /bin/bash

Projects 0(2021)

被繼承的東西到底要怎麼拿出來搞死…

throw exception請參照code base的方法…

test

$ mkdir build
$ cd build
$ make starter_test
$ ./test/starter_test

format

$ make format
$ make check-lint
$ make check-clang-tidy

zip project0-submission.zip src/include/primer/p0_starter.h

local run: make check-clang-tidy

encounter following: WTF ?

Checking: /bustub/src/buffer/lru_replacer.cppppr_instance.cppll.ccc error: the clang compiler does not support ‘-march=native’ [clang-diagnostic-error]

Buffer Pools

Spatial Control：often used paged be put together on disk
Temporal Control：minimize times to read pages from disk

Buffer Pool = Memory Region = array of fixed size frame
Page Table：record the memory location of each page(currently in memory)、Dirty Flag、Pin/Reference Counter
Locks vs. Latches
1. Lock：High level outside of database, need roll back changes
2. Latch：Low level inside of database(critical sections), does not need roll back changes
Page Directory vs. Page Table
1. Page Directory：find page location in datablase files by page id, need to be recorded on disk
2. Page Table：mapping between page id and location in buffer pool, does not need to be recorded on disk

Buffer Pool Optimization

Multiple Buffer Pool
- can use different policies for Buffer Pools
Prefetching
- sequential：OS can do too
- index：DBMS know index page meaning, so can fecth the exact pages that query wants
Scan Sharing
- similar queries share same cursor
Buffer Pool Bypass
- allocate other memory for sequential scan query in order not to pollute Buffer Pool
OS Page Cache
- OS have same page copy on file system cache

Buffer Pool Replacement Policies

LRU
clock(approximate LRU)

both have sequential flooding problem(Least Recently Used page might be most needed page)

LRU-K
Localization
Priority Hints
Dirty Pages vs. Non-Dirty Pages
1. non-dirty page：fast. one I/O(read needed page from disk), may be needed in future
2. dirty page：slow. two I/O(write dirty page back to disk && read needed page from disk), often used with BACKGROUND WRITING

Projects 1(2021)

C++ mutithreading How C++ code run

LRUReplacer is used to store unpinned frames
Pin：this frame can’t become victim
pin_count of a page becomes 0 = no threads access this page
page will be pinned when NewPage() invoke
FetchPgImp(pin) vs. UnpinPgImp
NewPgImp vs. DeletePgImp
page_table contain pin/unpinned pages! -> Delete operation will search unpinned page in page_table!

domain knowledge

觀念沒有釐清，花太多時間在正確性不足的code上debug...
pages_ = buffer pools
page_table_ = {page_id, frame_id}
use frame_id to find page in pages_
free_list = unused frames
frame will be in {page_table_(buffer pools/pages_), free_list_, replacer_}, one of them

task #1

根據data locality，剛被訪問(pin/unpin)的page很可能再次被訪問，所以放到replacer最後面(最新)。
而每次要victime page時從最前面(最老)開始，這樣就能達成最小化disk訪問次數的目標。

Unpin(frame_id_t) : This method should be called when the pin_count of a page becomes 0.
特別重要，但在lru裡面只管移除frame，外部使用者須自己注意call Unpin的時候page的pin_count <= 0。

task #2

FetchPgImp：根據page_id從bpm拿到Page。
如果在buffer pool(pages_)裡，pin完後直接return page。
不在就看有沒有free frame(從list or lru拿)，記得個別處理remove frame，
從page_table內把remove frame對應的victim page_id拿掉。
處理dirty，更新這個新page的metadata，寫到page_table_。

NewPgImp：為了AllocatePage的page_id，在buffer pool上清出一個空間。
如果buffer pools裡的page都為pinned->沒有空間了，return nullptr；
至少一個page不為pinned，這個page有可能屬於free_list_或是replacer_，需個別處理remove。
處理dirty，更新這個新page的metadata，寫到page_table_。

UnpinPgImp：將pages_上page_id對應的page，pin_count--。
處理dirty，更新這個新page的metadata，放回replacer_。

DeletePgImp：初始化(還回free_list_)buffer pool上 given page_id 的空間。
如果page_id不存在pages_，任務達成return true。
pin_count > 0，有thread在用無法初始化，return false。
處理dirty，更新這個新page的metadata，從page_table_移除，放回free_list_。

FlushPgImp：

task #3

todo 從76score branch，把邏輯對一對，cout拔掉，測試過了就交…
回來把code改寫完成了，結果跑出autograder failed to execute correctly是哪招…明明style也檢查過了

Hash Tables

Hash Table(Static)：

Hash Funciton：map keys(large space) to array index(small space), return a integer representaion of the given key
- trade-off：fast v.s collision rate。
Hash Scheme：handle collision after hashing。
- trade-off：allocate large array v.s non efficient insert/find operations
1. linear probing：find empty/desired slot if hash idx slot is full/not desired
situation：use in join

Dynamic Hash Tables

Chained Hashing：each key maps to a linked-list
- easy to implement, worst case operation will become O(n)
Extendible Hashing
- resize and rehash in the same time
- problem：global latch
Linear Hashing

conclusion：good for database as internal data structure other than table index case(need ordering, so b+ tree is better)

Tree Indexes I

table index：a replica of table, should be sync with table

B means balance
B tree：key存在所有node(inner, leaf)裡 -> no duplicate keys
B+ tree：key存在leaf node裡 -> duplicate keys
- M keys in one node -> M + 1 children(# of intervals caused by M keys)
- Every node other than the root, is at least half-full -> M/2-1 ≤ #keys ≤ M-1
- node：meta-data、arrays of key-value pairs
- key：table index column, value：tuple data or tuple data pointer
- insert：keep # keys in one node satisfy property, otherwise do split operation
- delete：keep # keys in one node satisfy property, otherwise do merge operation

index types：
1. Clustered Indexes：
  1. each table has one Clustered Indexes
  2. tuple is sorted(by primary key)
2. Compound Index：use more than one attribute to build index

Optimizations

Prefix Compression：store keys same prefix + keys different suffix
Suffix Truncation：use minimum prefix to route to correct index
Bulk Insert：sort keys and bottom up build index
Pointer Swizzling：node stores page_id as reference to other nodes(need to go to buffer pool to get page pointer) -> stores page corresponding page pointer(valid)

Tree Indexes II

handle duplicate keys in b+ tree：Append Record Id

no index or can’t find index to use -> sequential scan
database will create index on primary key, otherwise we have no choice but sequential scan

Implicit Indexes

Primary Keys
Unique Constraints
Foreign Keys(the attribute be referenced to should be unique)

Partial Indexes

create index on subset of table

CREATE INDEX idx_foo
          ON foo (a, b)
       WHERE c = 'WuTang';

radix tree = compact trie

Inverted Indexes

b+ tree index is good at point/range query
not good at keyword search -> full-text search index/inverted index

Multi-Threaded Index Concurrency Control

Concurrency control

Logical Correctness：
Physical Correctness：

Latch Modes:

Read Mode: multiple threads in the same time
Write Mode: one thread in the same time

HASH TABLE LATCHING: no deadlock, because requiring latchs happened in same direction

Page Latches
Slot Latches

B+TREE CONCURRENCY CONTROL

Latch Crabbing/Coupling
1. acquire parent latch
2. acquire child latch
3. if child is safe, release parent latch
- release the latch on node which has more children first(top to bottom)
- all modifications start from taking write latch on root node -> bottle neck, so change to take read latch instead
Better Latching Algorithm
- get read latch all the way down to leaf node(write latch), and check leaf node is safe or not
- if not safe, do Latch Crabbing/Coupling

LEAF NODE SCAN(range query)

might have deadlock, so abort myself when desired latch can’t acquire
Delayed Parent Updates(lazy update, apply for insert operation): Update parent node the next time that a thread takes a write latch on it.

Sorting & Aggregations

Query Processing

Data flows from leaves nodes to root node(query result)

DOUBLE BUFFERING OPTIMIZATION

Don’t waste time on wating I/O request, so asynchronously fetch next runs while CPU processing the current runs

General External Merge Sort

聽完影片後，再對照reference的筆記看一次就清楚多了…

Sorting Phase: pages分成多個chunks，chunk可以完整放進memory，在memory裡排序好再放回disk。
Merge Phase: 從小runs到大runs
pass0:
- B buffer pages
- ceiling(n / b)個大小為 B 的sorted runs
pass1,2,3…: 合併B-1個runs
Complexity:
- number of passes: 1 + ceiling(lg(b-1)ceing(N/B))
- cost per pass: 2N
- total cost: number of passes * cost per pass

USING B+TREES FOR SORTING

Clustered B+ Tree: order of key in b+ tree is equal to order of actual location of tuples
B+Tree index on the sort attribute(s)
use B+ tree only when there is Clustered B+ Tree case

Aggregation

by Sort
by Hash
1. Partition Phase: 根據hash的結果把tuple放到partition裡
2. Rehash Phase: 利用temp hash table計算aggregation的結果

Contents