CMU15-445
Preface
Although the project part is a bit of an uphill struggle, since I'm watching the lectures anyway I might as well take good notes!
Relational Model
logical layer separate from physical layer
tuple/record/row mean the same thing; table/relation mean the same thing; values in a tuple should be atomic
- Primary key: identifies a specific tuple in the current relation (table)
- Foreign key: identifies a specific tuple in another relation (table)
- Relational Algebra: each operator takes one or more relations as input and outputs a new relation
- join: a more flexible intersection
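The operator idea above can be sketched in a few lines of Python — dicts as tuples, lists as relations, which is an illustrative assumption, not how a DBMS represents data:

```python
# Toy relational-algebra operators: each takes relation(s), returns a new relation.

def select(relation, predicate):
    """sigma: keep tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """pi: keep only the named attributes."""
    return [{a: t[a] for a in attrs} for t in relation]

def natural_join(r, s):
    """join: match tuples on their shared attribute names."""
    shared = set(r[0]) & set(s[0]) if r and s else set()
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in shared)]

students = [{"sid": 1, "name": "Ann"}, {"sid": 2, "name": "Bob"}]
enrolled = [{"sid": 1, "cid": 445}]
print(natural_join(students, enrolled))
# [{'sid': 1, 'name': 'Ann', 'cid': 445}]
```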
Advanced SQL
Aggregate
- Aggregate functions can only be used in the SELECT output list
- Output of other columns outside of an aggregate is undefined -> an aggregate function only outputs one value
Group By
- Aggregates records by group
- if the output clause has an aggregate function, then every other non-aggregated attribute must appear in the GROUP BY clause
Having
- you cannot use an aggregation result in the WHERE clause because that result has not been computed yet
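A quick sqlite3 sketch of GROUP BY + HAVING (table name and data are made up): the non-aggregated cid goes into GROUP BY, and the filter on the aggregate goes into HAVING rather than WHERE.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE enrolled (sid INT, cid TEXT, grade REAL)")
con.executemany("INSERT INTO enrolled VALUES (?, ?, ?)",
                [(1, "15-445", 4.0), (2, "15-445", 3.0), (3, "15-721", 3.5)])

# cid is not aggregated, so it must appear in GROUP BY;
# the AVG(grade) filter can only go in HAVING, not WHERE.
rows = con.execute("""
    SELECT cid, AVG(grade) AS avg_grade
    FROM enrolled
    GROUP BY cid
    HAVING AVG(grade) > 3.2
""").fetchall()
print(sorted(rows))  # [('15-445', 3.5), ('15-721', 3.5)]
```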
Like
- %: matches any substring
- _: matches any single character
Output Redirection
- store the output of a query into a new/existing table; the schemas must match
Output Control
- order by
- limit
Nested Queries
- only the inner query can reference information in the outer query, not the other way around
Window Function
- aggregation (e.g. ROW_NUMBER) + grouping (OVER)
- in the last example, maybe keeping the OVER(...) clause the same and changing RANK() to ROW_NUMBER() gives the same result (?)
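A hedged check of that guess with sqlite3 (made-up table; requires SQLite ≥ 3.25 for window functions): with an ORDER BY inside OVER, RANK() and ROW_NUMBER() agree only when there are no ties — a tie makes them diverge.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE enrolled (sid INT, cid TEXT, grade REAL)")
con.executemany("INSERT INTO enrolled VALUES (?, ?, ?)",
                [(1, "15-445", 4.0), (2, "15-445", 4.0), (3, "15-445", 3.0)])

rows = con.execute("""
    SELECT sid,
           RANK()       OVER (PARTITION BY cid ORDER BY grade DESC) AS rnk,
           ROW_NUMBER() OVER (PARTITION BY cid ORDER BY grade DESC) AS rn
    FROM enrolled
""").fetchall()
print(rows)
# the two students tied at grade 4.0 get RANK 1,1 (then 3),
# but ROW_NUMBER 1,2 (then 3) — which tied row gets 1 vs 2 is unspecified
```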
Common Table Expression
- the WITH statement is much clearer than a nested query IMO.
- WITH RECURSIVE:WTF
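WITH RECURSIVE is less scary read as iteration: seed row, then keep reapplying the recursive SELECT until it produces nothing new. A minimal sqlite3 example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
rows = con.execute("""
    WITH RECURSIVE cnt(n) AS (
        SELECT 1                      -- seed
        UNION ALL
        SELECT n + 1 FROM cnt         -- step
        WHERE n < 5                   -- stop condition
    )
    SELECT n FROM cnt
""").fetchall()
print([n for (n,) in rows])  # [1, 2, 3, 4, 5]
```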
HomeWork 1(2019)
Q4 is pure torture… but apparently LIKE string operations work on integers too. Okay, it's probably just a little math trick… 2010s|2789741 seems wrong? The sum of everything isn't even that much…
Database Storage I
- Architecture
bottom to top (lower level to higher level):
file system (provided by the OS)
disk manager (the database itself)
- Storage Manager:organizes the files as a collection of pages
- represent data on disk
- move data between disk and memory
- Database Pages: a page is a fixed-size block of data
Manage Pages
- Heap File:an unordered collection of pages
- Linked List: sucks (must scan from the head to find a page)
- Page Directory:
- Sequential/Sorted File:
- Hashing File:
Data Layout
Page:Header + Data
- Tuple-oriented
- SLOTTED PAGES: slot array (grows top to bottom) + fixed/variable-length tuples of the same table (grow bottom to top)
- a tuple is assigned a unique record identifier (page_id + offset/slot)
- log-oriented: read operations are slow (must replay the log to recreate a tuple), so an index is needed (to jump to locations in the log)
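A toy slotted page in Python (page size, field widths, and record-id shape are arbitrary choices for illustration): the slot array grows from the front, tuple data grows from the back, and a record id is (page_id, slot).

```python
import struct

PAGE_SIZE = 4096

class SlottedPage:
    def __init__(self, page_id):
        self.page_id = page_id
        self.data = bytearray(PAGE_SIZE)
        self.slots = []            # slot -> (offset, length); in-memory stand-in for the header
        self.free_end = PAGE_SIZE  # tuple area grows backward from the end

    def insert(self, tuple_bytes):
        self.free_end -= len(tuple_bytes)
        self.data[self.free_end:self.free_end + len(tuple_bytes)] = tuple_bytes
        self.slots.append((self.free_end, len(tuple_bytes)))
        return (self.page_id, len(self.slots) - 1)   # record identifier

    def get(self, slot):
        off, length = self.slots[slot]
        return bytes(self.data[off:off + length])

page = SlottedPage(page_id=7)
rid = page.insert(struct.pack("<i8s", 42, b"wutang  "))  # one fixed-length tuple
print(rid, struct.unpack("<i8s", page.get(rid[1])))
```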
Tuple Layout
- Denormalize:Store data from different tables into same page
Database Storage II
Tuple Storage
- the DBMS interprets the bytes as table attributes and their values
Data Type
- FLOAT, REAL/DOUBLE:fast but not precise
- NUMERIC, DECIMAL:slow but precise, use meta-data to get arbitrary precision
- Large Values: a tuple should not be larger than the page size
- overflow/TOAST page: contents are protected by the DBMS
- BLOB data type (external file): the DBMS can't modify the file's contents, but other programs can -> no protection
System Catalogs
- meta-data about database
- store catalogs as table
Database Workloads
- OLTP (On-Line Transaction Processing): queries access only small portions of the database (mostly write operations)
- OLAP (On-Line Analytical Processing): queries access large portions of the database (mostly read operations)
Data Storage Model
- N-ary Storage Model (NSM, row)
- store a single tuple's attributes contiguously in a page
- good for OLTP: queries that access a small number of entire tuples
- not good for OLAP: some attributes brought from disk to memory are useless for the query's execution
- Decomposition Storage Model (DSM, column)
- store the same attribute's values for all tuples in a page
- problem: how to match values back to their tuples?
- Fixed-length Offsets: the value at offset i belongs to tuple i
- Embedded Tuple Ids: store a tuple identifier with each attribute value
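A toy Python stand-in for the two layouts (plain lists as "pages", purely illustrative): NSM keeps whole tuples together, DSM keeps one attribute's values together, using fixed-length offsets so tuple i's value sits at position i.

```python
tuples = [(1, "Ann", 4.0), (2, "Bob", 3.0), (3, "Cat", 3.5)]

# NSM: a "page" holds entire tuples.
nsm_page = list(tuples)

# DSM: one "page" per attribute; position i belongs to tuple i.
dsm_pages = {
    "sid":   [t[0] for t in tuples],
    "name":  [t[1] for t in tuples],
    "grade": [t[2] for t in tuples],
}

# OLAP-style query touching one attribute: DSM reads only the grade page.
print(sum(dsm_pages["grade"]) / len(dsm_pages["grade"]))  # 3.5

# OLTP-style query touching one whole tuple: NSM reads a single entry.
print(nsm_page[1])  # (2, 'Bob', 3.0)
```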
Projects Setup
Follow the steps and there's no trouble; for the repo part I followed the official approach.
Original Docker Engine config; this probably doesn't need to be changed?
{
"features": {
"buildkit": true
},
"builder": {
"gc": {
"enabled": true,
"defaultKeepStorage": "20GB"
}
},
"experimental": false
}
Run this command in the bustub directory
It should be run on the bustub inside the Docker container
1.3 Configure the local directory mount
So convenient!
docker container run -it -v /Users/jimmy/documents/CMU-DBMS/bustub-private:/bustub --name=bustub_env bustub /bin/bash
To restart:
- docker start 8dbdbd03dfed
- docker exec -it 8dbdbd03dfed /bin/bash
Projects 0(2021)
For throwing exceptions, refer to the approach used in the code base…
test
$ mkdir build
$ cd build
$ cmake ..
$ make starter_test
$ ./test/starter_test
format
$ make format
$ make check-lint
$ make check-clang-tidy
zip project0-submission.zip src/include/primer/p0_starter.h
local run: make check-clang-tidy
encountered the following: WTF?
Checking: /bustub/src/buffer/lru_replacer.cpp … error: the clang compiler does not support '-march=native' [clang-diagnostic-error]
Buffer Pools
- Spatial Control: keep pages that are often used together close to each other on disk
- Temporal Control: minimize the number of times pages must be read from disk
- Buffer Pool = Memory Region = array of fixed size frame
- Page Table: records each page's memory location (for pages currently in memory), a Dirty Flag, and a Pin/Reference Counter
- Locks vs. Latches
- Lock: high level, outside of the database; changes must be rolled back
- Latch: low level, inside of the database (critical sections); changes do not need to be rolled back
- Page Directory vs. Page Table
- Page Directory: finds a page's location in the database files by page id; must be recorded on disk
- Page Table: mapping between page ids and locations in the buffer pool; does not need to be recorded on disk
Buffer Pool Optimization
- Multiple Buffer Pools
- can use a different policy for each buffer pool
- Prefetching
- sequential: the OS can do this too
- index: the DBMS knows what index pages mean, so it can fetch exactly the pages the query wants
- Scan Sharing
- similar queries share the same cursor
- Buffer Pool Bypass
- allocate separate memory for a sequential-scan query so it doesn't pollute the buffer pool
- OS Page Cache
- the OS keeps a copy of the same page in its file system cache
Buffer Pool Replacement Policies
- LRU
- clock(approximate LRU)
- both have the sequential flooding problem (the least recently used page might be the most needed page)
- LRU-K
- Localization
- Priority Hints
- Dirty Pages vs. Non-Dirty Pages
- non-dirty page: fast to evict; one I/O (read the needed page from disk), but it may be needed again in the future
- dirty page: slow to evict; two I/Os (write the dirty page back to disk && read the needed page from disk); often handled with background writing
Projects 1(2021)
C++ multithreading / How C++ code runs
- LRUReplacer is used to store unpinned frames
- Pin: this frame can't become a victim
- pin_count of a page becomes 0 = no threads are accessing this page
- a page will be pinned when NewPage() is invoked
- FetchPgImp (pin) vs. UnpinPgImp
- NewPgImp vs. DeletePgImp
- page_table_ contains both pinned and unpinned pages! -> the Delete operation must look the (possibly unpinned) page up in page_table_!
- domain knowledge
I didn't have the concepts straight and spent too much time debugging code that wasn't correct to begin with...
pages_ = buffer pool
page_table_ = {page_id, frame_id}
use frame_id to find the page in pages_
free_list_ = unused frames
a frame is in exactly one of: page_table_ (buffer pool/pages_), free_list_, replacer_
- task #1
By data locality, a page that was just accessed (pin/unpin) is likely to be accessed again, so put it at the back of the replacer (newest).
When a page must be victimized, start from the front (oldest); this achieves the goal of minimizing the number of disk accesses.
Unpin(frame_id_t): This method should be called when the pin_count of a page becomes 0.
Especially important: the LRU itself only manages removing frames; external callers must themselves make sure the page's pin_count is <= 0 when calling Unpin.
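A minimal sketch of that replacer behavior in Python (method names mirror the project spec, but this is not the bustub implementation): it only tracks unpinned frames — Pin removes a frame from the eviction candidates, Unpin adds it as the most recently used.

```python
from collections import OrderedDict

class LRUReplacer:
    def __init__(self):
        self.frames = OrderedDict()   # oldest (front) ... newest (back)

    def victim(self):
        """Evict the least recently used frame, or None if empty."""
        if not self.frames:
            return None
        frame_id, _ = self.frames.popitem(last=False)
        return frame_id

    def pin(self, frame_id):
        """Frame is in use; it can no longer be a victim."""
        self.frames.pop(frame_id, None)

    def unpin(self, frame_id):
        """pin_count hit 0; frame becomes the newest eviction candidate."""
        if frame_id not in self.frames:
            self.frames[frame_id] = True

replacer = LRUReplacer()
for f in (1, 2, 3):
    replacer.unpin(f)
replacer.pin(1)               # frame 1 is in use again
print(replacer.victim())      # 2 — the oldest remaining unpinned frame
```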
- task #2
FetchPgImp: get the Page for a given page_id from the bpm.
If it is already in the buffer pool (pages_), pin it and return the page directly.
If not, look for a free frame (from the free list or the LRU), remembering to handle the frame removal separately for each source,
and remove the victim page_id corresponding to the removed frame from page_table.
Handle the dirty flag, update the new page's metadata, and write the mapping into page_table_.
NewPgImp: clear out a slot in the buffer pool for the page_id returned by AllocatePage.
If every page in the buffer pool is pinned -> there is no space left, return nullptr;
if at least one page is unpinned, that page may belong to free_list_ or replacer_, so handle the removal separately for each.
Handle the dirty flag, update the new page's metadata, and write the mapping into page_table_.
UnpinPgImp: decrement the pin_count of the page in pages_ corresponding to page_id.
Handle the dirty flag, update the page's metadata, and put it back into replacer_.
DeletePgImp: reset (return to free_list_) the buffer pool slot for the given page_id.
If page_id is not in pages_, the job is already done, return true.
If pin_count > 0, some thread is still using it and it can't be reset, return false.
Handle the dirty flag, update the page's metadata, remove it from page_table_, and return the frame to free_list_.
FlushPgImp: write the page for the given page_id back to disk.
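The fetch flow above can be sketched like this (pure-Python stand-ins with dicts and lists; the real bustub code also manages latches, disk I/O, and Page objects):

```python
class BufferPool:
    def __init__(self, pool_size):
        self.pages = {}        # frame_id -> {"page_id", "pin_count", "dirty", "data"}
        self.page_table = {}   # page_id -> frame_id
        self.free_list = list(range(pool_size))
        self.replacer = []     # unpinned frame_ids, oldest first

    def fetch_page(self, page_id):
        # 1. Hit: pin and return.
        if page_id in self.page_table:
            frame = self.page_table[page_id]
            self.pages[frame]["pin_count"] += 1
            if frame in self.replacer:
                self.replacer.remove(frame)
            return self.pages[frame]
        # 2. Miss: take a free frame, else victimize the oldest unpinned one.
        if self.free_list:
            frame = self.free_list.pop()
        elif self.replacer:
            frame = self.replacer.pop(0)
            old = self.pages[frame]
            if old["dirty"]:
                pass               # write old["data"] back to disk here
            del self.page_table[old["page_id"]]
        else:
            return None            # every frame is pinned
        # 3. Read the page from disk (faked) and update its metadata.
        self.pages[frame] = {"page_id": page_id, "pin_count": 1,
                             "dirty": False, "data": b""}
        self.page_table[page_id] = frame
        return self.pages[frame]
```

With a pool of two frames, fetching a third page returns None once both frames are pinned, and refetching a resident page just bumps its pin count.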
- task #3
- todo: starting from the 76-score branch, line up the logic, remove the couts, and submit once the tests pass…
- Came back and finished rewriting the code, and then the autograder reports "failed to execute correctly"… even though the style checks passed too.
Hash Tables
- Hash Table(Static):
- Hash Function: maps keys (large space) to array indexes (small space); returns an integer representation of the given key
- trade-off: speed vs. collision rate
- Hashing Scheme: handles collisions after hashing
- trade-off: allocating a large array vs. inefficient insert/find operations
- linear probing: scan for the next empty/desired slot when the hashed slot is full/not the desired key
- use case: joins
- Dynamic Hash Tables
- Chained Hashing:each key maps to a linked-list
- easy to implement, but worst-case operations become O(n)
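A chained-hashing sketch in Python (bucket count and method names are arbitrary choices): every key hashing to the same slot goes into that slot's list, so a lookup walks the chain — hence the O(n) worst case when everything collides.

```python
class ChainedHashTable:
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # hash function: key -> bucket index
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:            # overwrite an existing key
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def find(self, key):
        for k, v in self._bucket(key):   # O(chain length)
            if k == key:
                return v
        return None

table = ChainedHashTable()
table.insert("andy", 1)
table.insert("lin", 2)
print(table.find("andy"), table.find("nobody"))  # 1 None
```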
- Extendible Hashing
- grows the directory and rehashes only the overflowing bucket, instead of resizing and rehashing everything at once
- problem: the global latch on the directory
- Linear Hashing
- conclusion: good as an internal data structure for the DBMS, but not for the table-index use case (that needs ordering, so a B+ tree is better)
Tree Indexes I
- table index: a replica of (part of) the table; must stay in sync with the table
- B means balanced
- B-Tree: keys are stored in every node (inner and leaf) -> no duplicate keys
- B+Tree: keys are stored in the leaf nodes -> inner nodes repeat keys as separators (duplicate keys)
- M keys in one node -> M + 1 children (the number of intervals created by M keys)
- every node other than the root is at least half-full -> ceil(M/2) - 1 ≤ #keys ≤ M - 1
- node: meta-data + an array of key/value pairs
- key: the indexed table column(s); value: tuple data or a pointer to the tuple
- insert: keep the number of keys in each node within bounds, otherwise split the node
- delete: keep the number of keys in each node within bounds, otherwise merge nodes
- index types:
- Clustered Indexes:
- each table can have at most one clustered index
- tuples are sorted (by primary key)
- Compound Index:use more than one attribute to build index
- Optimizations
- Prefix Compression: store the shared prefix once + each key's differing suffix
- Suffix Truncation: inner nodes only need the minimum prefix that routes to the correct leaf
- Bulk Insert: sort the keys first, then build the index bottom-up
- Pointer Swizzling: instead of a node storing page_ids as references to other nodes (which requires going to the buffer pool to get the page pointer), store the page's actual in-memory pointer (while it is valid)
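A toy bottom-up bulk load in Python (two levels only; the leaf capacity is an arbitrary choice): sort the keys once, slice them into full leaves, and take each later leaf's first key as a routing separator — real B+ trees repeat this up to the root.

```python
import bisect

def bulk_load(keys, leaf_capacity=4):
    keys = sorted(keys)                              # sort first
    leaves = [keys[i:i + leaf_capacity]              # slice into full leaves
              for i in range(0, len(keys), leaf_capacity)]
    separators = [leaf[0] for leaf in leaves[1:]]    # routing keys for the level above
    return separators, leaves

def lookup(separators, leaves, key):
    # route to the one leaf that could contain the key
    return key in leaves[bisect.bisect_right(separators, key)]

seps, leaves = bulk_load([9, 5, 13, 1, 6, 10, 8, 2])
print(seps, leaves)       # [8] [[1, 2, 5, 6], [8, 9, 10, 13]]
print(lookup(seps, leaves, 8), lookup(seps, leaves, 7))  # True False
```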
Tree Indexes II
- handle duplicate keys in b+ tree:Append Record Id
- no index or can’t find index to use -> sequential scan
- the database will create an index on the primary key; otherwise we have no choice but a sequential scan
Implicit Indexes
- Primary Keys
- Unique Constraints
- Foreign Keys (the referenced attribute must be unique)
Partial Indexes
- create index on subset of table
CREATE INDEX idx_foo
ON foo (a, b)
WHERE c = 'WuTang';
- radix tree = compact trie
Inverted Indexes
- b+ tree index is good at point/range query
- not good at keyword search -> full-text search index/inverted index
Multi-Threaded Index Concurrency Control
- Concurrency control
- Logical Correctness:
- Physical Correctness:
- Latch Modes:
- Read Mode: multiple threads at the same time
- Write Mode: only one thread at a time
HASH TABLE LATCHING: no deadlock, because latches are always acquired in the same direction
- Page Latches
- Slot Latches
B+TREE CONCURRENCY CONTROL
- Latch Crabbing/Coupling
- acquire parent latch
- acquire child latch
- if the child is safe, release the latches on its ancestors
- release latches from top to bottom (a node closer to the root blocks more threads, so release it first)
- all modifications start by taking a write latch on the root node -> bottleneck, so optimistically take a read latch instead
- Better Latching Algorithm
- take read latches all the way down to the leaf node (write latch on the leaf only), then check whether the leaf node is safe
- if it is not safe, fall back to latch crabbing/coupling with write latches
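A sketch of crabbing down a root-to-leaf path, using threading locks as latches (the path list and safety flags are stand-ins, not a real B+ tree): take the child's latch before releasing the parent's, and release ancestors only once the child is "safe" (won't split/merge).

```python
import threading

class Node:
    def __init__(self, name, safe=True):
        self.name, self.safe = name, safe
        self.latch = threading.Lock()

def crab_down(path):
    """Returns the nodes whose latches are still held at the leaf."""
    held = []
    for node in path:
        node.latch.acquire()          # latch the child first...
        held.append(node)
        if node.safe:                 # ...then drop ancestors if the child is safe
            for ancestor in held[:-1]:
                ancestor.latch.release()
            held = [node]
    return held

root, inner, leaf = Node("root"), Node("inner", safe=False), Node("leaf")
held = [n.name for n in crab_down([root, inner, leaf])]
print(held)  # ['leaf'] — even the unsafe inner node's latch was dropped once leaf proved safe
```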
LEAF NODE SCAN(range query)
- might deadlock, so a thread aborts itself when the desired latch can't be acquired
- Delayed Parent Updates(lazy update, apply for insert operation): Update parent node the next time that a thread takes a write latch on it.
Sorting & Aggregations
Query Processing
- Data flows from the leaf nodes to the root node (the query result)
DOUBLE BUFFERING OPTIMIZATION
- Don't waste time waiting on I/O requests: asynchronously fetch the next runs while the CPU processes the current runs
General External Merge Sort
After watching the lecture, going over the referenced notes once more made it much clearer…
- Sorting Phase: split the pages into chunks that each fit entirely in memory; sort each chunk in memory, then write it back to disk.
- Merge Phase: merge small runs into larger runs
- pass 0:
- B buffer pages
- produces ceiling(N / B) sorted runs of size B
- pass 1, 2, 3…: merge B - 1 runs at a time
- Complexity:
- number of passes: 1 + ceiling(log_{B-1}(ceiling(N / B)))
- cost per pass: 2N
- total cost: number of passes * cost per pass
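Plugging the classic lecture numbers (N = 108 pages, B = 5 buffer pages) into the formula above:

```python
import math

def sort_cost(N, B):
    runs = math.ceil(N / B)                        # pass 0 produces ceil(N/B) runs
    passes = 1 + math.ceil(math.log(runs, B - 1))  # then merge (B-1)-way per pass
    return passes, 2 * N * passes                  # each pass reads and writes all N pages

print(sort_cost(108, 5))  # (4, 864): 22 initial runs, merged 4-way over 3 more passes
```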
USING B+TREES FOR SORTING
- Clustered B+ Tree: the key order in the B+ tree equals the physical order of the tuples
- B+Tree index on the sort attribute(s)
- use the B+ tree for sorting only in the clustered case
Aggregation
- by Sort
- by Hash
- Partition Phase: hash each tuple and place it into a partition based on the hash value
- Rehash Phase: build a temp hash table over each partition to compute the aggregation result
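The two phases can be sketched in Python (in-memory lists and dicts stand in for the partitions and the temp hash table; a real DBMS spills partitions to disk):

```python
def hash_aggregate(tuples, group_idx, agg_idx, num_partitions=4):
    # Partition phase: a first hash on the group-by key scatters tuples into partitions.
    partitions = [[] for _ in range(num_partitions)]
    for t in tuples:
        partitions[hash(t[group_idx]) % num_partitions].append(t)
    # Rehash phase: per partition, a temp hash table computes the aggregate (SUM here).
    result = {}
    for part in partitions:
        temp = {}
        for t in part:
            temp[t[group_idx]] = temp.get(t[group_idx], 0) + t[agg_idx]
        result.update(temp)
    return result

enrolled = [("15-445", 1), ("15-721", 1), ("15-445", 1)]
print(hash_aggregate(enrolled, group_idx=0, agg_idx=1))  # a COUNT per cid
# {'15-445': 2, '15-721': 1} (key order may vary)
```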