节点分裂
前面已经完成了只有一个节点的示例,随着数据的增多,很快这一个节点就放不下了,必须考虑节点分裂的情况。我们首先在插入的时候加入一个检查的逻辑,如果放不下的话就进行分裂。分裂的步骤是这样的:
- 首先通过
Pager
分配一个新的页; - 然后将节点后半部分的 cell 全部复制到新的页之中;
- 节点本身只保留前半部分的 cell;
- 然后将要保存的键值对放到正确的页之中,例如如果键小于新分配的页的第一个键,就放到当前节点内,否则就放到新增的节点里;
- 此时再通过
Cursor
回退到父节点,在父节点中插入一个指向新增的节点的指针 cell; - 检查父节点是不是能放下这个 cell,不能的话重复进行分裂;
- 一直冒泡到根节点,如果根节点不能放下这个 cell,再分裂以后还需要创建一个空节点作为新的根节点;
- 新的根节点也要插入正确的指针 cell,到此才结束整个流程。
我们先给 BTree
的插入方法加上检查的逻辑:
//...
public insert(key: Buffer, value: Buffer) {
//...
if (node.canHold(key, value)) {
node.insertKeyValueCell(key, value);
this.saveNodeToFile(node);
} else {
node.splitAndInsert(key, value, this);
}
}
//...
public insert(key: Buffer, value: Buffer) {
//...
if (node.canHold(key, value)) {
node.insertKeyValueCell(key, value);
this.saveNodeToFile(node);
} else {
node.splitAndInsert(key, value, this);
}
}
然后在 BTreeNode
内实现 canHold
接口:
//...
public canHold(key: Buffer, value: Buffer | null) {
const cellSize = value
? KeyValueCell.calcSize(key.length, value.length)
: PointerCell.calcSize(key.length);
return (
this.cellAreaStart - this.freeStart >
BTreeNode.CELL_POINTER_SIZE + cellSize
);
}
//...
public canHold(key: Buffer, value: Buffer | null) {
const cellSize = value
? KeyValueCell.calcSize(key.length, value.length)
: PointerCell.calcSize(key.length);
return (
this.cellAreaStart - this.freeStart >
BTreeNode.CELL_POINTER_SIZE + cellSize
);
}
然后是最核心的 splitAndInsert
的逻辑:
public splitAndInsert(
key: Buffer,
valueOrPointer: Buffer | number,
btree: BTree
) {
// ...
}
public splitAndInsert(
key: Buffer,
valueOrPointer: Buffer | number,
btree: BTree
) {
// ...
}
先分配一个新的节点:
//...
const [id, buffer] = btree.pager.allocNewPage();
const newNode = new BTreeNode(id, buffer);
//...
const [id, buffer] = btree.pager.allocNewPage();
const newNode = new BTreeNode(id, buffer);
然后复制后半部分的 cell 到新的节点内部:
// Copy latter half of cells to new node
const latterHalfOfPtrs = ptrs.slice(Math.floor(ptrs.length / 2));
for (const p of latterHalfOfPtrs) {
if (this.isInternalNode()) {
const cell = this.readCellByPointer(p)! as PointerCell;
newNode.insertPointerCell(cell.key, cell.childPageId);
} else if (this.isLeafNode()) {
const cell = this.readCellByPointer(p)! as KeyValueCell;
newNode.insertKeyValueCell(cell.key, cell.value);
}
}
// Copy latter half of cells to new node
const latterHalfOfPtrs = ptrs.slice(Math.floor(ptrs.length / 2));
for (const p of latterHalfOfPtrs) {
if (this.isInternalNode()) {
const cell = this.readCellByPointer(p)! as PointerCell;
newNode.insertPointerCell(cell.key, cell.childPageId);
} else if (this.isLeafNode()) {
const cell = this.readCellByPointer(p)! as KeyValueCell;
newNode.insertKeyValueCell(cell.key, cell.value);
}
}
当前节点只保留前半部分的 cell,记得更新相应的头部中相关信息:
// Only keep former half of cells in current node, this will reset buffer
const formerHalfOfPtrs = ptrs.slice(0, Math.floor(ptrs.length / 2));
const buf = Buffer.concat(
formerHalfOfPtrs.map(p => this.readCellByPointer(p)!.buffer)
);
this.buffer.fill(0, this.cellAreaStart, BTreeNode.CELL_AREA_END); // reset all cells
buf.copy(this.buffer, BTreeNode.DEFAULT_CELL_START - buf.length);
this.cellOffsets = formerHalfOfPtrs;
this.freeStart =
BTreeNode.DEFAULT_FREE_START +
BTreeNode.CELL_POINTER_SIZE * formerHalfOfPtrs.length;
this.cellAreaStart = BTreeNode.DEFAULT_CELL_START - buf.length;
// Only keep former half of cells in current node, this will reset buffer
const formerHalfOfPtrs = ptrs.slice(0, Math.floor(ptrs.length / 2));
const buf = Buffer.concat(
formerHalfOfPtrs.map(p => this.readCellByPointer(p)!.buffer)
);
this.buffer.fill(0, this.cellAreaStart, BTreeNode.CELL_AREA_END); // reset all cells
buf.copy(this.buffer, BTreeNode.DEFAULT_CELL_START - buf.length);
this.cellOffsets = formerHalfOfPtrs;
this.freeStart =
BTreeNode.DEFAULT_FREE_START +
BTreeNode.CELL_POINTER_SIZE * formerHalfOfPtrs.length;
this.cellAreaStart = BTreeNode.DEFAULT_CELL_START - buf.length;
把要插入的键值对放到合适的节点里面:
// Place the new element into the corresponding node.
if (Buffer.compare(key, newNode.firstKey()) === -1) {
if (this.isLeafNode()) {
this.insertKeyValueCell(key, valueOrPointer as Buffer);
} else if (this.isInternalNode()) {
this.insertPointerCell(key, valueOrPointer as number);
}
} else {
if (this.isLeafNode()) {
newNode.insertKeyValueCell(key, valueOrPointer as Buffer);
} else if (this.isInternalNode()) {
newNode.insertPointerCell(key, valueOrPointer as number);
}
}
// Place the new element into the corresponding node.
if (Buffer.compare(key, newNode.firstKey()) === -1) {
if (this.isLeafNode()) {
this.insertKeyValueCell(key, valueOrPointer as Buffer);
} else if (this.isInternalNode()) {
this.insertPointerCell(key, valueOrPointer as number);
}
} else {
if (this.isLeafNode()) {
newNode.insertKeyValueCell(key, valueOrPointer as Buffer);
} else if (this.isInternalNode()) {
newNode.insertPointerCell(key, valueOrPointer as number);
}
}
然后修改父节点内部保存的指针,需要注意的是 Cursor 的 prev
方法返回了一个全新的节点对象,因此在更新后需要重新让 BTree
内保存的根节点引用指向到这个新的节点:
const parent = btree.cursor.prev();
if (!parent) {
// it indicates current node is root node
btree.createRootAndIncreaseHeight(newNode);
btree.saveNodeToFile(this);
btree.saveNodeToFile(newNode);
} else if (parent.canHold(newNode.firstKey(), null)) {
// parent node can hold pointer to new node
parent.insertPointerCell(newNode.firstKey(), newNode.id);
btree.saveNodeToFile(parent);
btree.saveNodeToFile(this);
btree.saveNodeToFile(newNode);
if (parent.id === btree.root?.id) {
// 更新根节点的引用
btree.root = parent;
}
} else {
// parent node does not have enough space to hold pointer to new node
// should keep split and propagate
parent.splitAndInsert(newNode.firstKey(), newNode.id, btree);
}
const parent = btree.cursor.prev();
if (!parent) {
// it indicates current node is root node
btree.createRootAndIncreaseHeight(newNode);
btree.saveNodeToFile(this);
btree.saveNodeToFile(newNode);
} else if (parent.canHold(newNode.firstKey(), null)) {
// parent node can hold pointer to new node
parent.insertPointerCell(newNode.firstKey(), newNode.id);
btree.saveNodeToFile(parent);
btree.saveNodeToFile(this);
btree.saveNodeToFile(newNode);
if (parent.id === btree.root?.id) {
// 更新根节点的引用
btree.root = parent;
}
} else {
// parent node does not have enough space to hold pointer to new node
// should keep split and propagate
parent.splitAndInsert(newNode.firstKey(), newNode.id, btree);
}
如果不存在父节点的话,说明当前节点就是父节点了,我们就调用 BTree
的 createRootAndIncreaseHeight
来方法来创建新的根节点:
public createRootAndIncreaseHeight(newChildNode: BTreeNode) {
const [id, buf] = this.pager.allocNewPage();
const newRoot = new BTreeNode(id, buf);
newRoot.insertPointerCell(this.root!.lastKey(), this.root!.id);
newRoot.insertPointerCell(newChildNode.firstKey(), newChildNode.id);
this.root = newRoot;
this.saveNodeToFile(newRoot);
this.pager.setRootPage(id, newRoot.buffer);
}
public createRootAndIncreaseHeight(newChildNode: BTreeNode) {
const [id, buf] = this.pager.allocNewPage();
const newRoot = new BTreeNode(id, buf);
newRoot.insertPointerCell(this.root!.lastKey(), this.root!.id);
newRoot.insertPointerCell(newChildNode.firstKey(), newChildNode.id);
this.root = newRoot;
this.saveNodeToFile(newRoot);
this.pager.setRootPage(id, newRoot.buffer);
}
接下来可以继续编写测试来检查一下,这里就不贴完整的测试代码了。这样,目前为止我们就已经完整实现了一个基于 B 树的存储引擎。当然它还有非常多可以继续完善的地方,例如单独的索引结构,原子化的事务,日志,针对更多复杂的数据类型的存储等等。
同时由于代码运行在单线程的 Node.js 环境中,所有的操作都是串行化的,读者可以尝试使用像 Go 这样的语言来实现,并提供多线程环境下的并发支持,或者添加网络接口,加上针对 SQL 语言的支持,让它成为一个类似 MySQL 一样的数据库。
数据库系统并不神秘,它涉及到了非常广的领域,从存储,计算,网络,编译原理,乃至到新兴的分布式系统等等,这个系列是我自己在学习数据库的过程中写的,让我自己的理解更深入了,希望也可以让你建立起对数据库的初步了解。后续的文章会继续简单介绍相关的概念,但不会再给出具体的代码实现,有兴趣的话可以自己动手做一做。