节点分裂

前面已经完成了只有一个节点的示例,随着数据的增多,很快这一个节点就放不下了,必须考虑节点分裂的情况。我们首先在插入的时候加入一个检查的逻辑,如果放不下的话就进行分裂。分裂的步骤是这样的:

  • 首先通过 Pager 分配一个新的页;
  • 然后将节点后半部分的 cell 全部复制到新的页之中;
  • 节点本身只保留前半部分的 cell;
  • 然后将要保存的键值对放到正确的页之中,例如如果键小于新分配的页的第一个键,就放到当前节点内,否则就放到新增的节点里;
  • 此时再通过 Cursor 回退到父节点,在父节点中插入一个指向新增的节点的指针 cell;
  • 检查父节点是不是能放下这个 cell,不能的话重复进行分裂;
  • 一直冒泡到根节点,如果根节点不能放下这个 cell,再分裂以后还需要创建一个空节点作为新的根节点;
  • 新的根节点也要插入正确的指针 cell,到此才结束整个流程。

我们先给 BTree 的插入方法加上检查的逻辑:

//...
public insert(key: Buffer, value: Buffer) {
  //...
  if (node.canHold(key, value)) {
    node.insertKeyValueCell(key, value);
    this.saveNodeToFile(node);
  } else {
    node.splitAndInsert(key, value, this);
  }
}
//...
public insert(key: Buffer, value: Buffer) {
  //...
  if (node.canHold(key, value)) {
    node.insertKeyValueCell(key, value);
    this.saveNodeToFile(node);
  } else {
    node.splitAndInsert(key, value, this);
  }
}

然后在 BTreeNode 内实现 canHold 接口:

//...
public canHold(key: Buffer, value: Buffer | null) {
  const cellSize = value
    ? KeyValueCell.calcSize(key.length, value.length)
    : PointerCell.calcSize(key.length);

  return (
    this.cellAreaStart - this.freeStart >
    BTreeNode.CELL_POINTER_SIZE + cellSize
  );
}
//...
public canHold(key: Buffer, value: Buffer | null) {
  const cellSize = value
    ? KeyValueCell.calcSize(key.length, value.length)
    : PointerCell.calcSize(key.length);

  return (
    this.cellAreaStart - this.freeStart >
    BTreeNode.CELL_POINTER_SIZE + cellSize
  );
}

然后是最核心的 splitAndInsert 的逻辑:

public splitAndInsert(
  key: Buffer,
  valueOrPointer: Buffer | number,
  btree: BTree
) {
  // ...
}
public splitAndInsert(
  key: Buffer,
  valueOrPointer: Buffer | number,
  btree: BTree
) {
  // ...
}

先分配一个新的节点:

//...
const [id, buffer] = btree.pager.allocNewPage();
const newNode = new BTreeNode(id, buffer);
//...
const [id, buffer] = btree.pager.allocNewPage();
const newNode = new BTreeNode(id, buffer);

然后复制后半部分的 cell 到新的节点内部:

// Copy latter half of cells to new node
const latterHalfOfPtrs = ptrs.slice(Math.floor(ptrs.length / 2));
for (const p of latterHalfOfPtrs) {
  if (this.isInternalNode()) {
    const cell = this.readCellByPointer(p)! as PointerCell;
    newNode.insertPointerCell(cell.key, cell.childPageId);
  } else if (this.isLeafNode()) {
    const cell = this.readCellByPointer(p)! as KeyValueCell;
    newNode.insertKeyValueCell(cell.key, cell.value);
  }
}
// Copy latter half of cells to new node
const latterHalfOfPtrs = ptrs.slice(Math.floor(ptrs.length / 2));
for (const p of latterHalfOfPtrs) {
  if (this.isInternalNode()) {
    const cell = this.readCellByPointer(p)! as PointerCell;
    newNode.insertPointerCell(cell.key, cell.childPageId);
  } else if (this.isLeafNode()) {
    const cell = this.readCellByPointer(p)! as KeyValueCell;
    newNode.insertKeyValueCell(cell.key, cell.value);
  }
}

当前节点只保留前半部分的 cell,记得更新相应的头部中相关信息:

// Only keep former half of cells in current node, this will reset buffer
const formerHalfOfPtrs = ptrs.slice(0, Math.floor(ptrs.length / 2));
const buf = Buffer.concat(
  formerHalfOfPtrs.map(p => this.readCellByPointer(p)!.buffer)
);
this.buffer.fill(0, this.cellAreaStart, BTreeNode.CELL_AREA_END); // reset all cells
buf.copy(this.buffer, BTreeNode.DEFAULT_CELL_START - buf.length);
this.cellOffsets = formerHalfOfPtrs;
this.freeStart =
  BTreeNode.DEFAULT_FREE_START +
  BTreeNode.CELL_POINTER_SIZE * formerHalfOfPtrs.length;
this.cellAreaStart = BTreeNode.DEFAULT_CELL_START - buf.length;
// Only keep former half of cells in current node, this will reset buffer
const formerHalfOfPtrs = ptrs.slice(0, Math.floor(ptrs.length / 2));
const buf = Buffer.concat(
  formerHalfOfPtrs.map(p => this.readCellByPointer(p)!.buffer)
);
this.buffer.fill(0, this.cellAreaStart, BTreeNode.CELL_AREA_END); // reset all cells
buf.copy(this.buffer, BTreeNode.DEFAULT_CELL_START - buf.length);
this.cellOffsets = formerHalfOfPtrs;
this.freeStart =
  BTreeNode.DEFAULT_FREE_START +
  BTreeNode.CELL_POINTER_SIZE * formerHalfOfPtrs.length;
this.cellAreaStart = BTreeNode.DEFAULT_CELL_START - buf.length;

把要插入的键值对放到合适的节点里面:

// Place the new element into the corresponding node.
if (Buffer.compare(key, newNode.firstKey()) === -1) {
  if (this.isLeafNode()) {
    this.insertKeyValueCell(key, valueOrPointer as Buffer);
  } else if (this.isInternalNode()) {
    this.insertPointerCell(key, valueOrPointer as number);
  }
} else {
  if (this.isLeafNode()) {
    newNode.insertKeyValueCell(key, valueOrPointer as Buffer);
  } else if (this.isInternalNode()) {
    newNode.insertPointerCell(key, valueOrPointer as number);
  }
}
// Place the new element into the corresponding node.
if (Buffer.compare(key, newNode.firstKey()) === -1) {
  if (this.isLeafNode()) {
    this.insertKeyValueCell(key, valueOrPointer as Buffer);
  } else if (this.isInternalNode()) {
    this.insertPointerCell(key, valueOrPointer as number);
  }
} else {
  if (this.isLeafNode()) {
    newNode.insertKeyValueCell(key, valueOrPointer as Buffer);
  } else if (this.isInternalNode()) {
    newNode.insertPointerCell(key, valueOrPointer as number);
  }
}

然后修改父节点内部保存的指针,需要注意的是 Cursor 的 prev 方法返回了一个全新的节点对象,因此在更新后需要重新让 BTree 内保存的根节点引用指向到这个新的节点:

const parent = btree.cursor.prev();

if (!parent) {
  // it indicates current node is root node
  btree.createRootAndIncreaseHeight(newNode);
  btree.saveNodeToFile(this);
  btree.saveNodeToFile(newNode);
} else if (parent.canHold(newNode.firstKey(), null)) {
  // parent node can hold pointer to new node
  parent.insertPointerCell(newNode.firstKey(), newNode.id);
  btree.saveNodeToFile(parent);
  btree.saveNodeToFile(this);
  btree.saveNodeToFile(newNode);
  if (parent.id === btree.root?.id) {
    // 更新根节点的引用
    btree.root = parent;
  }
} else {
  // parent node does not have enough space to hold pointer to new node
  // should keep split and propagate
  parent.splitAndInsert(newNode.firstKey(), newNode.id, btree);
}
const parent = btree.cursor.prev();

if (!parent) {
  // it indicates current node is root node
  btree.createRootAndIncreaseHeight(newNode);
  btree.saveNodeToFile(this);
  btree.saveNodeToFile(newNode);
} else if (parent.canHold(newNode.firstKey(), null)) {
  // parent node can hold pointer to new node
  parent.insertPointerCell(newNode.firstKey(), newNode.id);
  btree.saveNodeToFile(parent);
  btree.saveNodeToFile(this);
  btree.saveNodeToFile(newNode);
  if (parent.id === btree.root?.id) {
    // 更新根节点的引用
    btree.root = parent;
  }
} else {
  // parent node does not have enough space to hold pointer to new node
  // should keep split and propagate
  parent.splitAndInsert(newNode.firstKey(), newNode.id, btree);
}

如果不存在父节点的话,说明当前节点就是父节点了,我们就调用 BTreecreateRootAndIncreaseHeight 来方法来创建新的根节点:

public createRootAndIncreaseHeight(newChildNode: BTreeNode) {
  const [id, buf] = this.pager.allocNewPage();
  const newRoot = new BTreeNode(id, buf);
  newRoot.insertPointerCell(this.root!.lastKey(), this.root!.id);
  newRoot.insertPointerCell(newChildNode.firstKey(), newChildNode.id);
  this.root = newRoot;
  this.saveNodeToFile(newRoot);
  this.pager.setRootPage(id, newRoot.buffer);
}
public createRootAndIncreaseHeight(newChildNode: BTreeNode) {
  const [id, buf] = this.pager.allocNewPage();
  const newRoot = new BTreeNode(id, buf);
  newRoot.insertPointerCell(this.root!.lastKey(), this.root!.id);
  newRoot.insertPointerCell(newChildNode.firstKey(), newChildNode.id);
  this.root = newRoot;
  this.saveNodeToFile(newRoot);
  this.pager.setRootPage(id, newRoot.buffer);
}

接下来可以继续编写测试来检查一下,这里就不贴完整的测试代码了。这样,目前为止我们就已经完整实现了一个基于 B 树的存储引擎。当然它还有非常多可以继续完善的地方,例如单独的索引结构,原子化的事务,日志,针对更多复杂的数据类型的存储等等。

同时由于代码运行在单线程的 Node.js 环境中,所有的操作都是串行化的,读者可以尝试使用像 Go 这样的语言来实现,并提供多线程环境下的并发支持,或者添加网络接口,加上针对 SQL 语言的支持,让它成为一个类似 MySQL 一样的数据库。

数据库系统并不神秘,它涉及到了非常广的领域,从存储,计算,网络,编译原理,乃至到新兴的分布式系统等等,这个系列是我自己在学习数据库的过程中写的,让我自己的理解更深入了,希望也可以让你建立起对数据库的初步了解。后续的文章会继续简单介绍相关的概念,但不会再给出具体的代码实现,有兴趣的话可以自己动手做一做。