CSharpDB.Storage
A low-level, high-performance storage engine for .NET 10 built on top of RandomAccess and SafeFileHandle. It provides random-access async I/O, page caching, write-ahead logging (WAL), crash recovery, and the B+tree/index primitives that power the SQL engine and collection API.
Architecture Overview
+------------------------------------------------------+
|                     Application                      |
|            (SQL Engine / Collection API)             |
+------------------------------------------------------+
|                    SchemaCatalog                     |
|         Tables - Indexes - Views - Triggers          |
+--------------+---------------+-----------------------+
|    BTree     |  IndexStore   |     RecordEncoder     |
|    (data)    |  (secondary)  |     (row format)      |
+--------------+---------------+-----------------------+
|                        Pager                         |
|      PageCache - DirtyTracking - PageAllocator       |
+----------------------+-------------------------------+
|    WriteAheadLog     |     CheckpointCoordinator     |
|  (WAL + WalIndex)    |        (policy-driven)        |
+----------------------+-------------------------------+
|                    IStorageDevice                    |
|             (FileStorageDevice / memory)             |
+------------------------------------------------------+
Current storage builds can also enable:
- Memory-mapped reads for clean main-file pages when the storage device supports it
- Speculative B+tree leaf read-ahead during sequential forward scans
- Checkpoint residency preservation so already-owned main-file pages stay hot across checkpoint
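These opt-in features are configured through PagerOptions. A minimal sketch: UseMemoryMappedReads and PreserveOwnedPagesOnCheckpoint are the flag names used in the Key Design Notes table below, while the read-ahead flag name and the options parameter of Pager.CreateAsync are assumptions for illustration.

```csharp
// Sketch: enabling the optional pager features. EnableLeafReadAhead is a
// hypothetical name for the read-ahead toggle, and passing options into
// Pager.CreateAsync as a fourth argument is assumed, not confirmed.
var options = new PagerOptions
{
    UseMemoryMappedReads = true,            // mmap reads for clean main-file pages
    PreserveOwnedPagesOnCheckpoint = true,  // keep owned pages resident across checkpoint
    EnableLeafReadAhead = true              // hypothetical: speculative leaf read-ahead
};
var pager = await Pager.CreateAsync(device, wal, walIndex, options);
```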
Page Layout
Page 0 (File Header):
[Magic: 4 bytes "CSDB"]
[FormatVersion: 4 bytes]
[PageSize: 4 bytes = 4096]
[PageCount: 4 bytes]
[SchemaRootPage: 4 bytes]
[FreelistHead: 4 bytes]
[ChangeCounter: 4 bytes]
[... reserved to 100 bytes ...]
[Slotted page content: 3996 bytes]
Pages 1+:
[SlottedPage: 4096 bytes]
  [Header: 9 bytes]
    PageType (1) - CellCount (2) - CellContentStart (2) - RightChild/NextLeaf (4)
  [CellPointers: 2 bytes each]
  [Free space]
  [Cells: growing backward from page end]
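The 9-byte header can be decoded straight from a page buffer. A minimal sketch using only BinaryPrimitives; the field offsets follow the layout above, and little-endian byte order is an assumption:

```csharp
using System.Buffers.Binary;

// Sketch: reading the 9-byte slotted-page header from a 4096-byte page buffer.
// Offsets follow the layout above; little-endian encoding is assumed.
static (byte PageType, ushort CellCount, ushort CellContentStart, uint RightChildOrNextLeaf)
    ReadHeader(ReadOnlySpan<byte> page)
{
    byte pageType = page[0];                                                  // 1 byte
    ushort cellCount = BinaryPrimitives.ReadUInt16LittleEndian(page[1..]);    // 2 bytes
    ushort contentStart = BinaryPrimitives.ReadUInt16LittleEndian(page[3..]); // 2 bytes
    uint rightChild = BinaryPrimitives.ReadUInt32LittleEndian(page[5..]);     // 4 bytes
    return (pageType, cellCount, contentStart, rightChild);
}
```

Cell pointers then begin at offset 9, two bytes each, while cell bodies grow backward from the end of the page.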
FileStorageDevice
FileStorageDevice wraps a SafeFileHandle opened with FileOptions.Asynchronous | FileOptions.RandomAccess, giving you:
- True async I/O via RandomAccess.ReadAsync / RandomAccess.WriteAsync
- Position-independent reads and writes: no seek, no shared file pointer
- Concurrent reads from other processes (FileShare.Read)
- Direct fsync to durable storage via RandomAccess.FlushToDisk
public FileStorageDevice(string filePath, bool createNew = false)
| Parameter | Description |
|---|---|
| filePath | Path to the database file. |
| createNew | true → FileMode.CreateNew (fails if file exists). false → OpenOrCreate. |
IStorageDevice Interface
All storage operations go through IStorageDevice, making it easy to swap implementations (e.g., for in-memory testing).
public interface IStorageDevice : IAsyncDisposable, IDisposable
{
long Length { get; }
ValueTask<int> ReadAsync(long offset, Memory<byte> buffer, CancellationToken ct = default);
ValueTask WriteAsync(long offset, ReadOnlyMemory<byte> buffer, CancellationToken ct = default);
ValueTask FlushAsync(CancellationToken ct = default);
ValueTask SetLengthAsync(long length, CancellationToken ct = default);
}
Device Scenarios
Create a New File
await using var device = new FileStorageDevice("mydb.cdb", createNew: true);
Console.WriteLine($"File created. Length: {device.Length}"); // 0
Write and Read Raw Bytes
await using var device = new FileStorageDevice("mydb.cdb");
byte[] payload = "Hello, CSharpDB!"u8.ToArray();
await device.WriteAsync(offset: 0, payload);
var buffer = new byte[16];
int bytesRead = await device.ReadAsync(offset: 0, buffer);
Console.WriteLine(System.Text.Encoding.UTF8.GetString(buffer, 0, bytesRead));
Pre-allocate / Extend File Length
await using var device = new FileStorageDevice("mydb.cdb", createNew: true);
const int PageSize = 4096;
await device.SetLengthAsync(PageSize * 8);
Console.WriteLine($"Pre-allocated: {device.Length} bytes"); // 32768
Flush to Disk (fsync)
await device.WriteAsync(offset: 0, data);
await device.FlushAsync(); // durable on disk after this returns
Injecting via IStorageDevice (Testability)
// Production wiring
IStorageDevice device = new FileStorageDevice("mydb.cdb");
var pager = await Pager.CreateAsync(device, wal, walIndex);
// In a unit test -- swap in your own IStorageDevice implementation
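For tests, a minimal in-memory implementation of IStorageDevice might look like the following. This is a sketch, not the library's built-in memory device, and it has no concurrency control:

```csharp
// Sketch: a naive in-memory IStorageDevice for unit tests.
// Not the library's built-in memory device; single-threaded use only.
public sealed class InMemoryStorageDevice : IStorageDevice
{
    private byte[] _data = Array.Empty<byte>();

    public long Length => _data.Length;

    public ValueTask<int> ReadAsync(long offset, Memory<byte> buffer, CancellationToken ct = default)
    {
        int available = 0;
        if (offset < _data.Length)
        {
            available = (int)Math.Min(buffer.Length, _data.Length - offset);
            _data.AsSpan((int)offset, available).CopyTo(buffer.Span);
        }
        buffer.Span[available..].Clear(); // zero-fill short reads, matching FileStorageDevice
        return ValueTask.FromResult(available);
    }

    public ValueTask WriteAsync(long offset, ReadOnlyMemory<byte> buffer, CancellationToken ct = default)
    {
        long required = offset + buffer.Length;
        if (required > _data.Length) Array.Resize(ref _data, (int)required);
        buffer.Span.CopyTo(_data.AsSpan((int)offset));
        return ValueTask.CompletedTask;
    }

    public ValueTask FlushAsync(CancellationToken ct = default) => ValueTask.CompletedTask;

    public ValueTask SetLengthAsync(long length, CancellationToken ct = default)
    {
        Array.Resize(ref _data, (int)length);
        return ValueTask.CompletedTask;
    }

    public ValueTask DisposeAsync() => ValueTask.CompletedTask;
    public void Dispose() { }
}
```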
Pager
The Pager sits between the B+tree layer and the storage device. It owns the page cache, tracks dirty pages, coordinates transactions, manages WAL integration, and drives checkpointing.
Create a New Database
await using var device = new FileStorageDevice("mydb.cdb", createNew: true);
var walIndex = new WalIndex();
await using var wal = new WriteAheadLog("mydb.cdb", walIndex);
await wal.OpenAsync(currentDbPageCount: 0);
var pager = await Pager.CreateAsync(device, wal, walIndex);
await pager.InitializeNewDatabaseAsync(); // writes file header (page 0)
Open and Recover an Existing Database
var pager = await Pager.CreateAsync(device, wal, walIndex);
await pager.RecoverAsync(); // replays committed WAL frames
Read and Write Pages
byte[] page = await pager.GetPageAsync(pageId: 1);
page[0] = 0xFF;
await pager.MarkDirtyAsync(pageId: 1); // tracked for WAL write on commit
Transaction Lifecycle
await pager.BeginTransactionAsync();
try
{
// ... modify pages via B+tree ...
await pager.CommitAsync(); // writes dirty pages to WAL, fsync
}
catch
{
await pager.RollbackAsync(); // discards uncommitted WAL frames
throw;
}
Snapshot Isolation (Concurrent Readers)
WalSnapshot snapshot = pager.AcquireReaderSnapshot();
try
{
    Pager snapshotPager = pager.CreateSnapshotReader(snapshot);
    byte[] page = await snapshotPager.GetPageAsync(pageId: 1);
}
finally
{
    pager.ReleaseReaderSnapshot(); // always release the snapshot when done
}
Configure Checkpoint Policy
var options = new PagerOptions
{
CheckpointPolicy = new AnyCheckpointPolicy(
new FrameCountCheckpointPolicy(threshold: 500),
new TimeIntervalCheckpointPolicy(TimeSpan.FromMinutes(5))
),
AutoCheckpointExecutionMode = AutoCheckpointExecutionMode.Background,
AutoCheckpointMaxPagesPerStep = 64
};
Built-in policies:
| Policy | Triggers When |
|---|---|
| FrameCountCheckpointPolicy(n) | Committed frame count exceeds n |
| WalSizeCheckpointPolicy(bytes) | Estimated WAL size exceeds bytes |
| TimeIntervalCheckpointPolicy(span) | Elapsed time since last checkpoint exceeds span |
| AnyCheckpointPolicy(...) | Any sub-policy triggers |
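Custom policies can also be plugged in via ICheckpointPolicy (see Key Design Notes). The member shape below is an assumption for illustration; check the actual interface before implementing:

```csharp
// Sketch: a custom checkpoint policy. The ICheckpointPolicy member shown here
// (a predicate over committed frames / WAL size / elapsed time) is an assumed
// shape for illustration, not the library's exact signature.
public sealed class OffPeakCheckpointPolicy : ICheckpointPolicy
{
    public bool ShouldCheckpoint(long committedFrames, long walSizeBytes, TimeSpan sinceLastCheckpoint)
        => DateTime.Now.Hour is < 8 or >= 18   // only checkpoint outside business hours...
           && committedFrames > 0;             // ...and only if there is work to do
}
```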
B+Tree
A B+tree keyed by signed 64-bit (long) keys. Leaf pages store (key, payload) pairs; interior pages store routing keys and child pointers. It supports forward-only cursor iteration, cache-only fast paths, and page-level rebalance/merge on delete.
Create and Use a B+Tree
uint rootPageId = await BTree.CreateNewAsync(pager);
var tree = new BTree(pager, rootPageId);
// Insert
byte[] payload = System.Text.Encoding.UTF8.GetBytes("Hello, B+tree!");
await tree.InsertAsync(key: 42, payload);
// Point lookup
byte[]? result = await tree.FindAsync(key: 42);
// Cache-only fast path
if (tree.TryFindCached(key: 42, out byte[]? cached))
Console.WriteLine($"Cache hit: {cached is not null}");
// Delete
bool deleted = await tree.DeleteAsync(key: 42);
// Forward cursor scan
var cursor = tree.CreateCursor();
while (await cursor.MoveNextAsync())
Console.WriteLine($"Key={cursor.CurrentKey}");
// Seek to a specific key
if (await cursor.SeekAsync(targetKey: 100))
do { Console.WriteLine(cursor.CurrentKey); }
while (await cursor.MoveNextAsync());
// Count entries
long count = await tree.CountEntriesAsync();
Write-Ahead Log (WAL)
Redo-style WAL for crash recovery and concurrent snapshot-isolated readers. Each commit writes dirty pages as frames to the WAL file. On checkpoint, committed frames are copied to the main database file.
WAL File Format:
[WAL Header: 32 bytes]
Magic - Version - PageSize - Checksum salt
[Frame 0: 4120 bytes]
[FrameHeader: 24 bytes] - PageId - DbPageCount - Checksum
[PageData: 4096 bytes]
[Frame 1: 4120 bytes]
...
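Because the header and frames are fixed-size, the byte offset of any frame is simple arithmetic. A sketch of the mapping implied by the format above:

```csharp
// Sketch: offsets implied by the WAL format above.
const int WalHeaderSize = 32;
const int FrameHeaderSize = 24;
const int PageSize = 4096;
const int FrameSize = FrameHeaderSize + PageSize; // 4120

// Byte offset of frame i within the WAL file:
static long FrameOffset(long frameIndex) => WalHeaderSize + frameIndex * FrameSize;

// The 4096 bytes of page data in frame i start just past its frame header:
static long FramePageDataOffset(long frameIndex) => FrameOffset(frameIndex) + FrameHeaderSize;
```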
Open, Write, and Checkpoint
var walIndex = new WalIndex();
await using var wal = new WriteAheadLog("mydb.cdb", walIndex);
await wal.OpenAsync(currentDbPageCount: pager.PageCount);
// Write transaction
wal.BeginTransaction();
await wal.AppendFrameAsync(pageId: 1, pageData);
await wal.CommitAsync(newDbPageCount: pager.PageCount);
// Checkpoint to main file
await wal.CheckpointAsync(device, pageCount: pager.PageCount);
walIndex.Reset();
Slotted Page Layout
SlottedPage is a struct that overlays a byte[4096] buffer, providing structured access to variable-size cells within a fixed-size page.
byte[] buffer = new byte[4096];
var sp = new SlottedPage(buffer, pageId: 1);
sp.Initialize(PageConstants.PageTypeLeaf);
byte[] cellData = new byte[] { 0x01, 0x02, 0x03, 0x04 };
bool inserted = sp.InsertCell(index: 0, cellData);
Span<byte> cell = sp.GetCell(index: 0);
sp.DeleteCell(index: 0);
sp.Defragment(); // rewrites cells contiguously
Indexing
Secondary B+tree-backed indexes with optional caching and ordered range scan support.
uint indexRootPage = await BTree.CreateNewAsync(pager);
var indexTree = new BTree(pager, indexRootPage);
IIndexStore index = new BTreeIndexStore(indexTree);
// Insert and lookup
byte[] rowIdPayload = BitConverter.GetBytes(42L);
await index.InsertAsync(key: hashOfColumnValue, rowIdPayload);
byte[]? result = await index.FindAsync(key: hashOfColumnValue);
// Range scan
var range = new IndexScanRange(
LowerBound: 100, LowerInclusive: true,
UpperBound: 200, UpperInclusive: false);
var cursor = index.CreateCursor(range);
while (await cursor.MoveNextAsync())
Console.WriteLine($"IndexKey={cursor.CurrentKey}");
// Add caching
IIndexStore cached = new CachingIndexStore(
inner: new BTreeIndexStore(indexTree),
capacity: 2048);
Record Serialization
Compact binary encoding for database rows. Supports selective column projection and fast filter evaluation without materializing managed strings.
Binary Format:
[columnCount: varint]
[col0_typeTag: 1 byte] [col0_data: ...]
[col1_typeTag: 1 byte] [col1_data: ...]
Type Tags:
Null (0x00) -> no data
Integer (0x01) -> 8 bytes, little-endian long
Text (0x02) -> [length: varint] [UTF-8 bytes]
Real (0x03) -> 8 bytes, little-endian double (IEEE 754)
Blob (0x04) -> [length: varint] [raw bytes]
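The layout can be illustrated by hand-encoding a two-column record. This sketch assumes a LEB128-style varint that stores values below 128 in a single byte; the library's exact varint scheme is not specified in this document.

```csharp
using System.Buffers.Binary;

// Sketch: hand-encoding the record (Integer 42, Text "Hi") per the format above.
// Assumes single-byte varints for values < 128 (LEB128-style), which is not
// confirmed by this document.
var buf = new List<byte>();
buf.Add(2);                                       // columnCount varint = 2
buf.Add(0x01);                                    // col0 type tag: Integer
byte[] i64 = new byte[8];
BinaryPrimitives.WriteInt64LittleEndian(i64, 42L);
buf.AddRange(i64);                                // 8 bytes, little-endian
buf.Add(0x02);                                    // col1 type tag: Text
byte[] text = "Hi"u8.ToArray();
buf.Add((byte)text.Length);                       // length varint = 2
buf.AddRange(text);                               // UTF-8 bytes
byte[] encoded = buf.ToArray();                   // 1+1+8+1+1+2 = 14 bytes total
```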
// Encode and decode
var values = new DbValue[]
{
DbValue.FromInteger(1),
DbValue.FromText("Alice"),
DbValue.FromInteger(30)
};
byte[] encoded = RecordEncoder.Encode(values);
DbValue[] decoded = RecordEncoder.Decode(encoded);
// Selective column projection
DbValue[] partial = RecordEncoder.DecodeUpTo(encoded, maxColumnIndexInclusive: 1);
DbValue age = RecordEncoder.DecodeColumn(encoded, columnIndex: 2);
// Fast filter without materialization
byte[] expectedUtf8 = "Alice"u8.ToArray();
if (RecordEncoder.TryColumnTextEquals(encoded, columnIndex: 1, expectedUtf8, out bool equals))
Console.WriteLine($"Column 1 is Alice: {equals}");
Schema Catalog
B+tree-backed metadata store for tables, indexes, views, and triggers. Provides in-memory caching with a schema version counter for cache invalidation.
var catalog = await SchemaCatalog.CreateAsync(pager);
// Create a table
var schema = new TableSchema
{
TableName = "users",
Columns = new[]
{
new ColumnDefinition { Name = "id", Type = DbType.Integer, IsPrimaryKey = true },
new ColumnDefinition { Name = "name", Type = DbType.Text },
new ColumnDefinition { Name = "age", Type = DbType.Integer },
}
};
await catalog.CreateTableAsync(schema);
// Query table metadata
TableSchema? users = catalog.GetTable("users");
uint rootPage = catalog.GetTableRootPage("users");
BTree tableTree = catalog.GetTableTree("users");
// Create and query indexes
var indexSchema = new IndexSchema
{
IndexName = "idx_users_name",
TableName = "users",
Columns = new[] { "name" },
IsUnique = false,
};
await catalog.CreateIndexAsync(indexSchema);
IIndexStore indexStore = catalog.GetIndexStore("idx_users_name");
// Views and triggers
await catalog.CreateViewAsync("active_users", "SELECT * FROM users WHERE age > 18");
string? viewSql = catalog.GetViewSql("active_users");
Folder & File Storage
To build a folder/file storage system, use the higher-level Database + Collection<T> API from CSharpDB.Engine. A single .cdb file holds all folders and files as typed collection documents in B+tree-backed collections.
await using var db = await Database.OpenAsync("storage.cdb");
var folders = await db.GetCollectionAsync<FolderEntry>("folders");
var files = await db.GetCollectionAsync<FileEntry>("files");
// Create a folder
await folders.PutAsync("/documents", new FolderEntry(
Name: "documents", Path: "/documents",
CreatedAt: DateTime.UtcNow));
// Create a file
await files.PutAsync("/documents/readme.txt", new FileEntry(
Name: "readme.txt", FolderPath: "/documents",
Content: "Hello world", ContentType: "text/plain",
SizeBytes: 11, CreatedAt: DateTime.UtcNow, UpdatedAt: DateTime.UtcNow));
// Read a file
FileEntry? file = await files.GetAsync("/documents/readme.txt");
// List files in a folder
await foreach (var kvp in files.FindAsync(f => f.FolderPath == "/documents"))
Console.WriteLine(kvp.Value.Name);
Key Design Notes
| Concern | Detail |
|---|---|
| No shared file pointer | RandomAccess APIs are stateless w.r.t. position, so concurrent reads at different offsets are safe without locking. |
| Async-first | All I/O is issued via RandomAccess.ReadAsync / WriteAsync. |
| Zero-fill on short reads | ReadAsync always fills the entire buffer. Pages beyond EOF are returned as zeros. |
| fsync on flush | FlushAsync calls RandomAccess.FlushToDisk which maps to FlushFileBuffers (Windows) or fsync (Linux/macOS). |
| 4 KB page size | All pages are PageConstants.PageSize (4096 bytes). Page 0 reserves 100 bytes for the file header. |
| Single writer, multiple readers | The TransactionCoordinator enforces a single writer via SemaphoreSlim. Readers use WAL snapshots for isolation. |
| Optional memory-mapped reads | The pager can use memory-mapped reads for clean main-file pages when PagerOptions.UseMemoryMappedReads is enabled. |
| Sequential scan read-ahead | The pager can speculatively pull the next B+tree leaf page during forward scans. |
| Checkpoint residency preservation | With PreserveOwnedPagesOnCheckpoint, already-owned main-file pages can stay resident across checkpoint. |
| Pluggable checkpoint policies | ICheckpointPolicy allows frame-count, WAL-size, time-interval, or custom composite triggers. |
| Schema versioning | SchemaCatalog.SchemaVersion increments on every DDL operation, enabling cache invalidation. |