CSharpDB.Storage

A low-level, high-performance storage engine for .NET 10 built on top of RandomAccess and SafeFileHandle. It provides random-access async I/O, page caching, write-ahead logging (WAL), crash recovery, and the B+tree/index primitives that power the SQL engine and collection API.

Architecture Overview

+------------------------------------------------------+
|                   Application                        |
|          (SQL Engine / Collection API)               |
+------------------------------------------------------+
|                 SchemaCatalog                        |
|    Tables - Indexes - Views - Triggers               |
+--------------+---------------+-----------------------+
|    BTree     |  IndexStore   |  RecordEncoder        |
|  (data)      |  (secondary)  |  (row format)         |
+--------------+---------------+-----------------------+
|                    Pager                             |
|   PageCache - DirtyTracking - PageAllocator          |
+----------------------+-------------------------------+
|   WriteAheadLog      |   CheckpointCoordinator       |
|   (WAL + WalIndex)   |   (policy-driven)             |
+----------------------+-------------------------------+
|              IStorageDevice                          |
|         (FileStorageDevice / memory)                 |
+------------------------------------------------------+

Current storage builds can also enable:

  • Memory-mapped reads for clean main-file pages when the storage device supports it
  • Speculative B+tree leaf read-ahead during sequential forward scans
  • Checkpoint residency preservation so already-owned main-file pages stay hot across checkpoint
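These features are opted into through PagerOptions. A minimal sketch, assuming the UseMemoryMappedReads and PreserveOwnedPagesOnCheckpoint property names mentioned under Key Design Notes (how leaf read-ahead is toggled is not specified in this document, so it is omitted):

```csharp
// Sketch: enabling the optional read paths. Property names are taken
// from Key Design Notes; the read-ahead toggle is not documented here.
var options = new PagerOptions
{
    // Serve clean main-file pages through a memory-mapped view.
    UseMemoryMappedReads = true,
    // Keep already-owned main-file pages resident across checkpoints.
    PreserveOwnedPagesOnCheckpoint = true
};
```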

Page Layout

Page 0 (File Header):
  [Magic: 4 bytes "CSDB"]
  [FormatVersion: 4 bytes]
  [PageSize: 4 bytes = 4096]
  [PageCount: 4 bytes]
  [SchemaRootPage: 4 bytes]
  [FreelistHead: 4 bytes]
  [ChangeCounter: 4 bytes]
  [... reserved to 100 bytes ...]
  [Slotted page content: 3996 bytes]

Pages 1+:
  [SlottedPage: 4096 bytes]
    [Header: 9 bytes]
      PageType (1) - CellCount (2) - CellContentStart (2) - RightChild/NextLeaf (4)
    [CellPointers: 2 bytes each]
    [Free space]
    [Cells: growing backward from page end]
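Reading the fixed-offset fields of the page-0 header follows directly from this layout. A sketch, assuming little-endian field encoding (the byte order is not stated in this document):

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Sketch: parse the page-0 header fields at their fixed offsets, per the
// layout above. Little-endian encoding is an assumption.
static (uint Version, uint PageSize, uint PageCount, uint SchemaRoot) ReadHeader(ReadOnlySpan<byte> page0)
{
    if (!page0[..4].SequenceEqual("CSDB"u8))                                 // Magic
        throw new InvalidDataException("Not a CSharpDB file.");
    uint version    = BinaryPrimitives.ReadUInt32LittleEndian(page0[4..]);   // FormatVersion
    uint pageSize   = BinaryPrimitives.ReadUInt32LittleEndian(page0[8..]);   // PageSize
    uint pageCount  = BinaryPrimitives.ReadUInt32LittleEndian(page0[12..]);  // PageCount
    uint schemaRoot = BinaryPrimitives.ReadUInt32LittleEndian(page0[16..]);  // SchemaRootPage
    return (version, pageSize, pageCount, schemaRoot);
}
```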

FileStorageDevice

FileStorageDevice wraps a SafeFileHandle opened with FileOptions.Asynchronous | FileOptions.RandomAccess, giving you:

  • True async I/O via RandomAccess.ReadAsync / RandomAccess.WriteAsync
  • Position-independent reads and writes — no seek, no shared file pointer
  • Concurrent reads from other processes (FileShare.Read)
  • Direct fsync to durable storage via RandomAccess.FlushToDisk

public FileStorageDevice(string filePath, bool createNew = false)

Parameter    Description
filePath     Path to the database file.
createNew    true opens with FileMode.CreateNew (fails if the file exists); false opens with FileMode.OpenOrCreate.

IStorageDevice Interface

All storage operations go through IStorageDevice, making it easy to swap implementations (e.g., for in-memory testing).

public interface IStorageDevice : IAsyncDisposable, IDisposable
{
    long Length { get; }
    ValueTask<int> ReadAsync(long offset, Memory<byte> buffer, CancellationToken ct = default);
    ValueTask WriteAsync(long offset, ReadOnlyMemory<byte> buffer, CancellationToken ct = default);
    ValueTask FlushAsync(CancellationToken ct = default);
    ValueTask SetLengthAsync(long length, CancellationToken ct = default);
}

Device Scenarios

Create a New File

await using var device = new FileStorageDevice("mydb.cdb", createNew: true);
Console.WriteLine($"File created. Length: {device.Length}"); // 0

Write and Read Raw Bytes

await using var device = new FileStorageDevice("mydb.cdb");
byte[] payload = "Hello, CSharpDB!"u8.ToArray();
await device.WriteAsync(offset: 0, payload);

var buffer = new byte[16];
int bytesRead = await device.ReadAsync(offset: 0, buffer);
Console.WriteLine(System.Text.Encoding.UTF8.GetString(buffer, 0, bytesRead));

Pre-allocate / Extend File Length

await using var device = new FileStorageDevice("mydb.cdb", createNew: true);
const int PageSize = 4096;
await device.SetLengthAsync(PageSize * 8);
Console.WriteLine($"Pre-allocated: {device.Length} bytes"); // 32768

Flush to Disk (fsync)

await device.WriteAsync(offset: 0, data);
await device.FlushAsync(); // durable on disk after this returns

Injecting via IStorageDevice (Testability)

// Production wiring
IStorageDevice device = new FileStorageDevice("mydb.cdb");
var pager = await Pager.CreateAsync(device, wal, walIndex);

// In a unit test -- swap in your own IStorageDevice implementation
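A minimal in-memory device might look like the sketch below. This is illustrative, not the library's own test double; the interface declaration is repeated from the section above so the snippet stands alone.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Repeated from above so this sketch is self-contained.
public interface IStorageDevice : IAsyncDisposable, IDisposable
{
    long Length { get; }
    ValueTask<int> ReadAsync(long offset, Memory<byte> buffer, CancellationToken ct = default);
    ValueTask WriteAsync(long offset, ReadOnlyMemory<byte> buffer, CancellationToken ct = default);
    ValueTask FlushAsync(CancellationToken ct = default);
    ValueTask SetLengthAsync(long length, CancellationToken ct = default);
}

// Illustrative in-memory device for unit tests. It mirrors FileStorageDevice
// semantics by zero-filling the buffer on short reads.
public sealed class InMemoryStorageDevice : IStorageDevice
{
    private readonly MemoryStream _data = new();

    public long Length => _data.Length;

    public ValueTask<int> ReadAsync(long offset, Memory<byte> buffer, CancellationToken ct = default)
    {
        buffer.Span.Clear(); // zero-fill, matching the short-read contract
        int read = 0;
        if (offset < _data.Length)
        {
            _data.Position = offset;
            read = _data.Read(buffer.Span);
        }
        return ValueTask.FromResult(read);
    }

    public ValueTask WriteAsync(long offset, ReadOnlyMemory<byte> buffer, CancellationToken ct = default)
    {
        _data.Position = offset; // MemoryStream zero-extends on write past EOF
        _data.Write(buffer.Span);
        return ValueTask.CompletedTask;
    }

    public ValueTask FlushAsync(CancellationToken ct = default) => ValueTask.CompletedTask;

    public ValueTask SetLengthAsync(long length, CancellationToken ct = default)
    {
        _data.SetLength(length);
        return ValueTask.CompletedTask;
    }

    public void Dispose() => _data.Dispose();
    public ValueTask DisposeAsync() { _data.Dispose(); return ValueTask.CompletedTask; }
}
```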

Pager

The Pager sits between the B+tree layer and the storage device. It owns the page cache, tracks dirty pages, coordinates transactions, manages WAL integration, and drives checkpointing.

Create a New Database

await using var device = new FileStorageDevice("mydb.cdb", createNew: true);
var walIndex = new WalIndex();
await using var wal = new WriteAheadLog("mydb.cdb", walIndex);
await wal.OpenAsync(currentDbPageCount: 0);

var pager = await Pager.CreateAsync(device, wal, walIndex);
await pager.InitializeNewDatabaseAsync(); // writes file header (page 0)

Open and Recover an Existing Database

var pager = await Pager.CreateAsync(device, wal, walIndex);
await pager.RecoverAsync(); // replays committed WAL frames

Read and Write Pages

byte[] page = await pager.GetPageAsync(pageId: 1);
page[0] = 0xFF;
await pager.MarkDirtyAsync(pageId: 1); // tracked for WAL write on commit

Transaction Lifecycle

await pager.BeginTransactionAsync();
try
{
    // ... modify pages via B+tree ...
    await pager.CommitAsync(); // writes dirty pages to WAL, fsync
}
catch
{
    await pager.RollbackAsync(); // discards uncommitted WAL frames
    throw;
}

Snapshot Isolation (Concurrent Readers)

WalSnapshot snapshot = pager.AcquireReaderSnapshot();
try
{
    Pager snapshotPager = pager.CreateSnapshotReader(snapshot);
    byte[] page = await snapshotPager.GetPageAsync(pageId: 1);
}
finally
{
    pager.ReleaseReaderSnapshot(); // always release, even if the read throws
}

Configure Checkpoint Policy

var options = new PagerOptions
{
    CheckpointPolicy = new AnyCheckpointPolicy(
        new FrameCountCheckpointPolicy(threshold: 500),
        new TimeIntervalCheckpointPolicy(TimeSpan.FromMinutes(5))
    ),
    AutoCheckpointExecutionMode = AutoCheckpointExecutionMode.Background,
    AutoCheckpointMaxPagesPerStep = 64
};

Built-in policies:

Policy                                Triggers When
FrameCountCheckpointPolicy(n)         Committed frame count exceeds n
WalSizeCheckpointPolicy(bytes)        Estimated WAL size exceeds bytes
TimeIntervalCheckpointPolicy(span)    Elapsed time since last checkpoint exceeds span
AnyCheckpointPolicy(...)              Any sub-policy triggers

B+Tree

A B+tree keyed by signed 64-bit long keys. Leaf pages store (key, payload) pairs; interior pages store routing keys and child pointers. It supports forward-only cursor iteration, cache-only fast paths, and page-level rebalance/merge on delete.

Create and Use a B+Tree

uint rootPageId = await BTree.CreateNewAsync(pager);
var tree = new BTree(pager, rootPageId);

// Insert
byte[] payload = System.Text.Encoding.UTF8.GetBytes("Hello, B+tree!");
await tree.InsertAsync(key: 42, payload);

// Point lookup
byte[]? result = await tree.FindAsync(key: 42);

// Cache-only fast path
if (tree.TryFindCached(key: 42, out byte[]? cached))
    Console.WriteLine($"Cache hit: {cached is not null}");

// Delete
bool deleted = await tree.DeleteAsync(key: 42);

// Forward cursor scan
var cursor = tree.CreateCursor();
while (await cursor.MoveNextAsync())
    Console.WriteLine($"Key={cursor.CurrentKey}");

// Seek to a specific key
if (await cursor.SeekAsync(targetKey: 100))
    do { Console.WriteLine(cursor.CurrentKey); }
    while (await cursor.MoveNextAsync());

// Count entries
long count = await tree.CountEntriesAsync();

Write-Ahead Log (WAL)

Redo-style WAL for crash recovery and concurrent snapshot-isolated readers. Each commit writes dirty pages as frames to the WAL file. On checkpoint, committed frames are copied to the main database file.

WAL File Format:
  [WAL Header: 32 bytes]
    Magic - Version - PageSize - Checksum salt
  [Frame 0: 4120 bytes]
    [FrameHeader: 24 bytes] - PageId - DbPageCount - Checksum
    [PageData: 4096 bytes]
  [Frame 1: 4120 bytes]
    ...
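Frame offsets follow directly from this layout. A quick sketch of the arithmetic, using the 32-byte WAL header and 4120-byte frames shown above:

```csharp
using System;

// Sketch: locate frame i inside the WAL file from the layout above.
// Each frame is a 24-byte frame header plus one 4096-byte page image.
const int WalHeaderSize   = 32;
const int FrameHeaderSize = 24;
const int PageSize        = 4096;
const int FrameSize       = FrameHeaderSize + PageSize; // 4120

static long FrameOffset(int frameIndex) =>
    WalHeaderSize + (long)frameIndex * FrameSize;

static long FramePageDataOffset(int frameIndex) =>
    FrameOffset(frameIndex) + FrameHeaderSize;

Console.WriteLine(FrameOffset(0));         // 32
Console.WriteLine(FramePageDataOffset(1)); // 32 + 4120 + 24 = 4176
```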

Open, Write, and Checkpoint

var walIndex = new WalIndex();
await using var wal = new WriteAheadLog("mydb.cdb", walIndex);
await wal.OpenAsync(currentDbPageCount: pager.PageCount);

// Write transaction
wal.BeginTransaction();
await wal.AppendFrameAsync(pageId: 1, pageData);
await wal.CommitAsync(newDbPageCount: pager.PageCount);

// Checkpoint to main file
await wal.CheckpointAsync(device, pageCount: pager.PageCount);
walIndex.Reset();

Slotted Page Layout

SlottedPage is a struct that overlays a byte[4096] buffer, providing structured access to variable-size cells within a fixed-size page.

byte[] buffer = new byte[4096];
var sp = new SlottedPage(buffer, pageId: 1);
sp.Initialize(PageConstants.PageTypeLeaf);

byte[] cellData = new byte[] { 0x01, 0x02, 0x03, 0x04 };
bool inserted = sp.InsertCell(index: 0, cellData);
Span<byte> cell = sp.GetCell(index: 0);

sp.DeleteCell(index: 0);
sp.Defragment(); // rewrites cells contiguously

Indexing

Secondary B+tree-backed indexes with optional caching and ordered range scan support.

uint indexRootPage = await BTree.CreateNewAsync(pager);
var indexTree = new BTree(pager, indexRootPage);
IIndexStore index = new BTreeIndexStore(indexTree);

// Insert and lookup
byte[] rowIdPayload = BitConverter.GetBytes(42L);
await index.InsertAsync(key: hashOfColumnValue, rowIdPayload);
byte[]? result = await index.FindAsync(key: hashOfColumnValue);

// Range scan
var range = new IndexScanRange(
    LowerBound: 100, LowerInclusive: true,
    UpperBound: 200, UpperInclusive: false);
var cursor = index.CreateCursor(range);
while (await cursor.MoveNextAsync())
    Console.WriteLine($"IndexKey={cursor.CurrentKey}");

// Add caching
IIndexStore cached = new CachingIndexStore(
    inner: new BTreeIndexStore(indexTree),
    capacity: 2048);

Record Serialization

Compact binary encoding for database rows. Supports selective column projection and fast filter evaluation without materializing managed strings.

Binary Format:
  [columnCount: varint]
  [col0_typeTag: 1 byte] [col0_data: ...]
  [col1_typeTag: 1 byte] [col1_data: ...]

Type Tags:
  Null (0x00)    -> no data
  Integer (0x01) -> 8 bytes, little-endian long
  Text (0x02)    -> [length: varint] [UTF-8 bytes]
  Real (0x03)    -> 8 bytes, little-endian double (IEEE 754)
  Blob (0x04)    -> [length: varint] [raw bytes]

// Encode and decode
var values = new DbValue[]
{
    DbValue.FromInteger(1),
    DbValue.FromText("Alice"),
    DbValue.FromInteger(30)
};
byte[] encoded = RecordEncoder.Encode(values);
DbValue[] decoded = RecordEncoder.Decode(encoded);

// Selective column projection
DbValue[] partial = RecordEncoder.DecodeUpTo(encoded, maxColumnIndexInclusive: 1);
DbValue age = RecordEncoder.DecodeColumn(encoded, columnIndex: 2);

// Fast filter without materialization
byte[] expectedUtf8 = "Alice"u8.ToArray();
if (RecordEncoder.TryColumnTextEquals(encoded, columnIndex: 1, expectedUtf8, out bool equals))
    Console.WriteLine($"Column 1 is Alice: {equals}");
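The [length: varint] fields above are variable-length integers. The engine's exact varint scheme is not specified in this document; the sketch below shows the common LEB128-style unsigned form as an assumption:

```csharp
using System;
using System.Collections.Generic;

// Sketch of a base-128 varint (LEB128-style, unsigned). Assumption: the
// engine's actual varint encoding is not specified in this document.
static byte[] WriteVarint(uint value)
{
    var bytes = new List<byte>();
    while (value >= 0x80)
    {
        bytes.Add((byte)(value | 0x80)); // low 7 bits, continuation bit set
        value >>= 7;
    }
    bytes.Add((byte)value); // final byte, continuation bit clear
    return bytes.ToArray();
}

static uint ReadVarint(ReadOnlySpan<byte> data, out int consumed)
{
    uint result = 0;
    int shift = 0;
    consumed = 0;
    byte b;
    do
    {
        b = data[consumed++];
        result |= (uint)(b & 0x7F) << shift; // accumulate 7 bits per byte
        shift += 7;
    } while ((b & 0x80) != 0);
    return result;
}
```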

Schema Catalog

B+tree-backed metadata store for tables, indexes, views, and triggers. Provides in-memory caching with a schema version counter for cache invalidation.

var catalog = await SchemaCatalog.CreateAsync(pager);

// Create a table
var schema = new TableSchema
{
    TableName = "users",
    Columns = new[]
    {
        new ColumnDefinition { Name = "id", Type = DbType.Integer, IsPrimaryKey = true },
        new ColumnDefinition { Name = "name", Type = DbType.Text },
        new ColumnDefinition { Name = "age", Type = DbType.Integer },
    }
};
await catalog.CreateTableAsync(schema);

// Query table metadata
TableSchema? users = catalog.GetTable("users");
uint rootPage = catalog.GetTableRootPage("users");
BTree tableTree = catalog.GetTableTree("users");

// Create and query indexes
var indexSchema = new IndexSchema
{
    IndexName = "idx_users_name",
    TableName = "users",
    Columns = new[] { "name" },
    IsUnique = false,
};
await catalog.CreateIndexAsync(indexSchema);
IIndexStore indexStore = catalog.GetIndexStore("idx_users_name");

// Views and triggers
await catalog.CreateViewAsync("active_users", "SELECT * FROM users WHERE age > 18");
string? viewSql = catalog.GetViewSql("active_users");

Folder & File Storage

To build a folder/file storage system, use the higher-level Database + Collection<T> API from CSharpDB.Engine. A single .cdb file holds all folders and files as typed collection documents in B+tree-backed collections.

await using var db = await Database.OpenAsync("storage.cdb");
var folders = await db.GetCollectionAsync<FolderEntry>("folders");
var files   = await db.GetCollectionAsync<FileEntry>("files");

// Create a folder
await folders.PutAsync("/documents", new FolderEntry(
    Name: "documents", Path: "/documents",
    CreatedAt: DateTime.UtcNow));

// Create a file
await files.PutAsync("/documents/readme.txt", new FileEntry(
    Name: "readme.txt", FolderPath: "/documents",
    Content: "Hello world", ContentType: "text/plain",
    SizeBytes: 11, CreatedAt: DateTime.UtcNow, UpdatedAt: DateTime.UtcNow));

// Read a file
FileEntry? file = await files.GetAsync("/documents/readme.txt");

// List files in a folder
await foreach (var kvp in files.FindAsync(f => f.FolderPath == "/documents"))
    Console.WriteLine(kvp.Value.Name);
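FolderEntry and FileEntry are application-defined types, not part of CSharpDB. The record shapes below are inferred from the calls above and are illustrative only:

```csharp
using System;

// Illustrative record shapes inferred from the usage above; your
// application defines the real ones.
public record FolderEntry(string Name, string Path, DateTime CreatedAt);

public record FileEntry(
    string Name, string FolderPath,
    string Content, string ContentType,
    long SizeBytes, DateTime CreatedAt, DateTime UpdatedAt);
```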

Key Design Notes

  • No shared file pointer: RandomAccess APIs are stateless with respect to position, so concurrent reads at different offsets are safe without locking.
  • Async-first: All I/O is issued via RandomAccess.ReadAsync / RandomAccess.WriteAsync.
  • Zero-fill on short reads: ReadAsync always fills the entire buffer; pages beyond EOF are returned as zeros.
  • fsync on flush: FlushAsync calls RandomAccess.FlushToDisk, which maps to FlushFileBuffers (Windows) or fsync (Linux/macOS).
  • 4 KB page size: All pages are PageConstants.PageSize (4096 bytes). Page 0 reserves 100 bytes for the file header.
  • Single writer, multiple readers: The TransactionCoordinator enforces a single writer via SemaphoreSlim. Readers use WAL snapshots for isolation.
  • Optional memory-mapped reads: The pager can use memory-mapped reads for clean main-file pages when PagerOptions.UseMemoryMappedReads is enabled.
  • Sequential scan read-ahead: The pager can speculatively pull the next B+tree leaf page during forward scans.
  • Checkpoint residency preservation: With PreserveOwnedPagesOnCheckpoint, already-owned main-file pages can stay resident across checkpoints.
  • Pluggable checkpoint policies: ICheckpointPolicy allows frame-count, WAL-size, time-interval, or custom composite triggers.
  • Schema versioning: SchemaCatalog.SchemaVersion increments on every DDL operation, enabling cache invalidation.

See Also