Storage Engine Tutorial
A hands-on walkthrough of every layer in the CSharpDB storage stack — from raw file I/O up to a complete folder/file storage system.
Examples use the CSharpDB.Storage NuGet package (plus CSharpDB.Engine for the higher-level sections).
Architecture Overview
The storage engine is organized in layers. Each layer depends only on the one below it, making individual components testable and replaceable.
Additional read-path optimizations layered on top:
- Memory-mapped reads for clean main-file pages when the storage device supports it
- Speculative B+tree leaf read-ahead during sequential forward scans
- Checkpoint residency preservation so already-owned pages stay hot in lazy-resident hybrid mode
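These optimizations are opt-in; a possible configuration sketch follows, with the flag names taken from the Key Design Notes table at the end of this document. Only UseMemoryMappedReads is explicitly documented there as a PagerOptions member, so assuming all three live on PagerOptions is a guess.

```csharp
// Sketch only: enabling all three read-path optimizations. Flag names come
// from the Key Design Notes table; their exact home on PagerOptions is an
// assumption, not confirmed API surface.
var options = new PagerOptions
{
    UseMemoryMappedReads = true,           // mmap clean main-file pages when supported
    EnableSequentialLeafReadAhead = true,  // prefetch the next B+tree leaf on forward scans
    PreserveOwnedPagesOnCheckpoint = true  // keep hybrid-owned pages resident across checkpoints
};
```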
Page Layout
Page 0 (File Header):
[Magic: 4 bytes "CSDB"]
[FormatVersion: 4 bytes]
[PageSize: 4 bytes = 4096]
[PageCount: 4 bytes]
[SchemaRootPage: 4 bytes]
[FreelistHead: 4 bytes]
[ChangeCounter: 4 bytes]
[... reserved to 100 bytes ...]
[Slotted page content: 3996 bytes]
Pages 1+:
[SlottedPage: 4096 bytes]
[Header: 9 bytes]
PageType (1) ─ CellCount (2) ─ CellContentStart (2) ─ RightChild/NextLeaf (4)
[CellPointers: 2 bytes each]
[Free space]
[Cells: growing backward from page end]
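The fixed header fields of page 0 can be read straight off the buffer with BinaryPrimitives. A minimal sketch, assuming little-endian on-disk integers (the diagram above does not state byte order):

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Build a fake page-0 header and read it back. Field offsets follow the
// diagram above (Magic at 0, FormatVersion at 4, PageSize at 8, PageCount
// at 12); little-endian encoding is an assumption here.
byte[] page0 = new byte[4096];
"CSDB"u8.CopyTo(page0);                                          // Magic
BinaryPrimitives.WriteInt32LittleEndian(page0.AsSpan(4), 1);     // FormatVersion
BinaryPrimitives.WriteInt32LittleEndian(page0.AsSpan(8), 4096);  // PageSize
BinaryPrimitives.WriteInt32LittleEndian(page0.AsSpan(12), 8);    // PageCount

string magic = Encoding.ASCII.GetString(page0, 0, 4);
int pageSize = BinaryPrimitives.ReadInt32LittleEndian(page0.AsSpan(8));
Console.WriteLine($"{magic}, pageSize={pageSize}"); // CSDB, pageSize=4096
```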
FileStorageDevice
FileStorageDevice wraps a SafeFileHandle opened with FileOptions.Asynchronous | FileOptions.RandomAccess, providing:
- True async I/O via RandomAccess.ReadAsync/WriteAsync
- Position-independent reads and writes — no seek, no shared file pointer
- Concurrent reads from other processes (FileShare.Read)
- Direct fsync to durable storage via RandomAccess.FlushToDisk
public FileStorageDevice(string filePath, bool createNew = false)
| Parameter | Description |
|---|---|
| filePath | Path to the database file. |
| createNew | true → FileMode.CreateNew (fails if file exists). false → OpenOrCreate. |
IStorageDevice Interface
All storage operations go through IStorageDevice, making it easy to swap implementations for in-memory testing.
public interface IStorageDevice : IAsyncDisposable, IDisposable
{
long Length { get; }
ValueTask<int> ReadAsync(long offset, Memory<byte> buffer, CancellationToken ct = default);
ValueTask WriteAsync(long offset, ReadOnlyMemory<byte> buffer, CancellationToken ct = default);
ValueTask FlushAsync(CancellationToken ct = default);
ValueTask SetLengthAsync(long length, CancellationToken ct = default);
}
Device Scenarios
1. Create a New File
await using var device = new FileStorageDevice("mydb.cdb", createNew: true);
Console.WriteLine($"File created. Length: {device.Length}"); // 0
2. Open an Existing File
await using var device = new FileStorageDevice("mydb.cdb");
Console.WriteLine($"Opened. Length: {device.Length}");
3. Write Raw Bytes
Writes are position-independent. Multiple writes at different offsets can be issued concurrently without locking.
await using var device = new FileStorageDevice("mydb.cdb");
byte[] payload = "Hello, CSharpDB!"u8.ToArray();
await device.WriteAsync(offset: 0, payload);
4. Read Raw Bytes
ReadAsync loops internally until the buffer is fully filled or EOF is reached.
await using var device = new FileStorageDevice("mydb.cdb");
var buffer = new byte[16];
int bytesRead = await device.ReadAsync(offset: 0, buffer);
Console.WriteLine($"Read {bytesRead} byte(s)");
5. Zero-Fill on Short Reads
Reading past EOF zero-fills the remainder of the buffer — useful for treating uninitialized pages as zeroed memory.
await using var device = new FileStorageDevice("mydb.cdb");
var buffer = new byte[4096];
int bytesRead = await device.ReadAsync(offset: 0, buffer);
Console.WriteLine($"Bytes on disk: {bytesRead}");
Console.WriteLine($"Remainder is zeros: {buffer[16] == 0}");
6. Pre-allocate / Extend File
Pre-allocating avoids fragmentation and is required before writing pages beyond the current end-of-file on some file systems.
await using var device = new FileStorageDevice("mydb.cdb", createNew: true);
const int PageSize = 4096;
await device.SetLengthAsync(PageSize * 8);
Console.WriteLine($"Pre-allocated: {device.Length} bytes"); // 32768
7. Flush to Disk (fsync)
FlushAsync calls RandomAccess.FlushToDisk, issuing a full fsync. Call this after committing a transaction to guarantee durability.
await using var device = new FileStorageDevice("mydb.cdb");
byte[] data = new byte[4096];
await device.WriteAsync(offset: 0, data);
await device.FlushAsync(); // durable on disk after this returns
8. Writing Fixed-Size Pages (4 KB)
The storage engine works in 4096-byte pages. Write a page at a computed offset.
await using var device = new FileStorageDevice("mydb.cdb", createNew: true);
const int PageSize = 4096;
await device.SetLengthAsync(PageSize * 4);
byte[] page = new byte[PageSize];
System.Text.Encoding.UTF8.GetBytes("page-0 data").CopyTo(page.AsSpan());
uint pageId = 0;
await device.WriteAsync(offset: (long)pageId * PageSize, page);
9. Cancellation Support
All async methods accept a CancellationToken for clean abort of long I/O operations.
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
await using var device = new FileStorageDevice("mydb.cdb");
var buffer = new byte[4096];
try
{
await device.ReadAsync(offset: 0, buffer, cts.Token);
}
catch (OperationCanceledException)
{
Console.WriteLine("Read was cancelled.");
}
10. Testability via IStorageDevice
Program against IStorageDevice so you can substitute an in-memory implementation in tests.
public sealed class MemoryStorageDevice : IStorageDevice
{
private byte[] _data = [];
public long Length => _data.Length;
public ValueTask<int> ReadAsync(long offset, Memory<byte> buffer, CancellationToken ct = default)
{
    int available = (int)Math.Max(0, _data.Length - offset);
    int toCopy = Math.Min(buffer.Length, available);
    if (toCopy > 0)                                  // guard: an offset at/past EOF copies nothing
        _data.AsMemory((int)offset, toCopy).CopyTo(buffer);
    buffer[toCopy..].Span.Clear();                   // zero-fill remainder, matching FileStorageDevice
    return ValueTask.FromResult(toCopy);
}
public ValueTask WriteAsync(long offset, ReadOnlyMemory<byte> buffer, CancellationToken ct = default)
{
long needed = offset + buffer.Length;
if (needed > _data.Length) Array.Resize(ref _data, (int)needed);
buffer.CopyTo(_data.AsMemory((int)offset));
return ValueTask.CompletedTask;
}
public ValueTask FlushAsync(CancellationToken ct = default) => ValueTask.CompletedTask;
public ValueTask SetLengthAsync(long length, CancellationToken ct = default)
{
Array.Resize(ref _data, (int)length);
return ValueTask.CompletedTask;
}
public ValueTask DisposeAsync() => ValueTask.CompletedTask;
public void Dispose() { }
}
Pager
The Pager sits between the B+tree layer and the storage device. It owns the page cache, tracks dirty pages, coordinates transactions, manages WAL integration, and drives checkpointing.
P1. Create a New Database
await using var device = new FileStorageDevice("mydb.cdb", createNew: true);
var walIndex = new WalIndex();
await using var wal = new WriteAheadLog("mydb.cdb", walIndex);
await wal.OpenAsync(currentDbPageCount: 0);
var pager = await Pager.CreateAsync(device, wal, walIndex);
await pager.InitializeNewDatabaseAsync(); // writes file header (page 0)
Console.WriteLine($"Pages: {pager.PageCount}"); // 1
P2. Open and Recover
On startup, call RecoverAsync to redo any committed WAL frames that were not yet checkpointed.
await using var device = new FileStorageDevice("mydb.cdb");
var walIndex = new WalIndex();
await using var wal = new WriteAheadLog("mydb.cdb", walIndex);
var pager = await Pager.CreateAsync(device, wal, walIndex);
await pager.RecoverAsync(); // replays committed WAL frames
P3. Read and Write Pages
// Read a page (checks cache -> WAL -> device)
byte[] page = await pager.GetPageAsync(pageId: 1);
// Modify in-place, then mark dirty
page[0] = 0xFF;
await pager.MarkDirtyAsync(pageId: 1); // tracked for WAL write on commit
P4. Allocate and Free Pages
// Allocate (extends file or reuses from freelist)
uint newPageId = await pager.AllocatePageAsync();
// Free (adds to freelist for reuse)
await pager.FreePageAsync(newPageId);
P5. Transaction Lifecycle
Single writer per database. Reads do not require transactions.
await pager.BeginTransactionAsync();
try
{
// ... modify pages via B+tree ...
await pager.CommitAsync(); // writes dirty pages to WAL, fsync
}
catch
{
await pager.RollbackAsync(); // discards uncommitted WAL frames
throw;
}
P6. Snapshot Isolation
Multiple readers run concurrently with the writer. Each reader sees a consistent point-in-time snapshot.
WalSnapshot snapshot = pager.AcquireReaderSnapshot();
Pager snapshotPager = pager.CreateSnapshotReader(snapshot);
byte[] page = await snapshotPager.GetPageAsync(pageId: 1);
pager.ReleaseReaderSnapshot();
P7. Checkpoint Policies
var options = new PagerOptions
{
CheckpointPolicy = new AnyCheckpointPolicy(
new FrameCountCheckpointPolicy(threshold: 500),
new TimeIntervalCheckpointPolicy(TimeSpan.FromMinutes(5))
),
AutoCheckpointExecutionMode = AutoCheckpointExecutionMode.Background,
AutoCheckpointMaxPagesPerStep = 64
};
| Policy | Triggers When |
|---|---|
| FrameCountCheckpointPolicy(n) | Committed frame count exceeds n |
| WalSizeCheckpointPolicy(bytes) | Estimated WAL size exceeds bytes |
| TimeIntervalCheckpointPolicy(span) | Elapsed time since last checkpoint exceeds span |
| AnyCheckpointPolicy(...) | Any sub-policy triggers |
B+Tree
B+tree keyed by signed 64-bit long keys. Leaf pages store (key, payload) pairs; interior pages store routing keys and child pointers. Supports forward-only cursor iteration and cache-only fast paths.
B1. Create a New B+Tree
uint rootPageId = await BTree.CreateNewAsync(pager);
var tree = new BTree(pager, rootPageId);
B2. Insert a Key-Value Pair
Payload is raw bytes — the B+tree has no opinion on format. If the leaf is full, the tree automatically splits.
byte[] payload = System.Text.Encoding.UTF8.GetBytes("Hello, B+tree!");
await tree.InsertAsync(key: 42, payload);
B3. Point Lookup
byte[]? result = await tree.FindAsync(key: 42);
if (result is not null)
Console.WriteLine(System.Text.Encoding.UTF8.GetString(result));
B4. Cache-Only Fast Path
Avoids async I/O when all required pages are already cached.
if (tree.TryFindCached(key: 42, out byte[]? payload))
Console.WriteLine($"Cache hit: {payload is not null}");
else
payload = await tree.FindAsync(key: 42);
B5. Delete a Key
bool deleted = await tree.DeleteAsync(key: 42);
Console.WriteLine(deleted ? "Deleted." : "Key not found.");
B6. Forward Cursor Scan
Iterate all entries in key order. The cursor follows leaf-to-leaf next pointers without interior page I/O.
var cursor = tree.CreateCursor();
while (await cursor.MoveNextAsync())
{
long key = cursor.CurrentKey;
ReadOnlyMemory<byte> value = cursor.CurrentValue;
Console.WriteLine($"Key={key}, PayloadSize={value.Length}");
}
B7. Seek to a Key
Position the cursor at the first key >= target, then iterate forward.
var cursor = tree.CreateCursor();
if (await cursor.SeekAsync(targetKey: 100))
{
do
{
Console.WriteLine($"Key={cursor.CurrentKey}");
} while (await cursor.MoveNextAsync());
}
B8. Count Entries
long count = await tree.CountEntriesAsync();
Console.WriteLine($"Tree contains {count} entries.");
Write-Ahead Log (WAL)
Redo-style WAL for crash recovery and concurrent snapshot-isolated readers. Each commit writes dirty pages as frames to the WAL file. On checkpoint, committed frames are copied to the main database file.
WAL File Format:
[WAL Header: 32 bytes]
Magic ─ Version ─ PageSize ─ Checksum salt
[Frame 0: 4120 bytes]
[FrameHeader: 24 bytes] ─ PageId ─ DbPageCount ─ Checksum
[PageData: 4096 bytes]
[Frame 1: 4120 bytes]
...
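The fixed-size frames make locating a frame pure arithmetic: a 32-byte WAL header followed by 4120-byte frames (24-byte frame header plus a 4096-byte page image). A sketch of that arithmetic, with constants read off the diagram rather than the engine's API:

```csharp
using System;

// Frame placement implied by the WAL layout above.
const int WalHeaderSize = 32;
const int FrameHeaderSize = 24;
const int PageSize = 4096;
const int FrameSize = FrameHeaderSize + PageSize; // 4120

// Byte offset of frame i within the WAL file.
static long FrameOffset(long i) => WalHeaderSize + i * FrameSize;
// Byte offset of frame i's 4096-byte page image.
static long PageDataOffset(long i) => FrameOffset(i) + FrameHeaderSize;

Console.WriteLine(FrameOffset(0));    // 32
Console.WriteLine(PageDataOffset(1)); // 32 + 4120 + 24 = 4176
```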
W1. Open or Create a WAL
var walIndex = new WalIndex();
await using var wal = new WriteAheadLog("mydb.cdb", walIndex);
await wal.OpenAsync(currentDbPageCount: pager.PageCount);
W2. Write Transaction to WAL
wal.BeginTransaction();
await wal.AppendFrameAsync(pageId: 1, pageData);
await wal.AppendFrameAsync(pageId: 5, pageData);
await wal.CommitAsync(newDbPageCount: pager.PageCount);
// Or rollback: await wal.RollbackAsync();
W3. Reader Snapshot
WalSnapshot snapshot = walIndex.TakeSnapshot();
if (snapshot.TryGet(pageId: 1, out long walOffset))
{
byte[] page = await wal.ReadPageAsync(walOffset);
}
W4. Checkpoint
await wal.CheckpointAsync(device, pageCount: pager.PageCount);
walIndex.Reset();
// WAL is now empty; all data is in the main file
Slotted Page Layout
SlottedPage is a struct that overlays a byte[4096] buffer, providing structured access to variable-size cells within a fixed-size page.
[Header: 9 bytes]
PageType (1) ─ CellCount (2) ─ CellContentStart (2) ─ RightChild/NextLeaf (4)
[Cell Pointers: 2 bytes each, growing forward]
[Free Space]
[Cell Data: growing backward from page end]
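Free space is whatever lies between the forward-growing pointer array and the backward-growing cell data. A back-of-the-envelope sketch of that accounting, assuming only the 9-byte header and 2-byte-per-cell pointers consume the front of the page (the engine may reserve a few extra bytes, which would explain the "~4085" printed in example S1):

```csharp
using System;

// Approximate free-space accounting implied by the layout above.
const int PageSize = 4096;
const int HeaderSize = 9;
const int CellPointerSize = 2;

static int ApproxFreeSpace(int cellContentStart, int cellCount)
    => cellContentStart - (HeaderSize + cellCount * CellPointerSize);

// Fresh page: cell content starts at the page end, no cells yet.
Console.WriteLine(ApproxFreeSpace(PageSize, 0));     // 4087 by this formula
// Inserting a 4-byte cell costs 4 (data) + 2 (pointer):
Console.WriteLine(ApproxFreeSpace(PageSize - 4, 1)); // 4081
```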
S1. Initialize a Page
byte[] buffer = new byte[4096];
var sp = new SlottedPage(buffer, pageId: 1);
sp.Initialize(PageConstants.PageTypeLeaf);
Console.WriteLine($"Type: {sp.PageType}"); // Leaf
Console.WriteLine($"Cells: {sp.CellCount}"); // 0
Console.WriteLine($"Free: {sp.FreeSpace} bytes"); // ~4085
S2. Insert and Read Cells
byte[] cellData = new byte[] { 0x01, 0x02, 0x03, 0x04 };
bool inserted = sp.InsertCell(index: 0, cellData);
Span<byte> cell = sp.GetCell(index: 0);
Console.WriteLine($"Cell[0] length: {cell.Length}"); // 4
S3. Delete and Defragment
sp.DeleteCell(index: 0);
Console.WriteLine($"Cells after delete: {sp.CellCount}"); // 0
// After many inserts/deletes, free space may be fragmented
sp.Defragment(); // rewrites cells contiguously at end of page
Indexing
Secondary B+tree-backed indexes with optional caching and ordered range scan support.
public interface IIndexStore
{
uint RootPageId { get; }
ValueTask<byte[]?> FindAsync(long key, CancellationToken ct = default);
ValueTask InsertAsync(long key, ReadOnlyMemory<byte> payload, CancellationToken ct = default);
ValueTask<bool> DeleteAsync(long key, CancellationToken ct = default);
IIndexCursor CreateCursor(IndexScanRange range);
}
I1. Create an Index Store
uint indexRootPage = await BTree.CreateNewAsync(pager);
var indexTree = new BTree(pager, indexRootPage);
IIndexStore index = new BTreeIndexStore(indexTree);
I2. Insert and Lookup
// Insert: key = hashed column value, payload = rowid
byte[] rowIdPayload = BitConverter.GetBytes(42L);
await index.InsertAsync(key: hashOfColumnValue, rowIdPayload);
// Lookup
byte[]? result = await index.FindAsync(key: hashOfColumnValue);
if (result is not null)
{
long rowId = BitConverter.ToInt64(result);
Console.WriteLine($"Found rowid: {rowId}");
}
I3. Range Scan with Cursor
var range = new IndexScanRange(
LowerBound: 100, LowerInclusive: true,
UpperBound: 200, UpperInclusive: false);
var cursor = index.CreateCursor(range);
while (await cursor.MoveNextAsync())
Console.WriteLine($"IndexKey={cursor.CurrentKey}");
// Full scan / point lookup:
var fullCursor = index.CreateCursor(IndexScanRange.All);
var pointCursor = index.CreateCursor(IndexScanRange.At(42));
I4. Add Caching to an Index
IIndexStore cached = new CachingIndexStore(
inner: new BTreeIndexStore(indexTree),
capacity: 2048);
byte[]? result = await cached.FindAsync(key: 42);
Record Serialization
Compact binary encoding for database rows. Supports selective column projection and fast filter evaluation without materializing managed strings.
Binary Format:
[columnCount: varint]
[col0_typeTag: 1 byte] [col0_data: ...]
[col1_typeTag: 1 byte] [col1_data: ...]
Type Tags:
Null (0x00) -> no data
Integer (0x01) -> 8 bytes, little-endian long
Text (0x02) -> [length: varint] [UTF-8 bytes]
Real (0x03) -> 8 bytes, little-endian double (IEEE 754)
Blob (0x04) -> [length: varint] [raw bytes]
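To make the layout concrete, here is a hand-assembled encoding of the row (1, "Alice") following the byte format above. Varints are assumed to be LEB128-style, so any value below 128 (as all lengths here are) occupies a single byte:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

var row = new List<byte> { 0x02 };        // columnCount = 2 (one-byte varint)
row.Add(0x01);                            // col0 type tag: Integer
row.AddRange(BitConverter.GetBytes(1L));  // 8-byte little-endian long (on LE hosts)
row.Add(0x02);                            // col1 type tag: Text
byte[] name = Encoding.UTF8.GetBytes("Alice");
row.Add((byte)name.Length);               // length varint = 5
row.AddRange(name);                       // UTF-8 payload
Console.WriteLine(row.Count);             // 1 + 1 + 8 + 1 + 1 + 5 = 17 bytes
```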
R1. Encode and Decode a Row
var values = new DbValue[]
{
DbValue.FromInteger(1),
DbValue.FromText("Alice"),
DbValue.FromInteger(30)
};
byte[] encoded = RecordEncoder.Encode(values);
DbValue[] decoded = RecordEncoder.Decode(encoded);
Console.WriteLine($"Id={decoded[0].AsInteger}, Name={decoded[1].AsText}, Age={decoded[2].AsInteger}");
R2. Selective Column Projection
Decode only the columns you need — avoids materializing unused fields.
// Decode only columns 0 and 1 (skip column 2)
DbValue[] partial = RecordEncoder.DecodeUpTo(encoded, maxColumnIndexInclusive: 1);
// Decode a single column by index
DbValue age = RecordEncoder.DecodeColumn(encoded, columnIndex: 2);
R3. Fast Filter Without Materialization
// Check if column 1 equals "Alice" without creating a string
byte[] expectedUtf8 = "Alice"u8.ToArray();
if (RecordEncoder.TryColumnTextEquals(encoded, columnIndex: 1, expectedUtf8, out bool equals))
Console.WriteLine($"Column 1 is Alice: {equals}");
// Check numeric column
if (RecordEncoder.TryDecodeNumericColumn(encoded, columnIndex: 2,
out long intValue, out double realValue, out bool isReal))
Console.WriteLine($"Age: {intValue}");
// Check for null
bool isNull = RecordEncoder.IsColumnNull(encoded, columnIndex: 0);
R4. Varint Encoding
Span<byte> buffer = stackalloc byte[10];
int bytesWritten = Varint.Write(buffer, 300UL);
ulong value = Varint.Read(buffer, out int bytesRead);
Console.WriteLine($"Value: {value}, Bytes: {bytesRead}"); // 300, 2
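A minimal codec reproducing the behavior shown above (300 round-trips through two bytes). Assuming the engine's Varint class uses the standard LEB128 scheme — 7 payload bits per byte, high bit as continuation flag — this hypothetical implementation matches it:

```csharp
using System;

static int WriteVarint(Span<byte> dest, ulong value)
{
    int i = 0;
    while (value >= 0x80)
    {
        dest[i++] = (byte)(value | 0x80); // low 7 bits + continuation bit
        value >>= 7;
    }
    dest[i++] = (byte)value;              // final byte, continuation bit clear
    return i;
}

static ulong ReadVarint(ReadOnlySpan<byte> src, out int bytesRead)
{
    ulong result = 0; int shift = 0; bytesRead = 0;
    byte b;
    do
    {
        b = src[bytesRead++];
        result |= (ulong)(b & 0x7F) << shift;
        shift += 7;
    } while ((b & 0x80) != 0);            // continuation bit set → more bytes
    return result;
}

Span<byte> buf = stackalloc byte[10];
int written = WriteVarint(buf, 300);
Console.WriteLine($"{written} bytes: 0x{buf[0]:X2} 0x{buf[1]:X2}"); // 2 bytes: 0xAC 0x02
```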
Schema Catalog
B+tree-backed metadata store for tables, indexes, views, and triggers with in-memory caching and schema version tracking.
C1. Initialize the Catalog
var catalog = await SchemaCatalog.CreateAsync(pager);
Console.WriteLine($"Schema version: {catalog.SchemaVersion}");
C2. Create and Query Tables
var schema = new TableSchema
{
TableName = "users",
Columns = new[]
{
new ColumnDefinition { Name = "id", Type = DbType.Integer, IsPrimaryKey = true },
new ColumnDefinition { Name = "name", Type = DbType.Text },
new ColumnDefinition { Name = "age", Type = DbType.Integer },
}
};
await catalog.CreateTableAsync(schema);
TableSchema? users = catalog.GetTable("users");
uint rootPage = catalog.GetTableRootPage("users");
BTree tableTree = catalog.GetTableTree("users");
C3. Create and Query Indexes
var indexSchema = new IndexSchema
{
IndexName = "idx_users_name",
TableName = "users",
Columns = new[] { "name" },
IsUnique = false,
};
await catalog.CreateIndexAsync(indexSchema);
IIndexStore indexStore = catalog.GetIndexStore("idx_users_name");
IReadOnlyList<IndexSchema> indexes = catalog.GetIndexesForTable("users");
C4. Views and Triggers
// Views
await catalog.CreateViewAsync("active_users", "SELECT * FROM users WHERE age > 18");
string? viewSql = catalog.GetViewSql("active_users");
// Triggers
var trigger = new TriggerSchema
{
TriggerName = "trg_users_audit",
TableName = "users",
Event = TriggerEvent.AfterInsert,
Body = "INSERT INTO audit_log (table_name, action) VALUES ('users', 'INSERT')",
};
await catalog.CreateTriggerAsync(trigger);
Folder & File Storage
Build a complete folder/file storage system on top of the storage engine using the higher-level Database + Collection<T> API from CSharpDB.Engine.
Domain Models
public record FolderEntry(
string Name,
string Path,
DateTime CreatedAt,
string? Description = null);
public record FileEntry(
string Name,
string FolderPath,
string Content,
string ContentType,
long SizeBytes,
DateTime CreatedAt,
DateTime UpdatedAt);
F1. Bootstrap the Storage
await using var db = await Database.OpenAsync("storage.cdb");
var folders = await db.GetCollectionAsync<FolderEntry>("folders");
var files = await db.GetCollectionAsync<FileEntry>("files");
For long-lived processes, use hybrid lazy-resident mode:
await using var db = await Database.OpenHybridAsync(
"storage.cdb",
new DatabaseOptions(),
new HybridDatabaseOptions
{
PersistenceMode = HybridPersistenceMode.IncrementalDurable,
HotCollectionNames = ["folders", "files"]
});
F2. Create a Folder
async Task CreateFolderAsync(string path, string? description = null)
{
var entry = new FolderEntry(
Name: Path.GetFileName(path.TrimEnd('/')),
Path: path,
CreatedAt: DateTime.UtcNow,
Description: description);
await folders.PutAsync(path, entry);
}
await CreateFolderAsync("/documents");
await CreateFolderAsync("/documents/reports", description: "Monthly reports");
F3. Create a File
async Task CreateFileAsync(string folderPath, string fileName,
string content, string contentType = "text/plain")
{
string key = $"{folderPath}/{fileName}";
var entry = new FileEntry(
Name: fileName, FolderPath: folderPath, Content: content,
ContentType: contentType,
SizeBytes: System.Text.Encoding.UTF8.GetByteCount(content),
CreatedAt: DateTime.UtcNow, UpdatedAt: DateTime.UtcNow);
await files.PutAsync(key, entry);
}
await CreateFileAsync("/documents/reports", "q1.txt", "Q1 earnings: $1.2M");
F4. Read a File
FileEntry? file = await files.GetAsync("/documents/reports/q1.txt");
if (file is not null)
Console.WriteLine($"{file.Name}: {file.Content}");
F5. List Files in a Folder
await foreach (var kvp in files.FindAsync(f => f.FolderPath == "/documents/reports"))
Console.WriteLine($" {kvp.Value.Name} ({kvp.Value.SizeBytes} bytes)");
F6. Update File Content
PutAsync is an upsert — it replaces the document at the key if it already exists.
FileEntry? existing = await files.GetAsync("/documents/reports/q1.txt");
if (existing is not null)
{
var updated = existing with
{
Content = "Q1 earnings: $1.4M (revised)",
SizeBytes = System.Text.Encoding.UTF8.GetByteCount("Q1 earnings: $1.4M (revised)"),
UpdatedAt = DateTime.UtcNow
};
await files.PutAsync("/documents/reports/q1.txt", updated);
}
F7. Delete a Folder and Contents
var toDelete = new List<string>();
await foreach (var kvp in files.FindAsync(f =>
f.FolderPath.StartsWith("/documents/reports", StringComparison.Ordinal)))
toDelete.Add(kvp.Key);
await db.BeginTransactionAsync();
try
{
foreach (var key in toDelete) await files.DeleteAsync(key);
await folders.DeleteAsync("/documents/reports");
await db.CommitAsync();
}
catch { await db.RollbackAsync(); throw; }
F8. SQL-Based Approach
If you prefer a relational model, create tables with SQL and use ExecuteAsync.
await using var db = await Database.OpenAsync("storage.cdb");
await db.ExecuteAsync(@"
CREATE TABLE IF NOT EXISTS folders (
id INTEGER PRIMARY KEY,
path TEXT NOT NULL,
name TEXT NOT NULL,
created_at TEXT NOT NULL
)");
await db.ExecuteAsync(@"
CREATE TABLE IF NOT EXISTS files (
id INTEGER PRIMARY KEY,
folder_path TEXT NOT NULL,
name TEXT NOT NULL,
content TEXT NOT NULL,
content_type TEXT NOT NULL,
size_bytes INTEGER NOT NULL,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL
)");
var result = await db.ExecuteAsync(
"SELECT name, size_bytes FROM files WHERE folder_path = '/documents'");
foreach (var row in result.Rows)
Console.WriteLine($"{row[0]} ({row[1]} bytes)");
Key Design Notes
| Concern | Detail |
|---|---|
| No shared file pointer | RandomAccess APIs are stateless w.r.t. position, so concurrent reads at different offsets are safe without locking. |
| Async-first | All I/O is issued via RandomAccess.ReadAsync / WriteAsync, keeping the storage graph on the platform async file-I/O path. |
| Zero-fill on short reads | Pages beyond EOF are returned as zeros, matching the uninitialized page convention used by the Pager. |
| fsync on flush | FlushAsync maps to FlushFileBuffers (Windows) or fsync (Linux/macOS), guaranteeing crash durability. |
| 4 KB page size | All pages are PageConstants.PageSize (4096 bytes). Page 0 reserves 100 bytes for the file header. |
| Single writer, multiple readers | TransactionCoordinator enforces a single writer via SemaphoreSlim. Readers use WAL snapshots for isolation. |
| Memory-mapped reads | Optional via PagerOptions.UseMemoryMappedReads for clean main-file pages when the storage device supports it. |
| Sequential scan read-ahead | Speculative next-leaf prefetch during forward scans via EnableSequentialLeafReadAhead. |
| Checkpoint residency | With PreserveOwnedPagesOnCheckpoint, owned pages stay resident across checkpoint for the hybrid engine. |
| B+tree leaf linking | Leaf pages are linked via RightChildOrNextLeaf pointers for efficient cursor scans without interior I/O. |
| Pluggable checkpoint policies | ICheckpointPolicy allows frame-count, WAL-size, time-interval, or custom composite triggers. |
| Schema versioning | SchemaCatalog.SchemaVersion increments on every DDL operation, enabling cache invalidation in upper layers. |