docs/storage/README.md. For the shorter curated page, see Storage Engine Reference.CSharpDB.Storage
A low-level, high-performance storage engine for .NET 10 built on top of RandomAccess and SafeFileHandle. It provides random-access async I/O, page caching, write-ahead logging (WAL), crash recovery, and the B+tree/index primitives that power the SQL engine and collection API.
For the guided storage tutorial track, including the new advanced standalone examples, start with samples/storage-tutorials/README.md. For the package-oriented storage API overview and tuning presets, see src/CSharpDB.Storage/README.md.
Table of Contents
- Architecture Overview
- FileStorageDevice
- IStorageDevice Interface
- Device Scenarios
- 1. Create a New File
- 2. Open an Existing File
- 3. Write Raw Bytes at an Offset
- 4. Read Raw Bytes from an Offset
- 5. Read Past End of File (Zero-Fill Behavior)
- 6. Pre-allocate / Extend File Length
- 7. Flush to Disk (fsync)
- 8. Check File Length
- 9. Dispose Synchronously
- 10. Dispose Asynchronously
- 11. Cancellation Support
- 12. Injecting via IStorageDevice (Testability)
- 13. Writing Fixed-Size Pages (4 KB)
- 14. Reading Fixed-Size Pages (4 KB)
- 15. Appending Sequential Pages
- Pager
- B+Tree
- Write-Ahead Log (WAL)
- Slotted Page Layout
- Indexing
- Record Serialization
- Schema Catalog
- Folder & File Storage
- Domain Models
- F1. Bootstrap the Storage
- F2. Create a Folder
- F3. Create a File Inside a Folder
- F4. Read a File
- F5. List All Files in a Folder
- F6. List All Folders
- F7. Update File Content
- F8. Delete a File
- F9. Delete a Folder and Its Contents
- F10. Rename or Move a File
- F11. Search Files by Predicate
- F12. Bulk Create with an Explicit Transaction
- F13. SQL-Based Approach
- F14. One Database File per Folder (Multi-Volume)
- Key Design Notes
Architecture Overview
┌──────────────────────────────────────────────────────┐
│ Application │
│ (SQL Engine / Collection API) │
├──────────────────────────────────────────────────────┤
│ SchemaCatalog │
│ Tables ─ Indexes ─ Views ─ Triggers │
├──────────────┬───────────────┬───────────────────────┤
│ BTree │ IndexStore │ RecordEncoder │
│ (data) │ (secondary) │ (row format) │
├──────────────┴───────────────┴───────────────────────┤
│ Pager │
│ PageCache ─ DirtyTracking ─ PageAllocator │
├──────────────────────┬───────────────────────────────┤
│ WriteAheadLog │ CheckpointCoordinator │
│ (WAL + WalIndex) │ (policy-driven) │
├──────────────────────┴───────────────────────────────┤
│ IStorageDevice │
│ (FileStorageDevice / memory) │
└──────────────────────────────────────────────────────┘
Current storage builds can also enable a few newer read-path behaviors on top of this core layout:
- Memory-mapped reads for clean main-file pages when the storage device supports it
- Speculative B+tree leaf read-ahead during sequential forward scans
- Checkpoint residency preservation so already-owned main-file pages stay hot across checkpoint in lazy-resident hybrid engine mode
Page Layout:
Page 0 (File Header):
[Magic: 4 bytes "CSDB"]
[FormatVersion: 4 bytes]
[PageSize: 4 bytes = 4096]
[PageCount: 4 bytes]
[SchemaRootPage: 4 bytes]
[FreelistHead: 4 bytes]
[ChangeCounter: 4 bytes]
[... reserved to 100 bytes ...]
[Slotted page content: 3996 bytes]
Pages 1+:
[SlottedPage: 4096 bytes]
[Header: 9 bytes]
PageType (1) ─ CellCount (2) ─ CellContentStart (2) ─ RightChild/NextLeaf (4)
[CellPointers: 2 bytes each]
[Free space]
[Cells: growing backward from page end]
FileStorageDevice
FileStorageDevice wraps a SafeFileHandle opened with FileOptions.Asynchronous | FileOptions.RandomAccess, giving you:
- True async I/O via
RandomAccess.ReadAsync/RandomAccess.WriteAsync - Position-independent reads and writes -- no seek, no shared file pointer
- Concurrent reads from other processes (
FileShare.Read) - Direct fsync to durable storage via
RandomAccess.FlushToDisk
public FileStorageDevice(string filePath, bool createNew = false)
| Parameter | Description |
|---|---|
filePath |
Path to the database file. |
createNew |
true -> FileMode.CreateNew (fails if file exists). false -> OpenOrCreate. |
IStorageDevice Interface
All storage operations go through IStorageDevice, making it easy to swap implementations (e.g., for in-memory testing).
public interface IStorageDevice : IAsyncDisposable, IDisposable
{
long Length { get; }
ValueTask<int> ReadAsync(long offset, Memory<byte> buffer, CancellationToken ct = default);
ValueTask WriteAsync(long offset, ReadOnlyMemory<byte> buffer, CancellationToken ct = default);
ValueTask FlushAsync(CancellationToken ct = default);
ValueTask SetLengthAsync(long length, CancellationToken ct = default);
}
Device Scenarios
1. Create a New File
Creates the file, failing if it already exists.
await using var device = new FileStorageDevice("mydb.cdb", createNew: true);
Console.WriteLine($"File created. Length: {device.Length}"); // 0
2. Open an Existing File
Opens the file if it exists, or creates it if it does not.
await using var device = new FileStorageDevice("mydb.cdb");
Console.WriteLine($"Opened. Length: {device.Length}");
3. Write Raw Bytes at an Offset
Writes are position-independent. Multiple writes at different offsets can be issued concurrently without locking.
await using var device = new FileStorageDevice("mydb.cdb");
byte[] payload = "Hello, CSharpDB!"u8.ToArray();
await device.WriteAsync(offset: 0, payload);
4. Read Raw Bytes from an Offset
ReadAsync loops internally until the buffer is fully filled or EOF is reached.
await using var device = new FileStorageDevice("mydb.cdb");
var buffer = new byte[16];
int bytesRead = await device.ReadAsync(offset: 0, buffer);
Console.WriteLine($"Read {bytesRead} byte(s): {System.Text.Encoding.UTF8.GetString(buffer, 0, bytesRead)}");
5. Read Past End of File (Zero-Fill Behavior)
If you read a range that extends beyond the current file length, ReadAsync zero-fills the remainder of the buffer and returns the number of bytes that were actually on disk. This is useful for treating uninitialized pages as zeroed memory.
await using var device = new FileStorageDevice("mydb.cdb");
// File is currently 16 bytes; request 4096 bytes
var buffer = new byte[4096];
int bytesRead = await device.ReadAsync(offset: 0, buffer);
Console.WriteLine($"Bytes on disk: {bytesRead}"); // 16
Console.WriteLine($"Remainder is zeros: {buffer[16] == 0}"); // true
6. Pre-allocate / Extend File Length
Pre-allocating avoids fragmentation on spinning disks and is required before writing pages beyond the current end-of-file on some file systems.
await using var device = new FileStorageDevice("mydb.cdb", createNew: true);
const int PageSize = 4096;
const int InitialPages = 8;
await device.SetLengthAsync(PageSize * InitialPages);
Console.WriteLine($"Pre-allocated: {device.Length} bytes"); // 32768
7. Flush to Disk (fsync)
FlushAsync calls RandomAccess.FlushToDisk, which issues a full fsync/FlushFileBuffers. Call this after committing a transaction to guarantee durability.
await using var device = new FileStorageDevice("mydb.cdb");
byte[] data = new byte[4096];
await device.WriteAsync(offset: 0, data);
await device.FlushAsync(); // durable on disk after this returns
8. Check File Length
The Length property reads the current file size directly from the OS without seeking.
await using var device = new FileStorageDevice("mydb.cdb");
long pages = device.Length / 4096;
Console.WriteLine($"Database has {pages} page(s).");
9. Dispose Synchronously
FileStorageDevice implements IDisposable for non-async contexts such as unit tests or top-level using statements.
using var device = new FileStorageDevice("mydb.cdb");
// ... operations ...
// Disposed when leaving the scope.
10. Dispose Asynchronously
Prefer await using in async code paths to align with the async I/O model.
await using var device = new FileStorageDevice("mydb.cdb");
// ... operations ...
// DisposeAsync called automatically.
11. Cancellation Support
All async methods accept a CancellationToken. Pass one to abort long reads or writes cleanly.
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
await using var device = new FileStorageDevice("mydb.cdb");
var buffer = new byte[4096];
try
{
await device.ReadAsync(offset: 0, buffer, cts.Token);
}
catch (OperationCanceledException)
{
Console.WriteLine("Read was cancelled.");
}
12. Injecting via IStorageDevice (Testability)
Program against IStorageDevice so you can substitute a fake or in-memory implementation in tests without touching the file system.
// Production wiring
IStorageDevice device = new FileStorageDevice("mydb.cdb");
var pager = await Pager.CreateAsync(device, wal, walIndex);
// In a unit test -- swap in your own IStorageDevice implementation
public sealed class MemoryStorageDevice : IStorageDevice
{
private byte[] _data = [];
public long Length => _data.Length;
public ValueTask<int> ReadAsync(long offset, Memory<byte> buffer, CancellationToken ct = default)
{
int available = (int)Math.Max(0, _data.Length - offset);
int toCopy = Math.Min(buffer.Length, available);
_data.AsMemory((int)offset, toCopy).CopyTo(buffer);
buffer[toCopy..].Span.Clear();
return ValueTask.FromResult(toCopy);
}
public ValueTask WriteAsync(long offset, ReadOnlyMemory<byte> buffer, CancellationToken ct = default)
{
long needed = offset + buffer.Length;
if (needed > _data.Length)
Array.Resize(ref _data, (int)needed);
buffer.CopyTo(_data.AsMemory((int)offset));
return ValueTask.CompletedTask;
}
public ValueTask FlushAsync(CancellationToken ct = default) => ValueTask.CompletedTask;
public ValueTask SetLengthAsync(long length, CancellationToken ct = default)
{
Array.Resize(ref _data, (int)length);
return ValueTask.CompletedTask;
}
public ValueTask DisposeAsync() => ValueTask.CompletedTask;
public void Dispose() { }
}
13. Writing Fixed-Size Pages (4 KB)
The storage engine works in 4 096-byte pages (see PageConstants.PageSize). Write a page at a computed offset.
await using var device = new FileStorageDevice("mydb.cdb", createNew: true);
const int PageSize = 4096;
// Pre-allocate space for 4 pages
await device.SetLengthAsync(PageSize * 4);
// Build a page payload
byte[] page = new byte[PageSize];
System.Text.Encoding.UTF8.GetBytes("page-0 data").CopyTo(page.AsSpan());
// Write page 0 at offset 0, page 1 at offset 4096, etc.
uint pageId = 0;
await device.WriteAsync(offset: (long)pageId * PageSize, page);
14. Reading Fixed-Size Pages (4 KB)
Read a page back by computing its byte offset from its page number.
await using var device = new FileStorageDevice("mydb.cdb");
const int PageSize = 4096;
uint pageId = 0;
byte[] page = new byte[PageSize];
int read = await device.ReadAsync(offset: (long)pageId * PageSize, page);
Console.WriteLine($"Page {pageId}: read={read}, first bytes={System.Text.Encoding.UTF8.GetString(page, 0, 10)}");
15. Appending Sequential Pages
Grow the file by one page at a time and write content into each new page.
await using var device = new FileStorageDevice("mydb.cdb", createNew: true);
const int PageSize = 4096;
int pagesToAppend = 3;
for (int i = 0; i < pagesToAppend; i++)
{
long newLength = device.Length + PageSize;
await device.SetLengthAsync(newLength);
byte[] page = new byte[PageSize];
BitConverter.TryWriteBytes(page, i); // store page index in first 4 bytes
await device.WriteAsync(offset: newLength - PageSize, page);
}
Console.WriteLine($"Final file size: {device.Length} bytes"); // 12288
await device.FlushAsync();
Pager
The Pager sits between the B+tree layer and the storage device. It owns the page cache, tracks dirty pages, coordinates transactions, manages WAL integration, and drives checkpointing.
public static async ValueTask<Pager> CreateAsync(
IStorageDevice device,
IWriteAheadLog wal,
WalIndex walIndex,
PagerOptions? options = null,
CancellationToken ct = default)
P1. Create a New Database
await using var device = new FileStorageDevice("mydb.cdb", createNew: true);
var walIndex = new WalIndex();
await using var wal = new WriteAheadLog("mydb.cdb", walIndex);
await wal.OpenAsync(currentDbPageCount: 0);
var pager = await Pager.CreateAsync(device, wal, walIndex);
await pager.InitializeNewDatabaseAsync(); // writes file header (page 0)
Console.WriteLine($"Pages: {pager.PageCount}"); // 1 (the file header page)
P2. Open and Recover an Existing Database
On startup, call RecoverAsync to redo any committed WAL frames that were not yet checkpointed.
await using var device = new FileStorageDevice("mydb.cdb");
var walIndex = new WalIndex();
await using var wal = new WriteAheadLog("mydb.cdb", walIndex);
var pager = await Pager.CreateAsync(device, wal, walIndex);
await pager.RecoverAsync(); // replays committed WAL frames
P3. Read and Write Pages
// Read a page (checks cache -> WAL -> device)
byte[] page = await pager.GetPageAsync(pageId: 1);
// Modify the page buffer in-place, then mark dirty
page[0] = 0xFF;
await pager.MarkDirtyAsync(pageId: 1); // tracked for WAL write on commit
P4. Allocate and Free Pages
// Allocate a new page (extends file or reuses from freelist)
uint newPageId = await pager.AllocatePageAsync();
// Free a page (adds to freelist for reuse)
await pager.FreePageAsync(newPageId);
P5. Transaction Lifecycle
Single writer per database. Reads do not require transactions.
await pager.BeginTransactionAsync();
try
{
// ... modify pages via B+tree ...
await pager.CommitAsync(); // writes dirty pages to WAL, fsync
}
catch
{
await pager.RollbackAsync(); // discards uncommitted WAL frames
throw;
}
P6. Snapshot Isolation (Concurrent Readers)
Multiple readers can run concurrently with a single writer. Each reader sees a consistent snapshot.
// Writer thread: acquire snapshot for a reader
WalSnapshot snapshot = pager.AcquireReaderSnapshot();
// Reader thread: create a snapshot pager that sees only committed data
Pager snapshotPager = pager.CreateSnapshotReader(snapshot);
byte[] page = await snapshotPager.GetPageAsync(pageId: 1);
// When reader is done:
pager.ReleaseReaderSnapshot();
P7. Manual Checkpoint
Copy all committed WAL frames to the main database file, then reset the WAL. This invalidates transient WAL-backed reads, and can also invalidate transient memory-mapped views; with PreserveOwnedPagesOnCheckpoint = true, already-owned main-file pages can remain resident in the shared cache.
await pager.CheckpointAsync();
P8. Configure Checkpoint Policy
var options = new PagerOptions
{
CheckpointPolicy = new AnyCheckpointPolicy(
new FrameCountCheckpointPolicy(threshold: 500),
new TimeIntervalCheckpointPolicy(TimeSpan.FromMinutes(5))
),
AutoCheckpointExecutionMode = AutoCheckpointExecutionMode.Background,
AutoCheckpointMaxPagesPerStep = 64
};
var pager = await Pager.CreateAsync(device, wal, walIndex, options);
// Auto-checkpoint triggers after 500 frames OR 5 minutes, whichever comes first.
// In background mode, the checkpoint runs after commit in smaller slices.
Built-in policies:
| Policy | Triggers When |
|---|---|
FrameCountCheckpointPolicy(n) |
Committed frame count exceeds n |
WalSizeCheckpointPolicy(bytes) |
Estimated WAL size exceeds bytes |
TimeIntervalCheckpointPolicy(span) |
Elapsed time since last checkpoint exceeds span |
AnyCheckpointPolicy(...) |
Any sub-policy triggers |
B+Tree
B+tree keyed by signed 64-bit long keys. Leaf pages store (key, payload) pairs; interior pages store routing keys and child pointers. Supports forward-only cursor iteration, cache-only fast paths, and page-level rebalance/merge on delete.
B1. Create a New B+Tree
// Allocates a root page and returns its ID
uint rootPageId = await BTree.CreateNewAsync(pager);
// Open the tree
var tree = new BTree(pager, rootPageId);
B2. Insert a Key-Value Pair
Payload is raw bytes -- the B+tree has no opinion on format.
byte[] payload = System.Text.Encoding.UTF8.GetBytes("Hello, B+tree!");
await tree.InsertAsync(key: 42, payload);
If the leaf page is full, the tree automatically splits the leaf and propagates the split key up to interior pages.
B3. Point Lookup
byte[]? result = await tree.FindAsync(key: 42);
if (result is not null)
Console.WriteLine(System.Text.Encoding.UTF8.GetString(result));
B4. Cache-Only Fast Path
Avoids async I/O when all required pages are already cached.
if (tree.TryFindCached(key: 42, out byte[]? payload))
{
// Hit: payload is definitive (null = not found, non-null = value)
Console.WriteLine($"Cache hit: {payload is not null}");
}
else
{
// Miss: need to call FindAsync for full traversal
payload = await tree.FindAsync(key: 42);
}
B5. Delete a Key
bool deleted = await tree.DeleteAsync(key: 42);
Console.WriteLine(deleted ? "Deleted." : "Key not found.");
B6. Forward Cursor Scan
Iterate all entries in key order. The cursor follows leaf-to-leaf next pointers (no interior page I/O after the first leaf).
var cursor = tree.CreateCursor();
while (await cursor.MoveNextAsync())
{
long key = cursor.CurrentKey;
ReadOnlyMemory<byte> value = cursor.CurrentValue;
Console.WriteLine($"Key={key}, PayloadSize={value.Length}");
}
B7. Seek to a Specific Key
Position the cursor at the first key >= target, then iterate forward.
var cursor = tree.CreateCursor();
if (await cursor.SeekAsync(targetKey: 100))
{
do
{
Console.WriteLine($"Key={cursor.CurrentKey}");
} while (await cursor.MoveNextAsync());
}
B8. Count Entries
Walks all leaf pages and sums cell counts.
long count = await tree.CountEntriesAsync();
Console.WriteLine($"Tree contains {count} entries.");
Write-Ahead Log (WAL)
Redo-style WAL for crash recovery and concurrent snapshot-isolated readers. Each commit writes dirty pages as frames to the WAL file. On checkpoint, committed frames are copied to the main database file.
WAL File Format:
[WAL Header: 32 bytes]
Magic ─ Version ─ PageSize ─ Checksum salt
[Frame 0: 4120 bytes]
[FrameHeader: 24 bytes] ─ PageId ─ DbPageCount ─ Checksum
[PageData: 4096 bytes]
[Frame 1: 4120 bytes]
...
W1. Open or Create a WAL
var walIndex = new WalIndex();
await using var wal = new WriteAheadLog("mydb.cdb", walIndex);
await wal.OpenAsync(currentDbPageCount: pager.PageCount);
// WAL file: "mydb.cdb.wal"
W2. Write Transaction to WAL
wal.BeginTransaction();
// Append modified pages as frames
await wal.AppendFrameAsync(pageId: 1, pageData);
await wal.AppendFrameAsync(pageId: 5, pageData);
// Commit: the last frame gets dbPageCount > 0, marking the commit boundary
await wal.CommitAsync(newDbPageCount: pager.PageCount);
// Or rollback: truncates uncommitted frames
await wal.RollbackAsync();
W3. Take a Reader Snapshot
Freeze the WAL index at a point in time for a concurrent reader.
WalSnapshot snapshot = walIndex.TakeSnapshot();
// Reader uses snapshot to resolve page lookups:
if (snapshot.TryGet(pageId: 1, out long walOffset))
{
byte[] page = await wal.ReadPageAsync(walOffset);
// page is the committed version at snapshot time
}
W4. Checkpoint WAL to Database File
Copy all committed WAL pages to the main database file, then reset the WAL.
await wal.CheckpointAsync(device, pageCount: pager.PageCount);
walIndex.Reset();
// WAL is now empty; all data is in the main file
Slotted Page Layout
SlottedPage is a struct that overlays a byte[4096] buffer, providing structured access to variable-size cells within a fixed-size page.
[Header: 9 bytes]
PageType (1) ─ CellCount (2) ─ CellContentStart (2) ─ RightChild/NextLeaf (4)
[Cell Pointers: 2 bytes each, growing forward]
Offset to cell data for each cell
[Free Space]
[Cell Data: growing backward from page end]
Variable-size cells packed from the end
S1. Initialize a Page
byte[] buffer = new byte[4096];
var sp = new SlottedPage(buffer, pageId: 1);
sp.Initialize(PageConstants.PageTypeLeaf);
Console.WriteLine($"Type: {sp.PageType}"); // Leaf
Console.WriteLine($"Cells: {sp.CellCount}"); // 0
Console.WriteLine($"Free: {sp.FreeSpace} bytes"); // ~4085
S2. Insert and Read Cells
byte[] cellData = new byte[] { 0x01, 0x02, 0x03, 0x04 };
bool inserted = sp.InsertCell(index: 0, cellData);
Console.WriteLine($"Inserted: {inserted}"); // true
Span<byte> cell = sp.GetCell(index: 0);
Console.WriteLine($"Cell[0] length: {cell.Length}"); // 4
S3. Delete a Cell and Defragment
sp.DeleteCell(index: 0);
Console.WriteLine($"Cells after delete: {sp.CellCount}"); // 0
// After many inserts/deletes, free space may be fragmented
sp.Defragment(); // rewrites cells contiguously at end of page
Indexing
Secondary B+tree-backed indexes with optional caching and ordered range scan support.
public interface IIndexStore
{
uint RootPageId { get; }
ValueTask<byte[]?> FindAsync(long key, CancellationToken ct = default);
ValueTask InsertAsync(long key, ReadOnlyMemory<byte> payload, CancellationToken ct = default);
ValueTask<bool> DeleteAsync(long key, CancellationToken ct = default);
IIndexCursor CreateCursor(IndexScanRange range);
}
I1. Create an Index Store
uint indexRootPage = await BTree.CreateNewAsync(pager);
var indexTree = new BTree(pager, indexRootPage);
IIndexStore index = new BTreeIndexStore(indexTree);
I2. Insert and Lookup Index Entries
Index payload typically contains the rowid(s) of matching rows.
// Insert: key = hashed column value, payload = rowid (8 bytes)
byte[] rowIdPayload = BitConverter.GetBytes(42L);
await index.InsertAsync(key: hashOfColumnValue, rowIdPayload);
// Lookup
byte[]? result = await index.FindAsync(key: hashOfColumnValue);
if (result is not null)
{
long rowId = BitConverter.ToInt64(result);
Console.WriteLine($"Found rowid: {rowId}");
}
I3. Range Scan with Cursor
var range = new IndexScanRange(
LowerBound: 100, LowerInclusive: true,
UpperBound: 200, UpperInclusive: false);
var cursor = index.CreateCursor(range);
while (await cursor.MoveNextAsync())
{
Console.WriteLine($"IndexKey={cursor.CurrentKey}");
}
// Full scan:
var fullCursor = index.CreateCursor(IndexScanRange.All);
// Point lookup as cursor:
var pointCursor = index.CreateCursor(IndexScanRange.At(42));
I4. Add Caching to an Index
Wrap any IIndexStore with an LRU cache for repeated lookups.
IIndexStore cached = new CachingIndexStore(
inner: new BTreeIndexStore(indexTree),
capacity: 2048);
// Lookups check cache first; inserts/deletes update cache
byte[]? result = await cached.FindAsync(key: 42);
Record Serialization
Compact binary encoding for database rows. Supports selective column projection and fast filter evaluation without materializing managed strings.
Binary Format:
[columnCount: varint]
[col0_typeTag: 1 byte] [col0_data: ...]
[col1_typeTag: 1 byte] [col1_data: ...]
...
Type Tags:
Null (0x00) -> no data
Integer (0x01) -> 8 bytes, little-endian long
Text (0x02) -> [length: varint] [UTF-8 bytes]
Real (0x03) -> 8 bytes, little-endian double (IEEE 754)
Blob (0x04) -> [length: varint] [raw bytes]
R1. Encode and Decode a Row
var values = new DbValue[]
{
DbValue.FromInteger(1),
DbValue.FromText("Alice"),
DbValue.FromInteger(30)
};
byte[] encoded = RecordEncoder.Encode(values);
DbValue[] decoded = RecordEncoder.Decode(encoded);
Console.WriteLine($"Id={decoded[0].AsInteger}, Name={decoded[1].AsText}, Age={decoded[2].AsInteger}");
R2. Selective Column Projection
Decode only the columns you need -- avoids materializing unused fields.
// Decode only columns 0 and 1 (skip column 2)
DbValue[] partial = RecordEncoder.DecodeUpTo(encoded, maxColumnIndexInclusive: 1);
// Decode a single column by index
DbValue age = RecordEncoder.DecodeColumn(encoded, columnIndex: 2);
R3. Fast Filter Without Materialization
Evaluate filters on encoded rows without allocating managed strings.
// Check if column 1 equals "Alice" without creating a string
byte[] expectedUtf8 = "Alice"u8.ToArray();
if (RecordEncoder.TryColumnTextEquals(encoded, columnIndex: 1, expectedUtf8, out bool equals))
{
Console.WriteLine($"Column 1 is Alice: {equals}");
}
// Check numeric column for comparison
if (RecordEncoder.TryDecodeNumericColumn(encoded, columnIndex: 2,
out long intValue, out double realValue, out bool isReal))
{
Console.WriteLine($"Age: {intValue}");
}
// Check for null
bool isNull = RecordEncoder.IsColumnNull(encoded, columnIndex: 0);
R4. Varint Encoding
Variable-length unsigned integer encoding (LEB128-style). Small values encode in 1 byte; up to 64-bit values supported.
Span<byte> buffer = stackalloc byte[10];
int bytesWritten = Varint.Write(buffer, 300UL);
ulong value = Varint.Read(buffer, out int bytesRead);
Console.WriteLine($"Value: {value}, Bytes: {bytesRead}"); // 300, 2
int predictedSize = Varint.SizeOf(300UL); // 2
Schema Catalog
B+tree-backed metadata store for tables, indexes, views, and triggers. Provides in-memory caching with a schema version counter for cache invalidation.
C1. Initialize the Catalog
var catalog = await SchemaCatalog.CreateAsync(pager);
Console.WriteLine($"Schema version: {catalog.SchemaVersion}");
C2. Create and Query Tables
// Create a table
var schema = new TableSchema
{
TableName = "users",
Columns = new[]
{
new ColumnDefinition { Name = "id", Type = DbType.Integer, IsPrimaryKey = true },
new ColumnDefinition { Name = "name", Type = DbType.Text },
new ColumnDefinition { Name = "age", Type = DbType.Integer },
}
};
await catalog.CreateTableAsync(schema);
// Query table metadata
TableSchema? users = catalog.GetTable("users");
uint rootPage = catalog.GetTableRootPage("users");
BTree tableTree = catalog.GetTableTree("users");
// List all tables
IReadOnlyCollection<string> tableNames = catalog.GetTableNames();
// For snapshot readers
BTree snapshotTree = catalog.GetTableTree("users", snapshotPager);
C3. Create and Query Indexes
var indexSchema = new IndexSchema
{
IndexName = "idx_users_name",
TableName = "users",
Columns = new[] { "name" },
IsUnique = false,
};
await catalog.CreateIndexAsync(indexSchema);
// Get index store
IIndexStore indexStore = catalog.GetIndexStore("idx_users_name");
// List indexes for a table
IReadOnlyList<IndexSchema> indexes = catalog.GetIndexesForTable("users");
// For snapshot readers
IIndexStore snapshotIndex = catalog.GetIndexStore("idx_users_name", snapshotPager);
C4. Views and Triggers
// Views
await catalog.CreateViewAsync("active_users", "SELECT * FROM users WHERE age > 18");
string? viewSql = catalog.GetViewSql("active_users");
bool isView = catalog.IsView("active_users");
// Triggers
var trigger = new TriggerSchema
{
TriggerName = "trg_users_audit",
TableName = "users",
Event = TriggerEvent.AfterInsert,
Body = "INSERT INTO audit_log (table_name, action) VALUES ('users', 'INSERT')",
};
await catalog.CreateTriggerAsync(trigger);
IReadOnlyList<TriggerSchema> triggers = catalog.GetTriggersForTable("users");
C5. Persist Root Page Changes
After B+tree operations that split the root page, persist the new root page ID in the catalog.
// Persist for a single table + its indexes (fast)
await catalog.PersistRootPageChangesAsync("users");
// Persist for all tables and indexes (slower, used during batch operations)
await catalog.PersistAllRootPageChangesAsync();
Folder & File Storage
FileStorageDevice is a raw byte device. To build a folder/file storage system on top of it, use the higher-level Database + Collection<T> API from CSharpDB.Engine. A single .cdb file (backed by one FileStorageDevice) holds all folders and files as typed collection documents in B+tree-backed collections.
Add the Engine reference to your project if it is not already there:
<ProjectReference Include="..\CSharpDB.Engine\CSharpDB.Engine.csproj" />
Domain Models
Define records that represent a folder and a file entry. These are serialized as JSON by Collection<T>.
public record FolderEntry(
string Name,
string Path, // e.g. "/documents/reports"
DateTime CreatedAt,
string? Description = null);
public record FileEntry(
string Name,
string FolderPath, // parent folder path
string Content, // UTF-8 text content (or Base64 for binary)
string ContentType, // e.g. "text/plain", "application/json"
long SizeBytes,
DateTime CreatedAt,
DateTime UpdatedAt);
Keys follow a path convention:
- Folders ->
"/documents/reports" - Files ->
"/documents/reports/summary.txt"
F1. Bootstrap the Storage
Open (or create) a single .cdb file and obtain the two collections.
await using var db = await Database.OpenAsync("storage.cdb");
var folders = await db.GetCollectionAsync<FolderEntry>("folders");
var files = await db.GetCollectionAsync<FileEntry>("files");
Everything -- the B+tree pages, the WAL, and the page cache -- is managed by the single FileStorageDevice that Database.OpenAsync creates internally.
For a long-lived process that should keep touched pages hot while remaining durable on disk, use hybrid lazy-resident mode instead:
await using var db = await Database.OpenHybridAsync(
"storage.cdb",
new DatabaseOptions(),
new HybridDatabaseOptions
{
PersistenceMode = HybridPersistenceMode.IncrementalDurable,
HotCollectionNames = ["folders", "files"]
});
That opens from disk lazily, keeps touched pages resident by cache policy, and can preload selected hot collections into the hybrid pager cache at startup.
F2. Create a Folder
Use the folder's path as the collection key so point lookups stay efficient; collection keys are hashed internally before probing the backing trees.
async Task CreateFolderAsync(string path, string? description = null)
{
var entry = new FolderEntry(
Name: Path.GetFileName(path.TrimEnd('/')),
Path: path,
CreatedAt: DateTime.UtcNow,
Description: description);
await folders.PutAsync(path, entry);
}
await CreateFolderAsync("/documents");
await CreateFolderAsync("/documents/reports", description: "Monthly reports");
await CreateFolderAsync("/images");
F3. Create a File Inside a Folder
The key is the full file path, which guarantees uniqueness across all folders.
async Task CreateFileAsync(string folderPath, string fileName, string content, string contentType = "text/plain")
{
string key = $"{folderPath}/{fileName}";
var entry = new FileEntry(
Name: fileName,
FolderPath: folderPath,
Content: content,
ContentType: contentType,
SizeBytes: System.Text.Encoding.UTF8.GetByteCount(content),
CreatedAt: DateTime.UtcNow,
UpdatedAt: DateTime.UtcNow);
await files.PutAsync(key, entry);
}
await CreateFileAsync("/documents/reports", "q1.txt", "Q1 earnings: $1.2M");
await CreateFileAsync("/documents/reports", "q2.txt", "Q2 earnings: $1.5M");
await CreateFileAsync("/documents", "notes.md", "# Notes\nTodo list...", "text/markdown");
await CreateFileAsync("/images", "logo.svg", "<svg>...</svg>", "image/svg+xml");
F4. Read a File
Retrieve a file by its full path key.
FileEntry? file = await files.GetAsync("/documents/reports/q1.txt");
if (file is not null)
{
Console.WriteLine($"Name: {file.Name}");
Console.WriteLine($"Type: {file.ContentType}");
Console.WriteLine($"Size: {file.SizeBytes} bytes");
Console.WriteLine($"Content: {file.Content}");
}
else
{
Console.WriteLine("File not found.");
}
F5. List All Files in a Folder
Collection<T>.FindAsync performs a full scan with an in-memory predicate -- suitable for small-to-medium collections.
string targetFolder = "/documents/reports";
await foreach (var kvp in files.FindAsync(f => f.FolderPath == targetFolder))
{
Console.WriteLine($" {kvp.Value.Name} ({kvp.Value.SizeBytes} bytes) [{kvp.Value.UpdatedAt:u}]");
}
F6. List All Folders
await foreach (var kvp in folders.ScanAsync())
{
var f = kvp.Value;
Console.WriteLine($"{f.Path,-40} created {f.CreatedAt:u}");
}
F7. Update File Content
PutAsync is an upsert -- it replaces the document at the key if it already exists.
async Task UpdateFileAsync(string filePath, string newContent)
{
FileEntry? existing = await files.GetAsync(filePath);
if (existing is null) throw new FileNotFoundException($"File not found: {filePath}");
var updated = existing with
{
Content = newContent,
SizeBytes = System.Text.Encoding.UTF8.GetByteCount(newContent),
UpdatedAt = DateTime.UtcNow
};
await files.PutAsync(filePath, updated);
}
await UpdateFileAsync("/documents/reports/q1.txt", "Q1 earnings: $1.4M (revised)");
F8. Delete a File
bool deleted = await files.DeleteAsync("/documents/reports/q2.txt");
Console.WriteLine(deleted ? "File deleted." : "File not found.");
F9. Delete a Folder and Its Contents
There is no cascading delete built in, so collect the child keys first, then delete in a single transaction.
async Task DeleteFolderAsync(string folderPath)
{
// Collect all file keys under this folder
var toDelete = new List<string>();
await foreach (var kvp in files.FindAsync(f => f.FolderPath.StartsWith(folderPath, StringComparison.Ordinal)))
toDelete.Add(kvp.Key);
await db.BeginTransactionAsync();
try
{
foreach (var key in toDelete)
await files.DeleteAsync(key);
await folders.DeleteAsync(folderPath);
await db.CommitAsync();
}
catch
{
await db.RollbackAsync();
throw;
}
}
await DeleteFolderAsync("/documents/reports");
F10. Rename or Move a File
CSharpDB does not have a rename primitive; copy the document to the new key and delete the old one inside a transaction.
async Task MoveFileAsync(string sourcePath, string destinationPath)
{
FileEntry? source = await files.GetAsync(sourcePath);
if (source is null) throw new FileNotFoundException($"Source not found: {sourcePath}");
string newFolder = Path.GetDirectoryName(destinationPath)!.Replace('\\', '/');
string newFileName = Path.GetFileName(destinationPath);
var moved = source with
{
Name = newFileName,
FolderPath = newFolder,
UpdatedAt = DateTime.UtcNow
};
await db.BeginTransactionAsync();
try
{
await files.PutAsync(destinationPath, moved);
await files.DeleteAsync(sourcePath);
await db.CommitAsync();
}
catch
{
await db.RollbackAsync();
throw;
}
}
await MoveFileAsync("/documents/notes.md", "/documents/reports/notes.md");
F11. Search Files by Predicate
Find all Markdown files larger than 100 bytes modified after a given date.
DateTime since = new DateTime(2025, 1, 1, 0, 0, 0, DateTimeKind.Utc);
await foreach (var kvp in files.FindAsync(f =>
f.ContentType == "text/markdown" &&
f.SizeBytes > 100 &&
f.UpdatedAt > since))
{
Console.WriteLine($"{kvp.Key} ({kvp.Value.SizeBytes} bytes)");
}
F12. Bulk Create with an Explicit Transaction
Wrap multiple writes in a single transaction so they all succeed or all roll back together.
string[] reportNames = ["jan.txt", "feb.txt", "mar.txt", "apr.txt"];
await db.BeginTransactionAsync();
try
{
await CreateFolderAsync("/archive/2025");
foreach (var name in reportNames)
await CreateFileAsync("/archive/2025", name, $"Report: {name}");
await db.CommitAsync();
Console.WriteLine($"Committed {reportNames.Length} files in one transaction.");
}
catch
{
await db.RollbackAsync();
throw;
}
F13. SQL-Based Approach
If you prefer a relational model, create folders and files tables with SQL and use ExecuteAsync.
await using var db = await Database.OpenAsync("storage.cdb");
// Create schema
await db.ExecuteAsync("""
CREATE TABLE IF NOT EXISTS folders (
id INTEGER PRIMARY KEY,
path TEXT NOT NULL,
name TEXT NOT NULL,
description TEXT,
created_at TEXT NOT NULL
)
""");
await db.ExecuteAsync("""
CREATE TABLE IF NOT EXISTS files (
id INTEGER PRIMARY KEY,
folder_path TEXT NOT NULL,
name TEXT NOT NULL,
content TEXT NOT NULL,
content_type TEXT NOT NULL,
size_bytes INTEGER NOT NULL,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL
)
""");
// Insert a folder
await db.ExecuteAsync("""
INSERT INTO folders (path, name, created_at)
VALUES ('/documents', 'documents', '2025-01-01T00:00:00Z')
""");
// Insert a file
await db.ExecuteAsync("""
INSERT INTO files (folder_path, name, content, content_type, size_bytes, created_at, updated_at)
VALUES ('/documents', 'readme.txt', 'Hello world', 'text/plain', 11,
'2025-01-01T00:00:00Z', '2025-01-01T00:00:00Z')
""");
// Query files in a folder
var result = await db.ExecuteAsync("SELECT name, size_bytes FROM files WHERE folder_path = '/documents'");
foreach (var row in result.Rows)
Console.WriteLine($"{row[0]} ({row[1]} bytes)");
F14. One Database File per Folder (Multi-Volume)
Map each top-level folder to its own .cdb file. Each file gets its own FileStorageDevice instance, giving you independent WAL, checkpoint, and locking per folder.
// Each folder is a separate database file
var volumes = new Dictionary<string, Database>(StringComparer.Ordinal);
async ValueTask<Database> GetVolumeAsync(string folderName)
{
if (!volumes.TryGetValue(folderName, out var db))
{
db = await Database.OpenAsync($"{folderName}.cdb");
volumes[folderName] = db;
}
return db;
}
// Write to the "documents" volume
var docsDb = await GetVolumeAsync("documents");
var docsFiles = await docsDb.GetCollectionAsync<FileEntry>("files");
await docsFiles.PutAsync("readme.txt", new FileEntry(
Name: "readme.txt",
FolderPath: "/",
Content: "Welcome to the documents volume.",
ContentType: "text/plain",
SizeBytes: 32,
CreatedAt: DateTime.UtcNow,
UpdatedAt: DateTime.UtcNow));
// Write to the "images" volume
var imagesDb = await GetVolumeAsync("images");
var imagesFiles = await imagesDb.GetCollectionAsync<FileEntry>("files");
await imagesFiles.PutAsync("logo.svg", new FileEntry(
Name: "logo.svg",
FolderPath: "/",
Content: "<svg>...</svg>",
ContentType: "image/svg+xml",
SizeBytes: 14,
CreatedAt: DateTime.UtcNow,
UpdatedAt: DateTime.UtcNow));
// Dispose all volumes on shutdown
foreach (var (_, volume) in volumes)
await volume.DisposeAsync();
When to use multi-volume: large datasets where you want per-folder backup, different checkpoint intervals, or parallel writes to disjoint folders. For most use-cases a single
.cdbfile is simpler.
Key Design Notes
| Concern | Detail |
|---|---|
| No shared file pointer | RandomAccess APIs are stateless w.r.t. position, so concurrent reads at different offsets are safe without locking. |
| Async-first | All I/O is issued via RandomAccess.ReadAsync / WriteAsync, keeping the storage graph on the platform async file-I/O path. |
| Zero-fill on short reads | ReadAsync always fills the entire buffer. Pages beyond EOF are returned as zeros, matching an uninitialized page convention used by the Pager. |
| fsync on flush | FlushAsync calls RandomAccess.FlushToDisk which maps to FlushFileBuffers (Windows) or fsync (Linux/macOS), guaranteeing crash durability. |
| FileShare.Read | Other processes can open the file read-only concurrently; write access is exclusive to the owning FileStorageDevice instance. |
| IDisposable + IAsyncDisposable | Both patterns are supported; prefer await using in async code. |
| 4 KB page size | All pages are PageConstants.PageSize (4096 bytes). Page 0 reserves 100 bytes for the file header. |
| Single writer, multiple readers | The TransactionCoordinator enforces a single writer via SemaphoreSlim. Readers use WAL snapshots for isolation. |
| Optional memory-mapped reads | The pager can use memory-mapped reads for clean main-file pages when PagerOptions.UseMemoryMappedReads is enabled and the storage device supports it. |
| Sequential scan read-ahead | The pager can speculatively pull the next B+tree leaf page during forward scans when EnableSequentialLeafReadAhead is enabled. |
| Checkpoint residency preservation | With PagerOptions.PreserveOwnedPagesOnCheckpoint, already-owned main-file pages can stay resident across checkpoint. This is what the engine's lazy-resident hybrid mode relies on. |
| B+tree leaf linking | Leaf pages are linked via RightChildOrNextLeaf pointers, enabling efficient forward-only cursor scans without interior I/O. |
| Pluggable checkpoint policies | ICheckpointPolicy allows frame-count, WAL-size, time-interval, or custom composite triggers. |
| Schema versioning | SchemaCatalog.SchemaVersion increments on every DDL operation, enabling cache invalidation in upper layers. |
| Interceptor pipeline | IPageOperationInterceptor provides hooks for diagnostics, metrics, and custom behavior on page reads, writes, transactions, and checkpoints. |
See Also
- Architecture Guide
- Storage Package README
- Storage Tutorial Track
- Advanced Storage Examples
- Engine README
- Storage Diagnostics README
- Benchmark Suite
- Roadmap