Benchmarks
Real performance numbers from BenchmarkDotNet running against the CSharpDB engine. All results are from the CSharpDB.Benchmarks project.
At a Glance
Point Lookups
Single-row lookups by primary key on a warm 10,000-row dataset. This is the most common database operation and the one where CSharpDB's B+tree leaf-hint cache shines.
Why are file-backed and in-memory reads identical?
On warm lookups, all pages are already cached in the IPageCache. The Pager resolves from cache without touching the storage device or WAL, so the I/O backend makes no difference. The B+tree leaf-hint cache often skips root-to-leaf traversal entirely for clustered access patterns, which is why lookups stay under 500 ns even at 10K rows.
The Collection API is ~2x faster than SQL because it bypasses SQL parsing, planning, and row decoding — it goes straight from key to B+tree to JSON deserialization.
Allocations: SQL lookup allocates 1.05 KB per call (QueryResult + DbValue array). Collection get allocates just 208 bytes (deserialized object only).
Cold Lookups (Under Cache Pressure)
What happens when your dataset does not fit in cache? This benchmark seeds 200,000 rows but limits the page cache to just 16 pages, forcing constant eviction and re-reads. Each iteration performs 256 randomized lookups.
Why the 15–18x difference?
With only 16 cached pages and 200K rows, almost every lookup is a cache miss. File-backed lookups must read 4 KB pages from the FileStorageDevice via RandomAccess — each page read involves a kernel syscall and disk I/O. In-memory lookups hit the MemoryStorageDevice which is a simple array copy, staying in user-space memory.
This benchmark uses PagerOptions.MaxCachedPages = 16 to simulate cache pressure. In production, set MaxCachedPages large enough to hold your hot working set and you will see warm-cache performance (sub-500 ns) for most lookups.
Allocations: Cold file-backed lookups allocate ~9.5 KB (page buffers for cache misses + result). In-memory cold lookups allocate ~8 KB (no page buffer copies needed).
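As a concrete sketch of the cache-sizing advice above — `PagerOptions.MaxCachedPages` is the option named on this page, while the surrounding construction (`Database.OpenFile`, the initializer shape) is an assumption and may differ from the actual CSharpDB API:

```csharp
// Hypothetical wiring; only MaxCachedPages is taken from this document.
// With 4 KB pages, 4,096 cached pages keeps roughly 16 MB of hot data
// resident, so lookups on the working set stay on the warm path (~460 ns)
// instead of the cold path (~32 us per file-backed cache miss).
var options = new PagerOptions
{
    // Size for the hot working set: hot rows / rows-per-page,
    // plus headroom for B+tree interior pages.
    MaxCachedPages = 4096,
};

using var db = Database.OpenFile("app.db", options); // assumed factory method
```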
Single Inserts
Single-row INSERT with auto-commit on a 10,000-row table. Each insert is a complete transaction: parse → plan → B+tree insert → WAL append → commit → flush.
| API | File-Backed | In-Memory |
|---|---|---|
| SQL `INSERT` | ~16,300 ops/s | ~349,000 ops/s |
| Collection put | ~17,900 ops/s | ~481,000 ops/s |
Why is file-backed 21–27x slower for writes?
Every auto-committed write must durably flush the WAL to disk before returning. This means at minimum one fsync per insert — the dominant cost is waiting for the NVMe write to complete (~50–60 μs on this hardware). In-memory mode skips the WAL entirely, so inserts are just B+tree mutations in RAM.
Optimization tip: If you need file-backed write throughput, batch multiple inserts into a single explicit transaction. The WAL flush cost is amortized across all rows in the batch (see Batch Inserts below).
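A sketch of that batching pattern. The transaction-API names here (`BeginTransactionAsync`, `ExecuteAsync`, `CommitAsync`) are illustrative assumptions, not taken from this page:

```csharp
// Amortize the per-commit WAL flush across many rows by wrapping the
// inserts in one explicit transaction. API names are illustrative.
await using var tx = await db.BeginTransactionAsync();
for (int i = 0; i < 100; i++)
{
    await tx.ExecuteAsync(
        "INSERT INTO items (id, value) VALUES (?, ?)",
        i, $"value-{i}");
}
await tx.CommitAsync(); // a single WAL flush covers all 100 rows
```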
Allocations: SQL insert allocates 2.63 KB (file) / 1.64 KB (in-memory). Collection put allocates 2.10 KB (file) / 1.14 KB (in-memory). The difference is WAL frame buffers.
Batch Inserts
Inserting 100 rows per transaction, repeated 256 times (25,600 total rows). Batching amortizes the WAL flush across many rows, dramatically improving file-backed throughput.
| Mode | Throughput |
|---|---|
| File-backed | ~223,000 rows/s |
| In-memory | ~584,000 rows/s |
The batching effect
Compare single inserts (16,300 ops/s file-backed) to batched inserts (223,000 rows/s file-backed) — that is a 13.7x throughput improvement just from batching 100 rows per transaction. The cost of one WAL flush (~60 μs) is now shared across 100 rows instead of paid per row.
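That 13.7x figure follows directly from the measured throughputs on this page:

```csharp
// Derived from the measured file-backed numbers above.
double singlePerRowUs  = 1_000_000.0 / 16_300;  // ~61 us per row (auto-commit)
double batchedPerRowUs = 1_000_000.0 / 223_000; // ~4.5 us per row (100 rows/tx)
double speedup = singlePerRowUs / batchedPerRowUs;

// The ~60 us WAL flush now contributes only ~0.6 us per row; the remaining
// ~3.9 us per row is SQL parsing and B+tree page work.
Console.WriteLine($"batching speedup: {speedup:F1}x"); // ~13.7x
```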
In-memory mode sees a smaller improvement (349K → 584K) because the baseline was already fast. The remaining overhead per row is SQL parsing and B+tree page splits. Use the PrepareInsertBatch API to skip repeated parsing for maximum throughput.
Allocations: ~10.9 MB total for 25,600 rows (~430 bytes per row including SQL strings, parsed AST, and B+tree operations).
File-Backed vs In-Memory: When Does It Matter?
The storage mode only affects performance when I/O is involved. Here is the complete picture:
| Operation | File-Backed | In-Memory | Ratio | Why |
|---|---|---|---|---|
| SQL point lookup (warm) | 460.7 ns | 460.0 ns | 1.0x | Both served from page cache |
| Collection get (warm) | 220.1 ns | 216.8 ns | 1.0x | Both served from page cache |
| SQL point lookup (cold) | 32.4 μs | 2.20 μs | 14.7x | File I/O on cache miss |
| Collection get (cold) | 32.9 μs | 1.86 μs | 17.7x | File I/O on cache miss |
| SQL single insert | 61.2 μs | 2.86 μs | 21.4x | WAL flush + fsync |
| Collection single put | 56.0 μs | 2.08 μs | 26.9x | WAL flush + fsync |
| Batch insert (100 rows/tx) | 223K rows/s | 584K rows/s | 2.6x | WAL flush amortized |
The takeaway
Reads are free — once pages are cached, file-backed and in-memory are identical. The page cache is the performance equalizer.
Writes pay for durability — file-backed writes are 21–27x slower because each commit requires flushing the WAL to disk. Batching reduces the gap to 2.6x. If you do not need crash recovery, use in-memory mode for write-heavy workloads.
Cache sizing matters — if your hot working set fits in MaxCachedPages, even file-backed reads stay at ~460 ns. Cold reads are 15–18x slower because each cache miss is a kernel I/O call.
Scan & Query Patterns
The ScanBenchmarks suite tests 10 query patterns at 1K, 10K, and 100K row scales. These measure the full SQL pipeline: parse → plan → execute → iterate.
| Pattern | What It Tests | Key Insight |
|---|---|---|
| `SELECT *` (full scan) | BTreeCursor leaf-walk + full RecordEncoder decode | Baseline throughput — measures raw sequential read speed |
| `WHERE value < X` (20% selectivity) | Full scan + row-level predicate filter | Filter overhead is minimal — most cost is page I/O, not evaluation |
| `WHERE value < X` (1% selectivity) | Full scan with high rejection rate | Still scans all pages; index scan would be faster at 1% |
| `ORDER BY value` | Full materialization + in-memory sort | Sort cost grows with row count; dominates at 100K rows |
| `ORDER BY (value + id)` | Expression evaluation per row during sort | Expression overhead is small vs sort/materialization |
| `ORDER BY value LIMIT 100` | Top-N optimization with heap-based partial sort | Much faster than full sort — only materializes top 100 |
| `SELECT * LIMIT 100` | Early termination after 100 rows | Constant time regardless of table size |
| `COUNT(*)` | Aggregate scan without row materialization | Faster than `SELECT *` — skips RecordEncoder decode |
| `GROUP BY category` | Hash-based grouping + COUNT + AVG aggregates | Aggregate cost grows linearly with input rows |
What to expect at scale
All scan benchmarks run at three dataset sizes (1K, 10K, 100K rows) to show how performance scales. Full scans are O(n) — they touch every leaf page sequentially. LIMIT queries are effectively O(1) because they stop early. ORDER BY + LIMIT is O(n) but with a constant factor much smaller than full sort because only the top-N rows are retained in memory.
The PointLookupBenchmarks suite additionally tests indexed vs non-indexed column lookups, projected columns, and miss-path performance at the same three scales.
Collection API Performance
The CollectionBenchmark macro suite runs 9 workloads for 15 seconds each, measuring sustained throughput and P50/P99 latencies.
| Workload | Dataset | What It Measures |
|---|---|---|
| Single Put (15s) | Growing | Sustained single-document insert throughput with auto-commit |
| Batch Put ×100 (15s) | Growing | Batched puts in explicit transactions — amortized commit cost |
| Point Get (15s) | 10K docs | Random key lookups on pre-seeded collection |
| 80/20 Mixed (15s) | 10K docs | Realistic read-heavy workload with concurrent writes |
| Full Scan (15s) | 1K docs | ScanAsync cursor iteration — measures deserialization throughput |
| Find ~20% Match (15s) | 1K docs | Filtered scan with FindByPathAsync — ~20% selectivity |
| Indexed Find (15s) | 10K docs | FindByIndexAsync on secondary index — O(log n) lookup |
| Put with Index (15s) | Growing | Insert cost with secondary index maintenance overhead |
| SQL vs Collection (10s) | 10K docs | Head-to-head: SELECT WHERE id = ? vs GetAsync(key) |
SQL vs Collection: which is faster?
For point operations, the Collection API is consistently ~2x faster than SQL (217 ns vs 461 ns for reads, 2.1 μs vs 2.9 μs for writes). The Collection API bypasses SQL parsing, AST construction, query planning, and row-level RecordEncoder decoding. It goes directly from key → B+tree lookup → JSON deserialization.
Use SQL when you need joins, aggregates, or complex predicates. Use the Collection API when you know the key and want maximum single-document throughput.
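A sketch of the two equivalent point reads being compared — `GetAsync` and the `SELECT WHERE id = ?` shape come from this page, while the rest of the call surface (`ExecuteQueryAsync`, `GetCollection`) is assumed:

```csharp
// SQL path: parse -> plan -> execute -> decode row (~461 ns warm).
var result = await db.ExecuteQueryAsync(
    "SELECT * FROM users WHERE id = ?", userId);

// Collection path: key -> B+tree lookup -> JSON deserialize (~217 ns warm).
// Skips parsing, planning, and RecordEncoder decoding entirely.
var users = db.GetCollection<User>("users"); // assumed accessor
var user = await users.GetAsync(userId);
```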
Mixed Workloads
The MixedWorkloadBenchmark and InMemoryWorkloadBenchmark suites simulate realistic application patterns: 80% reads / 20% writes running for 10–15 seconds on 10,000 pre-seeded rows.
| Workload | Mode | Reports |
|---|---|---|
| SQL 80/20 mix | File-backed | ops/sec, P50/P99 latency (reads and writes separately) |
| SQL 80/20 mix | In-memory | ops/sec, P50/P99 latency (reads and writes separately) |
| Collection 80/20 mix | File-backed | ops/sec, P50/P99 latency (reads and writes separately) |
| Collection 80/20 mix | In-memory | ops/sec, P50/P99 latency (reads and writes separately) |
What to expect in production
In an 80/20 mix, read latency stays near warm-cache levels (~460 ns SQL, ~220 ns Collection) because the hot working set fits in cache. Write latency is dominated by the WAL flush in file-backed mode (~60 μs) and is near-instant in in-memory mode (~2–3 μs).
The single-writer lock means writes are serialized, but reads proceed concurrently via snapshot isolation. Read throughput scales linearly with reader count (verified by the ReaderScalingBenchmark), while write throughput remains constant.
Methodology & Environment
| Detail | Value |
|---|---|
| Framework | BenchmarkDotNet v0.15.8 |
| Runtime | .NET 10.0.5 (SDK 10.0.201) |
| OS | Windows 11 (10.0.26300.8068) |
| CPU | 11th Gen Intel Core i9-11900K 3.50 GHz (8C/16T) |
| Storage | NVMe SSD |
| RAM | 32 GB |
Benchmark Categories
| Category | Suite | Description |
|---|---|---|
| Micro | BenchmarkDotNet `[Benchmark]` | Precise mean/error/stddev/allocated per operation. Uses warmup, pilot, and statistical analysis. |
| Macro | Custom MacroBenchmarkRunner | Duration-based (10–15s). Reports total ops, ops/sec, P50/P99 latency histograms. |
| Scaling | Custom | Tests behavior at increasing dataset sizes (BTree depth, row count scaling). |
| Stress | Custom | Crash recovery correctness, WAL growth under sustained writes. |
Running the Benchmarks
```shell
# Run all micro benchmarks (BenchmarkDotNet)
cd tests/CSharpDB.Benchmarks
dotnet run -c Release -- --filter "*"

# Run a specific benchmark suite
dotnet run -c Release -- --filter "*ColdLookupBenchmarks*"
dotnet run -c Release -- --filter "*InMemorySqlBenchmarks*"

# Run macro benchmarks (duration-based)
dotnet run -c Release -- --filter "*MixedWorkloadBenchmark*"
dotnet run -c Release -- --filter "*CollectionBenchmark*"
```
Results are written to BenchmarkDotNet.Artifacts/results/ in CSV, GitHub Markdown, and HTML formats. The full benchmark source is at tests/CSharpDB.Benchmarks.