Benchmarks

Real performance numbers from BenchmarkDotNet running against the CSharpDB engine. All results are from the CSharpDB.Benchmarks project.

At a Glance

- SQL Point Lookup: 460 ns
- Collection Get: 217 ns
- SQL Insert (in-memory): 2.9 μs
- Collection Put (in-memory): 2.1 μs

Point Lookups

Single-row lookups by primary key on a warm 10,000-row dataset. This is the most common database operation and the one where CSharpDB's B+tree leaf-hint cache shines.

| Benchmark | Mean |
|---|---|
| SQL SELECT WHERE id = ? | 460.7 ns ± 15 ns |
| SQL SELECT WHERE id = ? (in-memory) | 460.0 ns ± 13 ns |
| Collection GetAsync(key) | 220.1 ns ± 6.7 ns |
| Collection GetAsync(key) (in-memory) | 216.8 ns ± 3.8 ns |

Why are file-backed and in-memory reads identical?

On warm lookups, all pages are already cached in the IPageCache. The Pager resolves from cache without touching the storage device or WAL, so the I/O backend makes no difference. The B+tree leaf-hint cache often skips root-to-leaf traversal entirely for clustered access patterns, which is why lookups stay under 500 ns even at 10K rows.

The Collection API is ~2x faster than SQL because it bypasses SQL parsing, planning, and row decoding — it goes straight from key to B+tree to JSON deserialization.

Allocations: SQL lookup allocates 1.05 KB per call (QueryResult + DbValue array). Collection get allocates just 208 bytes (deserialized object only).
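The two lookup paths compared above can be sketched as follows. Only `GetAsync(key)` and the query shape come from the benchmarks; `Database.OpenAsync`, `ExecuteAsync`, and `GetCollection<T>` are assumed entry-point names used for illustration.

```csharp
// Hedged sketch of the two point-lookup paths. Database.OpenAsync,
// ExecuteAsync, and GetCollection<T> are assumed names; GetAsync(key)
// is the Collection API measured above.
await using var db = await Database.OpenAsync("app.db");

// SQL path: parse -> plan -> B+tree lookup -> row decode (~460 ns warm).
var result = await db.ExecuteAsync("SELECT * FROM users WHERE id = ?", 42);

// Collection path: key -> B+tree -> JSON deserialization (~217 ns warm).
var users = db.GetCollection<User>("users");
var user = await users.GetAsync(42);

public record User(int Id, string Name);
```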

Cold Lookups (Under Cache Pressure)

What happens when your dataset does not fit in cache? This benchmark seeds 200,000 rows but limits the page cache to just 16 pages, forcing constant eviction and re-reads. Each iteration performs 256 randomized lookups.

| Benchmark | Mean |
|---|---|
| SQL cold lookup (file-backed) | 32.4 μs ± 1.04 μs |
| SQL cold lookup (in-memory) | 2.20 μs ± 0.20 μs |
| Collection cold get (file-backed) | 32.9 μs ± 1.02 μs |
| Collection cold get (in-memory) | 1.86 μs ± 0.13 μs |

Why the 15–18x difference?

With only 16 cached pages and 200K rows, almost every lookup is a cache miss. File-backed lookups must read 4 KB pages from the FileStorageDevice via RandomAccess — each page read involves a kernel syscall and disk I/O. In-memory lookups hit the MemoryStorageDevice which is a simple array copy, staying in user-space memory.

This benchmark uses PagerOptions.MaxCachedPages = 16 to simulate cache pressure. In production, set MaxCachedPages large enough to hold your hot working set and you will see warm-cache performance (sub-500 ns) for most lookups.
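Sizing the cache might look like this. Only `PagerOptions.MaxCachedPages` is named in this document; the `DatabaseOptions` wrapper and the `Database.OpenAsync` overload are assumed shapes for illustration.

```csharp
// Hedged sketch: only PagerOptions.MaxCachedPages is documented above;
// DatabaseOptions and this OpenAsync overload are assumed shapes.
// With 4 KB pages, 8,192 cached pages holds a 32 MB hot working set,
// keeping warm lookups at sub-500 ns even when the file is much larger.
var options = new DatabaseOptions
{
    Pager = new PagerOptions { MaxCachedPages = 8_192 }
};
await using var db = await Database.OpenAsync("app.db", options);
```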

Allocations: Cold file-backed lookups allocate ~9.5 KB (page buffers for cache misses + result). In-memory cold lookups allocate ~8 KB (no page buffer copies needed).

Single Inserts

Single-row INSERT with auto-commit on a 10,000-row table. Each insert is a complete transaction: parse → plan → B+tree insert → WAL append → commit → flush.

| Benchmark | Mean | Throughput |
|---|---|---|
| SQL INSERT (file-backed) | 61.2 μs | ~16,300 ops/s |
| SQL INSERT (in-memory) | 2.86 μs | ~349,000 ops/s |
| Collection PUT (file-backed) | 56.0 μs | ~17,900 ops/s |
| Collection PUT (in-memory) | 2.08 μs | ~481,000 ops/s |

Why is file-backed 21–27x slower for writes?

Every auto-committed write must durably flush the WAL to disk before returning. This means at minimum one fsync per insert — the dominant cost is waiting for the NVMe write to complete (~50–60 μs on this hardware). In-memory mode skips the WAL entirely, so inserts are just B+tree mutations in RAM.

Optimization tip: If you need file-backed write throughput, batch multiple inserts into a single explicit transaction. The WAL flush cost is amortized across all rows in the batch (see Batch Inserts below).
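An explicit transaction batch might be written like this; `BeginTransactionAsync`, `ExecuteAsync`, and `CommitAsync` are assumed names, while the amortization effect (one WAL flush per commit instead of per row) is what the benchmarks measure.

```csharp
// Hedged sketch of batching writes into one explicit transaction.
// BeginTransactionAsync/ExecuteAsync/CommitAsync are assumed names.
await using var tx = await db.BeginTransactionAsync();
for (var i = 0; i < 100; i++)
{
    await tx.ExecuteAsync("INSERT INTO users (id, name) VALUES (?, ?)", i, $"user-{i}");
}
await tx.CommitAsync(); // a single WAL flush covers all 100 rows
```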

Allocations: SQL insert allocates 2.63 KB (file) / 1.64 KB (in-memory). Collection put allocates 2.10 KB (file) / 1.14 KB (in-memory). The difference is WAL frame buffers.

Batch Inserts

Inserting 100 rows per transaction, repeated 256 times (25,600 total rows). Batching amortizes the WAL flush across many rows, dramatically improving file-backed throughput.

| Benchmark | Total Time | Throughput |
|---|---|---|
| File-backed • 25,600 rows | 114.7 ms | ~223,000 rows/s |
| In-memory • 25,600 rows | 43.8 ms | ~584,000 rows/s |

The batching effect

Compare single inserts (16,300 ops/s file-backed) to batched inserts (223,000 rows/s file-backed) — that is a 13.7x throughput improvement just from batching 100 rows per transaction. The cost of one WAL flush (~60 μs) is now shared across 100 rows instead of paid per row.

In-memory mode sees a smaller improvement (349K → 584K) because the baseline was already fast. The remaining overhead per row is SQL parsing and B+tree page splits. Use the PrepareInsertBatch API to skip repeated parsing for maximum throughput.
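Combining a prepared batch with an explicit transaction might look like this. `PrepareInsertBatch` is named above, but its exact signature is an assumption here, as are the transaction method names.

```csharp
// Hedged sketch: PrepareInsertBatch is named in the docs, but Add and
// ExecuteAsync are assumed method shapes. Preparing once skips re-parsing
// the SQL text per row, which is the remaining per-row cost in-memory.
using var batch = db.PrepareInsertBatch("INSERT INTO users (id, name) VALUES (?, ?)");
await using var tx = await db.BeginTransactionAsync();
for (var i = 0; i < 100; i++)
{
    batch.Add(i, $"user-{i}");
}
await batch.ExecuteAsync(tx);
await tx.CommitAsync();
```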

Allocations: ~10.9 MB total for 25,600 rows (~430 bytes per row including SQL strings, parsed AST, and B+tree operations).

File-Backed vs In-Memory: When Does It Matter?

The storage mode only affects performance when I/O is involved. Here is the complete picture:

| Operation | File-Backed | In-Memory | Ratio | Why |
|---|---|---|---|---|
| SQL point lookup (warm) | 460.7 ns | 460.0 ns | 1.0x | Both served from page cache |
| Collection get (warm) | 220.1 ns | 216.8 ns | 1.0x | Both served from page cache |
| SQL point lookup (cold) | 32.4 μs | 2.20 μs | 14.7x | File I/O on cache miss |
| Collection get (cold) | 32.9 μs | 1.86 μs | 17.7x | File I/O on cache miss |
| SQL single insert | 61.2 μs | 2.86 μs | 21.4x | WAL flush + fsync |
| Collection single put | 56.0 μs | 2.08 μs | 26.9x | WAL flush + fsync |
| Batch insert (100 rows/tx) | 223K rows/s | 584K rows/s | 2.6x | WAL flush amortized |

The takeaway

Reads are free — once pages are cached, file-backed and in-memory are identical. The page cache is the performance equalizer.

Writes pay for durability — file-backed writes are 21–27x slower because each commit requires flushing the WAL to disk. Batching reduces the gap to 2.6x. If you do not need crash recovery, use in-memory mode for write-heavy workloads.

Cache sizing matters — if your hot working set fits in MaxCachedPages, even file-backed reads stay at ~460 ns. Cold reads are 15–18x slower because each cache miss is a kernel I/O call.

Scan & Query Patterns

The ScanBenchmarks suite tests 10 query patterns at 1K, 10K, and 100K row scales. These measure the full SQL pipeline: parse → plan → execute → iterate.

| Pattern | What It Tests | Key Insight |
|---|---|---|
| SELECT * (full scan) | BTreeCursor leaf-walk + full RecordEncoder decode | Baseline throughput — measures raw sequential read speed |
| WHERE value < X (20% selectivity) | Full scan + row-level predicate filter | Filter overhead is minimal — most cost is page I/O, not evaluation |
| WHERE value < X (1% selectivity) | Full scan with high rejection rate | Still scans all pages; index scan would be faster at 1% |
| ORDER BY value | Full materialization + in-memory sort | Sort cost grows with row count; dominates at 100K rows |
| ORDER BY (value + id) | Expression evaluation per row during sort | Expression overhead is small vs sort/materialization |
| ORDER BY value LIMIT 100 | Top-N optimization with heap-based partial sort | Much faster than full sort — only materializes top 100 |
| SELECT * LIMIT 100 | Early termination after 100 rows | Constant time regardless of table size |
| COUNT(*) | Aggregate scan without row materialization | Faster than SELECT * — skips RecordEncoder decode |
| GROUP BY category | Hash-based grouping + COUNT + AVG aggregates | Aggregate cost grows linearly with input rows |

What to expect at scale

All scan benchmarks run at three dataset sizes (1K, 10K, 100K rows) to show how performance scales. Full scans are O(n) — they touch every leaf page sequentially. LIMIT queries are effectively O(1) because they stop early. ORDER BY + LIMIT is O(n) but with a constant factor much smaller than full sort because only the top-N rows are retained in memory.
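Two of those patterns side by side, as a sketch: the table and column names are invented, and `ExecuteAsync` is an assumed entry point; only the query shapes and the Top-N behavior come from the suite described above.

```csharp
// Illustrative versions of two scan patterns (invented table/column names;
// ExecuteAsync is an assumed API name).
// Full sort: materializes and sorts every row before returning.
var sorted = await db.ExecuteAsync("SELECT * FROM events ORDER BY value");

// Top-N: a heap-based partial sort retains only the 100 smallest values,
// so it avoids materializing and sorting the whole table.
var topN = await db.ExecuteAsync("SELECT * FROM events ORDER BY value LIMIT 100");
```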

The PointLookupBenchmarks suite additionally tests indexed vs non-indexed column lookups, projected columns, and miss-path performance at the same three scales.

Collection API Performance

The CollectionBenchmark macro suite runs 9 workloads for 15 seconds each, measuring sustained throughput and P50/P99 latencies.

| Workload | Dataset | What It Measures |
|---|---|---|
| Single Put (15s) | Growing | Sustained single-document insert throughput with auto-commit |
| Batch Put ×100 (15s) | Growing | Batched puts in explicit transactions — amortized commit cost |
| Point Get (15s) | 10K docs | Random key lookups on pre-seeded collection |
| 80/20 Mixed (15s) | 10K docs | Realistic read-heavy workload with concurrent writes |
| Full Scan (15s) | 1K docs | ScanAsync cursor iteration — measures deserialization throughput |
| Find ~20% Match (15s) | 1K docs | Filtered scan with FindByPathAsync — ~20% selectivity |
| Indexed Find (15s) | 10K docs | FindByIndexAsync on secondary index — O(log n) lookup |
| Put with Index (15s) | Growing | Insert cost with secondary index maintenance overhead |
| SQL vs Collection (10s) | 10K | Head-to-head: SELECT WHERE id = ? vs GetAsync(key) |

SQL vs Collection: which is faster?

For point operations, the Collection API is consistently ~2x faster than SQL (217 ns vs 461 ns for reads, 2.1 μs vs 2.9 μs for writes). The Collection API bypasses SQL parsing, AST construction, query planning, and row-level RecordEncoder decoding. It goes directly from key → B+tree lookup → JSON deserialization.

Use SQL when you need joins, aggregates, or complex predicates. Use the Collection API when you know the key and want maximum single-document throughput.
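The Collection query surface referenced in the workload table might be used like this. Only the method names (`GetCollection`-style access aside, `ScanAsync` and `FindByIndexAsync` are named above); the parameter shapes and the `Product` type are assumptions.

```csharp
// Hedged sketch of the Collection query surface; ScanAsync and
// FindByIndexAsync are named in the workload table, but their parameter
// shapes here are assumptions, as is the Product type.
var products = db.GetCollection<Product>("products");

// O(log n) secondary-index lookup (the "Indexed Find" workload).
await foreach (var p in products.FindByIndexAsync("category", "books"))
    Console.WriteLine(p.Name);

// Full-collection cursor iteration (the "Full Scan" workload).
await foreach (var p in products.ScanAsync())
    Console.WriteLine(p.Name);

public record Product(int Id, string Name, string Category);
```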

Mixed Workloads

The MixedWorkloadBenchmark and InMemoryWorkloadBenchmark suites simulate realistic application patterns: 80% reads / 20% writes running for 10–15 seconds on 10,000 pre-seeded rows.

| Workload | Mode | Reports |
|---|---|---|
| SQL 80/20 mix | File-backed | ops/sec, P50/P99 latency (reads and writes separately) |
| SQL 80/20 mix | In-memory | ops/sec, P50/P99 latency (reads and writes separately) |
| Collection 80/20 mix | File-backed | ops/sec, P50/P99 latency (reads and writes separately) |
| Collection 80/20 mix | In-memory | ops/sec, P50/P99 latency (reads and writes separately) |

What to expect in production

In an 80/20 mix, read latency stays near warm-cache levels (~460 ns SQL, ~220 ns Collection) because the hot working set fits in cache. Write latency is dominated by the WAL flush in file-backed mode (~60 μs) and is near-instant in in-memory mode (~2–3 μs).

The single-writer lock means writes are serialized, but reads proceed concurrently via snapshot isolation. Read throughput scales linearly with reader count (verified by the ReaderScalingBenchmark), while write throughput remains constant.
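The single-writer/concurrent-reader model might be exercised like this. Only the concurrency claims come from this document; `GetCollection`, `GetAsync`, and the `PutAsync` name (following the `GetAsync` pattern) are assumed shapes.

```csharp
// Hedged sketch: 8 reader tasks plus one writer task on the same
// collection. Reads proceed concurrently under snapshot isolation;
// writes serialize behind the single-writer lock. API names are assumed.
var users = db.GetCollection<User>("users"); // db opened as elsewhere

var readers = Enumerable.Range(0, 8).Select(_ => Task.Run(async () =>
{
    for (var i = 0; i < 1_000; i++)
        await users.GetAsync(Random.Shared.Next(10_000)); // concurrent snapshot reads
}));

var writer = Task.Run(async () =>
{
    for (var i = 0; i < 1_000; i++)
        await users.PutAsync(i, new User(i, $"user-{i}")); // serialized writes
});

await Task.WhenAll(readers.Append(writer));
```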

Methodology & Environment

| Detail | Value |
|---|---|
| Framework | BenchmarkDotNet v0.15.8 |
| Runtime | .NET 10.0.5 (SDK 10.0.201) |
| OS | Windows 11 (10.0.26300.8068) |
| CPU | 11th Gen Intel Core i9-11900K 3.50 GHz (8C/16T) |
| Storage | NVMe SSD |
| RAM | 32 GB |

Benchmark Categories

| Category | Suite | Description |
|---|---|---|
| Micro | BenchmarkDotNet [Benchmark] | Precise mean/error/stddev/allocated per operation. Uses warmup, pilot, and statistical analysis. |
| Macro | Custom MacroBenchmarkRunner | Duration-based (10–15s). Reports total ops, ops/sec, P50/P99 latency histograms. |
| Scaling | Custom | Tests behavior at increasing dataset sizes (BTree depth, row count scaling). |
| Stress | Custom | Crash recovery correctness, WAL growth under sustained writes. |

Running the Benchmarks

```shell
# Run all micro benchmarks (BenchmarkDotNet)
cd tests/CSharpDB.Benchmarks
dotnet run -c Release -- --filter "*"

# Run a specific benchmark suite
dotnet run -c Release -- --filter "*ColdLookupBenchmarks*"
dotnet run -c Release -- --filter "*InMemorySqlBenchmarks*"

# Run macro benchmarks (duration-based)
dotnet run -c Release -- --filter "*MixedWorkloadBenchmark*"
dotnet run -c Release -- --filter "*CollectionBenchmark*"
```

Results are written to BenchmarkDotNet.Artifacts/results/ in CSV, GitHub Markdown, and HTML formats. The full benchmark source is at tests/CSharpDB.Benchmarks.