Source reference. This page preserves the original long-form markdown content that previously lived at docs/roadmap.md. For the shorter curated page, see Roadmap.

CSharpDB Roadmap

This document outlines the planned direction for CSharpDB, organized by timeframe and priority. Items are roughly ordered by expected impact within each tier, and statuses are intended to reflect the current v3.8.0 state of the repo.

Near-Term

Recently completed improvements to query performance, storage/runtime behavior, maintenance workflows, and developer ergonomics.

Feature	Description	Status
`DISTINCT` keyword	Deduplicate rows in SELECT output	Done
Composite indexes	Multi-column indexes for covering more query patterns	Done
Index range scans	Use indexes for `<`, `>`, `<=`, `>=`, `BETWEEN` — not just equality	Done
Prepared statement cache	Cache parsed ASTs and query plans to avoid re-parsing identical SQL	Done
Cached max rowid	Avoid repeated O(n) scans when generating row IDs on insert by caching the maximum row ID in memory and persisting a high-water mark	Done
B+tree delete rebalancing	Merge underflowed pages on delete to reclaim space	Done
In-memory database mode	Open a database fully in memory, load a disk database into memory, and save a committed snapshot back to disk	Done
Shared in-memory ADO.NET mode	Support `Data Source=:memory:` and named shared in-memory databases with explicit save/load	Done
Collection field indexes	Equality-based secondary indexes for `Collection<T>` via `EnsureIndexAsync` / `FindByIndexAsync`	Done
Reader session reuse	Reuse snapshot pager and query planner inside `ReaderSession` for burst concurrent reads	Done
Architecture enforcement	`CSharpDB.Client` is the main caller-facing interaction layer for both local and remote scenarios; ADO.NET routes ordinary direct and daemon-backed access through that layer, and only named shared in-memory provider state still retains an internal engine dependency	Done
Database administration	Maintenance report, reindex (database/table/index/collection), VACUUM/compact, fragmentation analysis, database size report	Done
Dedicated gRPC daemon	`CSharpDB.Daemon` host plus `CSharpDB.Client` gRPC coverage for SQL, schema, procedures, collections, and maintenance	Done
Storage tuning presets	`UseLookupOptimizedPreset()` and `UseWriteOptimizedPreset()` for file-backed workloads	Done
Memory-mapped main-file reads	Opt-in mapped clean-page reads plus copy-on-write materialization for mutable access on file-backed databases	Done
Background WAL checkpointing	Incremental/sliced auto-checkpointing to move work off the triggering commit	Done
SQL executor/read-path fast paths	Compact scan and indexed-range projections, broader join lookup/covered paths, grouped/composite index aggregates, correlated subquery filter fast paths, and lower row materialization overhead	Done
Table/index statistics	ANALYZE command with persisted row counts, column NDV/min/max, stale tracking, and initial stats-guided index selection in the query planner	Done
Collection binary payloads	Binary direct-payload format with faster hydration, direct field/path extraction, and richer path-based indexing	Done
Collection path indexes	Nested scalar, array-element, nested array-object, Guid, temporal, and ordered text path indexes with `FindByPathAsync` / `FindByPathRangeAsync`	Done
Hybrid storage mode	Lazy-resident durable storage with gRPC-tunable file-cache configuration; Admin direct local hosting keeps a warm in-process database instance and uses hybrid incremental-durable options by default	Done
Client backup/restore	`BackupAsync` / `RestoreAsync` as first-class `ICSharpDbClient` operations across direct, HTTP, gRPC, CLI, and Admin	Done
Native table archives and external tables	Native `.csdbtable` table snapshots with fast Admin Import / Export, download or server-path destinations, `CREATE EXTERNAL TABLE` / `DROP EXTERNAL TABLE`, `sys.external_tables`, read-only external table scans/joins, and embedded primary-key archive lookup indexes for eligible point reads	Done
Older DB foreign-key retrofit migration	Validate/apply maintenance workflow that rewrites existing child tables with persisted FK metadata across direct, HTTP, gRPC, CLI, and Admin	Done
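Index range scans (`<`, `>`, `<=`, `>=`, `BETWEEN`) depend on the index being ordered. The scan seeks to the first qualifying key and stops at the first key past the upper bound, so it never visits every row. The minimal Python sketch below illustrates only the general technique. It models an index as a sorted list of `(key, rowid)` pairs and uses binary search for the seek. `SortedIndexSketch` and `range_scan` are hypothetical names, not part of CSharpDB's B+tree API.

```python
from bisect import bisect_left, bisect_right


class SortedIndexSketch:
    """Hypothetical stand-in for an ordered secondary index: sorted (key, rowid) pairs."""

    def __init__(self, entries):
        self.entries = sorted(entries)
        self.keys = [key for key, _ in self.entries]

    def range_scan(self, low=None, high=None, low_inclusive=True, high_inclusive=True):
        """Yield rowids whose key falls within the bounds; None means unbounded."""
        if low is None:
            start = 0
        elif low_inclusive:
            start = bisect_left(self.keys, low)   # first key >= low
        else:
            start = bisect_right(self.keys, low)  # first key > low
        if high is None:
            stop = len(self.keys)
        elif high_inclusive:
            stop = bisect_right(self.keys, high)  # one past the last key <= high
        else:
            stop = bisect_left(self.keys, high)   # one past the last key < high
        for _, rowid in self.entries[start:stop]:
            yield rowid


index = SortedIndexSketch([(30, 3), (10, 1), (20, 2), (20, 4), (40, 5)])
print(list(index.range_scan(20, 30)))                       # BETWEEN 20 AND 30 -> [2, 4, 3]
print(list(index.range_scan(low=20, low_inclusive=False)))  # > 20 -> [3, 5]
```

In a B+tree, the seek is a root-to-leaf descent and the scan then walks the leaf pages in order. The stopping rule is the same as in the sketch.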

Mid-Term

SQL feature parity, provider/tooling compatibility, and ecosystem expansion.

Feature	Description	Status
User-defined functions and commands	Done for the trusted in-process model: host-registered C# scalar functions, common SQL/Admin built-ins, trusted commands, Admin Forms/Reports/pipeline hooks, declarative Admin Forms action sequences, and local Admin Forms C# code modules are implemented across the supported surfaces. Untrusted sandboxed UDF execution is intentionally out of scope	Done
Writable external tables	Planned opt-in writable external table registrations over mutable `.csdbx` files, backed by CSharpDB B+tree storage and limited to DML (`INSERT`, `UPDATE`, `DELETE`) in v1 while `.csdbtable` archives remain read-only	Planned
Subqueries	Scalar subqueries, `IN (SELECT ...)`, `EXISTS (SELECT ...)`, including correlated evaluation in `WHERE`, non-aggregate projection, and `UPDATE`/`DELETE` expressions	Done
`UNION` / `INTERSECT` / `EXCEPT`	Set operations across SELECT results, including use in top-level queries, views, and CTE query bodies	Done
Window functions	`ROW_NUMBER()`, `RANK()`, `DENSE_RANK()`, `LEAD()`, `LAG()`	Planned
`DEFAULT` column values	Allow default expressions in column definitions	Planned
`CHECK` constraints	Arbitrary expression-based constraints per column or per table	Planned
Foreign key constraints	v1 support for single-column, column-level `REFERENCES` with optional `ON DELETE CASCADE`, plus `sys.foreign_keys` and metadata/tooling surfaces	Done
Remote host consolidation	`CSharpDB.Daemon` now hosts the existing REST/HTTP `/api` surface and gRPC from one long-running process backed by the same warm daemon-hosted client; standalone `CSharpDB.Api` remains supported for REST-only hosting	Done
Remote API-key protection	Opt-in API-key mode protects REST `/api/*` and daemon gRPC calls with constant-time key comparison while keeping default no-auth behavior for compatibility	Done
Remote host security hardening	Add authorization, protected admin endpoint scopes, JWT/RBAC options, and TLS/mTLS deployment helpers for remote HTTP and gRPC access	Planned
Daemon service packaging	Package the existing `CSharpDB.Daemon` host as a persistent background service across systemd, Windows Service, and launchd	Done
Cross-platform deployment	Self-contained daemon archives and install scripts ship for Windows, Linux, and macOS; dotnet tool, Docker, Homebrew, and winget distribution remain future work	In Progress
NuGet package	Publish and maintain `CSharpDB.Engine`, `CSharpDB.Data`, `CSharpDB.Client`, and `CSharpDB.Primitives` as the primary NuGet packages	Done
Connection pooling	Pool underlying direct embedded sessions behind `CSharpDbConnection` to amortize open/close cost	Done
Admin dashboard improvements	Richer SQL editor UX, query history, deeper diagnostics, and integrated Forms/Reports tooling beyond the core schema/procedure/storage surface	Done
Admin Forms Access parity	Close the highest-impact Access-style form gaps: runtime responsive layouts, full inferred validation enforcement, richer record-source/filter/sort models, Layout View, form modes, broader action/event coverage, and broader control coverage. Work has started with trusted command-backed form lifecycle events, command buttons, and selected control events	Partial
Admin Reports Access parity	Close the highest-impact Access-style report gaps: bounded saved-query previews, full report rendering/export, parameter/filter prompts, richer grouping/totals options, Layout View, conditional formatting, subreports, and broader report controls. Work has started with trusted command-backed report preview lifecycle events	Partial
Visual query designer	Classic Admin query builder with source canvas, join editing, design grid, SQL preview, and saved designer layouts	Done
ETL pipelines	Built-in package-driven pipeline runtime with validation, dry-run, execute/resume flows, API/CLI/client coverage, run history, and Admin visual designer support	Done
VS Code extension	Schema explorer, SQL editor with IntelliSense, data browser, table designer, storage diagnostics	Done
ADO.NET `GetSchema` collections	Implement `DbConnection.GetSchema()` for standard metadata collections (MetaDataCollections, Tables, Columns, Indexes, Views, ForeignKeys) to support ORMs and tooling that discover schema through ADO.NET	Done
Multilingual text support	`BINARY`, `NOCASE`, `NOCASE_AI`, and built-in `ICU:<locale>` collation now work across SQL schema/query semantics, metadata surfaces, and collection path indexes; dedicated ordered SQL text index optimization remains planned	Done
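The remote API-key entry relies on constant-time key comparison. With a constant-time check, response timing does not reveal how many leading characters of a guessed key were correct. The minimal Python sketch below shows that check using the standard library's `hmac.compare_digest`. The `authorize` helper and the `X-Api-Key` header name are hypothetical. They do not describe CSharpDB's actual configuration keys or header names.

```python
import hmac


def authorize(headers, configured_key):
    """Return True when a request may proceed (hypothetical helper, not CSharpDB's API)."""
    if configured_key is None:
        # Mirrors the documented default: with API-key mode off, no authentication is applied.
        return True
    presented = headers.get("X-Api-Key")  # hypothetical header name
    if presented is None:
        return False
    # compare_digest avoids short-circuiting on the first mismatching byte, unlike ==,
    # so the comparison time does not depend on how much of the key was guessed correctly.
    return hmac.compare_digest(presented.encode("utf-8"), configured_key.encode("utf-8"))
```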

Long-Term

Advanced features and fundamental architecture enhancements.

Feature	Description	Status
Full-text search	Inverted index with tokenization, stemming, and relevance ranking	Done
JSON path querying	Query into JSON document fields in the Collection API (e.g., `$.address.city`) via `FindByPathAsync` / `FindByPathRangeAsync`	Done
Advanced collection storage path	Binary direct-payload format with direct binary hydration, path-based field extraction, and richer expression/path indexes	Done
SQL batched row transport	Internal row-batch transport serves as the batch-first SQL execution foundation across batch-capable result boundaries, scans, joins, and generic aggregates	Done
External table index coverage	Follow writable `.csdbx` storage with broader external-table indexes, planner costing, and multi-column lookup/range support beyond the current archive primary-key point-lookup path	Planned
Source-generated collection fast path	Done for the current phase: opt-in generated collection models now provide `GetGeneratedCollectionAsync<T>(...)`, generated field descriptors/index bindings, analyzer-packaged collection model/codecs, generated binary direct-payload encode/decode for supported document graphs, source-generated JSON fallback for unsupported shapes, trim/NativeAOT smoke coverage, and a dedicated sample	Done
Generated collection package ergonomics	Streamline NuGet/analyzer packaging, templates, onboarding docs, and generated-collection setup so consumers can adopt the opt-in path with less project wiring	Planned
Broader generated collection model coverage	Expand generator support beyond the current scalar, scalar collection, nested scalar, and nested collection-scalar shapes; unsupported shapes currently warn and fall back to source-generated JSON instead of binary direct payloads	Planned
Page-level compression	Deep engine/page compression remains planned; application-level payload compression is available as a sample/SDK pattern without changing the storage format	Planned
At-rest encryption	Encrypt database and WAL files with passphrase-based key management and explicit plaintext/encrypted migration/export paths; implementation must meet the database-encryption plan entry criteria before shipping	Research
Advanced cost-based query optimizer	Done for the current phase: `ANALYZE`-driven stats-guided costing now uses internal equi-depth histograms, heavy hitters, composite-index prefix distinct-count summaries, skew-aware lookup/filter estimates, correlation-aware composite equality filters/joins, non-unique lookup costing, hash build-side choice, and bounded DP reordering for small inner-join chains	Done
Adaptive query re-optimization	Done for the current phase: opt-in adaptive join execution can switch eligible index nested-loop joins to hash joins and can flip inner hash build sides at safe pre-emission boundaries when observed rows diverge materially from estimates. Broader `EXPLAIN ANALYZE`, runtime actual-row reporting, adaptive stats persistence, and arbitrary mid-plan reordering remain future work	Done
Public planner histogram inspection	Stable SQL-first diagnostics now expose `sys.planner_histograms`, `sys.planner_heavy_hitters`, `sys.planner_index_prefix_stats`, and `EXPLAIN ESTIMATE FOR <query>` while keeping raw histogram/prefix storage payloads internal	Done
Async I/O batching	Done for the current phase: WAL frame-chunk writes, chunked checkpoint page copies, shared snapshot/export batching, reusable B-tree copy utilities, and the close-out audit now cover the main storage and maintenance write paths; future work is limited to specialized diagnostics or maintenance-path tuning	Done
Low-latency durable writes	Done in `v2.9.0`: advisory planner-stat persistence can stay deferred without weakening committed-row durability, and `sys.table_stats.row_count_is_exact` now makes exact versus estimated row-count semantics explicit to planner and `COUNT(*)` fast paths	Done
Group commit / deferred WAL flush	Done in `v2.9.0`: opt-in `UseDurableCommitBatchWindow(...)` batches durable WAL flushes across contending in-process transactions and remains an expert measure-first knob rather than default behavior	Done
Initial multi-writer support	Explicit `WriteTransaction` conflict-detected retry flow, shared auto-commit non-insert isolation, and opt-in `ConcurrentWriteTransactions` for shared implicit inserts	Done
Broader multi-writer insert optimization	Opt-in `ConcurrentWriteTransactions` now reserves shared row-id ranges and rebases hot right-edge insert pages against pending WAL images, improving concurrent one-row auto-ID and explicit-ID insert fan-in while keeping serialized inserts as the default	Done
API-level sharding	Route API/daemon requests across multiple warm CSharpDB database files so independent tenants or shard keys can use separate WAL and commit paths, with v1 focused on single-shard writes and point reads	Research
Replication / change feed	Retained commit-log change feeds and reactive query subscriptions for read replicas, live Admin views, and event-driven applications	Research
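The cost-based optimizer entry mentions equi-depth histograms. In an equi-depth histogram, each bucket holds roughly the same number of rows. Dense value ranges therefore get narrow buckets, which keeps range-predicate estimates usable on uneven data. The Python sketch below shows the general textbook technique, with linear interpolation inside a bucket. `build_equi_depth`, `estimate_le`, and the `(low, high, row_count)` bucket layout are illustrative assumptions. They do not describe CSharpDB's internal histogram payloads, which the roadmap notes remain internal.

```python
def build_equi_depth(values, bucket_count):
    """Split sorted values into buckets of (near-)equal row count.

    Returns a list of (low, high, row_count) tuples in ascending order.
    """
    ordered = sorted(values)
    n = len(ordered)
    buckets = []
    for b in range(bucket_count):
        start = b * n // bucket_count
        end = (b + 1) * n // bucket_count
        if start < end:
            chunk = ordered[start:end]
            buckets.append((chunk[0], chunk[-1], len(chunk)))
    return buckets


def estimate_le(buckets, x):
    """Estimate how many rows have value <= x, interpolating inside at most one bucket."""
    total = 0.0
    for low, high, count in buckets:
        if x >= high:
            total += count  # the whole bucket qualifies
        elif x >= low:
            # low <= x < high implies high > low, so the division is safe.
            # Assume values are spread uniformly between low and high.
            total += count * (x - low) / (high - low)
            break
        else:
            break  # later buckets start at or above this bucket's low
    return total


buckets = build_equi_depth(range(1, 101), 4)
# A range predicate (a, b] can then be estimated as estimate_le(b) - estimate_le(a).
print(estimate_le(buckets, 60) - estimate_le(buckets, 50))
```

Interpolating within a bucket assumes values are spread uniformly, and that assumption breaks down on heavily skewed columns. The heavy-hitter summaries mentioned in the same roadmap entry are a common complement, because they keep exact counts for the most frequent values.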

Current Limitations

These are known simplifications in the current implementation:

Area	Limitation
Functions and automation	CSharpDB's UDF/command model is trusted and in-process by design. Current supported surfaces include host-registered scalar functions, common built-ins, trusted commands, form/report/pipeline hooks, declarative action sequences, and local Admin Forms C# modules; untrusted sandboxed execution is intentionally out of scope
Query	Scalar/`IN`/`EXISTS` subqueries are supported, including correlated cases in `WHERE`, non-aggregate projection, and `UPDATE`/`DELETE` expressions; correlated subqueries are not yet supported in `JOIN ON`, `GROUP BY`, `HAVING`, `ORDER BY`, or aggregate projections
Query	`UNION`, `INTERSECT`, and `EXCEPT` are supported; `UNION ALL` is not implemented yet
Query	No window functions
Schema	No SQL `DEFAULT` column values or `CHECK` constraints yet. Foreign keys are currently v1 only: single-column, column-level `REFERENCES` with optional `ON DELETE CASCADE`; table-level/composite/deferred foreign keys and `ON UPDATE` actions are not implemented
Indexes	Equality lookups support current `INTEGER`/`TEXT` indexes, but ordered range-scan pushdown is still limited to single-column `INTEGER` index paths
RowId	Legacy table schemas without persisted high-water metadata may pay a one-time key scan on first insert
Collections	`FindByIndexAsync` supports declared field-equality lookups; `FindByPathAsync` and `FindByPathRangeAsync` support path-based queries on indexed paths; `FindAsync` remains a full scan for unindexed predicates. Generated collections require registered descriptors for existing collection indexes; unsupported generated model shapes warn and use the source-generated JSON fallback instead of binary direct payloads
External Tables	Native `.csdbtable` archives can be registered and queried as read-only external tables. Writable external tables are planned as an opt-in `.csdbx` format; current archives remain read-only, and broader external indexes, range seeks, and deeper planner costing remain planned
Networking	`CSharpDB.Daemon` now hosts both REST and gRPC from one process; named pipes remain reserved but are not implemented end to end today
Security	Remote REST and daemon gRPC support opt-in API-key authentication, defaulting to `None` for backward compatibility. JWT, RBAC, mTLS helpers, TLS-specific configuration, and at-rest encryption are not implemented
Admin Forms	The Forms designer/runtime supports the core generated-form and data-entry path plus trusted command-backed automation, including lifecycle events, command buttons, selected-control events, conditional UI rules, domain formula helpers, and declarative action sequences for current record, form navigation, filtering, SQL/procedure, and control-property workflows. It still needs Access-parity work for responsive runtime rendering, complete inferred validation, richer form modes, additional events, advanced filtering/sorting, report/query/import/export actions, macro-style loops, on-error handling, and temporary variables, and broader controls
Admin Reports	The Reports designer/runtime supports the core banded preview path plus trusted command-backed preview lifecycle events, but still needs Access-parity work for bounded saved-query previews, full report output/export, parameters, richer grouping and totals semantics, conditional formatting, subreports, and broader controls
Text / Multilingual	Text is stored as UTF-8 and supports all Unicode languages; default semantics remain ordinal, but opt-in `BINARY`, `NOCASE`, `NOCASE_AI`, and `ICU:<locale>` collation are implemented for SQL and collection indexes. Dedicated ordered SQL text index optimization remains planned
Concurrency	The physical WAL commit path is still serialized at the storage boundary. Initial multi-writer support is shipped, but observed gains still depend on conflict shape and whether shared auto-commit `INSERT` is left on the default serialized path
Storage	No page-level compression; the compression SDK sample stores compressed payloads as ordinary application-managed `BLOB` values
Storage	No at-rest encryption for database/WAL files; on-disk storage is plaintext only
Storage	Memory-mapped reads are opt-in and currently apply only to clean main-file pages; WAL-backed reads still rely on the WAL/cache path
Storage	By default, durable auto-commit single-row writes still pay a physical WAL flush per commit; opt-in `UseDurableCommitBatchWindow(...)` can trade some commit latency for higher throughput across contending in-process writers, but default behavior remains per-commit durable
Query	Phase-2 cost-based planning is in place: `ANALYZE`, `sys.table_stats`, `sys.column_stats`, public planner-stat diagnostics, histogram/heavy-hitter/prefix estimates, and bounded small-chain join reordering now feed join/access-path costing. Opt-in adaptive join re-optimization can react to stale-stat or parameter-sensitive join cardinality misses, while broader runtime actuals, `EXPLAIN ANALYZE`, and full mid-plan reordering remain future work
Query	Internal row-batch transport is now the default scan-heavy execution foundation across batch-capable scans, joins, aggregates, and result boundaries; remaining work is broader kernel specialization and optional SIMD-style tuning rather than missing core batch coverage
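The RowId limitation reflects how a persisted high-water mark works. Once a table's next row ID is stored in its metadata, inserts can allocate IDs without reading table data. A legacy table with no stored mark must scan its keys once to rebuild the mark. The minimal Python sketch below shows that allocation pattern. `TableMetaSketch` and its fields are hypothetical and say nothing about CSharpDB's on-disk metadata layout.

```python
class TableMetaSketch:
    """Hypothetical per-table metadata with an optional persisted high-water mark."""

    def __init__(self, existing_rowids, next_rowid=None):
        self.rowids = set(existing_rowids)  # stands in for the table's B+tree keys
        self.next_rowid = next_rowid        # None models legacy metadata without a mark
        self.full_scans = 0                 # counts fallback key scans, for illustration

    def allocate_rowid(self, explicit=None):
        if self.next_rowid is None:
            # Legacy path: one O(n) key scan rebuilds the mark; later inserts skip it.
            self.full_scans += 1
            self.next_rowid = max(self.rowids, default=0) + 1
        rowid = explicit if explicit is not None else self.next_rowid
        # Keep the mark above every ID handed out, including explicit ones.
        # A real engine would persist this value with the table metadata on commit;
        # duplicate-key checks are omitted here.
        self.next_rowid = max(self.next_rowid, rowid + 1)
        self.rowids.add(rowid)
        return rowid
```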

Completed Milestones

Major features already implemented:

Single-file database with 4 KB page-oriented storage
B+tree-backed tables and secondary indexes
Write-Ahead Log with crash recovery and auto-checkpoint
Concurrent snapshot-isolated readers via WAL-based MVCC
Full SQL pipeline: tokenizer, parser, query planner, operator tree
JOINs (INNER, LEFT, RIGHT, CROSS), aggregates, GROUP BY, HAVING, CTEs
Set operations: UNION, INTERSECT, EXCEPT
SELECT DISTINCT and DISTINCT aggregates
Scalar subqueries, IN (SELECT ...), and EXISTS (SELECT ...), including correlated evaluation in filters, non-aggregate projections, and UPDATE/DELETE expressions
Scalar TEXT(expr) for filter-friendly text coercion
Composite (multi-column) indexes
Ordered integer index range scans (<, <=, >, >=, BETWEEN) in the fast lookup path
ANALYZE, persisted sys.table_stats / sys.column_stats, and stale-aware column-stat refresh
Phase-2 cost-based query planning: statistics-guided access-path selection, join method choice, hash build-side choice, histogram/heavy-hitter/cardinality estimation, composite-prefix correlation modeling, and bounded small-chain inner-join reordering
Public planner diagnostics through EXPLAIN ESTIMATE FOR SELECT, WITH, compound queries, and sys.planner_* virtual catalogs
Opt-in adaptive join re-optimization behind DatabaseOptions.AdaptiveQueryReoptimization
SQL statement and SELECT plan caching
First-class IDENTITY / AUTOINCREMENT support for INTEGER PRIMARY KEY columns
Persisted table NextRowId high-water mark with compatibility fallback for legacy metadata
Views and triggers (BEFORE/AFTER on INSERT/UPDATE/DELETE)
Foreign key constraints: single-column, column-level REFERENCES with optional ON DELETE CASCADE
Older-database foreign-key retrofit migration across direct, HTTP, gRPC, CLI, and Admin
ADO.NET provider (DbConnection, DbCommand, DbDataReader, DbTransaction)
ADO.NET GetSchema() metadata collections for MetaDataCollections, Tables, Columns, Indexes, Views, and ForeignKeys
ADO.NET connection pooling with ClearPool / ClearAllPools
In-memory database mode with explicit load-from-disk and save-to-disk APIs
Shared/private in-memory ADO.NET connections with named shared-memory hosts
Document Collection API (NoSQL) with typed Put/Get/Delete/Scan/Find
Collection UTF-8 payload fast path with compatibility for legacy backing rows
Collection secondary field indexes via EnsureIndexAsync / FindByIndexAsync
Maintenance report, REINDEX, and VACUUM flows across client, CLI, API, and Admin UI
Dedicated CSharpDB.Daemon gRPC host for remote CSharpDB.Client access
Remote host consolidation in CSharpDB.Daemon, with REST /api and gRPC sharing the same warm hosted database client
Opt-in API-key protection for REST /api/* and daemon gRPC access
Storage tuning presets, bounded WAL read caching, memory-mapped main-file reads, and sliced background WAL auto-checkpointing
SQL executor/read-path fast paths for compact projections, broader join/index coverage, grouped aggregates, and correlated subquery filters
Batch-first SQL row-batch execution foundation with batch-aware scan/index/join roots, shared predicate/projection kernels, and batch-native generic aggregate paths
Interactive CLI with meta-commands and file execution
REST API with 34 endpoints and OpenAPI/Scalar documentation
Blazor Server admin dashboard
Integrated Admin Forms and Reports designers with runtime preview/entry, database-backed metadata persistence, and print-ready report output
Trusted C# callbacks, commands, Admin automation hooks, and local Admin Forms C# code modules
B+tree delete rebalancing with underflow handling (borrow/merge + interior collapse path)
Reusable snapshot reader sessions for higher concurrent-read throughput
Comprehensive benchmark suite (micro, macro, stress, scaling, in-memory, shared-memory)
Binary direct-payload collection storage with direct hydration and field/path extraction
Collection path indexes: nested scalar, array-element, nested array-object, Guid, temporal, ordered text
Collection path query APIs: FindByPathAsync and FindByPathRangeAsync
Source-generated typed collection fast path: generated collection models/codecs/field descriptors, generated binary direct payloads for supported shapes, trim-safe GetGeneratedCollectionAsync<T>(...), generator diagnostics, NativeAOT trim-smoke validation, and a dedicated sample
Full-text search with tokenization, stemming, and relevance ranking
Hybrid storage mode with lazy-resident durable storage and gRPC-tunable file-cache
Client-wide BackupAsync / RestoreAsync across direct, HTTP, gRPC, CLI, and Admin
Native .csdbtable table archives with Admin Import / Export, read-only external table registration, sys.external_tables, external table scans/joins, and embedded archive primary-key lookup indexes
ReplaceAsync for index stores
Package-driven ETL pipelines with validation, dry-run, execute/resume, persisted run history, and Admin visual designer support
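Snapshot-isolated readers over a write-ahead log, listed above, can be understood through a simple model similar in spirit to SQLite's WAL mode:

- Committed page images are appended to the log.
- Each reader records the log position of the last commit when it starts.
- A page read returns the newest frame for that page at or before the reader's mark, and otherwise falls back to the main file.

The Python sketch below is a conceptual model under those assumptions, not CSharpDB's implementation. The class and method names are hypothetical, and a linear search stands in for a WAL index. It also omits checkpointing. In designs like this, a checkpoint may only copy frames that every active reader's snapshot already includes.

```python
class WalSnapshotSketch:
    """Conceptual model: main-file pages plus an append-only log of committed page images."""

    def __init__(self, main_pages):
        self.main = dict(main_pages)  # page number -> page image in the main file
        self.frames = []              # (page number, page image) in commit order
        self.commit_mark = 0          # frame count covered by the latest commit

    def commit(self, dirty_pages):
        # Frames are appended first; readers only see them once the mark advances.
        for page_no, image in dirty_pages.items():
            self.frames.append((page_no, image))
        self.commit_mark = len(self.frames)

    def begin_read(self):
        # A reader's snapshot is the commit mark at the moment it starts.
        return self.commit_mark

    def read_page(self, page_no, snapshot):
        # The newest frame for this page at or before the snapshot wins.
        for index in range(snapshot - 1, -1, -1):
            frame_page, image = self.frames[index]
            if frame_page == page_no:
                return image
        return self.main[page_no]


db = WalSnapshotSketch({1: "v1"})
old_reader = db.begin_read()
db.commit({1: "v2"})
new_reader = db.begin_read()
print(db.read_page(1, old_reader), db.read_page(1, new_reader))  # v1 v2
```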
