Source reference. This page preserves the original long-form markdown content that previously lived at docs/roadmap.md. For the shorter curated page, see Roadmap.
CSharpDB Roadmap
This document outlines the planned direction for CSharpDB, organized by timeframe and priority. Items are roughly ordered by expected impact within each tier, and statuses are intended to reflect the current v3.8.0 state of the repo.
Near-Term
Recently completed improvements to query performance, storage/runtime behavior, maintenance workflows, and developer ergonomics.
| Feature | Description | Status |
|---|---|---|
| DISTINCT keyword | Deduplicate rows in SELECT output | Done |
| Composite indexes | Multi-column indexes for covering more query patterns | Done |
| Index range scans | Use indexes for <, >, <=, >=, and BETWEEN, not just equality | Done |
| Prepared statement cache | Cache parsed ASTs and query plans to avoid re-parsing identical SQL | Done |
| Cached max rowid | Avoid repeated O(n) scans when generating row IDs on insert (in-memory + persisted high-water mark) | Done |
| B+tree delete rebalancing | Merge underflowed pages on delete to reclaim space | Done |
| In-memory database mode | Open a database fully in memory, load a disk database into memory, and save a committed snapshot back to disk | Done |
| Shared in-memory ADO.NET mode | Support Data Source=:memory: and named shared in-memory databases with explicit save/load | Done |
| Collection field indexes | Equality-based secondary indexes for Collection<T> via EnsureIndexAsync / FindByIndexAsync | Done |
| Reader session reuse | Reuse snapshot pager and query planner inside ReaderSession for burst concurrent reads | Done |
| Architecture enforcement | CSharpDB.Client is now the main caller-facing interaction layer across local and remote scenarios; ADO.NET now routes ordinary direct and daemon-backed access through that layer, with only named shared in-memory provider state still retaining an internal engine dependency | Done |
| Database administration | Maintenance report, reindex (database/table/index/collection), VACUUM/compact, fragmentation analysis, database size report | Done |
| Dedicated gRPC daemon | CSharpDB.Daemon host plus CSharpDB.Client gRPC coverage for SQL, schema, procedures, collections, and maintenance | Done |
| Storage tuning presets | UseLookupOptimizedPreset() and UseWriteOptimizedPreset() for file-backed workloads | Done |
| Memory-mapped main-file reads | Opt-in mapped clean-page reads plus copy-on-write materialization for mutable access on file-backed databases | Done |
| Background WAL checkpointing | Incremental/sliced auto-checkpointing to move work off the triggering commit | Done |
| SQL executor/read-path fast paths | Compact scan and indexed-range projections, broader join lookup/covered paths, grouped/composite index aggregates, correlated subquery filter fast paths, and lower row materialization overhead | Done |
| Table/index statistics | ANALYZE command with persisted row counts, column NDV/min/max, stale tracking, and initial stats-guided index selection in the query planner | Done |
| Collection binary payloads | Binary direct-payload format with faster hydration, direct field/path extraction, and richer path-based indexing | Done |
| Collection path indexes | Nested scalar, array-element, nested array-object, Guid, temporal, and ordered text path indexes with FindByPathAsync / FindByPathRangeAsync | Done |
| Hybrid storage mode | Lazy-resident durable storage with gRPC tunable file-cache configuration; Admin direct local hosting keeps a warm in-process database instance and uses hybrid incremental-durable options by default | Done |
| Client backup/restore | BackupAsync / RestoreAsync as first-class ICSharpDbClient operations across direct, HTTP, gRPC, CLI, and Admin | Done |
| Native table archives and external tables | Native .csdbtable table snapshots with fast Admin Import / Export, download or server-path destinations, CREATE EXTERNAL TABLE / DROP EXTERNAL TABLE, sys.external_tables, read-only external table scans/joins, and embedded primary-key archive lookup indexes for eligible point reads | Done |
| Older DB foreign-key retrofit migration | Validate/apply maintenance workflow that rewrites existing child tables with persisted FK metadata across direct, HTTP, gRPC, CLI, and Admin | Done |
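The index and DISTINCT rows above can be pictured with a short SQL sketch. The table, column, and index names are hypothetical, and the CREATE INDEX form shown is ordinary SQL that this page does not spell out, so treat the exact syntax CSharpDB accepts as an assumption.

```sql
-- Hypothetical schema; all names are illustrative only.
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    total INTEGER
);

-- Single-column index: a candidate for ordered range scans.
CREATE INDEX idx_orders_total ON orders (total);

-- Composite (multi-column) index covering more query patterns.
CREATE INDEX idx_orders_customer_total ON orders (customer_id, total);

-- Range predicates of the kind an index range scan is meant to serve.
SELECT id, total FROM orders WHERE total BETWEEN 100 AND 500;
SELECT id, total FROM orders WHERE total >= 1000;

-- DISTINCT deduplicates the projected rows.
SELECT DISTINCT customer_id FROM orders;
```

Per the Current Limitations section, ordered range-scan pushdown is currently limited to single-column INTEGER index paths, so only the first index in this sketch would be expected to serve the range predicates.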
Mid-Term
SQL feature parity, provider/tooling compatibility, and ecosystem expansion.
| Feature | Description | Status |
|---|---|---|
| User-defined functions and commands | Done for the trusted in-process model: host-registered C# scalar functions, common SQL/Admin built-ins, trusted commands, Admin Forms/Reports/pipeline hooks, declarative Admin Forms action sequences, and local Admin Forms C# code modules are implemented across the supported surfaces. Untrusted sandboxed UDF execution is intentionally out of scope | Done |
| Writable external tables | Planned opt-in writable external table registrations over mutable .csdbx files, backed by CSharpDB B+tree storage and limited to DML (INSERT, UPDATE, DELETE) in v1 while .csdbtable archives remain read-only | Planned |
| Subqueries | Scalar subqueries, IN (SELECT ...), EXISTS (SELECT ...), including correlated evaluation in WHERE, non-aggregate projection, and UPDATE/DELETE expressions | Done |
| UNION / INTERSECT / EXCEPT | Set operations across SELECT results, including use in top-level queries, views, and CTE query bodies | Done |
| Window functions | ROW_NUMBER(), RANK(), DENSE_RANK(), LEAD(), LAG() | Planned |
| DEFAULT column values | Allow default expressions in column definitions | Planned |
| CHECK constraints | Arbitrary expression-based constraints per column or per table | Planned |
| Foreign key constraints | v1 support for single-column, column-level REFERENCES with optional ON DELETE CASCADE, plus sys.foreign_keys and metadata/tooling surfaces | Done |
| Remote host consolidation | CSharpDB.Daemon now hosts the existing REST/HTTP /api surface and gRPC from one long-running process backed by the same warm daemon-hosted client; standalone CSharpDB.Api remains supported for REST-only hosting | Done |
| Remote API-key protection | Opt-in API-key mode protects REST /api/* and daemon gRPC calls with constant-time key comparison while keeping default no-auth behavior for compatibility | Done |
| Remote host security hardening | Add authorization, protected admin endpoint scopes, JWT/RBAC options, and TLS/mTLS deployment helpers for remote HTTP and gRPC access | Planned |
| Daemon service packaging | Package the existing CSharpDB.Daemon host as a persistent background service across systemd, Windows Service, and launchd | Done |
| Cross-platform deployment | Self-contained daemon archives and install scripts ship for Windows, Linux, and macOS; dotnet tool, Docker, Homebrew, and winget distribution remain future work | In Progress |
| NuGet package | Publish and maintain CSharpDB.Engine, CSharpDB.Data, CSharpDB.Client, and CSharpDB.Primitives as the primary NuGet packages | Done |
| Connection pooling | Pool underlying direct embedded sessions behind CSharpDbConnection to amortize open/close cost | Done |
| Admin dashboard improvements | Richer SQL editor UX, query history, deeper diagnostics, and integrated Forms/Reports tooling beyond the core schema/procedure/storage surface | Done |
| Admin Forms Access parity | Close the highest-impact Access-style form gaps: runtime responsive layouts, full inferred validation enforcement, richer record-source/filter/sort models, Layout View, form modes, broader action/event coverage, and broader control coverage; trusted command-backed form lifecycle events, command buttons, and selected control events are now started | Partial |
| Admin Reports Access parity | Close the highest-impact Access-style report gaps: bounded saved-query previews, full report rendering/export, parameter/filter prompts, richer grouping/totals options, Layout View, conditional formatting, subreports, and broader report controls; trusted command-backed report preview lifecycle events are now started | Partial |
| Visual query designer | Classic Admin query builder with source canvas, join editing, design grid, SQL preview, and saved designer layouts | Done |
| ETL pipelines | Built-in package-driven pipeline runtime with validation, dry-run, execute/resume flows, API/CLI/client coverage, run history, and Admin visual designer support | Done |
| VS Code extension | Schema explorer, SQL editor with IntelliSense, data browser, table designer, storage diagnostics | Done |
| ADO.NET GetSchema collections | Implement DbConnection.GetSchema() for standard metadata collections (MetaDataCollections, Tables, Columns, Indexes, Views, ForeignKeys) to support ORMs and tooling that discover schema through ADO.NET | Done |
| Multilingual text support | BINARY, NOCASE, NOCASE_AI, and built-in ICU:<locale> collation now work across SQL schema/query semantics, metadata surfaces, and collection path indexes; dedicated ordered SQL text index optimization remains planned | Done |
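As a rough illustration of the v1 foreign-key shape described in the table above, the sketch below declares a single-column, column-level REFERENCES clause with ON DELETE CASCADE. The table and column names are hypothetical, and because this page does not list the columns of sys.foreign_keys, the catalog query simply selects everything.

```sql
-- Hypothetical parent/child tables; all names are illustrative only.
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    -- v1 shape: single-column, column-level REFERENCES with optional cascade.
    customer_id INTEGER REFERENCES customers (id) ON DELETE CASCADE
);

-- With ON DELETE CASCADE, deleting a parent row also removes its child rows.
DELETE FROM customers WHERE id = 1;

-- Foreign-key metadata is exposed through the sys.foreign_keys catalog.
SELECT * FROM sys.foreign_keys;
```

Table-level, composite, and deferred foreign keys and ON UPDATE actions are listed under Current Limitations as not implemented, so the sketch deliberately avoids them, along with DEFAULT and CHECK, which are still planned.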
Long-Term
Advanced features and fundamental architecture enhancements.
| Feature | Description | Status |
|---|---|---|
| Full-text search | Inverted index with tokenization, stemming, and relevance ranking | Done |
| JSON path querying | Query into JSON document fields in the Collection API (e.g., $.address.city) via FindByPathAsync / FindByPathRangeAsync | Done |
| Advanced collection storage path | Binary direct-payload format with direct binary hydration, path-based field extraction, and richer expression/path indexes | Done |
| SQL batched row transport | Internal row-batch transport serves as the batch-first SQL execution foundation across batch-capable result boundaries, scans, joins, and generic aggregates | Done |
| External table index coverage | Follow writable .csdbx storage with broader external-table indexes, planner costing, and multi-column lookup/range support beyond the current archive primary-key point-lookup path | Planned |
| Source-generated collection fast path | Done for the current phase: opt-in generated collection models now provide GetGeneratedCollectionAsync<T>(...), generated field descriptors/index bindings, analyzer-packaged collection model/codecs, generated binary direct-payload encode/decode for supported document graphs, source-generated JSON fallback for unsupported shapes, trim/NativeAOT smoke coverage, and a dedicated sample | Done |
| Generated collection package ergonomics | Streamline NuGet/analyzer packaging, templates, onboarding docs, and generated-collection setup so consumers can adopt the opt-in path with less project wiring | Planned |
| Broader generated collection model coverage | Expand generator support beyond the current scalar, scalar collection, nested scalar, and nested collection-scalar shapes; unsupported shapes currently warn and fall back to source-generated JSON instead of binary direct payloads | Planned |
| Page-level compression | Deep engine/page compression remains planned; application-level payload compression is available as a sample/SDK pattern without changing the storage format | Planned |
| At-rest encryption | Encrypt database and WAL files with passphrase-based key management and explicit plaintext/encrypted migration/export paths; implementation must meet the database-encryption plan entry criteria before shipping | Research |
| Advanced cost-based query optimizer | Done for the current phase: ANALYZE-driven stats-guided costing now uses internal equi-depth histograms, heavy hitters, composite-index prefix distinct-count summaries, skew-aware lookup/filter estimates, correlation-aware composite equality filters/joins, non-unique lookup costing, hash build-side choice, and bounded DP reordering for small inner-join chains | Done |
| Adaptive query re-optimization | Done for the current phase: opt-in adaptive join execution can switch eligible index nested-loop joins to hash joins and can flip inner hash build sides at safe pre-emission boundaries when observed rows diverge materially from estimates. Broader EXPLAIN ANALYZE, runtime actual-row reporting, adaptive stats persistence, and arbitrary mid-plan reordering remain future work | Done |
| Public planner histogram inspection | Stable SQL-first diagnostics now expose sys.planner_histograms, sys.planner_heavy_hitters, sys.planner_index_prefix_stats, and EXPLAIN ESTIMATE FOR <query> while keeping raw histogram/prefix storage payloads internal | Done |
| Async I/O batching | Done for the current phase: WAL frame-chunk writes, chunked checkpoint page copies, shared snapshot/export batching, reusable B-tree copy utilities, and the close-out audit now cover the main storage and maintenance write paths; future work is limited to specialized diagnostics or maintenance-path tuning | Done |
| Low-latency durable writes | Done in v2.9.0: advisory planner-stat persistence can stay deferred without weakening committed-row durability, and sys.table_stats.row_count_is_exact now makes exact versus estimated row-count semantics explicit to planner and COUNT(*) fast paths | Done |
| Group commit / deferred WAL flush | Done in v2.9.0: opt-in UseDurableCommitBatchWindow(...) batches durable WAL flushes across contending in-process transactions and remains an expert measure-first knob rather than default behavior | Done |
| Initial multi-writer support | Explicit WriteTransaction conflict-detected retry flow, shared auto-commit non-insert isolation, and opt-in ConcurrentWriteTransactions for shared implicit inserts | Done |
| Broader multi-writer insert optimization | Opt-in ConcurrentWriteTransactions now reserves shared row-id ranges and rebases hot right-edge insert pages against pending WAL images, improving concurrent one-row auto-ID and explicit-ID insert fan-in while keeping serialized inserts as the default | Done |
| API-level sharding | Route API/daemon requests across multiple warm CSharpDB database files so independent tenants or shard keys can use separate WAL and commit paths, with v1 focused on single-shard writes and point reads | Research |
| Replication / change feed | Retained commit-log change feeds and reactive query subscriptions for read replicas, live Admin views, and event-driven applications | Research |
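The planner diagnostics named in the table above can be combined into a short inspection session. This is a sketch only: the bare ANALYZE form and the orders table are assumptions, and because this page does not document the catalog columns (other than sys.table_stats.row_count_is_exact), each catalog query selects everything.

```sql
-- Collect statistics first; the bare ANALYZE form is an assumption.
ANALYZE;

-- Persisted table statistics, including the row_count_is_exact flag.
SELECT * FROM sys.table_stats;

-- Public planner-statistics catalogs.
SELECT * FROM sys.planner_histograms;
SELECT * FROM sys.planner_heavy_hitters;
SELECT * FROM sys.planner_index_prefix_stats;

-- Ask the planner for its estimates for a query
-- (orders is a hypothetical table).
EXPLAIN ESTIMATE FOR
SELECT customer_id FROM orders WHERE total BETWEEN 100 AND 500;
```

These surfaces report estimates, not observed row counts. EXPLAIN ANALYZE and runtime actual-row reporting are listed above as future work.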
Current Limitations
These are known simplifications in the current implementation:
| Area | Limitation |
|---|---|
| Functions and automation | CSharpDB's UDF/command model is trusted and in-process by design. Current supported surfaces include host-registered scalar functions, common built-ins, trusted commands, form/report/pipeline hooks, declarative action sequences, and local Admin Forms C# modules; untrusted sandboxed execution is intentionally out of scope |
| Query | Scalar/IN/EXISTS subqueries are supported, including correlated cases in WHERE, non-aggregate projection, and UPDATE/DELETE expressions; correlated subqueries are not yet supported in JOIN ON, GROUP BY, HAVING, ORDER BY, or aggregate projections |
| Query | UNION, INTERSECT, and EXCEPT are supported; UNION ALL is not implemented yet |
| Query | No window functions |
| Schema | No SQL DEFAULT column values or CHECK constraints yet. Foreign keys are currently v1 only: single-column, column-level REFERENCES with optional ON DELETE CASCADE; table-level/composite/deferred foreign keys and ON UPDATE actions are not implemented |
| Indexes | Equality lookups support current INTEGER/TEXT indexes, but ordered range-scan pushdown is still limited to single-column INTEGER index paths |
| RowId | Legacy table schemas without persisted high-water metadata may pay a one-time key scan on first insert |
| Collections | FindByIndexAsync supports declared field-equality lookups; FindByPathAsync and FindByPathRangeAsync support path-based queries on indexed paths; FindAsync remains a full scan for unindexed predicates. Generated collections require registered descriptors for existing collection indexes; unsupported generated model shapes warn and use the source-generated JSON fallback instead of binary direct payloads |
| External Tables | Native .csdbtable archives can be registered and queried as read-only external tables. Writable external tables are planned as an opt-in .csdbx format; current archives remain read-only, and broader external indexes, range seeks, and deeper planner costing remain planned |
| Networking | CSharpDB.Daemon now hosts both REST and gRPC from one process; named pipes remain reserved but are not implemented end to end today |
| Security | Remote REST and daemon gRPC support opt-in API-key authentication, defaulting to None for backward compatibility. JWT, RBAC, mTLS helpers, TLS-specific configuration, and at-rest encryption are not implemented |
| Admin Forms | The Forms designer/runtime supports the core generated-form and data-entry path plus trusted command-backed automation, including lifecycle events, command buttons, selected-control events, conditional UI rules, domain formula helpers, and declarative action sequences for current record, form navigation, filtering, SQL/procedure, and control-property workflows. It still needs Access-parity work for responsive runtime rendering, complete inferred validation, richer form modes, additional events, advanced filtering/sorting, report/query/import/export actions, macro loops/on-error/temp vars, and broader controls |
| Admin Reports | The Reports designer/runtime supports the core banded preview path plus trusted command-backed preview lifecycle events, but still needs Access-parity work for bounded saved-query previews, full report output/export, parameters, richer grouping and totals semantics, conditional formatting, subreports, and broader controls |
| Text / Multilingual | Text is stored as UTF-8 and supports all Unicode languages; default semantics remain ordinal, but opt-in BINARY, NOCASE, NOCASE_AI, and ICU:<locale> collation are implemented for SQL and collection indexes. Dedicated ordered SQL text index optimization remains planned |
| Concurrency | The physical WAL commit path is still serialized at the storage boundary. Initial multi-writer support is shipped, but observed gains still depend on conflict shape and whether shared auto-commit INSERT is left on the default serialized path |
| Storage | No page-level compression; the compression SDK sample stores compressed payloads as ordinary application-managed BLOB values |
| Storage | No at-rest encryption for database/WAL files; on-disk storage is plaintext only |
| Storage | Memory-mapped reads are opt-in and currently apply only to clean main-file pages; WAL-backed reads still rely on the WAL/cache path |
| Storage | By default, durable auto-commit single-row writes still pay a physical WAL flush per commit; opt-in UseDurableCommitBatchWindow(...) can trade some commit latency for higher throughput across contending in-process writers, but default behavior remains per-commit durable |
| Query | Phase-2 cost-based planning is in place: ANALYZE, sys.table_stats, sys.column_stats, public planner-stat diagnostics, histogram/heavy-hitter/prefix estimates, and bounded small-chain join reordering now feed join/access-path costing. Opt-in adaptive join re-optimization can react to stale-stat or parameter-sensitive join cardinality misses, while broader runtime actuals, EXPLAIN ANALYZE, and full mid-plan reordering remain future work |
| Query | Internal row-batch transport is now the default scan-heavy execution foundation across batch-capable scans, joins, aggregates, and result boundaries; remaining work is broader kernel specialization and optional SIMD-style tuning rather than missing core batch coverage |
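To make the query limitations above concrete, the sketch below shows subquery and set-operation forms that the table lists as supported. The customers and orders tables are hypothetical, and whether the engine accepts every detail of these statements as written is an assumption.

```sql
-- Hypothetical tables: customers(id, name), orders(id, customer_id, total).

-- Correlated EXISTS in WHERE (listed as supported).
SELECT customers.id, customers.name
FROM customers
WHERE EXISTS (
    SELECT 1 FROM orders WHERE orders.customer_id = customers.id
);

-- IN (SELECT ...) in WHERE (listed as supported).
SELECT id
FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE name = 'Ada');

-- UNION deduplicates the combined rows; UNION ALL is not implemented yet.
SELECT customer_id FROM orders WHERE total >= 1000
UNION
SELECT id FROM customers WHERE name = 'Ada';
```

Moving a correlated subquery into JOIN ON, GROUP BY, HAVING, ORDER BY, or an aggregate projection falls outside the supported set described above.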
Completed Milestones
Major features already implemented:
- Single-file database with 4 KB page-oriented storage
- B+tree-backed tables and secondary indexes
- Write-Ahead Log with crash recovery and auto-checkpoint
- Concurrent snapshot-isolated readers via WAL-based MVCC
- Full SQL pipeline: tokenizer, parser, query planner, operator tree
- JOINs (INNER, LEFT, RIGHT, CROSS), aggregates, GROUP BY, HAVING, CTEs
- Set operations: UNION, INTERSECT, EXCEPT
- SELECT DISTINCT and DISTINCT aggregates
- Scalar subqueries, IN (SELECT ...), and EXISTS (SELECT ...), including correlated evaluation in filters, non-aggregate projections, and UPDATE/DELETE expressions
- Scalar TEXT(expr) for filter-friendly text coercion
- Composite (multi-column) indexes
- Ordered integer index range scans (<, <=, >, >=, BETWEEN) in the fast lookup path
- ANALYZE, persisted sys.table_stats/sys.column_stats, and stale-aware column-stat refresh
- Phase-2 cost-based query planning: statistics-guided access-path selection, join method choice, hash build-side choice, histogram/heavy-hitter/cardinality estimation, composite-prefix correlation modeling, and bounded small-chain inner-join reordering
- Public planner diagnostics through EXPLAIN ESTIMATE FOR SELECT, WITH, compound queries, and sys.planner_* virtual catalogs
- Opt-in adaptive join re-optimization behind DatabaseOptions.AdaptiveQueryReoptimization
- SQL statement and SELECT plan caching
- First-class IDENTITY/AUTOINCREMENT support for INTEGER PRIMARY KEY columns
- Persisted table NextRowId high-water mark with compatibility fallback for legacy metadata
- Views and triggers (BEFORE/AFTER on INSERT/UPDATE/DELETE)
- Foreign key constraints: single-column, column-level REFERENCES with optional ON DELETE CASCADE
- Older-database foreign-key retrofit migration across direct, HTTP, gRPC, CLI, and Admin
- ADO.NET provider (DbConnection, DbCommand, DbDataReader, DbTransaction)
- ADO.NET GetSchema() metadata collections for MetaDataCollections, Tables, Columns, Indexes, Views, and ForeignKeys
- ADO.NET connection pooling with ClearPool/ClearAllPools
- In-memory database mode with explicit load-from-disk and save-to-disk APIs
- Shared/private in-memory ADO.NET connections with named shared-memory hosts
- Document Collection API (NoSQL) with typed Put/Get/Delete/Scan/Find
- Collection UTF-8 payload fast path with compatibility for legacy backing rows
- Collection secondary field indexes via EnsureIndexAsync/FindByIndexAsync
- Maintenance report, REINDEX, and VACUUM flows across client, CLI, API, and Admin UI
- Dedicated CSharpDB.Daemon gRPC host for remote CSharpDB.Client access
- Remote host consolidation in CSharpDB.Daemon, with REST /api and gRPC sharing the same warm hosted database client
- Opt-in API-key protection for REST /api/* and daemon gRPC access
- Storage tuning presets, bounded WAL read caching, memory-mapped main-file reads, and sliced background WAL auto-checkpointing
- SQL executor/read-path fast paths for compact projections, broader join/index coverage, grouped aggregates, and correlated subquery filters
- Batch-first SQL row-batch execution foundation with batch-aware scan/index/join roots, shared predicate/projection kernels, and batch-native generic aggregate paths
- Interactive CLI with meta-commands and file execution
- REST API with 34 endpoints and OpenAPI/Scalar documentation
- Blazor Server admin dashboard
- Integrated Admin Forms and Reports designers with runtime preview/entry, database-backed metadata persistence, and print-ready report output
- Trusted C# callbacks, commands, Admin automation hooks, and local Admin Forms C# code modules
- B+tree delete rebalancing with underflow handling (borrow/merge + interior collapse path)
- Reusable snapshot reader sessions for higher concurrent-read throughput
- Comprehensive benchmark suite (micro, macro, stress, scaling, in-memory, shared-memory)
- Binary direct-payload collection storage with direct hydration and field/path extraction
- Collection path indexes: nested scalar, array-element, nested array-object, Guid, temporal, ordered text
- Collection path query APIs: FindByPathAsync and FindByPathRangeAsync
- Source-generated typed collection fast path: generated collection models/codecs/field descriptors, generated binary direct payloads for supported shapes, trim-safe GetGeneratedCollectionAsync<T>(...), generator diagnostics, NativeAOT trim-smoke validation, and a dedicated sample
- Full-text search with tokenization, stemming, and relevance ranking
- Hybrid storage mode with lazy-resident durable storage and gRPC tunable file-cache
- Client-wide BackupAsync/RestoreAsync across direct, HTTP, gRPC, CLI, and Admin
- Native .csdbtable table archives with Admin Import / Export, read-only external table registration, sys.external_tables, external table scans/joins, and embedded archive primary-key lookup indexes
- ReplaceAsync for index stores
- Package-driven ETL pipelines with validation, dry-run, execute/resume, persisted run history, and Admin visual designer support
See Also
- Architecture Guide — How the engine is structured
- Internals & Contributing — How to extend the engine
- Deployment & Installation Plan — Cross-platform distribution via dotnet tool, Docker, Homebrew, winget, and install scripts
- Multi-Writer Follow-Up Plan — Post-initial multi-writer roadmap, insert-path gaps, and release criteria for broader completion
- API-Level Sharding Plan — API/daemon-level routing across multiple database files for write-throughput scaling
- Query And Durable Write Performance Plan — Combined optimizer phase-2 plus durable-write completion plan, shipped state, and remaining benchmark/future-work boundaries
- Multilingual Text Support Plan — Build on existing Unicode text storage with case-insensitive matching, locale-aware sorting, and COLLATE clause support for queries and index definitions
- Database Encryption Plan — Encrypted storage format, key management, migration, and managed-surface rollout
- Storage Engine Guide — CSharpDB.Storage API reference: device, pager, B+tree, WAL, indexing, serialization, and catalog
- Native FFI Tutorials — Python and Node.js examples using the NativeAOT shared library
- User-Defined Functions Plan — Trusted C# functions, commands, form/report/pipeline hooks, and code modules
- Writable External Tables Plan — Opt-in writable external tables backed by mutable B+tree external files
- Pub/Sub Change Events Plan — Engine-level change events with channel-based delivery for real-time data subscriptions
- Benchmark Suite — Performance data informing optimization priorities
- Native Table Archives Blog — v3.8 table archive, external table, and Admin Import / Export overview