Source reference. This page preserves the original long-form markdown content that previously lived at docs/roadmap.md. For the shorter curated page, see Roadmap.

CSharpDB Roadmap

This document outlines the planned direction for CSharpDB, organized by timeframe and priority. Items are roughly ordered by expected impact within each tier, and statuses are intended to reflect the current v3.8.0 state of the repo.


Near-Term

Recently completed improvements to query performance, storage/runtime behavior, maintenance workflows, and developer ergonomics.

Feature | Description | Status
DISTINCT keyword | Deduplicate rows in SELECT output | Done
Composite indexes | Multi-column indexes for covering more query patterns | Done
Index range scans | Use indexes for <, >, <=, >=, and BETWEEN, not just equality | Done
Prepared statement cache | Cache parsed ASTs and query plans to avoid re-parsing identical SQL | Done
Cached max rowid | Avoid repeated O(n) scans when generating row IDs on insert (in-memory + persisted high-water mark) | Done
B+tree delete rebalancing | Merge underflowed pages on delete to reclaim space | Done
In-memory database mode | Open a database fully in memory, load a disk database into memory, and save a committed snapshot back to disk | Done
Shared in-memory ADO.NET mode | Support Data Source=:memory: and named shared in-memory databases with explicit save/load | Done
Collection field indexes | Equality-based secondary indexes for Collection<T> via EnsureIndexAsync / FindByIndexAsync | Done
Reader session reuse | Reuse snapshot pager and query planner inside ReaderSession for burst concurrent reads | Done
Architecture enforcement | CSharpDB.Client is now the main caller-facing interaction layer across local and remote scenarios; ADO.NET routes ordinary direct and daemon-backed access through that layer, and only named shared in-memory provider state still retains an internal engine dependency | Done
Database administration | Maintenance report, reindex (database/table/index/collection), VACUUM/compact, fragmentation analysis, database size report | Done
Dedicated gRPC daemon | CSharpDB.Daemon host plus CSharpDB.Client gRPC coverage for SQL, schema, procedures, collections, and maintenance | Done
Storage tuning presets | UseLookupOptimizedPreset() and UseWriteOptimizedPreset() for file-backed workloads | Done
Memory-mapped main-file reads | Opt-in mapped clean-page reads plus copy-on-write materialization for mutable access on file-backed databases | Done
Background WAL checkpointing | Incremental/sliced auto-checkpointing to move work off the triggering commit | Done
SQL executor/read-path fast paths | Compact scan and indexed-range projections, broader join lookup/covered paths, grouped/composite index aggregates, correlated subquery filter fast paths, and lower row materialization overhead | Done
Table/index statistics | ANALYZE command with persisted row counts, column NDV/min/max, stale tracking, and initial stats-guided index selection in the query planner | Done
Collection binary payloads | Binary direct-payload format with faster hydration, direct field/path extraction, and richer path-based indexing | Done
Collection path indexes | Nested scalar, array-element, nested array-object, Guid, temporal, and ordered text path indexes with FindByPathAsync / FindByPathRangeAsync | Done
Hybrid storage mode | Lazy-resident durable storage with gRPC-tunable file-cache configuration; Admin direct local hosting keeps a warm in-process database instance and uses hybrid incremental-durable options by default | Done
Client backup/restore | BackupAsync / RestoreAsync as first-class ICSharpDbClient operations across direct, HTTP, gRPC, CLI, and Admin | Done
Native table archives and external tables | Native .csdbtable table snapshots with fast Admin Import / Export, download or server-path destinations, CREATE EXTERNAL TABLE / DROP EXTERNAL TABLE, sys.external_tables, read-only external table scans/joins, and embedded primary-key archive lookup indexes for eligible point reads | Done
Older DB foreign-key retrofit migration | Validate/apply maintenance workflow that rewrites existing child tables with persisted FK metadata across direct, HTTP, gRPC, CLI, and Admin | Done
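The "Cached max rowid" item replaces a per-insert max(rowid) scan with a high-water mark. Legacy schemas that lack the persisted mark fall back to a single scan. The minimal Python sketch below illustrates that idea only. It is not CSharpDB's C# implementation, and the class and method names (RowIdAllocator, next_row_id, observe_explicit) are hypothetical.

```python
from typing import Callable, Optional


class RowIdAllocator:
    """Hypothetical sketch: allocate row IDs from a cached high-water mark.

    Not CSharpDB's implementation; all names here are illustrative.
    """

    def __init__(self, persisted_next_row_id: Optional[int],
                 scan_max_row_id: Callable[[], Optional[int]]) -> None:
        # persisted_next_row_id comes from table metadata; None models a
        # legacy schema that never stored a high-water mark.
        self._next = persisted_next_row_id
        # Fallback O(n) scan over existing keys, used at most once.
        self._scan_max_row_id = scan_max_row_id
        self.scan_count = 0

    def _ensure_loaded(self) -> None:
        if self._next is None:
            # One-time cost for legacy metadata; the result is then cached.
            self.scan_count += 1
            current_max = self._scan_max_row_id()
            self._next = 1 if current_max is None else current_max + 1

    def next_row_id(self) -> int:
        self._ensure_loaded()
        row_id = self._next
        self._next += 1
        return row_id

    def observe_explicit(self, row_id: int) -> None:
        # An explicitly supplied key above the mark must advance the mark,
        # otherwise a later generated ID could collide with it.
        self._ensure_loaded()
        if row_id >= self._next:
            self._next = row_id + 1

    def high_water_mark(self) -> int:
        # The value that would be persisted with table metadata on commit.
        self._ensure_loaded()
        return self._next
```

Once the mark is persisted, later opens start from the stored value and skip the scan. That matches the RowId note under Current Limitations, where only legacy schemas pay the one-time scan.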

Mid-Term

SQL feature parity, provider/tooling compatibility, and ecosystem expansion.

Feature | Description | Status
User-defined functions and commands | Done for the trusted in-process model: host-registered C# scalar functions, common SQL/Admin built-ins, trusted commands, Admin Forms/Reports/pipeline hooks, declarative Admin Forms action sequences, and local Admin Forms C# code modules are implemented across the supported surfaces. Untrusted sandboxed UDF execution is intentionally out of scope | Done
Writable external tables | Planned opt-in writable external table registrations over mutable .csdbx files, backed by CSharpDB B+tree storage and limited to DML (INSERT, UPDATE, DELETE) in v1, while .csdbtable archives remain read-only | Planned
Subqueries | Scalar subqueries, IN (SELECT ...), and EXISTS (SELECT ...), including correlated evaluation in WHERE, non-aggregate projection, and UPDATE/DELETE expressions | Done
UNION / INTERSECT / EXCEPT | Set operations across SELECT results, including use in top-level queries, views, and CTE query bodies | Done
Window functions | ROW_NUMBER(), RANK(), DENSE_RANK(), LEAD(), LAG() | Planned
DEFAULT column values | Allow default expressions in column definitions | Planned
CHECK constraints | Arbitrary expression-based constraints per column or per table | Planned
Foreign key constraints | v1 support for single-column, column-level REFERENCES with optional ON DELETE CASCADE, plus sys.foreign_keys and metadata/tooling surfaces | Done
Remote host consolidation | CSharpDB.Daemon now hosts the existing REST/HTTP /api surface and gRPC from one long-running process backed by the same warm daemon-hosted client; standalone CSharpDB.Api remains supported for REST-only hosting | Done
Remote API-key protection | Opt-in API-key mode protects REST /api/* and daemon gRPC calls with constant-time key comparison while keeping the default no-auth behavior for compatibility | Done
Remote host security hardening | Add authorization, protected admin endpoint scopes, JWT/RBAC options, and TLS/mTLS deployment helpers for remote HTTP and gRPC access | Planned
Daemon service packaging | Package the existing CSharpDB.Daemon host as a persistent background service across systemd, Windows Service, and launchd | Done
Cross-platform deployment | Self-contained daemon archives and install scripts ship for Windows, Linux, and macOS; dotnet tool, Docker, Homebrew, and winget distribution remain future work | In Progress
NuGet package | Publish and maintain CSharpDB.Engine, CSharpDB.Data, CSharpDB.Client, and CSharpDB.Primitives as the primary NuGet packages | Done
Connection pooling | Pool underlying direct embedded sessions behind CSharpDbConnection to amortize open/close cost | Done
Admin dashboard improvements | Richer SQL editor UX, query history, deeper diagnostics, and integrated Forms/Reports tooling beyond the core schema/procedure/storage surface | Done
Admin Forms Access parity | Close the highest-impact Access-style form gaps: runtime responsive layouts, full inferred validation enforcement, richer record-source/filter/sort models, Layout View, form modes, broader action/event coverage, and broader control coverage; work has started on trusted command-backed form lifecycle events, command buttons, and selected control events | Partial
Admin Reports Access parity | Close the highest-impact Access-style report gaps: bounded saved-query previews, full report rendering/export, parameter/filter prompts, richer grouping/totals options, Layout View, conditional formatting, subreports, and broader report controls; work has started on trusted command-backed report preview lifecycle events | Partial
Visual query designer | Classic Admin query builder with source canvas, join editing, design grid, SQL preview, and saved designer layouts | Done
ETL pipelines | Built-in package-driven pipeline runtime with validation, dry-run, execute/resume flows, API/CLI/client coverage, run history, and Admin visual designer support | Done
VS Code extension | Schema explorer, SQL editor with IntelliSense, data browser, table designer, storage diagnostics | Done
ADO.NET GetSchema collections | Implement DbConnection.GetSchema() for standard metadata collections (MetaDataCollections, Tables, Columns, Indexes, Views, ForeignKeys) to support ORMs and tooling that discover schema through ADO.NET | Done
Multilingual text support | BINARY, NOCASE, NOCASE_AI, and built-in ICU:<locale> collation now work across SQL schema/query semantics, metadata surfaces, and collection path indexes; dedicated ordered SQL text index optimization remains planned | Done
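The "Remote API-key protection" item combines two behaviors. With no key configured, requests are allowed for compatibility. With a key configured, the presented key is checked using a constant-time comparison. The Python sketch below shows that pattern with the standard library's hmac.compare_digest. The header name X-Api-Key and the function is_authorized are assumptions for illustration, not CSharpDB's configuration or API.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], configured_key: Optional[str],
                  header_name: str = "X-Api-Key") -> bool:
    """Hypothetical request check; header_name is an illustrative assumption."""
    # No key configured: default no-auth mode, every request is allowed.
    if configured_key is None:
        return True
    presented = headers.get(header_name)
    if presented is None:
        return False
    # compare_digest's running time does not depend on where the first
    # mismatching byte is, so timing does not reveal a partially correct key.
    return hmac.compare_digest(presented.encode("utf-8"),
                               configured_key.encode("utf-8"))
```

A plain == comparison can return as soon as the first byte differs. That is why the constant-time variant is the usual choice for secrets.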

Long-Term

Advanced features and fundamental architecture enhancements.

Feature | Description | Status
Full-text search | Inverted index with tokenization, stemming, and relevance ranking | Done
JSON path querying | Query into JSON document fields in the Collection API (e.g., $.address.city) via FindByPathAsync / FindByPathRangeAsync | Done
Advanced collection storage path | Binary direct-payload format with direct binary hydration, path-based field extraction, and richer expression/path indexes | Done
SQL batched row transport | Internal row-batch transport serves as the batch-first SQL execution foundation across batch-capable result boundaries, scans, joins, and generic aggregates | Done
External table index coverage | Once writable .csdbx storage lands, add broader external-table indexes, planner costing, and multi-column lookup/range support beyond the current archive primary-key point-lookup path | Planned
Source-generated collection fast path | Done for the current phase: opt-in generated collection models now provide GetGeneratedCollectionAsync<T>(...), generated field descriptors/index bindings, analyzer-packaged collection model/codecs, generated binary direct-payload encode/decode for supported document graphs, source-generated JSON fallback for unsupported shapes, trim/NativeAOT smoke coverage, and a dedicated sample | Done
Generated collection package ergonomics | Streamline NuGet/analyzer packaging, templates, onboarding docs, and generated-collection setup so consumers can adopt the opt-in path with less project wiring | Planned
Broader generated collection model coverage | Expand generator support beyond the current scalar, scalar collection, nested scalar, and nested collection-scalar shapes; unsupported shapes currently warn and fall back to source-generated JSON instead of binary direct payloads | Planned
Page-level compression | Deep engine/page compression remains planned; application-level payload compression is available as a sample/SDK pattern without changing the storage format | Planned
At-rest encryption | Encrypt database and WAL files with passphrase-based key management and explicit plaintext/encrypted migration/export paths; implementation must meet the database-encryption plan entry criteria before shipping | Research
Advanced cost-based query optimizer | Done for the current phase: ANALYZE-driven stats-guided costing now uses internal equi-depth histograms, heavy hitters, composite-index prefix distinct-count summaries, skew-aware lookup/filter estimates, correlation-aware composite equality filters/joins, non-unique lookup costing, hash build-side choice, and bounded DP reordering for small inner-join chains | Done
Adaptive query re-optimization | Done for the current phase: opt-in adaptive join execution can switch eligible index nested-loop joins to hash joins and can flip inner hash build sides at safe pre-emission boundaries when observed rows diverge materially from estimates. Broader EXPLAIN ANALYZE, runtime actual-row reporting, adaptive stats persistence, and arbitrary mid-plan reordering remain future work | Done
Public planner histogram inspection | Stable SQL-first diagnostics now expose sys.planner_histograms, sys.planner_heavy_hitters, sys.planner_index_prefix_stats, and EXPLAIN ESTIMATE FOR <query> while keeping raw histogram/prefix storage payloads internal | Done
Async I/O batching | Done for the current phase: WAL frame-chunk writes, chunked checkpoint page copies, shared snapshot/export batching, reusable B-tree copy utilities, and the close-out audit now cover the main storage and maintenance write paths; future work is limited to specialized diagnostics or maintenance-path tuning | Done
Low-latency durable writes | Done in v2.9.0: advisory planner-stat persistence can stay deferred without weakening committed-row durability, and sys.table_stats.row_count_is_exact now makes exact versus estimated row-count semantics explicit to planner and COUNT(*) fast paths | Done
Group commit / deferred WAL flush | Done in v2.9.0: opt-in UseDurableCommitBatchWindow(...) batches durable WAL flushes across contending in-process transactions and remains an expert measure-first knob rather than default behavior | Done
Initial multi-writer support | Explicit WriteTransaction conflict-detected retry flow, shared auto-commit non-insert isolation, and opt-in ConcurrentWriteTransactions for shared implicit inserts | Done
Broader multi-writer insert optimization | Opt-in ConcurrentWriteTransactions now reserves shared row-id ranges and rebases hot right-edge insert pages against pending WAL images, improving concurrent one-row auto-ID and explicit-ID insert fan-in while keeping serialized inserts as the default | Done
API-level sharding | Route API/daemon requests across multiple warm CSharpDB database files so independent tenants or shard keys can use separate WAL and commit paths, with v1 focused on single-shard writes and point reads | Research
Replication / change feed | Retained commit-log change feeds and reactive query subscriptions for read replicas, live Admin views, and event-driven applications | Research
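The cost-based optimizer item mentions equi-depth histograms. The Python sketch below illustrates the generic technique and does not reflect CSharpDB's internal histogram format, which the roadmap says stays internal. It places bucket boundaries at quantiles so each bucket holds roughly the same number of rows. It then estimates the selectivity of `col <= x` by counting whole buckets and interpolating linearly inside the bucket that contains x. The function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def build_equi_depth(values: Sequence[float], buckets: int) -> List[float]:
    """Return buckets + 1 boundaries at evenly spaced quantiles of the data."""
    data = sorted(values)
    n = len(data)
    # Adjacent boundaries enclose roughly n / buckets rows each.
    return [data[(j * (n - 1)) // buckets] for j in range(buckets + 1)]


def estimate_le(bounds: List[float], x: float) -> float:
    """Estimated fraction of rows with value <= x."""
    k = len(bounds) - 1
    if x < bounds[0]:
        return 0.0
    if x >= bounds[-1]:
        return 1.0
    # First boundary strictly greater than x; x lies in bucket i (1-based).
    i = bisect_right(bounds, x)
    lo, hi = bounds[i - 1], bounds[i]
    # Assume values are spread uniformly inside a single bucket.
    within = (x - lo) / (hi - lo)
    return ((i - 1) + within) / k
```

On uniform data this estimate tracks the true fraction closely. A single dominant value, however, can span several buckets and be misestimated. That is one reason a planner may keep a separate heavy-hitter (most-common-value) list alongside the histogram, as the roadmap item describes.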

Current Limitations

These are known simplifications in the current implementation:

Area | Limitation
Functions and automation | CSharpDB's UDF/command model is trusted and in-process by design. Current supported surfaces include host-registered scalar functions, common built-ins, trusted commands, form/report/pipeline hooks, declarative action sequences, and local Admin Forms C# modules; untrusted sandboxed execution is intentionally out of scope
Query | Scalar/IN/EXISTS subqueries are supported, including correlated cases in WHERE, non-aggregate projection, and UPDATE/DELETE expressions; correlated subqueries are not yet supported in JOIN ON, GROUP BY, HAVING, ORDER BY, or aggregate projections
Query | UNION, INTERSECT, and EXCEPT are supported; UNION ALL is not implemented yet
Query | No window functions
Schema | No SQL DEFAULT column values or CHECK constraints yet. Foreign keys are currently v1 only: single-column, column-level REFERENCES with optional ON DELETE CASCADE; table-level/composite/deferred foreign keys and ON UPDATE actions are not implemented
Indexes | Equality lookups support current INTEGER/TEXT indexes, but ordered range-scan pushdown is still limited to single-column INTEGER index paths
RowId | Legacy table schemas without persisted high-water metadata may pay a one-time key scan on first insert
Collections | FindByIndexAsync supports declared field-equality lookups; FindByPathAsync and FindByPathRangeAsync support path-based queries on indexed paths; FindAsync remains a full scan for unindexed predicates. Generated collections require registered descriptors for existing collection indexes; unsupported generated model shapes warn and use the source-generated JSON fallback instead of binary direct payloads
External Tables | Native .csdbtable archives can be registered and queried as read-only external tables. Writable external tables are planned as an opt-in .csdbx format; current archives remain read-only, and broader external indexes, range seeks, and deeper planner costing remain planned
Networking | CSharpDB.Daemon now hosts both REST and gRPC from one process; named pipes remain reserved but are not implemented end to end today
Security | Remote REST and daemon gRPC support opt-in API-key authentication, defaulting to None for backward compatibility. JWT, RBAC, mTLS helpers, TLS-specific configuration, and at-rest encryption are not implemented
Admin Forms | The Forms designer/runtime supports the core generated-form and data-entry path plus trusted command-backed automation, including lifecycle events, command buttons, selected-control events, conditional UI rules, domain formula helpers, and declarative action sequences for current record, form navigation, filtering, SQL/procedure, and control-property workflows. It still needs Access-parity work for responsive runtime rendering, complete inferred validation, richer form modes, additional events, advanced filtering/sorting, report/query/import/export actions, macro loops/on-error/temp vars, and broader controls
Admin Reports | The Reports designer/runtime supports the core banded preview path plus trusted command-backed preview lifecycle events, but still needs Access-parity work for bounded saved-query previews, full report output/export, parameters, richer grouping and totals semantics, conditional formatting, subreports, and broader controls
Text / Multilingual | Text is stored as UTF-8 and supports all Unicode languages; default semantics remain ordinal, but opt-in BINARY, NOCASE, NOCASE_AI, and ICU:<locale> collation are implemented for SQL and collection indexes. Dedicated ordered SQL text index optimization remains planned
Concurrency | The physical WAL commit path is still serialized at the storage boundary. Initial multi-writer support is shipped, but observed gains still depend on conflict shape and whether shared auto-commit INSERT is left on the default serialized path
Storage | No page-level compression; the compression SDK sample stores compressed payloads as ordinary application-managed BLOB values
Storage | No at-rest encryption for database/WAL files; on-disk storage is plaintext only
Storage | Memory-mapped reads are opt-in and currently apply only to clean main-file pages; WAL-backed reads still rely on the WAL/cache path
Storage | By default, durable auto-commit single-row writes still pay a physical WAL flush per commit; opt-in UseDurableCommitBatchWindow(...) can trade some commit latency for higher throughput across contending in-process writers, but default behavior remains per-commit durable
Query | Phase-2 cost-based planning is in place: ANALYZE, sys.table_stats, sys.column_stats, public planner-stat diagnostics, histogram/heavy-hitter/prefix estimates, and bounded small-chain join reordering now feed join/access-path costing. Opt-in adaptive join re-optimization can react to stale-stat or parameter-sensitive join cardinality misses, while broader runtime actuals, EXPLAIN ANALYZE, and full mid-plan reordering remain future work
Query | Internal row-batch transport is now the default scan-heavy execution foundation across batch-capable scans, joins, aggregates, and result boundaries; remaining work is broader kernel specialization and optional SIMD-style tuning rather than missing core batch coverage
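The Storage limitation on durable auto-commit writes describes a trade-off. The default pays one physical WAL flush per commit. A batch window instead lets commits that arrive close together share one flush, at the cost of extra latency. The deterministic Python simulation below sketches only the scheduling idea behind group commit. It assumes the window opens with the first waiting commit, which is a modeling choice and not documented UseDurableCommitBatchWindow(...) semantics. The function name plan_flushes is hypothetical.

```python
from typing import List


def plan_flushes(arrivals_ms: List[float], window_ms: float) -> List[List[float]]:
    """Group commit arrival times (ms) into shared durable-flush batches.

    window_ms == 0 models the default: one physical flush per commit.
    With a positive window, the first commit in a batch holds the flush open,
    and commits arriving within window_ms of it join the same flush.
    """
    batches: List[List[float]] = []
    for t in sorted(arrivals_ms):
        if batches and window_ms > 0 and t <= batches[-1][0] + window_ms:
            batches[-1].append(t)  # share the pending flush
        else:
            batches.append([t])  # start a new flush batch
    return batches
```

Each batch maps to one physical flush. Under this model a commit may wait up to window_ms before it is durable, which is the latency side of the trade. When commits are spread far apart, batching saves nothing, which fits the roadmap's "measure-first" framing of the knob.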

Completed Milestones

Major features already implemented:

  • Single-file database with 4 KB page-oriented storage
  • B+tree-backed tables and secondary indexes
  • Write-Ahead Log with crash recovery and auto-checkpoint
  • Concurrent snapshot-isolated readers via WAL-based MVCC
  • Full SQL pipeline: tokenizer, parser, query planner, operator tree
  • JOINs (INNER, LEFT, RIGHT, CROSS), aggregates, GROUP BY, HAVING, CTEs
  • Set operations: UNION, INTERSECT, EXCEPT
  • SELECT DISTINCT and DISTINCT aggregates
  • Scalar subqueries, IN (SELECT ...), and EXISTS (SELECT ...), including correlated evaluation in filters, non-aggregate projections, and UPDATE/DELETE expressions
  • Scalar TEXT(expr) for filter-friendly text coercion
  • Composite (multi-column) indexes
  • Ordered integer index range scans (<, <=, >, >=, BETWEEN) in the fast lookup path
  • ANALYZE, persisted sys.table_stats / sys.column_stats, and stale-aware column-stat refresh
  • Phase-2 cost-based query planning: statistics-guided access-path selection, join method choice, hash build-side choice, histogram/heavy-hitter/cardinality estimation, composite-prefix correlation modeling, and bounded small-chain inner-join reordering
  • Public planner diagnostics through EXPLAIN ESTIMATE FOR SELECT, WITH, compound queries, and sys.planner_* virtual catalogs
  • Opt-in adaptive join re-optimization behind DatabaseOptions.AdaptiveQueryReoptimization
  • SQL statement and SELECT plan caching
  • First-class IDENTITY / AUTOINCREMENT support for INTEGER PRIMARY KEY columns
  • Persisted table NextRowId high-water mark with compatibility fallback for legacy metadata
  • Views and triggers (BEFORE/AFTER on INSERT/UPDATE/DELETE)
  • Foreign key constraints: single-column, column-level REFERENCES with optional ON DELETE CASCADE
  • Older-database foreign-key retrofit migration across direct, HTTP, gRPC, CLI, and Admin
  • ADO.NET provider (DbConnection, DbCommand, DbDataReader, DbTransaction)
  • ADO.NET GetSchema() metadata collections for MetaDataCollections, Tables, Columns, Indexes, Views, and ForeignKeys
  • ADO.NET connection pooling with ClearPool / ClearAllPools
  • In-memory database mode with explicit load-from-disk and save-to-disk APIs
  • Shared/private in-memory ADO.NET connections with named shared-memory hosts
  • Document Collection API (NoSQL) with typed Put/Get/Delete/Scan/Find
  • Collection UTF-8 payload fast path with compatibility for legacy backing rows
  • Collection secondary field indexes via EnsureIndexAsync / FindByIndexAsync
  • Maintenance report, REINDEX, and VACUUM flows across client, CLI, API, and Admin UI
  • Dedicated CSharpDB.Daemon gRPC host for remote CSharpDB.Client access
  • Remote host consolidation in CSharpDB.Daemon, with REST /api and gRPC sharing the same warm hosted database client
  • Opt-in API-key protection for REST /api/* and daemon gRPC access
  • Storage tuning presets, bounded WAL read caching, memory-mapped main-file reads, and sliced background WAL auto-checkpointing
  • SQL executor/read-path fast paths for compact projections, broader join/index coverage, grouped aggregates, and correlated subquery filters
  • Batch-first SQL row-batch execution foundation with batch-aware scan/index/join roots, shared predicate/projection kernels, and batch-native generic aggregate paths
  • Interactive CLI with meta-commands and file execution
  • REST API with 34 endpoints and OpenAPI/Scalar documentation
  • Blazor Server admin dashboard
  • Integrated Admin Forms and Reports designers with runtime preview/entry, database-backed metadata persistence, and print-ready report output
  • Trusted C# callbacks, commands, Admin automation hooks, and local Admin Forms C# code modules
  • B+tree delete rebalancing with underflow handling (borrow/merge + interior collapse path)
  • Reusable snapshot reader sessions for higher concurrent-read throughput
  • Comprehensive benchmark suite (micro, macro, stress, scaling, in-memory, shared-memory)
  • Binary direct-payload collection storage with direct hydration and field/path extraction
  • Collection path indexes: nested scalar, array-element, nested array-object, Guid, temporal, ordered text
  • Collection path query APIs: FindByPathAsync and FindByPathRangeAsync
  • Source-generated typed collection fast path: generated collection models/codecs/field descriptors, generated binary direct payloads for supported shapes, trim-safe GetGeneratedCollectionAsync<T>(...), generator diagnostics, NativeAOT trim-smoke validation, and a dedicated sample
  • Full-text search with tokenization, stemming, and relevance ranking
  • Hybrid storage mode with lazy-resident durable storage and gRPC-tunable file-cache
  • Client-wide BackupAsync / RestoreAsync across direct, HTTP, gRPC, CLI, and Admin
  • Native .csdbtable table archives with Admin Import / Export, read-only external table registration, sys.external_tables, external table scans/joins, and embedded archive primary-key lookup indexes
  • ReplaceAsync for index stores
  • Package-driven ETL pipelines with validation, dry-run, execute/resume, persisted run history, and Admin visual designer support
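The full-text search milestone (inverted index, tokenization, stemming, relevance ranking) follows a well-known pattern. The compact Python sketch below assumes a deliberately naive suffix-stripping stemmer and TF-IDF scoring. CSharpDB's actual tokenizer, stemmer, and ranking function are not described in this roadmap and may differ. The class and function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

_TOKEN = re.compile(r"[a-z0-9]+")


def stem(token: str) -> str:
    # Naive suffix stripping for illustration; real stemmers are more careful.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def terms(text: str) -> List[str]:
    # Tokenize: lowercase, split on non-alphanumerics, then stem each token.
    return [stem(t) for t in _TOKEN.findall(text.lower())]


class InvertedIndex:
    def __init__(self) -> None:
        # term -> {doc_id: term frequency in that document}
        self.postings: Dict[str, Dict[int, int]] = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id: int, text: str) -> None:
        self.doc_count += 1
        for term, tf in Counter(terms(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query: str) -> List[Tuple[int, float]]:
        scores: Dict[int, float] = defaultdict(float)
        for term in set(terms(query)):
            docs = self.postings.get(term)
            if not docs:
                continue
            # Rarer terms get a higher inverse document frequency weight.
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by doc_id for stable output.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With this scheme, a query for "indexed page" matches a document containing "Indexing pages", because both words stem to the same terms. Documents that contain more of the rarer query terms rank higher.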

See Also

  • Architecture Guide — How the engine is structured
  • Internals & Contributing — How to extend the engine
  • Deployment & Installation Plan — Cross-platform distribution via dotnet tool, Docker, Homebrew, winget, and install scripts
  • Multi-Writer Follow-Up Plan — Post-initial multi-writer roadmap, insert-path gaps, and release criteria for broader completion
  • API-Level Sharding Plan — API/daemon-level routing across multiple database files for write-throughput scaling
  • Query And Durable Write Performance Plan — Combined optimizer phase-2 plus durable-write completion plan, shipped state, and remaining benchmark/future-work boundaries
  • Multilingual Text Support Plan — Build on existing Unicode text storage with case-insensitive matching, locale-aware sorting, and COLLATE clause support for queries and index definitions
  • Database Encryption Plan — Encrypted storage format, key management, migration, and managed-surface rollout
  • Storage Engine Guide — CSharpDB.Storage API reference: device, pager, B+tree, WAL, indexing, serialization, and catalog
  • Native FFI Tutorials — Python and Node.js examples using the NativeAOT shared library
  • User-Defined Functions Plan — Trusted C# functions, commands, form/report/pipeline hooks, and code modules
  • Writable External Tables Plan — Opt-in writable external tables backed by mutable B+tree external files
  • Pub/Sub Change Events Plan — Engine-level change events with channel-based delivery for real-time data subscriptions
  • Benchmark Suite — Performance data informing optimization priorities
  • Native Table Archives Blog — v3.8 table archive, external table, and Admin Import / Export overview