Collation Guide

How text comparison and sorting work in CSharpDB today, when to use BINARY, NOCASE, NOCASE_AI, or ICU:<locale>, and how to apply collation in SQL and collection indexes.

Available now: CSharpDB supports opt-in BINARY, NOCASE, NOCASE_AI, and ICU:<locale> collation across SQL schema definitions, query expressions, metadata surfaces, ordered SQL text indexes, and collection path indexes.

Overview

CSharpDB already stores text as UTF-8, so data from any language can be stored and retrieved correctly. Collation is about something different: how text is compared, filtered, and sorted.

If you only need multilingual storage, you do not need to do anything special. If you need case-insensitive matching, case-insensitive ordering, or stable text lookup semantics across SQL and collection indexes, declare a collation explicitly.

What Collation Solves

Collation helps with text comparison behavior, not translation or UI localization.

Case-insensitive equality: make 'Alice', 'alice', and 'ALICE' compare as equal when you want that behavior.
Case-insensitive ordering: sort names or tags without uppercase values grouping separately from lowercase values.
Consistent index semantics: make indexed lookups and unique enforcement follow the same text rules as query execution.
Collection path behavior: let collection indexes on text fields or array elements use the same matching semantics as SQL.

Collation does not provide translation, localized UI text, or language-specific full-text tokenization.

Current Support

Surface	Support
Text storage	UTF-8 / full Unicode storage is supported by default
Default text behavior	Ordinal / byte-order semantics unless a collation is declared
SQL DDL	`TEXT COLLATE ...` in table definitions and `COLLATE` in index definitions
SQL queries	Query-level `COLLATE` overrides for comparisons and ordering
SQL indexes	Collation-aware equality and uniqueness behavior for text indexes
Collections API	Optional collation on `EnsureIndexAsync` for text path indexes, including array-element paths
Metadata	Visible through system catalogs, client models, REST, gRPC, Admin, and ADO.NET `GetSchema()`

Supported Collations

Name	Use When	Behavior
`BINARY`	You want exact ordinal behavior	Default byte/code-point comparison
`NOCASE`	You want case-insensitive matching without locale-specific rules	Invariant uppercase normalization for indexed and runtime compare paths
`NOCASE_AI`	You want case-insensitive and accent-insensitive behavior	Accent-stripped, invariant-case normalization for equality, ordering, and index semantics
`ICU:<locale>`	You need locale-aware linguistic sorting rules	Built-in locale-aware comparison and ordering using .NET globalization sort keys for the specified culture

SQL Usage

You can apply collation in three places: at the column definition, on an index expression, or directly inside a query.

Column-Level Default

Use this when a text column should normally behave case-insensitively.

CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT COLLATE NOCASE,
    email TEXT COLLATE NOCASE
);

With this schema, comparisons and ordering against name and email use NOCASE unless a query overrides them.

Index-Level Collation

Use this when only a specific index needs case-insensitive behavior.

CREATE INDEX idx_users_email_nocase
ON users (email COLLATE NOCASE);

This is useful when you want fast case-insensitive lookups but do not want to change the default semantics of every query that touches the column.

Query-Level Override

Use this when you want an explicit override in a single query.

SELECT *
FROM users
WHERE name = 'alice' COLLATE NOCASE;

SELECT *
FROM users
ORDER BY name COLLATE NOCASE;

Query-level collation is the right choice when the table defaults to BINARY but one report, search, or sort needs case-insensitive behavior.

Locale-Aware Example

Use ICU:<locale> when sorting or matching should follow a specific culture's linguistic rules.

CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT COLLATE ICU:sv-SE
);

SELECT *
FROM products
ORDER BY name COLLATE ICU:sv-SE;

Collection Usage

The collection API supports the same idea for text path indexes. Declare the collation when you create the index; lookups and range scans then use that index's rules.

Case-Insensitive Text Index

var users = await db.GetCollectionAsync<User>("users", ct);

await users.EnsureIndexAsync(x => x.Email, "NOCASE", ct);

await foreach (var match in users.FindByPathAsync("Email", "ALPHA@EXAMPLE.COM", ct))
{
    Console.WriteLine(match.Value.Email);
}

Case-Insensitive Array Element Index

var users = await db.GetCollectionAsync<UserWithTags>("users", ct);

await users.EnsureIndexAsync("$.tags[]", "NOCASE", ct);

await foreach (var match in users.FindByPathAsync("$.tags[]", "GREEN", ct))
{
    Console.WriteLine(match.Key);
}

If you create a collection index with one collation and later request the same path with a different collation, CSharpDB treats that as a mismatch. Recreate the index instead of expecting the engine to silently reinterpret existing index data.

Choosing a Collation

If you need...	Use
Exact case-sensitive matching or exact ordinal ordering	`BINARY`
Case-insensitive matching for names, emails, usernames, tags, or search fields	`NOCASE`
Accent-insensitive behavior	`NOCASE_AI`
Locale-aware linguistic ordering for a specific language or region	`ICU:<locale>`

Performance

Collation is opt-in. If you do not use it, the default ordinal path keeps its current behavior and performance profile.

Area	Default / `BINARY`	`NOCASE` / `NOCASE_AI`	`ICU:<locale>`
Reads without collation	No extra overhead	Not applicable	Not applicable
Indexed equality lookup	Ordinal key match	Normalized key match	Culture sort-key match
INSERT / UPDATE on collated index	Raw text path	Extra normalization work on write	Extra sort-key generation work on write
Collection text index	Ordinal text key	Normalized text key	Normalized culture sort-key text

The benchmark suite includes focused ordered-text collation coverage for equality lookups, range scans, top-N ordering, and indexed writes.

Current Limitations

Default behavior is still ordinal. If you never declare collation, text comparisons and ordering stay case-sensitive and byte-order based.
ICU behavior follows runtime globalization data. If you move a database across environments or upgrade globalization data, rebuild ICU-backed indexes with REINDEX if you need deterministic persisted ordering.
Index-size growth is workload-dependent. Collated keys, especially ICU sort keys, can be materially larger than raw ordinal text keys, and the public docs do not yet publish a stable size-growth table.
Changing collation means rebuilding index state. Existing index bytes were written with a specific interpretation and should not be silently reused under different rules.

FAQ

Does this give me multilingual support?

Partly. CSharpDB already stores multilingual text because it uses UTF-8. Collation adds better multilingual comparison and sorting behavior, especially for case-insensitive scenarios. It does not provide translation, localized UI strings, or language-aware full-text tokenization.

If I do not use `COLLATE`, is there a performance penalty?

No meaningful one is expected. Collation is opt-in, and the default ordinal path stays in place when you do not declare it.

Can I use locale-aware sorting now?

Yes. Use ICU:<locale> such as ICU:sv-SE or ICU:tr-TR when you want sorting and equality to follow a specific culture's rules.

Will `NOCASE` change the stored text?

No. Your original text is still stored and returned as written. Collation only affects how values are compared, sorted, and indexed.

Should I use column-level or query-level collation?

Use column-level collation when a field should usually behave case-insensitively. Use query-level collation when most queries should stay ordinal but one report or lookup needs an override.

Collation Guide

Overview

What Collation Solves

Current Support

Supported Collations

SQL Usage

Column-Level Default

Index-Level Collation

Query-Level Override

Locale-Aware Example

Collection Usage

Case-Insensitive Text Index

Case-Insensitive Array Element Index

Choosing a Collation

Performance

Current Limitations

FAQ

Does this give me multilingual support?

If I do not use COLLATE, is there a performance penalty?

Can I use locale-aware sorting now?

Will NOCASE change the stored text?

Should I use column-level or query-level collation?

See Also

If I do not use `COLLATE`, is there a performance penalty?

Will `NOCASE` change the stored text?