Collation Guide
How text comparison and sorting work in CSharpDB today, when to use BINARY, NOCASE, NOCASE_AI, or ICU:<locale>, and how to apply collation in SQL and collection indexes.
BINARY, NOCASE, NOCASE_AI, and ICU:<locale> collation across SQL schema definitions, query expressions, metadata surfaces, ordered SQL text indexes, and collection path indexes.
Overview
CSharpDB already stores text as UTF-8, so data from any language can be stored and retrieved correctly. Collation is about something different: how text is compared, filtered, and sorted.
If you only need multilingual storage, you do not need to do anything special. If you need case-insensitive matching, case-insensitive ordering, or stable text lookup semantics across SQL and collection indexes, declare a collation explicitly.
What Collation Solves
Collation helps with text comparison behavior, not translation or UI localization.
- Case-insensitive equality: make
'Alice','alice', and'ALICE'compare as equal when you want that behavior. - Case-insensitive ordering: sort names or tags without uppercase values grouping separately from lowercase values.
- Consistent index semantics: make indexed lookups and unique enforcement follow the same text rules as query execution.
- Collection path behavior: let collection indexes on text fields or array elements use the same matching semantics as SQL.
Collation does not provide translation, localized UI text, or language-specific full-text tokenization.
Current Support
| Surface | Support |
|---|---|
| Text storage | UTF-8 / full Unicode storage is supported by default |
| Default text behavior | Ordinal / byte-order semantics unless a collation is declared |
| SQL DDL | TEXT COLLATE ... in table definitions and COLLATE in index definitions |
| SQL queries | Query-level COLLATE overrides for comparisons and ordering |
| SQL indexes | Collation-aware equality and uniqueness behavior for text indexes |
| Collections API | Optional collation on EnsureIndexAsync for text path indexes, including array-element paths |
| Metadata | Visible through system catalogs, client models, REST, gRPC, Admin, and ADO.NET GetSchema() |
Supported Collations
| Name | Use When | Behavior |
|---|---|---|
BINARY |
You want exact ordinal behavior | Default byte/code-point comparison |
NOCASE |
You want case-insensitive matching without locale-specific rules | Invariant uppercase normalization for indexed and runtime compare paths |
NOCASE_AI |
You want case-insensitive and accent-insensitive behavior | Accent-stripped, invariant-case normalization for equality, ordering, and index semantics |
ICU:<locale> |
You need locale-aware linguistic sorting rules | Built-in locale-aware comparison and ordering using .NET globalization sort keys for the specified culture |
SQL Usage
You can apply collation in three places: at the column definition, on an index expression, or directly inside a query.
Column-Level Default
Use this when a text column should normally behave case-insensitively.
CREATE TABLE users (
id INTEGER PRIMARY KEY,
name TEXT COLLATE NOCASE,
email TEXT COLLATE NOCASE
);
With this schema, comparisons and ordering against name and email use NOCASE unless a query overrides them.
Index-Level Collation
Use this when only a specific index needs case-insensitive behavior.
CREATE INDEX idx_users_email_nocase
ON users (email COLLATE NOCASE);
This is useful when you want fast case-insensitive lookups but do not want to change the default semantics of every query that touches the column.
Query-Level Override
Use this when you want an explicit override in a single query.
SELECT *
FROM users
WHERE name = 'alice' COLLATE NOCASE;
SELECT *
FROM users
ORDER BY name COLLATE NOCASE;
Query-level collation is the right choice when the table defaults to BINARY but one report, search, or sort needs case-insensitive behavior.
Locale-Aware Example
Use ICU:<locale> when sorting or matching should follow a specific culture's linguistic rules.
CREATE TABLE products (
id INTEGER PRIMARY KEY,
name TEXT COLLATE ICU:sv-SE
);
SELECT *
FROM products
ORDER BY name COLLATE ICU:sv-SE;
Collection Usage
The collection API supports the same idea for text path indexes. Declare the collation when you create the index; lookups and range scans then use that index's rules.
Case-Insensitive Text Index
var users = await db.GetCollectionAsync<User>("users", ct);
await users.EnsureIndexAsync(x => x.Email, "NOCASE", ct);
await foreach (var match in users.FindByPathAsync("Email", "ALPHA@EXAMPLE.COM", ct))
{
Console.WriteLine(match.Value.Email);
}
Case-Insensitive Array Element Index
var users = await db.GetCollectionAsync<UserWithTags>("users", ct);
await users.EnsureIndexAsync("$.tags[]", "NOCASE", ct);
await foreach (var match in users.FindByPathAsync("$.tags[]", "GREEN", ct))
{
Console.WriteLine(match.Key);
}
If you create a collection index with one collation and later request the same path with a different collation, CSharpDB treats that as a mismatch. Recreate the index instead of expecting the engine to silently reinterpret existing index data.
Choosing a Collation
| If you need... | Use |
|---|---|
| Exact case-sensitive matching or exact ordinal ordering | BINARY |
| Case-insensitive matching for names, emails, usernames, tags, or search fields | NOCASE |
| Accent-insensitive behavior | NOCASE_AI |
| Locale-aware linguistic ordering for a specific language or region | ICU:<locale> |
Performance
Collation is opt-in. If you do not use it, the default ordinal path keeps its current behavior and performance profile.
| Area | Default / BINARY |
NOCASE / NOCASE_AI |
ICU:<locale> |
|---|---|---|---|
| Reads without collation | No extra overhead | Not applicable | Not applicable |
| Indexed equality lookup | Ordinal key match | Normalized key match | Culture sort-key match |
| INSERT / UPDATE on collated index | Raw text path | Extra normalization work on write | Extra sort-key generation work on write |
| Collection text index | Ordinal text key | Normalized text key | Normalized culture sort-key text |
The benchmark suite includes focused ordered-text collation coverage for equality lookups, range scans, top-N ordering, and indexed writes.
Current Limitations
- Default behavior is still ordinal. If you never declare collation, text comparisons and ordering stay case-sensitive and byte-order based.
- ICU behavior follows runtime globalization data. If you move a database across environments or upgrade globalization data, rebuild ICU-backed indexes with
REINDEXif you need deterministic persisted ordering. - Index-size growth is workload-dependent. Collated keys, especially ICU sort keys, can be materially larger than raw ordinal text keys, and the public docs do not yet publish a stable size-growth table.
- Changing collation means rebuilding index state. Existing index bytes were written with a specific interpretation and should not be silently reused under different rules.
FAQ
Does this give me multilingual support?
Partly. CSharpDB already stores multilingual text because it uses UTF-8. Collation adds better multilingual comparison and sorting behavior, especially for case-insensitive scenarios. It does not provide translation, localized UI strings, or language-aware full-text tokenization.
If I do not use COLLATE, is there a performance penalty?
No meaningful one is expected. Collation is opt-in, and the default ordinal path stays in place when you do not declare it.
Can I use locale-aware sorting now?
Yes. Use ICU:<locale> such as ICU:sv-SE or ICU:tr-TR when you want sorting and equality to follow a specific culture's rules.
Will NOCASE change the stored text?
No. Your original text is still stored and returned as written. Collation only affects how values are compared, sorted, and indexed.
Should I use column-level or query-level collation?
Use column-level collation when a field should usually behave case-insensitively. Use query-level collation when most queries should stay ordinal but one report or lookup needs an override.