diff --git a/docs/query-dsl.md b/docs/query-dsl.md
new file mode 100644
index 0000000..91fee6a
--- /dev/null
+++ b/docs/query-dsl.md
@@ -0,0 +1,470 @@
+# Query DSL Documentation
+
+The Query DSL (Domain-Specific Language) lets you query documents in the system with a SQL-like syntax. Queries can filter by URI, document type, and metadata fields.
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Syntax Overview](#syntax-overview)
+- [Field Types](#field-types)
+- [Operators](#operators)
+- [Logical Operations](#logical-operations)
+- [Data Types](#data-types)
+- [Examples](#examples)
+- [Error Handling](#error-handling)
+- [API Reference](#api-reference)
+
+## Overview
+
+The Query DSL parses human-readable query strings and converts them into `DocumentSearchOptions` objects that can be used to search documents. It supports:
+
+- **URI filtering**: Filter documents by their URIs
+- **Type filtering**: Filter documents by their type
+- **Metadata filtering**: Filter documents by their metadata fields with type-aware operations
+- **Logical operations**: Combine conditions using AND/OR logic
+- **Parenthetical grouping**: Group conditions for complex boolean logic
+- **Multiple data types**: String, number, and boolean values with appropriate operators
+
+## Syntax Overview
+
+The basic syntax follows this pattern:
+
+```
+field operator value [logical_operator field operator value...]
+```
+
+### Basic Examples
+
+```
+uri = "doc-123"
+type = "article"
+meta.priority = 5
+meta.title like "%search%"
+```
+
+### Complex Examples
+
+```
+uri in ["doc-1", "doc-2"] and meta.priority >= 5
+(meta.published = true or meta.draft = false) and type = "article"
+```
+
+## Field Types
+
+### 1. URI Fields
+
+Filter documents by their URI (Uniform Resource Identifier).
+
+**Syntax**: `uri`
+
+**Supported Operations**:
+- Equality: `uri = "document-id"`
+- Array membership: `uri in ["doc-1", "doc-2", "doc-3"]`
+
+### 2. Type Fields
+
+Filter documents by their type.
+
+**Syntax**: `type`
+
+**Supported Operations**:
+- Equality: `type = "article"`
+- Array membership: `type in ["article", "blog", "news"]`
+
+### 3. Metadata Fields
+
+Filter documents by their metadata fields. Metadata fields are accessed using dot notation.
+
+**Syntax**: `meta.fieldName`
+
+**Examples**:
+- `meta.title`
+- `meta.priority`
+- `meta.published`
+- `meta.created_at`
+
+## Operators
+
+### Comparison Operators
+
+| Operator | Description | Example |
+|----------|-------------|---------|
+| `=` | Equals | `meta.priority = 5` |
+| `!=` | Not equals | `meta.status != "archived"` |
+| `>` | Greater than | `meta.score > 8.5` |
+| `>=` | Greater than or equal | `meta.priority >= 5` |
+| `<` | Less than | `meta.age < 30` |
+| `<=` | Less than or equal | `meta.count <= 100` |
+| `like` | Pattern matching (SQL-like) | `meta.title like "%test%"` |
+| `not like` | Negative pattern matching | `meta.title not like "%draft%"` |
+| `in` | Array membership | `uri in ["a", "b", "c"]` |
+
+### Logical Operators
+
+| Operator | Syntax | Description | Example |
+|----------|---------|-------------|---------|
+| AND | `and` or `&` | Logical AND | `meta.a = 1 and meta.b = 2` |
+| OR | `or` or `\|` | Logical OR | `meta.urgent = true or meta.priority > 8` |
+
+## Logical Operations
+
+### AND Operations
+
+Combine conditions where **both** must be true:
+
+```
+meta.published = true and meta.priority >= 5
+meta.created >= 1234567890 & meta.updated < 1234567999
+```
+
+### OR Operations
+
+Combine conditions where **either** can be true:
+
+```
+type = "article" or type = "blog"
+meta.urgent = true | meta.priority > 9
+```
+
+### Parenthetical Grouping
+
+Use parentheses to control the order of operations:
+
+```
+(meta.priority > 5 or meta.urgent = true) and type = "article"
+((meta.a = 1 and meta.b = 2) or meta.c = 3) and uri = "test"
+```
+
+## Data Types
+
+The DSL automatically detects and handles different data types based on the value syntax:
+
+### Strings
+
+Enclosed in double quotes (`"`) or single quotes (`'`):
+
+```
+meta.title = "Article Title"
+meta.status = 'published'
+```
+
+**Escape Sequences**:
+- `\"` or `\'` - Quote characters
+- `\\` - Backslash
+- `\n` - Newline
+- `\t` - Tab
+- `\r` - Carriage return
+
+Example:
+```
+meta.content = "String with \"quotes\" and \n newlines"
+```
+
+### Numbers
+
+Integer or floating-point numbers, including negative values:
+
+```
+meta.priority = 5
+meta.score = 8.75
+meta.balance = -150.50
+```
+
+### Booleans
+
+Case-insensitive `true` or `false`:
+
+```
+meta.published = true
+meta.archived = false
+meta.draft = True
+```
+
+### Arrays
+
+Square bracket notation for array values (used with `in` operator):
+
+```
+uri in ["doc-1", "doc-2", "doc-3"]
+type in ["article", "blog", "news"]
+```
+
+Empty arrays are supported:
+```
+uri in []
+```
+
+## Examples
+
+### Basic Filtering
+
+**Filter by URI**:
+```
+uri = "my-document"
+```
+
+**Filter by multiple URIs**:
+```
+uri in ["doc-1", "doc-2", "doc-3"]
+```
+
+**Filter by document type**:
+```
+type = "article"
+```
+
+**Filter by metadata**:
+```
+meta.priority = 5
+meta.title = "Important Document"
+meta.published = true
+```
+
+### Comparison Operations
+
+**Numeric comparisons**:
+```
+meta.priority > 5
+meta.score >= 8.0
+meta.count < 100
+meta.rating <= 4.5
+meta.value != 0
+```
+
+**Text operations**:
+```
+meta.title like "%report%"
+meta.category != "archived"
+meta.description not like "%draft%"
+```
+
+### Logical Combinations
+
+**Simple AND**:
+```
+type = "article" and meta.published = true
+```
+
+**Simple OR**:
+```
+meta.urgent = true or meta.priority > 8
+```
+
+**Mixed field types**:
+```
+uri in ["doc-1", "doc-2"] and meta.priority >= 5
+```
+
+### Complex Queries
+
+**Grouped conditions**:
+```
+(meta.priority > 5 or meta.urgent = true) and type = "article"
+```
+
+**Nested grouping**:
+```
+((meta.a = 1 and meta.b = 2) or meta.c = 3) and type = "test"
+```
+
+**Real-world example**:
+```
+uri in ["article-1", "article-2"] and (meta.created >= 1640995200 and meta.created < 1672531200 or (type = "urgent" and meta.status != "archived"))
+```
+
+### Date/Time Queries
+
+Dates are typically stored as Unix timestamps (numbers), so date filters use numeric comparisons:
+
+```
+meta.created >= 1640995200
+meta.updated > 1672531200 and meta.updated < 1675209600
+```
+
+### String Pattern Matching
+
+**Contains pattern**:
+```
+meta.title like "%search term%"
+```
+
+**Starts with pattern**:
+```
+meta.filename like "report_%"
+```
+
+**Ends with pattern**:
+```
+meta.extension like "%.pdf"
+```
+
+**Exclude pattern**:
+```
+meta.title not like "%draft%"
+```
+
+## Error Handling
+
+The DSL parser reports a descriptive error message for each of the following conditions:
+
+### Syntax Errors
+
+**Unterminated string**:
+```
+meta.title = "unterminated string
+// Error: Unterminated string at position X
+```
+
+**Missing parentheses**:
+```
+(meta.a = 1 and meta.b = 2
+// Error: Expected RPAREN at position X
+```
+
+**Invalid characters**:
+```
+meta.field = test@invalid
+// Error: Unexpected character '@' at position X
+```
+
+### Semantic Errors
+
+**Unknown fields**:
+```
+unknown_field = "value"
+// Error: Unknown field 'unknown_field' at position X
+```
+
+**Unsupported operators**:
+```
+uri like "pattern"
+// Error: Unsupported operator 'like' for uri field
+```
+
+**Type mismatches**:
+```
+meta.numeric_field > "string_value"
+// Error: Unsupported operator '>' for text field
+```
+
+### Validation
+
+- Field names are validated (only `uri`, `type`, and `meta.*` are allowed)
+- Operators are validated based on field type and data type
+- Syntax is strictly enforced
+
+## API Reference
+
+### Function: `parseDSL(query: string): DocumentSearchOptions`
+
+Parses a DSL query string into a `DocumentSearchOptions` object.
+
+**Parameters**:
+- `query` (string): The DSL query string to parse
+
+**Returns**: `DocumentSearchOptions` object with the following structure:
+
+```typescript
+type DocumentSearchOptions = {
+  uris?: string[];      // Array of URIs to filter by
+  types?: string[];     // Array of document types to filter by
+  meta?: MetaCondition; // Metadata filtering conditions
+  limit?: number;       // Result limit (not set by DSL)
+  offset?: number;      // Result offset (not set by DSL)
+};
+```
+
+**Throws**: Error with descriptive message if parsing fails
+
+### Types
+
+#### MetaCondition
+
+```typescript
+type MetaCondition =
+  | { type: 'and'; conditions: MetaCondition[] }
+  | { type: 'or'; conditions: MetaCondition[] }
+  | MetaFilter;
+```
+
+#### MetaFilter
+
+```typescript
+type MetaFilter =
+  | MetaNumberFilter
+  | MetaTextFilter
+  | MetaBoolFilter;
+
+type MetaNumberFilter = {
+  type: 'number';
+  field: string;
+  filter: {
+    gt?: number;    // Greater than
+    gte?: number;   // Greater than or equal
+    lt?: number;    // Less than
+    lte?: number;   // Less than or equal
+    eq?: number;    // Equal
+    neq?: number;   // Not equal
+    nill?: boolean; // Is null/undefined
+  };
+};
+
+type MetaTextFilter = {
+  type: 'text';
+  field: string;
+  filter: {
+    eq?: string;    // Equal
+    neq?: string;   // Not equal
+    like?: string;  // Pattern match
+    nlike?: string; // Negative pattern match
+    nill?: boolean; // Is null/undefined
+  };
+};
+
+type MetaBoolFilter = {
+  type: 'bool';
+  field: string;
+  filter: {
+    eq: boolean;    // Equal (required)
+    nill?: boolean; // Is null/undefined
+  };
+};
+```
+
+### Usage Example
+
+```typescript
+import { parseDSL } from './documents.dsl';
+
+// Parse a query
+const options = parseDSL('uri in ["doc-1", "doc-2"] and meta.priority >= 5');
+
+// Result:
+// {
+//   uris: ['doc-1', 'doc-2'],
+//   meta: {
+//     type: 'number',
+//     field: 'priority',
+//     filter: { gte: 5 }
+//   }
+// }
+
+// Use with document search
+const results = await searchDocuments(options);
+```
+
+## Best Practices
+
+1. **Use quotes for strings**: Always wrap string values in quotes to avoid parsing issues
+2. **Group complex conditions**: Use parentheses to make complex boolean logic clear
+3. **Choose appropriate operators**: Use `like` for pattern matching, `in` for multiple values
+4. **Consider performance**: Simpler queries with fewer conditions perform better
+5. **Handle errors gracefully**: Wrap DSL parsing in try-catch blocks in production code
+
+## Limitations
+
+1. **Field restrictions**: Only `uri`, `type`, and `meta.*` fields are supported
+2. **Operator compatibility**: Not all operators work with all data types
+3. **No nested metadata**: Metadata fields must be flat (no `meta.nested.field`)
+4. **Case sensitivity**: Field names and operators are case-sensitive (boolean literals are the exception)
+5. **No functions**: No support for functions like `date()`, `count()`, etc.
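
Best practice 5 in the added doc recommends wrapping DSL parsing in try-catch. One way that might look is sketched below. `safeParse`, `ParseOutcome`, and `stubParse` are hypothetical names introduced only for this illustration, and the trimmed `DocumentSearchOptions` type is a stand-in for the documented one. In real code the documented `parseDSL` would be passed in place of the stub.

```typescript
// Trimmed stand-in for the documented result type (assumption: only these fields are needed here).
type DocumentSearchOptions = { uris?: string[]; types?: string[] };

// Hypothetical result wrapper: either the parsed options or the error message.
type ParseOutcome =
  | { ok: true; options: DocumentSearchOptions }
  | { ok: false; message: string };

// Hypothetical helper: runs any parser with the parseDSL signature and
// turns a thrown error into a value the caller can inspect.
function safeParse(
  parse: (query: string) => DocumentSearchOptions,
  query: string,
): ParseOutcome {
  try {
    return { ok: true, options: parse(query) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, message };
  }
}

// Stub parser so the sketch runs on its own; it is NOT the real DSL parser.
const stubParse = (query: string): DocumentSearchOptions => {
  if (query.trim() === '') {
    throw new Error('Empty query');
  }
  return { uris: [query] };
};

const good = safeParse(stubParse, 'doc-1');
const bad = safeParse(stubParse, '   ');
```

Returning a tagged result instead of rethrowing lets the caller decide whether to show the parser's message to the user or fall back to a default search.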