# Query DSL Documentation

The Query DSL (Domain-Specific Language) provides a flexible way to query documents in the system. It lets you construct complex queries using a SQL-like syntax with support for filtering by URI, document type, and metadata fields.

## Table of Contents

- [Overview](#overview)
- [Syntax Overview](#syntax-overview)
- [Field Types](#field-types)
- [Operators](#operators)
- [Logical Operations](#logical-operations)
- [Data Types](#data-types)
- [Examples](#examples)
- [Error Handling](#error-handling)
- [API Reference](#api-reference)
- [Best Practices](#best-practices)
- [Limitations](#limitations)

## Overview

The Query DSL parses human-readable query strings and converts them into `DocumentSearchOptions` objects that can be used to search documents. It supports:

- **URI filtering**: Filter documents by their URIs
- **Type filtering**: Filter documents by their type
- **Metadata filtering**: Filter documents by their metadata fields with type-aware operations
- **Logical operations**: Combine conditions using AND/OR logic
- **Parenthetical grouping**: Group conditions for complex boolean logic
- **Multiple data types**: String, number, and boolean values with appropriate operators

## Syntax Overview

The basic syntax follows this pattern:

```
field operator value [logical_operator field operator value...]
```

### Basic Examples

```
uri = "doc-123"
type = "article"
meta.priority = 5
meta.title like "%search%"
```

### Complex Examples

```
uri in ["doc-1", "doc-2"] and meta.priority >= 5
(meta.published = true or meta.draft = false) and type = "article"
```

## Field Types

### 1. URI Fields

Filter documents by their URI (Uniform Resource Identifier).

**Syntax**: `uri`

**Supported Operations**:

- Equality: `uri = "document-id"`
- Array membership: `uri in ["doc-1", "doc-2", "doc-3"]`

### 2. Type Fields

Filter documents by their type.

**Syntax**: `type`

**Supported Operations**:

- Equality: `type = "article"`
- Array membership: `type in ["article", "blog", "news"]`

### 3. Metadata Fields

Filter documents by their metadata fields. Metadata fields are accessed using dot notation.

**Syntax**: `meta.fieldName`

**Examples**:

- `meta.title`
- `meta.priority`
- `meta.published`
- `meta.created_at`

## Operators

### Comparison Operators

| Operator | Description | Example |
|----------|-------------|---------|
| `=` | Equals | `meta.priority = 5` |
| `!=` | Not equals | `meta.status != "archived"` |
| `>` | Greater than | `meta.score > 8.5` |
| `>=` | Greater than or equal | `meta.priority >= 5` |
| `<` | Less than | `meta.age < 30` |
| `<=` | Less than or equal | `meta.count <= 100` |
| `like` | Pattern matching (SQL-like) | `meta.title like "%test%"` |
| `not like` | Negative pattern matching | `meta.title not like "%draft%"` |
| `in` | Array membership | `uri in ["a", "b", "c"]` |

### Logical Operators

| Operator | Syntax | Description | Example |
|----------|---------|-------------|---------|
| AND | `and` or `&` | Logical AND | `meta.a = 1 and meta.b = 2` |
| OR | `or` or `\|` | Logical OR | `meta.urgent = true or meta.priority > 8` |

## Logical Operations

### AND Operations

Combine conditions where **both** must be true:

```
meta.published = true and meta.priority >= 5
meta.created >= 1234567890 & meta.updated < 1234567999
```

### OR Operations

Combine conditions where **either** can be true:

```
type = "article" or type = "blog"
meta.urgent = true | meta.priority > 9
```

### Parenthetical Grouping

Use parentheses to control the order of operations:

```
(meta.priority > 5 or meta.urgent = true) and type = "article"
((meta.a = 1 and meta.b = 2) or meta.c = 3) and uri = "test"
```

## Data Types

The DSL automatically detects and handles different data types based on the value syntax:

### Strings

Enclosed in double quotes (`"`) or single quotes (`'`):

```
meta.title = "Article Title"
meta.status = 'published'
```

**Escape Sequences**:

- `\"` or `\'` - Quote characters
- `\\` - Backslash
- `\n` - Newline
- `\t` - Tab
- `\r` - Carriage return
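The parser's own decoding logic is not shown in this document. As a rough illustration, the escape sequences listed above could be decoded along these lines. This is a minimal sketch: `unescapeDslString` is a hypothetical helper, it assumes the input is the text between the quotes, and keeping unrecognised escapes literally is an assumption (the real parser may reject them).

```typescript
// Hypothetical helper, not part of the DSL module.
// Maps the character after a backslash to the character it stands for.
const ESCAPES: Record<string, string> = {
  '"': '"',
  "'": "'",
  "\\": "\\",
  n: "\n",
  t: "\t",
  r: "\r",
};

function unescapeDslString(body: string): string {
  let out = "";
  for (let i = 0; i < body.length; i++) {
    const ch = body[i];
    if (ch !== "\\") {
      out += ch;
      continue;
    }
    if (i + 1 >= body.length) {
      throw new Error("Dangling backslash at end of string");
    }
    const next = body[++i];
    // Assumption: unknown escapes are kept as written.
    out += ESCAPES[next] ?? "\\" + next;
  }
  return out;
}

// The raw DSL text here is: say \"hi\"\tdone
const decoded = unescapeDslString('say \\"hi\\"\\tdone');
console.log(JSON.stringify(decoded)); // "say \"hi\"\tdone"
```
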
Example:

```
meta.content = "String with \"quotes\" and \n newlines"
```

### Numbers

Integer or floating-point numbers, including negative values:

```
meta.priority = 5
meta.score = 8.75
meta.balance = -150.50
```

### Booleans

Case-insensitive `true` or `false`:

```
meta.published = true
meta.archived = false
meta.draft = True
```

### Arrays

Square bracket notation for array values (used with the `in` operator):

```
uri in ["doc-1", "doc-2", "doc-3"]
type in ["article", "blog", "news"]
```

Empty arrays are supported:

```
uri in []
```

## Examples

### Basic Filtering

**Filter by URI**:

```
uri = "my-document"
```

**Filter by multiple URIs**:

```
uri in ["doc-1", "doc-2", "doc-3"]
```

**Filter by document type**:

```
type = "article"
```

**Filter by metadata**:

```
meta.priority = 5
meta.title = "Important Document"
meta.published = true
```

### Comparison Operations

**Numeric comparisons**:

```
meta.priority > 5
meta.score >= 8.0
meta.count < 100
meta.rating <= 4.5
meta.value != 0
```

**Text operations**:

```
meta.title like "%report%"
meta.category != "archived"
meta.description not like "%draft%"
```

### Logical Combinations

**Simple AND**:

```
type = "article" and meta.published = true
```

**Simple OR**:

```
meta.urgent = true or meta.priority > 8
```

**Mixed field types**:

```
uri in ["doc-1", "doc-2"] and meta.priority >= 5
```

### Complex Queries

**Grouped conditions**:

```
(meta.priority > 5 or meta.urgent = true) and type = "article"
```

**Nested grouping**:

```
((meta.a = 1 and meta.b = 2) or meta.c = 3) and type = "test"
```

**Real-world example**:

```
uri in ["article-1", "article-2"] and (meta.created >= 1640995200 and meta.created < 1672531200 or (type = "urgent" and meta.status != "archived"))
```

### Date/Time Queries

Since dates are typically stored as Unix timestamps (numbers):

```
meta.created >= 1640995200
meta.updated > 1672531200 and meta.updated < 1675209600
```

### String Pattern Matching

**Contains pattern**:

```
meta.title like "%search term%"
```

**Starts with pattern**:

```
meta.filename like "report_%"
```

**Ends with pattern**:

```
meta.extension like "%.pdf"
```

**Exclude pattern**:

```
meta.title not like "%draft%"
```

## Error Handling

The DSL parser provides detailed error messages for various error conditions:

### Syntax Errors

**Unterminated string**:

```
meta.title = "unterminated string
// Error: Unterminated string at position X
```

**Missing parentheses**:

```
(meta.a = 1 and meta.b = 2
// Error: Expected RPAREN at position X
```

**Invalid characters**:

```
meta.field = test@invalid
// Error: Unexpected character '@' at position X
```

### Semantic Errors

**Unknown fields**:

```
unknown_field = "value"
// Error: Unknown field 'unknown_field' at position X
```

**Unsupported operators**:

```
uri like "pattern"
// Error: Unsupported operator 'like' for uri field
```

**Type mismatches**:

```
meta.numeric_field > "string_value"
// Error: Unsupported operator '>' for text field
```

### Validation

- Field names are validated (only `uri`, `type`, and `meta.*` are allowed)
- Operators are validated based on field type and data type
- Syntax is strictly enforced

## API Reference

### Function: `parseDSL(query: string): DocumentSearchOptions`

Parses a DSL query string into a `DocumentSearchOptions` object.
**Parameters**:

- `query` (string): The DSL query string to parse

**Returns**: `DocumentSearchOptions` object with the following structure:

```typescript
type DocumentSearchOptions = {
  uris?: string[];      // Array of URIs to filter by
  types?: string[];     // Array of document types to filter by
  meta?: MetaCondition; // Metadata filtering conditions
  limit?: number;       // Result limit (not set by DSL)
  offset?: number;      // Result offset (not set by DSL)
};
```

**Throws**: Error with descriptive message if parsing fails

### Types

#### MetaCondition

```typescript
type MetaCondition =
  | { type: 'and'; conditions: MetaCondition[] }
  | { type: 'or'; conditions: MetaCondition[] }
  | MetaFilter;
```

#### MetaFilter

```typescript
type MetaFilter =
  | MetaNumberFilter
  | MetaTextFilter
  | MetaBoolFilter;

type MetaNumberFilter = {
  type: 'number';
  field: string;
  filter: {
    gt?: number;    // Greater than
    gte?: number;   // Greater than or equal
    lt?: number;    // Less than
    lte?: number;   // Less than or equal
    eq?: number;    // Equal
    neq?: number;   // Not equal
    nill?: boolean; // Is null/undefined
  };
};

type MetaTextFilter = {
  type: 'text';
  field: string;
  filter: {
    eq?: string;    // Equal
    neq?: string;   // Not equal
    like?: string;  // Pattern match
    nlike?: string; // Negative pattern match
    nill?: boolean; // Is null/undefined
  };
};

type MetaBoolFilter = {
  type: 'bool';
  field: string;
  filter: {
    eq: boolean;    // Equal (required)
    nill?: boolean; // Is null/undefined
  };
};
```

### Usage Example

```typescript
import { parseDSL } from './documents.dsl';

// Parse a query
const options = parseDSL('uri in ["doc-1", "doc-2"] and meta.priority >= 5');

// Result:
// {
//   uris: ['doc-1', 'doc-2'],
//   meta: {
//     type: 'number',
//     field: 'priority',
//     filter: { gte: 5 }
//   }
// }

// Use with document search
const results = await searchDocuments(options);
```

## Best Practices

1. **Use quotes for strings**: Always wrap string values in quotes to avoid parsing issues
2. **Group complex conditions**: Use parentheses to make complex boolean logic clear
3. **Choose appropriate operators**: Use `like` for pattern matching, `in` for multiple values
4. **Consider performance**: Simpler queries with fewer conditions perform better
5. **Handle errors gracefully**: Wrap DSL parsing in try-catch blocks in production code

## Limitations

1. **Field restrictions**: Only `uri`, `type`, and `meta.*` fields are supported
2. **Operator compatibility**: Not all operators work with all data types
3. **No nested metadata**: Metadata fields must be flat (no `meta.nested.field`)
4. **Case sensitivity**: Field names and operators are case-sensitive
5. **No functions**: No support for functions like `date()`, `count()`, etc.
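
Because the DSL has no `date()` function, timestamps have to be computed before the query string is built. The sketch below shows one way to do that, assuming metadata stores dates as Unix seconds (as in the Date/Time Queries examples). `toUnixSeconds` and `dateRangeQuery` are hypothetical helpers, not part of the DSL module, and the field-name pattern is an assumption, since this document does not define the exact grammar for metadata field names.

```typescript
// Hypothetical helpers, not part of the DSL module.

/** Convert a Date to whole Unix seconds (assumes seconds, not milliseconds, are stored). */
function toUnixSeconds(date: Date): number {
  return Math.floor(date.getTime() / 1000);
}

/** Build a half-open range query: from <= meta.<field> < to. */
function dateRangeQuery(field: string, from: Date, to: Date): string {
  // Metadata fields must be flat, so reject dots and other unexpected characters.
  // The identifier pattern itself is an assumption.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(field)) {
    throw new Error(`Invalid metadata field name: ${field}`);
  }
  return `meta.${field} >= ${toUnixSeconds(from)} and meta.${field} < ${toUnixSeconds(to)}`;
}

const query = dateRangeQuery(
  "created",
  new Date("2022-01-01T00:00:00Z"),
  new Date("2023-01-01T00:00:00Z"),
);
console.log(query);
// meta.created >= 1640995200 and meta.created < 1672531200
```

The resulting string can then be passed to `parseDSL` like any hand-written query. Using a half-open range (`>=` start, `<` end) keeps adjacent ranges from overlapping.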