add DSL documentation

Morten Olsen
2025-09-09 18:48:55 +02:00
parent d5b6a8269b
commit fa49d7bc63

docs/query-dsl.md

# Query DSL Documentation
The Query DSL (Domain Specific Language) provides a powerful and flexible way to query documents in the system. It allows you to construct complex queries using a SQL-like syntax with support for filtering by URI, document type, and metadata fields.
## Table of Contents
- [Overview](#overview)
- [Syntax Overview](#syntax-overview)
- [Field Types](#field-types)
- [Operators](#operators)
- [Logical Operations](#logical-operations)
- [Data Types](#data-types)
- [Examples](#examples)
- [Error Handling](#error-handling)
- [API Reference](#api-reference)
- [Best Practices](#best-practices)
- [Limitations](#limitations)
## Overview
The Query DSL parses human-readable query strings and converts them into `DocumentSearchOptions` objects that can be used to search documents. It supports:
- **URI filtering**: Filter documents by their URIs
- **Type filtering**: Filter documents by their type
- **Metadata filtering**: Filter documents by their metadata fields with type-aware operations
- **Logical operations**: Combine conditions using AND/OR logic
- **Parenthetical grouping**: Group conditions for complex boolean logic
- **Multiple data types**: String, number, boolean values with appropriate operators
## Syntax Overview
The basic syntax follows this pattern:
```
field operator value [logical_operator field operator value...]
```
### Basic Examples
```
uri = "doc-123"
type = "article"
meta.priority = 5
meta.title like "%search%"
```
### Complex Examples
```
uri in ["doc-1", "doc-2"] and meta.priority >= 5
(meta.published = true or meta.draft = false) and type = "article"
```
## Field Types
### 1. URI Fields
Filter documents by their URI (Uniform Resource Identifier).

**Syntax**: `uri`

**Supported Operations**:
- Equality: `uri = "document-id"`
- Array membership: `uri in ["doc-1", "doc-2", "doc-3"]`
### 2. Type Fields
Filter documents by their type.

**Syntax**: `type`

**Supported Operations**:
- Equality: `type = "article"`
- Array membership: `type in ["article", "blog", "news"]`
### 3. Metadata Fields
Filter documents by their metadata fields. Metadata fields are accessed using dot notation.

**Syntax**: `meta.fieldName`

**Examples**:
- `meta.title`
- `meta.priority`
- `meta.published`
- `meta.created_at`
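
The field rules above, together with the validation rules under Error Handling, can be pictured as a small classifier. This is a minimal sketch rather than the parser's actual code: `FieldRef` and `classifyField` are hypothetical names, the error text only imitates the documented "Unknown field" message, and rejecting `meta.a.b` reflects the "no nested metadata" limitation.

```typescript
// Hypothetical representation of a resolved field reference.
type FieldRef =
  | { kind: "uri" }
  | { kind: "type" }
  | { kind: "meta"; key: string };

// Hypothetical helper: accept `uri`, `type`, and flat `meta.<name>` only.
function classifyField(fieldName: string): FieldRef {
  if (fieldName === "uri") return { kind: "uri" };
  if (fieldName === "type") return { kind: "type" };
  if (fieldName.startsWith("meta.")) {
    const key = fieldName.slice("meta.".length);
    // Reject an empty key (`meta.`) and nested paths such as `meta.nested.field`.
    if (key.length > 0 && !key.includes(".")) return { kind: "meta", key };
  }
  throw new Error(`Unknown field '${fieldName}'`);
}

console.log(classifyField("meta.created_at")); // { kind: 'meta', key: 'created_at' }
```

A closed union like `FieldRef` lets later steps switch on `kind` and have the compiler flag any field kind they forget to handle.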
## Operators
### Comparison Operators
| Operator | Description | Example |
|----------|-------------|---------|
| `=` | Equals | `meta.priority = 5` |
| `!=` | Not equals | `meta.status != "archived"` |
| `>` | Greater than | `meta.score > 8.5` |
| `>=` | Greater than or equal | `meta.priority >= 5` |
| `<` | Less than | `meta.age < 30` |
| `<=` | Less than or equal | `meta.count <= 100` |
| `like` | Pattern matching (SQL-like) | `meta.title like "%test%"` |
| `not like` | Negative pattern matching | `meta.title not like "%draft%"` |
| `in` | Array membership (`uri` and `type` only) | `uri in ["a", "b", "c"]` |
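
For metadata fields, each comparison operator corresponds to one key of the filter objects described under API Reference (`eq`, `neq`, `gt`, and so on). The sketch below spells out that correspondence, assuming the filter kind has already been chosen from the value's type. `ComparisonOp`, `FilterKind`, and `filterKeyFor` are hypothetical names, and the real parser may organize this differently.

```typescript
type ComparisonOp = "=" | "!=" | ">" | ">=" | "<" | "<=" | "like" | "not like";
type FilterKind = "number" | "text" | "bool";

// Operator -> filter key, per filter kind, following the MetaNumberFilter,
// MetaTextFilter, and MetaBoolFilter shapes in the API Reference.
const FILTER_KEYS: Record<FilterKind, Partial<Record<ComparisonOp, string>>> = {
  number: { "=": "eq", "!=": "neq", ">": "gt", ">=": "gte", "<": "lt", "<=": "lte" },
  text: { "=": "eq", "!=": "neq", like: "like", "not like": "nlike" },
  bool: { "=": "eq" },
};

// Returns the filter key, or throws an error shaped like the documented
// "Unsupported operator '>' for text field" message.
function filterKeyFor(kind: FilterKind, op: ComparisonOp): string {
  const key = FILTER_KEYS[kind][op];
  if (key === undefined) {
    throw new Error(`Unsupported operator '${op}' for ${kind} field`);
  }
  return key;
}

console.log(filterKeyFor("number", ">=")); // gte
console.log(filterKeyFor("text", "not like")); // nlike
```

`in` is left out because it applies only to `uri` and `type`, which fill the top-level `uris` and `types` arrays rather than a metadata filter. The sketch accepts only `=` for booleans because `MetaBoolFilter` has no `neq` key. How the real parser treats `!=` on a boolean is not specified in this document.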
### Logical Operators
| Operator | Syntax | Description | Example |
|----------|---------|-------------|---------|
| AND | `and` or `&` | Logical AND | `meta.a = 1 and meta.b = 2` |
| OR | `or` or `\|` | Logical OR | `meta.urgent = true or meta.priority > 8` |
## Logical Operations
### AND Operations
Combine conditions where **both** must be true:
```
meta.published = true and meta.priority >= 5
meta.created >= 1234567890 & meta.updated < 1234567999
```
### OR Operations
Combine conditions where **at least one** must be true:
```
type = "article" or type = "blog"
meta.urgent = true | meta.priority > 9
```
### Parenthetical Grouping
Use parentheses to control the order of operations:
```
(meta.priority > 5 or meta.urgent = true) and type = "article"
((meta.a = 1 and meta.b = 2) or meta.c = 3) and uri = "test"
```
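
Parentheses become nested `and`/`or` nodes in a condition tree. The sketch below parses a simplified token stream, where each condition is an opaque string such as `"A"`, to show how grouping changes the tree. It assumes `and` binds tighter than `or`, as in SQL. This document does not state the DSL's precedence, so parentheses are the reliable way to express intent when mixing the two. `BoolNode`, `parseBool`, and `show` are hypothetical names, and error positions here are token indexes.

```typescript
// A condition tree: leaves are opaque conditions, inner nodes are and/or.
type BoolNode =
  | { type: "cond"; text: string }
  | { type: "and" | "or"; conditions: BoolNode[] };

// Recursive-descent parser over tokens such as ["(", "A", "or", "B", ")", "and", "C"].
// Assumption: `and` (or `&`) binds tighter than `or` (or `|`).
function parseBool(tokens: string[]): BoolNode {
  let pos = 0;
  const peek = () => tokens[pos];

  function parseOr(): BoolNode {
    const parts = [parseAnd()];
    while (peek() === "or" || peek() === "|") { pos++; parts.push(parseAnd()); }
    return parts.length === 1 ? parts[0] : { type: "or", conditions: parts };
  }

  function parseAnd(): BoolNode {
    const parts = [parsePrimary()];
    while (peek() === "and" || peek() === "&") { pos++; parts.push(parsePrimary()); }
    return parts.length === 1 ? parts[0] : { type: "and", conditions: parts };
  }

  function parsePrimary(): BoolNode {
    const tok = tokens[pos++];
    if (tok === "(") {
      const inner = parseOr();
      if (tokens[pos++] !== ")") throw new Error(`Expected RPAREN at position ${pos - 1}`);
      return inner;
    }
    if (tok === undefined || [")", "and", "or", "&", "|"].includes(tok)) {
      throw new Error(`Unexpected token at position ${pos - 1}`);
    }
    return { type: "cond", text: tok };
  }

  const tree = parseOr();
  if (pos < tokens.length) throw new Error(`Unexpected token at position ${pos}`);
  return tree;
}

// Render a tree with explicit parentheses around every and/or node.
function show(node: BoolNode): string {
  if (node.type === "cond") return node.text;
  return `(${node.conditions.map(show).join(` ${node.type} `)})`;
}

console.log(show(parseBool(["A", "or", "B", "and", "C"])));          // (A or (B and C))
console.log(show(parseBool(["(", "A", "or", "B", ")", "and", "C"]))); // ((A or B) and C)
```

Under this assumed precedence, the two inputs differ only in their parentheses but produce different trees.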
## Data Types
The DSL automatically detects and handles different data types based on the value syntax:
### Strings
Enclosed in double quotes (`"`) or single quotes (`'`):
```
meta.title = "Article Title"
meta.status = 'published'
```
**Escape Sequences**:
- `\"` or `\'` - Quote characters
- `\\` - Backslash
- `\n` - Newline
- `\t` - Tab
- `\r` - Carriage return
Example:
```
meta.content = "String with \"quotes\" and \n newlines"
```
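
Decoding a quoted literal means stripping the surrounding quotes and replacing each escape sequence listed above. The following is a minimal sketch under that assumption. `unquote` is a hypothetical helper, error positions are assumed to be 0-based character offsets within the literal, and because this document does not say how an unrecognized escape such as `\q` is handled, this version rejects it.

```typescript
// Escape sequences documented for string literals.
const ESCAPES: { [ch: string]: string | undefined } = {
  '"': '"', "'": "'", "\\": "\\", n: "\n", t: "\t", r: "\r",
};

// Decode a quoted literal (including its surrounding quotes) into its value.
function unquote(literal: string): string {
  const quote = literal[0];
  if (quote !== '"' && quote !== "'") throw new Error("Expected a quoted string");
  let out = "";
  for (let i = 1; i < literal.length; i++) {
    const ch = literal[i];
    if (ch === quote) {
      // An unescaped closing quote must be the last character.
      if (i !== literal.length - 1) throw new Error(`Unexpected text after string at position ${i + 1}`);
      return out;
    }
    if (ch === "\\") {
      const decoded = ESCAPES[literal[i + 1]];
      if (decoded === undefined) throw new Error(`Invalid escape sequence at position ${i}`);
      out += decoded;
      i++; // skip the escaped character
    } else {
      out += ch;
    }
  }
  throw new Error("Unterminated string at position 0");
}
```

Scanning for the first unescaped closing quote, rather than only checking the last character, correctly reports `"abc\"` as unterminated.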
### Numbers
Integer or floating-point numbers, including negative values:
```
meta.priority = 5
meta.score = 8.75
meta.balance = -150.50
```
### Booleans
Case-insensitive `true` or `false`:
```
meta.published = true
meta.archived = false
meta.draft = True
```
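
Because the filter kind is inferred from how a value is written, an unquoted value can be classified with two checks. This sketch assumes numbers take only the forms shown above: an optional leading minus and an optional fractional part. `classifyLiteral` is a hypothetical helper, and the real lexer may accept more forms (such as exponents) or fewer.

```typescript
type Literal =
  | { kind: "number"; value: number }
  | { kind: "bool"; value: boolean };

// Classify an unquoted value. Quoted strings are handled separately.
function classifyLiteral(token: string): Literal {
  // Booleans are case-insensitive: true, True, TRUE, ...
  const lower = token.toLowerCase();
  if (lower === "true" || lower === "false") {
    return { kind: "bool", value: lower === "true" };
  }
  // Integers or decimals, optionally negative: 5, 8.75, -150.50
  if (/^-?\d+(\.\d+)?$/.test(token)) {
    return { kind: "number", value: Number(token) };
  }
  throw new Error(`Unexpected value '${token}'`);
}
```

The boolean check runs first, so only digit-shaped values reach `Number()`, which would otherwise return `0` for an empty string.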
### Arrays
Square bracket notation for array values (used with the `in` operator):
```
uri in ["doc-1", "doc-2", "doc-3"]
type in ["article", "blog", "news"]
```
Empty arrays are supported:
```
uri in []
```
## Examples
### Basic Filtering
**Filter by URI**:
```
uri = "my-document"
```
**Filter by multiple URIs**:
```
uri in ["doc-1", "doc-2", "doc-3"]
```
**Filter by document type**:
```
type = "article"
```
**Filter by metadata**:
```
meta.priority = 5
meta.title = "Important Document"
meta.published = true
```
### Comparison Operations
**Numeric comparisons**:
```
meta.priority > 5
meta.score >= 8.0
meta.count < 100
meta.rating <= 4.5
meta.value != 0
```
**Text operations**:
```
meta.title like "%report%"
meta.category != "archived"
meta.description not like "%draft%"
```
### Logical Combinations
**Simple AND**:
```
type = "article" and meta.published = true
```
**Simple OR**:
```
meta.urgent = true or meta.priority > 8
```
**Mixed field types**:
```
uri in ["doc-1", "doc-2"] and meta.priority >= 5
```
### Complex Queries
**Grouped conditions**:
```
(meta.priority > 5 or meta.urgent = true) and type = "article"
```
**Nested grouping**:
```
((meta.a = 1 and meta.b = 2) or meta.c = 3) and type = "test"
```
**Real-world example**:
```
uri in ["article-1", "article-2"] and (meta.created >= 1640995200 and meta.created < 1672531200 or (type = "urgent" and meta.status != "archived"))
```
### Date/Time Queries
Dates are typically stored as Unix timestamps (numbers), so date ranges are expressed with numeric comparisons:
```
meta.created >= 1640995200
meta.updated > 1672531200 and meta.updated < 1675209600
```
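
To build such a range from calendar dates, convert them to Unix seconds first. JavaScript's `Date.prototype.getTime()` returns milliseconds, so divide by 1000. The helpers `unixSeconds` and `createdInRange` are hypothetical, and the sketch assumes `meta.created` stores seconds, as the values above suggest.

```typescript
// Convert a Date to whole Unix seconds (getTime() returns milliseconds).
function unixSeconds(date: Date): number {
  return Math.floor(date.getTime() / 1000);
}

// Build a half-open range condition: from <= meta.created < to.
function createdInRange(from: Date, to: Date): string {
  return `meta.created >= ${unixSeconds(from)} and meta.created < ${unixSeconds(to)}`;
}

const query = createdInRange(new Date(Date.UTC(2022, 0, 1)), new Date(Date.UTC(2023, 0, 1)));
console.log(query); // meta.created >= 1640995200 and meta.created < 1672531200
```

Building the dates with `Date.UTC` prevents local time-zone offsets from shifting the boundaries. The output is the calendar-year-2022 range used in the real-world example above.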
### String Pattern Matching
**Contains pattern**:
```
meta.title like "%search term%"
```
**Starts with pattern**:
```
meta.filename like "report_%"
```
**Ends with pattern**:
```
meta.extension like "%.pdf"
```
**Exclude pattern**:
```
meta.title not like "%draft%"
```
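
The patterns above use SQL `LIKE` wildcards: `%` matches any run of characters, including none. In standard SQL, `_` is also a wildcard that matches exactly one character, so `report_%` matches `report_2024` but also `reportA`. This document does not say whether this system's backend follows that rule. The sketch below emulates standard `LIKE` matching in memory to make the semantics concrete. It is case-sensitive and has no escape character (databases differ on both), and `likeToRegExp` and `matchesLike` are hypothetical helpers, not part of the DSL.

```typescript
// Translate a SQL-style LIKE pattern into an anchored regular expression.
// % -> any run of characters (possibly empty), _ -> exactly one character.
function likeToRegExp(pattern: string): RegExp {
  let source = "";
  for (const ch of pattern) {
    if (ch === "%") source += "[\\s\\S]*";
    else if (ch === "_") source += "[\\s\\S]";
    else source += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
  }
  return new RegExp(`^${source}$`);
}

function matchesLike(value: string, pattern: string): boolean {
  return likeToRegExp(pattern).test(value);
}

console.log(matchesLike("report_2024.pdf", "report_%")); // true
console.log(matchesLike("reportA", "report_%"));         // true: _ matches "A"
console.log(matchesLike("summary.pdf", "%.pdf"));        // true
```

`not like` is the negation: `!matchesLike(value, pattern)`.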
## Error Handling
The DSL parser provides detailed error messages for various error conditions:
### Syntax Errors
**Unterminated string**:
```
meta.title = "unterminated string
// Error: Unterminated string at position X
```
**Missing parentheses**:
```
(meta.a = 1 and meta.b = 2
// Error: Expected RPAREN at position X
```
**Invalid characters**:
```
meta.field = test@invalid
// Error: Unexpected character '@' at position X
```
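
Syntax errors like these are typically raised while splitting the query into tokens. Below is a minimal tokenizer sketch that reports errors in the same style. It takes `position` to be a 0-based character offset, which is an assumption, since the real parser's offsets may differ. Token names other than `LPAREN`/`RPAREN` are hypothetical, and string escapes are skipped over rather than decoded.

```typescript
type TokenType =
  | "IDENT" | "NUMBER" | "STRING" | "OP"
  | "LPAREN" | "RPAREN" | "LBRACKET" | "RBRACKET" | "COMMA";

interface Token {
  type: TokenType;
  text: string;
  position: number; // 0-based character offset (assumed)
}

const SINGLE_CHAR = new Map<string, TokenType>([
  ["(", "LPAREN"], [")", "RPAREN"], ["[", "LBRACKET"], ["]", "RBRACKET"], [",", "COMMA"],
]);

// Sticky (`y`) regexes match only at lastIndex, so each is tried at the current offset.
const PATTERNS: Array<[TokenType, RegExp]> = [
  ["NUMBER", /-?\d+(?:\.\d+)?/y],
  ["IDENT", /[A-Za-z_][A-Za-z0-9_.]*/y], // fields, and/or/like/not/in, true/false
  ["OP", /!=|>=|<=|=|>|<|&|\|/y],
];

function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  scan: while (i < input.length) {
    const ch = input[i];
    if (/\s/.test(ch)) { i++; continue; }

    const single = SINGLE_CHAR.get(ch);
    if (single !== undefined) {
      tokens.push({ type: single, text: ch, position: i });
      i++;
      continue;
    }

    if (ch === '"' || ch === "'") {
      let j = i + 1;
      // Skip to the matching quote; a backslash skips the character after it.
      while (j < input.length && input[j] !== ch) j += input[j] === "\\" ? 2 : 1;
      if (j >= input.length) throw new Error(`Unterminated string at position ${i}`);
      tokens.push({ type: "STRING", text: input.slice(i, j + 1), position: i });
      i = j + 1;
      continue;
    }

    for (const [type, re] of PATTERNS) {
      re.lastIndex = i;
      const m = re.exec(input);
      if (m) {
        tokens.push({ type, text: m[0], position: i });
        i = re.lastIndex;
        continue scan;
      }
    }
    throw new Error(`Unexpected character '${ch}' at position ${i}`);
  }
  return tokens;
}
```

With these assumptions, `tokenize("meta.field = test@invalid")` throws `Unexpected character '@' at position 17`.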
### Semantic Errors
**Unknown fields**:
```
unknown_field = "value"
// Error: Unknown field 'unknown_field' at position X
```
**Unsupported operators**:
```
uri like "pattern"
// Error: Unsupported operator 'like' for uri field
```
**Type mismatches** (the filter type is inferred from the value, so a quoted value always produces a text filter):
```
meta.numeric_field > "string_value"
// Error: Unsupported operator '>' for text field
```
### Validation
- Field names are validated (only `uri`, `type`, and `meta.*` are allowed)
- Operators are validated based on field type and data type
- Syntax is strictly enforced
## API Reference
### Function: `parseDSL(query: string): DocumentSearchOptions`
Parses a DSL query string into a `DocumentSearchOptions` object.
**Parameters**:
- `query` (string): The DSL query string to parse

**Returns**: `DocumentSearchOptions` object with the following structure:
```typescript
type DocumentSearchOptions = {
  uris?: string[];      // Array of URIs to filter by
  types?: string[];     // Array of document types to filter by
  meta?: MetaCondition; // Metadata filtering conditions
  limit?: number;       // Result limit (not set by DSL)
  offset?: number;      // Result offset (not set by DSL)
};
```
**Throws**: Error with descriptive message if parsing fails
### Types
#### MetaCondition
```typescript
type MetaCondition =
  | { type: 'and'; conditions: MetaCondition[] }
  | { type: 'or'; conditions: MetaCondition[] }
  | MetaFilter;
```
#### MetaFilter
```typescript
type MetaFilter =
  | MetaNumberFilter
  | MetaTextFilter
  | MetaBoolFilter;

type MetaNumberFilter = {
  type: 'number';
  field: string;
  filter: {
    gt?: number;    // Greater than
    gte?: number;   // Greater than or equal
    lt?: number;    // Less than
    lte?: number;   // Less than or equal
    eq?: number;    // Equal
    neq?: number;   // Not equal
    nill?: boolean; // Is null/undefined
  };
};

type MetaTextFilter = {
  type: 'text';
  field: string;
  filter: {
    eq?: string;    // Equal
    neq?: string;   // Not equal
    like?: string;  // Pattern match
    nlike?: string; // Negative pattern match
    nill?: boolean; // Is null/undefined
  };
};

type MetaBoolFilter = {
  type: 'bool';
  field: string;
  filter: {
    eq: boolean;    // Equal (required)
    nill?: boolean; // Is null/undefined
  };
};
```
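
To see how these types combine, here is a sketch of an in-memory matcher that checks one metadata record against a `MetaCondition`. It is illustrative only, and several of its rules are this sketch's own assumptions rather than documented behavior:

- Every key present in a `filter` must hold.
- `nill: true` means the field is absent or `null`, and `nill: false` means it is present.
- `like` follows SQL-style `%`/`_` wildcards, case-sensitively.

The document search backend may evaluate conditions differently, and `matchesMeta` and `sqlLike` are hypothetical names.

```typescript
type MetaNumberFilter = {
  type: "number"; field: string;
  filter: { gt?: number; gte?: number; lt?: number; lte?: number; eq?: number; neq?: number; nill?: boolean };
};
type MetaTextFilter = {
  type: "text"; field: string;
  filter: { eq?: string; neq?: string; like?: string; nlike?: string; nill?: boolean };
};
type MetaBoolFilter = { type: "bool"; field: string; filter: { eq: boolean; nill?: boolean } };
type MetaCondition =
  | { type: "and"; conditions: MetaCondition[] }
  | { type: "or"; conditions: MetaCondition[] }
  | MetaNumberFilter | MetaTextFilter | MetaBoolFilter;

const isMissing = (v: unknown): boolean => v === undefined || v === null;

// SQL-style LIKE (assumed): % = any run of characters, _ = exactly one character.
function sqlLike(value: string, pattern: string): boolean {
  let source = "";
  for (const ch of pattern) {
    source += ch === "%" ? "[\\s\\S]*" : ch === "_" ? "[\\s\\S]" : ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  }
  return new RegExp(`^${source}$`).test(value);
}

function matchesMeta(cond: MetaCondition, meta: Record<string, unknown>): boolean {
  switch (cond.type) {
    case "and":
      return cond.conditions.every((c) => matchesMeta(c, meta));
    case "or":
      return cond.conditions.some((c) => matchesMeta(c, meta));
    case "number": {
      const v = meta[cond.field];
      const { gt, gte, lt, lte, eq, neq, nill } = cond.filter;
      if (typeof v !== "number") return nill === true && isMissing(v);
      return nill !== true
        && (gt === undefined || v > gt) && (gte === undefined || v >= gte)
        && (lt === undefined || v < lt) && (lte === undefined || v <= lte)
        && (eq === undefined || v === eq) && (neq === undefined || v !== neq);
    }
    case "text": {
      const v = meta[cond.field];
      const { eq, neq, like: pattern, nlike, nill } = cond.filter;
      if (typeof v !== "string") return nill === true && isMissing(v);
      return nill !== true
        && (eq === undefined || v === eq) && (neq === undefined || v !== neq)
        && (pattern === undefined || sqlLike(v, pattern))
        && (nlike === undefined || !sqlLike(v, nlike));
    }
    case "bool": {
      const v = meta[cond.field];
      if (typeof v !== "boolean") return cond.filter.nill === true && isMissing(v);
      return cond.filter.nill !== true && v === cond.filter.eq;
    }
    default: {
      // Exhaustiveness check: fails to compile if a condition type is unhandled.
      const unhandled: never = cond;
      throw new Error(`Unhandled condition: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

Recursing on `and`/`or` nodes with `every`/`some` mirrors the "both must be true" and "at least one must be true" rules from Logical Operations.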
### Usage Example
```typescript
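import { parseDSL } from './documents.dsl';

// Parse a query
const options = parseDSL('uri in ["doc-1", "doc-2"] and meta.priority >= 5');
// Result:
// {
//   uris: ['doc-1', 'doc-2'],
//   meta: {
//     type: 'number',
//     field: 'priority',
//     filter: { gte: 5 }
//   }
// }

// Use with document search. `searchDocuments` is not imported here; substitute
// the search function in your code that accepts DocumentSearchOptions.
// Top-level `await` requires an ES module context.
const results = await searchDocuments(options);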
```
## Best Practices
1. **Use quotes for strings**: Always wrap string values in quotes to avoid parsing issues
2. **Group complex conditions**: Use parentheses to make complex boolean logic clear
3. **Choose appropriate operators**: Use `like` for pattern matching, `in` for multiple values
4. **Consider performance**: Simpler queries with fewer conditions generally perform better
5. **Handle errors gracefully**: Wrap DSL parsing in try-catch blocks in production code
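
Practice 5 can be packaged as a small wrapper that turns a thrown parse error into a value the caller must inspect. The sketch is generic over the parser, so it assumes nothing about `parseDSL` beyond "returns a value or throws". `safeParse`, `ParseResult`, and the stand-in parser `strictNumber` are hypothetical names.

```typescript
type ParseResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

// Run a parser that may throw (such as parseDSL) and capture failures as data.
function safeParse<T>(parse: (query: string) => T, query: string): ParseResult<T> {
  try {
    return { ok: true, value: parse(query) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Example with a stand-in parser; in real code you would pass parseDSL.
const strictNumber = (q: string): number => {
  if (!/^\d+$/.test(q)) throw new Error(`Not a number: '${q}'`);
  return Number(q);
};
const good = safeParse(strictNumber, "42");
const bad = safeParse(strictNumber, "4x2");
```

With `parseDSL`, a failed result can be reported to the user instead of crashing the request. For syntax errors, the message includes the position of the problem.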
## Limitations
1. **Field restrictions**: Only `uri`, `type`, and `meta.*` fields are supported
2. **Operator compatibility**: Not all operators work with all data types
3. **No nested metadata**: Metadata fields must be flat (no `meta.nested.field`)
4. **Case sensitivity**: Field names and operators are case-sensitive
5. **No functions**: No support for functions like `date()`, `count()`, etc.