stash/packages/query-dsl/docs/query-language.md
Morten Olsen f9494c88e2 update
2025-12-10 09:11:03 +01:00


# Query Language Specification
This document describes the SQL-like query language syntax for building database queries. The language supports filtering on both text and numeric fields, including nested JSON fields, with logical operators for complex queries.
## Overview
The query language provides a human-readable, SQL-like syntax that can be parsed into the internal JSON query format used by the system. It supports:
- Text field conditions (equality, pattern matching, membership)
- Numeric field conditions (comparison operators, membership)
- Nested JSON field access using dot notation
- Logical operators (AND, OR) with grouping
- NULL value checks
## Syntax
### Field References
Fields are referenced using dot notation for nested JSON paths:
```
field_name
metadata.foo
metadata.nested.deep.field
```
**Examples:**
- `content` - top-level field
- `metadata.author` - nested field in metadata object
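- `metadata.tags.0` - array element (only if the implementation accepts numeric path segments; the `identifier` rule in the grammar below requires a leading letter or underscore)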
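As a minimal sketch of how a dotted reference could be turned into a path, the function below splits on `.` and validates each segment. The name `split_field` is hypothetical, and the segment pattern is copied from the `identifier` rule in the grammar section, so under this strict reading a numeric segment such as `0` is rejected; a real implementation may choose to relax that.

```python
import re

# Assumed identifier shape, taken from the grammar section of this document.
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def split_field(reference: str) -> list:
    """Split a dotted field reference into its path segments (hypothetical helper)."""
    segments = reference.split(".")
    for segment in segments:
        if not IDENTIFIER.fullmatch(segment):
            raise ValueError(f"invalid field segment {segment!r} in {reference!r}")
    return segments


print(split_field("metadata.nested.deep.field"))
# → ['metadata', 'nested', 'deep', 'field']
```

The resulting list has the same shape as the `field` arrays shown in the JSON format later in this document.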
### Text Conditions
Text conditions operate on string values:
| Operator | Syntax | Description |
|----------|--------|-------------|
| Equality | `field = 'value'` | Exact match |
| Inequality | `field != 'value'` | Not equal |
| NULL check | `field IS NULL` | Field is null |
| NOT NULL | `field IS NOT NULL` | Field is not null |
| Pattern match | `field LIKE 'pattern'` | SQL LIKE pattern matching |
| Not like | `field NOT LIKE 'pattern'` | Negated pattern matching |
| In list | `field IN ('val1', 'val2', 'val3')` | Value in list |
| Not in list | `field NOT IN ('val1', 'val2')` | Value not in list |
**String Literals:**
- Single quotes: `'value'`
- Escaped quotes: `'O''Brien'` (double single quote)
- Empty string: `''`
**LIKE Patterns:**
- `%` matches any sequence of characters
- `_` matches any single character
- Examples: `'%cat%'`, `'test_%'`, `'exact'`
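To make the wildcard semantics concrete, here is a small sketch that translates a LIKE pattern into an anchored regular expression. The helper names `like_to_regex` and `like` are illustrative only, and the sketch assumes case-sensitive matching with no escape character for a literal `%` or `_`; the actual backend may behave differently.

```python
import re


def like_to_regex(pattern: str):
    """Translate a LIKE pattern into a compiled regular expression."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")   # any sequence of characters, including none
        elif char == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(char))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True when the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("concatenate", "%cat%"))  # → True
print(like("test", "test_%"))        # → False (the `_` needs one more character)
```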
**Examples:**
```sql
content = 'hello world'
metadata.foo = 'bar'
type != 'draft'
source IS NULL
title LIKE '%cat%'
author NOT LIKE '%admin%'
status IN ('published', 'archived')
category NOT IN ('deleted', 'hidden')
```
### Numeric Conditions
Numeric conditions operate on number values:
| Operator | Syntax | Description |
|----------|--------|-------------|
| Equality | `field = 123` | Exact match |
| Inequality | `field != 123` | Not equal |
| NULL check | `field IS NULL` | Field is null |
| NOT NULL | `field IS NOT NULL` | Field is not null |
| Greater than | `field > 10` | Greater than |
| Greater or equal | `field >= 10` | Greater than or equal |
| Less than | `field < 10` | Less than |
| Less or equal | `field <= 10` | Less than or equal |
| In list | `field IN (1, 2, 3)` | Value in list |
| Not in list | `field NOT IN (1, 2, 3)` | Value not in list |
**Numeric Literals:**
- Integers: `123`, `-45`, `0`
- Decimals: `123.45`, `-0.5`, `3.14159`
- Scientific notation: `1e10`, `2.5e-3` (if supported)
**Examples:**
```sql
typeVersion = 1
score > 0.5
views >= 100
priority < 5
age <= 65
rating IN (1, 2, 3, 4, 5)
count NOT IN (0, -1)
```
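The numeric literal forms listed above can be recognized with a single regular expression. The sketch below is one possible reading, not the parser's actual code: `parse_number` is a hypothetical name, the optional leading minus follows the literal examples (`-45`, `-0.5`), and scientific notation is accepted on the assumption that it is supported.

```python
import re

# Optional sign, digits, optional fraction, optional exponent.
NUMBER = re.compile(r"-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?")
INTEGER = re.compile(r"-?[0-9]+")


def parse_number(token: str):
    """Convert a numeric literal to int or float, rejecting anything else."""
    if not NUMBER.fullmatch(token):
        raise ValueError(f"invalid numeric literal: {token!r}")
    # Plain digit runs stay integers; fractions and exponents become floats.
    if INTEGER.fullmatch(token):
        return int(token)
    return float(token)


print(parse_number("-45"))     # → -45
print(parse_number("3.14159")) # → 3.14159
```

Forms such as `.5` or `1.` are rejected here because the pattern requires digits on both sides of the decimal point.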
### Logical Operators
Combine conditions using `AND` and `OR` operators:
| Operator | Syntax | Description |
|----------|--------|-------------|
| AND | `condition1 AND condition2` | Both conditions must be true |
| OR | `condition1 OR condition2` | At least one condition must be true |
**Grouping:**
Use parentheses `()` to group conditions and control operator precedence:
```sql
(condition1 AND condition2) OR condition3
condition1 AND (condition2 OR condition3)
```
**Examples:**
```sql
type = 'article' AND status = 'published'
metadata.foo = 'bar' OR metadata.foo = 'baz'
(type = 'post' OR type = 'page') AND views > 100
```
### Operator Precedence
1. Parentheses `()` - highest precedence
2. `AND` - evaluated before OR
3. `OR` - lowest precedence
**Examples:**
```sql
-- Equivalent to: (A AND B) OR C
A AND B OR C
-- Parentheses override the default: B OR C is evaluated first
A AND (B OR C)
-- Explicit grouping
(A OR B) AND (C OR D)
```
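The precedence rules above map naturally onto a recursive-descent parser with one function per level. The following is a sketch under simplifying assumptions: conditions are reduced to single-word placeholders such as `A` and `B`, the output is a nested tuple rather than the system's JSON format, and `parse` is a hypothetical name.

```python
import re


def parse(text: str):
    """Parse placeholder conditions joined by AND/OR into a nested tuple."""
    tokens = re.findall(r"\(|\)|\w+", text)
    pos = 0

    def peek():
        return tokens[pos].upper() if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():  # lowest precedence
        node = parse_and()
        while peek() == "OR":
            advance()
            node = ("or", node, parse_and())
        return node

    def parse_and():  # binds tighter than OR
        node = parse_primary()
        while peek() == "AND":
            advance()
            node = ("and", node, parse_primary())
        return node

    def parse_primary():  # parentheses or a single condition
        if peek() == "(":
            advance()
            node = parse_or()
            if peek() != ")":
                raise ValueError("expected ')'")
            advance()
            return node
        if peek() in (None, "AND", "OR", ")"):
            raise ValueError("expected a condition")
        return advance()

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse("A AND B OR C"))    # → ('or', ('and', 'A', 'B'), 'C')
print(parse("A AND (B OR C)"))  # → ('and', 'A', ('or', 'B', 'C'))
```

Because `parse_or` calls `parse_and` for its operands, every `AND` chain is fully built before an `OR` can combine it, which is exactly what "AND is evaluated before OR" means.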
## Complete Examples
### Simple Conditions
```sql
-- Text equality
metadata.author = 'John Doe'
-- Numeric comparison
views >= 1000
-- Pattern matching
title LIKE '%tutorial%'
-- NULL check
source IS NULL
```
### Multiple Conditions
```sql
-- AND operator
type = 'article' AND status = 'published' AND views > 100
-- OR operator
category = 'tech' OR category = 'science'
-- Mixed operators
(type = 'post' OR type = 'page') AND status = 'published'
```
### Complex Nested Queries
```sql
-- Nested AND within OR
(metadata.foo = 'bar' AND type = 'demo') OR metadata.foo = 'baz'
-- Multiple levels of nesting
((status = 'active' AND views > 100) OR (status = 'featured' AND views > 50)) AND category = 'news'
-- Complex query with multiple field types
type = 'article' AND (metadata.author = 'John' OR metadata.author = 'Jane') AND views >= 100 AND rating IN (4, 5)
```
### Array/List Operations
```sql
-- Text IN
status IN ('published', 'archived', 'draft')
-- Numeric IN
priority IN (1, 2, 3)
-- NOT IN
category NOT IN ('deleted', 'hidden')
```
## Type Inference
The parser will infer the condition type (text vs number) based on:
1. **Operator context**: Operators like `>`, `<`, `>=`, `<=` imply numeric
2. **Value type**:
   - Quoted strings (`'value'`) → text condition
   - Unquoted numbers (`123`, `45.6`) → numeric condition
   - `NULL` → can be either (context-dependent)
3. **Field name**: If a field is known to be numeric, numeric operators are used
**Examples:**
```sql
-- Text condition (quoted string)
author = 'John'
-- Numeric condition (unquoted number)
age = 30
-- Numeric comparison
score > 0.5
-- Text pattern
title LIKE '%test%'
```
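The first two inference rules can be sketched as a small function that looks at the value token and then checks it against the operator. Everything here is illustrative: `infer_type` is a hypothetical name, `"number"` is an assumed label for the numeric condition type, and the `NULL` and known-field cases are left out because they depend on context the document does not define.

```python
import re

NUMBER = re.compile(r"-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?")
NUMERIC_ONLY = {">", ">=", "<", "<="}
TEXT_ONLY = {"LIKE", "NOT LIKE"}


def infer_type(operator: str, value: str) -> str:
    """Return 'text' or 'number' for a comparison, or raise on a mismatch."""
    if len(value) >= 2 and value.startswith("'") and value.endswith("'"):
        kind = "text"
    elif NUMBER.fullmatch(value):
        kind = "number"
    else:
        raise ValueError(f"cannot infer a type from value {value!r}")

    # Normalize case and inner whitespace so 'not  like' equals 'NOT LIKE'.
    op = " ".join(operator.upper().split())
    if op in NUMERIC_ONLY and kind != "number":
        raise ValueError(f"operator {op} requires a numeric value")
    if op in TEXT_ONLY and kind != "text":
        raise ValueError(f"operator {op} requires a string value")
    return kind


print(infer_type("=", "'John'"))  # → text
print(infer_type(">", "0.5"))     # → number
```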
## Escaping and Special Characters
### String Escaping
- Single quotes in strings: `'O''Brien'` → `O'Brien`
- Empty string: `''`
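A short sketch of the doubled-quote convention, in both directions. The helper names `unquote` and `quote` are hypothetical; the only rule assumed is the one stated above, namely that a single quote inside a literal is written as two single quotes.

```python
def unquote(literal: str) -> str:
    """Turn a quoted literal such as 'O''Brien' into its string value."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError(f"not a quoted string literal: {literal!r}")
    body = literal[1:-1]
    # Once every doubled quote is removed, no lone quote may remain.
    if "'" in body.replace("''", ""):
        raise ValueError(f"unescaped single quote in {literal!r}")
    return body.replace("''", "'")


def quote(value: str) -> str:
    """Turn a string value into a quoted literal, doubling inner quotes."""
    return "'" + value.replace("'", "''") + "'"


print(unquote("'O''Brien'"))  # → O'Brien
print(quote("O'Brien"))       # → 'O''Brien'
```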
### Field Name Escaping
If field names contain special characters or reserved words, they can be quoted (implementation-dependent):
```sql
-- Reserved words or special characters (if supported)
"order" = 'asc'
"metadata.field-name" = 'value'
```
## Error Handling
The parser should provide clear error messages for:
- Invalid syntax
- Mismatched parentheses
- Invalid operators for field types
- Missing values
- Invalid escape sequences
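One way to make such messages actionable is to attach a character position to every error. The sketch below covers only the mismatched-parentheses case; `QuerySyntaxError` and `check_parentheses` are hypothetical names, not part of the existing parser, and parentheses inside string literals are deliberately ignored.

```python
class QuerySyntaxError(ValueError):
    """Syntax error that remembers where in the query it occurred."""

    def __init__(self, message: str, position: int):
        super().__init__(f"{message} at position {position}")
        self.position = position


def check_parentheses(query: str) -> None:
    """Raise QuerySyntaxError if parentheses outside strings do not balance."""
    open_positions = []
    in_string = False
    for index, char in enumerate(query):
        if char == "'":
            # A doubled quote toggles twice, so the state stays "in string".
            in_string = not in_string
        elif in_string:
            continue
        elif char == "(":
            open_positions.append(index)
        elif char == ")":
            if not open_positions:
                raise QuerySyntaxError("unmatched ')'", index)
            open_positions.pop()
    if in_string:
        raise QuerySyntaxError("unterminated string literal", len(query))
    if open_positions:
        raise QuerySyntaxError("unmatched '('", open_positions[-1])


try:
    check_parentheses("(type = 'post' OR type = 'page' AND views > 100")
except QuerySyntaxError as error:
    print(error)  # → unmatched '(' at position 0
```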
## Grammar (BNF-like)
```
query             ::= expression
expression        ::= and_expression ( 'OR' and_expression )*
and_expression    ::= primary ( 'AND' primary )*
primary           ::= condition | '(' expression ')'
condition         ::= text_condition | numeric_condition
text_condition    ::= field ( '=' | '!=' | 'LIKE' | 'NOT LIKE' ) string_literal
                    | field 'IS' ( 'NULL' | 'NOT NULL' )
                    | field 'IN' '(' string_list ')'
                    | field 'NOT IN' '(' string_list ')'
numeric_condition ::= field ( '=' | '!=' | '>' | '>=' | '<' | '<=' ) number
                    | field 'IS' ( 'NULL' | 'NOT NULL' )
                    | field 'IN' '(' number_list ')'
                    | field 'NOT IN' '(' number_list ')'
field             ::= identifier ( '.' identifier )*
identifier        ::= [a-zA-Z_][a-zA-Z0-9_]*
string_literal    ::= "'" ( escaped_char | [^'] )* "'"
escaped_char      ::= "''"
string_list       ::= string_literal ( ',' string_literal )*
number            ::= '-'? [0-9]+ ( '.' [0-9]+ )? ( [eE] [+-]? [0-9]+ )?
number_list       ::= number ( ',' number )*
```
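The terminals in this grammar (`string_literal`, `number`, `identifier`, operators, and punctuation) could be split out by a regex-based tokenizer before parsing. This is a sketch, not the package's implementation: the token kind names and the `tokenize` function are assumptions, a leading minus on numbers follows the numeric literal examples, and keywords are uppercased to reflect the case-insensitivity described under Implementation Notes.

```python
import re

TOKEN_SPEC = [
    ("STRING", r"'(?:''|[^'])*'"),
    ("NUMBER", r"-?[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?"),
    ("OP", r"!=|>=|<=|=|>|<"),
    ("PUNCT", r"[(),.]"),
    ("WORD", r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"AND", "OR", "NOT", "LIKE", "IN", "IS", "NULL"}


def tokenize(query: str) -> list:
    """Split a query string into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(query):
        match = MASTER.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character {query[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        if kind == "WORD" and text.upper() in KEYWORDS:
            # Keywords are case-insensitive; field names keep their case.
            kind, text = "KEYWORD", text.upper()
        tokens.append((kind, text))
    return tokens


for token in tokenize("metadata.foo = 'bar' and views >= 100"):
    print(token)
```

Dotted fields come out as `WORD`, `PUNCT`, `WORD` sequences, leaving it to the parser to reassemble them according to the `field` rule.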
## Migration from JSON Format
The SQL-like syntax maps to the JSON format as follows:
**JSON:**
```json
{
"type": "text",
"field": ["metadata", "foo"],
"conditions": {
"equal": "bar"
}
}
```
**SQL:**
```sql
metadata.foo = 'bar'
```
**JSON (with operator):**
```json
{
"type": "operator",
"operator": "and",
"conditions": [
{
"type": "text",
"field": ["metadata", "foo"],
"conditions": {
"equal": "bar"
}
},
{
"type": "text",
"field": ["type"],
"conditions": {
"equal": "demo"
}
}
]
}
```
**SQL:**
```sql
metadata.foo = 'bar' AND type = 'demo'
```
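The two mappings above can be reproduced with a pair of small builder functions. This sketch only covers what the examples show (text equality and the `and` operator); the function names `text_equal` and `and_` are hypothetical, and no other JSON keys are assumed.

```python
import json


def text_equal(field: str, value: str) -> dict:
    """Build the JSON node for `field = 'value'` on a text field."""
    return {
        "type": "text",
        "field": field.split("."),
        "conditions": {"equal": value},
    }


def and_(*conditions: dict) -> dict:
    """Build the JSON node that joins conditions with AND."""
    return {"type": "operator", "operator": "and", "conditions": list(conditions)}


# metadata.foo = 'bar' AND type = 'demo'
query = and_(text_equal("metadata.foo", "bar"), text_equal("type", "demo"))
print(json.dumps(query, indent=2))
```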
## Implementation Notes
1. **Whitespace**: Whitespace is generally ignored except within string literals
2. **Case sensitivity**:
   - Operators (`AND`, `OR`, `LIKE`, etc.) are case-insensitive
   - Field names and string values are case-sensitive
3. **Comments**: Not supported in initial version (can be added later)
4. **Table prefixes**: The parser may support optional table name prefixes (e.g., `documents.metadata.foo`) if needed