# Query Language Specification

This document describes the SQL-like query language syntax for building database queries. The language supports filtering on both text and numeric fields, including nested JSON fields, with logical operators for complex queries.

## Overview

The query language provides a human-readable, SQL-like syntax that can be parsed into the internal JSON query format used by the system. It supports:

- Text field conditions (equality, pattern matching, membership)
- Numeric field conditions (comparison operators, membership)
- Nested JSON field access using dot notation
- Logical operators (AND, OR) with grouping
- NULL value checks

## Syntax

### Field References

Fields are referenced using dot notation for nested JSON paths:

```
field_name
metadata.foo
metadata.nested.deep.field
```

**Examples:**

- `content` - top-level field
- `metadata.author` - nested field in metadata object
- `metadata.tags.0` - array element (if needed)

### Text Conditions

Text conditions operate on string values:

| Operator | Syntax | Description |
|----------|--------|-------------|
| Equality | `field = 'value'` | Exact match |
| Inequality | `field != 'value'` | Not equal |
| NULL check | `field IS NULL` | Field is null |
| NOT NULL | `field IS NOT NULL` | Field is not null |
| Pattern match | `field LIKE 'pattern'` | SQL LIKE pattern matching |
| Not like | `field NOT LIKE 'pattern'` | Negated pattern matching |
| In list | `field IN ('val1', 'val2', 'val3')` | Value in list |
| Not in list | `field NOT IN ('val1', 'val2')` | Value not in list |

**String Literals:**

- Single quotes: `'value'`
- Escaped quotes: `'O''Brien'` (double single quote)
- Empty string: `''`

**LIKE Patterns:**

- `%` matches any sequence of characters
- `_` matches any single character
- Examples: `'%cat%'`, `'test_%'`, `'exact'`

**Examples:**

```sql
content = 'hello world'
metadata.foo = 'bar'
type != 'draft'
source IS NULL
title LIKE '%cat%'
author NOT LIKE '%admin%'
status IN ('published', 'archived')
category NOT IN ('deleted', 'hidden')
```

### Numeric Conditions

Numeric conditions operate on number values:

| Operator | Syntax | Description |
|----------|--------|-------------|
| Equality | `field = 123` | Exact match |
| Inequality | `field != 123` | Not equal |
| NULL check | `field IS NULL` | Field is null |
| NOT NULL | `field IS NOT NULL` | Field is not null |
| Greater than | `field > 10` | Greater than |
| Greater or equal | `field >= 10` | Greater than or equal |
| Less than | `field < 10` | Less than |
| Less or equal | `field <= 10` | Less than or equal |
| In list | `field IN (1, 2, 3)` | Value in list |
| Not in list | `field NOT IN (1, 2, 3)` | Value not in list |

**Numeric Literals:**

- Integers: `123`, `-45`, `0`
- Decimals: `123.45`, `-0.5`, `3.14159`
- Scientific notation: `1e10`, `2.5e-3` (if supported)

**Examples:**

```sql
typeVersion = 1
score > 0.5
views >= 100
priority < 5
age <= 65
rating IN (1, 2, 3, 4, 5)
count NOT IN (0, -1)
```

### Logical Operators

Combine conditions using `AND` and `OR` operators:

| Operator | Syntax | Description |
|----------|--------|-------------|
| AND | `condition1 AND condition2` | Both conditions must be true |
| OR | `condition1 OR condition2` | At least one condition must be true |

**Grouping:**

Use parentheses `()` to group conditions and control operator precedence:

```sql
(condition1 AND condition2) OR condition3
condition1 AND (condition2 OR condition3)
```

**Examples:**

```sql
type = 'article' AND status = 'published'
metadata.foo = 'bar' OR metadata.foo = 'baz'
(type = 'post' OR type = 'page') AND views > 100
```

### Operator Precedence

1. Parentheses `()` - highest precedence
2. `AND` - evaluated before OR
3. `OR` - lowest precedence

**Examples:**

```sql
-- Equivalent to: (A AND B) OR C
A AND B OR C

-- Parentheses override precedence: B OR C is evaluated first
A AND (B OR C)

-- Explicit grouping
(A OR B) AND (C OR D)
```

## Complete Examples

### Simple Conditions

```sql
-- Text equality
metadata.author = 'John Doe'

-- Numeric comparison
views >= 1000

-- Pattern matching
title LIKE '%tutorial%'

-- NULL check
source IS NULL
```

### Multiple Conditions

```sql
-- AND operator
type = 'article' AND status = 'published' AND views > 100

-- OR operator
category = 'tech' OR category = 'science'

-- Mixed operators
(type = 'post' OR type = 'page') AND status = 'published'
```

### Complex Nested Queries

```sql
-- Nested AND within OR
(metadata.foo = 'bar' AND type = 'demo') OR metadata.foo = 'baz'

-- Multiple levels of nesting
((status = 'active' AND views > 100) OR (status = 'featured' AND views > 50)) AND category = 'news'

-- Complex query with multiple field types
type = 'article'
  AND (metadata.author = 'John' OR metadata.author = 'Jane')
  AND views >= 100
  AND rating IN (4, 5)
```

### Array/List Operations

```sql
-- Text IN
status IN ('published', 'archived', 'draft')

-- Numeric IN
priority IN (1, 2, 3)

-- NOT IN
category NOT IN ('deleted', 'hidden')
```

## Type Inference

The parser will infer the condition type (text vs number) based on:

1. **Operator context**: Operators like `>`, `<`, `>=`, `<=` imply numeric
2. **Value type**:
   - Quoted strings (`'value'`) → text condition
   - Unquoted numbers (`123`, `45.6`) → numeric condition
   - `NULL` → can be either (context-dependent)
3. **Field name**: If a field is known to be numeric, numeric operators are used

**Examples:**

```sql
-- Text condition (quoted string)
author = 'John'

-- Numeric condition (unquoted number)
age = 30

-- Numeric comparison
score > 0.5

-- Text pattern
title LIKE '%test%'
```

## Escaping and Special Characters

### String Escaping

- Single quotes in strings: `'O''Brien'` → `O'Brien`
- Empty string: `''`

### Field Name Escaping

If field names contain special characters or reserved words, they can be quoted (implementation-dependent):

```sql
-- Reserved words or special characters (if supported)
"order" = 'asc'
"metadata.field-name" = 'value'
```

## Error Handling

The parser should provide clear error messages for:

- Invalid syntax
- Mismatched parentheses
- Invalid operators for field types
- Missing values
- Invalid escape sequences

## Grammar (BNF-like)

The logical-operator rules are layered so that the grammar encodes the precedence described above (`AND` binds tighter than `OR`), and `number` accepts a leading minus sign to match the numeric literals listed earlier.

```
query             ::= expression
expression        ::= and_expression ( 'OR' and_expression )*
and_expression    ::= primary ( 'AND' primary )*
primary           ::= condition | '(' expression ')'

condition         ::= text_condition | numeric_condition

text_condition    ::= field ( '=' | '!=' | 'LIKE' | 'NOT LIKE' ) string_literal
                    | field 'IS' ( 'NULL' | 'NOT NULL' )
                    | field 'IN' '(' string_list ')'
                    | field 'NOT IN' '(' string_list ')'

numeric_condition ::= field ( '=' | '!=' | '>' | '>=' | '<' | '<=' ) number
                    | field 'IS' ( 'NULL' | 'NOT NULL' )
                    | field 'IN' '(' number_list ')'
                    | field 'NOT IN' '(' number_list ')'

field             ::= identifier ( '.' identifier )*
identifier        ::= [a-zA-Z_][a-zA-Z0-9_]*

string_literal    ::= "'" ( escaped_char | [^'] )* "'"
escaped_char      ::= "''"
string_list       ::= string_literal ( ',' string_literal )*

number            ::= '-'? [0-9]+ ( '.' [0-9]+ )? ( [eE] [+-]? [0-9]+ )?
number_list       ::= number ( ',' number )*
```

## Migration from JSON Format

The SQL-like syntax maps to the JSON format as follows:

**JSON:**

```json
{
  "type": "text",
  "field": ["metadata", "foo"],
  "conditions": { "equal": "bar" }
}
```

**SQL:**

```sql
metadata.foo = 'bar'
```

**JSON (with operator):**

```json
{
  "type": "operator",
  "operator": "and",
  "conditions": [
    {
      "type": "text",
      "field": ["metadata", "foo"],
      "conditions": { "equal": "bar" }
    },
    {
      "type": "text",
      "field": ["type"],
      "conditions": { "equal": "demo" }
    }
  ]
}
```

**SQL:**

```sql
metadata.foo = 'bar' AND type = 'demo'
```

## Implementation Notes

1. **Whitespace**: Whitespace is generally ignored except within string literals
2. **Case sensitivity**:
   - Operators (`AND`, `OR`, `LIKE`, etc.) are case-insensitive
   - Field names and string values are case-sensitive
3. **Comments**: Not supported in initial version (can be added later). The `--` comments in this document's examples are annotations only and are not part of the query syntax
4. **Table prefixes**: The parser may support optional table name prefixes (e.g., `documents.metadata.foo`) if needed
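
### Parser Sketch

To make the precedence rules and the JSON mapping concrete, here is a minimal recursive-descent sketch in Python. It is an illustration under stated assumptions, not the system's parser: the names `tokenize`, `Parser`, and `parse_query` are hypothetical, only the `=` operator is handled, the `"number"` type value and the `"or"` operator value are guesses (this document only shows `"text"` and `"and"`), and collapsing a chain such as `A AND B AND C` into a single operator node is a design choice of the sketch.

```python
import re

# Token patterns follow the grammar in this document.
TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<string>'(?:[^']|'')*')
      | (?P<number>-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)
      | (?P<word>[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*)
      | (?P<op>=|\(|\))
    )
""", re.VERBOSE)


def tokenize(text):
    """Split a query into (kind, value) tokens; AND/OR are case-insensitive."""
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"Invalid syntax at position {pos}")
        kind = m.lastgroup
        value = m.group(kind)
        if kind == "word" and value.upper() in ("AND", "OR"):
            kind, value = "keyword", value.upper()
        tokens.append((kind, value))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    def parse_chain(self, keyword, operand):
        # operand (keyword operand)*, collapsed into one operator node.
        items = [operand()]
        while self.peek() == ("keyword", keyword):
            self.take()
            items.append(operand())
        if len(items) == 1:
            return items[0]
        # Assumption: the JSON operator value is the lowercase keyword.
        return {"type": "operator", "operator": keyword.lower(), "conditions": items}

    def parse_or(self):
        # OR has the lowest precedence, so its operands are AND chains.
        return self.parse_chain("OR", self.parse_and)

    def parse_and(self):
        return self.parse_chain("AND", self.parse_primary)

    def parse_primary(self):
        kind, value = self.take()
        if (kind, value) == ("op", "("):
            node = self.parse_or()
            if self.take() != ("op", ")"):
                raise ValueError("Mismatched parentheses")
            return node
        if kind != "word":
            raise ValueError(f"Expected a field name, got {value!r}")
        if self.take() != ("op", "="):
            raise ValueError("Only '=' is handled in this sketch")
        lit_kind, literal = self.take()
        field = value.split(".")
        if lit_kind == "string":
            # Strip the outer quotes and undo the '' escape.
            text = literal[1:-1].replace("''", "'")
            return {"type": "text", "field": field, "conditions": {"equal": text}}
        if lit_kind == "number":
            number = int(literal) if re.fullmatch(r"-?\d+", literal) else float(literal)
            # Assumption: "number" as the type value is not confirmed by this document.
            return {"type": "number", "field": field, "conditions": {"equal": number}}
        raise ValueError("Missing value")


def parse_query(text):
    parser = Parser(tokenize(text))
    node = parser.parse_or()
    if parser.peek() != (None, None):
        raise ValueError("Unexpected trailing input")
    return node
```

Calling `parse_query("metadata.foo = 'bar' AND type = 'demo'")` returns a dictionary with the same structure as the operator example in the migration section. Because `parse_or` delegates to `parse_and`, a query such as `A AND B OR C` groups as `(A AND B) OR C` without any explicit precedence table.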