Query Language Specification
This document describes the SQL-like query language syntax for building database queries. The language supports filtering on both text and numeric fields, including nested JSON fields, with logical operators for complex queries.
Overview
The query language provides a human-readable, SQL-like syntax that can be parsed into the internal JSON query format used by the system. It supports:
- Text field conditions (equality, pattern matching, membership)
- Numeric field conditions (comparison operators, membership)
- Nested JSON field access using dot notation
- Logical operators (AND, OR) with grouping
- NULL value checks
Syntax
Field References
Fields are referenced using dot notation for nested JSON paths:
field_name
metadata.foo
metadata.nested.deep.field
Examples:
- content - top-level field
- metadata.author - nested field in the metadata object
- metadata.tags.0 - array element (if needed)
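As a minimal sketch of how a dotted reference could be broken into path segments, the helper below splits on `.` and validates each segment. The name `split_field_path` is hypothetical, and accepting all-digit segments for array indices is an assumption, since the array form is only listed as "if needed".

```python
import re

# Assumption: a segment is either an identifier ([a-zA-Z_][a-zA-Z0-9_]*)
# or an all-digit array index such as the "0" in metadata.tags.0.
_SEGMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+")


def split_field_path(path: str) -> list:
    """Split a dotted field reference into its path segments."""
    segments = path.split(".")
    for segment in segments:
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid field segment {segment!r} in {path!r}")
    return segments


print(split_field_path("metadata.nested.deep.field"))
```

The resulting list has the same shape as the `field` array used by the JSON query format, for example `["metadata", "foo"]` for `metadata.foo`.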
Text Conditions
Text conditions operate on string values:
| Operator | Syntax | Description |
|---|---|---|
| Equality | field = 'value' | Exact match |
| Inequality | field != 'value' | Not equal |
| NULL check | field IS NULL | Field is null |
| NOT NULL | field IS NOT NULL | Field is not null |
| Pattern match | field LIKE 'pattern' | SQL LIKE pattern matching |
| Not like | field NOT LIKE 'pattern' | Negated pattern matching |
| In list | field IN ('val1', 'val2', 'val3') | Value in list |
| Not in list | field NOT IN ('val1', 'val2') | Value not in list |
String Literals:
- Single quotes: 'value'
- Escaped quotes: 'O''Brien' (double single quote)
- Empty string: ''
LIKE Patterns:
- % matches any sequence of characters
- _ matches any single character
- Examples: '%cat%', 'test_%', 'exact'
Examples:
content = 'hello world'
metadata.foo = 'bar'
type != 'draft'
source IS NULL
title LIKE '%cat%'
author NOT LIKE '%admin%'
status IN ('published', 'archived')
category NOT IN ('deleted', 'hidden')
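To make the LIKE semantics concrete, here is one possible way to evaluate a pattern in application code by translating it to a regular expression. This is a sketch, not the system's actual matcher. The function names `like_to_regex` and `like` are hypothetical, and the sketch assumes there is no escape character for a literal `%` or `_`, since the specification does not define one.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE pattern into an anchored, case-sensitive regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any sequence of characters, including none
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("my cat sat", "%cat%"))   # True
print(like("test", "test_%"))        # False: "_" needs one more character
```

Matching is done with `fullmatch`, so a pattern without wildcards such as `'exact'` behaves like an equality test.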
Numeric Conditions
Numeric conditions operate on number values:
| Operator | Syntax | Description |
|---|---|---|
| Equality | field = 123 | Exact match |
| Inequality | field != 123 | Not equal |
| NULL check | field IS NULL | Field is null |
| NOT NULL | field IS NOT NULL | Field is not null |
| Greater than | field > 10 | Greater than |
| Greater or equal | field >= 10 | Greater than or equal |
| Less than | field < 10 | Less than |
| Less or equal | field <= 10 | Less than or equal |
| In list | field IN (1, 2, 3) | Value in list |
| Not in list | field NOT IN (1, 2, 3) | Value not in list |
Numeric Literals:
- Integers: 123, -45, 0
- Decimals: 123.45, -0.5, 3.14159
- Scientific notation: 1e10, 2.5e-3 (if supported)
Examples:
typeVersion = 1
score > 0.5
views >= 100
priority < 5
age <= 65
rating IN (1, 2, 3, 4, 5)
count NOT IN (0, -1)
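The literal forms listed above can be recognized with a single regular expression. The sketch below is one way to do it. `parse_number` is a hypothetical helper, and returning `int` for plain integers and `float` otherwise is an assumption about how values would be represented.

```python
import re

# Optional sign, integer part, optional fraction, optional exponent.
NUMBER = re.compile(r"-?[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def parse_number(text: str):
    """Convert a numeric literal to int or float, rejecting anything else."""
    if not NUMBER.fullmatch(text):
        raise ValueError(f"invalid numeric literal: {text!r}")
    if any(ch in text for ch in ".eE"):
        return float(text)
    return int(text)


for literal in ["123", "-45", "3.14159", "2.5e-3"]:
    print(literal, "->", parse_number(literal))
```

Forms such as `.5` or `1.` are rejected because both the integer part and, when a dot is present, the fraction are required.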
Logical Operators
Combine conditions using AND and OR operators:
| Operator | Syntax | Description |
|---|---|---|
| AND | condition1 AND condition2 | Both conditions must be true |
| OR | condition1 OR condition2 | At least one condition must be true |
Grouping:
Use parentheses () to group conditions and control operator precedence:
(condition1 AND condition2) OR condition3
condition1 AND (condition2 OR condition3)
Examples:
type = 'article' AND status = 'published'
metadata.foo = 'bar' OR metadata.foo = 'baz'
(type = 'post' OR type = 'page') AND views > 100
Operator Precedence
- Parentheses () - highest precedence
- AND - evaluated before OR
- OR - lowest precedence
Examples:
-- Equivalent to: (A AND B) OR C
A AND B OR C
-- Equivalent to: A AND (B OR C)
A AND (B OR C)
-- Explicit grouping
(A OR B) AND (C OR D)
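These precedence rules map naturally onto a recursive-descent parser with one function per level. The sketch below is illustrative only: it treats conditions as bare placeholder names such as `A` and `B`, and the tuple-based tree it returns is an assumed representation, not the system's internal format.

```python
import re


def parse(text: str):
    """Parse placeholder conditions joined by AND/OR into nested tuples."""
    tokens = re.findall(r"\(|\)|[A-Za-z_]+", text)
    pos = 0

    def peek():
        return tokens[pos].upper() if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():            # lowest precedence
        node = parse_and()
        while peek() == "OR":
            take()
            node = ("or", node, parse_and())
        return node

    def parse_and():           # binds tighter than OR
        node = parse_primary()
        while peek() == "AND":
            take()
            node = ("and", node, parse_primary())
        return node

    def parse_primary():       # a condition or a parenthesized group
        if peek() == "(":
            take()
            node = parse_or()
            if peek() != ")":
                raise ValueError("expected ')'")
            take()
            return node
        if peek() in (None, ")", "AND", "OR"):
            raise ValueError("expected a condition")
        return take()

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse("A AND B OR C"))    # ('or', ('and', 'A', 'B'), 'C')
print(parse("A AND (B OR C)"))  # ('and', 'A', ('or', 'B', 'C'))
```

Because `parse_or` calls `parse_and` for each operand, every run of AND-joined conditions is grouped before OR is applied, which is the behavior described above.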
Complete Examples
Simple Conditions
-- Text equality
metadata.author = 'John Doe'
-- Numeric comparison
views >= 1000
-- Pattern matching
title LIKE '%tutorial%'
-- NULL check
source IS NULL
Multiple Conditions
-- AND operator
type = 'article' AND status = 'published' AND views > 100
-- OR operator
category = 'tech' OR category = 'science'
-- Mixed operators
(type = 'post' OR type = 'page') AND status = 'published'
Complex Nested Queries
-- Nested AND within OR
(metadata.foo = 'bar' AND type = 'demo') OR metadata.foo = 'baz'
-- Multiple levels of nesting
((status = 'active' AND views > 100) OR (status = 'featured' AND views > 50)) AND category = 'news'
-- Complex query with multiple field types
type = 'article' AND (metadata.author = 'John' OR metadata.author = 'Jane') AND views >= 100 AND rating IN (4, 5)
Array/List Operations
-- Text IN
status IN ('published', 'archived', 'draft')
-- Numeric IN
priority IN (1, 2, 3)
-- NOT IN
category NOT IN ('deleted', 'hidden')
Type Inference
The parser will infer the condition type (text vs number) based on:
- Operator context: Operators like >, <, >=, <= imply numeric
- Value type:
  - Quoted strings ('value') → text condition
  - Unquoted numbers (123, 45.6) → numeric condition
  - NULL → can be either (context-dependent)
- Field name: If a field is known to be numeric, numeric operators are used
Examples:
-- Text condition (quoted string)
author = 'John'
-- Numeric condition (unquoted number)
age = 30
-- Numeric comparison
score > 0.5
-- Text pattern
title LIKE '%test%'
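The inference rules can be expressed as a small decision function. The following is a sketch under stated assumptions: `infer_condition_type` and its `known_numeric_field` flag are hypothetical, and falling back to a text condition for NULL checks on fields of unknown type is a choice made here, not something the specification fixes.

```python
import re

NUMBER = re.compile(r"-?[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")
NUMERIC_ONLY_OPERATORS = {">", ">=", "<", "<="}
TEXT_ONLY_OPERATORS = {"LIKE", "NOT LIKE"}


def infer_condition_type(operator, literal=None, known_numeric_field=False):
    """Return "text" or "number" for a single condition."""
    op = operator.upper()
    # 1. Operator context decides when the operator exists for one type only.
    if op in NUMERIC_ONLY_OPERATORS:
        return "number"
    if op in TEXT_ONLY_OPERATORS:
        return "text"
    # 2. Otherwise the literal decides: quoted means text, bare number means number.
    if literal is not None:
        if literal.startswith("'"):
            return "text"
        if NUMBER.fullmatch(literal):
            return "number"
        raise ValueError(f"cannot infer type from literal {literal!r}")
    # 3. No literal (IS NULL / IS NOT NULL): use field knowledge.
    #    Assumption: default to text when nothing is known about the field.
    return "number" if known_numeric_field else "text"


print(infer_condition_type("=", "'John'"))   # text
print(infer_condition_type("=", "30"))       # number
```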
Escaping and Special Characters
String Escaping
- Single quotes in strings: 'O''Brien' → O'Brien
- Empty string: ''
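A short sketch of the quoting rule in both directions may help when generating or consuming queries programmatically. The helper names `quote_string` and `unquote_string` are hypothetical.

```python
def quote_string(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def unquote_string(literal: str) -> str:
    """Turn a quoted literal back into its value."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError(f"not a quoted string literal: {literal!r}")
    body = literal[1:-1]
    # Once doubled quotes are removed, no lone quote may remain in the body.
    if "'" in body.replace("''", ""):
        raise ValueError(f"unescaped single quote in {literal!r}")
    return body.replace("''", "'")


print(quote_string("O'Brien"))        # 'O''Brien'
print(unquote_string("'O''Brien'"))   # O'Brien
```

Building literals through a helper like `quote_string` rather than by string concatenation avoids producing malformed queries when values contain quotes.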
Field Name Escaping
If field names contain special characters or reserved words, they can be quoted (implementation-dependent):
-- Reserved words or special characters (if supported)
"order" = 'asc'
"metadata.field-name" = 'value'
Error Handling
The parser should provide clear error messages for:
- Invalid syntax
- Mismatched parentheses
- Invalid operators for field types
- Missing values
- Invalid escape sequences
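One way to make such messages actionable is to attach a character position to every error. The sketch below shows this for the mismatched-parentheses case only. `QuerySyntaxError` and `check_parentheses` are hypothetical names, and the message wording is illustrative.

```python
class QuerySyntaxError(ValueError):
    """Syntax error that records where in the query the problem was found."""

    def __init__(self, message: str, position: int):
        super().__init__(f"{message} at position {position}")
        self.position = position


def check_parentheses(query: str) -> None:
    """Raise QuerySyntaxError if parentheses outside string literals do not balance."""
    open_positions = []
    in_string = False
    for index, ch in enumerate(query):
        if ch == "'":
            # A doubled quote ('') toggles twice, so the state stays correct.
            in_string = not in_string
        elif in_string:
            continue
        elif ch == "(":
            open_positions.append(index)
        elif ch == ")":
            if not open_positions:
                raise QuerySyntaxError("unmatched ')'", index)
            open_positions.pop()
    if in_string:
        raise QuerySyntaxError("unterminated string literal", len(query))
    if open_positions:
        raise QuerySyntaxError("unclosed '('", open_positions[-1])


try:
    check_parentheses("(type = 'post' OR type = 'page' AND views > 100")
except QuerySyntaxError as error:
    print(error)   # unclosed '(' at position 0
```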
Grammar (BNF-like)
query ::= or_expression
or_expression ::= and_expression ( 'OR' and_expression )*
and_expression ::= primary ( 'AND' primary )*
primary ::= condition | '(' or_expression ')'
condition ::= text_condition | numeric_condition
text_condition ::= field ( '=' | '!=' | 'LIKE' | 'NOT LIKE' ) string_literal
| field 'IS' ( 'NULL' | 'NOT NULL' )
| field 'IN' '(' string_list ')'
| field 'NOT IN' '(' string_list ')'
numeric_condition ::= field ( '=' | '!=' | '>' | '>=' | '<' | '<=' ) number
| field 'IS' ( 'NULL' | 'NOT NULL' )
| field 'IN' '(' number_list ')'
| field 'NOT IN' '(' number_list ')'
field ::= identifier ( '.' identifier )*
identifier ::= [a-zA-Z_][a-zA-Z0-9_]*
string_literal ::= "'" ( escaped_char | [^'] )* "'"
escaped_char ::= "''"
string_list ::= string_literal ( ',' string_literal )*
number ::= '-'? [0-9]+ ( '.' [0-9]+ )? ( [eE] [+-]? [0-9]+ )?
number_list ::= number ( ',' number )*
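The terminals of this grammar can be recognized by a regex-based tokenizer, sketched below as one possible first stage of a parser. The token kinds (`field`, `keyword`, `string`, `number`, `op`, `punct`) and the function name `tokenize` are assumptions made for this example. The sketch also accepts a leading minus sign on numbers so that literals such as `-45` tokenize, and it does not handle quoted field names.

```python
import re

TOKEN = re.compile(r"""
    (?P<ws>\s+)
  | (?P<string>'(?:''|[^'])*')
  | (?P<number>-?[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)
  | (?P<op><=|>=|!=|=|<|>)
  | (?P<punct>[(),])
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*)
""", re.VERBOSE)

KEYWORDS = {"AND", "OR", "NOT", "IN", "IS", "NULL", "LIKE"}


def tokenize(query: str) -> list:
    """Split a query into (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(query):
        match = TOKEN.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character {query[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "ws":
            continue
        if kind == "word":
            # Keywords are case-insensitive; field names keep their case.
            if text.upper() in KEYWORDS:
                kind, text = "keyword", text.upper()
            else:
                kind = "field"
        tokens.append((kind, text))
    return tokens


for token in tokenize("metadata.foo = 'bar' and views >= 100"):
    print(token)
```

Multi-word operators such as `NOT IN` and `IS NOT NULL` come out as separate keyword tokens, so a parser built on top of this would combine them.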
Migration from JSON Format
The SQL-like syntax maps to the JSON format as follows:
JSON:
{
"type": "text",
"field": ["metadata", "foo"],
"conditions": {
"equal": "bar"
}
}
SQL:
metadata.foo = 'bar'
JSON (with operator):
{
"type": "operator",
"operator": "and",
"conditions": [
{
"type": "text",
"field": ["metadata", "foo"],
"conditions": {
"equal": "bar"
}
},
{
"type": "text",
"field": ["type"],
"conditions": {
"equal": "demo"
}
}
]
}
SQL:
metadata.foo = 'bar' AND type = 'demo'
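The mapping can also be sketched programmatically. The helpers below build exactly the two JSON shapes shown above. The function names `text_equal` and `all_of` are hypothetical, and only the text equality and `and` forms are covered, because those are the shapes this document shows.

```python
import json


def text_equal(field_path: str, value: str) -> dict:
    """JSON for: field_path = 'value' (text equality)."""
    return {
        "type": "text",
        "field": field_path.split("."),
        "conditions": {"equal": value},
    }


def all_of(*conditions: dict) -> dict:
    """JSON for: condition1 AND condition2 AND ..."""
    return {
        "type": "operator",
        "operator": "and",
        "conditions": list(conditions),
    }


# Equivalent of: metadata.foo = 'bar' AND type = 'demo'
query = all_of(text_equal("metadata.foo", "bar"), text_equal("type", "demo"))
print(json.dumps(query, indent=2))
```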
Implementation Notes
- Whitespace: Whitespace between tokens is ignored; it is significant only within string literals
- Case sensitivity:
  - Operators (AND, OR, LIKE, etc.) are case-insensitive
  - Field names and string values are case-sensitive
- Comments: Not supported in the initial version (can be added later)
- Table prefixes: The parser may support optional table name prefixes (e.g., documents.metadata.foo) if needed
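If table prefixes are supported, one simple approach is to drop a leading segment when it names a known table. This is a sketch only. The helper `strip_table_prefix` and the `known_tables` set are hypothetical, and `documents` is taken from the example above.

```python
def strip_table_prefix(path: str, known_tables=frozenset({"documents"})) -> list:
    """Return the field segments of a path, without a leading table name."""
    segments = path.split(".")
    # Only strip when something remains, so a bare "documents" stays a field.
    if len(segments) > 1 and segments[0] in known_tables:
        return segments[1:]
    return segments


print(strip_table_prefix("documents.metadata.foo"))   # ['metadata', 'foo']
print(strip_table_prefix("metadata.foo"))             # ['metadata', 'foo']
```

This approach is ambiguous when a top-level JSON field shares its name with a table, so an implementation would need a rule for that case.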