stash/packages/server/docs/query-language.md
2025-12-09 21:32:09 +01:00

Query Language Specification

This document describes the SQL-like query language syntax for building database queries. The language supports filtering on both text and numeric fields, including nested JSON fields, with logical operators for complex queries.

Overview

The query language provides a human-readable, SQL-like syntax that can be parsed into the internal JSON query format used by the system. It supports:

  • Text field conditions (equality, pattern matching, membership)
  • Numeric field conditions (comparison operators, membership)
  • Nested JSON field access using dot notation
  • Logical operators (AND, OR) with grouping
  • NULL value checks

Syntax

Field References

Fields are referenced using dot notation for nested JSON paths:

field_name
metadata.foo
metadata.nested.deep.field

Examples:

  • content - top-level field
  • metadata.author - nested field in metadata object
  • metadata.tags.0 - array element (if needed)
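A dotted reference corresponds to the path array used in the JSON format (for example `["metadata", "foo"]`). A minimal sketch of that split follows; `field_to_path` is a hypothetical helper name, and accepting all-digit segments such as `0` is an assumption based on the `metadata.tags.0` example.

```python
import re

# Hypothetical helper, not part of the specification.
# A segment is either an identifier or (by assumption) an array index.
_SEGMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+")


def field_to_path(field):
    """Split a dotted field reference into its path segments."""
    segments = field.split(".")
    for segment in segments:
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid field segment {segment!r} in {field!r}")
    return segments


print(field_to_path("metadata.nested.deep.field"))
```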

Text Conditions

Text conditions operate on string values:

| Operator | Syntax | Description |
|---|---|---|
| Equality | field = 'value' | Exact match |
| Inequality | field != 'value' | Not equal |
| NULL check | field IS NULL | Field is null |
| NOT NULL | field IS NOT NULL | Field is not null |
| Pattern match | field LIKE 'pattern' | SQL LIKE pattern matching |
| Not like | field NOT LIKE 'pattern' | Negated pattern matching |
| In list | field IN ('val1', 'val2', 'val3') | Value in list |
| Not in list | field NOT IN ('val1', 'val2') | Value not in list |

String Literals:

  • Single quotes: 'value'
  • Escaped quotes: 'O''Brien' (double single quote)
  • Empty string: ''

LIKE Patterns:

  • % matches any sequence of characters
  • _ matches any single character
  • Examples: '%cat%', 'test_%', 'exact'
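To make the wildcard semantics concrete, the sketch below translates a LIKE pattern into a regular expression. It is an illustration only: `like_to_regex` and `like_matches` are hypothetical names, the match is assumed to be case-sensitive and anchored to the whole value, and no escape character for a literal `%` or `_` is assumed because this document does not define one.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any sequence of characters, including none
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def like_matches(value, pattern):
    """Return True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like_matches("concatenate", "%cat%"))
print(like_matches("exactly", "exact"))
```

In a real deployment the database engine usually evaluates LIKE itself, so a translation like this is mainly useful for tests or in-memory filtering.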

Examples:

content = 'hello world'
metadata.foo = 'bar'
type != 'draft'
source IS NULL
title LIKE '%cat%'
author NOT LIKE '%admin%'
status IN ('published', 'archived')
category NOT IN ('deleted', 'hidden')

Numeric Conditions

Numeric conditions operate on number values:

| Operator | Syntax | Description |
|---|---|---|
| Equality | field = 123 | Exact match |
| Inequality | field != 123 | Not equal |
| NULL check | field IS NULL | Field is null |
| NOT NULL | field IS NOT NULL | Field is not null |
| Greater than | field > 10 | Greater than |
| Greater or equal | field >= 10 | Greater than or equal |
| Less than | field < 10 | Less than |
| Less or equal | field <= 10 | Less than or equal |
| In list | field IN (1, 2, 3) | Value in list |
| Not in list | field NOT IN (1, 2, 3) | Value not in list |

Numeric Literals:

  • Integers: 123, -45, 0
  • Decimals: 123.45, -0.5, 3.14159
  • Scientific notation: 1e10, 2.5e-3 (if supported)
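The literal forms listed above can be recognized with a single regular expression. The sketch below assumes that a leading minus sign belongs to the literal and that scientific notation is accepted; `parse_number` is a hypothetical helper name.

```python
import re

# Optional sign, integer part, optional fraction, optional exponent.
_NUMBER = re.compile(r"-?[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def parse_number(text):
    """Convert a numeric literal to int or float, rejecting anything else."""
    if not _NUMBER.fullmatch(text):
        raise ValueError(f"invalid numeric literal: {text!r}")
    # A fraction or exponent makes the value a float; otherwise keep an int.
    if any(ch in text for ch in ".eE"):
        return float(text)
    return int(text)


for literal in ("123", "-45", "3.14159", "2.5e-3"):
    print(literal, "->", repr(parse_number(literal)))
```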

Examples:

typeVersion = 1
score > 0.5
views >= 100
priority < 5
age <= 65
rating IN (1, 2, 3, 4, 5)
count NOT IN (0, -1)

Logical Operators

Combine conditions using AND and OR operators:

| Operator | Syntax | Description |
|---|---|---|
| AND | condition1 AND condition2 | Both conditions must be true |
| OR | condition1 OR condition2 | At least one condition must be true |

Grouping: Use parentheses () to group conditions and control operator precedence:

(condition1 AND condition2) OR condition3
condition1 AND (condition2 OR condition3)

Examples:

type = 'article' AND status = 'published'
metadata.foo = 'bar' OR metadata.foo = 'baz'
(type = 'post' OR type = 'page') AND views > 100

Operator Precedence

  1. Parentheses () - highest precedence
  2. AND - evaluated before OR
  3. OR - lowest precedence

Examples:

-- Equivalent to: (A AND B) OR C
A AND B OR C

-- Parentheses override the default precedence: B OR C is evaluated first
A AND (B OR C)

-- Explicit grouping
(A OR B) AND (C OR D)
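The effect of these rules can be checked with a small truth table. Python's `and` and `or` happen to have the same relative precedence as described here, so they are used as a stand-in for the query operators. This is an illustration, not the parser itself.

```python
from itertools import product


def implicit(a, b, c):
    return a and b or c          # no parentheses: AND binds first


def and_first(a, b, c):
    return (a and b) or c        # explicit grouping, same meaning


def or_first(a, b, c):
    return a and (b or c)        # parentheses change the meaning


rows = list(product([False, True], repeat=3))

# "A AND B OR C" agrees with "(A AND B) OR C" for every input...
same = all(implicit(*row) == and_first(*row) for row in rows)

# ...but differs from "A AND (B OR C)" whenever A is false and C is true.
differing = [row for row in rows if implicit(*row) != or_first(*row)]

print(same)
print(differing)
```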

Complete Examples

Simple Conditions

-- Text equality
metadata.author = 'John Doe'

-- Numeric comparison
views >= 1000

-- Pattern matching
title LIKE '%tutorial%'

-- NULL check
source IS NULL

Multiple Conditions

-- AND operator
type = 'article' AND status = 'published' AND views > 100

-- OR operator
category = 'tech' OR category = 'science'

-- Mixed operators
(type = 'post' OR type = 'page') AND status = 'published'

Complex Nested Queries

-- Nested AND within OR
(metadata.foo = 'bar' AND type = 'demo') OR metadata.foo = 'baz'

-- Multiple levels of nesting
((status = 'active' AND views > 100) OR (status = 'featured' AND views > 50)) AND category = 'news'

-- Complex query with multiple field types
type = 'article' AND (metadata.author = 'John' OR metadata.author = 'Jane') AND views >= 100 AND rating IN (4, 5)

Array/List Operations

-- Text IN
status IN ('published', 'archived', 'draft')

-- Numeric IN
priority IN (1, 2, 3)

-- NOT IN
category NOT IN ('deleted', 'hidden')

Type Inference

The parser will infer the condition type (text vs number) based on:

  1. Operator context: Operators like >, <, >=, <= imply numeric
  2. Value type:
    • Quoted strings ('value') → text condition
    • Unquoted numbers (123, 45.6) → numeric condition
    • NULL → can be either (context-dependent)
  3. Field name: If a field is known to be numeric, numeric operators are used

Examples:

-- Text condition (quoted string)
author = 'John'

-- Numeric condition (unquoted number)
age = 30

-- Numeric comparison
score > 0.5

-- Text pattern
title LIKE '%test%'
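The inference rules can be sketched as a small decision function. Several details here are assumptions rather than part of the specification: the function name `infer_condition_type`, the `"number"` label for numeric conditions, the `known_numeric` flag standing in for schema knowledge, and the fallback to `"text"` for NULL checks when nothing else is known.

```python
NUMERIC_ONLY = {">", ">=", "<", "<="}
TEXT_ONLY = {"LIKE", "NOT LIKE"}


def infer_condition_type(operator, raw_value=None, known_numeric=False):
    """Return "text" or "number" for a single condition."""
    op = " ".join(operator.upper().split())  # normalize case and spacing
    # 1. Operator context decides first.
    if op in NUMERIC_ONLY:
        return "number"
    if op in TEXT_ONLY:
        return "text"
    # 2. Otherwise the literal decides: quoted means text, unquoted means number.
    if raw_value is not None:
        return "text" if raw_value.startswith("'") else "number"
    # 3. NULL checks carry no literal, so fall back to field knowledge
    #    (assumed default: text).
    return "number" if known_numeric else "text"


print(infer_condition_type("=", "'John'"))
print(infer_condition_type("=", "30"))
print(infer_condition_type("IS NULL", known_numeric=True))
```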

Escaping and Special Characters

String Escaping

  • Single quotes in strings: 'O''Brien' → O'Brien
  • Empty string: ''
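The doubled-quote rule is easy to get wrong in both directions, so a round-trip sketch may help. `quote_string` and `unquote_string` are hypothetical helper names.

```python
def quote_string(value):
    """Turn a raw value into a string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def unquote_string(literal):
    """Turn a string literal back into the raw value it denotes."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError(f"not a quoted string literal: {literal!r}")
    body = literal[1:-1]
    # After removing every doubled quote, no single quote may remain.
    if "'" in body.replace("''", ""):
        raise ValueError(f"unescaped single quote in {literal!r}")
    return body.replace("''", "'")


print(quote_string("O'Brien"))
print(unquote_string("'O''Brien'"))
```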

Field Name Escaping

If field names contain special characters or reserved words, they can be quoted (implementation-dependent):

-- Reserved words or special characters (if supported)
"order" = 'asc'
"metadata.field-name" = 'value'

Error Handling

The parser should provide clear error messages for:

  • Invalid syntax
  • Mismatched parentheses
  • Invalid operators for field types
  • Missing values
  • Invalid escape sequences
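As one example of a clear error message, the sketch below reports mismatched parentheses together with the character position of the problem. `QuerySyntaxError` and `check_parentheses` are hypothetical names, and the zero-based position convention is an assumption. Parentheses inside string literals are skipped so that a value such as ')' does not trigger a false error.

```python
class QuerySyntaxError(ValueError):
    """Syntax error that records where in the query it occurred."""

    def __init__(self, message, position):
        super().__init__(f"{message} at position {position}")
        self.position = position


def check_parentheses(query):
    """Raise QuerySyntaxError if parentheses do not balance."""
    open_positions = []
    i = 0
    while i < len(query):
        ch = query[i]
        if ch == "'":
            start = i
            i += 1
            # Skip to the closing quote, treating '' as an escaped quote.
            while True:
                if i >= len(query):
                    raise QuerySyntaxError("unterminated string literal", start)
                if query[i] == "'":
                    if i + 1 < len(query) and query[i + 1] == "'":
                        i += 2
                        continue
                    break
                i += 1
        elif ch == "(":
            open_positions.append(i)
        elif ch == ")":
            if not open_positions:
                raise QuerySyntaxError("unmatched ')'", i)
            open_positions.pop()
        i += 1
    if open_positions:
        raise QuerySyntaxError("unclosed '('", open_positions[-1])


try:
    check_parentheses("(type = 'post' OR type = 'page' AND views > 100")
except QuerySyntaxError as err:
    print(err)
```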

Grammar (BNF-like)

query          ::= or_expr
or_expr        ::= and_expr ( 'OR' and_expr )*
and_expr       ::= primary ( 'AND' primary )*
primary        ::= condition | '(' or_expr ')'
condition      ::= text_condition | numeric_condition
text_condition ::= field ( '=' | '!=' | 'LIKE' | 'NOT LIKE' ) string_literal
                 | field 'IS' ( 'NULL' | 'NOT NULL' )
                 | field 'IN' '(' string_list ')'
                 | field 'NOT IN' '(' string_list ')'
numeric_condition ::= field ( '=' | '!=' | '>' | '>=' | '<' | '<=' ) number
                 | field 'IS' ( 'NULL' | 'NOT NULL' )
                 | field 'IN' '(' number_list ')'
                 | field 'NOT IN' '(' number_list ')'
field          ::= identifier ( '.' identifier )*
identifier     ::= [a-zA-Z_][a-zA-Z0-9_]*
string_literal ::= "'" ( escaped_char | [^'] )* "'"
escaped_char   ::= "''"
string_list    ::= string_literal ( ',' string_literal )*
number         ::= '-'? [0-9]+ ( '.' [0-9]+ )? ( [eE] [+-]? [0-9]+ )?
number_list    ::= number ( ',' number )*
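One way to implement this grammar with the precedence described under Operator Precedence is a recursive-descent parser with one function per precedence level. The sketch below is illustrative only: the nested-tuple output, the class and function names, and the acceptance of a leading minus sign on numbers are all assumptions, and it does not check that the values in an IN list share one type.

```python
import re

TOKEN = re.compile(r"""\s*(?:
    (?P<string>'(?:''|[^'])*')
  | (?P<number>-?[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)
  | (?P<op>!=|>=|<=|=|>|<|\(|\)|,)
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*)
)""", re.VERBOSE)
KEYWORDS = {"AND", "OR", "NOT", "IS", "NULL", "IN", "LIKE"}
COMPARISONS = {"=", "!=", ">", ">=", "<", "<="}
END = (None, None)

def tokenize(query):
    tokens, pos = [], 0
    query = query.rstrip()
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if not m:
            raise ValueError(f"unexpected character at position {pos}")
        kind, text = m.lastgroup, m.group(m.lastgroup)
        if kind == "word" and text.upper() in KEYWORDS:
            kind, text = "kw", text.upper()   # keywords are case-insensitive
        tokens.append((kind, text))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else END

    def take(self, kind=None, text=None):
        tok = self.peek()
        if tok == END:
            raise ValueError("unexpected end of query")
        if (kind and tok[0] != kind) or (text and tok[1] != text):
            raise ValueError(f"expected {text or kind}, got {tok[1]!r}")
        self.i += 1
        return tok

    def or_expr(self):                        # lowest precedence
        node = self.and_expr()
        while self.peek() == ("kw", "OR"):
            self.take()
            node = ("or", node, self.and_expr())
        return node

    def and_expr(self):                       # binds tighter than OR
        node = self.primary()
        while self.peek() == ("kw", "AND"):
            self.take()
            node = ("and", node, self.primary())
        return node

    def primary(self):                        # parentheses or one condition
        if self.peek() == ("op", "("):
            self.take()
            node = self.or_expr()
            self.take("op", ")")
            return node
        return self.condition()

    def literal(self, kind=None):
        tok_kind, text = self.take()
        if tok_kind not in ("string", "number") or (kind and tok_kind != kind):
            raise ValueError(f"expected a {kind or 'literal'}, got {text!r}")
        if tok_kind == "string":
            return text[1:-1].replace("''", "'")
        return float(text) if any(c in text for c in ".eE") else int(text)

    def condition(self):
        field = self.take("word")[1]
        kind, text = self.take()
        if kind == "op" and text in COMPARISONS:
            return (text, field, self.literal())
        if (kind, text) == ("kw", "IS"):
            negated = self.peek() == ("kw", "NOT")
            if negated:
                self.take()
            self.take("kw", "NULL")
            return ("IS NOT NULL" if negated else "IS NULL", field)
        negated = (kind, text) == ("kw", "NOT")
        if negated:
            kind, text = self.take("kw")
        prefix = "NOT " if negated else ""
        if text == "LIKE":
            return (prefix + "LIKE", field, self.literal("string"))
        if text == "IN":
            self.take("op", "(")
            values = [self.literal()]
            while self.peek() == ("op", ","):
                self.take()
                values.append(self.literal())
            self.take("op", ")")
            return (prefix + "IN", field, values)
        raise ValueError(f"unexpected operator {text!r}")

def parse(query):
    parser = Parser(tokenize(query))
    tree = parser.or_expr()
    if parser.peek() != END:
        raise ValueError(f"unexpected token {parser.peek()[1]!r}")
    return tree

print(parse("type = 'article' AND views >= 100 OR rating IN (4, 5)"))
```

Because `or_expr` calls `and_expr`, which in turn calls `primary`, every AND chain is fully built before an OR can combine it. That gives AND its higher precedence without any explicit precedence table.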

Migration from JSON Format

The SQL-like syntax maps to the JSON format as follows:

JSON:

{
  "type": "text",
  "field": ["metadata", "foo"],
  "conditions": {
    "equal": "bar"
  }
}

SQL:

metadata.foo = 'bar'

JSON (with operator):

{
  "type": "operator",
  "operator": "and",
  "conditions": [
    {
      "type": "text",
      "field": ["metadata", "foo"],
      "conditions": {
        "equal": "bar"
      }
    },
    {
      "type": "text",
      "field": ["type"],
      "conditions": {
        "equal": "demo"
      }
    }
  ]
}

SQL:

metadata.foo = 'bar' AND type = 'demo'
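The mapping shown above can be reproduced with two small builder functions. Only the keys that appear in the JSON examples (`type`, `field`, `conditions`, `equal`, `operator`) are used. The function names `text_equals` and `all_of` are hypothetical, and the JSON keys for other operators are not covered here because this document does not list them.

```python
import json


def text_equals(field, value):
    """Build the JSON form of: field = 'value'."""
    return {
        "type": "text",
        "field": field.split("."),       # dot notation becomes a path array
        "conditions": {"equal": value},
    }


def all_of(*conditions):
    """Build the JSON form of: condition1 AND condition2 AND ..."""
    return {
        "type": "operator",
        "operator": "and",
        "conditions": list(conditions),
    }


# metadata.foo = 'bar' AND type = 'demo'
query = all_of(text_equals("metadata.foo", "bar"), text_equals("type", "demo"))
print(json.dumps(query, indent=2))
```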

Implementation Notes

  1. Whitespace: Whitespace between tokens is ignored; whitespace inside string literals is preserved
  2. Case sensitivity:
    • Operators (AND, OR, LIKE, etc.) are case-insensitive
    • Field names and string values are case-sensitive
  3. Comments: Not supported in initial version (can be added later)
  4. Table prefixes: The parser may support optional table name prefixes (e.g., documents.metadata.foo) if needed
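The whitespace and case rules can be illustrated with a normalization pass that uppercases keywords and collapses whitespace while leaving field names and string literals untouched. `normalize` is a hypothetical helper, and the keyword list is an assumption drawn from the operators in this document.

```python
import re

KEYWORDS = {"AND", "OR", "NOT", "IS", "NULL", "IN", "LIKE"}

# A piece is a string literal, a (possibly dotted) word, a run of
# whitespace, or any other single character.
_PIECE = re.compile(r"'(?:''|[^'])*'|[A-Za-z_][A-Za-z0-9_.]*|\s+|.", re.DOTALL)


def normalize(query):
    """Uppercase keywords and collapse whitespace outside string literals."""
    out = []
    for piece in _PIECE.findall(query.strip()):
        if piece.startswith("'"):
            out.append(piece)              # string literal: keep verbatim
        elif piece.isspace():
            out.append(" ")                # any whitespace run: one space
        elif piece.upper() in KEYWORDS:
            out.append(piece.upper())      # keyword: case-insensitive
        else:
            out.append(piece)              # field names stay case-sensitive
    return "".join(out)


print(normalize("type = 'And  or'   and\n Status like '%x%'"))
```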