Evaluate queries

FeatureQL supports familiar SQL clauses for filtering, ordering, and aggregating data in evaluation queries. This brings the power of SQL analytics to feature engineering workflows.

Important: These SQL clauses are for query evaluation only and cannot be used when persisting features with CREATE FEATURES.

Query structure

Standard FeatureQL evaluation queries can be extended with SQL clauses:

WITH
    [internal feature definitions]
SELECT
    [output feature definitions]
WHERE [filter conditions]
ORDER BY [sort expressions]
OFFSET [skip rows]
LIMIT [max rows]

sql

Basic filtering and ordering

This example demonstrates:

ORDER BY: Sort results by feature values
LIMIT: Restrict the number of returned rows
OFFSET: Skip rows for pagination

Common use cases:

Top-N analysis: Find the highest/lowest values
Pagination: Handle large result sets in chunks
Sample data: Get a subset for analysis or testing

Aggregations with filtering

Single-key aggregations

Key aggregation concepts:

FILTER clauses: SUM(price) FILTER (WHERE condition) applies conditions during aggregation
GROUP BY in aggregates: GROUP BY category groups the aggregation by specified columns
Aggregate functions: SUM(), COUNT(), AVG(), MAX(), MIN()

Note: FeatureQL uses aggregate functions with built-in GROUP BY syntax rather than SQL's separate GROUP BY clause.

Multi-key aggregations

Multi-key grouping patterns:

Multiple grouping columns: GROUP BY category, item_cat within aggregate functions
Complex aggregates: Combine multiple aggregate functions
Conditional aggregation: Use FILTER (WHERE condition) within aggregates

Data filtering strategies

When evaluating features for specific data subsets, FeatureQL offers two complementary approaches:

Method 1: Precise key binding (BIND_SQL or BIND_KEYSET)

-- Example: Specific customer set
CUSTOMER_ID := BIND_SQL(SELECT customer_id FROM customers WHERE registration_date > '2024-01-01')

sql

Characteristics:

Left join semantics: Returns NULL for missing keys
Exact control: You specify exactly which keys to evaluate
Performance: First left join overhead

Method 2: Filter after binding (WHERE clause)

-- Example: Filter available data
CUSTOMER_ID := BIND_KEYSET(AVAILABLE)
...
WHERE REGISTRATION_DATE > '2024-01-01'

sql

Characteristics:

Natural filtering: Uses available data then filters. No guarantee that all keys will be available depending on data sources used.
SQL familiarity: Standard WHERE clause syntax but less aligned with feature engineering principles
Performance: Better performance as there is no left join and uses predicate pushdown when possible

Choosing the right approach

Scenario	Recommended Method	Rationale
Fixed key list	Method 1 (BIND_SQL or BIND_KEYSET)	Precise control, explicit NULL handling
Conditional filtering	Method 2 (WHERE)	Familiar SQL syntax, natural filtering
Missing data handling	Method 1	Explicit NULL values for missing keys
Performance-critical	Depends on data size	Test both approaches with your data
SQL familiarity	Method 2 (WHERE)	Familiar SQL syntax, natural filtering

Important considerations:

NULL handling: Method 1 returns NULLs for missing keys; Method 2 excludes them: beware when using IS NULL in the WHERE clause as it will exclude missing keys
Performance: Method 1 has join costs; Method 2 has scanning costs but better performance as there is no left join and uses predicate pushdown when possible
Query optimization: FeatureQL pushes filters down when possible in both methods
Index awareness: Consider how filters align with data source indexes or partitions

Suggest changes to this page

Last update at: 2025/10/13 10:23:46

On this page

Query structure Basic filtering and ordering Aggregations with filtering Single-key aggregations Multi-key aggregations Data filtering strategies Method 1: Precise key binding (BIND_SQL or BIND_KEYSET)Method 2: Filter after binding (WHERE clause)Choosing the right approach