Evaluate queries
FeatureQL supports familiar SQL clauses for filtering, ordering, and aggregating data in evaluation queries. This brings the power of SQL analytics to feature engineering workflows.
Important: These SQL clauses are for query evaluation only and cannot be used when persisting features with CREATE FEATURES
.
Query structure
Standard FeatureQL evaluation queries can be extended with SQL clauses:
WITH
[internal feature definitions]
SELECT
[output feature definitions]
WHERE [filter conditions]
ORDER BY [sort expressions]
OFFSET [skip rows]
LIMIT [max rows]
sql
Basic filtering and ordering
This example demonstrates:
- ORDER BY: Sort results by feature values
- LIMIT: Restrict the number of returned rows
- OFFSET: Skip rows for pagination
Common use cases:
- Top-N analysis: Find the highest/lowest values
- Pagination: Handle large result sets in chunks
- Sample data: Get a subset for analysis or testing
Aggregations with filtering
Single-key aggregations
Key aggregation concepts:
- FILTER clauses:
SUM(price) FILTER (WHERE condition)
applies conditions during aggregation - GROUP BY in aggregates:
GROUP BY category
groups the aggregation by specified columns - Aggregate functions:
SUM()
,COUNT()
,AVG()
,MAX()
,MIN()
Note: FeatureQL uses aggregate functions with built-in GROUP BY
syntax rather than SQL's separate GROUP BY
clause.
Multi-key aggregations
Multi-key grouping patterns:
- Multiple grouping columns:
GROUP BY category, item_cat
within aggregate functions - Complex aggregates: Combine multiple aggregate functions
- Conditional aggregation: Use
FILTER (WHERE condition)
within aggregates
Data filtering strategies
When evaluating features for specific data subsets, FeatureQL offers two complementary approaches:
Method 1: Precise key binding (BIND_SQL or BIND_KEYSET)
-- Example: Specific customer set
CUSTOMER_ID := BIND_SQL(SELECT customer_id FROM customers WHERE registration_date > '2024-01-01')
sql
Characteristics:
- Left join semantics: Returns NULL for missing keys
- Exact control: You specify exactly which keys to evaluate
- Performance: First left join overhead
Method 2: Filter after binding (WHERE clause)
-- Example: Filter available data
CUSTOMER_ID := BIND_KEYSET(AVAILABLE)
...
WHERE REGISTRATION_DATE > '2024-01-01'
sql
Characteristics:
- Natural filtering: Uses available data then filters. No guarantee that all keys will be available depending on data sources used.
- SQL familiarity: Standard WHERE clause syntax but less aligned with feature engineering principles
- Performance: Better performance as there is no left join and uses predicate pushdown when possible
Choosing the right approach
Scenario | Recommended Method | Rationale |
---|---|---|
Fixed key list | Method 1 (BIND_SQL or BIND_KEYSET) | Precise control, explicit NULL handling |
Conditional filtering | Method 2 (WHERE) | Familiar SQL syntax, natural filtering |
Missing data handling | Method 1 | Explicit NULL values for missing keys |
Performance-critical | Depends on data size | Test both approaches with your data |
SQL familiarity | Method 2 (WHERE) | Familiar SQL syntax, natural filtering |
Important considerations:
- NULL handling: Method 1 returns NULLs for missing keys; Method 2 excludes them: beware when using
IS NULL
in the WHERE clause as it will exclude missing keys - Performance: Method 1 has join costs; Method 2 has scanning costs but better performance as there is no left join and uses predicate pushdown when possible
- Query optimization: FeatureQL pushes filters down when possible in both methods
- Index awareness: Consider how filters align with data source indexes or partitions