Evaluate queries

FeatureQL supports familiar SQL clauses for filtering, ordering, and aggregating data in evaluation queries. This brings the power of SQL analytics to feature engineering workflows.

Important: These SQL clauses are for query evaluation only and cannot be used when persisting features with CREATE FEATURES.

Query structure

Standard FeatureQL evaluation queries can be extended with SQL clauses:

WITH
    [internal feature definitions]
SELECT
    [output feature definitions]
WHERE [filter conditions]
ORDER BY [sort expressions]
OFFSET [skip rows]
LIMIT [max rows]
sql

Basic filtering and ordering

This example demonstrates:

  • ORDER BY: Sort results by feature values
  • LIMIT: Restrict the number of returned rows
  • OFFSET: Skip rows for pagination

Common use cases:

  • Top-N analysis: Find the highest/lowest values
  • Pagination: Handle large result sets in chunks
  • Sample data: Get a subset for analysis or testing

Aggregations with filtering

Single-key aggregations

Key aggregation concepts:

  • FILTER clauses: SUM(price) FILTER (WHERE condition) applies conditions during aggregation
  • GROUP BY in aggregates: GROUP BY category groups the aggregation by specified columns
  • Aggregate functions: SUM(), COUNT(), AVG(), MAX(), MIN()

Note: FeatureQL uses aggregate functions with built-in GROUP BY syntax rather than SQL's separate GROUP BY clause.

Multi-key aggregations

Multi-key grouping patterns:

  • Multiple grouping columns: GROUP BY category, item_cat within aggregate functions
  • Complex aggregates: Combine multiple aggregate functions
  • Conditional aggregation: Use FILTER (WHERE condition) within aggregates

Data filtering strategies

When evaluating features for specific data subsets, FeatureQL offers two complementary approaches:

Method 1: Precise key binding (BIND_SQL or BIND_KEYSET)

-- Example: Specific customer set
CUSTOMER_ID := BIND_SQL(SELECT customer_id FROM customers WHERE registration_date > '2024-01-01')
sql

Characteristics:

  • Left join semantics: Returns NULL for missing keys
  • Exact control: You specify exactly which keys to evaluate
  • Performance: First left join overhead

Method 2: Filter after binding (WHERE clause)

-- Example: Filter available data
CUSTOMER_ID := BIND_KEYSET(AVAILABLE)
...
WHERE REGISTRATION_DATE > '2024-01-01'
sql

Characteristics:

  • Natural filtering: Uses available data then filters. No guarantee that all keys will be available depending on data sources used.
  • SQL familiarity: Standard WHERE clause syntax but less aligned with feature engineering principles
  • Performance: Better performance as there is no left join and uses predicate pushdown when possible

Choosing the right approach

ScenarioRecommended MethodRationale
Fixed key listMethod 1 (BIND_SQL or BIND_KEYSET)Precise control, explicit NULL handling
Conditional filteringMethod 2 (WHERE)Familiar SQL syntax, natural filtering
Missing data handlingMethod 1Explicit NULL values for missing keys
Performance-criticalDepends on data sizeTest both approaches with your data
SQL familiarityMethod 2 (WHERE)Familiar SQL syntax, natural filtering

Important considerations:

  • NULL handling: Method 1 returns NULLs for missing keys; Method 2 excludes them: beware when using IS NULL in the WHERE clause as it will exclude missing keys
  • Performance: Method 1 has join costs; Method 2 has scanning costs but better performance as there is no left join and uses predicate pushdown when possible
  • Query optimization: FeatureQL pushes filters down when possible in both methods
  • Index awareness: Consider how filters align with data source indexes or partitions
Last update at: 2025/10/13 10:23:46