Filtering strategies

Data filtering strategies

When evaluating features for specific data subsets, FeatureQL offers two complementary approaches:

Method 1: Precise key binding (BIND_SQL or BIND_KEYSET)

-- Example: Specific customer set
CUSTOMER_ID := BIND_SQL(SELECT customer_id FROM customers WHERE registration_date > '2024-01-01')
sql

Characteristics:

  • Left join semantics: Returns NULL for missing keys
  • Exact control: You specify exactly which keys to evaluate
  • Performance: First left join overhead

Method 2: Filter after binding (WHERE clause)

-- Example: Filter available data
CUSTOMER_ID := BIND_KEYSET(AVAILABLE)
...
WHERE REGISTRATION_DATE > '2024-01-01'
sql

Characteristics:

  • Natural filtering: Uses available data then filters. No guarantee that all keys will be available depending on data sources used.
  • SQL familiarity: Standard WHERE clause syntax but less aligned with feature engineering principles
  • Performance: Better performance as there is no left join and uses predicate pushdown when possible

Choosing the right approach

ScenarioRecommended MethodRationale
Fixed key listMethod 1 (BIND_SQL or BIND_KEYSET)Precise control, explicit NULL handling
Conditional filteringMethod 2 (WHERE)Familiar SQL syntax, natural filtering
Missing data handlingMethod 1Explicit NULL values for missing keys
Performance-criticalDepends on data sizeTest both approaches with your data
SQL familiarityMethod 2 (WHERE)Familiar SQL syntax, natural filtering

Important considerations:

  • NULL handling: Method 1 returns NULLs for missing keys; Method 2 excludes them: beware when using IS NULL in the WHERE clause as it will exclude missing keys
  • Performance: Method 1 has join costs; Method 2 has scanning costs but better performance as there is no left join and uses predicate pushdown when possible
  • Query optimization: FeatureQL pushes filters down when possible in both methods
  • Index awareness: Consider how filters align with data source indexes or partitions
Last update at: 2025/12/05 16:03:14
Last updated: 2025-12-05 16:07:55