Filtering strategies
Data filtering strategies
When evaluating features for specific data subsets, FeatureQL offers two complementary approaches:
Method 1: Precise key binding (BIND_SQL or BIND_KEYSET)
-- Example: Specific customer set
CUSTOMER_ID := BIND_SQL(SELECT customer_id FROM customers WHERE registration_date > '2024-01-01') sql
Characteristics:
- Left join semantics: Returns NULL for missing keys
- Exact control: You specify exactly which keys to evaluate
- Performance: First left join overhead
Method 2: Filter after binding (WHERE clause)
-- Example: Filter available data
CUSTOMER_ID := BIND_KEYSET(AVAILABLE)
...
WHERE REGISTRATION_DATE > '2024-01-01' sql
Characteristics:
- Natural filtering: Uses available data then filters. No guarantee that all keys will be available depending on data sources used.
- SQL familiarity: Standard WHERE clause syntax but less aligned with feature engineering principles
- Performance: Better performance as there is no left join and uses predicate pushdown when possible
Choosing the right approach
| Scenario | Recommended Method | Rationale |
|---|---|---|
| Fixed key list | Method 1 (BIND_SQL or BIND_KEYSET) | Precise control, explicit NULL handling |
| Conditional filtering | Method 2 (WHERE) | Familiar SQL syntax, natural filtering |
| Missing data handling | Method 1 | Explicit NULL values for missing keys |
| Performance-critical | Depends on data size | Test both approaches with your data |
| SQL familiarity | Method 2 (WHERE) | Familiar SQL syntax, natural filtering |
Important considerations:
- NULL handling: Method 1 returns NULLs for missing keys; Method 2 excludes them: beware when using
IS NULLin the WHERE clause as it will exclude missing keys - Performance: Method 1 has join costs; Method 2 has scanning costs but better performance as there is no left join and uses predicate pushdown when possible
- Query optimization: FeatureQL pushes filters down when possible in both methods
- Index awareness: Consider how filters align with data source indexes or partitions