Analytics data sources

EXTERNAL_SQL() connects FeatureQL to external data sources by executing SQL queries and mapping their results to features. This enables seamless integration with existing data warehouses, databases, and analytical systems.

Understanding EXTERNAL_SQL()

The EXTERNAL_SQL() function bridges FeatureQL features with external data through structured queries:

SELECT
    source := EXTERNAL_SQL(
        ⟨query⟩                 -- SQL query string (literal or computed)
        ON ⟨conditions⟩         -- Optional: join/filter conditions
        AS ROW(field_name1 ⟨datatype1⟩, field_name2 ⟨datatype2⟩, ...)
    ),
    feature1 := source[field_name1],        -- Map column to feature
    feature2 := source[field_name2] + 1     -- Use in expressions
;
sql

Parameters explained:

  • query: SQL query string that conforms to your backend's dialect. Can be computed using metaprogramming.
  • conditions: Optional join conditions using SELF.column=%INPUT_NAME syntax
  • datatype: ROW type specification defining the expected result schema

Key principles:

  • Type safety: All columns must be explicitly typed in the ROW specification
  • Entity linking: Use entity annotations (BIGINT#CUSTOMERS) to maintain semantic relationships
  • Result cardinality: Must return 0 or 1 row per unique input combination

Static vs. Dynamic Filtering

Static filtering with compile-time values

When you have compile-time constants (like partition dates), include them directly in the query as a computed literal.

Key characteristics:

  • Compile-time filtering: The STATIC_PARTITION constant is embedded in the query
  • Efficient execution: Database can optimize based on partitions or other known filter values
  • Use cases: Daily partitions for dimension tables or facts tables

Dynamic filtering with runtime conditions

For values that vary per input row, include them in the query as an input feature placeholder starting with %.

Key characteristics:

  • Runtime filtering: Conditions are applied per input using SELF.column=%INPUT_NAME
  • Flexible joins: Can handle complex relationships between inputs and external data
  • Use cases: Point-in-time lookups

Non-lateral join compatibility

For databases that don't support LEFT LATERAL JOIN, separate the data source from join logic:

Backend compatibility considerations:

  • DuckDB/Trino: Support lateral joins for optimal performance
  • BigQuery: Uses different syntax but similar semantics

Best practices

  1. Query optimization: Do not put overly complicated queries in the EXTERNAL_SQL() function, stay close to the table sources
  2. Type precision: Use entity types (BIGINT#ENTITY)
Last update at: 2025/10/13 10:23:46