EXTERNAL_COLUMNS vs EXTERNAL_SQL

FeatureQL offers many ways to map external data to features:

  • Use EXTERNAL_COLUMNS with FROM TABLE() in general
  • Use EXTERNAL_COLUMNS with FROM SQL() when you need flexibility over your query
  • Use EXTERNAL_SQL when you need flexibility over your query and JOIN logic

Note that BIND_COLUMNS() and BIND_SQL() follow the same pattern.

EXTERNAL_COLUMNS with TABLE

FeatureQL
WITH
    CUSTOMERS := ENTITY(),
    CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
    DIM_CUSTOMERS := EXTERNAL_COLUMNS(
        customer_id BIGINT#CUSTOMERS BOUND TO CUSTOMER_ID,
        name VARCHAR,
        FROM TABLE(ecomm.dim_customers)
    ),
    CUSTOMER_NAME := DIM_CUSTOMERS[name],
SELECT
    CUSTOMER_ID := BIND_COLUMNS(customer_id FROM TABLE(ecomm.dim_customers)),
    DIM_CUSTOMERS,
    CUSTOMER_NAME
;
Result
CUSTOMER_ID BIGINTDIM_CUSTOMERS ROWCUSTOMER_NAME VARCHAR
100{customer_id: 100, name: John Doe}John Doe
101{customer_id: 101, name: Jane Doe}Jane Doe
102{customer_id: 102, name: Jack Doe}Jack Doe

EXTERNAL_COLUMNS with SQL

FeatureQL
WITH
    CUSTOMERS := ENTITY(),
    CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
    DIM_CUSTOMERS := EXTERNAL_COLUMNS(
        customer_id BIGINT#CUSTOMERS BOUND TO CUSTOMER_ID,
        name VARCHAR,
        FROM SQL(SELECT * FROM ecomm.dim_customers)
    ),
    CUSTOMER_NAME := DIM_CUSTOMERS[name],
SELECT
    CUSTOMER_ID := BIND_COLUMNS(customer_id FROM SQL(SELECT customer_id FROM ecomm.dim_customers)),
    DIM_CUSTOMERS,
    CUSTOMER_NAME
;
Result
CUSTOMER_ID BIGINTDIM_CUSTOMERS ROWCUSTOMER_NAME VARCHAR
100{customer_id: 100, name: John Doe}John Doe
101{customer_id: 101, name: Jane Doe}Jane Doe
102{customer_id: 102, name: Jack Doe}Jack Doe

EXTERNAL_SQL

FeatureQL
WITH
    CUSTOMERS := ENTITY(),
    CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
    DIM_CUSTOMERS := EXTERNAL_SQL(
        `SELECT * FROM ecomm.dim_customers`
        ON `SELF.customer_id=%CUSTOMER_ID`
        AS ROW(customer_id BIGINT#CUSTOMERS, name VARCHAR)
    ),
    CUSTOMER_NAME := DIM_CUSTOMERS[name],
SELECT
    CUSTOMER_ID := BIND_SQL(SELECT customer_id FROM ecomm.dim_customers),
    DIM_CUSTOMERS,
    CUSTOMER_NAME
;
Result
CUSTOMER_ID BIGINTDIM_CUSTOMERS ROWCUSTOMER_NAME VARCHAR
100{customer_id: 100, name: John Doe}John Doe
101{customer_id: 101, name: Jane Doe}Jane Doe
102{customer_id: 102, name: Jack Doe}Jack Doe

Understanding EXTERNAL_SQL()

The EXTERNAL_SQL() function maps external data to features through a SQL query:

SELECT
    source := EXTERNAL_SQL(
        ⟨query⟩                 -- SQL query string (literal or computed)
        ON ⟨conditions⟩         -- Optional: join/filter conditions
        AS ROW(field_name1 ⟨datatype1⟩, field_name2 ⟨datatype2⟩, ...)
    ),
    feature1 := source[field_name1],        -- Map column to feature
    feature2 := source[field_name2] + 1     -- Use in expressions
;
sql

Parameters explained:

  • query: SQL query string that conforms to your backend's dialect. Can be computed using metaprogramming.
  • conditions: Optional join conditions using SELF.column=%INPUT_NAME syntax
  • datatype: ROW type specification defining the expected result schema

Key principles:

  • Type safety: All columns must be explicitly typed in the ROW specification
  • Entity linking: Use entity annotations (BIGINT#CUSTOMERS) to maintain semantic relationships
  • Result cardinality: Must return 0 or 1 row per unique input combination
  • Query clarity: Do not put overly complicated queries in the EXTERNAL_SQL() function, stay close to the table sources

Static vs. Dynamic Filtering

Static filtering with compile-time values

When you have compile-time constants (like partition dates), include them directly in the query as a computed literal.

FeatureQL
CONST
    STATIC_PARTITION := '2025-01-05'
WITH
    CUSTOMERS := ENTITY(),
    CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
    SOURCE_CUSTOMER1 := EXTERNAL_SQL(
        @literal(`SELECT * FROM (VALUES
            (1, '2025-01-05', '1-5'),
            (2, '2025-01-05', '2-5'),
            (3, '2025-01-05', '3-5'),
            (1, '2025-01-06', '1-6'),
            (2, '2025-01-06', '2-6'),
            (3, '2025-01-06', '3-6')
        ) AS t(customer_id, ds, attribute1)
        WHERE customer_id=%CUSTOMER_ID AND ds='` || STATIC_PARTITION || `'`)
        AS ROW(customer_id BIGINT#CUSTOMERS, ds VARCHAR, attribute1 VARCHAR)
    ),
    ATTRIBUTE1 := SOURCE_CUSTOMER1[attribute1]
SELECT
    CUSTOMER_ID := BIND_VALUES(ARRAY[1, 2]),
    STATIC_PARTITION := @literal(STATIC_PARTITION),
    ATTRIBUTE1
;
Result
CUSTOMER_ID BIGINTSTATIC_PARTITION VARCHARATTRIBUTE1 VARCHAR
12025-01-051-5
22025-01-052-5

Key characteristics:

  • Compile-time filtering: The STATIC_PARTITION constant is embedded in the query
  • Efficient execution: Database can optimize based on partitions or other known filter values
  • Use cases: Daily partitions for dimension tables or facts tables

Dynamic filtering with runtime conditions

For values that vary per input row, include them in the query as an input feature placeholder starting with %.

FeatureQL
WITH
    CUSTOMERS := ENTITY(),
    CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
    DYNAMIC_PARTITION := INPUT(VARCHAR),
    SOURCE_CUSTOMER1 := EXTERNAL_SQL(
        `SELECT * FROM (VALUES
            (1, '2025-01-05', '1-5'),
            (2, '2025-01-05', '2-5'),
            (3, '2025-01-05', '3-5'),
            (1, '2025-01-06', '1-6'),
            (2, '2025-01-06', '2-6'),
            (3, '2025-01-06', '3-6')
        ) AS t(customer_id, ds, attribute1)
        WHERE customer_id=%CUSTOMER_ID AND ds=%DYNAMIC_PARTITION`
        AS ROW(customer_id BIGINT#CUSTOMERS, ds VARCHAR, attribute1 VARCHAR)
    ),
    ATTRIBUTE1 := SOURCE_CUSTOMER1[attribute1]
SELECT
    (CUSTOMER_ID, DYNAMIC_PARTITION) := BIND_VALUES(ARRAY[
        ROW(1, '2025-01-05'),
        ROW(2, '2025-01-05'),
        ROW(3, '2025-01-05'),
        ROW(1, '2025-01-06')
    ]),
    ATTRIBUTE1
;
Result
CUSTOMER_ID BIGINTDYNAMIC_PARTITION VARCHARATTRIBUTE1 VARCHAR
12025-01-051-5
12025-01-061-6
22025-01-052-5
32025-01-053-5

Key characteristics:

  • Runtime filtering: Conditions are applied per input using SELF.column=%INPUT_NAME
  • Flexible joins: Can handle complex relationships between inputs and external data
  • Use cases: Point-in-time lookups

When lateral join is not supported

For databases that don't support LEFT LATERAL JOIN (like BigQuery or DataFusion), separate the data source from join logic:

FeatureQL
WITH
    CUSTOMERS := ENTITY(),
    CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
    DYNAMIC_PARTITION := INPUT(VARCHAR),
    SOURCE_CUSTOMER1 := EXTERNAL_SQL(
        `SELECT * FROM (VALUES
            (1, '2025-01-05', '1-5'),
            (2, '2025-01-05', '2-5'),
            (3, '2025-01-05', '3-5'),
            (1, '2025-01-06', '1-6'),
            (2, '2025-01-06', '2-6'),
            (3, '2025-01-06', '3-6')
        ) AS t(customer_id, ds, attribute1)`
        ON `SELF.customer_id=%CUSTOMER_ID AND SELF.ds=%DYNAMIC_PARTITION`
        AS ROW(customer_id BIGINT#CUSTOMERS, ds VARCHAR, attribute1 VARCHAR)
    ),
    ATTRIBUTE1 := SOURCE_CUSTOMER1[attribute1]
SELECT
    (CUSTOMER_ID, DYNAMIC_PARTITION) := BIND_VALUES(ARRAY[
        ROW(1, '2025-01-05'),
        ROW(2, '2025-01-05'),
        ROW(3, '2025-01-05'),
        ROW(1, '2025-01-06')
    ]),
    ATTRIBUTE1
;
Result
CUSTOMER_ID BIGINTDYNAMIC_PARTITION VARCHARATTRIBUTE1 VARCHAR
12025-01-051-5
12025-01-061-6
22025-01-052-5
32025-01-053-5

Last update at: 2026/02/16 15:46:17
Last updated: 2026-02-16 15:46:51