EXTERNAL_COLUMNS vs EXTERNAL_SQL
FeatureQL offers many ways to map external data to features:
- Use
EXTERNAL_COLUMNSwith FROM TABLE() in general - Use
EXTERNAL_COLUMNSwith FROM SQL() when you need flexibility over your query - Use
EXTERNAL_SQLwhen you need flexibility over your query and JOIN logic
Note that BIND_COLUMNS() and BIND_SQL() follow the same pattern.
EXTERNAL_COLUMNS with TABLE
FeatureQL
WITH
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
DIM_CUSTOMERS := EXTERNAL_COLUMNS(
customer_id BIGINT#CUSTOMERS BOUND TO CUSTOMER_ID,
name VARCHAR,
FROM TABLE(ecomm.dim_customers)
),
CUSTOMER_NAME := DIM_CUSTOMERS[name],
SELECT
CUSTOMER_ID := BIND_COLUMNS(customer_id FROM TABLE(ecomm.dim_customers)),
DIM_CUSTOMERS,
CUSTOMER_NAME
;Result
| CUSTOMER_ID BIGINT | DIM_CUSTOMERS ROW | CUSTOMER_NAME VARCHAR |
|---|---|---|
| 100 | {customer_id: 100, name: John Doe} | John Doe |
| 101 | {customer_id: 101, name: Jane Doe} | Jane Doe |
| 102 | {customer_id: 102, name: Jack Doe} | Jack Doe |
EXTERNAL_COLUMNS with SQL
FeatureQL
WITH
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
DIM_CUSTOMERS := EXTERNAL_COLUMNS(
customer_id BIGINT#CUSTOMERS BOUND TO CUSTOMER_ID,
name VARCHAR,
FROM SQL(SELECT * FROM ecomm.dim_customers)
),
CUSTOMER_NAME := DIM_CUSTOMERS[name],
SELECT
CUSTOMER_ID := BIND_COLUMNS(customer_id FROM SQL(SELECT customer_id FROM ecomm.dim_customers)),
DIM_CUSTOMERS,
CUSTOMER_NAME
;Result
| CUSTOMER_ID BIGINT | DIM_CUSTOMERS ROW | CUSTOMER_NAME VARCHAR |
|---|---|---|
| 100 | {customer_id: 100, name: John Doe} | John Doe |
| 101 | {customer_id: 101, name: Jane Doe} | Jane Doe |
| 102 | {customer_id: 102, name: Jack Doe} | Jack Doe |
EXTERNAL_SQL
FeatureQL
WITH
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
DIM_CUSTOMERS := EXTERNAL_SQL(
`SELECT * FROM ecomm.dim_customers`
ON `SELF.customer_id=%CUSTOMER_ID`
AS ROW(customer_id BIGINT#CUSTOMERS, name VARCHAR)
),
CUSTOMER_NAME := DIM_CUSTOMERS[name],
SELECT
CUSTOMER_ID := BIND_SQL(SELECT customer_id FROM ecomm.dim_customers),
DIM_CUSTOMERS,
CUSTOMER_NAME
;Result
| CUSTOMER_ID BIGINT | DIM_CUSTOMERS ROW | CUSTOMER_NAME VARCHAR |
|---|---|---|
| 100 | {customer_id: 100, name: John Doe} | John Doe |
| 101 | {customer_id: 101, name: Jane Doe} | Jane Doe |
| 102 | {customer_id: 102, name: Jack Doe} | Jack Doe |
Understanding EXTERNAL_SQL()
The EXTERNAL_SQL() function maps external data to features through a SQL query:
SELECT
source := EXTERNAL_SQL(
⟨query⟩ -- SQL query string (literal or computed)
ON ⟨conditions⟩ -- Optional: join/filter conditions
AS ROW(field_name1 ⟨datatype1⟩, field_name2 ⟨datatype2⟩, ...)
),
feature1 := source[field_name1], -- Map column to feature
feature2 := source[field_name2] + 1 -- Use in expressions
; sql
Parameters explained:
query: SQL query string that conforms to your backend's dialect. Can be computed using metaprogramming.conditions: Optional join conditions usingSELF.column=%INPUT_NAMEsyntaxdatatype: ROW type specification defining the expected result schema
Key principles:
- Type safety: All columns must be explicitly typed in the ROW specification
- Entity linking: Use entity annotations (
BIGINT#CUSTOMERS) to maintain semantic relationships - Result cardinality: Must return 0 or 1 row per unique input combination
- Query clarity: Do not put overly complicated queries in the
EXTERNAL_SQL()function, stay close to the table sources
Static vs. Dynamic Filtering
Static filtering with compile-time values
When you have compile-time constants (like partition dates), include them directly in the query as a computed literal.
FeatureQL
CONST
STATIC_PARTITION := '2025-01-05'
WITH
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
SOURCE_CUSTOMER1 := EXTERNAL_SQL(
@literal(`SELECT * FROM (VALUES
(1, '2025-01-05', '1-5'),
(2, '2025-01-05', '2-5'),
(3, '2025-01-05', '3-5'),
(1, '2025-01-06', '1-6'),
(2, '2025-01-06', '2-6'),
(3, '2025-01-06', '3-6')
) AS t(customer_id, ds, attribute1)
WHERE customer_id=%CUSTOMER_ID AND ds='` || STATIC_PARTITION || `'`)
AS ROW(customer_id BIGINT#CUSTOMERS, ds VARCHAR, attribute1 VARCHAR)
),
ATTRIBUTE1 := SOURCE_CUSTOMER1[attribute1]
SELECT
CUSTOMER_ID := BIND_VALUES(ARRAY[1, 2]),
STATIC_PARTITION := @literal(STATIC_PARTITION),
ATTRIBUTE1
;Result
| CUSTOMER_ID BIGINT | STATIC_PARTITION VARCHAR | ATTRIBUTE1 VARCHAR |
|---|---|---|
| 1 | 2025-01-05 | 1-5 |
| 2 | 2025-01-05 | 2-5 |
Key characteristics:
- Compile-time filtering: The
STATIC_PARTITIONconstant is embedded in the query - Efficient execution: Database can optimize based on partitions or other known filter values
- Use cases: Daily partitions for dimension tables or facts tables
Dynamic filtering with runtime conditions
For values that vary per input row, include them in the query as an input feature placeholder starting with %.
FeatureQL
WITH
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
DYNAMIC_PARTITION := INPUT(VARCHAR),
SOURCE_CUSTOMER1 := EXTERNAL_SQL(
`SELECT * FROM (VALUES
(1, '2025-01-05', '1-5'),
(2, '2025-01-05', '2-5'),
(3, '2025-01-05', '3-5'),
(1, '2025-01-06', '1-6'),
(2, '2025-01-06', '2-6'),
(3, '2025-01-06', '3-6')
) AS t(customer_id, ds, attribute1)
WHERE customer_id=%CUSTOMER_ID AND ds=%DYNAMIC_PARTITION`
AS ROW(customer_id BIGINT#CUSTOMERS, ds VARCHAR, attribute1 VARCHAR)
),
ATTRIBUTE1 := SOURCE_CUSTOMER1[attribute1]
SELECT
(CUSTOMER_ID, DYNAMIC_PARTITION) := BIND_VALUES(ARRAY[
ROW(1, '2025-01-05'),
ROW(2, '2025-01-05'),
ROW(3, '2025-01-05'),
ROW(1, '2025-01-06')
]),
ATTRIBUTE1
;Result
| CUSTOMER_ID BIGINT | DYNAMIC_PARTITION VARCHAR | ATTRIBUTE1 VARCHAR |
|---|---|---|
| 1 | 2025-01-05 | 1-5 |
| 1 | 2025-01-06 | 1-6 |
| 2 | 2025-01-05 | 2-5 |
| 3 | 2025-01-05 | 3-5 |
Key characteristics:
- Runtime filtering: Conditions are applied per input using
SELF.column=%INPUT_NAME - Flexible joins: Can handle complex relationships between inputs and external data
- Use cases: Point-in-time lookups
When lateral join is not supported
For databases that don't support LEFT LATERAL JOIN (like BigQuery or DataFusion), separate the data source from join logic:
FeatureQL
WITH
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
DYNAMIC_PARTITION := INPUT(VARCHAR),
SOURCE_CUSTOMER1 := EXTERNAL_SQL(
`SELECT * FROM (VALUES
(1, '2025-01-05', '1-5'),
(2, '2025-01-05', '2-5'),
(3, '2025-01-05', '3-5'),
(1, '2025-01-06', '1-6'),
(2, '2025-01-06', '2-6'),
(3, '2025-01-06', '3-6')
) AS t(customer_id, ds, attribute1)`
ON `SELF.customer_id=%CUSTOMER_ID AND SELF.ds=%DYNAMIC_PARTITION`
AS ROW(customer_id BIGINT#CUSTOMERS, ds VARCHAR, attribute1 VARCHAR)
),
ATTRIBUTE1 := SOURCE_CUSTOMER1[attribute1]
SELECT
(CUSTOMER_ID, DYNAMIC_PARTITION) := BIND_VALUES(ARRAY[
ROW(1, '2025-01-05'),
ROW(2, '2025-01-05'),
ROW(3, '2025-01-05'),
ROW(1, '2025-01-06')
]),
ATTRIBUTE1
;Result
| CUSTOMER_ID BIGINT | DYNAMIC_PARTITION VARCHAR | ATTRIBUTE1 VARCHAR |
|---|---|---|
| 1 | 2025-01-05 | 1-5 |
| 1 | 2025-01-06 | 1-6 |
| 2 | 2025-01-05 | 2-5 |
| 3 | 2025-01-05 | 3-5 |