EXTERNAL_COLUMNS vs EXTERNAL_SQL
FeatureQL provides three ways to map external data to features. Each trades off simplicity for flexibility, so choosing the right one depends on how much control you need over the SQL and join logic.
EXTERNAL_COLUMNS()withFROM TABLE()— the simplest option. FeatureQL generates all SQL for you.EXTERNAL_COLUMNS()withFROM SQL()— you write the source query, but FeatureQL still handles the join.EXTERNAL_SQL()— you control both the source query and the join conditions.
The same pattern applies to binding functions: BIND_COLUMNS() mirrors EXTERNAL_COLUMNS(), and BIND_SQL() mirrors EXTERNAL_SQL().
EXTERNAL_COLUMNS with TABLE
The most declarative approach. You specify column names, types, and which column is the key. FeatureQL generates the SELECT and join logic automatically:
WITH
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
DIM_CUSTOMERS := EXTERNAL_COLUMNS(
customer_id BIGINT#CUSTOMERS BOUND TO CUSTOMER_ID,
name VARCHAR,
FROM TABLE(ecomm.dim_customers)
),
CUSTOMER_NAME := DIM_CUSTOMERS[name],
SELECT
CUSTOMER_ID := BIND_COLUMNS(customer_id FROM TABLE(ecomm.dim_customers)),
DIM_CUSTOMERS,
CUSTOMER_NAME
;| CUSTOMER_ID BIGINT | DIM_CUSTOMERS ROW | CUSTOMER_NAME VARCHAR |
|---|---|---|
| 100 | {customer_id: 100, name: John Doe} | John Doe |
| 101 | {customer_id: 101, name: Jane Doe} | Jane Doe |
| 102 | {customer_id: 102, name: Jack Doe} | Jack Doe |
EXTERNAL_COLUMNS with SQL
When you need to customize the source query — for example, to join two tables or apply a transformation — wrap it in FROM SQL(). FeatureQL still manages the join between the result and your bound keys:
WITH
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
DIM_CUSTOMERS := EXTERNAL_COLUMNS(
customer_id BIGINT#CUSTOMERS BOUND TO CUSTOMER_ID,
name VARCHAR,
FROM SQL(SELECT * FROM ecomm.dim_customers)
),
CUSTOMER_NAME := DIM_CUSTOMERS[name],
SELECT
CUSTOMER_ID := BIND_COLUMNS(customer_id FROM SQL(SELECT customer_id FROM ecomm.dim_customers)),
DIM_CUSTOMERS,
CUSTOMER_NAME
;| CUSTOMER_ID BIGINT | DIM_CUSTOMERS ROW | CUSTOMER_NAME VARCHAR |
|---|---|---|
| 100 | {customer_id: 100, name: John Doe} | John Doe |
| 101 | {customer_id: 101, name: Jane Doe} | Jane Doe |
| 102 | {customer_id: 102, name: Jack Doe} | Jack Doe |
EXTERNAL_SQL
For full control over both the source query and the join logic. You write the SQL string and specify join conditions explicitly with the ON clause using SELF.column=%INPUT_NAME syntax:
WITH
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
DIM_CUSTOMERS := EXTERNAL_SQL(
`SELECT * FROM ecomm.dim_customers`
ON `SELF.customer_id=%CUSTOMER_ID`
AS ROW(customer_id BIGINT#CUSTOMERS, name VARCHAR)
),
CUSTOMER_NAME := DIM_CUSTOMERS[name],
SELECT
CUSTOMER_ID := BIND_SQL(SELECT customer_id FROM ecomm.dim_customers),
DIM_CUSTOMERS,
CUSTOMER_NAME
;| CUSTOMER_ID BIGINT | DIM_CUSTOMERS ROW | CUSTOMER_NAME VARCHAR |
|---|---|---|
| 100 | {customer_id: 100, name: John Doe} | John Doe |
| 101 | {customer_id: 101, name: Jane Doe} | Jane Doe |
| 102 | {customer_id: 102, name: Jack Doe} | Jack Doe |
Understanding EXTERNAL_SQL() in depth
source := EXTERNAL_SQL(
⟨query⟩ -- SQL query string (literal or computed)
ON ⟨conditions⟩ -- Optional: join/filter conditions
AS ROW(field1 ⟨type1⟩, field2 ⟨type2⟩, ...)
),
feature1 := source[field1],
feature2 := source[field2] + 1 All columns must be explicitly typed in the ROW specification, and entity annotations (BIGINT#CUSTOMERS) preserve semantic relationships. The query must return 0 or 1 row per unique input combination. Keep the SQL close to the table source — complex transformations belong in features, not inside EXTERNAL_SQL().
Static vs dynamic filtering
Compile-time values
When filter values are known at compile time (like a partition date), embed them directly in the query using metaprogramming. The database can optimize based on the known value:
CONST
STATIC_PARTITION := '2025-01-05'
WITH
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
SOURCE_CUSTOMER1 := EXTERNAL_SQL(
@literal(`SELECT * FROM (VALUES
(1, '2025-01-05', '1-5'),
(2, '2025-01-05', '2-5'),
(3, '2025-01-05', '3-5'),
(1, '2025-01-06', '1-6'),
(2, '2025-01-06', '2-6'),
(3, '2025-01-06', '3-6')
) AS t(customer_id, ds, attribute1)
WHERE customer_id=%CUSTOMER_ID AND ds='` || STATIC_PARTITION || `'`)
AS ROW(customer_id BIGINT#CUSTOMERS, ds VARCHAR, attribute1 VARCHAR)
),
ATTRIBUTE1 := SOURCE_CUSTOMER1[attribute1]
SELECT
CUSTOMER_ID := BIND_VALUES(ARRAY[1, 2]),
STATIC_PARTITION := @literal(STATIC_PARTITION),
ATTRIBUTE1
;| CUSTOMER_ID BIGINT | STATIC_PARTITION VARCHAR | ATTRIBUTE1 VARCHAR |
|---|---|---|
| 1 | 2025-01-05 | 1-5 |
| 2 | 2025-01-05 | 2-5 |
The STATIC_PARTITION constant is resolved at compile time and injected into the SQL string via @literal(). This is ideal for daily partition filters on dimension or fact tables.
Runtime values
When filter values vary per input row, use the ON clause with SELF.column=%INPUT_NAME placeholders. FeatureQL translates this into a lateral join:
WITH
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
DYNAMIC_PARTITION := INPUT(VARCHAR),
SOURCE_CUSTOMER1 := EXTERNAL_SQL(
`SELECT * FROM (VALUES
(1, '2025-01-05', '1-5'),
(2, '2025-01-05', '2-5'),
(3, '2025-01-05', '3-5'),
(1, '2025-01-06', '1-6'),
(2, '2025-01-06', '2-6'),
(3, '2025-01-06', '3-6')
) AS t(customer_id, ds, attribute1)
WHERE customer_id=%CUSTOMER_ID AND ds=%DYNAMIC_PARTITION`
AS ROW(customer_id BIGINT#CUSTOMERS, ds VARCHAR, attribute1 VARCHAR)
),
ATTRIBUTE1 := SOURCE_CUSTOMER1[attribute1]
SELECT
(CUSTOMER_ID, DYNAMIC_PARTITION) := BIND_VALUES(ARRAY[
ROW(1, '2025-01-05'),
ROW(2, '2025-01-05'),
ROW(3, '2025-01-05'),
ROW(1, '2025-01-06')
]),
ATTRIBUTE1
;| CUSTOMER_ID BIGINT | DYNAMIC_PARTITION VARCHAR | ATTRIBUTE1 VARCHAR |
|---|---|---|
| 1 | 2025-01-05 | 1-5 |
| 1 | 2025-01-06 | 1-6 |
| 2 | 2025-01-05 | 2-5 |
| 3 | 2025-01-05 | 3-5 |
Each input row gets its own lookup, so the filter conditions can reference any bound input.
Backends without lateral join support
BigQuery and DataFusion don't support LEFT LATERAL JOIN. For these backends, separate the data source from the join logic — put the full dataset in the query and the filter conditions in the ON clause:
WITH
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
DYNAMIC_PARTITION := INPUT(VARCHAR),
SOURCE_CUSTOMER1 := EXTERNAL_SQL(
`SELECT * FROM (VALUES
(1, '2025-01-05', '1-5'),
(2, '2025-01-05', '2-5'),
(3, '2025-01-05', '3-5'),
(1, '2025-01-06', '1-6'),
(2, '2025-01-06', '2-6'),
(3, '2025-01-06', '3-6')
) AS t(customer_id, ds, attribute1)`
ON `SELF.customer_id=%CUSTOMER_ID AND SELF.ds=%DYNAMIC_PARTITION`
AS ROW(customer_id BIGINT#CUSTOMERS, ds VARCHAR, attribute1 VARCHAR)
),
ATTRIBUTE1 := SOURCE_CUSTOMER1[attribute1]
SELECT
(CUSTOMER_ID, DYNAMIC_PARTITION) := BIND_VALUES(ARRAY[
ROW(1, '2025-01-05'),
ROW(2, '2025-01-05'),
ROW(3, '2025-01-05'),
ROW(1, '2025-01-06')
]),
ATTRIBUTE1
;| CUSTOMER_ID BIGINT | DYNAMIC_PARTITION VARCHAR | ATTRIBUTE1 VARCHAR |
|---|---|---|
| 1 | 2025-01-05 | 1-5 |
| 1 | 2025-01-06 | 1-6 |
| 2 | 2025-01-05 | 2-5 |
| 3 | 2025-01-05 | 3-5 |
When to use each approach
| Approach | Use when | Limitations |
|---|---|---|
EXTERNAL_COLUMNS + TABLE | Simple table lookups | No custom SQL |
EXTERNAL_COLUMNS + SQL | Custom source query, standard join | No custom join conditions |
EXTERNAL_SQL | Full control over query and join | More verbose, requires explicit typing |
See data source mapping for the full reference.