LLM reference — data sources and binding
How to connect features to external data and provide input values. Load this page when you need to write EXTERNAL_COLUMNS(), INLINE_COLUMNS(), EXTERNAL_SQL(), or any BIND_XXX() pattern.
The binding contract
Every FeatureQL query must bind concrete values to its INPUT() features via the FOR clause. Without binding, the query will not execute.
WITH
ID := INPUT(BIGINT),
SELECT
ID,
DOUBLED := ID * 2
FOR
ID := BIND_VALUES(ARRAY[1, 2, 3]);
Binding patterns
-- Single value (one row)
FOR NAME := BIND_VALUE('Alice');
-- Array of values (multiple rows, one input)
FOR ID := BIND_VALUES(ARRAY[1, 3, 5]);
-- Multiple inputs paired together (no cross-product)
FOR (INPUT1, INPUT2) := BIND_VALUES(ARRAY[ROW(1, 'a'), ROW(3, 'b')]);
-- Separate bindings = Cartesian product (n × m rows)
FOR INPUT1 := BIND_VALUES(ARRAY[1, 3]), INPUT2 := BIND_VALUES(ARRAY[2, 4]);
-- From a SQL query (backend-specific SQL)
FOR (ID, NAME) := BIND_SQL(SELECT id, name FROM (VALUES (1, 'Alice'), (2, 'Bob')) AS t(id, name));
-- From table columns
FOR (INPUT1, INPUT2) := BIND_COLUMNS(col1, col2 FROM TABLE(schema.table));
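Any of these FOR patterns plugs into a complete query the same way. A minimal sketch using BIND_VALUE (the NAME input and GREETING feature are illustrative, not part of any real schema):
WITH
NAME := INPUT(VARCHAR),
GREETING := 'Hello, ' || NAME || '!', -- illustrative derived feature
SELECT NAME, GREETING
FOR NAME := BIND_VALUE('Alice');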
EXTERNAL_COLUMNS — the primary data source mapping
EXTERNAL_COLUMNS() is the recommended way to map table data to features. It handles 95% of use cases. You declare columns, bind key columns to inputs, and specify a source. FeatureQL generates all join logic.
General pattern
FEATURE_NAME := EXTERNAL_COLUMNS(
key_column TYPE BOUND TO SOME_INPUT, -- Key column linked to an INPUT feature
value_column1 TYPE, -- Value columns
value_column2 TYPE
FROM <SOURCE>(...) -- TABLE(), SQL(), or VIEW()
),
-- Access individual fields with [] operator:
FIELD1 := FEATURE_NAME[value_column1],
FIELD2 := FEATURE_NAME[value_column2],
The BOUND TO clause links a key column to an INPUT() feature, enforcing 0 or 1 row per input value. The result is a ROW — extract fields with [].
From a table (simplest)
DIM_CUSTOMERS := EXTERNAL_COLUMNS(
customer_id BIGINT BOUND TO CUSTOMER_ID,
name VARCHAR,
email VARCHAR
FROM TABLE(schema.dim_customers)
),
CUSTOMER_NAME := DIM_CUSTOMERS[name],
CUSTOMER_EMAIL := DIM_CUSTOMERS[email],
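Put together, a complete query around this mapping might look like this (a sketch; it assumes schema.dim_customers exists with these columns):
WITH
CUSTOMER_ID := INPUT(BIGINT),
DIM_CUSTOMERS := EXTERNAL_COLUMNS(
customer_id BIGINT BOUND TO CUSTOMER_ID,
name VARCHAR,
email VARCHAR
FROM TABLE(schema.dim_customers)
),
CUSTOMER_NAME := DIM_CUSTOMERS[name],
SELECT CUSTOMER_ID, CUSTOMER_NAME
FOR CUSTOMER_ID := BIND_VALUES(ARRAY[101, 102]); -- example key values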
From a SQL query (when you need transforms or joins)
DIM_CUSTOMERS := EXTERNAL_COLUMNS(
customer_id BIGINT BOUND TO CUSTOMER_ID,
name VARCHAR,
total_orders BIGINT
FROM SQL(
SELECT c.customer_id, c.name, COUNT(o.order_id) as total_orders
FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
)
),
From a view (realtime serving)
DIM_CUSTOMERS := EXTERNAL_COLUMNS(
customer_id BIGINT BOUND TO CUSTOMER_ID,
name VARCHAR
FROM VIEW(ext.v_dim_customers)
),
Entity annotations on columns
Use #ENTITY annotations on columns that are foreign keys to maintain semantic relationships:
FCT_ORDERS := EXTERNAL_COLUMNS(
order_id BIGINT#ORDERS BOUND TO ORDER_ID,
customer_id BIGINT#CUSTOMERS, -- Foreign key to CUSTOMERS entity
amount DOUBLE,
order_date DATE
FROM TABLE(schema.fct_orders)
),
ORDER_CUSTOMER_ID := FCT_ORDERS[customer_id], -- Typed as BIGINT#CUSTOMERS
INLINE_COLUMNS — for prototyping and tests
Same syntax as EXTERNAL_COLUMNS() but embeds data directly in the query. Use for prototyping without a database.
With CSV
DIM_CUSTOMERS := INLINE_COLUMNS(
customer_id BIGINT BOUND TO CUSTOMER_ID,
name VARCHAR,
country VARCHAR
FROM CSV(
customer_id,name,country
101,Alice,US
102,Bob,FR
103,Charlie,UK
)
),
CUSTOMER_NAME := DIM_CUSTOMERS[name],
With JSON
DIM_CUSTOMERS := INLINE_COLUMNS(
customer_id BIGINT BOUND TO CUSTOMER_ID,
name VARCHAR
FROM JSON([
{"customer_id": 101, "name": "Alice"},
{"customer_id": 102, "name": "Bob"}
])
),
EXTERNAL_SQL — full control (use sparingly)
EXTERNAL_SQL() gives you full control over the SQL query and join conditions. Use it only when EXTERNAL_COLUMNS() cannot express what you need — typically for complex join conditions or backend-specific SQL.
SOURCE := EXTERNAL_SQL(
'SELECT customer_id, name, email FROM customers',
ON `SELF.customer_id = %CUSTOMER_ID`, -- Join condition with %INPUT placeholders
AS ROW(customer_id BIGINT, name VARCHAR, email VARCHAR)
),
CUSTOMER_NAME := SOURCE[name],
Key differences from EXTERNAL_COLUMNS:
- You write raw SQL as a string (must match the backend dialect)
- You specify join conditions explicitly with ON, using SELF.column = %INPUT_NAME syntax
- You must declare the return type as ROW(...) explicitly
- The SQL string can include GROUP BY for aggregation
When to use EXTERNAL_SQL
- Complex join conditions that EXTERNAL_COLUMNS() cannot express (see the sketch after this list)
- Backend-specific SQL features (e.g., BigQuery UNNEST, Trino-specific functions)
- Dynamic SQL via metaprogramming (@echo(), @literal())
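A sketch of one such join condition: a validity-range lookup. The price_history table and both inputs are hypothetical, and this assumes an ON condition may combine several comparisons and %INPUT placeholders:
PRICE_AT_DATE := EXTERNAL_SQL(
'SELECT product_id, price, valid_from, valid_to FROM price_history',
-- assumption: multiple %INPUT placeholders may appear in one ON condition
ON `SELF.product_id = %PRODUCT_ID AND SELF.valid_from <= %AS_OF_DATE AND SELF.valid_to > %AS_OF_DATE`,
AS ROW(product_id BIGINT, price DOUBLE, valid_from DATE, valid_to DATE)
)[price],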
Aggregation example (one-to-many → single row)
CUSTOMER_ORDER_IDS := EXTERNAL_SQL(
'SELECT customer_id, ARRAY_AGG(order_id) as order_ids FROM orders GROUP BY customer_id',
ON `SELF.customer_id = %CUSTOMER_ID`,
AS ROW(customer_id BIGINT, order_ids ARRAY(BIGINT#ORDERS))
)[order_ids],
Quick mockup patterns for prototyping
When you just need to test logic without real tables:
All columns as inputs (fastest)
WITH
ID := INPUT(BIGINT),
NAME := INPUT(VARCHAR),
GREETING := 'Hello, ' || NAME || '!',
SELECT ID, GREETING
FOR (ID, NAME) := BIND_VALUES(ARRAY[ROW(1, 'Alice'), ROW(2, 'Bob')]);
Key-value separation with INLINE_COLUMNS (mirrors production)
WITH
CUSTOMER_ID := INPUT(BIGINT),
DIM := INLINE_COLUMNS(
customer_id BIGINT BOUND TO CUSTOMER_ID,
name VARCHAR, country VARCHAR
FROM CSV(customer_id,name,country
1,Alice,US
2,Bob,FR)
),
GREETING := 'Hello, ' || DIM[name] || ' from ' || DIM[country] || '!',
SELECT CUSTOMER_ID, GREETING
FOR CUSTOMER_ID := BIND_VALUES(ARRAY[1, 2]);
The second pattern is preferred because it mirrors production structure: keys are bound, values are looked up. When ready for production, replace INLINE_COLUMNS with EXTERNAL_COLUMNS ... FROM TABLE(...).
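For example, the production version of the DIM mapping above keeps the same columns and swaps only the source (a sketch; schema.dim_customers is a placeholder table name):
DIM := EXTERNAL_COLUMNS(
customer_id BIGINT BOUND TO CUSTOMER_ID,
name VARCHAR, country VARCHAR
FROM TABLE(schema.dim_customers)
),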
Rules for data source queries
- Keep SQL simple. SELECT with optional GROUP BY only. No WHERE clauses — filters belong in features, not in the data source.
- One mapping per entity+table. Don't create multiple mappings to the same table for the same entity.
- 0 or 1 row per input. The BOUND TO constraint enforces this. For one-to-many, aggregate with ARRAY_AGG in the SQL or use EXTERNAL_SQL with GROUP BY.
- Cast in the SQL if needed. If a column type doesn't match, cast it in the source SQL: CAST(date_col AS DATE) AS order_date. See the sketch after this list.
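A sketch of such a cast inside a FROM SQL() source; the raw_orders table and its raw_date column are hypothetical:
FCT_ORDERS := EXTERNAL_COLUMNS(
order_id BIGINT BOUND TO ORDER_ID,
order_date DATE
FROM SQL(
-- raw_date arrives as text in this hypothetical table, so cast it at the source
SELECT order_id, CAST(raw_date AS DATE) AS order_date FROM raw_orders
)
),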