LLM reference — common recipes
Default page for day-to-day work: copy-paste shapes for aggregates, time, windows, and multi-hop graphs. When adapting snippets, prefer operators (+, ||) over ADD() / CONCAT(), SQL CASE WHEN … END over CASE_WHEN(ARRAY, ARRAY, …) unless parallel arrays are the point, and .chain() over deep nesting — see Syntax → Style preferences . Pair with SHOW TESTS / SHOW FEATURES from the bootstrap page so patterns match what is already persisted in the project.
Feature engineering vs analytics
FeatureQL serves two modes. The query structure is the same; the difference is intent and how you bind inputs.
Feature engineering: describe a known entity
You compute features for specific entity instances (e.g., "customer 42's total spend"). Keys come from a known set of entity IDs.
WITH
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
-- ... entity definitions, data sources, derived features ...
TOTAL_SPEND := ENRICHED_ORDERS.TRANSFORM(SELECT SUM(amount)).UNWRAP_ONE(),
SELECT
CUSTOMER_ID,
TOTAL_SPEND
FOR
CUSTOMER_ID := BIND_VALUES(ARRAY[42, 43, 44]);
-- or: CUSTOMER_ID := BIND_COLUMNS(customer_id FROM SQL(SELECT customer_id FROM active_customers)); Feature engineering queries can run online (realtime serving) or offline (batch).
Analytics: explore and aggregate across entities
You compute metrics across a population (e.g., "revenue by country"). Keys are the full dataset; you filter and aggregate in the query.
WITH
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
-- ... same definitions ...
TOTAL_SPEND := ENRICHED_ORDERS.TRANSFORM(SELECT SUM(amount)).UNWRAP_ONE(),
SELECT
CUSTOMER_ID,
CUSTOMER_COUNTRY,
TOTAL_SPEND
FOR
CUSTOMER_ID := BIND_COLUMNS(customer_id FROM SQL(SELECT customer_id FROM dim_customers))
WHERE TOTAL_SPEND > 1000e0
ORDER BY TOTAL_SPEND DESC
LIMIT 100; Analytics queries are always offline (batch).
Aggregation recipes
Sum, count, average on an array
TOTAL := ORDERS.TRANSFORM(SELECT SUM(amount)).UNWRAP_ONE(),
COUNT := ORDERS.TRANSFORM(SELECT COUNT(*)).UNWRAP_ONE(),
AVG_AMOUNT := ORDERS.TRANSFORM(SELECT AVG(amount)).UNWRAP_ONE(), Aggregation with filter
JULY_REVENUE := ORDERS.TRANSFORM(
SELECT SUM(amount) WHERE order_date >= DATE '2017-07-01' AND order_date < DATE '2017-08-01'
).UNWRAP_ONE(), Group by within an array
REVENUE_BY_MONTH := ORDERS.TRANSFORM(
WITH month := EXTRACT(MONTH FROM order_date)
SELECT month, SUM(amount) GROUP BY month AS total
ORDER BY month ASC
),
-- Result type: ARRAY(ROW(month BIGINT, total DOUBLE)) Aggregation across entities with RELATED
-- Count orders per customer (no array needed)
ORDER_COUNT := CUSTOMER_ID.RELATED(COUNT(1) GROUP BY FCT_ORDERS[customer_id]),
-- Sum with filter (predicate must be a boolean at order grain — inline or a named feature)
JULY_REVENUE := CUSTOMER_ID.RELATED(
SUM(ORDER_AMOUNT) FILTER (WHERE ORDER_DATE >= DATE '2017-07-01' AND ORDER_DATE < DATE '2017-08-01')
GROUP BY FCT_ORDERS[customer_id]
), Aggregate at one grain, attach to a finer grain (ARRAY_MERGE)
When each base row should carry a value that was aggregated at a coarser key (e.g. every order line sees that day’s total revenue), group inside TRANSFORM, then merge back on the shared key (see Array of rows overview ):
BY_DAY := ORDERS.TRANSFORM(
SELECT order_date, SUM(amount) AS daily_total GROUP BY order_date
),
ORDERS_WITH_DAILY := ARRAY_MERGE(ORDERS, BY_DAY JOIN ON order_date), -- grammar: JOIN ON (not JOINED ON) Field names must exist on both row types for JOIN ON. For ARRAY_AGG / windowed collection patterns at top level, see Grouping and unnesting and Window functions .
Multi-entity query with NESTED sub-entity bindings
When a query selects features at the outer entity grain (e.g. customer) but those features depend on finer-grained data (e.g. invoices, tickets), provide the sub-entity key pools with NESTED bindings. The outer binding controls result cardinality; NESTED bindings are consumed only inside RELATED/EXTEND subqueries.
SELECT
CUSTOMER_ID,
AVG_PAYMENT_DAYS := CUSTOMER_AVG_DAYS_PAID_INVOICES, -- persisted; uses INVOICE_ID internally
OPEN_TICKETS := CUSTOMER_OPEN_TICKET_COUNT, -- persisted; uses TICKET_ID internally
STATUS := CUSTOMER_HEALTH_STATUS
FROM FM.MY_NAMESPACE
FOR
CUSTOMER_ID := BIND_COLUMNS(id FROM TABLE(dim_customers)),
NESTED INVOICE_ID := BIND_COLUMNS(id FROM TABLE(fct_invoices)),
NESTED TICKET_ID := BIND_COLUMNS(id FROM TABLE(fct_tickets)),
ORDER BY CUSTOMER_ID; Key rules:
NESTEDbindings do not count toward theFOR CROSSrequirement — use plainFORwhen only one non-nested binding is present.- An
INPUTthat is exclusively consumed viaRELATED,EXTEND, or as aBIND TOjoin key inEXTERNAL_COLUMNSis a sub-grain key. Bind itNESTED. - An
INPUTthat is directly selected or referenced by outer-grain computed features is a main-query input. Bind it non-nested. RELATEDfeatures used inline (not persisted) follow the same rule:ORDER_PRICE := RELATED(SUM(...) GROUP BY TABLES.FCT_ITEMS[item_order_id] VIA ORDER_ID)→ORDER_IDmust beNESTED-bound.
Time-series and bucketing
Fixed calendar grain (daily / monthly metrics)
DAY_BUCKET := DATE_TRUNC('DAY', event_ts),
MONTH_BUCKET := DATE_TRUNC('MONTH', event_ts), Use inside TRANSFORM when event_ts is a field on each row of an array of rows.
Running / rolling metrics with windows
Windows work in top-level SELECT or inside TRANSFORM inner SELECT:
-- Fragment: assumes `amount`, `customer_id`, `order_ts` exist in the same scope (e.g. unnested order grain or inner TRANSFORM fields)
RUNNING_REVENUE := SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_ts ASC),
PREV_AMOUNT := LAG(amount, 1) OVER (PARTITION BY customer_id ORDER BY order_ts ASC), Compare to “today” or a snapshot date
AS_OF := @eval_as_literal(CURRENT_DATE()),
DAYS_SINCE := DATE_DIFF(AS_OF, order_ts, 'DAY'), Use @eval_as_literal so the snapshot is evaluated once, not per row, when that is what you intend.
Filtering recipes
Filter an array of rows
HIGH_VALUE := ORDERS.TRANSFORM(SELECT * WHERE amount > 100e0),
RECENT := ORDERS.TRANSFORM(SELECT * WHERE order_date >= DATE '2024-01-01'), Top N / Bottom N
TOP_3 := ORDERS.TRANSFORM(SELECT * ORDER BY amount DESC LIMIT 3),
BOTTOM_3 := ORDERS.TRANSFORM(SELECT * ORDER BY amount ASC LIMIT 3), Boolean existence check
HAS_HIGH_VALUE := ORDERS.TRANSFORM(SELECT BOOL_OR(amount > 100e0)).UNWRAP_ONE(),
ALL_SHIPPED := ORDERS.TRANSFORM(SELECT BOOL_AND(status = 'shipped')).UNWRAP_ONE(), Distinct values
UNIQUE_CATEGORIES := ORDERS.TRANSFORM(SELECT DISTINCT category),
-- Result: ARRAY(ROW(category VARCHAR))
-- As a simple array:
UNIQUE_CATEGORIES_FLAT := ORDERS.TRANSFORM(SELECT DISTINCT category).UNWRAP(),
-- Result: ARRAY(VARCHAR) Enrichment recipes
Enrich scalar FK array with entity data
-- customer_order_ids is ARRAY(BIGINT#ORDERS)
ENRICHED := EXTEND(
ZIP(CUSTOMER_ORDER_IDS AS order_id)
WITH ORDER_DATE, ORDER_AMOUNT
VIA order_id BIND TO ORDER_ID
), Enrich existing array of rows
-- order_items is ARRAY(ROW(product_id BIGINT#PRODUCTS, quantity BIGINT))
ENRICHED_ITEMS := EXTEND(
ORDER_ITEMS
WITH PRODUCT_NAME, PRODUCT_PRICE
VIA product_id BIND TO PRODUCT_ID
), Multi-hop enrichment (Customer → Orders → Items → Products)
Invariant: each EXTEND adds fields at the granularity of the array you pass in. You cannot “skip inside” TRANSFORM to reach another entity; you extend the array until the fields you need exist as inner row fields, then TRANSFORM.
-- Step 1: Enrich customer's orders with order-level data
CUSTOMER_ORDERS := EXTEND(
ZIP(CUSTOMER_ORDER_IDS AS order_id)
WITH ORDER_DATE, ORDER_ITEMS
VIA order_id BIND TO ORDER_ID
),
-- Step 2 (schematic): enrich the per-order line array. In a full query, `ORDER_ITEMS` must be whatever
-- feature name exposes that array at order grain (often the same name as the field you pulled in step 1).
ORDER_ITEMS_ENRICHED := EXTEND(
ORDER_ITEMS
WITH PRODUCT_NAME, PRODUCT_PRICE
VIA product_id BIND TO PRODUCT_ID
),
-- Step 3: When building the per-customer order array, expose enriched items under a field name (e.g. items).
CUSTOMER_ORDERS_FULL := EXTEND(
ZIP(CUSTOMER_ORDER_IDS AS order_id)
WITH ORDER_DATE, ORDER_ITEMS_ENRICHED AS items
VIA order_id BIND TO ORDER_ID
), Checklist
- List entities left-to-right along the path (e.g. customer → order → line → product).
- For each hop, ensure the child key exists on the current array (often via
ZIPonARRAY(BIGINT#CHILD)or fields on rows). EXTEND(... WITH parent_features VIA child_key BIND TO CHILD_INPUT)— always prefer explicitVIA ... BIND TOin real code.- Only then
TRANSFORMto filter, sort, or aggregate within that enriched array.
For a full worked model, see Modeling .
Array manipulation recipes
Extract field from array of rows as simple array
-- array_of_rows[field_name] extracts that field as an array
ORDER_AMOUNTS := ENRICHED_ORDERS[amount], -- ARRAY(DOUBLE)
ORDER_DATES := ENRICHED_ORDERS[order_date], -- ARRAY(DATE) Combine arrays into array of rows
COMBINED := ZIP(
ARRAY[1, 2, 3] AS id,
ARRAY['a', 'b', 'c'] AS letter,
ARRAY[10e0, 20e0, 30e0] AS value
),
-- Result: ARRAY(ROW(id BIGINT, letter VARCHAR, value DOUBLE)) Merge arrays by join key
MERGED := ARRAY_MERGE(orders_array, returns_array JOIN ON order_id), Concatenate arrays with deduplication
ALL_ITEMS := ARRAY_CONCAT(order1_items, order2_items DEDUPLICATED ON product_id), Array aggregation functions (no TRANSFORM needed)
TOTAL := ARRAY_SUM(amounts),
MAX_VAL := ARRAY_MAX(amounts),
COUNT := ARRAY_LENGTH(amounts),
SORTED := ARRAY_SORT(amounts),
UNIQUE := ARRAY_DISTINCT(categories),
HAS_ITEM := CONTAINS(product_ids, 42), Date and time recipes
-- Extract parts
YEAR := EXTRACT(YEAR FROM order_date),
MONTH := EXTRACT(MONTH FROM order_date),
-- Truncate
MONTH_START := DATE_TRUNC('MONTH', order_date),
-- Arithmetic
NEXT_WEEK := DATE_ADD(order_date, 'DAY', 7),
DAYS_SINCE := DATE_DIFF(@eval_as_literal(CURRENT_DATE()), order_date, 'DAY'),
-- Format
FORMATTED := DATE_FORMAT(order_date, '%Y-%m-%d'),
-- Parse
PARSED := DATE_PARSE('2024-01-15', '%Y-%m-%d'), String recipes
FULL_NAME := first_name || ' ' || last_name,
CLEAN := ' hello '.TRIM().UPPER(),
PARTS := SPLIT('a,b,c', ','), -- ARRAY['a', 'b', 'c']
FIRST := SPLIT_PART('a.b.c', '.', 1), -- 'a'
MATCHES := name LIKE 'Jo%',
PADDED := LPAD(id::VARCHAR, 5, '0'), -- '00042' Conditional recipes
-- Simple ternary
LABEL := IF(amount > 100e0, 'high', 'low'),
-- Multi-branch (SQL-style searched CASE)
TIER := CASE WHEN amount > 1000e0 THEN 'platinum' WHEN amount > 100e0 THEN 'gold' WHEN amount > 10e0 THEN 'silver' ELSE 'bronze' END,
-- Same logic, functional form (parallel boolean array + result array)
TIER_FN := CASE_WHEN(
ARRAY[amount > 1000e0, amount > 100e0, amount > 10e0],
ARRAY['platinum', 'gold', 'silver'],
'bronze'
),
-- Value matching (simple CASE / equality dispatch)
STATUS_LABEL := CASE_WHEN_VALUE(
status_code,
ARRAY[1, 2, 3],
ARRAY['active', 'pending', 'closed'],
'unknown'
),
-- First non-null
DISPLAY_NAME := COALESCE(preferred_name, full_name, email), Window function recipes
Window functions work at the top-level SELECT or inside TRANSFORM:
-- Fragment: assumes CUSTOMER_ID and TOTAL_SPEND exist in WITH / FROM namespace for this SELECT
SELECT
CUSTOMER_ID,
TOTAL_SPEND,
RANK := RANK() OVER (ORDER BY TOTAL_SPEND DESC),
RUNNING_TOTAL := SUM(TOTAL_SPEND) OVER (ORDER BY CUSTOMER_ID ASC),
PERCENTILE := NTILE(4) OVER (ORDER BY TOTAL_SPEND DESC), GROUP BY at top level (analytics)
In FeatureQL, GROUP BY is attached to each aggregate feature, not a separate clause:
-- Fragment: assumes CUSTOMER_COUNTRY and REVENUE are in scope (from persisted features or WITH)
SELECT
CUSTOMER_COUNTRY,
TOTAL_REVENUE := SUM(REVENUE) GROUP BY CUSTOMER_COUNTRY,
CUSTOMER_COUNT := COUNT(1) GROUP BY CUSTOMER_COUNTRY,
AVG_REVENUE := AVG(REVENUE) GROUP BY CUSTOMER_COUNTRY,
FOR
CUSTOMER_ID := BIND_COLUMNS(customer_id FROM SQL(SELECT customer_id FROM dim_customers)); UNNEST
UNNEST flattens an array of rows to one output row per element — not a plain ARRAY<BIGINT>. Build rows with ZIP first if needed. Shape rules for what else may appear in SELECT with UNNEST are stricter than SQL; see Grouping and unnesting .
Hybrid queries (FeatureQL inside SQL)
When you need SQL capabilities that FeatureQL doesn't have (e.g., complex window functions, UNION), wrap FeatureQL in SQL. Prefer FEATUREQL(...) as the table expression; a /* FEATUREQL */ … comment form exists for tools that reject FEATUREQL(...) as invalid SQL — see Hybrid queries .
/* SQL */
SELECT *
FROM FEATUREQL(
WITH
ID := INPUT(BIGINT),
SELECT
ID,
VALUE := ID * 2
FOR
ID := BIND_VALUES(ARRAY[1, 2, 3])
)
WHERE VALUE > 2; Anti-patterns (quick)
- Huge single
WITH: persist layers withCREATE FEATURESandFROM fm.namespace. - Outer feature names inside
.TRANSFORM(SELECT …): only row fields exist —CARRY(orEXTEND/ARRAY_MERGE) to put outer values on each row.IMPORTis forTRANSFORM(... USING (...) IMPORT …): it exposes outer feature definitions insideUSING, not row columns; it must be pure (TABLE/SQL-backed graphs often fail — see CARRY vs IMPORT ). CASE_WHENwith non-staticthenarrays: can fail evaluation-mode checks; use SQLCASE WHEN … ENDor seeSHOW TESTSforCASE_WHEN.
Persistence scope — FROM is mandatory
Persisted features are not automatically in scope. To reference them, use FROM fm.namespace:
-- Step 1: Create features
CREATE OR REPLACE FEATURES IN fm.shop AS
SELECT
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
CUST_DATA := EXTERNAL_COLUMNS(
customer_id BIGINT#CUSTOMERS BIND TO CUSTOMER_ID, name VARCHAR
FROM TABLE(schema.customers)
),
CUSTOMER_NAME := CUST_DATA[name];
-- Step 2: Add derived features (MUST include FROM fm.shop to see step 1 features)
CREATE OR REPLACE FEATURES IN fm.shop AS
SELECT
ORDER_COUNT := CUSTOMER_ID.RELATED(COUNT(1) GROUP BY ORD_DATA[order_customer_id])
FROM fm.shop; -- <-- without this, CUSTOMER_ID and ORD_DATA are unknown
-- Step 3: Query (also needs FROM)
SELECT CUSTOMER_ID, CUSTOMER_NAME, ORDER_COUNT
FROM fm.shop
FOR CUSTOMER_ID := BIND_VALUES(ARRAY[1, 2, 3]); Common mistake: writing CREATE OR REPLACE FEATURES IN fm.ns AS SELECT ... that references features from a previous CREATE without FROM fm.ns. The features from the first CREATE are persisted but not in the current query's scope unless you import them with FROM.
Function catalog quick reference
When you need a function, check SHOW SIGNATURES WHERE NAME = 'name' first. Here are the most-used categories:
| Category | Key functions |
|---|---|
| Core | ENTITY(), INPUT(), BIND_VALUE(), BIND_VALUES(), BIND_COLUMNS() |
| Data sources | EXTERNAL_COLUMNS(), INLINE_COLUMNS() |
| Relationships | RELATED(), EXTEND(), TRANSFORM() |
| Array of rows | ZIP(), UNWRAP(), UNWRAP_ONE(), ARRAY_MERGE(), ARRAY_CONCAT() |
| Array | ARRAY_LENGTH(), ARRAY_SUM(), CONTAINS(), ARRAY_SORT(), FLATTEN() |
| Aggregation | SUM(), COUNT(), AVG(), MIN(), MAX(), ARRAY_AGG(), COUNT(DISTINCT ...) |
| Conditional | IF(), CASE WHEN … THEN … END, CASE_WHEN(), CASE_WHEN_VALUE(), COALESCE() |
| String | CONCAT(), LOWER(), UPPER(), TRIM(), SPLIT(), REPLACE(), SUBSTR() |
| Date | EXTRACT(), DATE_ADD(), DATE_DIFF(), DATE_TRUNC(), DATE_FORMAT() |
| Math | ROUND(), ABS(), FLOOR(), CEIL(), SQRT(), POW() |
| Geospatial | ST_GEOGPOINT(), ST_GEOMFROMTEXT(), ST_DISTANCE(), ST_INTERSECTION(), ST_BUFFER(), ST_CONTAINS(), ST_UNION_AGG(), … |
| JSON | JSON_PARSE(), JSON_EXTRACT(), JSON_EXTRACT_SCALAR() |
| Indexes | INDEX_UNIQUE(), INDEX_MULTI(), ELEMENT_AT_KEY() |
| Metaprogramming | @eval_as_literal(), @eval_as_text(), EVAL_CONST(), CONST |
| Business | HASH01(), GRADUAL_ROLLOUT(), SCD_AT_TIME() |