LLM reference — common recipes

Default page for day-to-day work: copy-paste shapes for aggregates, time, windows, and multi-hop graphs. When adapting snippets, prefer operators (+, ||) over ADD() / CONCAT(), SQL CASE WHEN … END over CASE_WHEN(ARRAY, ARRAY, …) unless parallel arrays are the point, and .chain() over deep nesting — see Syntax → Style preferences . Pair with SHOW TESTS / SHOW FEATURES from the bootstrap page so patterns match what is already persisted in the project.

Feature engineering vs analytics

FeatureQL serves two modes. The query structure is the same; the difference is intent and how you bind inputs.

Feature engineering: describe a known entity

You compute features for specific entity instances (e.g., "customer 42's total spend"). Keys come from a known set of entity IDs.

WITH
    CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
    -- ... entity definitions, data sources, derived features ...
    TOTAL_SPEND := ENRICHED_ORDERS.TRANSFORM(SELECT SUM(amount)).UNWRAP_ONE(),
SELECT
    CUSTOMER_ID,
    TOTAL_SPEND
FOR
    CUSTOMER_ID := BIND_VALUES(ARRAY[42, 43, 44]);
-- or: CUSTOMER_ID := BIND_COLUMNS(customer_id FROM SQL(SELECT customer_id FROM active_customers));
sql

Feature engineering queries can run online (realtime serving) or offline (batch).

Analytics: explore and aggregate across entities

You compute metrics across a population (e.g., "revenue by country"). Keys are the full dataset; you filter and aggregate in the query.

WITH
    CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
    -- ... same definitions ...
    TOTAL_SPEND := ENRICHED_ORDERS.TRANSFORM(SELECT SUM(amount)).UNWRAP_ONE(),
SELECT
    CUSTOMER_ID,
    CUSTOMER_COUNTRY,
    TOTAL_SPEND
FOR
    CUSTOMER_ID := BIND_COLUMNS(customer_id FROM SQL(SELECT customer_id FROM dim_customers))
WHERE TOTAL_SPEND > 1000e0
ORDER BY TOTAL_SPEND DESC
LIMIT 100;
sql

Analytics queries are always offline (batch).

Aggregation recipes

Sum, count, average on an array

TOTAL := ORDERS.TRANSFORM(SELECT SUM(amount)).UNWRAP_ONE(),
COUNT := ORDERS.TRANSFORM(SELECT COUNT(*)).UNWRAP_ONE(),
AVG_AMOUNT := ORDERS.TRANSFORM(SELECT AVG(amount)).UNWRAP_ONE(),
sql

Aggregation with filter

JULY_REVENUE := ORDERS.TRANSFORM(
    SELECT SUM(amount) WHERE order_date >= DATE '2017-07-01' AND order_date < DATE '2017-08-01'
).UNWRAP_ONE(),
sql

Group by within an array

REVENUE_BY_MONTH := ORDERS.TRANSFORM(
    WITH month := EXTRACT(MONTH FROM order_date)
    SELECT month, SUM(amount) GROUP BY month AS total
    ORDER BY month ASC
),
-- Result type: ARRAY(ROW(month BIGINT, total DOUBLE))
sql

Aggregation across entities with RELATED

-- Count orders per customer (no array needed)
ORDER_COUNT := CUSTOMER_ID.RELATED(COUNT(1) GROUP BY FCT_ORDERS[customer_id]),

-- Sum with filter (predicate must be a boolean at order grain — inline or a named feature)
JULY_REVENUE := CUSTOMER_ID.RELATED(
    SUM(ORDER_AMOUNT) FILTER (WHERE ORDER_DATE >= DATE '2017-07-01' AND ORDER_DATE < DATE '2017-08-01')
    GROUP BY FCT_ORDERS[customer_id]
),
sql

Aggregate at one grain, attach to a finer grain (ARRAY_MERGE)

When each base row should carry a value that was aggregated at a coarser key (e.g. every order line sees that day’s total revenue), group inside TRANSFORM, then merge back on the shared key (see Array of rows overview ):

BY_DAY := ORDERS.TRANSFORM(
    SELECT order_date, SUM(amount) AS daily_total GROUP BY order_date
),
ORDERS_WITH_DAILY := ARRAY_MERGE(ORDERS, BY_DAY JOIN ON order_date),  -- grammar: JOIN ON (not JOINED ON)
sql

Field names must exist on both row types for JOIN ON. For ARRAY_AGG / windowed collection patterns at top level, see Grouping and unnesting and Window functions .

Multi-entity query with NESTED sub-entity bindings

When a query selects features at the outer entity grain (e.g. customer) but those features depend on finer-grained data (e.g. invoices, tickets), provide the sub-entity key pools with NESTED bindings. The outer binding controls result cardinality; NESTED bindings are consumed only inside RELATED/EXTEND subqueries.

SELECT
    CUSTOMER_ID,
    AVG_PAYMENT_DAYS := CUSTOMER_AVG_DAYS_PAID_INVOICES,   -- persisted; uses INVOICE_ID internally
    OPEN_TICKETS     := CUSTOMER_OPEN_TICKET_COUNT,         -- persisted; uses TICKET_ID internally
    STATUS           := CUSTOMER_HEALTH_STATUS
FROM FM.MY_NAMESPACE
FOR
    CUSTOMER_ID := BIND_COLUMNS(id FROM TABLE(dim_customers)),
    NESTED INVOICE_ID := BIND_COLUMNS(id FROM TABLE(fct_invoices)),
    NESTED TICKET_ID  := BIND_COLUMNS(id FROM TABLE(fct_tickets)),
ORDER BY CUSTOMER_ID;
sql

Key rules:

  • NESTED bindings do not count toward the FOR CROSS requirement — use plain FOR when only one non-nested binding is present.
  • An INPUT that is exclusively consumed via RELATED, EXTEND, or as a BIND TO join key in EXTERNAL_COLUMNS is a sub-grain key. Bind it NESTED.
  • An INPUT that is directly selected or referenced by outer-grain computed features is a main-query input. Bind it non-nested.
  • RELATED features used inline (not persisted) follow the same rule: ORDER_PRICE := RELATED(SUM(...) GROUP BY TABLES.FCT_ITEMS[item_order_id] VIA ORDER_ID)ORDER_ID must be NESTED-bound.

Time-series and bucketing

Fixed calendar grain (daily / monthly metrics)

DAY_BUCKET := DATE_TRUNC('DAY', event_ts),
MONTH_BUCKET := DATE_TRUNC('MONTH', event_ts),
sql

Use inside TRANSFORM when event_ts is a field on each row of an array of rows.

Running / rolling metrics with windows

Windows work in top-level SELECT or inside TRANSFORM inner SELECT:

-- Fragment: assumes `amount`, `customer_id`, `order_ts` exist in the same scope (e.g. unnested order grain or inner TRANSFORM fields)
RUNNING_REVENUE := SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_ts ASC),
PREV_AMOUNT := LAG(amount, 1) OVER (PARTITION BY customer_id ORDER BY order_ts ASC),
sql

Compare to “today” or a snapshot date

AS_OF := @eval_as_literal(CURRENT_DATE()),
DAYS_SINCE := DATE_DIFF(AS_OF, order_ts, 'DAY'),
sql

Use @eval_as_literal so the snapshot is evaluated once, not per row, when that is what you intend.

Filtering recipes

Filter an array of rows

HIGH_VALUE := ORDERS.TRANSFORM(SELECT * WHERE amount > 100e0),
RECENT := ORDERS.TRANSFORM(SELECT * WHERE order_date >= DATE '2024-01-01'),
sql

Top N / Bottom N

TOP_3 := ORDERS.TRANSFORM(SELECT * ORDER BY amount DESC LIMIT 3),
BOTTOM_3 := ORDERS.TRANSFORM(SELECT * ORDER BY amount ASC LIMIT 3),
sql

Boolean existence check

HAS_HIGH_VALUE := ORDERS.TRANSFORM(SELECT BOOL_OR(amount > 100e0)).UNWRAP_ONE(),
ALL_SHIPPED := ORDERS.TRANSFORM(SELECT BOOL_AND(status = 'shipped')).UNWRAP_ONE(),
sql

Distinct values

UNIQUE_CATEGORIES := ORDERS.TRANSFORM(SELECT DISTINCT category),
-- Result: ARRAY(ROW(category VARCHAR))

-- As a simple array:
UNIQUE_CATEGORIES_FLAT := ORDERS.TRANSFORM(SELECT DISTINCT category).UNWRAP(),
-- Result: ARRAY(VARCHAR)
sql

Enrichment recipes

Enrich scalar FK array with entity data

-- customer_order_ids is ARRAY(BIGINT#ORDERS)
ENRICHED := EXTEND(
    ZIP(CUSTOMER_ORDER_IDS AS order_id)
    WITH ORDER_DATE, ORDER_AMOUNT
    VIA order_id BIND TO ORDER_ID
),
sql

Enrich existing array of rows

-- order_items is ARRAY(ROW(product_id BIGINT#PRODUCTS, quantity BIGINT))
ENRICHED_ITEMS := EXTEND(
    ORDER_ITEMS
    WITH PRODUCT_NAME, PRODUCT_PRICE
    VIA product_id BIND TO PRODUCT_ID
),
sql

Multi-hop enrichment (Customer → Orders → Items → Products)

Invariant: each EXTEND adds fields at the granularity of the array you pass in. You cannot “skip inside” TRANSFORM to reach another entity; you extend the array until the fields you need exist as inner row fields, then TRANSFORM.

-- Step 1: Enrich customer's orders with order-level data
CUSTOMER_ORDERS := EXTEND(
    ZIP(CUSTOMER_ORDER_IDS AS order_id)
    WITH ORDER_DATE, ORDER_ITEMS
    VIA order_id BIND TO ORDER_ID
),

-- Step 2 (schematic): enrich the per-order line array. In a full query, `ORDER_ITEMS` must be whatever
-- feature name exposes that array at order grain (often the same name as the field you pulled in step 1).
ORDER_ITEMS_ENRICHED := EXTEND(
    ORDER_ITEMS
    WITH PRODUCT_NAME, PRODUCT_PRICE
    VIA product_id BIND TO PRODUCT_ID
),

-- Step 3: When building the per-customer order array, expose enriched items under a field name (e.g. items).
CUSTOMER_ORDERS_FULL := EXTEND(
    ZIP(CUSTOMER_ORDER_IDS AS order_id)
    WITH ORDER_DATE, ORDER_ITEMS_ENRICHED AS items
    VIA order_id BIND TO ORDER_ID
),
sql

Checklist

  1. List entities left-to-right along the path (e.g. customer → order → line → product).
  2. For each hop, ensure the child key exists on the current array (often via ZIP on ARRAY(BIGINT#CHILD) or fields on rows).
  3. EXTEND(... WITH parent_features VIA child_key BIND TO CHILD_INPUT) — always prefer explicit VIA ... BIND TO in real code.
  4. Only then TRANSFORM to filter, sort, or aggregate within that enriched array.

For a full worked model, see Modeling .

Array manipulation recipes

Extract field from array of rows as simple array

-- array_of_rows[field_name] extracts that field as an array
ORDER_AMOUNTS := ENRICHED_ORDERS[amount],        -- ARRAY(DOUBLE)
ORDER_DATES := ENRICHED_ORDERS[order_date],      -- ARRAY(DATE)
sql

Combine arrays into array of rows

COMBINED := ZIP(
    ARRAY[1, 2, 3] AS id,
    ARRAY['a', 'b', 'c'] AS letter,
    ARRAY[10e0, 20e0, 30e0] AS value
),
-- Result: ARRAY(ROW(id BIGINT, letter VARCHAR, value DOUBLE))
sql

Merge arrays by join key

MERGED := ARRAY_MERGE(orders_array, returns_array JOIN ON order_id),
sql

Concatenate arrays with deduplication

ALL_ITEMS := ARRAY_CONCAT(order1_items, order2_items DEDUPLICATED ON product_id),
sql

Array aggregation functions (no TRANSFORM needed)

TOTAL := ARRAY_SUM(amounts),
MAX_VAL := ARRAY_MAX(amounts),
COUNT := ARRAY_LENGTH(amounts),
SORTED := ARRAY_SORT(amounts),
UNIQUE := ARRAY_DISTINCT(categories),
HAS_ITEM := CONTAINS(product_ids, 42),
sql

Date and time recipes

-- Extract parts
YEAR := EXTRACT(YEAR FROM order_date),
MONTH := EXTRACT(MONTH FROM order_date),

-- Truncate
MONTH_START := DATE_TRUNC('MONTH', order_date),

-- Arithmetic
NEXT_WEEK := DATE_ADD(order_date, 'DAY', 7),
DAYS_SINCE := DATE_DIFF(@eval_as_literal(CURRENT_DATE()), order_date, 'DAY'),

-- Format
FORMATTED := DATE_FORMAT(order_date, '%Y-%m-%d'),

-- Parse
PARSED := DATE_PARSE('2024-01-15', '%Y-%m-%d'),
sql

String recipes

FULL_NAME := first_name || ' ' || last_name,
CLEAN := ' hello '.TRIM().UPPER(),
PARTS := SPLIT('a,b,c', ','),               -- ARRAY['a', 'b', 'c']
FIRST := SPLIT_PART('a.b.c', '.', 1),       -- 'a'
MATCHES := name LIKE 'Jo%',
PADDED := LPAD(id::VARCHAR, 5, '0'),        -- '00042'
sql

Conditional recipes

-- Simple ternary
LABEL := IF(amount > 100e0, 'high', 'low'),

-- Multi-branch (SQL-style searched CASE)
TIER := CASE WHEN amount > 1000e0 THEN 'platinum' WHEN amount > 100e0 THEN 'gold' WHEN amount > 10e0 THEN 'silver' ELSE 'bronze' END,

-- Same logic, functional form (parallel boolean array + result array)
TIER_FN := CASE_WHEN(
    ARRAY[amount > 1000e0, amount > 100e0, amount > 10e0],
    ARRAY['platinum', 'gold', 'silver'],
    'bronze'
),

-- Value matching (simple CASE / equality dispatch)
STATUS_LABEL := CASE_WHEN_VALUE(
    status_code,
    ARRAY[1, 2, 3],
    ARRAY['active', 'pending', 'closed'],
    'unknown'
),

-- First non-null
DISPLAY_NAME := COALESCE(preferred_name, full_name, email),
sql

Window function recipes

Window functions work at the top-level SELECT or inside TRANSFORM:

-- Fragment: assumes CUSTOMER_ID and TOTAL_SPEND exist in WITH / FROM namespace for this SELECT
SELECT
    CUSTOMER_ID,
    TOTAL_SPEND,
    RANK := RANK() OVER (ORDER BY TOTAL_SPEND DESC),
    RUNNING_TOTAL := SUM(TOTAL_SPEND) OVER (ORDER BY CUSTOMER_ID ASC),
    PERCENTILE := NTILE(4) OVER (ORDER BY TOTAL_SPEND DESC),
sql

GROUP BY at top level (analytics)

In FeatureQL, GROUP BY is attached to each aggregate feature, not a separate clause:

-- Fragment: assumes CUSTOMER_COUNTRY and REVENUE are in scope (from persisted features or WITH)
SELECT
    CUSTOMER_COUNTRY,
    TOTAL_REVENUE := SUM(REVENUE) GROUP BY CUSTOMER_COUNTRY,
    CUSTOMER_COUNT := COUNT(1) GROUP BY CUSTOMER_COUNTRY,
    AVG_REVENUE := AVG(REVENUE) GROUP BY CUSTOMER_COUNTRY,
FOR
    CUSTOMER_ID := BIND_COLUMNS(customer_id FROM SQL(SELECT customer_id FROM dim_customers));
sql

UNNEST

UNNEST flattens an array of rows to one output row per element — not a plain ARRAY<BIGINT>. Build rows with ZIP first if needed. Shape rules for what else may appear in SELECT with UNNEST are stricter than SQL; see Grouping and unnesting .

Hybrid queries (FeatureQL inside SQL)

When you need SQL capabilities that FeatureQL doesn't have (e.g., complex window functions, UNION), wrap FeatureQL in SQL. Prefer FEATUREQL(...) as the table expression; a /* FEATUREQL */ … comment form exists for tools that reject FEATUREQL(...) as invalid SQL — see Hybrid queries .

/* SQL */
SELECT *
FROM FEATUREQL(
    WITH
        ID := INPUT(BIGINT),
    SELECT
        ID,
        VALUE := ID * 2
    FOR
        ID := BIND_VALUES(ARRAY[1, 2, 3])
)
WHERE VALUE > 2;
sql

Anti-patterns (quick)

  • Huge single WITH: persist layers with CREATE FEATURES and FROM fm.namespace.
  • Outer feature names inside .TRANSFORM(SELECT …): only row fields exist — CARRY (or EXTEND / ARRAY_MERGE) to put outer values on each row. IMPORT is for TRANSFORM(... USING (...) IMPORT …): it exposes outer feature definitions inside USING, not row columns; it must be pure (TABLE/SQL-backed graphs often fail — see CARRY vs IMPORT ).
  • CASE_WHEN with non-static then arrays: can fail evaluation-mode checks; use SQL CASE WHEN … END or see SHOW TESTS for CASE_WHEN.

Persistence scope — FROM is mandatory

Persisted features are not automatically in scope. To reference them, use FROM fm.namespace:

-- Step 1: Create features
CREATE OR REPLACE FEATURES IN fm.shop AS
SELECT
    CUSTOMERS := ENTITY(),
    CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
    CUST_DATA := EXTERNAL_COLUMNS(
        customer_id BIGINT#CUSTOMERS BIND TO CUSTOMER_ID, name VARCHAR
        FROM TABLE(schema.customers)
    ),
    CUSTOMER_NAME := CUST_DATA[name];

-- Step 2: Add derived features (MUST include FROM fm.shop to see step 1 features)
CREATE OR REPLACE FEATURES IN fm.shop AS
SELECT
    ORDER_COUNT := CUSTOMER_ID.RELATED(COUNT(1) GROUP BY ORD_DATA[order_customer_id])
FROM fm.shop;    -- <-- without this, CUSTOMER_ID and ORD_DATA are unknown

-- Step 3: Query (also needs FROM)
SELECT CUSTOMER_ID, CUSTOMER_NAME, ORDER_COUNT
FROM fm.shop
FOR CUSTOMER_ID := BIND_VALUES(ARRAY[1, 2, 3]);
sql

Common mistake: writing CREATE OR REPLACE FEATURES IN fm.ns AS SELECT ... that references features from a previous CREATE without FROM fm.ns. The features from the first CREATE are persisted but not in the current query's scope unless you import them with FROM.

Function catalog quick reference

When you need a function, check SHOW SIGNATURES WHERE NAME = 'name' first. Here are the most-used categories:

CategoryKey functions
CoreENTITY(), INPUT(), BIND_VALUE(), BIND_VALUES(), BIND_COLUMNS()
Data sourcesEXTERNAL_COLUMNS(), INLINE_COLUMNS()
RelationshipsRELATED(), EXTEND(), TRANSFORM()
Array of rowsZIP(), UNWRAP(), UNWRAP_ONE(), ARRAY_MERGE(), ARRAY_CONCAT()
ArrayARRAY_LENGTH(), ARRAY_SUM(), CONTAINS(), ARRAY_SORT(), FLATTEN()
AggregationSUM(), COUNT(), AVG(), MIN(), MAX(), ARRAY_AGG(), COUNT(DISTINCT ...)
ConditionalIF(), CASE WHEN … THEN … END, CASE_WHEN(), CASE_WHEN_VALUE(), COALESCE()
StringCONCAT(), LOWER(), UPPER(), TRIM(), SPLIT(), REPLACE(), SUBSTR()
DateEXTRACT(), DATE_ADD(), DATE_DIFF(), DATE_TRUNC(), DATE_FORMAT()
MathROUND(), ABS(), FLOOR(), CEIL(), SQRT(), POW()
GeospatialST_GEOGPOINT(), ST_GEOMFROMTEXT(), ST_DISTANCE(), ST_INTERSECTION(), ST_BUFFER(), ST_CONTAINS(), ST_UNION_AGG(), …
JSONJSON_PARSE(), JSON_EXTRACT(), JSON_EXTRACT_SCALAR()
IndexesINDEX_UNIQUE(), INDEX_MULTI(), ELEMENT_AT_KEY()
Metaprogramming@eval_as_literal(), @eval_as_text(), EVAL_CONST(), CONST
BusinessHASH01(), GRADUAL_ROLLOUT(), SCD_AT_TIME()
Last update at: 2026/05/05 08:09:16