LLM reference — entity-centric modeling
How to model business domains in FeatureQL using entities, relationships, and the core operations EXTEND(), TRANSFORM(), and RELATED(). Load this page when you need to design a feature model or traverse entity relationships.
The entity-centric mental model
FeatureQL organizes data around entities (CUSTOMER, ORDER, PRODUCT), not tables. Each entity has:
- An
ENTITY()declaration - An
INPUT()that represents its primary key - Data source mappings (
EXTERNAL_COLUMNS()) that bring in table data - Derived features that compute values from the mapped data
Features with the same INPUT() automatically align — no join needed. Cross-entity relationships use RELATED() or EXTEND().
The 5-step modeling workflow
Step 1: Identify entities and their keys
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
ORDERS := ENTITY(),
ORDER_ID := INPUT(BIGINT#ORDERS),
PRODUCTS := ENTITY(),
PRODUCT_ID := INPUT(BIGINT#PRODUCTS), Each entity gets exactly one ENTITY() and one INPUT(). The #ENTITY annotation on the type is critical — it tells FeatureQL which entity a key belongs to.
Step 2: Map external data
One EXTERNAL_COLUMNS() per entity per source table. Keep mappings generic — no filters, no business logic.
-- Dimension table: one row per entity
DIM_CUSTOMERS := EXTERNAL_COLUMNS(
customer_id BIGINT#CUSTOMERS BIND TO CUSTOMER_ID,
name VARCHAR,
email VARCHAR,
country VARCHAR#COUNTRIES
FROM TABLE(schema.dim_customers)
),
CUSTOMER_NAME := DIM_CUSTOMERS[name],
CUSTOMER_COUNTRY := DIM_CUSTOMERS[country],
-- Fact table: one row per entity (use EXTERNAL_COLUMNS when key is unique)
FCT_ORDERS := EXTERNAL_COLUMNS(
order_id BIGINT#ORDERS BIND TO ORDER_ID,
customer_id BIGINT#CUSTOMERS,
amount DOUBLE,
order_date DATE
FROM TABLE(schema.fct_orders)
),
ORDER_AMOUNT := FCT_ORDERS[amount],
ORDER_DATE := FCT_ORDERS[order_date],
ORDER_CUSTOMER_ID := FCT_ORDERS[customer_id], -- Foreign key typed as BIGINT#CUSTOMERS Rule: never put WHERE clauses in data source mappings. Filters are features (Step 3).
Step 3: Define filters as features
Boolean features at the appropriate entity level. These are reusable across queries.
-- At ORDER level
IS_JULY_2017 := ORDER_DATE >= DATE '2017-07-01' AND ORDER_DATE < DATE '2017-08-01',
IS_HIGH_VALUE := ORDER_AMOUNT > 100e0,
-- At PRODUCT level
IS_ELECTRONICS := PRODUCT_CATEGORY = 'electronics', Step 4: Trace the relationship path
Before writing the query, identify which entities you need to traverse. Example: "total revenue from electronics orders per customer" requires:
CUSTOMER → (has many) ORDERS → (has many) ITEMS → (references) PRODUCT
At each hop, you use EXTEND() to bring in data from the next entity.
Step 5: Build by composing
Features with the same INPUT() auto-align. Use EXTEND() to enrich arrays with cross-entity data. Use TRANSFORM() to filter, aggregate, or sort within arrays.
EXTEND — enrich arrays with related entity data
EXTEND() adds fields to each row in an array by looking up data from another entity. It is the general-purpose join mechanism.
Enriching a scalar array (foreign key array → entity data)
When you have an array of foreign keys (e.g., ARRAY(BIGINT#ORDERS)), first convert to array of rows with ZIP(), then extend:
-- CUSTOMER_ORDER_IDS is ARRAY(BIGINT#ORDERS)
ENRICHED_ORDERS := EXTEND(
ZIP(CUSTOMER_ORDER_IDS AS order_id) -- Convert to ARRAY(ROW(order_id BIGINT#ORDERS))
WITH ORDER_DATE, ORDER_AMOUNT -- Features at ORDER level to add
VIA order_id BIND TO ORDER_ID -- Join: order_id field → ORDER_ID input
),
-- Result type: ARRAY(ROW(order_id BIGINT#ORDERS, order_date DATE, order_amount DOUBLE)) Enriching an existing array of rows
When the array already has rows (e.g., from EXTERNAL_COLUMNS with array fields), no ZIP() needed:
-- ORDER_ITEMS is ARRAY(ROW(product_id BIGINT#PRODUCTS, quantity BIGINT))
ITEMS_ENRICHED := EXTEND(
ORDER_ITEMS
WITH PRODUCT_NAME, PRODUCT_PRICE -- Features at PRODUCT level
VIA product_id BIND TO PRODUCT_ID -- Join on existing field
),
-- Result: ARRAY(ROW(product_id, quantity, product_name, product_price)) When VIA / BIND TO can be omitted
If the field in the array has an entity annotation that unambiguously matches an INPUT(), FeatureQL infers the binding:
-- order_id is typed as BIGINT#ORDERS, and ORDER_ID is INPUT(BIGINT#ORDERS)
-- FeatureQL can infer: VIA order_id BIND TO ORDER_ID
ENRICHED := EXTEND(
ZIP(CUSTOMER_ORDER_IDS AS order_id)
WITH ORDER_DATE, ORDER_AMOUNT
), Aliasing extended fields
ENRICHED := EXTEND(
ZIP(CUSTOMER_ORDER_IDS AS order_id)
WITH ORDER_DATE AS date, ORDER_AMOUNT AS amount
VIA order_id BIND TO ORDER_ID
),
-- Access as: ENRICHED[date], ENRICHED[amount] TRANSFORM — filter, aggregate, sort within arrays
TRANSFORM() runs a mini-query inside each entity's array of rows. It operates only on the fields present in the array — no external feature references.
Why TRANSFORM is self-contained (and why that's good)
This is not a limitation — it is a deliberate design choice that makes FeatureQL simpler and more powerful:
TRANSFORM is a pure function. Given the same array, it always produces the same result. No hidden dependencies, no side effects, no ambient state. This makes every
TRANSFORM()independently testable, cacheable, and parallelizable.Only one level of nesting. FeatureQL has exactly two scopes: the top-level feature graph (where features reference other features) and the inner
TRANSFORM()scope (where fields reference other fields within the array). There is never a third level. This means you never need to reason about "which scope am I in?" — if you're insideTRANSFORM(), you work with fields; if you're outside, you work with features.The EXTEND-then-TRANSFORM pattern is the standard workflow. Think of it as a pipeline:
EXTEND()is the "join" step (bring in cross-entity data),TRANSFORM()is the "compute" step (filter, aggregate, sort). Separating these concerns makes each step simple and debuggable. You can inspect the result ofEXTEND()before applyingTRANSFORM().It prevents accidental complexity. If
TRANSFORM()could reference external features, you'd need to reason about which entity level each reference belongs to, how dependencies resolve across scopes, and whether the external feature is scalar or array-valued. The self-containment rule eliminates all of this.
The mental model: prepare your data with EXTEND(), then process it with TRANSFORM(). If you find yourself wanting to reference an external feature inside TRANSFORM(), that's a signal to add it to the array first via EXTEND() — or use IMPORTING for simple scalar values.
Chained syntax (preferred for simple transforms)
-- Filter
JULY_ORDERS := ENRICHED_ORDERS.TRANSFORM(SELECT * WHERE order_date >= DATE '2017-07-01'),
-- Aggregate (returns single-element array of rows)
TOTAL_SPENT := ENRICHED_ORDERS.TRANSFORM(SELECT SUM(amount)),
-- Aggregate and unwrap to scalar
TOTAL_SPENT_SCALAR := ENRICHED_ORDERS.TRANSFORM(SELECT SUM(amount)).UNWRAP_ONE(),
-- Top N
TOP_3 := ENRICHED_ORDERS.TRANSFORM(SELECT * ORDER BY amount DESC LIMIT 3),
-- Group by
BY_MONTH := ENRICHED_ORDERS.TRANSFORM(
WITH month := EXTRACT(MONTH FROM order_date)
SELECT month, SUM(amount) GROUP BY month AS total
ORDER BY month ASC
),
-- Distinct
UNIQUE_CATEGORIES := ENRICHED_ORDERS.TRANSFORM(SELECT DISTINCT category), Operator syntax (when you need BY or IMPORTING)
-- BY: explicit aggregation key (needed when array comes from BIND_VALUES, not EXTERNAL_COLUMNS)
RESULT := TRANSFORM(some_array BY key_field USING (SELECT SUM(value) GROUP BY key_field)),
-- IMPORTING: bring external features into the inner query scope
RESULT := TRANSFORM(orders USING (SELECT * WHERE amount > threshold) IMPORTING threshold), Bringing external data into TRANSFORM
Since TRANSFORM() only sees fields in the array, use one of these patterns when you need external data:
Pattern A: EXTEND first (preferred for entity-level data)
-- Enrich the array with the data you need, then transform
ENRICHED := EXTEND(ORDERS WITH PRODUCT_NAME VIA product_id BIND TO PRODUCT_ID),
RESULT := ENRICHED.TRANSFORM(SELECT * WHERE product_name = 'Widget'), Pattern B: IMPORTING (for simple scalar values)
-- Import a scalar feature into the inner scope
RESULT := TRANSFORM(ORDERS USING (SELECT * WHERE amount > threshold) IMPORTING THRESHOLD), Do not try to reference external features directly inside the backtick query — it will fail:
-- WRONG: PRODUCT_NAME is a feature, not a field in the array
RESULT := ORDERS.TRANSFORM(SELECT * WHERE PRODUCT_NAME = 'Widget') UNWRAP and UNWRAP_ONE
After TRANSFORM() aggregation, the result is still an array of rows. Use UNWRAP() or UNWRAP_ONE() to extract values:
-- UNWRAP: extract single-field values as a simple array
-- ARRAY(ROW(amount DOUBLE)) → ARRAY(DOUBLE)
AMOUNTS := ORDERS.TRANSFORM(SELECT amount WHERE amount > 50e0).UNWRAP(),
-- UNWRAP_ONE: extract single value from single-element, single-field result
-- ARRAY(ROW(total DOUBLE)) where array has exactly 1 element → DOUBLE
TOTAL := ORDERS.TRANSFORM(SELECT SUM(amount)).UNWRAP_ONE(), RELATED — concise cross-entity joins
RELATED() is syntactic sugar for common cross-entity patterns. Use it for simple single-key joins; switch to EXTEND() for complex cases.
Lookup (FK → PK): get a value from another entity
-- Order has a customer_id FK. Get the customer's name.
ORDER_CUSTOMER_NAME := RELATED(CUSTOMER_NAME VIA FCT_ORDERS[customer_id]),
-- Chained syntax
ORDER_CUSTOMER_NAME := FCT_ORDERS[customer_id].RELATED(CUSTOMER_NAME), Aggregation (PK → FKs): aggregate across related entities
-- Count orders per customer (CUSTOMER_ID is PK side)
NUM_ORDERS := RELATED(COUNT(1) GROUP BY FCT_ORDERS[customer_id] VIA CUSTOMER_ID),
-- Chained syntax
NUM_ORDERS := CUSTOMER_ID.RELATED(COUNT(1) GROUP BY FCT_ORDERS[customer_id]),
-- Sum with filter
JULY_REVENUE := CUSTOMER_ID.RELATED(
SUM(ORDER_AMOUNT) FILTER (WHERE IS_JULY_2017)
GROUP BY FCT_ORDERS[customer_id]
), When to use RELATED vs EXTEND
| Scenario | Use |
|---|---|
| Simple FK lookup (get one value from another entity) | RELATED() |
| Simple aggregation across a FK relationship | RELATED() |
| Enrich an array of rows with multiple fields | EXTEND() |
| Multi-hop traversal (Customer → Orders → Items → Products) | EXTEND() chain |
| Need to filter/aggregate the enriched array afterward | EXTEND() + TRANSFORM() |
Complete modeling example
"For each customer, compute total revenue from food orders":
WITH
-- Entities
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
ORDERS := ENTITY(),
ORDER_ID := INPUT(BIGINT#ORDERS),
-- Data sources (generic, no filters)
CUSTOMER_SOURCE := EXTERNAL_COLUMNS(
customer_id BIGINT#CUSTOMERS BIND TO CUSTOMER_ID,
order_ids ARRAY(BIGINT#ORDERS)
FROM TABLE(schema.customer_orders)
),
ORDER_SOURCE := EXTERNAL_COLUMNS(
order_id BIGINT#ORDERS BIND TO ORDER_ID,
price DOUBLE,
category VARCHAR
FROM TABLE(schema.orders)
),
-- Extract fields
ORDER_PRICE := ORDER_SOURCE[price],
ORDER_CATEGORY := ORDER_SOURCE[category],
-- Enrich customer's orders with order-level data
ENRICHED_ORDERS := EXTEND(
ZIP(CUSTOMER_SOURCE[order_ids] AS order_id)
WITH ORDER_PRICE, ORDER_CATEGORY
VIA order_id BIND TO ORDER_ID
),
-- Filter and aggregate
FOOD_REVENUE := ENRICHED_ORDERS
.TRANSFORM(SELECT SUM(order_price) WHERE order_category = 'food')
.UNWRAP_ONE(),
SELECT
CUSTOMER_ID,
ENRICHED_ORDERS,
FOOD_REVENUE
FOR
CUSTOMER_ID := BIND_VALUES(ARRAY[1, 2, 3]); Common modeling mistakes
DO NOT: put filters in data sources
-- WRONG
JULY_ORDERS := EXTERNAL_COLUMNS(... FROM SQL(SELECT * FROM orders WHERE date >= '2017-07-01'))
-- RIGHT: filter as a feature
ALL_ORDERS := EXTERNAL_COLUMNS(... FROM TABLE(schema.orders)),
IS_JULY := ORDER_DATE >= DATE '2017-07-01',
JULY_ORDERS := ENRICHED_ORDERS.TRANSFORM(SELECT * WHERE order_date >= DATE '2017-07-01'), DO NOT: use ZIP when you already have array of rows
-- WRONG: ORDER_ITEMS is already ARRAY(ROW(product_id, quantity))
EXTEND(ZIP(ORDER_ITEMS AS product_id) WITH PRODUCT_NAME VIA product_id BIND TO PRODUCT_ID)
-- RIGHT: pass the array of rows directly
EXTEND(ORDER_ITEMS WITH PRODUCT_NAME VIA product_id BIND TO PRODUCT_ID) DO NOT: reference external features inside TRANSFORM
-- WRONG
ORDERS.TRANSFORM(SELECT * WHERE ORDER_AMOUNT > THRESHOLD)
-- RIGHT: enrich first, or use IMPORTING
EXTEND(ORDERS WITH THRESHOLD ...).TRANSFORM(SELECT * WHERE threshold > 100e0)
-- or
TRANSFORM(ORDERS USING (SELECT * WHERE amount > threshold) IMPORTING THRESHOLD) DO NOT: forget to bind all entities in the dependency graph
-- If your query references CUSTOMER, ORDER, and PRODUCT features,
-- you must bind all three entity inputs in the FOR clause:
FOR
CUSTOMER_ID := BIND_VALUES(ARRAY[1, 2]),
ORDER_ID := BIND_VALUES(ARRAY[10, 20]),
PRODUCT_ID := BIND_VALUES(ARRAY[100, 200]);