LLM reference — entity-centric modeling

How to model business domains in FeatureQL using entities, relationships, and the core operations EXTEND(), TRANSFORM(), and RELATED(). Load this page when you need to design a feature model or traverse entity relationships.

The entity-centric mental model

FeatureQL organizes data around entities (CUSTOMER, ORDER, PRODUCT), not tables. Each entity has:

  • An ENTITY() declaration
  • An INPUT() that represents its primary key
  • Data source mappings (EXTERNAL_COLUMNS()) that bring in table data
  • Derived features that compute values from the mapped data

Features with the same INPUT() automatically align — no join needed. Cross-entity relationships use RELATED() or EXTEND().

The 5-step modeling workflow

Step 1: Identify entities and their keys

CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),

ORDERS := ENTITY(),
ORDER_ID := INPUT(BIGINT#ORDERS),

PRODUCTS := ENTITY(),
PRODUCT_ID := INPUT(BIGINT#PRODUCTS),
sql

Each entity gets exactly one ENTITY() and one INPUT(). The #ENTITY annotation on the type is critical — it tells FeatureQL which entity a key belongs to.

Step 2: Map external data

One EXTERNAL_COLUMNS() per entity per source table. Keep mappings generic — no filters, no business logic.

-- Dimension table: one row per entity
DIM_CUSTOMERS := EXTERNAL_COLUMNS(
    customer_id BIGINT#CUSTOMERS BIND TO CUSTOMER_ID,
    name VARCHAR,
    email VARCHAR,
    country VARCHAR#COUNTRIES
    FROM TABLE(schema.dim_customers)
),
CUSTOMER_NAME := DIM_CUSTOMERS[name],
CUSTOMER_COUNTRY := DIM_CUSTOMERS[country],

-- Fact table: one row per entity (use EXTERNAL_COLUMNS when key is unique)
FCT_ORDERS := EXTERNAL_COLUMNS(
    order_id BIGINT#ORDERS BIND TO ORDER_ID,
    customer_id BIGINT#CUSTOMERS,
    amount DOUBLE,
    order_date DATE
    FROM TABLE(schema.fct_orders)
),
ORDER_AMOUNT := FCT_ORDERS[amount],
ORDER_DATE := FCT_ORDERS[order_date],
ORDER_CUSTOMER_ID := FCT_ORDERS[customer_id],   -- Foreign key typed as BIGINT#CUSTOMERS
sql

Rule: never put WHERE clauses in data source mappings. Filters are features (Step 3).

Step 3: Define filters as features

Boolean features at the appropriate entity level. These are reusable across queries.

-- At ORDER level
IS_JULY_2017 := ORDER_DATE >= DATE '2017-07-01' AND ORDER_DATE < DATE '2017-08-01',
IS_HIGH_VALUE := ORDER_AMOUNT > 100e0,

-- At PRODUCT level
IS_ELECTRONICS := PRODUCT_CATEGORY = 'electronics',
sql

Step 4: Trace the relationship path

Before writing the query, identify which entities you need to traverse. Example: "total revenue from electronics orders per customer" requires:

CUSTOMER → (has many) ORDERS → (has many) ITEMS → (references) PRODUCT

At each hop, you use EXTEND() to bring in data from the next entity.

Step 5: Build by composing

Features with the same INPUT() auto-align. Use EXTEND() to enrich arrays with cross-entity data. Use TRANSFORM() to filter, aggregate, or sort within arrays.

EXTEND — enrich arrays with related entity data

EXTEND() adds fields to each row in an array by looking up data from another entity. It is the general-purpose join mechanism.

Enriching a scalar array (foreign key array → entity data)

When you have an array of foreign keys (e.g., ARRAY(BIGINT#ORDERS)), first convert to array of rows with ZIP(), then extend:

-- CUSTOMER_ORDER_IDS is ARRAY(BIGINT#ORDERS)
ENRICHED_ORDERS := EXTEND(
    ZIP(CUSTOMER_ORDER_IDS AS order_id)       -- Convert to ARRAY(ROW(order_id BIGINT#ORDERS))
    WITH ORDER_DATE, ORDER_AMOUNT             -- Features at ORDER level to add
    VIA order_id BIND TO ORDER_ID            -- Join: order_id field → ORDER_ID input
),
-- Result type: ARRAY(ROW(order_id BIGINT#ORDERS, order_date DATE, order_amount DOUBLE))
sql

Enriching an existing array of rows

When the array already has rows (e.g., from EXTERNAL_COLUMNS with array fields), no ZIP() needed:

-- ORDER_ITEMS is ARRAY(ROW(product_id BIGINT#PRODUCTS, quantity BIGINT))
ITEMS_ENRICHED := EXTEND(
    ORDER_ITEMS
    WITH PRODUCT_NAME, PRODUCT_PRICE          -- Features at PRODUCT level
    VIA product_id BIND TO PRODUCT_ID        -- Join on existing field
),
-- Result: ARRAY(ROW(product_id, quantity, product_name, product_price))
sql

When VIA / BIND TO can be omitted

If the field in the array has an entity annotation that unambiguously matches an INPUT(), FeatureQL infers the binding:

-- order_id is typed as BIGINT#ORDERS, and ORDER_ID is INPUT(BIGINT#ORDERS)
-- FeatureQL can infer: VIA order_id BIND TO ORDER_ID
ENRICHED := EXTEND(
    ZIP(CUSTOMER_ORDER_IDS AS order_id)
    WITH ORDER_DATE, ORDER_AMOUNT
),
sql

Aliasing extended fields

ENRICHED := EXTEND(
    ZIP(CUSTOMER_ORDER_IDS AS order_id)
    WITH ORDER_DATE AS date, ORDER_AMOUNT AS amount
    VIA order_id BIND TO ORDER_ID
),
-- Access as: ENRICHED[date], ENRICHED[amount]
sql

TRANSFORM — filter, aggregate, sort within arrays

TRANSFORM() runs a mini-query inside each entity's array of rows. It operates only on the fields present in the array — no external feature references.

Why TRANSFORM is self-contained (and why that's good)

This is not a limitation — it is a deliberate design choice that makes FeatureQL simpler and more powerful:

  1. TRANSFORM is a pure function. Given the same array, it always produces the same result. No hidden dependencies, no side effects, no ambient state. This makes every TRANSFORM() independently testable, cacheable, and parallelizable.

  2. Only one level of nesting. FeatureQL has exactly two scopes: the top-level feature graph (where features reference other features) and the inner TRANSFORM() scope (where fields reference other fields within the array). There is never a third level. This means you never need to reason about "which scope am I in?" — if you're inside TRANSFORM(), you work with fields; if you're outside, you work with features.

  3. The EXTEND-then-TRANSFORM pattern is the standard workflow. Think of it as a pipeline: EXTEND() is the "join" step (bring in cross-entity data), TRANSFORM() is the "compute" step (filter, aggregate, sort). Separating these concerns makes each step simple and debuggable. You can inspect the result of EXTEND() before applying TRANSFORM().

  4. It prevents accidental complexity. If TRANSFORM() could reference external features, you'd need to reason about which entity level each reference belongs to, how dependencies resolve across scopes, and whether the external feature is scalar or array-valued. The self-containment rule eliminates all of this.

The mental model: prepare your data with EXTEND(), then process it with TRANSFORM(). If you find yourself wanting to reference an external feature inside TRANSFORM(), that's a signal to add it to the array first via EXTEND() — or use IMPORTING for simple scalar values.

Chained syntax (preferred for simple transforms)

-- Filter
JULY_ORDERS := ENRICHED_ORDERS.TRANSFORM(SELECT * WHERE order_date >= DATE '2017-07-01'),

-- Aggregate (returns single-element array of rows)
TOTAL_SPENT := ENRICHED_ORDERS.TRANSFORM(SELECT SUM(amount)),

-- Aggregate and unwrap to scalar
TOTAL_SPENT_SCALAR := ENRICHED_ORDERS.TRANSFORM(SELECT SUM(amount)).UNWRAP_ONE(),

-- Top N
TOP_3 := ENRICHED_ORDERS.TRANSFORM(SELECT * ORDER BY amount DESC LIMIT 3),

-- Group by
BY_MONTH := ENRICHED_ORDERS.TRANSFORM(
    WITH month := EXTRACT(MONTH FROM order_date)
    SELECT month, SUM(amount) GROUP BY month AS total
    ORDER BY month ASC
),

-- Distinct
UNIQUE_CATEGORIES := ENRICHED_ORDERS.TRANSFORM(SELECT DISTINCT category),
sql

Operator syntax (when you need BY or IMPORTING)

-- BY: explicit aggregation key (needed when array comes from BIND_VALUES, not EXTERNAL_COLUMNS)
RESULT := TRANSFORM(some_array BY key_field USING (SELECT SUM(value) GROUP BY key_field)),

-- IMPORTING: bring external features into the inner query scope
RESULT := TRANSFORM(orders USING (SELECT * WHERE amount > threshold) IMPORTING threshold),
sql

Bringing external data into TRANSFORM

Since TRANSFORM() only sees fields in the array, use one of these patterns when you need external data:

Pattern A: EXTEND first (preferred for entity-level data)

-- Enrich the array with the data you need, then transform
ENRICHED := EXTEND(ORDERS WITH PRODUCT_NAME VIA product_id BIND TO PRODUCT_ID),
RESULT := ENRICHED.TRANSFORM(SELECT * WHERE product_name = 'Widget'),
sql

Pattern B: IMPORTING (for simple scalar values)

-- Import a scalar feature into the inner scope
RESULT := TRANSFORM(ORDERS USING (SELECT * WHERE amount > threshold) IMPORTING THRESHOLD),
sql

Do not try to reference external features directly inside the backtick query — it will fail:

-- WRONG: PRODUCT_NAME is a feature, not a field in the array
RESULT := ORDERS.TRANSFORM(SELECT * WHERE PRODUCT_NAME = 'Widget')
sql

UNWRAP and UNWRAP_ONE

After TRANSFORM() aggregation, the result is still an array of rows. Use UNWRAP() or UNWRAP_ONE() to extract values:

-- UNWRAP: extract single-field values as a simple array
-- ARRAY(ROW(amount DOUBLE)) → ARRAY(DOUBLE)
AMOUNTS := ORDERS.TRANSFORM(SELECT amount WHERE amount > 50e0).UNWRAP(),

-- UNWRAP_ONE: extract single value from single-element, single-field result
-- ARRAY(ROW(total DOUBLE)) where array has exactly 1 element → DOUBLE
TOTAL := ORDERS.TRANSFORM(SELECT SUM(amount)).UNWRAP_ONE(),
sql

RELATED — concise cross-entity joins

RELATED() is syntactic sugar for common cross-entity patterns. Use it for simple single-key joins; switch to EXTEND() for complex cases.

Lookup (FK → PK): get a value from another entity

-- Order has a customer_id FK. Get the customer's name.
ORDER_CUSTOMER_NAME := RELATED(CUSTOMER_NAME VIA FCT_ORDERS[customer_id]),

-- Chained syntax
ORDER_CUSTOMER_NAME := FCT_ORDERS[customer_id].RELATED(CUSTOMER_NAME),
sql

Aggregation (PK → FKs): aggregate across related entities

-- Count orders per customer (CUSTOMER_ID is PK side)
NUM_ORDERS := RELATED(COUNT(1) GROUP BY FCT_ORDERS[customer_id] VIA CUSTOMER_ID),

-- Chained syntax
NUM_ORDERS := CUSTOMER_ID.RELATED(COUNT(1) GROUP BY FCT_ORDERS[customer_id]),

-- Sum with filter
JULY_REVENUE := CUSTOMER_ID.RELATED(
    SUM(ORDER_AMOUNT) FILTER (WHERE IS_JULY_2017)
    GROUP BY FCT_ORDERS[customer_id]
),
sql

When to use RELATED vs EXTEND

ScenarioUse
Simple FK lookup (get one value from another entity)RELATED()
Simple aggregation across a FK relationshipRELATED()
Enrich an array of rows with multiple fieldsEXTEND()
Multi-hop traversal (Customer → Orders → Items → Products)EXTEND() chain
Need to filter/aggregate the enriched array afterwardEXTEND() + TRANSFORM()

Complete modeling example

"For each customer, compute total revenue from food orders":

WITH
    -- Entities
    CUSTOMERS := ENTITY(),
    CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
    ORDERS := ENTITY(),
    ORDER_ID := INPUT(BIGINT#ORDERS),

    -- Data sources (generic, no filters)
    CUSTOMER_SOURCE := EXTERNAL_COLUMNS(
        customer_id BIGINT#CUSTOMERS BIND TO CUSTOMER_ID,
        order_ids ARRAY(BIGINT#ORDERS)
        FROM TABLE(schema.customer_orders)
    ),
    ORDER_SOURCE := EXTERNAL_COLUMNS(
        order_id BIGINT#ORDERS BIND TO ORDER_ID,
        price DOUBLE,
        category VARCHAR
        FROM TABLE(schema.orders)
    ),

    -- Extract fields
    ORDER_PRICE := ORDER_SOURCE[price],
    ORDER_CATEGORY := ORDER_SOURCE[category],

    -- Enrich customer's orders with order-level data
    ENRICHED_ORDERS := EXTEND(
        ZIP(CUSTOMER_SOURCE[order_ids] AS order_id)
        WITH ORDER_PRICE, ORDER_CATEGORY
        VIA order_id BIND TO ORDER_ID
    ),

    -- Filter and aggregate
    FOOD_REVENUE := ENRICHED_ORDERS
        .TRANSFORM(SELECT SUM(order_price) WHERE order_category = 'food')
        .UNWRAP_ONE(),

SELECT
    CUSTOMER_ID,
    ENRICHED_ORDERS,
    FOOD_REVENUE
FOR
    CUSTOMER_ID := BIND_VALUES(ARRAY[1, 2, 3]);
sql

Common modeling mistakes

DO NOT: put filters in data sources

-- WRONG
JULY_ORDERS := EXTERNAL_COLUMNS(... FROM SQL(SELECT * FROM orders WHERE date >= '2017-07-01'))

-- RIGHT: filter as a feature
ALL_ORDERS := EXTERNAL_COLUMNS(... FROM TABLE(schema.orders)),
IS_JULY := ORDER_DATE >= DATE '2017-07-01',
JULY_ORDERS := ENRICHED_ORDERS.TRANSFORM(SELECT * WHERE order_date >= DATE '2017-07-01'),
sql

DO NOT: use ZIP when you already have array of rows

-- WRONG: ORDER_ITEMS is already ARRAY(ROW(product_id, quantity))
EXTEND(ZIP(ORDER_ITEMS AS product_id) WITH PRODUCT_NAME VIA product_id BIND TO PRODUCT_ID)

-- RIGHT: pass the array of rows directly
EXTEND(ORDER_ITEMS WITH PRODUCT_NAME VIA product_id BIND TO PRODUCT_ID)
sql

DO NOT: reference external features inside TRANSFORM

-- WRONG
ORDERS.TRANSFORM(SELECT * WHERE ORDER_AMOUNT > THRESHOLD)

-- RIGHT: enrich first, or use IMPORTING
EXTEND(ORDERS WITH THRESHOLD ...).TRANSFORM(SELECT * WHERE threshold > 100e0)
-- or
TRANSFORM(ORDERS USING (SELECT * WHERE amount > threshold) IMPORTING THRESHOLD)
sql

DO NOT: forget to bind all entities in the dependency graph

-- If your query references CUSTOMER, ORDER, and PRODUCT features,
-- you must bind all three entity inputs in the FOR clause:
FOR
    CUSTOMER_ID := BIND_VALUES(ARRAY[1, 2]),
    ORDER_ID := BIND_VALUES(ARRAY[10, 20]),
    PRODUCT_ID := BIND_VALUES(ARRAY[100, 200]);
sql
Last update at: 2026/03/03 16:47:38
Last updated: 2026-03-03 16:48:19