Tricky modeling patterns

Some entity-relationship patterns look straightforward but trip up both humans and LLMs when translated to FeatureQL. This page walks through six patterns that require careful thinking about entities, foreign keys, and array transformations.

P1: Dual foreign key — same entity referenced twice

A flights table has two columns that both reference the airports entity: flight_origin_id and flight_dest_id. The challenge: for each flight, resolve both to the airport's city name.

The key insight is that RELATED() can be called multiple times with different VIA paths pointing to the same entity. Each call follows a different FK column but resolves the same target feature (AIRPORT_CITY):

FeatureQL

WITH
    AIRPORTS := ENTITY(),
    AIRPORT_ID := INPUT(BIGINT#AIRPORTS),
    FLIGHTS := ENTITY(),
    FLIGHT_ID := INPUT(BIGINT#FLIGHTS),
    DIM_AIRPORTS := EXTERNAL_COLUMNS(
        airport_id BIGINT#AIRPORTS BIND TO AIRPORT_ID,
        city VARCHAR
        FROM TABLE(airport.dim_airports)
    ),
    AIRPORT_CITY := DIM_AIRPORTS[city],
    FCT_FLIGHTS := EXTERNAL_COLUMNS(
        flight_id BIGINT#FLIGHTS BIND TO FLIGHT_ID,
        flight_origin_id BIGINT#AIRPORTS,
        flight_dest_id BIGINT#AIRPORTS,
        airline VARCHAR
        FROM TABLE(airport.fct_flights)
    ),
    FLIGHT_AIRLINE := FCT_FLIGHTS[airline],
    ORIGIN_CITY := RELATED(AIRPORT_CITY VIA FCT_FLIGHTS[flight_origin_id]),
    DEST_CITY := RELATED(AIRPORT_CITY VIA FCT_FLIGHTS[flight_dest_id]),
SELECT
    FLIGHT_ID,
    FLIGHT_AIRLINE AS AIRLINE,
    ORIGIN_CITY,
    DEST_CITY,
FOR CROSS
    FLIGHT_ID := BIND_VALUES(SEQUENCE(200, 207)),
    AIRPORT_ID := BIND_VALUES(SEQUENCE(100, 104)),
;

Result

FLIGHT_ID BIGINT	AIRLINE VARCHAR	ORIGIN_CITY VARCHAR	DEST_CITY VARCHAR
200	BA	London	Paris
201	KLM	London	Amsterdam
202	AF	Paris	Madrid
203	KLM	Amsterdam	London
204	AZ	Amsterdam	Rome
205	IB	Madrid	London
206	IB	Madrid	Amsterdam
207	AZ	Rome	Paris

The same pattern works for reverse aggregations. To count departures and arrivals per airport, use two RELATED() calls with different GROUP BY FK columns. COALESCE() handles airports that may have zero departures or zero arrivals:

FeatureQL

WITH
    AIRPORTS := ENTITY(),
    AIRPORT_ID := INPUT(BIGINT#AIRPORTS),
    FLIGHTS := ENTITY(),
    FLIGHT_ID := INPUT(BIGINT#FLIGHTS),
    DIM_AIRPORTS := EXTERNAL_COLUMNS(
        airport_id BIGINT#AIRPORTS BIND TO AIRPORT_ID,
        name VARCHAR
        FROM TABLE(airport.dim_airports)
    ),
    AIRPORT_NAME := DIM_AIRPORTS[name],
    FCT_FLIGHTS := EXTERNAL_COLUMNS(
        flight_id BIGINT#FLIGHTS BIND TO FLIGHT_ID,
        flight_origin_id BIGINT#AIRPORTS,
        flight_dest_id BIGINT#AIRPORTS,
        FROM TABLE(airport.fct_flights)
    ),
    NUM_DEPARTURES := AIRPORT_ID.RELATED(COUNT(1) GROUP BY FCT_FLIGHTS[flight_origin_id]),
    NUM_ARRIVALS := AIRPORT_ID.RELATED(COUNT(1) GROUP BY FCT_FLIGHTS[flight_dest_id]),
    TOTAL_FLIGHTS := COALESCE(NUM_DEPARTURES, 0) + COALESCE(NUM_ARRIVALS, 0),
SELECT
    AIRPORT_ID,
    AIRPORT_NAME,
    NUM_DEPARTURES,
    NUM_ARRIVALS,
    TOTAL_FLIGHTS,
FOR CROSS
    AIRPORT_ID := BIND_VALUES(SEQUENCE(100, 104)),
    FLIGHT_ID := BIND_VALUES(SEQUENCE(200, 207)),
;

Result

AIRPORT_ID BIGINT	AIRPORT_NAME VARCHAR	NUM_DEPARTURES BIGINT	NUM_ARRIVALS BIGINT	TOTAL_FLIGHTS VARCHAR
100	Heathrow	2	2	4
101	Orly	1	2	3
102	Schiphol	2	2	4
103	Barajas	2	1	3
104	Fiumicino	1	1	2

Pattern: when the same entity appears as two different FK columns in a table, call RELATED() once per FK with a different VIA clause. No special syntax is needed — FeatureQL resolves each path independently.

P2: Successive aggregations — group then re-aggregate

For each customer, compute revenue per product category (first aggregation), then find the top category (second aggregation). The naive mistake is trying to do both in a single TRANSFORM() — but you cannot nest GROUP BY inside TRANSFORM.

The solution chains two TRANSFORM() calls. The first groups by category and sums amounts. The second takes that already-grouped result and picks the top row:

FeatureQL

WITH
    CUSTOMERS := ENTITY(),
    CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
    ORDERS := ENTITY(),
    ORDER_ID := INPUT(BIGINT#ORDERS),
    CUST_ORDERS := INLINE_COLUMNS(
        customer_id BIGINT#CUSTOMERS BIND TO CUSTOMER_ID,
        order_ids ARRAY(BIGINT#ORDERS)
        FROM CSV(
            customer_id,order_ids
            1,"[10, 11, 12]"
            2,"[13, 14, 15]"
        )
    ),
    ORDER_IDS := CUST_ORDERS[order_ids],
    ORD_DATA := INLINE_COLUMNS(
        order_id BIGINT#ORDERS BIND TO ORDER_ID,
        category VARCHAR,
        amount DOUBLE
        FROM CSV(
            order_id,category,amount
            10,electronics,50.0
            11,books,20.0
            12,electronics,80.0
            13,clothing,60.0
            14,electronics,150.0
            15,clothing,40.0
        )
    ),
    ORDER_CATEGORY := ORD_DATA[category],
    ORDER_AMOUNT := ORD_DATA[amount],
    ENRICHED_ORDERS := EXTEND(
        ZIP(ORDER_IDS AS order_id)
        WITH ORDER_CATEGORY AS category, ORDER_AMOUNT AS amount
        VIA order_id BIND TO ORDER_ID
    ),
    REVENUE_BY_CATEGORY := ENRICHED_ORDERS.TRANSFORM(
        SELECT category, SUM(amount) GROUP BY category AS total
        ORDER BY total DESC
    ),
    TOP_CATEGORY := REVENUE_BY_CATEGORY.TRANSFORM(
        SELECT category, total ORDER BY total DESC LIMIT 1
    ),
SELECT
    CUSTOMER_ID,
    REVENUE_BY_CATEGORY,
    TOP_CATEGORY,
FOR CROSS
    CUSTOMER_ID := BIND_VALUES(ARRAY[1, 2]),
    ORDER_ID := BIND_VALUES(SEQUENCE(10, 15)),
;

Result

CUSTOMER_ID BIGINT	REVENUE_BY_CATEGORY VARCHAR	TOP_CATEGORY VARCHAR
1	[{category: electronics, total: 130.0}, {category: books, total: 20.0}]	[{category: electronics, total: 130.0}]
2	[{category: electronics, total: 150.0}, {category: clothing, total: 100.0}]	[{category: electronics, total: 150.0}]

Pattern: when you need to aggregate, then aggregate again on the result, chain .TRANSFORM(SELECT ... GROUP BY ...) followed by .TRANSFORM(SELECT ... ORDER BY ... LIMIT 1). Each TRANSFORM operates on the output of the previous one.

P3: Self-referential entity (not yet supported)

An employees table where manager_id is a FK back to the same BIGINT#EMPLOYEES entity. The goal: for each employee, get their manager's name.

This requires a self-join — looking up EMP_NAME for a different row of the same entity. In standard SQL this is a simple JOIN employees e2 ON e1.manager_id = e2.employee_id.

In FeatureQL, RELATED(EMP_NAME VIA EMP_MANAGER_ID) does not produce the correct result because the FK points to the same entity type. The resolution stays within the current entity scope instead of performing a cross-row lookup.

Status: self-referential entities are a known limitation. The workaround is to pre-compute the manager name in SQL (e.g., in an OBT) and import it as a plain VARCHAR column rather than as an entity FK.

P4: Many-to-many through a junction table

Students enroll in courses through an enrollments junction table. Each enrollment has a student FK, a course FK, and a grade. The goal: for each student, list their enrollments with grades and compute their GPA.

The modeling approach: create a pre-aggregated OBT with enrollment_ids BIGINT[] per student, then EXTEND() the array with grade data from the enrollments table:

FeatureQL

WITH
    STUDENTS := ENTITY(),
    STUDENT_ID := INPUT(BIGINT#STUDENTS),
    ENROLLMENTS := ENTITY(),
    ENROLLMENT_ID := INPUT(BIGINT#ENROLLMENTS),
    STUDENT_OBT := EXTERNAL_COLUMNS(
        student_id BIGINT#STUDENTS BIND TO STUDENT_ID,
        enrollment_ids ARRAY(BIGINT#ENROLLMENTS)
        FROM TABLE(school.student_enrollments_obt)
    ),
    ENROLL_DATA := EXTERNAL_COLUMNS(
        enrollment_id BIGINT#ENROLLMENTS BIND TO ENROLLMENT_ID,
        enroll_course_id BIGINT,
        grade DOUBLE
        FROM TABLE(school.enrollments)
    ),
    ENROLL_GRADE := ENROLL_DATA[grade],
    ENROLL_COURSE_ID_RAW := ENROLL_DATA[enroll_course_id],
    ENRICHED := EXTEND(
        ZIP(STUDENT_OBT[enrollment_ids] AS enrollment_id)
        WITH ENROLL_GRADE AS grade, ENROLL_COURSE_ID_RAW AS course_id
        VIA enrollment_id BIND TO ENROLLMENT_ID
    ),
    GPA := ENRICHED.TRANSFORM(SELECT AVG(grade)).UNWRAP_ONE(),
SELECT
    STUDENT_ID,
    STUDENT_NAME := EXTERNAL_COLUMNS(
        student_id BIGINT#STUDENTS BIND TO STUDENT_ID,
        name VARCHAR
        FROM TABLE(school.students)
    )[name],
    ENRICHED AS ENROLLMENTS_WITH_GRADES,
    GPA,
FOR CROSS
    STUDENT_ID := BIND_VALUES(SEQUENCE(10, 14)),
    ENROLLMENT_ID := BIND_VALUES(SEQUENCE(30, 42)),
;

Result

STUDENT_ID BIGINT	STUDENT_NAME VARCHAR	ENROLLMENTS_WITH_GRADES VARCHAR	GPA VARCHAR
10	Ana	[{enrollment_id: 30, grade: 3.7, course_id: 20}, {enrollment_id: 31, grade: 3.3, course_id: 21}, {enrollment_id: 32, grade: 3.9, course_id: 22}]	3.6333333333333333
11	Ben	[{enrollment_id: 33, grade: 2.8, course_id: 21}, {enrollment_id: 34, grade: 3.1, course_id: 23}]	2.95
12	Cara	[{enrollment_id: 35, grade: 3.5, course_id: 20}, {enrollment_id: 36, grade: 3.8, course_id: 22}]	3.65
13	Dan	[{enrollment_id: 37, grade: 2.5, course_id: 21}, {enrollment_id: 38, grade: 2.9, course_id: 23}]	2.7
14	Eve	[{enrollment_id: 39, grade: 3.6, course_id: 20}, {enrollment_id: 40, grade: 3.4, course_id: 21}, {enrollment_id: 41, grade: 3.2, course_id: 22}, {enrollment_id: 42, grade: 3.8, course_id: 23}]	3.5

Pattern: for many-to-many relationships, model the junction table as its own entity. Create an OBT with the array of junction-entity IDs, then EXTEND() + TRANSFORM() to enrich and aggregate.

Known limitation: enriching with a feature that itself crosses entity boundaries via RELATED() through EXTERNAL_COLUMNS (e.g., resolving COURSE_TITLE through enroll_course_id) currently triggers an EXTERNAL_SQL_EXTRACT error on DuckDB. The workaround is to include the raw FK column (as a plain BIGINT, not BIGINT#COURSES) and resolve course names in a post-processing step.

P5: Mixed grain — scalar vs array comparison with CARRY (bug)

For each customer, compute their average order value (a scalar), then flag each individual order as above or below that average. This forces a grain mismatch: the average lives at the customer level, but the comparison must happen at the order level inside TRANSFORM().

The intended solution uses CARRY() to inject the customer-level scalar into each row of the order array, making it available inside TRANSFORM():

AVG_ORDER_VALUE := ENRICHED_ORDERS.TRANSFORM(SELECT AVG(amount)).UNWRAP_ONE(),
ORDERS_FLAGGED := ENRICHED_ORDERS.CARRY(AVG_ORDER_VALUE).TRANSFORM(
    SELECT order_id, amount,
        IF(amount > avg_order_value, 'above', 'below') AS flag
    ORDER BY order_id
),

null

Status: CARRY().TRANSFORM() chaining works, but the comparison inside TRANSFORM requires precision-annotated operators for DOUBLE values. Use >.2 instead of > (the .2 specifies 2 digits of precision). Plain > on DOUBLE values triggers COMPARISON-DIGITS-EMPTY. See Pitfall patterns P3 for a working example.

P6: Temporal sequences — window functions in TRANSFORM (bug)

For each customer's events, compute the time gap between successive events and flag gaps longer than 30 days. This requires LAG() inside TRANSFORM() to compare adjacent rows within the array.

The intended solution:

EVENTS_WITH_GAPS := ENRICHED_EVENTS.TRANSFORM(
    WITH prev_ts := LAG(ts, 1) OVER (ORDER BY ts ASC)
    SELECT event_id, ts, type, amount,
        DATE_DIFF(ts, prev_ts, 'DAY') AS gap_days
    ORDER BY ts ASC
),
HAS_LONG_GAP := EVENTS_WITH_GAPS.TRANSFORM(
    SELECT BOOL_OR(gap_days > 30)
).UNWRAP_ONE(),

null

Status: window functions (LAG, LEAD, ROW_NUMBER, etc.) inside TRANSFORM() after EXTEND() currently trigger an internal server error. Window functions work at the top level (see Window functions reference ) but not yet inside array-level TRANSFORM(). The workaround is to pre-compute temporal gaps in SQL before importing into FeatureQL.

Pattern summary

Pattern	Technique	Status
Dual FK (same entity, two columns)	Two `RELATED()` calls with different `VIA`	Working
Successive aggregations	Chain `.TRANSFORM(GROUP BY)` then `.TRANSFORM(ORDER BY LIMIT)`	Working
Self-referential entity	`RELATED()` via same-entity FK	Not yet supported
Many-to-many (junction table)	OBT with array of junction IDs + `EXTEND()`	Working (partial)
Mixed grain (CARRY)	`CARRY()` + `TRANSFORM()`	Working (requires `>.N` for DOUBLE comparisons)
Temporal sequences	`LAG()` inside `TRANSFORM()`	Bug — window functions in TRANSFORM

What's next

RELATED() reference — full syntax for cross-entity lookups
EXTEND() reference — enriching arrays with cross-entity data
TRANSFORM() reference — filtering, sorting, aggregating arrays
Window functions — top-level window function usage

Suggest changes to this page

Last update at: 2026/04/27 15:40:31

On this page

P1: Dual foreign key — same entity referenced twice P2: Successive aggregations — group then re-aggregate P3: Self-referential entity (not yet supported)P4: Many-to-many through a junction table P5: Mixed grain — scalar vs array comparison with CARRY (bug)P6: Temporal sequences — window functions in TRANSFORM (bug)Pattern summary What's next