Tricky modeling patterns

Some entity-relationship patterns look straightforward but trip up both humans and LLMs when translated to FeatureQL. This page walks through six patterns that require careful thinking about entities, foreign keys, and array transformations.

P1: Dual foreign key — same entity referenced twice

A flights table has two columns that both reference the airports entity: flight_origin_id and flight_dest_id. The challenge: for each flight, resolve both to the airport's city name.

The key insight is that RELATED() can be called multiple times with different VIA paths pointing to the same entity. Each call follows a different FK column but resolves the same target feature (AIRPORT_CITY):

FeatureQL
WITH
    AIRPORTS := ENTITY(),
    AIRPORT_ID := INPUT(BIGINT#AIRPORTS),
    FLIGHTS := ENTITY(),
    FLIGHT_ID := INPUT(BIGINT#FLIGHTS),
    DIM_AIRPORTS := EXTERNAL_COLUMNS(
        airport_id BIGINT#AIRPORTS BIND TO AIRPORT_ID,
        city VARCHAR
        FROM TABLE(airport.dim_airports)
    ),
    AIRPORT_CITY := DIM_AIRPORTS[city],
    FCT_FLIGHTS := EXTERNAL_COLUMNS(
        flight_id BIGINT#FLIGHTS BIND TO FLIGHT_ID,
        flight_origin_id BIGINT#AIRPORTS,
        flight_dest_id BIGINT#AIRPORTS,
        airline VARCHAR
        FROM TABLE(airport.fct_flights)
    ),
    FLIGHT_AIRLINE := FCT_FLIGHTS[airline],
    ORIGIN_CITY := RELATED(AIRPORT_CITY VIA FCT_FLIGHTS[flight_origin_id]),
    DEST_CITY := RELATED(AIRPORT_CITY VIA FCT_FLIGHTS[flight_dest_id]),
SELECT
    FLIGHT_ID,
    FLIGHT_AIRLINE AS AIRLINE,
    ORIGIN_CITY,
    DEST_CITY,
FOR CROSS
    FLIGHT_ID := BIND_VALUES(SEQUENCE(200, 207)),
    AIRPORT_ID := BIND_VALUES(SEQUENCE(100, 104)),
;
Result
FLIGHT_ID BIGINTAIRLINE VARCHARORIGIN_CITY VARCHARDEST_CITY VARCHAR
200BALondonParis
201KLMLondonAmsterdam
202AFParisMadrid
203KLMAmsterdamLondon
204AZAmsterdamRome
205IBMadridLondon
206IBMadridAmsterdam
207AZRomeParis

The same pattern works for reverse aggregations. To count departures and arrivals per airport, use two RELATED() calls with different GROUP BY FK columns. COALESCE() handles airports that may have zero departures or zero arrivals:

FeatureQL
WITH
    AIRPORTS := ENTITY(),
    AIRPORT_ID := INPUT(BIGINT#AIRPORTS),
    FLIGHTS := ENTITY(),
    FLIGHT_ID := INPUT(BIGINT#FLIGHTS),
    DIM_AIRPORTS := EXTERNAL_COLUMNS(
        airport_id BIGINT#AIRPORTS BIND TO AIRPORT_ID,
        name VARCHAR
        FROM TABLE(airport.dim_airports)
    ),
    AIRPORT_NAME := DIM_AIRPORTS[name],
    FCT_FLIGHTS := EXTERNAL_COLUMNS(
        flight_id BIGINT#FLIGHTS BIND TO FLIGHT_ID,
        flight_origin_id BIGINT#AIRPORTS,
        flight_dest_id BIGINT#AIRPORTS,
        FROM TABLE(airport.fct_flights)
    ),
    NUM_DEPARTURES := AIRPORT_ID.RELATED(COUNT(1) GROUP BY FCT_FLIGHTS[flight_origin_id]),
    NUM_ARRIVALS := AIRPORT_ID.RELATED(COUNT(1) GROUP BY FCT_FLIGHTS[flight_dest_id]),
    TOTAL_FLIGHTS := COALESCE(NUM_DEPARTURES, 0) + COALESCE(NUM_ARRIVALS, 0),
SELECT
    AIRPORT_ID,
    AIRPORT_NAME,
    NUM_DEPARTURES,
    NUM_ARRIVALS,
    TOTAL_FLIGHTS,
FOR CROSS
    AIRPORT_ID := BIND_VALUES(SEQUENCE(100, 104)),
    FLIGHT_ID := BIND_VALUES(SEQUENCE(200, 207)),
;
Result
AIRPORT_ID BIGINTAIRPORT_NAME VARCHARNUM_DEPARTURES BIGINTNUM_ARRIVALS BIGINTTOTAL_FLIGHTS VARCHAR
100Heathrow224
101Orly123
102Schiphol224
103Barajas213
104Fiumicino112

Pattern: when the same entity appears as two different FK columns in a table, call RELATED() once per FK with a different VIA clause. No special syntax is needed — FeatureQL resolves each path independently.

P2: Successive aggregations — group then re-aggregate

For each customer, compute revenue per product category (first aggregation), then find the top category (second aggregation). The naive mistake is trying to do both in a single TRANSFORM() — but you cannot nest GROUP BY inside TRANSFORM.

The solution chains two TRANSFORM() calls. The first groups by category and sums amounts. The second takes that already-grouped result and picks the top row:

FeatureQL
WITH
    CUSTOMERS := ENTITY(),
    CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
    ORDERS := ENTITY(),
    ORDER_ID := INPUT(BIGINT#ORDERS),
    CUST_ORDERS := INLINE_COLUMNS(
        customer_id BIGINT#CUSTOMERS BIND TO CUSTOMER_ID,
        order_ids ARRAY(BIGINT#ORDERS)
        FROM CSV(
            customer_id,order_ids
            1,"[10, 11, 12]"
            2,"[13, 14, 15]"
        )
    ),
    ORDER_IDS := CUST_ORDERS[order_ids],
    ORD_DATA := INLINE_COLUMNS(
        order_id BIGINT#ORDERS BIND TO ORDER_ID,
        category VARCHAR,
        amount DOUBLE
        FROM CSV(
            order_id,category,amount
            10,electronics,50.0
            11,books,20.0
            12,electronics,80.0
            13,clothing,60.0
            14,electronics,150.0
            15,clothing,40.0
        )
    ),
    ORDER_CATEGORY := ORD_DATA[category],
    ORDER_AMOUNT := ORD_DATA[amount],
    ENRICHED_ORDERS := EXTEND(
        ZIP(ORDER_IDS AS order_id)
        WITH ORDER_CATEGORY AS category, ORDER_AMOUNT AS amount
        VIA order_id BIND TO ORDER_ID
    ),
    REVENUE_BY_CATEGORY := ENRICHED_ORDERS.TRANSFORM(
        SELECT category, SUM(amount) GROUP BY category AS total
        ORDER BY total DESC
    ),
    TOP_CATEGORY := REVENUE_BY_CATEGORY.TRANSFORM(
        SELECT category, total ORDER BY total DESC LIMIT 1
    ),
SELECT
    CUSTOMER_ID,
    REVENUE_BY_CATEGORY,
    TOP_CATEGORY,
FOR CROSS
    CUSTOMER_ID := BIND_VALUES(ARRAY[1, 2]),
    ORDER_ID := BIND_VALUES(SEQUENCE(10, 15)),
;
Result
CUSTOMER_ID BIGINTREVENUE_BY_CATEGORY VARCHARTOP_CATEGORY VARCHAR
1[{category: electronics, total: 130.0}, {category: books, total: 20.0}][{category: electronics, total: 130.0}]
2[{category: electronics, total: 150.0}, {category: clothing, total: 100.0}][{category: electronics, total: 150.0}]

Pattern: when you need to aggregate, then aggregate again on the result, chain .TRANSFORM(SELECT ... GROUP BY ...) followed by .TRANSFORM(SELECT ... ORDER BY ... LIMIT 1). Each TRANSFORM operates on the output of the previous one.

P3: Self-referential entity (not yet supported)

An employees table where manager_id is a FK back to the same BIGINT#EMPLOYEES entity. The goal: for each employee, get their manager's name.

This requires a self-join — looking up EMP_NAME for a different row of the same entity. In standard SQL this is a simple JOIN employees e2 ON e1.manager_id = e2.employee_id.

In FeatureQL, RELATED(EMP_NAME VIA EMP_MANAGER_ID) does not produce the correct result because the FK points to the same entity type. The resolution stays within the current entity scope instead of performing a cross-row lookup.

Status: self-referential entities are a known limitation. The workaround is to pre-compute the manager name in SQL (e.g., in an OBT) and import it as a plain VARCHAR column rather than as an entity FK.

P4: Many-to-many through a junction table

Students enroll in courses through an enrollments junction table. Each enrollment has a student FK, a course FK, and a grade. The goal: for each student, list their enrollments with grades and compute their GPA.

The modeling approach: create a pre-aggregated OBT with enrollment_ids BIGINT[] per student, then EXTEND() the array with grade data from the enrollments table:

FeatureQL
WITH
    STUDENTS := ENTITY(),
    STUDENT_ID := INPUT(BIGINT#STUDENTS),
    ENROLLMENTS := ENTITY(),
    ENROLLMENT_ID := INPUT(BIGINT#ENROLLMENTS),
    STUDENT_OBT := EXTERNAL_COLUMNS(
        student_id BIGINT#STUDENTS BIND TO STUDENT_ID,
        enrollment_ids ARRAY(BIGINT#ENROLLMENTS)
        FROM TABLE(school.student_enrollments_obt)
    ),
    ENROLL_DATA := EXTERNAL_COLUMNS(
        enrollment_id BIGINT#ENROLLMENTS BIND TO ENROLLMENT_ID,
        enroll_course_id BIGINT,
        grade DOUBLE
        FROM TABLE(school.enrollments)
    ),
    ENROLL_GRADE := ENROLL_DATA[grade],
    ENROLL_COURSE_ID_RAW := ENROLL_DATA[enroll_course_id],
    ENRICHED := EXTEND(
        ZIP(STUDENT_OBT[enrollment_ids] AS enrollment_id)
        WITH ENROLL_GRADE AS grade, ENROLL_COURSE_ID_RAW AS course_id
        VIA enrollment_id BIND TO ENROLLMENT_ID
    ),
    GPA := ENRICHED.TRANSFORM(SELECT AVG(grade)).UNWRAP_ONE(),
SELECT
    STUDENT_ID,
    STUDENT_NAME := EXTERNAL_COLUMNS(
        student_id BIGINT#STUDENTS BIND TO STUDENT_ID,
        name VARCHAR
        FROM TABLE(school.students)
    )[name],
    ENRICHED AS ENROLLMENTS_WITH_GRADES,
    GPA,
FOR CROSS
    STUDENT_ID := BIND_VALUES(SEQUENCE(10, 14)),
    ENROLLMENT_ID := BIND_VALUES(SEQUENCE(30, 42)),
;
Result
STUDENT_ID BIGINTSTUDENT_NAME VARCHARENROLLMENTS_WITH_GRADES VARCHARGPA VARCHAR
10Ana[{enrollment_id: 30, grade: 3.7, course_id: 20}, {enrollment_id: 31, grade: 3.3, course_id: 21}, {enrollment_id: 32, grade: 3.9, course_id: 22}]3.6333333333333333
11Ben[{enrollment_id: 33, grade: 2.8, course_id: 21}, {enrollment_id: 34, grade: 3.1, course_id: 23}]2.95
12Cara[{enrollment_id: 35, grade: 3.5, course_id: 20}, {enrollment_id: 36, grade: 3.8, course_id: 22}]3.65
13Dan[{enrollment_id: 37, grade: 2.5, course_id: 21}, {enrollment_id: 38, grade: 2.9, course_id: 23}]2.7
14Eve[{enrollment_id: 39, grade: 3.6, course_id: 20}, {enrollment_id: 40, grade: 3.4, course_id: 21}, {enrollment_id: 41, grade: 3.2, course_id: 22}, {enrollment_id: 42, grade: 3.8, course_id: 23}]3.5

Pattern: for many-to-many relationships, model the junction table as its own entity. Create an OBT with the array of junction-entity IDs, then EXTEND() + TRANSFORM() to enrich and aggregate.

Known limitation: enriching with a feature that itself crosses entity boundaries via RELATED() through EXTERNAL_COLUMNS (e.g., resolving COURSE_TITLE through enroll_course_id) currently triggers an EXTERNAL_SQL_EXTRACT error on DuckDB. The workaround is to include the raw FK column (as a plain BIGINT, not BIGINT#COURSES) and resolve course names in a post-processing step.

P5: Mixed grain — scalar vs array comparison with CARRY (bug)

For each customer, compute their average order value (a scalar), then flag each individual order as above or below that average. This forces a grain mismatch: the average lives at the customer level, but the comparison must happen at the order level inside TRANSFORM().

The intended solution uses CARRY() to inject the customer-level scalar into each row of the order array, making it available inside TRANSFORM():

AVG_ORDER_VALUE := ENRICHED_ORDERS.TRANSFORM(SELECT AVG(amount)).UNWRAP_ONE(),
ORDERS_FLAGGED := ENRICHED_ORDERS.CARRY(AVG_ORDER_VALUE).TRANSFORM(
    SELECT order_id, amount,
        IF(amount > avg_order_value, 'above', 'below') AS flag
    ORDER BY order_id
),
null

Status: CARRY().TRANSFORM() chaining works, but the comparison inside TRANSFORM requires precision-annotated operators for DOUBLE values. Use >.2 instead of > (the .2 specifies 2 digits of precision). Plain > on DOUBLE values triggers COMPARISON-DIGITS-EMPTY. See Pitfall patterns P3 for a working example.

P6: Temporal sequences — window functions in TRANSFORM (bug)

For each customer's events, compute the time gap between successive events and flag gaps longer than 30 days. This requires LAG() inside TRANSFORM() to compare adjacent rows within the array.

The intended solution:

EVENTS_WITH_GAPS := ENRICHED_EVENTS.TRANSFORM(
    WITH prev_ts := LAG(ts, 1) OVER (ORDER BY ts ASC)
    SELECT event_id, ts, type, amount,
        DATE_DIFF(ts, prev_ts, 'DAY') AS gap_days
    ORDER BY ts ASC
),
HAS_LONG_GAP := EVENTS_WITH_GAPS.TRANSFORM(
    SELECT BOOL_OR(gap_days > 30)
).UNWRAP_ONE(),
null

Status: window functions (LAG, LEAD, ROW_NUMBER, etc.) inside TRANSFORM() after EXTEND() currently trigger an internal server error. Window functions work at the top level (see Window functions reference ) but not yet inside array-level TRANSFORM(). The workaround is to pre-compute temporal gaps in SQL before importing into FeatureQL.

Pattern summary

PatternTechniqueStatus
Dual FK (same entity, two columns)Two RELATED() calls with different VIAWorking
Successive aggregationsChain .TRANSFORM(GROUP BY) then .TRANSFORM(ORDER BY LIMIT)Working
Self-referential entityRELATED() via same-entity FKNot yet supported
Many-to-many (junction table)OBT with array of junction IDs + EXTEND()Working (partial)
Mixed grain (CARRY)CARRY() + TRANSFORM()Working (requires >.N for DOUBLE comparisons)
Temporal sequencesLAG() inside TRANSFORM()Bug — window functions in TRANSFORM

What's next

Last update at: 2026/04/27 15:40:31