Tricky modeling patterns
Some entity-relationship patterns look straightforward but trip up both humans and LLMs when translated to FeatureQL. This page walks through six patterns that require careful thinking about entities, foreign keys, and array transformations.
P1: Dual foreign key — same entity referenced twice
A flights table has two columns that both reference the airports entity: flight_origin_id and flight_dest_id. The challenge: for each flight, resolve both to the airport's city name.
The key insight is that RELATED() can be called multiple times with different VIA paths pointing to the same entity. Each call follows a different FK column but resolves the same target feature (AIRPORT_CITY):
WITH
AIRPORTS := ENTITY(),
AIRPORT_ID := INPUT(BIGINT#AIRPORTS),
FLIGHTS := ENTITY(),
FLIGHT_ID := INPUT(BIGINT#FLIGHTS),
DIM_AIRPORTS := EXTERNAL_COLUMNS(
airport_id BIGINT#AIRPORTS BIND TO AIRPORT_ID,
city VARCHAR
FROM TABLE(airport.dim_airports)
),
AIRPORT_CITY := DIM_AIRPORTS[city],
FCT_FLIGHTS := EXTERNAL_COLUMNS(
flight_id BIGINT#FLIGHTS BIND TO FLIGHT_ID,
flight_origin_id BIGINT#AIRPORTS,
flight_dest_id BIGINT#AIRPORTS,
airline VARCHAR
FROM TABLE(airport.fct_flights)
),
FLIGHT_AIRLINE := FCT_FLIGHTS[airline],
ORIGIN_CITY := RELATED(AIRPORT_CITY VIA FCT_FLIGHTS[flight_origin_id]),
DEST_CITY := RELATED(AIRPORT_CITY VIA FCT_FLIGHTS[flight_dest_id]),
SELECT
FLIGHT_ID,
FLIGHT_AIRLINE AS AIRLINE,
ORIGIN_CITY,
DEST_CITY,
FOR CROSS
FLIGHT_ID := BIND_VALUES(SEQUENCE(200, 207)),
AIRPORT_ID := BIND_VALUES(SEQUENCE(100, 104)),
;| FLIGHT_ID BIGINT | AIRLINE VARCHAR | ORIGIN_CITY VARCHAR | DEST_CITY VARCHAR |
|---|---|---|---|
| 200 | BA | London | Paris |
| 201 | KLM | London | Amsterdam |
| 202 | AF | Paris | Madrid |
| 203 | KLM | Amsterdam | London |
| 204 | AZ | Amsterdam | Rome |
| 205 | IB | Madrid | London |
| 206 | IB | Madrid | Amsterdam |
| 207 | AZ | Rome | Paris |
The same pattern works for reverse aggregations. To count departures and arrivals per airport, use two RELATED() calls with different GROUP BY FK columns. COALESCE() handles airports that may have zero departures or zero arrivals:
WITH
AIRPORTS := ENTITY(),
AIRPORT_ID := INPUT(BIGINT#AIRPORTS),
FLIGHTS := ENTITY(),
FLIGHT_ID := INPUT(BIGINT#FLIGHTS),
DIM_AIRPORTS := EXTERNAL_COLUMNS(
airport_id BIGINT#AIRPORTS BIND TO AIRPORT_ID,
name VARCHAR
FROM TABLE(airport.dim_airports)
),
AIRPORT_NAME := DIM_AIRPORTS[name],
FCT_FLIGHTS := EXTERNAL_COLUMNS(
flight_id BIGINT#FLIGHTS BIND TO FLIGHT_ID,
flight_origin_id BIGINT#AIRPORTS,
flight_dest_id BIGINT#AIRPORTS,
FROM TABLE(airport.fct_flights)
),
NUM_DEPARTURES := AIRPORT_ID.RELATED(COUNT(1) GROUP BY FCT_FLIGHTS[flight_origin_id]),
NUM_ARRIVALS := AIRPORT_ID.RELATED(COUNT(1) GROUP BY FCT_FLIGHTS[flight_dest_id]),
TOTAL_FLIGHTS := COALESCE(NUM_DEPARTURES, 0) + COALESCE(NUM_ARRIVALS, 0),
SELECT
AIRPORT_ID,
AIRPORT_NAME,
NUM_DEPARTURES,
NUM_ARRIVALS,
TOTAL_FLIGHTS,
FOR CROSS
AIRPORT_ID := BIND_VALUES(SEQUENCE(100, 104)),
FLIGHT_ID := BIND_VALUES(SEQUENCE(200, 207)),
;| AIRPORT_ID BIGINT | AIRPORT_NAME VARCHAR | NUM_DEPARTURES BIGINT | NUM_ARRIVALS BIGINT | TOTAL_FLIGHTS VARCHAR |
|---|---|---|---|---|
| 100 | Heathrow | 2 | 2 | 4 |
| 101 | Orly | 1 | 2 | 3 |
| 102 | Schiphol | 2 | 2 | 4 |
| 103 | Barajas | 2 | 1 | 3 |
| 104 | Fiumicino | 1 | 1 | 2 |
Pattern: when the same entity appears as two different FK columns in a table, call RELATED() once per FK with a different VIA clause. No special syntax is needed — FeatureQL resolves each path independently.
P2: Successive aggregations — group then re-aggregate
For each customer, compute revenue per product category (first aggregation), then find the top category (second aggregation). The naive mistake is trying to do both in a single TRANSFORM() — but you cannot nest GROUP BY inside TRANSFORM.
The solution chains two TRANSFORM() calls. The first groups by category and sums amounts. The second takes that already-grouped result and picks the top row:
WITH
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
ORDERS := ENTITY(),
ORDER_ID := INPUT(BIGINT#ORDERS),
CUST_ORDERS := INLINE_COLUMNS(
customer_id BIGINT#CUSTOMERS BIND TO CUSTOMER_ID,
order_ids ARRAY(BIGINT#ORDERS)
FROM CSV(
customer_id,order_ids
1,"[10, 11, 12]"
2,"[13, 14, 15]"
)
),
ORDER_IDS := CUST_ORDERS[order_ids],
ORD_DATA := INLINE_COLUMNS(
order_id BIGINT#ORDERS BIND TO ORDER_ID,
category VARCHAR,
amount DOUBLE
FROM CSV(
order_id,category,amount
10,electronics,50.0
11,books,20.0
12,electronics,80.0
13,clothing,60.0
14,electronics,150.0
15,clothing,40.0
)
),
ORDER_CATEGORY := ORD_DATA[category],
ORDER_AMOUNT := ORD_DATA[amount],
ENRICHED_ORDERS := EXTEND(
ZIP(ORDER_IDS AS order_id)
WITH ORDER_CATEGORY AS category, ORDER_AMOUNT AS amount
VIA order_id BIND TO ORDER_ID
),
REVENUE_BY_CATEGORY := ENRICHED_ORDERS.TRANSFORM(
SELECT category, SUM(amount) GROUP BY category AS total
ORDER BY total DESC
),
TOP_CATEGORY := REVENUE_BY_CATEGORY.TRANSFORM(
SELECT category, total ORDER BY total DESC LIMIT 1
),
SELECT
CUSTOMER_ID,
REVENUE_BY_CATEGORY,
TOP_CATEGORY,
FOR CROSS
CUSTOMER_ID := BIND_VALUES(ARRAY[1, 2]),
ORDER_ID := BIND_VALUES(SEQUENCE(10, 15)),
;| CUSTOMER_ID BIGINT | REVENUE_BY_CATEGORY VARCHAR | TOP_CATEGORY VARCHAR |
|---|---|---|
| 1 | [{category: electronics, total: 130.0}, {category: books, total: 20.0}] | [{category: electronics, total: 130.0}] |
| 2 | [{category: electronics, total: 150.0}, {category: clothing, total: 100.0}] | [{category: electronics, total: 150.0}] |
Pattern: when you need to aggregate, then aggregate again on the result, chain .TRANSFORM(SELECT ... GROUP BY ...) followed by .TRANSFORM(SELECT ... ORDER BY ... LIMIT 1). Each TRANSFORM operates on the output of the previous one.
P3: Self-referential entity (not yet supported)
An employees table where manager_id is a FK back to the same BIGINT#EMPLOYEES entity. The goal: for each employee, get their manager's name.
This requires a self-join — looking up EMP_NAME for a different row of the same entity. In standard SQL this is a simple JOIN employees e2 ON e1.manager_id = e2.employee_id.
In FeatureQL, RELATED(EMP_NAME VIA EMP_MANAGER_ID) does not produce the correct result because the FK points to the same entity type. The resolution stays within the current entity scope instead of performing a cross-row lookup.
Status: self-referential entities are a known limitation. The workaround is to pre-compute the manager name in SQL (e.g., in an OBT) and import it as a plain VARCHAR column rather than as an entity FK.
P4: Many-to-many through a junction table
Students enroll in courses through an enrollments junction table. Each enrollment has a student FK, a course FK, and a grade. The goal: for each student, list their enrollments with grades and compute their GPA.
The modeling approach: create a pre-aggregated OBT with enrollment_ids BIGINT[] per student, then EXTEND() the array with grade data from the enrollments table:
WITH
STUDENTS := ENTITY(),
STUDENT_ID := INPUT(BIGINT#STUDENTS),
ENROLLMENTS := ENTITY(),
ENROLLMENT_ID := INPUT(BIGINT#ENROLLMENTS),
STUDENT_OBT := EXTERNAL_COLUMNS(
student_id BIGINT#STUDENTS BIND TO STUDENT_ID,
enrollment_ids ARRAY(BIGINT#ENROLLMENTS)
FROM TABLE(school.student_enrollments_obt)
),
ENROLL_DATA := EXTERNAL_COLUMNS(
enrollment_id BIGINT#ENROLLMENTS BIND TO ENROLLMENT_ID,
enroll_course_id BIGINT,
grade DOUBLE
FROM TABLE(school.enrollments)
),
ENROLL_GRADE := ENROLL_DATA[grade],
ENROLL_COURSE_ID_RAW := ENROLL_DATA[enroll_course_id],
ENRICHED := EXTEND(
ZIP(STUDENT_OBT[enrollment_ids] AS enrollment_id)
WITH ENROLL_GRADE AS grade, ENROLL_COURSE_ID_RAW AS course_id
VIA enrollment_id BIND TO ENROLLMENT_ID
),
GPA := ENRICHED.TRANSFORM(SELECT AVG(grade)).UNWRAP_ONE(),
SELECT
STUDENT_ID,
STUDENT_NAME := EXTERNAL_COLUMNS(
student_id BIGINT#STUDENTS BIND TO STUDENT_ID,
name VARCHAR
FROM TABLE(school.students)
)[name],
ENRICHED AS ENROLLMENTS_WITH_GRADES,
GPA,
FOR CROSS
STUDENT_ID := BIND_VALUES(SEQUENCE(10, 14)),
ENROLLMENT_ID := BIND_VALUES(SEQUENCE(30, 42)),
;| STUDENT_ID BIGINT | STUDENT_NAME VARCHAR | ENROLLMENTS_WITH_GRADES VARCHAR | GPA VARCHAR |
|---|---|---|---|
| 10 | Ana | [{enrollment_id: 30, grade: 3.7, course_id: 20}, {enrollment_id: 31, grade: 3.3, course_id: 21}, {enrollment_id: 32, grade: 3.9, course_id: 22}] | 3.6333333333333333 |
| 11 | Ben | [{enrollment_id: 33, grade: 2.8, course_id: 21}, {enrollment_id: 34, grade: 3.1, course_id: 23}] | 2.95 |
| 12 | Cara | [{enrollment_id: 35, grade: 3.5, course_id: 20}, {enrollment_id: 36, grade: 3.8, course_id: 22}] | 3.65 |
| 13 | Dan | [{enrollment_id: 37, grade: 2.5, course_id: 21}, {enrollment_id: 38, grade: 2.9, course_id: 23}] | 2.7 |
| 14 | Eve | [{enrollment_id: 39, grade: 3.6, course_id: 20}, {enrollment_id: 40, grade: 3.4, course_id: 21}, {enrollment_id: 41, grade: 3.2, course_id: 22}, {enrollment_id: 42, grade: 3.8, course_id: 23}] | 3.5 |
Pattern: for many-to-many relationships, model the junction table as its own entity. Create an OBT with the array of junction-entity IDs, then EXTEND() + TRANSFORM() to enrich and aggregate.
Known limitation: enriching with a feature that itself crosses entity boundaries via
RELATED()throughEXTERNAL_COLUMNS(e.g., resolvingCOURSE_TITLEthroughenroll_course_id) currently triggers anEXTERNAL_SQL_EXTRACTerror on DuckDB. The workaround is to include the raw FK column (as a plainBIGINT, notBIGINT#COURSES) and resolve course names in a post-processing step.
P5: Mixed grain — scalar vs array comparison with CARRY (bug)
For each customer, compute their average order value (a scalar), then flag each individual order as above or below that average. This forces a grain mismatch: the average lives at the customer level, but the comparison must happen at the order level inside TRANSFORM().
The intended solution uses CARRY() to inject the customer-level scalar into each row of the order array, making it available inside TRANSFORM():
AVG_ORDER_VALUE := ENRICHED_ORDERS.TRANSFORM(SELECT AVG(amount)).UNWRAP_ONE(),
ORDERS_FLAGGED := ENRICHED_ORDERS.CARRY(AVG_ORDER_VALUE).TRANSFORM(
SELECT order_id, amount,
IF(amount > avg_order_value, 'above', 'below') AS flag
ORDER BY order_id
), Status: CARRY().TRANSFORM() chaining works, but the comparison inside TRANSFORM requires precision-annotated operators for DOUBLE values. Use >.2 instead of > (the .2 specifies 2 digits of precision). Plain > on DOUBLE values triggers COMPARISON-DIGITS-EMPTY. See Pitfall patterns P3 for a working example.
P6: Temporal sequences — window functions in TRANSFORM (bug)
For each customer's events, compute the time gap between successive events and flag gaps longer than 30 days. This requires LAG() inside TRANSFORM() to compare adjacent rows within the array.
The intended solution:
EVENTS_WITH_GAPS := ENRICHED_EVENTS.TRANSFORM(
WITH prev_ts := LAG(ts, 1) OVER (ORDER BY ts ASC)
SELECT event_id, ts, type, amount,
DATE_DIFF(ts, prev_ts, 'DAY') AS gap_days
ORDER BY ts ASC
),
HAS_LONG_GAP := EVENTS_WITH_GAPS.TRANSFORM(
SELECT BOOL_OR(gap_days > 30)
).UNWRAP_ONE(), Status: window functions (LAG, LEAD, ROW_NUMBER, etc.) inside TRANSFORM() after EXTEND() currently trigger an internal server error. Window functions work at the top level (see Window functions reference ) but not yet inside array-level TRANSFORM(). The workaround is to pre-compute temporal gaps in SQL before importing into FeatureQL.
Pattern summary
| Pattern | Technique | Status |
|---|---|---|
| Dual FK (same entity, two columns) | Two RELATED() calls with different VIA | Working |
| Successive aggregations | Chain .TRANSFORM(GROUP BY) then .TRANSFORM(ORDER BY LIMIT) | Working |
| Self-referential entity | RELATED() via same-entity FK | Not yet supported |
| Many-to-many (junction table) | OBT with array of junction IDs + EXTEND() | Working (partial) |
| Mixed grain (CARRY) | CARRY() + TRANSFORM() | Working (requires >.N for DOUBLE comparisons) |
| Temporal sequences | LAG() inside TRANSFORM() | Bug — window functions in TRANSFORM |
What's next
- RELATED() reference — full syntax for cross-entity lookups
- EXTEND() reference — enriching arrays with cross-entity data
- TRANSFORM() reference — filtering, sorting, aggregating arrays
- Window functions — top-level window function usage