Airport operations tutorial
This tutorial builds an airport operations model with three entities — airports, flights, and passengers — and answers four questions an aviation analytics team would ask: which airports have the most departures, what revenue did their latest flight generate, which departing flight earned the most, and how should airports be classified by traffic tier. Every query runs against inline DuckDB tables with no external data.
The data model
The dataset has four tables: an airport dimension, flight and passenger fact tables, and a pre-aggregated OBT that stores each airport's departing flight IDs as an array.
airport.dim_airports— one row per airport.airport_idis the primary key.hub_sizeis either'large'or'medium'.airport.fct_flights— one row per flight.flight_idis the primary key.flight_origin_idandflight_dest_idare foreign keys to airports.airlineis the two-letter carrier code.airport.fct_passengers— one row per passenger on a flight.passenger_idis the primary key.passenger_flight_idis a foreign key to flights.fare_paidis the ticket price.airport.agg_airports_obt— pre-aggregated: each airport'slast_departure_id(most recent by departure time) anddepartures(array of all departing flight IDs, ordered chronologically).
/* SQL */
--
CREATE SCHEMA IF NOT EXISTS airport;
DROP TABLE IF EXISTS airport.dim_airports;
DROP TABLE IF EXISTS airport.fct_flights;
DROP TABLE IF EXISTS airport.fct_passengers;
DROP TABLE IF EXISTS airport.agg_airports_obt;
--
CREATE TABLE airport.dim_airports (
airport_id BIGINT,
name VARCHAR,
city VARCHAR,
country VARCHAR,
hub_size VARCHAR
);
INSERT INTO airport.dim_airports VALUES
(100, 'Heathrow', 'London', 'UK', 'large'),
(101, 'Orly', 'Paris', 'France', 'medium'),
(102, 'Schiphol', 'Amsterdam', 'Netherlands', 'large'),
(103, 'Barajas', 'Madrid', 'Spain', 'large'),
(104, 'Fiumicino', 'Rome', 'Italy', 'medium');
--
CREATE TABLE airport.fct_flights (
flight_id BIGINT,
flight_origin_id BIGINT,
flight_dest_id BIGINT,
departure_time TIMESTAMP,
airline VARCHAR
);
INSERT INTO airport.fct_flights VALUES
(200, 100, 101, TIMESTAMP '2025-06-01 08:00:00', 'BA'),
(201, 100, 102, TIMESTAMP '2025-06-01 14:30:00', 'KLM'),
(202, 101, 103, TIMESTAMP '2025-06-02 09:15:00', 'AF'),
(203, 102, 100, TIMESTAMP '2025-06-02 11:00:00', 'KLM'),
(204, 102, 104, TIMESTAMP '2025-06-03 07:45:00', 'AZ'),
(205, 103, 100, TIMESTAMP '2025-06-03 16:00:00', 'IB'),
(206, 103, 102, TIMESTAMP '2025-06-04 10:30:00', 'IB'),
(207, 104, 101, TIMESTAMP '2025-06-04 13:00:00', 'AZ');
--
CREATE TABLE airport.fct_passengers (
passenger_id BIGINT,
passenger_flight_id BIGINT,
class VARCHAR,
fare_paid DECIMAL(10,2)
);
INSERT INTO airport.fct_passengers VALUES
(300, 200, 'economy', 120.00),
(301, 200, 'business', 450.00),
(302, 201, 'economy', 95.00),
(303, 201, 'economy', 95.00),
(304, 202, 'business', 380.00),
(305, 203, 'economy', 110.00),
(306, 204, 'economy', 85.00),
(307, 204, 'business', 320.00),
(308, 205, 'economy', 140.00),
(309, 206, 'business', 510.00),
(310, 207, 'economy', 100.00);
--
CREATE TABLE airport.agg_airports_obt (
airport_id BIGINT,
last_departure_id BIGINT,
departures BIGINT[]
);
INSERT INTO airport.agg_airports_obt
SELECT
flight_origin_id AS airport_id,
MAX_BY(flight_id, departure_time) AS last_departure_id,
ARRAY_AGG(flight_id ORDER BY departure_time)
FROM airport.fct_flights
GROUP BY flight_origin_id;
--
SELECT CAST(COUNT(*) AS INTEGER) AS Count FROM airport.agg_airports_obt;| Count BIGINT |
|---|
| 5 |
Define entities and map external data
Every FeatureQL model starts by declaring entities and their primary key inputs. This airport model has three entities: AIRPORTS (IDs in the 100s), FLIGHTS (200s), and PASSENGERS (300s). The #ENTITY type annotation on each INPUT() tells FeatureQL which entity a key belongs to — this is how it validates relationship traversals.
CREATE OR REPLACE FEATURES IN FM.AIRPORT AS
SELECT
AIRPORTS := ENTITY(),
FLIGHTS := ENTITY(),
PASSENGERS := ENTITY(),
AIRPORT_ID := INPUT(BIGINT#AIRPORTS),
FLIGHT_ID := INPUT(BIGINT#FLIGHTS),
PASSENGER_ID := INPUT(BIGINT#PASSENGERS),
;| feature_name VARCHAR | status VARCHAR | message VARCHAR |
|---|---|---|
| FM.AIRPORT.AIRPORTS | CREATED | Feature created as not exists |
| FM.AIRPORT.FLIGHTS | CREATED | Feature created as not exists |
| FM.AIRPORT.PASSENGERS | CREATED | Feature created as not exists |
| FM.AIRPORT.AIRPORT_ID | CREATED | Feature created as not exists |
| FM.AIRPORT.FLIGHT_ID | CREATED | Feature created as not exists |
| FM.AIRPORT.PASSENGER_ID | CREATED | Feature created as not exists |
With entities declared, the next step maps table columns to features using EXTERNAL_COLUMNS(). Each mapping declares which column is the lookup key (BIND TO the entity's input), which columns to expose, and which table to read from. The TABLES. prefix groups all raw mappings together, keeping them separate from derived features.
Foreign key columns carry entity annotations too: flight_origin_id BIGINT#AIRPORTS tells FeatureQL that this column references the AIRPORTS entity, and passenger_flight_id BIGINT#FLIGHTS references FLIGHTS. The OBT's departures ARRAY(BIGINT#FLIGHTS) declares an array of flight entity references — this is what enables the EXTEND(ZIP(...)) pattern later.
CREATE OR REPLACE FEATURES IN FM.AIRPORT AS
SELECT
TABLES.DIM_AIRPORTS := EXTERNAL_COLUMNS(
airport_id BIGINT#AIRPORTS BIND TO AIRPORT_ID,
name VARCHAR,
city VARCHAR,
country VARCHAR,
hub_size VARCHAR,
FROM TABLE(airport.dim_airports)
),
TABLES.FCT_FLIGHTS := EXTERNAL_COLUMNS(
flight_id BIGINT#FLIGHTS BIND TO FLIGHT_ID,
flight_origin_id BIGINT#AIRPORTS,
flight_dest_id BIGINT#AIRPORTS,
departure_time TIMESTAMP,
airline VARCHAR,
FROM TABLE(airport.fct_flights)
),
TABLES.FCT_PASSENGERS := EXTERNAL_COLUMNS(
passenger_id BIGINT#PASSENGERS BIND TO PASSENGER_ID,
passenger_flight_id BIGINT#FLIGHTS,
class VARCHAR,
fare_paid DECIMAL(10,2),
FROM TABLE(airport.fct_passengers)
),
TABLES.AGG_AIRPORTS_OBT := EXTERNAL_COLUMNS(
airport_id BIGINT#AIRPORTS BIND TO AIRPORT_ID,
last_departure_id BIGINT#FLIGHTS,
departures ARRAY(BIGINT#FLIGHTS),
FROM TABLE(airport.agg_airports_obt)
),
;| feature_name VARCHAR | status VARCHAR | message VARCHAR |
|---|---|---|
| FM.AIRPORT.TABLES.DIM_AIRPORTS | CREATED | Feature created as not exists |
| FM.AIRPORT.TABLES.FCT_FLIGHTS | CREATED | Feature created as not exists |
| FM.AIRPORT.TABLES.FCT_PASSENGERS | CREATED | Feature created as not exists |
| FM.AIRPORT.TABLES.AGG_AIRPORTS_OBT | CREATED | Feature created as not exists |
Which airports have the most departures?
Counting flights per airport is a PK-to-FK aggregation: each airport (the primary key side) has zero or more flights originating from it (the foreign key side). In SQL, you would write a GROUP BY flight_origin_id subquery and LEFT JOIN it back to the airport dimension. In FeatureQL, RELATED() replaces that entire pattern.
AIRPORT_ID.RELATED(COUNT(1) GROUP BY TABLES.FCT_FLIGHTS[flight_origin_id]) reads as: "for each AIRPORT_ID, count the rows in fct_flights where flight_origin_id matches." The GROUP BY inside RELATED() names the foreign key column; VIA AIRPORT_ID is inferred from the chained syntax. FeatureQL generates the subquery and the join automatically.
IS_BUSY is a pure boolean derived from NUM_DEPARTURES — no additional table access, just a comparison.
WITH
AIRPORT_NAME := TABLES.DIM_AIRPORTS[name],
CITY := TABLES.DIM_AIRPORTS[city],
NUM_DEPARTURES := AIRPORT_ID.RELATED(COUNT(1) GROUP BY TABLES.FCT_FLIGHTS[flight_origin_id]),
IS_BUSY := NUM_DEPARTURES > 1,
SELECT
AIRPORT_ID,
AIRPORT_NAME,
CITY,
NUM_DEPARTURES,
IS_BUSY,
FROM FM.AIRPORT
FOR CROSS
AIRPORT_ID := BIND_VALUES(SEQUENCE(100,104)),
FLIGHT_ID := BIND_VALUES(SEQUENCE(200,207)),
;| FM.AIRPORT.AIRPORT_ID BIGINT | AIRPORT_NAME VARCHAR | CITY VARCHAR | NUM_DEPARTURES BIGINT | IS_BUSY VARCHAR |
|---|---|---|---|---|
| 100 | Heathrow | London | 2 | TRUE |
| 101 | Orly | Paris | 1 | FALSE |
| 102 | Schiphol | Amsterdam | 2 | TRUE |
| 103 | Barajas | Madrid | 2 | TRUE |
| 104 | Fiumicino | Rome | 1 | FALSE |
What revenue did each airport's latest departure generate?
This question crosses two entity boundaries: airport → flight → passengers. The OBT stores each airport's last_departure_id as a foreign key to the flights entity. To get the revenue for that flight, we need to sum passenger fares for that specific flight — a FK-to-FK aggregation.
LAST_DEPARTURE_ID.RELATED(SUM(TABLES.FCT_PASSENGERS[fare_paid]) GROUP BY TABLES.FCT_PASSENGERS[passenger_flight_id]) reads as: "follow last_departure_id to the flights entity, then aggregate passenger fares grouped by passenger_flight_id." The result is a single decimal per airport — the total fare revenue for their most recent departure.
In SQL, this would require two joins (airport OBT → passengers → group by flight) or a correlated subquery. FeatureQL resolves the multi-hop path from the entity annotations.
WITH
AIRPORT_NAME := TABLES.DIM_AIRPORTS[name],
LAST_DEPARTURE_ID := TABLES.AGG_AIRPORTS_OBT[last_departure_id],
LAST_DEPARTURE_REVENUE := LAST_DEPARTURE_ID.RELATED(
SUM(TABLES.FCT_PASSENGERS[fare_paid]) GROUP BY TABLES.FCT_PASSENGERS[passenger_flight_id]
),
SELECT
AIRPORT_ID,
AIRPORT_NAME,
LAST_DEPARTURE_ID,
LAST_DEPARTURE_REVENUE,
FROM FM.AIRPORT
FOR CROSS
AIRPORT_ID := BIND_VALUES(SEQUENCE(100,104)),
PASSENGER_ID := BIND_VALUES(SEQUENCE(300,310)),
;| FM.AIRPORT.AIRPORT_ID BIGINT | AIRPORT_NAME VARCHAR | LAST_DEPARTURE_ID BIGINT | LAST_DEPARTURE_REVENUE VARCHAR |
|---|---|---|---|
| 100 | Heathrow | 201 | 190.00 |
| 101 | Orly | 202 | 380.00 |
| 102 | Schiphol | 204 | 405.00 |
| 103 | Barajas | 206 | 510.00 |
| 104 | Fiumicino | 207 | 100.00 |
Which departing flight generated the most revenue?
This question requires working with each airport's array of departing flights, enriching each element with revenue data, and then picking the top one. This is the EXTEND() + TRANSFORM() pattern.
EXTEND(ZIP(AIRPORT_DEPARTURES AS flight_id) WITH FLIGHT_REVENUE AS flight_revenue1) does three things: ZIP() converts the scalar array of flight IDs into an array of rows (each row has a flight_id field), then EXTEND() looks up each flight's revenue from the FLIGHT_REVENUE feature and adds it as a flight_revenue1 field on every row. The result is ARRAY<ROW<flight_id, flight_revenue1>> — a nested table inside each airport row.
.TRANSFORM(SELECT flight_id, flight_revenue1 ORDER BY flight_revenue1 DESC LIMIT 1) then runs a mini-query inside that array: sort by revenue descending and keep only the top row. The result is a single-element array containing the highest-earning flight.
In SQL, the equivalent would be an UNNEST + JOIN + ROW_NUMBER() OVER (PARTITION BY airport_id ORDER BY revenue DESC) window function + re-aggregation. FeatureQL keeps the nested structure intact — no flattening and re-grouping.
WITH
AIRPORT_NAME := TABLES.DIM_AIRPORTS[name],
AIRPORT_DEPARTURES := TABLES.AGG_AIRPORTS_OBT[departures],
FLIGHT_REVENUE := FLIGHT_ID.RELATED(
SUM(TABLES.FCT_PASSENGERS[fare_paid]) GROUP BY TABLES.FCT_PASSENGERS[passenger_flight_id]
),
DEPARTURES_ENRICHED := EXTEND(
ZIP(AIRPORT_DEPARTURES AS flight_id)
WITH FLIGHT_REVENUE AS flight_revenue1
),
TOP_FLIGHT := DEPARTURES_ENRICHED.TRANSFORM(SELECT flight_id, flight_revenue1 ORDER BY flight_revenue1 DESC LIMIT 1),
SELECT
AIRPORT_ID,
AIRPORT_NAME,
TOP_FLIGHT,
FROM FM.AIRPORT
FOR CROSS
AIRPORT_ID := BIND_VALUES(SEQUENCE(100,104)),
PASSENGER_ID := BIND_VALUES(SEQUENCE(300,310)),
;| FM.AIRPORT.AIRPORT_ID BIGINT | AIRPORT_NAME VARCHAR | TOP_FLIGHT VARCHAR |
|---|---|---|
| 100 | Heathrow | [{flight_id: 200, flight_revenue1: 570.00}] |
| 101 | Orly | [{flight_id: 202, flight_revenue1: 380.00}] |
| 102 | Schiphol | [{flight_id: 204, flight_revenue1: 405.00}] |
| 103 | Barajas | [{flight_id: 206, flight_revenue1: 510.00}] |
| 104 | Fiumicino | [{flight_id: 207, flight_revenue1: 100.00}] |
Classify airports by traffic tier
Building a classification feature means combining multiple signals into a single label. Here, NUM_DEPARTURES (from RELATED()) and TOTAL_REVENUE (from EXTEND() + TRANSFORM() + UNWRAP_ONE()) feed a CASE WHEN expression.
TOTAL_REVENUE reuses the same DEPARTURES_ENRICHED array from Q3 but applies a different TRANSFORM: SELECT SUM(flight_revenue1) aggregates all flight revenues into a single-row array, and .UNWRAP_ONE() extracts the scalar value.
The CASE WHEN then classifies: airports with 2+ departures and over $500 total revenue are 'high' traffic; airports with 2+ departures or over $300 total revenue are 'medium'; everyone else is 'low'.
WITH
AIRPORT_NAME := TABLES.DIM_AIRPORTS[name],
AIRPORT_DEPARTURES := TABLES.AGG_AIRPORTS_OBT[departures],
NUM_DEPARTURES := AIRPORT_ID.RELATED(COUNT(1) GROUP BY TABLES.FCT_FLIGHTS[flight_origin_id]),
FLIGHT_REVENUE := FLIGHT_ID.RELATED(
SUM(TABLES.FCT_PASSENGERS[fare_paid]) GROUP BY TABLES.FCT_PASSENGERS[passenger_flight_id]
),
DEPARTURES_ENRICHED := EXTEND(
ZIP(AIRPORT_DEPARTURES AS flight_id)
WITH FLIGHT_REVENUE AS flight_revenue1
),
TOTAL_REVENUE := DEPARTURES_ENRICHED.TRANSFORM(SELECT SUM(flight_revenue1)).UNWRAP_ONE(),
TRAFFIC_TIER := CASE
WHEN NUM_DEPARTURES >= 2 AND TOTAL_REVENUE > 500.00 THEN 'high'
WHEN NUM_DEPARTURES >= 2 OR TOTAL_REVENUE > 300.00 THEN 'medium'
ELSE 'low'
END,
SELECT
AIRPORT_ID,
AIRPORT_NAME,
TRAFFIC_TIER,
NUM_DEPARTURES,
TOTAL_REVENUE,
FROM FM.AIRPORT
FOR CROSS
AIRPORT_ID := BIND_VALUES(SEQUENCE(100,104)),
FLIGHT_ID := BIND_VALUES(SEQUENCE(200,207)),
PASSENGER_ID := BIND_VALUES(SEQUENCE(300,310)),
;| FM.AIRPORT.AIRPORT_ID BIGINT | AIRPORT_NAME VARCHAR | TRAFFIC_TIER VARCHAR | NUM_DEPARTURES BIGINT | TOTAL_REVENUE VARCHAR |
|---|---|---|---|---|
| 100 | Heathrow | high | 2 | 760.00 |
| 101 | Orly | medium | 1 | 380.00 |
| 102 | Schiphol | high | 2 | 515.00 |
| 103 | Barajas | high | 2 | 650.00 |
| 104 | Fiumicino | low | 1 | 100.00 |
Putting it all together
This single query combines every feature from the previous sections: airport name and city from the dimension table, departure count and busy flag from RELATED(), last-departure revenue from FK-to-FK RELATED(), top-earning flight from EXTEND() + TRANSFORM(), and traffic tier from CASE WHEN.
WITH
AIRPORT_NAME := TABLES.DIM_AIRPORTS[name],
CITY := TABLES.DIM_AIRPORTS[city],
LAST_DEPARTURE_ID := TABLES.AGG_AIRPORTS_OBT[last_departure_id],
AIRPORT_DEPARTURES := TABLES.AGG_AIRPORTS_OBT[departures],
NUM_DEPARTURES := AIRPORT_ID.RELATED(COUNT(1) GROUP BY TABLES.FCT_FLIGHTS[flight_origin_id]),
IS_BUSY := NUM_DEPARTURES > 1,
LAST_DEPARTURE_REVENUE := LAST_DEPARTURE_ID.RELATED(
SUM(TABLES.FCT_PASSENGERS[fare_paid]) GROUP BY TABLES.FCT_PASSENGERS[passenger_flight_id]
),
FLIGHT_REVENUE := FLIGHT_ID.RELATED(
SUM(TABLES.FCT_PASSENGERS[fare_paid]) GROUP BY TABLES.FCT_PASSENGERS[passenger_flight_id]
),
DEPARTURES_ENRICHED := EXTEND(
ZIP(AIRPORT_DEPARTURES AS flight_id)
WITH FLIGHT_REVENUE AS flight_revenue1
),
TOP_FLIGHT := DEPARTURES_ENRICHED.TRANSFORM(SELECT flight_id, flight_revenue1 ORDER BY flight_revenue1 DESC LIMIT 1),
TOTAL_REVENUE := DEPARTURES_ENRICHED.TRANSFORM(SELECT SUM(flight_revenue1)).UNWRAP_ONE(),
TRAFFIC_TIER := CASE
WHEN NUM_DEPARTURES >= 2 AND TOTAL_REVENUE > 500.00 THEN 'high'
WHEN NUM_DEPARTURES >= 2 OR TOTAL_REVENUE > 300.00 THEN 'medium'
ELSE 'low'
END,
SELECT
AIRPORT_ID,
AIRPORT_NAME,
CITY,
NUM_DEPARTURES,
IS_BUSY,
LAST_DEPARTURE_REVENUE,
TOP_FLIGHT,
TRAFFIC_TIER,
FROM FM.AIRPORT
FOR CROSS
AIRPORT_ID := BIND_VALUES(SEQUENCE(100,104)),
FLIGHT_ID := BIND_VALUES(SEQUENCE(200,207)),
PASSENGER_ID := BIND_VALUES(SEQUENCE(300,310)),
;| FM.AIRPORT.AIRPORT_ID BIGINT | AIRPORT_NAME VARCHAR | CITY VARCHAR | NUM_DEPARTURES BIGINT | IS_BUSY BOOLEAN | LAST_DEPARTURE_REVENUE VARCHAR | TOP_FLIGHT VARCHAR | TRAFFIC_TIER VARCHAR |
|---|---|---|---|---|---|---|---|
| 100 | Heathrow | London | 2 | TRUE | 190.00 | [{flight_id: 200, flight_revenue1: 570.00}] | high |
| 101 | Orly | Paris | 1 | FALSE | 380.00 | [{flight_id: 202, flight_revenue1: 380.00}] | medium |
| 102 | Schiphol | Amsterdam | 2 | TRUE | 405.00 | [{flight_id: 204, flight_revenue1: 405.00}] | high |
| 103 | Barajas | Madrid | 2 | TRUE | 510.00 | [{flight_id: 206, flight_revenue1: 510.00}] | high |
| 104 | Fiumicino | Rome | 1 | FALSE | 100.00 | [{flight_id: 207, flight_revenue1: 100.00}] | low |
In a production FeatureMesh deployment, the data engineer owns the first two layers: entity declarations (ENTITY(), INPUT()) and data source mappings (EXTERNAL_COLUMNS()). These rarely change and define how the warehouse connects to the feature model. The analyst owns everything above: RELATED() aggregations, EXTEND() enrichments, TRANSFORM() filters, and CASE WHEN classifications. They build and iterate on business logic without touching the underlying table wiring.
What's next
- RELATED() reference — all join patterns (PK-to-FK, FK-to-PK, FK-to-FK) with syntax details
- EXTEND() reference — enriching arrays with cross-entity data, aliasing, and VIA/BIND TO inference
- Batch analytics —
BIND_KEYSET(),EXTERNAL_COLUMNS(), and running analytics across large datasets