Healthcare clinic visits tutorial
This tutorial builds a clinic analytics model with three entities — patients, visits, and procedures — and answers four questions a healthcare data team would ask: which patients are repeat visitors, what did their most recent visit cost, which visit was their most expensive, and how should patients be classified by risk tier. Every query runs against inline DuckDB tables with no external data.
The data model
The dataset has four tables: a patient dimension, visit and procedure fact tables, and a pre-aggregated OBT that stores each patient's visit IDs as an array.
clinic.dim_patients— one row per patient.patient_idis the primary key.insurance_typeis either'private'or'public'.clinic.fct_visits— one row per clinic encounter.visit_idis the primary key.visit_patient_idis a foreign key to patients.departmentanddiagnosis_codedescribe what happened.clinic.fct_procedures— one row per procedure performed during a visit.procedure_idis the primary key.procedure_visit_idis a foreign key to visits.costis the billed amount.clinic.agg_patients_obt— pre-aggregated: each patient'slast_visit_id(most recent by date) andvisits(array of all their visit IDs, ordered chronologically).
/* SQL */
--
CREATE SCHEMA IF NOT EXISTS clinic;
DROP TABLE IF EXISTS clinic.dim_patients;
DROP TABLE IF EXISTS clinic.fct_visits;
DROP TABLE IF EXISTS clinic.fct_procedures;
DROP TABLE IF EXISTS clinic.agg_patients_obt;
--
CREATE TABLE clinic.dim_patients (
patient_id BIGINT,
name VARCHAR,
date_of_birth DATE,
insurance_type VARCHAR
);
INSERT INTO clinic.dim_patients VALUES
(100, 'Alice Martin', '1985-03-12', 'private'),
(101, 'Bob Chen', '1972-08-25', 'public'),
(102, 'Clara Rivera', '1990-11-04', 'private'),
(103, 'David Kim', '1968-01-17', 'public'),
(104, 'Eva Johnson', '1995-06-30', 'private');
--
CREATE TABLE clinic.fct_visits (
visit_id BIGINT,
visit_patient_id BIGINT,
visit_date DATE,
department VARCHAR,
diagnosis_code VARCHAR
);
INSERT INTO clinic.fct_visits VALUES
(200, 100, '2025-03-10', 'cardiology', 'I10'),
(201, 100, '2025-07-22', 'cardiology', 'I25'),
(202, 101, '2025-05-15', 'orthopedics', 'M54'),
(203, 102, '2025-01-08', 'dermatology', 'L20'),
(204, 102, '2025-09-30', 'cardiology', 'I10'),
(205, 103, '2025-06-18', 'orthopedics', 'M17');
--
CREATE TABLE clinic.fct_procedures (
procedure_id BIGINT,
procedure_visit_id BIGINT,
procedure_name VARCHAR,
cost DECIMAL(10,2)
);
INSERT INTO clinic.fct_procedures VALUES
(300, 200, 'ECG', 150.00),
(301, 200, 'Blood panel', 85.00),
(302, 201, 'Stress test', 320.00),
(303, 201, 'ECG', 150.00),
(304, 202, 'X-ray', 200.00),
(305, 203, 'Skin biopsy', 275.00),
(306, 204, 'ECG', 150.00),
(307, 205, 'MRI', 450.00);
--
CREATE TABLE clinic.agg_patients_obt (
patient_id BIGINT,
last_visit_id BIGINT,
visits BIGINT[]
);
INSERT INTO clinic.agg_patients_obt
SELECT
visit_patient_id AS patient_id,
MAX_BY(visit_id, visit_date) AS last_visit_id,
ARRAY_AGG(visit_id ORDER BY visit_date)
FROM clinic.fct_visits
GROUP BY visit_patient_id;
--
SELECT CAST(COUNT(*) AS INTEGER) AS Count FROM clinic.agg_patients_obt;| Count BIGINT |
|---|
| 4 |
Define entities and map external data
Every FeatureQL model starts by declaring entities and their primary key inputs. This clinic model has three entities: PATIENTS (IDs in the 100s), VISITS (200s), and PROCEDURES (300s). The #ENTITY type annotation on each INPUT() tells FeatureQL which entity a key belongs to — this is how it validates relationship traversals.
CREATE OR REPLACE FEATURES IN FM.CLINIC AS
SELECT
PATIENTS := ENTITY(),
VISITS := ENTITY(),
PROCEDURES := ENTITY(),
PATIENT_ID := INPUT(BIGINT#PATIENTS),
VISIT_ID := INPUT(BIGINT#VISITS),
PROCEDURE_ID := INPUT(BIGINT#PROCEDURES),
;| feature_name VARCHAR | status VARCHAR | message VARCHAR |
|---|---|---|
| FM.CLINIC.PATIENTS | CREATED | Feature created as not exists |
| FM.CLINIC.VISITS | CREATED | Feature created as not exists |
| FM.CLINIC.PROCEDURES | CREATED | Feature created as not exists |
| FM.CLINIC.PATIENT_ID | CREATED | Feature created as not exists |
| FM.CLINIC.VISIT_ID | CREATED | Feature created as not exists |
| FM.CLINIC.PROCEDURE_ID | CREATED | Feature created as not exists |
With entities declared, the next step maps table columns to features using EXTERNAL_COLUMNS(). Each mapping declares which column is the lookup key (BIND TO the entity's input), which columns to expose, and which table to read from. The TABLES. prefix groups all raw mappings together, keeping them separate from derived features.
Foreign key columns carry entity annotations too: visit_patient_id BIGINT#PATIENTS tells FeatureQL that this column references the PATIENTS entity, and procedure_visit_id BIGINT#VISITS references VISITS. These annotations are what make RELATED() and EXTEND() work — without them, FeatureQL cannot determine join paths.
CREATE OR REPLACE FEATURES IN FM.CLINIC AS
SELECT
TABLES.DIM_PATIENTS := EXTERNAL_COLUMNS(
patient_id BIGINT#PATIENTS BIND TO PATIENT_ID,
name VARCHAR,
date_of_birth DATE,
insurance_type VARCHAR,
FROM TABLE(clinic.dim_patients)
),
TABLES.FCT_VISITS := EXTERNAL_COLUMNS(
visit_id BIGINT#VISITS BIND TO VISIT_ID,
visit_patient_id BIGINT#PATIENTS,
visit_date DATE,
department VARCHAR,
diagnosis_code VARCHAR,
FROM TABLE(clinic.fct_visits)
),
TABLES.FCT_PROCEDURES := EXTERNAL_COLUMNS(
procedure_id BIGINT#PROCEDURES BIND TO PROCEDURE_ID,
procedure_visit_id BIGINT#VISITS,
procedure_name VARCHAR,
cost DECIMAL(10,2),
FROM TABLE(clinic.fct_procedures)
),
TABLES.AGG_PATIENTS_OBT := EXTERNAL_COLUMNS(
patient_id BIGINT#PATIENTS BIND TO PATIENT_ID,
last_visit_id BIGINT#VISITS,
visits ARRAY(BIGINT#VISITS),
FROM TABLE(clinic.agg_patients_obt)
),
;| feature_name VARCHAR | status VARCHAR | message VARCHAR |
|---|---|---|
| FM.CLINIC.TABLES.DIM_PATIENTS | CREATED | Feature created as not exists |
| FM.CLINIC.TABLES.FCT_VISITS | CREATED | Feature created as not exists |
| FM.CLINIC.TABLES.FCT_PROCEDURES | CREATED | Feature created as not exists |
| FM.CLINIC.TABLES.AGG_PATIENTS_OBT | CREATED | Feature created as not exists |
Which patients are repeat visitors?
Counting visits per patient is a PK-to-FK aggregation: each patient (the primary key side) has zero or more visits (the foreign key side). In SQL, you would write a GROUP BY visit_patient_id subquery and LEFT JOIN it back to the patient dimension. In FeatureQL, RELATED() replaces that entire pattern.
PATIENT_ID.RELATED(COUNT(1) GROUP BY TABLES.FCT_VISITS[visit_patient_id]) reads as: "for each PATIENT_ID, count the rows in fct_visits where visit_patient_id matches." The GROUP BY inside RELATED() names the foreign key column; VIA PATIENT_ID is inferred from the chained syntax. FeatureQL generates the subquery and the join automatically.
IS_REPEAT_VISITOR is a pure boolean derived from NUM_VISITS — no additional table access, just a comparison.
WITH
PATIENT_NAME := TABLES.DIM_PATIENTS[name],
NUM_VISITS := PATIENT_ID.RELATED(COUNT(1) GROUP BY TABLES.FCT_VISITS[visit_patient_id]),
IS_REPEAT_VISITOR := NUM_VISITS > 1,
SELECT
PATIENT_ID,
PATIENT_NAME,
NUM_VISITS,
IS_REPEAT_VISITOR,
FROM FM.CLINIC
FOR CROSS
PATIENT_ID := BIND_VALUES(SEQUENCE(100,103)),
VISIT_ID := BIND_VALUES(SEQUENCE(200,205)),
;| FM.CLINIC.PATIENT_ID BIGINT | PATIENT_NAME VARCHAR | NUM_VISITS BIGINT | IS_REPEAT_VISITOR VARCHAR |
|---|---|---|---|
| 100 | Alice Martin | 2 | TRUE |
| 101 | Bob Chen | 1 | FALSE |
| 102 | Clara Rivera | 2 | TRUE |
| 103 | David Kim | 1 | FALSE |
What did each patient's most recent visit cost?
This question crosses two entity boundaries: patient → visit → procedures. The OBT stores each patient's last_visit_id as a foreign key to the visits entity. To get the cost of that visit, we need to sum procedure costs for that specific visit — a FK-to-FK aggregation.
LAST_VISIT_ID.RELATED(SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]) reads as: "follow last_visit_id to the visits entity, then aggregate procedure costs grouped by procedure_visit_id." The result is a single decimal per patient — the total billed amount for their most recent visit.
In SQL, this would require two joins (patient OBT → procedures → group by visit) or a correlated subquery. FeatureQL resolves the multi-hop path from the entity annotations.
WITH
PATIENT_NAME := TABLES.DIM_PATIENTS[name],
LAST_VISIT_ID := TABLES.AGG_PATIENTS_OBT[last_visit_id],
LAST_VISIT_COST := LAST_VISIT_ID.RELATED(
SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]
),
SELECT
PATIENT_ID,
PATIENT_NAME,
LAST_VISIT_ID,
LAST_VISIT_COST,
FROM FM.CLINIC
FOR CROSS
PATIENT_ID := BIND_VALUES(SEQUENCE(100,103)),
PROCEDURE_ID := BIND_VALUES(SEQUENCE(300,307)),
;| FM.CLINIC.PATIENT_ID BIGINT | PATIENT_NAME VARCHAR | LAST_VISIT_ID BIGINT | LAST_VISIT_COST VARCHAR |
|---|---|---|---|
| 100 | Alice Martin | 201 | 470.00 |
| 101 | Bob Chen | 202 | 200.00 |
| 102 | Clara Rivera | 204 | 150.00 |
| 103 | David Kim | 205 | 450.00 |
Which visit was each patient's most expensive?
This question requires working with each patient's array of visits, enriching each element with cost data, and then picking the top one. This is the EXTEND() + TRANSFORM() pattern.
EXTEND(ZIP(PATIENT_VISITS AS visit_id) WITH VISIT_COST AS visit_cost1) does three things: ZIP() converts the scalar array of visit IDs into an array of rows (each row has a visit_id field), then EXTEND() looks up each visit's cost from the VISIT_COST feature and adds it as a visit_cost1 field on every row. The result is ARRAY<ROW<visit_id, visit_cost1>> — a nested table inside each patient row.
.TRANSFORM(SELECT visit_id, visit_cost1 ORDER BY visit_cost1 DESC LIMIT 1) then runs a mini-query inside that array: sort by cost descending and keep only the top row. The result is a single-element array containing the costliest visit.
In SQL, the equivalent would be an UNNEST + JOIN + ROW_NUMBER() OVER (PARTITION BY patient_id ORDER BY cost DESC) window function + re-aggregation. FeatureQL keeps the nested structure intact — no flattening and re-grouping.
WITH
PATIENT_NAME := TABLES.DIM_PATIENTS[name],
PATIENT_VISITS := TABLES.AGG_PATIENTS_OBT[visits],
VISIT_COST := VISIT_ID.RELATED(
SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]
),
VISITS_ENRICHED := EXTEND(
ZIP(PATIENT_VISITS AS visit_id)
WITH VISIT_COST AS visit_cost1
),
COSTLIEST_VISIT := VISITS_ENRICHED.TRANSFORM(SELECT visit_id, visit_cost1 ORDER BY visit_cost1 DESC LIMIT 1),
SELECT
PATIENT_ID,
PATIENT_NAME,
COSTLIEST_VISIT,
FROM FM.CLINIC
FOR CROSS
PATIENT_ID := BIND_VALUES(SEQUENCE(100,103)),
PROCEDURE_ID := BIND_VALUES(SEQUENCE(300,307)),
;| FM.CLINIC.PATIENT_ID BIGINT | PATIENT_NAME VARCHAR | COSTLIEST_VISIT VARCHAR |
|---|---|---|
| 100 | Alice Martin | [{visit_id: 201, visit_cost1: 470.00}] |
| 101 | Bob Chen | [{visit_id: 202, visit_cost1: 200.00}] |
| 102 | Clara Rivera | [{visit_id: 203, visit_cost1: 275.00}] |
| 103 | David Kim | [{visit_id: 205, visit_cost1: 450.00}] |
Classify patients by risk tier
Building a classification feature means combining multiple signals into a single label. Here, NUM_VISITS (from RELATED()) and TOTAL_COST (from EXTEND() + TRANSFORM() + UNWRAP_ONE()) feed a CASE WHEN expression.
TOTAL_COST reuses the same VISITS_ENRICHED array from Q3 but applies a different TRANSFORM: SELECT SUM(visit_cost1) aggregates all visit costs into a single-row array, and .UNWRAP_ONE() extracts the scalar value.
The CASE WHEN then classifies: patients with 2+ visits and over $500 total cost are 'high' risk; patients with 2+ visits or over $300 total cost are 'medium'; everyone else is 'low'.
WITH
PATIENT_NAME := TABLES.DIM_PATIENTS[name],
PATIENT_VISITS := TABLES.AGG_PATIENTS_OBT[visits],
NUM_VISITS := PATIENT_ID.RELATED(COUNT(1) GROUP BY TABLES.FCT_VISITS[visit_patient_id]),
VISIT_COST := VISIT_ID.RELATED(
SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]
),
VISITS_ENRICHED := EXTEND(
ZIP(PATIENT_VISITS AS visit_id)
WITH VISIT_COST AS visit_cost1
),
TOTAL_COST := VISITS_ENRICHED.TRANSFORM(SELECT SUM(visit_cost1)).UNWRAP_ONE(),
RISK_TIER := CASE
WHEN NUM_VISITS >= 2 AND TOTAL_COST > 500.00 THEN 'high'
WHEN NUM_VISITS >= 2 OR TOTAL_COST > 300.00 THEN 'medium'
ELSE 'low'
END,
SELECT
PATIENT_ID,
PATIENT_NAME,
RISK_TIER,
NUM_VISITS,
TOTAL_COST,
FROM FM.CLINIC
FOR CROSS
PATIENT_ID := BIND_VALUES(SEQUENCE(100,103)),
VISIT_ID := BIND_VALUES(SEQUENCE(200,205)),
PROCEDURE_ID := BIND_VALUES(SEQUENCE(300,307)),
;| FM.CLINIC.PATIENT_ID BIGINT | PATIENT_NAME VARCHAR | RISK_TIER VARCHAR | NUM_VISITS BIGINT | TOTAL_COST VARCHAR |
|---|---|---|---|---|
| 100 | Alice Martin | high | 2 | 705.00 |
| 101 | Bob Chen | low | 1 | 200.00 |
| 102 | Clara Rivera | medium | 2 | 425.00 |
| 103 | David Kim | medium | 1 | 450.00 |
Putting it all together
This single query combines every feature from the previous sections: patient name and insurance from the dimension table, visit count and repeat-visitor flag from RELATED(), last-visit cost from FK-to-FK RELATED(), costliest visit from EXTEND() + TRANSFORM(), and risk tier from CASE WHEN.
WITH
PATIENT_NAME := TABLES.DIM_PATIENTS[name],
INSURANCE := TABLES.DIM_PATIENTS[insurance_type],
LAST_VISIT_ID := TABLES.AGG_PATIENTS_OBT[last_visit_id],
PATIENT_VISITS := TABLES.AGG_PATIENTS_OBT[visits],
NUM_VISITS := PATIENT_ID.RELATED(COUNT(1) GROUP BY TABLES.FCT_VISITS[visit_patient_id]),
IS_REPEAT_VISITOR := NUM_VISITS > 1,
LAST_VISIT_COST := LAST_VISIT_ID.RELATED(
SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]
),
VISIT_COST := VISIT_ID.RELATED(
SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]
),
VISITS_ENRICHED := EXTEND(
ZIP(PATIENT_VISITS AS visit_id)
WITH VISIT_COST AS visit_cost1
),
COSTLIEST_VISIT := VISITS_ENRICHED.TRANSFORM(SELECT visit_id, visit_cost1 ORDER BY visit_cost1 DESC LIMIT 1),
TOTAL_COST := VISITS_ENRICHED.TRANSFORM(SELECT SUM(visit_cost1)).UNWRAP_ONE(),
RISK_TIER := CASE
WHEN NUM_VISITS >= 2 AND TOTAL_COST > 500.00 THEN 'high'
WHEN NUM_VISITS >= 2 OR TOTAL_COST > 300.00 THEN 'medium'
ELSE 'low'
END,
SELECT
PATIENT_ID,
PATIENT_NAME,
INSURANCE,
NUM_VISITS,
IS_REPEAT_VISITOR,
LAST_VISIT_COST,
COSTLIEST_VISIT,
RISK_TIER,
FROM FM.CLINIC
FOR CROSS
PATIENT_ID := BIND_VALUES(SEQUENCE(100,103)),
VISIT_ID := BIND_VALUES(SEQUENCE(200,205)),
PROCEDURE_ID := BIND_VALUES(SEQUENCE(300,307)),
;| FM.CLINIC.PATIENT_ID BIGINT | PATIENT_NAME VARCHAR | INSURANCE VARCHAR | NUM_VISITS BIGINT | IS_REPEAT_VISITOR BOOLEAN | LAST_VISIT_COST VARCHAR | COSTLIEST_VISIT VARCHAR | RISK_TIER VARCHAR |
|---|---|---|---|---|---|---|---|
| 100 | Alice Martin | private | 2 | TRUE | 470.00 | [{visit_id: 201, visit_cost1: 470.00}] | high |
| 101 | Bob Chen | public | 1 | FALSE | 200.00 | [{visit_id: 202, visit_cost1: 200.00}] | low |
| 102 | Clara Rivera | private | 2 | TRUE | 150.00 | [{visit_id: 203, visit_cost1: 275.00}] | medium |
| 103 | David Kim | public | 1 | FALSE | 450.00 | [{visit_id: 205, visit_cost1: 450.00}] | medium |
In a production FeatureMesh deployment, the data engineer owns the first two layers: entity declarations (ENTITY(), INPUT()) and data source mappings (EXTERNAL_COLUMNS()). These rarely change and define how the warehouse connects to the feature model. The analyst owns everything above: RELATED() aggregations, EXTEND() enrichments, TRANSFORM() filters, and CASE WHEN classifications. They build and iterate on business logic without touching the underlying table wiring.
What's next
- RELATED() reference — all join patterns (PK-to-FK, FK-to-PK, FK-to-FK) with syntax details
- EXTEND() reference — enriching arrays with cross-entity data, aliasing, and VIA/BIND TO inference
- Historical data functions —
SCD_AT_TIME()andACCUMULATION_AT_TIME()for querying patient records as of any point in time