Healthcare clinic visits tutorial

This tutorial builds a clinic analytics model with three entities — patients, visits, and procedures — and answers four questions a healthcare data team would ask: which patients are repeat visitors, what did their most recent visit cost, which visit was their most expensive, and how should patients be classified by risk tier. Every query runs against inline DuckDB tables with no external data.

The data model

The dataset has four tables: a patient dimension, visit and procedure fact tables, and a pre-aggregated OBT that stores each patient's visit IDs as an array.

  • clinic.dim_patients — one row per patient. patient_id is the primary key. insurance_type is either 'private' or 'public'.
  • clinic.fct_visits — one row per clinic encounter. visit_id is the primary key. visit_patient_id is a foreign key to patients. department and diagnosis_code describe what happened.
  • clinic.fct_procedures — one row per procedure performed during a visit. procedure_id is the primary key. procedure_visit_id is a foreign key to visits. cost is the billed amount.
  • clinic.agg_patients_obt — pre-aggregated: each patient's last_visit_id (most recent by date) and visits (array of all their visit IDs, ordered chronologically).

FeatureQL
/* SQL */
--
CREATE SCHEMA IF NOT EXISTS clinic;
DROP TABLE IF EXISTS clinic.dim_patients;
DROP TABLE IF EXISTS clinic.fct_visits;
DROP TABLE IF EXISTS clinic.fct_procedures;
DROP TABLE IF EXISTS clinic.agg_patients_obt;
--
CREATE TABLE clinic.dim_patients (
    patient_id BIGINT,
    name VARCHAR,
    date_of_birth DATE,
    insurance_type VARCHAR
);
INSERT INTO clinic.dim_patients VALUES
    (100, 'Alice Martin', '1985-03-12', 'private'),
    (101, 'Bob Chen', '1972-08-25', 'public'),
    (102, 'Clara Rivera', '1990-11-04', 'private'),
    (103, 'David Kim', '1968-01-17', 'public'),
    (104, 'Eva Johnson', '1995-06-30', 'private');
--
CREATE TABLE clinic.fct_visits (
    visit_id BIGINT,
    visit_patient_id BIGINT,
    visit_date DATE,
    department VARCHAR,
    diagnosis_code VARCHAR
);
INSERT INTO clinic.fct_visits VALUES
    (200, 100, '2025-03-10', 'cardiology', 'I10'),
    (201, 100, '2025-07-22', 'cardiology', 'I25'),
    (202, 101, '2025-05-15', 'orthopedics', 'M54'),
    (203, 102, '2025-01-08', 'dermatology', 'L20'),
    (204, 102, '2025-09-30', 'cardiology', 'I10'),
    (205, 103, '2025-06-18', 'orthopedics', 'M17');
--
CREATE TABLE clinic.fct_procedures (
    procedure_id BIGINT,
    procedure_visit_id BIGINT,
    procedure_name VARCHAR,
    cost DECIMAL(10,2)
);
INSERT INTO clinic.fct_procedures VALUES
    (300, 200, 'ECG', 150.00),
    (301, 200, 'Blood panel', 85.00),
    (302, 201, 'Stress test', 320.00),
    (303, 201, 'ECG', 150.00),
    (304, 202, 'X-ray', 200.00),
    (305, 203, 'Skin biopsy', 275.00),
    (306, 204, 'ECG', 150.00),
    (307, 205, 'MRI', 450.00);
--
CREATE TABLE clinic.agg_patients_obt (
    patient_id BIGINT,
    last_visit_id BIGINT,
    visits BIGINT[]
);
INSERT INTO clinic.agg_patients_obt
SELECT
    visit_patient_id AS patient_id,
    MAX_BY(visit_id, visit_date) AS last_visit_id,
    ARRAY_AGG(visit_id ORDER BY visit_date)
FROM clinic.fct_visits
GROUP BY visit_patient_id;
--
SELECT CAST(COUNT(*) AS INTEGER) AS Count FROM clinic.agg_patients_obt;
Result
Count BIGINT
4

Define entities and map external data

Every FeatureQL model starts by declaring entities and their primary key inputs. This clinic model has three entities: PATIENTS (IDs in the 100s), VISITS (200s), and PROCEDURES (300s). The #ENTITY type annotation on each INPUT() tells FeatureQL which entity a key belongs to — this is how it validates relationship traversals.

FeatureQL
CREATE OR REPLACE FEATURES IN FM.CLINIC AS
SELECT
    PATIENTS := ENTITY(),
    VISITS := ENTITY(),
    PROCEDURES := ENTITY(),
    PATIENT_ID := INPUT(BIGINT#PATIENTS),
    VISIT_ID := INPUT(BIGINT#VISITS),
    PROCEDURE_ID := INPUT(BIGINT#PROCEDURES),
;
Result
feature_name VARCHARstatus VARCHARmessage VARCHAR
FM.CLINIC.PATIENTSCREATEDFeature created as not exists
FM.CLINIC.VISITSCREATEDFeature created as not exists
FM.CLINIC.PROCEDURESCREATEDFeature created as not exists
FM.CLINIC.PATIENT_IDCREATEDFeature created as not exists
FM.CLINIC.VISIT_IDCREATEDFeature created as not exists
FM.CLINIC.PROCEDURE_IDCREATEDFeature created as not exists

With entities declared, the next step maps table columns to features using EXTERNAL_COLUMNS(). Each mapping declares which column is the lookup key (BIND TO the entity's input), which columns to expose, and which table to read from. The TABLES. prefix groups all raw mappings together, keeping them separate from derived features.

Foreign key columns carry entity annotations too: visit_patient_id BIGINT#PATIENTS tells FeatureQL that this column references the PATIENTS entity, and procedure_visit_id BIGINT#VISITS references VISITS. These annotations are what make RELATED() and EXTEND() work — without them, FeatureQL cannot determine join paths.

FeatureQL
CREATE OR REPLACE FEATURES IN FM.CLINIC AS
SELECT
    TABLES.DIM_PATIENTS := EXTERNAL_COLUMNS(
        patient_id BIGINT#PATIENTS BIND TO PATIENT_ID,
        name VARCHAR,
        date_of_birth DATE,
        insurance_type VARCHAR,
        FROM TABLE(clinic.dim_patients)
    ),
    TABLES.FCT_VISITS := EXTERNAL_COLUMNS(
        visit_id BIGINT#VISITS BIND TO VISIT_ID,
        visit_patient_id BIGINT#PATIENTS,
        visit_date DATE,
        department VARCHAR,
        diagnosis_code VARCHAR,
        FROM TABLE(clinic.fct_visits)
    ),
    TABLES.FCT_PROCEDURES := EXTERNAL_COLUMNS(
        procedure_id BIGINT#PROCEDURES BIND TO PROCEDURE_ID,
        procedure_visit_id BIGINT#VISITS,
        procedure_name VARCHAR,
        cost DECIMAL(10,2),
        FROM TABLE(clinic.fct_procedures)
    ),
    TABLES.AGG_PATIENTS_OBT := EXTERNAL_COLUMNS(
        patient_id BIGINT#PATIENTS BIND TO PATIENT_ID,
        last_visit_id BIGINT#VISITS,
        visits ARRAY(BIGINT#VISITS),
        FROM TABLE(clinic.agg_patients_obt)
    ),
;
Result
feature_name VARCHARstatus VARCHARmessage VARCHAR
FM.CLINIC.TABLES.DIM_PATIENTSCREATEDFeature created as not exists
FM.CLINIC.TABLES.FCT_VISITSCREATEDFeature created as not exists
FM.CLINIC.TABLES.FCT_PROCEDURESCREATEDFeature created as not exists
FM.CLINIC.TABLES.AGG_PATIENTS_OBTCREATEDFeature created as not exists

Which patients are repeat visitors?

Counting visits per patient is a PK-to-FK aggregation: each patient (the primary key side) has zero or more visits (the foreign key side). In SQL, you would write a GROUP BY visit_patient_id subquery and LEFT JOIN it back to the patient dimension. In FeatureQL, RELATED() replaces that entire pattern.

PATIENT_ID.RELATED(COUNT(1) GROUP BY TABLES.FCT_VISITS[visit_patient_id]) reads as: "for each PATIENT_ID, count the rows in fct_visits where visit_patient_id matches." The GROUP BY inside RELATED() names the foreign key column; VIA PATIENT_ID is inferred from the chained syntax. FeatureQL generates the subquery and the join automatically.

IS_REPEAT_VISITOR is a pure boolean derived from NUM_VISITS — no additional table access, just a comparison.

FeatureQL
WITH
    PATIENT_NAME := TABLES.DIM_PATIENTS[name],
    NUM_VISITS := PATIENT_ID.RELATED(COUNT(1) GROUP BY TABLES.FCT_VISITS[visit_patient_id]),
    IS_REPEAT_VISITOR := NUM_VISITS > 1,
SELECT
    PATIENT_ID,
    PATIENT_NAME,
    NUM_VISITS,
    IS_REPEAT_VISITOR,
FROM FM.CLINIC
FOR CROSS
    PATIENT_ID := BIND_VALUES(SEQUENCE(100,103)),
    VISIT_ID := BIND_VALUES(SEQUENCE(200,205)),
;
Result
FM.CLINIC.PATIENT_ID BIGINTPATIENT_NAME VARCHARNUM_VISITS BIGINTIS_REPEAT_VISITOR VARCHAR
100Alice Martin2TRUE
101Bob Chen1FALSE
102Clara Rivera2TRUE
103David Kim1FALSE

What did each patient's most recent visit cost?

This question crosses two entity boundaries: patient → visit → procedures. The OBT stores each patient's last_visit_id as a foreign key to the visits entity. To get the cost of that visit, we need to sum procedure costs for that specific visit — a FK-to-FK aggregation.

LAST_VISIT_ID.RELATED(SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]) reads as: "follow last_visit_id to the visits entity, then aggregate procedure costs grouped by procedure_visit_id." The result is a single decimal per patient — the total billed amount for their most recent visit.

In SQL, this would require two joins (patient OBT → procedures → group by visit) or a correlated subquery. FeatureQL resolves the multi-hop path from the entity annotations.

FeatureQL
WITH
    PATIENT_NAME := TABLES.DIM_PATIENTS[name],
    LAST_VISIT_ID := TABLES.AGG_PATIENTS_OBT[last_visit_id],
    LAST_VISIT_COST := LAST_VISIT_ID.RELATED(
        SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]
    ),
SELECT
    PATIENT_ID,
    PATIENT_NAME,
    LAST_VISIT_ID,
    LAST_VISIT_COST,
FROM FM.CLINIC
FOR CROSS
    PATIENT_ID := BIND_VALUES(SEQUENCE(100,103)),
    PROCEDURE_ID := BIND_VALUES(SEQUENCE(300,307)),
;
Result
FM.CLINIC.PATIENT_ID BIGINTPATIENT_NAME VARCHARLAST_VISIT_ID BIGINTLAST_VISIT_COST VARCHAR
100Alice Martin201470.00
101Bob Chen202200.00
102Clara Rivera204150.00
103David Kim205450.00

Which visit was each patient's most expensive?

This question requires working with each patient's array of visits, enriching each element with cost data, and then picking the top one. This is the EXTEND() + TRANSFORM() pattern.

EXTEND(ZIP(PATIENT_VISITS AS visit_id) WITH VISIT_COST AS visit_cost1) does three things: ZIP() converts the scalar array of visit IDs into an array of rows (each row has a visit_id field), then EXTEND() looks up each visit's cost from the VISIT_COST feature and adds it as a visit_cost1 field on every row. The result is ARRAY<ROW<visit_id, visit_cost1>> — a nested table inside each patient row.

.TRANSFORM(SELECT visit_id, visit_cost1 ORDER BY visit_cost1 DESC LIMIT 1) then runs a mini-query inside that array: sort by cost descending and keep only the top row. The result is a single-element array containing the costliest visit.

In SQL, the equivalent would be an UNNEST + JOIN + ROW_NUMBER() OVER (PARTITION BY patient_id ORDER BY cost DESC) window function + re-aggregation. FeatureQL keeps the nested structure intact — no flattening and re-grouping.

FeatureQL
WITH
    PATIENT_NAME := TABLES.DIM_PATIENTS[name],
    PATIENT_VISITS := TABLES.AGG_PATIENTS_OBT[visits],
    VISIT_COST := VISIT_ID.RELATED(
        SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]
    ),
    VISITS_ENRICHED := EXTEND(
        ZIP(PATIENT_VISITS AS visit_id)
        WITH VISIT_COST AS visit_cost1
    ),
    COSTLIEST_VISIT := VISITS_ENRICHED.TRANSFORM(SELECT visit_id, visit_cost1 ORDER BY visit_cost1 DESC LIMIT 1),
SELECT
    PATIENT_ID,
    PATIENT_NAME,
    COSTLIEST_VISIT,
FROM FM.CLINIC
FOR CROSS
    PATIENT_ID := BIND_VALUES(SEQUENCE(100,103)),
    PROCEDURE_ID := BIND_VALUES(SEQUENCE(300,307)),
;
Result
FM.CLINIC.PATIENT_ID BIGINTPATIENT_NAME VARCHARCOSTLIEST_VISIT VARCHAR
100Alice Martin[{visit_id: 201, visit_cost1: 470.00}]
101Bob Chen[{visit_id: 202, visit_cost1: 200.00}]
102Clara Rivera[{visit_id: 203, visit_cost1: 275.00}]
103David Kim[{visit_id: 205, visit_cost1: 450.00}]

Classify patients by risk tier

Building a classification feature means combining multiple signals into a single label. Here, NUM_VISITS (from RELATED()) and TOTAL_COST (from EXTEND() + TRANSFORM() + UNWRAP_ONE()) feed a CASE WHEN expression.

TOTAL_COST reuses the same VISITS_ENRICHED array from Q3 but applies a different TRANSFORM: SELECT SUM(visit_cost1) aggregates all visit costs into a single-row array, and .UNWRAP_ONE() extracts the scalar value.

The CASE WHEN then classifies: patients with 2+ visits and over $500 total cost are 'high' risk; patients with 2+ visits or over $300 total cost are 'medium'; everyone else is 'low'.

FeatureQL
WITH
    PATIENT_NAME := TABLES.DIM_PATIENTS[name],
    PATIENT_VISITS := TABLES.AGG_PATIENTS_OBT[visits],
    NUM_VISITS := PATIENT_ID.RELATED(COUNT(1) GROUP BY TABLES.FCT_VISITS[visit_patient_id]),
    VISIT_COST := VISIT_ID.RELATED(
        SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]
    ),
    VISITS_ENRICHED := EXTEND(
        ZIP(PATIENT_VISITS AS visit_id)
        WITH VISIT_COST AS visit_cost1
    ),
    TOTAL_COST := VISITS_ENRICHED.TRANSFORM(SELECT SUM(visit_cost1)).UNWRAP_ONE(),
    RISK_TIER := CASE
        WHEN NUM_VISITS >= 2 AND TOTAL_COST > 500.00 THEN 'high'
        WHEN NUM_VISITS >= 2 OR TOTAL_COST > 300.00 THEN 'medium'
        ELSE 'low'
    END,
SELECT
    PATIENT_ID,
    PATIENT_NAME,
    RISK_TIER,
    NUM_VISITS,
    TOTAL_COST,
FROM FM.CLINIC
FOR CROSS
    PATIENT_ID := BIND_VALUES(SEQUENCE(100,103)),
    VISIT_ID := BIND_VALUES(SEQUENCE(200,205)),
    PROCEDURE_ID := BIND_VALUES(SEQUENCE(300,307)),
;
Result
FM.CLINIC.PATIENT_ID BIGINTPATIENT_NAME VARCHARRISK_TIER VARCHARNUM_VISITS BIGINTTOTAL_COST VARCHAR
100Alice Martinhigh2705.00
101Bob Chenlow1200.00
102Clara Riveramedium2425.00
103David Kimmedium1450.00

Putting it all together

This single query combines every feature from the previous sections: patient name and insurance from the dimension table, visit count and repeat-visitor flag from RELATED(), last-visit cost from FK-to-FK RELATED(), costliest visit from EXTEND() + TRANSFORM(), and risk tier from CASE WHEN.

FeatureQL
WITH
    PATIENT_NAME := TABLES.DIM_PATIENTS[name],
    INSURANCE := TABLES.DIM_PATIENTS[insurance_type],
    LAST_VISIT_ID := TABLES.AGG_PATIENTS_OBT[last_visit_id],
    PATIENT_VISITS := TABLES.AGG_PATIENTS_OBT[visits],
    NUM_VISITS := PATIENT_ID.RELATED(COUNT(1) GROUP BY TABLES.FCT_VISITS[visit_patient_id]),
    IS_REPEAT_VISITOR := NUM_VISITS > 1,
    LAST_VISIT_COST := LAST_VISIT_ID.RELATED(
        SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]
    ),
    VISIT_COST := VISIT_ID.RELATED(
        SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]
    ),
    VISITS_ENRICHED := EXTEND(
        ZIP(PATIENT_VISITS AS visit_id)
        WITH VISIT_COST AS visit_cost1
    ),
    COSTLIEST_VISIT := VISITS_ENRICHED.TRANSFORM(SELECT visit_id, visit_cost1 ORDER BY visit_cost1 DESC LIMIT 1),
    TOTAL_COST := VISITS_ENRICHED.TRANSFORM(SELECT SUM(visit_cost1)).UNWRAP_ONE(),
    RISK_TIER := CASE
        WHEN NUM_VISITS >= 2 AND TOTAL_COST > 500.00 THEN 'high'
        WHEN NUM_VISITS >= 2 OR TOTAL_COST > 300.00 THEN 'medium'
        ELSE 'low'
    END,
SELECT
    PATIENT_ID,
    PATIENT_NAME,
    INSURANCE,
    NUM_VISITS,
    IS_REPEAT_VISITOR,
    LAST_VISIT_COST,
    COSTLIEST_VISIT,
    RISK_TIER,
FROM FM.CLINIC
FOR CROSS
    PATIENT_ID := BIND_VALUES(SEQUENCE(100,103)),
    VISIT_ID := BIND_VALUES(SEQUENCE(200,205)),
    PROCEDURE_ID := BIND_VALUES(SEQUENCE(300,307)),
;
Result
FM.CLINIC.PATIENT_ID BIGINTPATIENT_NAME VARCHARINSURANCE VARCHARNUM_VISITS BIGINTIS_REPEAT_VISITOR BOOLEANLAST_VISIT_COST VARCHARCOSTLIEST_VISIT VARCHARRISK_TIER VARCHAR
100Alice Martinprivate2TRUE470.00[{visit_id: 201, visit_cost1: 470.00}]high
101Bob Chenpublic1FALSE200.00[{visit_id: 202, visit_cost1: 200.00}]low
102Clara Riveraprivate2TRUE150.00[{visit_id: 203, visit_cost1: 275.00}]medium
103David Kimpublic1FALSE450.00[{visit_id: 205, visit_cost1: 450.00}]medium

In a production FeatureMesh deployment, the data engineer owns the first two layers: entity declarations (ENTITY(), INPUT()) and data source mappings (EXTERNAL_COLUMNS()). These rarely change and define how the warehouse connects to the feature model. The analyst owns everything above: RELATED() aggregations, EXTEND() enrichments, TRANSFORM() filters, and CASE WHEN classifications. They build and iterate on business logic without touching the underlying table wiring.

What's next

Last update at: 2026/04/27 15:40:31