Healthcare clinic visits tutorial

This tutorial builds a clinic analytics model with three entities — patients, visits, and procedures — and answers four questions a healthcare data team would ask: which patients are repeat visitors, what did their most recent visit cost, which visit was their most expensive, and how should patients be classified by risk tier. Every query runs against inline DuckDB tables with no external data.

The data model

The dataset has four tables: a patient dimension, visit and procedure fact tables, and a pre-aggregated OBT that stores each patient's visit IDs as an array.

clinic.dim_patients — one row per patient. patient_id is the primary key. insurance_type is either 'private' or 'public'.
clinic.fct_visits — one row per clinic encounter. visit_id is the primary key. visit_patient_id is a foreign key to patients. department and diagnosis_code describe what happened.
clinic.fct_procedures — one row per procedure performed during a visit. procedure_id is the primary key. procedure_visit_id is a foreign key to visits. cost is the billed amount.
clinic.agg_patients_obt — pre-aggregated: each patient's last_visit_id (most recent by date) and visits (array of all their visit IDs, ordered chronologically).

FeatureQL

/* SQL */
--
CREATE SCHEMA IF NOT EXISTS clinic;
DROP TABLE IF EXISTS clinic.dim_patients;
DROP TABLE IF EXISTS clinic.fct_visits;
DROP TABLE IF EXISTS clinic.fct_procedures;
DROP TABLE IF EXISTS clinic.agg_patients_obt;
--
CREATE TABLE clinic.dim_patients (
    patient_id BIGINT,
    name VARCHAR,
    date_of_birth DATE,
    insurance_type VARCHAR
);
INSERT INTO clinic.dim_patients VALUES
    (100, 'Alice Martin', '1985-03-12', 'private'),
    (101, 'Bob Chen', '1972-08-25', 'public'),
    (102, 'Clara Rivera', '1990-11-04', 'private'),
    (103, 'David Kim', '1968-01-17', 'public'),
    (104, 'Eva Johnson', '1995-06-30', 'private');
--
CREATE TABLE clinic.fct_visits (
    visit_id BIGINT,
    visit_patient_id BIGINT,
    visit_date DATE,
    department VARCHAR,
    diagnosis_code VARCHAR
);
INSERT INTO clinic.fct_visits VALUES
    (200, 100, '2025-03-10', 'cardiology', 'I10'),
    (201, 100, '2025-07-22', 'cardiology', 'I25'),
    (202, 101, '2025-05-15', 'orthopedics', 'M54'),
    (203, 102, '2025-01-08', 'dermatology', 'L20'),
    (204, 102, '2025-09-30', 'cardiology', 'I10'),
    (205, 103, '2025-06-18', 'orthopedics', 'M17');
--
CREATE TABLE clinic.fct_procedures (
    procedure_id BIGINT,
    procedure_visit_id BIGINT,
    procedure_name VARCHAR,
    cost DECIMAL(10,2)
);
INSERT INTO clinic.fct_procedures VALUES
    (300, 200, 'ECG', 150.00),
    (301, 200, 'Blood panel', 85.00),
    (302, 201, 'Stress test', 320.00),
    (303, 201, 'ECG', 150.00),
    (304, 202, 'X-ray', 200.00),
    (305, 203, 'Skin biopsy', 275.00),
    (306, 204, 'ECG', 150.00),
    (307, 205, 'MRI', 450.00);
--
CREATE TABLE clinic.agg_patients_obt (
    patient_id BIGINT,
    last_visit_id BIGINT,
    visits BIGINT[]
);
INSERT INTO clinic.agg_patients_obt
SELECT
    visit_patient_id AS patient_id,
    MAX_BY(visit_id, visit_date) AS last_visit_id,
    ARRAY_AGG(visit_id ORDER BY visit_date)
FROM clinic.fct_visits
GROUP BY visit_patient_id;
--
SELECT CAST(COUNT(*) AS INTEGER) AS Count FROM clinic.agg_patients_obt;

Result

Count BIGINT
4

Define entities and map external data

Every FeatureQL model starts by declaring entities and their primary key inputs. This clinic model has three entities: PATIENTS (IDs in the 100s), VISITS (200s), and PROCEDURES (300s). The #ENTITY type annotation on each INPUT() tells FeatureQL which entity a key belongs to — this is how it validates relationship traversals.

FeatureQL

CREATE OR REPLACE FEATURES IN FM.CLINIC AS
SELECT
    patients := ENTITY(),
    visits := ENTITY(),
    procedures := ENTITY(),
    patient_id := INPUT(BIGINT#patients),
    visit_id := INPUT(BIGINT#visits),
    procedure_id := INPUT(BIGINT#procedures)
;

Result

feature_name VARCHAR	status VARCHAR	message VARCHAR
FM.CLINIC.PATIENTS	CREATED	Feature created as not exists
FM.CLINIC.VISITS	CREATED	Feature created as not exists
FM.CLINIC.PROCEDURES	CREATED	Feature created as not exists
FM.CLINIC.PATIENT_ID	CREATED	Feature created as not exists
FM.CLINIC.VISIT_ID	CREATED	Feature created as not exists
FM.CLINIC.PROCEDURE_ID	CREATED	Feature created as not exists

With entities declared, the next step maps table columns to features using EXTERNAL_COLUMNS(). Each mapping declares which column is the lookup key (BIND TO the entity's input), which columns to expose, and which table to read from. The TABLES. prefix groups all raw mappings together, keeping them separate from derived features.

Foreign key columns carry entity annotations too: visit_patient_id BIGINT#PATIENTS tells FeatureQL that this column references the PATIENTS entity, and procedure_visit_id BIGINT#VISITS references VISITS. These annotations are what make RELATED() and EXTEND() work — without them, FeatureQL cannot determine join paths.

FeatureQL

CREATE OR REPLACE FEATURES IN FM.CLINIC AS
SELECT
    tables.dim_patients := EXTERNAL_COLUMNS(
        patient_id BIGINT#patients BIND TO patient_id,
        name VARCHAR,
        date_of_birth DATE,
        insurance_type VARCHAR
        FROM TABLE(clinic.dim_patients)
    ),
    tables.fct_visits := EXTERNAL_COLUMNS(
        visit_id BIGINT#visits BIND TO visit_id,
        visit_patient_id BIGINT#patients,
        visit_date DATE,
        department VARCHAR,
        diagnosis_code VARCHAR
        FROM TABLE(clinic.fct_visits)
    ),
    tables.fct_procedures := EXTERNAL_COLUMNS(
        procedure_id BIGINT#procedures BIND TO procedure_id,
        procedure_visit_id BIGINT#visits,
        procedure_name VARCHAR,
        cost DECIMAL(10,2)
        FROM TABLE(clinic.fct_procedures)
    ),
    tables.agg_patients_obt := EXTERNAL_COLUMNS(
        patient_id BIGINT#patients BIND TO patient_id,
        last_visit_id BIGINT#visits,
        visits ARRAY(BIGINT#visits)
        FROM TABLE(clinic.agg_patients_obt)
    )
;

Result

feature_name VARCHAR	status VARCHAR	message VARCHAR
FM.CLINIC.TABLES.DIM_PATIENTS	CREATED	Feature created as not exists
FM.CLINIC.TABLES.FCT_VISITS	CREATED	Feature created as not exists
FM.CLINIC.TABLES.FCT_PROCEDURES	CREATED	Feature created as not exists
FM.CLINIC.TABLES.AGG_PATIENTS_OBT	CREATED	Feature created as not exists

Which patients are repeat visitors?

Counting visits per patient is a PK-to-FK aggregation: each patient (the primary key side) has zero or more visits (the foreign key side). In SQL, you would write a GROUP BY visit_patient_id subquery and LEFT JOIN it back to the patient dimension. In FeatureQL, RELATED() replaces that entire pattern.

PATIENT_ID.RELATED(COUNT(1) GROUP BY TABLES.FCT_VISITS[visit_patient_id]) reads as: "for each PATIENT_ID, count the rows in fct_visits where visit_patient_id matches." The GROUP BY inside RELATED() names the foreign key column; VIA PATIENT_ID is inferred from the chained syntax. FeatureQL generates the subquery and the join automatically.

IS_REPEAT_VISITOR is a pure boolean derived from NUM_VISITS — no additional table access, just a comparison.

FeatureQL

WITH
    PATIENT_NAME := TABLES.DIM_PATIENTS[name],
    NUM_VISITS := PATIENT_ID.RELATED(COUNT(1) GROUP BY TABLES.FCT_VISITS[visit_patient_id]),
    IS_REPEAT_VISITOR := NUM_VISITS > 1,
SELECT
    PATIENT_ID,
    PATIENT_NAME,
    NUM_VISITS,
    IS_REPEAT_VISITOR,
FROM FM.CLINIC
FOR
    PATIENT_ID := BIND_VALUES(SEQUENCE(100,103)),
    NESTED VISIT_ID := BIND_VALUES(SEQUENCE(200,205)),
;

Result

FM.CLINIC.PATIENT_ID BIGINT	PATIENT_NAME VARCHAR	NUM_VISITS BIGINT	IS_REPEAT_VISITOR VARCHAR
100	Alice Martin	2	TRUE
101	Bob Chen	1	FALSE
102	Clara Rivera	2	TRUE
103	David Kim	1	FALSE

What did each patient's most recent visit cost?

This question crosses two entity boundaries: patient → visit → procedures. The OBT stores each patient's last_visit_id as a foreign key to the visits entity. To get the cost of that visit, we need to sum procedure costs for that specific visit — a FK-to-FK aggregation.

LAST_VISIT_ID.RELATED(SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]) reads as: "follow last_visit_id to the visits entity, then aggregate procedure costs grouped by procedure_visit_id." The result is a single decimal per patient — the total billed amount for their most recent visit.

In SQL, this would require two joins (patient OBT → procedures → group by visit) or a correlated subquery. FeatureQL resolves the multi-hop path from the entity annotations.

FeatureQL

WITH
    PATIENT_NAME := TABLES.DIM_PATIENTS[name],
    LAST_VISIT_ID := TABLES.AGG_PATIENTS_OBT[last_visit_id],
    LAST_VISIT_COST := LAST_VISIT_ID.RELATED(
        SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]
    ),
SELECT
    PATIENT_ID,
    PATIENT_NAME,
    LAST_VISIT_ID,
    LAST_VISIT_COST,
FROM FM.CLINIC
FOR
    PATIENT_ID := BIND_VALUES(SEQUENCE(100,103)),
    NESTED PROCEDURE_ID := BIND_VALUES(SEQUENCE(300,307)),
;

Result

FM.CLINIC.PATIENT_ID BIGINT	PATIENT_NAME VARCHAR	LAST_VISIT_ID BIGINT	LAST_VISIT_COST VARCHAR
100	Alice Martin	201	470.00
101	Bob Chen	202	200.00
102	Clara Rivera	204	150.00
103	David Kim	205	450.00

Which visit was each patient's most expensive?

This question requires working with each patient's array of visits, enriching each element with cost data, and then picking the top one. This is the EXTEND() + TRANSFORM() pattern.

EXTEND(ZIP(PATIENT_VISITS AS visit_id) WITH VISIT_COST AS visit_cost) does three things: ZIP() converts the scalar array of visit IDs into an array of rows (each row has a visit_id field), then EXTEND() looks up each visit's cost from the VISIT_COST feature and adds it as a visit_cost field on every row. The result is ARRAY<ROW<visit_id, visit_cost>> — a nested table inside each patient row.

.TRANSFORM(SELECT visit_id, visit_cost ORDER BY visit_cost DESC LIMIT 1) then runs a mini-query inside that array: sort by cost descending and keep only the top row. The result is a single-element array containing the costliest visit.

In SQL, the equivalent would be an UNNEST + JOIN + ROW_NUMBER() OVER (PARTITION BY patient_id ORDER BY cost DESC) window function + re-aggregation. FeatureQL keeps the nested structure intact — no flattening and re-grouping.

FeatureQL

WITH
    PATIENT_NAME := TABLES.DIM_PATIENTS[name],
    PATIENT_VISITS := TABLES.AGG_PATIENTS_OBT[visits],
    VISIT_COST := VISIT_ID.RELATED(
        SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]
    ),
    VISITS_ENRICHED := EXTEND(
        ZIP(PATIENT_VISITS AS visit_id)
        WITH VISIT_COST
    ),
    COSTLIEST_VISIT := VISITS_ENRICHED.TRANSFORM(SELECT visit_id, visit_cost ORDER BY visit_cost DESC LIMIT 1),
SELECT
    PATIENT_ID,
    PATIENT_NAME,
    COSTLIEST_VISIT,
FROM FM.CLINIC
FOR
    PATIENT_ID := BIND_VALUES(SEQUENCE(100,103)),
    NESTED PROCEDURE_ID := BIND_VALUES(SEQUENCE(300,307)),
;

Result

FM.CLINIC.PATIENT_ID BIGINT	PATIENT_NAME VARCHAR	COSTLIEST_VISIT VARCHAR
100	Alice Martin	[{visit_id: 201, visit_cost: 470.00}]
101	Bob Chen	[{visit_id: 202, visit_cost: 200.00}]
102	Clara Rivera	[{visit_id: 203, visit_cost: 275.00}]
103	David Kim	[{visit_id: 205, visit_cost: 450.00}]

Classify patients by risk tier

Building a classification feature means combining multiple signals into a single label. Here, NUM_VISITS (from RELATED()) and TOTAL_COST (from EXTEND() + TRANSFORM() + UNWRAP_ONE()) feed a CASE WHEN expression.

TOTAL_COST reuses the same VISITS_ENRICHED array from Q3 but applies a different TRANSFORM: SELECT SUM(visit_cost) aggregates all visit costs into a single-row array, and .UNWRAP_ONE() extracts the scalar value.

The CASE WHEN then classifies: patients with 2+ visits and over $500 total cost are 'high' risk; patients with 2+ visits or over $300 total cost are 'medium'; everyone else is 'low'.

FeatureQL

WITH
    PATIENT_NAME := TABLES.DIM_PATIENTS[name],
    PATIENT_VISITS := TABLES.AGG_PATIENTS_OBT[visits],
    NUM_VISITS := PATIENT_ID.RELATED(COUNT(1) GROUP BY TABLES.FCT_VISITS[visit_patient_id]),
    VISIT_COST := VISIT_ID.RELATED(
        SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]
    ),
    VISITS_ENRICHED := EXTEND(
        ZIP(PATIENT_VISITS AS visit_id)
        WITH VISIT_COST
    ),
    TOTAL_COST := VISITS_ENRICHED.TRANSFORM(SELECT SUM(visit_cost)).UNWRAP_ONE(),
    RISK_TIER := CASE
        WHEN NUM_VISITS >= 2 AND TOTAL_COST > 500.00 THEN 'high'
        WHEN NUM_VISITS >= 2 OR TOTAL_COST > 300.00 THEN 'medium'
        ELSE 'low'
    END,
SELECT
    PATIENT_ID,
    PATIENT_NAME,
    RISK_TIER,
    NUM_VISITS,
    TOTAL_COST,
FROM FM.CLINIC
FOR
    PATIENT_ID := BIND_VALUES(SEQUENCE(100,103)),
    NESTED VISIT_ID := BIND_VALUES(SEQUENCE(200,205)),
    NESTED PROCEDURE_ID := BIND_VALUES(SEQUENCE(300,307)),
;

Result

FM.CLINIC.PATIENT_ID BIGINT	PATIENT_NAME VARCHAR	RISK_TIER VARCHAR	NUM_VISITS BIGINT	TOTAL_COST VARCHAR
100	Alice Martin	high	2	705.00
101	Bob Chen	low	1	200.00
102	Clara Rivera	medium	2	425.00
103	David Kim	medium	1	450.00

Putting it all together

This single query combines every feature from the previous sections: patient name and insurance from the dimension table, visit count and repeat-visitor flag from RELATED(), last-visit cost from FK-to-FK RELATED(), costliest visit from EXTEND() + TRANSFORM(), and risk tier from CASE WHEN.

FeatureQL

WITH
    PATIENT_NAME := TABLES.DIM_PATIENTS[name],
    INSURANCE := TABLES.DIM_PATIENTS[insurance_type],
    LAST_VISIT_ID := TABLES.AGG_PATIENTS_OBT[last_visit_id],
    PATIENT_VISITS := TABLES.AGG_PATIENTS_OBT[visits],
    NUM_VISITS := PATIENT_ID.RELATED(COUNT(1) GROUP BY TABLES.FCT_VISITS[visit_patient_id]),
    IS_REPEAT_VISITOR := NUM_VISITS > 1,
    LAST_VISIT_COST := LAST_VISIT_ID.RELATED(
        SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]
    ),
    VISIT_COST := VISIT_ID.RELATED(
        SUM(TABLES.FCT_PROCEDURES[cost]) GROUP BY TABLES.FCT_PROCEDURES[procedure_visit_id]
    ),
    VISITS_ENRICHED := EXTEND(
        ZIP(PATIENT_VISITS AS visit_id)
        WITH VISIT_COST
    ),
    COSTLIEST_VISIT := VISITS_ENRICHED.TRANSFORM(SELECT visit_id, visit_cost ORDER BY visit_cost DESC LIMIT 1),
    TOTAL_COST := VISITS_ENRICHED.TRANSFORM(SELECT SUM(visit_cost)).UNWRAP_ONE(),
    RISK_TIER := CASE
        WHEN NUM_VISITS >= 2 AND TOTAL_COST > 500.00 THEN 'high'
        WHEN NUM_VISITS >= 2 OR TOTAL_COST > 300.00 THEN 'medium'
        ELSE 'low'
    END,
SELECT
    PATIENT_ID,
    PATIENT_NAME,
    INSURANCE,
    NUM_VISITS,
    IS_REPEAT_VISITOR,
    LAST_VISIT_COST,
    COSTLIEST_VISIT,
    RISK_TIER,
FROM FM.CLINIC
FOR
    PATIENT_ID := BIND_VALUES(SEQUENCE(100,103)),
    NESTED VISIT_ID := BIND_VALUES(SEQUENCE(200,205)),
    NESTED PROCEDURE_ID := BIND_VALUES(SEQUENCE(300,307)),
;

Result

FM.CLINIC.PATIENT_ID BIGINT	PATIENT_NAME VARCHAR	INSURANCE VARCHAR	NUM_VISITS BIGINT	IS_REPEAT_VISITOR BOOLEAN	LAST_VISIT_COST VARCHAR	COSTLIEST_VISIT VARCHAR	RISK_TIER VARCHAR
100	Alice Martin	private	2	TRUE	470.00	[{visit_id: 201, visit_cost: 470.00}]	high
101	Bob Chen	public	1	FALSE	200.00	[{visit_id: 202, visit_cost: 200.00}]	low
102	Clara Rivera	private	2	TRUE	150.00	[{visit_id: 203, visit_cost: 275.00}]	medium
103	David Kim	public	1	FALSE	450.00	[{visit_id: 205, visit_cost: 450.00}]	medium

In a production FeatureMesh deployment, the data engineer owns the first two layers: entity declarations (ENTITY(), INPUT()) and data source mappings (EXTERNAL_COLUMNS()). These rarely change and define how the warehouse connects to the feature model. The analyst owns everything above: RELATED() aggregations, EXTEND() enrichments, TRANSFORM() filters, and CASE WHEN classifications. They build and iterate on business logic without touching the underlying table wiring.

What's next

RELATED() reference — all join patterns (PK-to-FK, FK-to-PK, FK-to-FK) with syntax details
EXTEND() reference — enriching arrays with cross-entity data, aliasing, and VIA/BIND TO inference
Historical data functions — SCD_AT_TIME() and ACCUMULATION_AT_TIME() for querying patient records as of any point in time

Suggest changes to this page

Last update at: 2026/06/20 10:08:10

On this page

The data model Define entities and map external data Which patients are repeat visitors?What did each patient's most recent visit cost?Which visit was each patient's most expensive?Classify patients by risk tier Putting it all together What's next