Design philosophy

FeatureQL looks like SQL. It borrows SQL's keywords, its expression syntax, and its general shape. But it is not SQL. It is a different kind of language that uses familiar syntax to solve a different problem, and some of its design decisions only make sense once you understand the goals behind them.

This page explains the principles behind FeatureQL's design: what problems it targets, what tradeoffs it makes, and why. For the mechanics of how each concept works in practice, follow the links to the relevant reference pages.

When to use FeatureQL, when to use SQL

FeatureQL is not a replacement for SQL. It is a complement, designed for a specific layer of the data stack.

Use SQL for data cleaning, conforming, and exploration. When you are in the silver layer, wrangling raw data into shape, SQL is the right tool. When you are exploring an unfamiliar dataset (SELECT * FROM table LIMIT 100), SQL is faster and more direct. When you need a one-off aggregation you will run once and discard, SQL has less overhead.

Use FeatureQL when business logic matters enough to name, share, and reuse. This is the gold layer: customer segments, pricing rules, risk scores, KPIs. Logic that multiple people depend on, that needs to stay consistent across dashboards and APIs, that changes over time and must be auditable. This is where FeatureQL pays off.

The transition point is clear: the moment you find yourself copying a SQL calculation into a second query, that calculation should probably be a feature.

Think in entities, not in tables

SQL trains you to think in rows flowing through a pipeline: scan a table, filter rows, join more rows, group and aggregate. FeatureQL asks you to think differently: start with an entity (a customer, an order, a product), ask what you need to know about it, and express that as named features with explicit dependencies.

This is closer to how business people already think. "What is this customer's lifetime value?" is a question about an entity, not about a table. The answer depends on other things you know about the customer (their orders, their payments), which in turn depend on things you know about those orders (amounts, dates). FeatureQL lets you express this chain of reasoning directly, and the engine figures out the data fetching, joining, and aggregation.

If you have used spreadsheet formulas, the model will feel familiar. A feature is like a named cell formula: it takes inputs, references other named formulas, and produces a value. The "table" you see in query results is a presentation layer, not the core abstraction.

-- SQL thinking: "join these tables, filter, aggregate"
SELECT c.id, SUM(o.amount)
FROM customers c JOIN orders o ON c.id = o.customer_id
WHERE o.date > '2024-01-01'
GROUP BY c.id

-- FeatureQL thinking: "for a customer, what is their recent revenue?"
RECENT_REVENUE := CUSTOMER_ID.RELATED(
    SUM(ORDER_AMOUNT) GROUP BY ORDER_CUSTOMER_ID
)

The FeatureQL version reads like a business rule: "a customer's recent revenue is the sum of their order amounts, grouped by the customer key on the orders." The SQL version reads like database instructions. Both express the same business rule; FeatureQL makes the intent visible.

Entity-based reasoning in practice

The mental model shift shows up most clearly when working with related data across entities. In SQL, you think: "I need to JOIN orders to products to get categories, then JOIN back to customers to aggregate." In FeatureQL, you think in functional hops:

  1. I have a customer. They have a list of order IDs (an array).
  2. I enrich those order IDs with product categories (EXTEND: add data from another entity).
  3. I filter and aggregate the enriched list (TRANSFORM: compute within the array).

Each step is a clear operation: fetch related data at the graph level, then compute within the array. Once this pattern clicks (usually within a couple of weeks of daily use), it becomes more intuitive than SQL joins because each step maps to a business concept rather than a database operation.
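The three hops above can be sketched in FeatureQL. This is illustrative only: the argument shapes of EXTEND and TRANSFORM, and the ARRAY_AGG aggregate, are assumptions for the sketch, not documented syntax.

-- Hop 1: the customer's order IDs as an array (ARRAY_AGG is assumed).
CUSTOMER_ORDER_IDS := CUSTOMER_ID.RELATED(
    ARRAY_AGG(ORDER_ID) GROUP BY ORDER_CUSTOMER_ID
)

-- Hop 2: enrich each order with its product category, at the graph level.
ORDERS_WITH_CATEGORY := EXTEND(CUSTOMER_ORDER_IDS, PRODUCT_CATEGORY)

-- Hop 3: filter and aggregate within the enriched array.
ELECTRONICS_SPEND := TRANSFORM(ORDERS_WITH_CATEGORY,
    SUM(ORDER_AMOUNT) WHERE PRODUCT_CATEGORY = 'electronics'
)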

The five design principles

The rest of this page covers the principles that shape FeatureQL as a language. Each principle explains why a design choice was made. The how (syntax, mechanics, edge cases) lives in the linked reference pages.

1. Purity: every feature is a deterministic function

Every feature in FeatureQL is a pure function: given the same inputs, it always returns the same output. There is no mutable state. There are no side effects within the feature graph. Features cannot read from external systems at evaluation time, modify data, or behave differently depending on when they run.

This purity is not accidental. It is the property that makes everything else work.

Because features are pure, the engine can evaluate them in any order, cache results freely, and guarantee that a feature produces the same value whether it runs in a DuckDB notebook, a Trino warehouse, or a DataFusion serving endpoint. Because features are pure, VARIANT() is safe: replacing a dependency produces a well-defined new computation with no hidden interactions. Because features are pure, the registry can validate consistency statically without executing anything.

Where does non-determinism go? Real systems need non-deterministic values: the current date, random samples, external API responses. FeatureQL pushes these to two explicit boundaries:

  • Compile-time constants (@literal(), @echo()) evaluate expressions before the feature graph runs, injecting the results as frozen values. @literal(NOW()) does not give you "the current time per row." It gives you "the time at which the query was compiled," identical for every row. SQL conflates "varies per evaluation" and "varies per row," and the result is subtle bugs when NOW() behaves differently across backends or execution plans. See Metaprogramming.
  • External data sources (EXTERNAL_COLUMNS, EXTERNAL_REDIS, EXTERNAL_HTTP) are the boundary where the outside world enters the feature graph. They are declared explicitly, their types are specified at the boundary, and their results are treated as inputs to the pure graph, not as computations within it. Once data crosses this boundary, everything downstream is deterministic. See Mapping data sources.

This two-layer separation means you can always answer the question: "If I run this feature again with the same inputs and the same source data, will I get the same result?" The answer is always yes.
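As a sketch of the compile-time boundary, assuming a DATE_DIFF function and an ORDER_DATE feature exist (neither is documented on this page):

-- Evaluated once, at compile time: every row sees the same frozen timestamp.
COMPILED_AT := @literal(NOW())

-- Everything downstream is deterministic with respect to that frozen value.
DAYS_SINCE_ORDER := DATE_DIFF(COMPILED_AT, ORDER_DATE)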

Hybrid SQL is scaffolding, not part of the language. FeatureQL queries can embed raw SQL using /* SQL */ blocks for setup (creating tables, loading data) and for incremental adoption (wrapping FeatureQL inside existing SQL workflows). But SQL blocks cannot be persisted in the feature registry. The registry is the system of record for business logic, and everything in it is pure FeatureQL with full type safety, dependency tracking, and cross-backend portability. Think of it like a REPL session: you can run arbitrary commands to set up your environment, but only the feature definitions get saved. See Hybrid queries.
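A sketch of the scaffolding pattern; how a /* SQL */ block is delimited, and what it may contain, are assumptions here:

-- Scaffolding: raw SQL for setup, runnable in a query but never persisted.
/* SQL */ CREATE TABLE ORDERS AS SELECT * FROM 'orders.parquet'

-- Pure FeatureQL: the only part eligible for the registry.
ORDER_ID := INPUT(BIGINT#ORDERS)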

2. Strict types, permissive syntax

FeatureQL enforces strict type safety but stays flexible about how you write your code.

No implicit type coercion. FeatureQL transpiles to multiple backends (DuckDB, Trino, BigQuery, DataFusion), and each handles implicit coercion differently. A query that silently widens BIGINT to DOUBLE in DuckDB might truncate precision in BigQuery. By requiring explicit casts, FeatureQL guarantees that your feature produces the same result everywhere, not just on the backend you happened to test on. In practice, this is less painful than it sounds: integers adapt their type automatically to DECIMAL(N, 0) when combined with decimals, which covers the vast majority of business logic. All other types, including floats and doubles, cannot be mixed without an explicit cast because the right resolution is genuinely ambiguous. See Types for the full coercion rules.
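For example, assuming SQL-style CAST syntax carries over (the page does not show a cast explicitly), and assuming ORDER_COUNT is a BIGINT feature and RISK_WEIGHT a DOUBLE feature:

-- An integer adapts to DECIMAL(N, 0) when combined with a decimal: no cast needed.
WEIGHTED_ORDERS := ORDER_COUNT * 0.95

-- BIGINT and DOUBLE cannot mix implicitly: the cast must be explicit.
RISK_SCORE := CAST(ORDER_COUNT AS DOUBLE) * RISK_WEIGHT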

Canonical syntax with an adoption ramp. FeatureQL has one canonical syntax: := for assignment, ARRAY not LIST, ROW not STRUCT. In STRICT mode (the default), using a non-canonical form returns an error with a precise suggestion for fixing it. The FORMAT (FIX) command automatically normalizes any valid query to canonical form. The registry also normalizes stored features, so persisted definitions are always consistent. For teams migrating from different SQL dialects, NON STRICT mode temporarily accepts alternative forms (1 AS NAME, LIST, STRUCT, INT64) to reduce friction during adoption. See Friendly syntax.

The philosophy is: be strict about semantics (types, dependencies, purity), be friendly about surface syntax for smoother adoption.

3. Two levels of scope, nothing more

SQL queries can nest arbitrarily: subqueries inside subqueries inside CTEs inside views. This nesting is one of the main reasons complex SQL is hard to read, hard to debug, and hard to maintain.

FeatureQL has exactly two scope levels.

The feature graph is the main scope. Features reference other features, the engine resolves the dependency order, and everything evaluates in a flat DAG. There is no nesting. A feature that depends on another feature is not "inside" it; they are peers in the graph.

TRANSFORM inner queries are the second scope. When you operate on an array of rows (filtering, aggregating, sorting), the inner query runs in its own scope with access only to the fields inside that array. It cannot reference features from the outer graph, trigger external data fetches, or nest another TRANSFORM.

This restriction is deliberate, not a limitation. The inner query is pure and local: it sees only the data in the array, transforms it, and returns the result. If you need data from outside the array (another entity, an external source), you enrich the array first using EXTEND() at the graph level, then transform the enriched result. The pattern is always: fetch and join in the graph scope, compute in the transform scope.

This two-level model means you never need to trace through nested scopes to understand what a feature does. The graph level tells you where data comes from and how entities relate. The transform level tells you how array data is processed. Nothing else.

TRANSFORM supports an IMPORT clause that brings feature definitions from the outer scope into the inner query. The imported definition re-evaluates against the array's fields: if the outer scope defines DOUBLED := base * 2, importing it into a TRANSFORM will use the base field from each array row, not the outer INPUT. Because imported features are re-evaluated inside the inner scope, they must be pure: any feature with a non-pure dependency in its graph will fail to compile. To import a frozen value rather than a definition, materialize it as a field of the array before transforming (using ARRAY_MERGE or EXTEND). See TRANSFORM for syntax and examples.
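A sketch of the re-evaluation behavior described above. The placement of the IMPORT clause and the inner-query shape are assumptions, and ROWS_ARRAY is a hypothetical array feature whose rows carry a base field:

-- Graph scope: `base` here is a graph-level feature.
DOUBLED := base * 2

-- Inner scope: the imported definition rebinds to the `base` field of each row.
SUM_OF_DOUBLED := TRANSFORM(ROWS_ARRAY,
    IMPORT DOUBLED
    SUM(DOUBLED)
)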

4. Entity annotations: types with meaning

FeatureQL extends its type system with entity annotations: BIGINT#CUSTOMERS means "a BIGINT that identifies a customer." This annotation is not a comment. It is part of the type, checked by the compiler, and used by operations like RELATED() and EXTEND() to resolve joins automatically.

Entity annotations serve three purposes:

They prevent join errors. You cannot accidentally join a customer ID to an order ID, even though both are BIGINT. The entity annotation distinguishes them at the type level.

They enable automatic relationship resolution. When you write RELATED(ORDER_CITY VIA LAST_ORDER_ID), FeatureQL knows that LAST_ORDER_ID is BIGINT#ORDERS and that ORDER_CITY depends on ORDER_ID, which is also BIGINT#ORDERS. The join path is unambiguous because the types encode the entity relationships. Unlike DAX or traditional semantic layers, nothing in RELATED is implicit: the foreign key is specified explicitly, and the features unambiguously define the primary key of the entity being joined. This avoids the multiple-join-path ambiguities that plague systems with implicit relationship resolution.

They document the data model. Reading CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS) tells you immediately what this feature represents, without checking a schema diagram or a wiki page.
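The three purposes show up together in a short example; LAST_ORDER_ID's definition as an INPUT is illustrative:

-- The annotation is part of the type: this BIGINT identifies a customer.
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS)

-- A foreign key to another entity carries that entity's annotation.
LAST_ORDER_ID := INPUT(BIGINT#ORDERS)

-- The join resolves from the types: LAST_ORDER_ID is BIGINT#ORDERS, and
-- ORDER_CITY is keyed by ORDER_ID, which is also BIGINT#ORDERS.
LAST_ORDER_CITY := RELATED(ORDER_CITY VIA LAST_ORDER_ID)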

In practice, entity annotations are mainly declared once at the data mapping boundary by data engineering teams, using EXTERNAL_COLUMNS or INLINE_COLUMNS. Business users who build features on top of these mappings rarely need to write entity annotations themselves; they just use features that already carry them. See Types for syntax and Mapping data sources for how annotations flow through source mappings.

5. Composition over construction

The core workflow in FeatureQL is not writing features from scratch. It is composing existing features into new ones. Several design choices serve this principle.

Named features are the unit of reuse. If someone has published CUSTOMER_LIFETIME_VALUE to the registry, you use it directly. You do not copy its logic, you do not need to know how it is computed, and you do not need to set up the data sources it depends on. All of that is resolved automatically from the registry.

VARIANT() enables safe experimentation. If a feature does almost what you need but one of its dependencies should be different, use VARIANT() to replace that dependency in your query without modifying the original. Your variant shares the entire dependency graph except for the one piece you replaced. This is the mechanism for what-if analysis, A/B testing, and client-specific customization. See Variants.
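A sketch, assuming CUSTOMER_LIFETIME_VALUE is a registry feature that depends on a DISCOUNT_RATE feature, and assuming this shape for VARIANT's arguments:

-- Reuse the entire dependency graph with one piece swapped; the original is untouched.
CLV_AT_8_PCT := VARIANT(CUSTOMER_LIFETIME_VALUE, DISCOUNT_RATE := 0.08)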

Visibility is controlled through naming, not keywords. FeatureQL does not have PUBLIC or PRIVATE keywords. When you persist features with CREATE FEATURES AS, every feature in the SELECT part becomes addressable in the registry. Features defined in the WITH part are persisted as dependencies but cannot be referenced or replaced by other users. This means the act of putting a feature in SELECT is the act of making it public. Consumers can see internal dependencies via SHOW CREATE FEATURES (auditable), but they cannot build on them (not addressable). See Create features.
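A sketch of the visibility rule; the feature bodies (and the ORDER_COST feature) are illustrative:

CREATE FEATURES AS
WITH
    -- Persisted as a dependency, but not addressable by other users.
    RAW_MARGIN := ORDER_AMOUNT - ORDER_COST
SELECT
    -- Placing it in SELECT is what makes it public in the registry.
    AVG_MARGIN := CUSTOMER_ID.RELATED(AVG(RAW_MARGIN) GROUP BY ORDER_CUSTOMER_ID)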

Automatic dependency resolution. Features can reference other features defined anywhere in the same query, in any order. The engine topologically sorts the DAG. You write in whatever order makes the logic clearest.

Discoverability is built into the language. A language built on composition requires that you can discover what exists before you build on it. FeatureQL provides exploration commands directly in the query environment: SHOW FEATURES to browse the registry, SHOW CREATE FEATURES to inspect definitions, SHOW FUNCTION SIGNATURES for built-in functions, and SHOW DOCS to browse documentation from inside a query. Dependency exploration is also built in: you can query the feature graph to understand what depends on what, trace upstream dependencies, and assess the impact of changing a feature. See Explore features.

How it all fits together

These five principles reinforce each other. Purity makes composition safe (you can combine features without worrying about hidden interactions). Strict types make composition correct (the compiler catches mismatches before anything runs). Two scope levels make composition readable (you never lose track of where you are). Entity annotations make composition automatic (the engine resolves join paths from the types). And the registry makes composition discoverable (you can find, inspect, and build on what others have published).

The result is a language where the default workflow is not writing from scratch, but assembling from existing pieces. This is the property that sets FeatureQL apart from SQL, semantic layers, and feature stores.

Last update at: 2026/03/18 16:18:57