FeatureQL for the Impatient

This guide helps SQL practitioners understand FeatureQL by comparing it with familiar SQL concepts.

Core concepts

Features are functions

In SQL, you create a table from other tables. In FeatureQL, you compute features from other features.

Contrarily to tables, when you persist features with CREATE FEATURES AS, no data is evaluated, only their definition is stored in the feature registry, similar to a view or function in SQL.

Think of features as database functions with automatic parameter detection. When you reference other features, they automatically become inputs to your formula.

With the exception of EXTERNAL functions that fetch external data, features are pure functions. Their output depends only on their inputs.

You can then apply your features/functions to some inputs to get an evaluation using BIND functions.

Simple example: Building step by step

Let's create a customer segment.

First, the minimal setup:

Note: WITH is used to define features not returned in the result set.

Now let's create some mockup external data:

We can now map features to columns bu using the EXTERNAL_SQL function:

Then finally add your business logic:

SQL equivalent comparison

What you wantSQLFeatureQL
Get dataSELECT * FROM customers WHERE id IN (1,2)CUSTOMER_DETAILS := EXTERNAL_SQL(...) then BIND_TABLE(ARRAY[1,2])
TransformSELECT id, name, last_login > '2025-01-01' AS is_active FROM ...CUSTOMER_IS_ACTIVE := CUSTOMER_DETAILS[last_login_date] > '2025-01-01'
Join on PKSELECT * FROM t1 JOIN t2 ON t1.id = t2.idFeatures with same INPUT automatically align
Join on FKSELECT * FROM orders o JOIN customers c ON o.customer_id = c.idADD_FIELDS() binds foreign key to entity

Feature organization

Catalog vs local definitions

Catalog features (persisted globally):

  • Defined once by your data team
  • Available everywhere in your organization
  • Namespaced (e.g., fm.finance.revenue, fm.marketing.campaign_roi)
  • Have granular permissions

Local features (in WITH or SELECT clause):

  • Defined in your query
  • Override catalog features if same name
  • Useful for prototyping

In practice, core features like CUSTOMER_ID, CUSTOMER_LAST_LOGIN_DATE have been created in the catalog by the data team with this:

So end-users can just write:

Which is incredibly similar to SQL.

Entities and INPUT relationship

ENTITY() creates a semantic label for your data domain. INPUT() parameters can reference entities by type annotation:

CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),  -- CUSTOMER_ID references the CUSTOMERS entity as a BIGINT key
sql

This clarify the nature of features, prevents mixing incompatible features (e.g., accidentally joining customer features with product IDs) and allow automated binding with ADD_FIELDS().

Working with features

Type system

  • Automatic inference: Types are inferred for pure transformations
  • Explicit at boundaries: Only INPUT() and EXTERNAL_XXX() need type declarations
  • No coercion: Cast explicitly when needed with ::TYPE or CAST(x AS TYPE)
  • Advanced types: Use ARRAY, ROW and ARRAY(ROW(...)) all the time

Metaprogramming

@echo and @literal compute values at compilation time.

This is useful for computed literal parameters without having to use external ways of generating text such as jinja templates.

QUERY := @literal(`SELECT * FROM dim_customers WHERE date = '` || CURRENT_DATE() || `'`),
sql

Hybrid queries

Mix SQL and FeatureQL to integrate with existing SQL based clients.

SELECT * FROM (
    FEATUREQL(
        SELECT CUSTOMER_ID := BIND_TABLE(ARRAY[1,2]), CUSTOMER_NAME
    )
) WHERE customer_name LIKE 'J%'
sql

Array of rows operations

FeatureQL promotes ECM (Entity Centric Modeling) that stores related data as arrays within each entity (also known as One Big Table modeling). It has then strong support for ARRAY and ARRAY of ROW that are the core data structures of ECM.

Core operations

  • Access: ORDERS[amount] gets amount field from array of order rows
  • Zip: ZIP(<array_of_rows1>, <array_of_rows2>) zips two arrays as an array of rows
  • Combine: ARRAY_MERGE(<array_of_rows1>, <array_of_rows2>) combines array of rows
  • Transform: TRANSFORM(<array_of_rows>, <subquery>) applies transformations to an array of rows
  • Enrich: ADD_FIELDS() adds fields to array of rows by joining on an existing field

Relationship handling

Instead of SQL joins, use ADD_FIELDS() to enrich rows with related entity data:

ADD_FIELDS(<feature> TO <array_of_rows> BINDING FIELDS <field_foreign_key> WITH <feature_primary_key>)

-- Each order gets customer information added
ENRICHED_ORDERS := ADD_FIELDS(
    CUSTOMER_NAME,                  -- Feature at CUSTOMER_ID level to add
    TO ROW(ORDER_CUSTOMER_ID),      -- Foreign key field as a ROW to join on
    BINDING FIELDS ORDER_CUSTOMER_ID WITH CUSTOMER_ID -- Bind the foreign key field to the primary key feature
)
sql

Transform an array of rows

You can apply any transform, filter, reduce operation on an array of rows using a query in the TRANSFORM function.

SELECT ORDERS.TRANSFORM(SELECT SUM(price) GROUP BY ORDER_CUSTOMER_ID AS sum_price)
sql

This query needs to contain only "pure" transformations. If you need to join with external data, you must add them first to your array of rows with ARRAY_MERGE() or ADD_FIELDS() before applying TRANSFORM().

Advanced patterns

Historical data and time travel

  • SCD_AT_TIME(): Get slowly changing dimension type 2 values at specific time
  • ACCUMULATION_AT_TIME(): Get accumulated facts up to specific time

Testing and reusability

  • VARIANT(): Create feature variations (e.g., test different thresholds)
  • MACRO(): Define reusable FeatureQL functions
  • UDF(): Import functions from your database

Developer tools

SHOW FEATURES;                    -- Browse catalog
SHOW CREATE FEATURES <name>;      -- See definition
SHOW FUNCTION <function>;         -- Get function docs
SHOW DOCS;                        -- Browse documentation
sql

Key differences from SQL

  1. Primary key thinking: INPUT() represents your row identifier (like a PK)
  2. No implicit behavior: No NULL coalescing, no type coercion
  3. Dependency tracking: Reference a feature = automatic dependency
  4. Nested data model: Arrays of rows instead of normalized tables
  5. Functional purity: Features are formulas, not stored data
Last update at: 2025/10/13 10:23:46