FeatureQL for the Impatient
This guide helps SQL practitioners understand FeatureQL by comparing it with familiar SQL concepts.
Core concepts
Features are functions
In SQL, we create one table from other tables.
In FeatureQL, we define features as formulas based on other features.
Unlike tables, when we persist features with CREATE FEATURES AS, no data is evaluated. Only their definition is stored in the feature registry, similar to a view or function in SQL.
Think of features as database functions with automatic parameter detection. When we reference other features, they automatically become inputs to our formula.
Features are pure functions: their output depends only on their inputs. The only exception is the EXTERNAL functions, which map external data to features.
We can evaluate features by binding values to their inputs using the BIND functions.
Simple example step by step
Let's create a customer segment.
First, the minimal setup:
Now let's create some mockup external data:
@fql-playground(impatient_create_table_customer1)
We can now map features to existing table columns by using the EXTERNAL_COLUMNS function:
Then finally add our business logic:
Persisted and local definitions
Persisted features (stored in the registry):
- Defined with the
CREATE FEATURES ASstatement - Available everywhere in the organization
- Namespaced (e.g.,
fm.finance.revenue,fm.marketing.campaign_roi) - Have granular permissions
Local features (defined in the WITH clause):
- Defined in the query scope
- Override persisted features if same name
- Useful for prototyping
In practice, core features like CUSTOMER_ID, CUSTOMER_LAST_LOGIN_DATE would have been persisted in the registry using this query:
End-users would then just use the features in their queries like this:
Presentation layer and hybrid queries for analytics
Feature definitions are typically used to create fine-grained datasets for analytics.
But the ultimate goal is to present data to the end-user, often filtered or at a coarse level of aggregation.
This is typically done in a presentation layer.
For maximum flexibility, you can also mix SQL and FeatureQL to integrate with existing SQL based clients: the SQL part of the query is generated by the visualisation tool itself.
@fql-playground(impatient_7_hybrid)
Structural patterns
All JOIN and GROUP BY operations in feature definitions are described using the RELATED() function.
To illustrate the main structural patterns, let's create more mockup data for our ecommerce dataset.
A second table containing customer information:
@fql-playground(impatient_create_table_customer2)
A table of orders with a foreign key to the customers table.
@fql-playground(impatient_create_table_orders)
Join PK on PK
Informations related to the customer can be queried without join:
@fql-playground(impatient_8_join_pk_on_pk)
Join FK on PK
Informations related to the order can be queried by joining on the customer ID.
@fql-playground(impatient_9_join_fk_on_pk)
Join PK on FK
If we want to get the revenue for each customer, we can group orders by customer ID and sum the revenue, then join the customer information on this table.
@fql-playground(impatient_10_join_pk_on_fk)
Join FK on FK
Let's imagine we want to compute the percentage that represents each order in the total revenue for each customer. We need to join the orders table on the customer ID and then join the customer information on this table.
@fql-playground(impatient_11_join_fk_on_fk)
Array of rows operations
FeatureQL promotes ECM (Entity Centric Modeling) that stores facts as arrays or arrays of rows within each dimension table (also known as One Big Table modeling).
FeatureQL has then strong support for ARRAY and ARRAY of ROWs that are the core data structures of ECM.
| Operation | Syntax | Description | Scope |
|---|---|---|---|
| Access rows | <array_of_rows>[1] or <array_of_rows>[1:3] | Extract one or a range of rows as an array of rows | rows |
| Access fields | <array_of_rows>[amount] | Extract fields from an array of rows as a array of values (one field) or an array of rows (many fields) | fields |
| Unwrap | UNWRAP(<array_of_rows>) | Unwraps an array of one field rows as a array of values (convenient when the name of the field is not named) | fields |
| Concatenate | ARRAY_CONCAT(<array_of_rows1>, <array_of_rows2> (DEDUPLICATED ON <field1>, <field2>)) | Concatenates the rows of the two arrays of rows, possibly deduplicated on one or many fields values | rows |
| Zip | ZIP(<array_of_rows1>, <array_of_rows2>) | Zips one or many arrays into an array of rows | fields |
| Merge (by positions) | ARRAY_MERGE(<array_of_rows1>, <array_of_rows2>) | Combines two or more arrays of rows | fields |
| Merge (by values) | ARRAY_MERGE(<array_of_rows1>, <array_of_rows2> (JOINED ON <field1>, <field2>)) | Merge the rows of two arrays of rows by joining on a field | fields and rows |
| Extend | EXTEND(<array_of_rows> WITH <feature>) | Adds features as fields to an array of rows by joining on an existing field | fields |
| Transform | TRANSFORM(<array_of_rows>, <subquery>) | Applies transformations to an array of rows (map, filter, aggregate) | fields and rows |
Example
Let's imagine we want to compute the top 3 categories of products bought by each customer in the last 84 days.
We imagine we have a OBT table that contains the list of orders for each customer and their date of purchase.
@fql-playground(impatient_create_customer_obt)
We can use TRANSFORM() to filter the 84 days window and then use EXTEND() to add the product categories to the array of rows. Then use TRANSFORM() again to group by order category and count the number of orders.
@fql-playground(impatient_12_transform_customer_obt)
Developer tools
To be efficient in a collaborative environment, FeatureQL provides a set of developer tools to help you write and test your features.
Metaprogramming
@echo and @literal compute values at compilation time.
This is useful for computed literal parameters without having to use external ways of generating text such as jinja templates.
QUERY := @literal(`SELECT * FROM dim_customers WHERE date = '` || CURRENT_DATE() || `'`), Testing and reusability
VARIANT(): Create feature variations (e.g., test different thresholds, swap sources, etc.)MACRO(): Define reusable functions in FeatureQLUDF(): Define reusable functions in the language of the backend database
Business functions
FeatureQL provides specialized functions for analytics and machine learning in enterprise environments.
Historical
SCD_AT_TIME(): Get the state of a slowly changing dimension (SCD type 2) at a certain point in timeACCUMULATION_AT_TIME(): Get the state of an accumulation of facts at a certain point in time
Experimentation
HASH01(): Generate a deterministic value between 0 and 1 based on the input string.GRADUAL_ROLLOUT(): Enable controlled, time based feature deployment with deterministic assignment. Unlike simple percentage rollouts, it smoothly transitions from 0% to 100% coverage over a defined time window.
Growth accounting
TO_ACTIVITY_BITS(): Converts an array of dates to a dense 60 days bitwise representation called activity bits.FROM_ACTIVITY_BITS(): Converts the activity bits back to an array of dates.IS_ACTIVE(): Checks if a user was active within a time window using the activity bits.COUNT_ACTIVE(): Counts the number of active days within a time window using the activity bits.RECENCY(): Returns days since last activity (0 = active today) using the activity bits.ACTIVITY_STATUS(): Classifies users into lifecycle stages by comparing activity across time periods using the activity bits.
Simulation
SAMPLE_VALUE(): Samples from a discrete distribution with custom probabilities.SAMPLE_UNIFORM(): Samples from a uniform distribution between min and max values.SAMPLE_GAUSSIAN(): Samples from a normal distribution between two bounds.PROBABILITY_TO_EVENT(): Returns the discrete event (true or false) with the given probability based on the given hashing key.
Discovery tools
Discovery tools are used to explore metadata about the features and the system.
SHOW FEATURES: Browse the catalog of featuresSHOW CREATE FEATURES <name>: See the definition of a featureSHOW FUNCTION <function>: Get the documentation of a functionSHOW DOCS: Browse the documentation