# Bind keyset
FeatureQL's key selection approach varies between online and offline evaluation scenarios, requiring different strategies for defining the scope of feature computation.
## Online vs. offline evaluation patterns
**Real-time serving:**
- Precise key specification: "Calculate revenue for user_id = 1"
- Real-time feature serving with specific entity instances
- Typically involves `BIND_VALUE()` or small `BIND_VALUES()` operations
**Batch analytics:**
- Subset selection: "Calculate revenue for all orders from yesterday"
- Batch processing across entity populations
- Requires efficient key space definition and filtering
The @BIND_KEYSET() function is designed specifically for offline analytics scenarios, enabling sophisticated key selection based on business criteria while maintaining FeatureQL's entity-centric feature evaluation principles.
## How it works
@BIND_KEYSET() is a metaprogramming function (prefixed with @) that resolves at compile time. It works in two steps:
1. **Define** a keyset using the `KEYSET()` function when creating persisted features. A keyset associates an entity with a SQL query that returns the set of keys for that entity.
2. **Reference** the keyset using `@BIND_KEYSET()` in the `FOR` clause of an evaluation query. At compile time, `@BIND_KEYSET()` expands into the appropriate `BIND_SQL()` call.
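The two-step resolution can be sketched in Python. This is an illustrative model of the compile-time lookup only, not FeatureQL internals; the `Keyset` class, the `registry` list, and the `orders_src` table name are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Keyset:
    label: str    # e.g. 'ALL' or 'BY_DATE'
    entity: str   # fully qualified entity name
    sql: str      # query returning the key column(s)
    tables: list = field(default_factory=list)  # optional associated tables

# Step 1: KEYSET() registers a definition alongside its entity.
registry = [Keyset('ALL', 'FM.TUTORIALS.KEYSETS1.ORDERS',
                   'SELECT order_id FROM orders_src')]

def expand_bind_keyset(label, entity):
    """Step 2: resolve the (label, entity) pair and emit a BIND_SQL() call."""
    for ks in registry:
        if ks.label == label and ks.entity == entity:
            return f"BIND_SQL('{ks.sql}')"
    raise LookupError(f'no keyset {label!r} for entity {entity!r}')

print(expand_bind_keyset('ALL', 'FM.TUTORIALS.KEYSETS1.ORDERS'))
# BIND_SQL('SELECT order_id FROM orders_src')
```

The key point is that the expansion happens before execution: the evaluation query never sees `@BIND_KEYSET()` itself, only the resulting `BIND_SQL()`.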
## Syntax
Defining a keyset (in CREATE FEATURES):

```
<keyset_name> := KEYSET(
    '<label>',
    <entity>,
    '<SQL query>',
    ARRAY['<table>', ...]  -- optional
)
```

Using a keyset (in evaluation queries):

```
FOR
    <input> := @BIND_KEYSET(<label>, <entity> [, <param1>, ...])
```

| Parameter | Description |
|---|---|
| label | The keyset label defined in KEYSET() (e.g., ALL, BY_DATE) |
| entity | The entity reference (a local alias or a fully qualified name like FM.NAMESPACE.ENTITY) |
| param1, param2, ... | Optional parameters substituted into $1, $2, ... placeholders in the keyset SQL |
## Basic usage: single keyset
The simplest case defines a keyset with a static SQL query and uses it to bind all entity keys.
First, create the entity and its keyset:
```
CREATE OR REPLACE FEATURES IN FM.TUTORIALS.KEYSETS1 AS
SELECT
    ORDERS := ENTITY(),
    ORDER_KEYSET1 := KEYSET('ALL', ORDERS, `SELECT order_id FROM (VALUES (10), (11), (12)) AS t(order_id)`)
;
```

| feature_name VARCHAR | status VARCHAR | message VARCHAR |
|---|---|---|
| FM.TUTORIALS.KEYSETS1.ORDERS | CREATED | Feature created as not exists |
| FM.TUTORIALS.KEYSETS1.ORDER_KEYSET1 | CREATED | Feature created as not exists |
Then use @BIND_KEYSET() to evaluate features for all keys in the set:
```
WITH
    ORDERS := FM.TUTORIALS.KEYSETS1.ORDERS,
    ORDER_ID := INPUT(BIGINT#ORDERS)
SELECT
    ORDER_ID,
FOR
    ORDER_ID := @bind_keyset(ALL, ORDERS)
;
```

| ORDER_ID BIGINT |
|---|
| 10 |
| 11 |
| 12 |
The @BIND_KEYSET(ALL, ORDERS) call resolves to a BIND_SQL() that executes the SQL query defined in the ORDER_KEYSET1 keyset, returning order IDs 10, 11, and 12.
## Multiple keysets with table disambiguation
When an entity has multiple keysets with the same label, @BIND_KEYSET() automatically selects the right one based on which tables the query references. This is useful when different data sources have different key populations.
First, set up two tables with different data:
```
/* SQL */
CREATE SCHEMA IF NOT EXISTS keysets;
--
CREATE OR REPLACE TABLE keysets.table1 AS SELECT num AS key_id, 'table1_' || num::varchar AS value FROM UNNEST(ARRAY[1,2,3]) AS t(num);
CREATE OR REPLACE TABLE keysets.table2 AS SELECT num AS key_id, 'table2_' || num::varchar AS value FROM UNNEST(ARRAY[1,2,3]) AS t(num);
```

| Count BIGINT |
|---|
| 3 |
Create two keysets with the same 'SHARED' label, each associated with a different table:
```
CREATE OR REPLACE FEATURES IN FM.TUTORIALS.KEYSETS2 AS
SELECT
    KEYS := ENTITY(),
    KEYSET_TABLE1 := KEYSET('SHARED', KEYS, `SELECT 1 as key_id`, ARRAY['keysets.table1']),
    KEYSET_TABLE2 := KEYSET('SHARED', KEYS, `SELECT 2 as key_id`, ARRAY['keysets.table2'])
;
```

| feature_name VARCHAR | status VARCHAR | message VARCHAR |
|---|---|---|
| FM.TUTORIALS.KEYSETS2.KEYS | CREATED | Feature created as not exists |
| FM.TUTORIALS.KEYSETS2.KEYSET_TABLE1 | CREATED | Feature created as not exists |
| FM.TUTORIALS.KEYSETS2.KEYSET_TABLE2 | CREATED | Feature created as not exists |
When the query uses EXTERNAL_COLUMNS from table1, @BIND_KEYSET() automatically picks the keyset associated with table1:
```
WITH
    KEY_ID := INPUT(BIGINT),
SELECT
    VALUE := EXTERNAL_COLUMNS(
        key_id BIGINT BIND TO KEY_ID,
        value VARCHAR
        FROM TABLE(keysets.table1)
    )[value],
FOR
    KEY_ID := @bind_keyset(SHARED, FM.TUTORIALS.KEYSETS2.KEYS)
;
```

| VALUE VARCHAR |
|---|
| table1_1 |
And when the query references table2, the keyset for table2 is selected instead:
```
WITH
    KEY_ID := INPUT(BIGINT),
SELECT
    VALUE := EXTERNAL_COLUMNS(
        key_id BIGINT BIND TO KEY_ID,
        value VARCHAR
        FROM TABLE(keysets.table2)
    )[value],
FOR
    KEY_ID := @bind_keyset(SHARED, FM.TUTORIALS.KEYSETS2.KEYS)
;
```

| VALUE VARCHAR |
|---|
| table2_2 |
The fourth argument to KEYSET() — ARRAY['keysets.table1'] — tells the system which tables the keyset is associated with, enabling automatic disambiguation.
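The selection rule can be modeled as a simple intersection test: among keysets sharing the requested label, pick the one whose associated tables overlap the tables the query references. This Python sketch is an illustration of the documented behavior, not the actual resolver; the function and data-structure names are hypothetical.

```python
# Candidate keysets sharing the 'SHARED' label, mirroring the example above.
keysets = [
    {'label': 'SHARED', 'sql': 'SELECT 1 as key_id', 'tables': {'keysets.table1'}},
    {'label': 'SHARED', 'sql': 'SELECT 2 as key_id', 'tables': {'keysets.table2'}},
]

def pick_keyset(label, referenced_tables):
    """Select the keyset whose associated tables the query references."""
    matches = [ks for ks in keysets
               if ks['label'] == label and ks['tables'] & referenced_tables]
    if len(matches) != 1:
        raise LookupError(f'ambiguous or missing keyset for label {label!r}')
    return matches[0]['sql']

# A query reading keysets.table1 resolves to the first keyset:
print(pick_keyset('SHARED', {'keysets.table1'}))  # SELECT 1 as key_id
print(pick_keyset('SHARED', {'keysets.table2'}))  # SELECT 2 as key_id
```

Under this model, a query that references neither table (or both) could not be disambiguated, which is why associating each keyset with its tables matters when labels are shared.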
## Parameterized keysets
Keyset SQL queries can include $1, $2, ... placeholders that are filled in at query time via additional arguments to @BIND_KEYSET(). This enables dynamic key selection based on runtime parameters.
Create tables and parameterized keysets (note the $1 placeholder in the SQL):
```
/* SQL */
CREATE SCHEMA IF NOT EXISTS keysets;
--
CREATE OR REPLACE TABLE keysets.tablec1 AS SELECT num AS key_id, 'tablec1_' || num::varchar AS value FROM UNNEST(GENERATE_SERIES(1, 30)) AS t(num);
CREATE OR REPLACE TABLE keysets.tablec2 AS SELECT num AS key_id, 'tablec2_' || num::varchar AS value FROM UNNEST(GENERATE_SERIES(1, 30)) AS t(num);
```

| Count BIGINT |
|---|
| 30 |
Then create the keysets that reference those tables, using $1 as a placeholder for the parameter that will be supplied at query time:
```
CREATE OR REPLACE FEATURES IN FM.TUTORIALS.KEYSETS2 AS
SELECT
    KEYSC := ENTITY(),
    KEYSET_TABLEC1 := KEYSET('SHARED', KEYSC, 'SELECT 1+$1 as key_id', ARRAY['keysets.tablec1']),
    KEYSET_TABLEC2 := KEYSET('SHARED', KEYSC, 'SELECT 2+$1 as key_id', ARRAY['keysets.tablec2'])
;
```

| feature_name VARCHAR | status VARCHAR | message VARCHAR |
|---|---|---|
| FM.TUTORIALS.KEYSETS2.KEYSC | CREATED | Feature created as not exists |
| FM.TUTORIALS.KEYSETS2.KEYSET_TABLEC1 | CREATED | Feature created as not exists |
| FM.TUTORIALS.KEYSETS2.KEYSET_TABLEC2 | CREATED | Feature created as not exists |
Pass a computed parameter to @BIND_KEYSET() — the value is substituted into $1 in the keyset SQL:
```
CONST
    BASE := 5
WITH
    KEY_ID := INPUT(BIGINT),
SELECT
    VALUE := EXTERNAL_COLUMNS(
        key_id BIGINT BIND TO KEY_ID,
        value VARCHAR
        FROM TABLE(keysets.tablec1)
    )[value],
    PARAM_1_OF_BINDKEYSET := VALUE(@literal(BASE*2))
FOR
    KEY_ID := @bind_keyset(SHARED, FM.TUTORIALS.KEYSETS2.KEYSC, CAST(BASE*2 AS VARCHAR))
;
```

| VALUE VARCHAR | PARAM_1_OF_BINDKEYSET BIGINT |
|---|---|
| tablec1_11 | 10 |
The same pattern works with different parameter values, selecting from the appropriate table:
```
CONST
    BASE := 5
WITH
    KEY_ID := INPUT(BIGINT),
SELECT
    VALUE := EXTERNAL_COLUMNS(
        key_id BIGINT BIND TO KEY_ID,
        value VARCHAR
        FROM TABLE(keysets.tablec2)
    )[value],
    PARAM_1_OF_BINDKEYSET := VALUE(@literal(BASE*4))
FOR
    KEY_ID := @bind_keyset(SHARED, FM.TUTORIALS.KEYSETS2.KEYSC, CAST(BASE*4 AS VARCHAR))
;
```

| VALUE VARCHAR | PARAM_1_OF_BINDKEYSET BIGINT |
|---|---|
| tablec2_22 | 20 |
Parameters are passed as strings (use CAST(... AS VARCHAR) for non-string values) and substituted into the keyset SQL at compile time via CONST evaluation.
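The substitution itself is plain positional string replacement, which this Python sketch illustrates. It is a hedged model of the documented `$1`, `$2` behavior, not the FeatureQL implementation; the function name is hypothetical.

```python
import re

def substitute_params(sql, params):
    """Replace $1, $2, ... in the keyset SQL with string parameters.

    params is a list of strings; $1 maps to params[0], $2 to params[1], etc.
    """
    def repl(match):
        idx = int(match.group(1)) - 1
        return params[idx]
    return re.sub(r'\$(\d+)', repl, sql)

# BASE := 5, so CAST(BASE*2 AS VARCHAR) supplies '10' for $1:
print(substitute_params('SELECT 1+$1 as key_id', ['10']))
# SELECT 1+10 as key_id
```

This matches the observed result above: the keyset for tablec1 evaluates `1+10`, binding key 11 and returning `tablec1_11`.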
## Date-filtered keysets
A common pattern in batch analytics is filtering entity populations by date — for example, computing features for all customers who had activity on a specific day. Parameterized keysets make this straightforward.
Create a table with customer activity across multiple dates:
```
/* SQL */
CREATE SCHEMA IF NOT EXISTS keysets;
--
CREATE OR REPLACE TABLE keysets.customers_by_date AS
SELECT * FROM (VALUES
    (1, DATE '2024-01-15', 100.0),
    (2, DATE '2024-01-15', 200.0),
    (3, DATE '2024-01-15', 150.0),
    (1, DATE '2024-01-16', 110.0),
    (2, DATE '2024-01-16', 220.0),
    (3, DATE '2024-01-16', 165.0),
    (4, DATE '2024-01-16', 300.0),
    (5, DATE '2024-01-16', 250.0)
) AS t(customer_id, ds, revenue);
```

| Count BIGINT |
|---|
| 8 |
Define a keyset with a date filter using the $1 placeholder:
```
CREATE OR REPLACE FEATURES IN FM.TUTORIALS.KEYSETS3 AS
SELECT
    CUSTOMERS := ENTITY(),
    KEYSET_BY_DATE := KEYSET('BY_DATE', CUSTOMERS, `SELECT DISTINCT customer_id FROM keysets.customers_by_date WHERE ds = '$1'`, ARRAY['keysets.customers_by_date'])
;
```

| feature_name VARCHAR | status VARCHAR | message VARCHAR |
|---|---|---|
| FM.TUTORIALS.KEYSETS3.CUSTOMERS | CREATED | Feature created as not exists |
| FM.TUTORIALS.KEYSETS3.KEYSET_BY_DATE | CREATED | Feature created as not exists |
Query customers for a specific date — only customers active on that date are included:
```
CONST
    TARGET_DATE := DATE '2024-01-15'
WITH
    CUSTOMER_ID := INPUT(BIGINT),
SELECT
    CUSTOMER_ID,
    REVENUE := EXTERNAL_COLUMNS(
        customer_id BIGINT BIND TO CUSTOMER_ID,
        ds DATE BIND TO @literal(TARGET_DATE),
        revenue DOUBLE
        FROM TABLE(keysets.customers_by_date)
    )[revenue],
FOR
    CUSTOMER_ID := @bind_keyset(BY_DATE, FM.TUTORIALS.KEYSETS3.CUSTOMERS, TARGET_DATE)
;
```

| CUSTOMER_ID BIGINT | REVENUE DOUBLE |
|---|---|
| 1 | 100.0 |
| 2 | 200.0 |
| 3 | 150.0 |
Change the date parameter to get a different customer population. Here, SECOND_DATE is computed from TARGET_DATE using DATE_ADD():
```
CONST
    TARGET_DATE := DATE '2024-01-15',
    SECOND_DATE := DATE_ADD(TARGET_DATE::TIMESTAMP, 'DAY', 1)
WITH
    CUSTOMER_ID := INPUT(BIGINT),
SELECT
    CUSTOMER_ID,
    REVENUE := EXTERNAL_COLUMNS(
        customer_id BIGINT BIND TO CUSTOMER_ID,
        ds DATE BIND TO @literal(SECOND_DATE),
        revenue DOUBLE
        FROM TABLE(keysets.customers_by_date)
    )[revenue],
FOR
    CUSTOMER_ID := @bind_keyset(BY_DATE, FM.TUTORIALS.KEYSETS3.CUSTOMERS, SECOND_DATE)
;
```

| CUSTOMER_ID BIGINT | REVENUE DOUBLE |
|---|---|
| 1 | 110.0 |
| 2 | 220.0 |
| 3 | 165.0 |
| 4 | 300.0 |
| 5 | 250.0 |
Notice how the BIND TO @literal(TARGET_DATE) in EXTERNAL_COLUMNS and the parameter in @BIND_KEYSET() work together: the keyset selects which customers to evaluate, while the bound literal filters the data source to the correct date partition.
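The keyset half of this pattern behaves like a parameterized DISTINCT query over the activity table. A toy Python stand-in for the BY_DATE keyset SQL, using the same rows created above (illustrative only; `keys_for_date` is not a FeatureQL function):

```python
from datetime import date

# The keysets.customers_by_date rows, as (customer_id, ds) pairs.
rows = [
    (1, date(2024, 1, 15)), (2, date(2024, 1, 15)), (3, date(2024, 1, 15)),
    (1, date(2024, 1, 16)), (2, date(2024, 1, 16)), (3, date(2024, 1, 16)),
    (4, date(2024, 1, 16)), (5, date(2024, 1, 16)),
]

def keys_for_date(ds):
    """SELECT DISTINCT customer_id ... WHERE ds = $1, as plain Python."""
    return sorted({cid for cid, d in rows if d == ds})

print(keys_for_date(date(2024, 1, 15)))  # [1, 2, 3]
print(keys_for_date(date(2024, 1, 16)))  # [1, 2, 3, 4, 5]
```

The two result sets correspond exactly to the customer populations evaluated in the two queries above.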
## Best practices
- **Use descriptive keyset labels:** `'ALL'`, `'BY_DATE'`, `'ACTIVE'` all document the intent of each keyset
- **Keep keyset SQL simple:** the SQL query should only return key columns; complex filtering logic belongs in features, not in keyset definitions
- **Leverage table disambiguation:** when multiple keysets share a label, associate each with its relevant tables so `@BIND_KEYSET()` can auto-select the correct one
- **Parameterize for flexibility:** use `$1`, `$2` placeholders for date ranges, cohort filters, or any dynamic criteria rather than creating separate keysets for each variation
- **Combine with `CONST`:** define parameters once and reference them in both `@BIND_KEYSET()` and `BIND TO @literal()` for consistency