Historical functions
FeatureQL treats time as a first-class concept. Two functions — SCD_AT_TIME() and ACCUMULATION_AT_TIME() — let you answer "what was the state at time X?" without writing complex temporal joins.
Slowly changing dimensions with SCD_AT_TIME()
Many business attributes change over time: a customer's loyalty tier, an order's delivery status, a product's price. When these changes are stored as an array of timestamped updates (SCD type 2), SCD_AT_TIME() extracts the value that was valid at any point in time.
SCD_AT_TIME(scd_array, point_in_time) The scd_array is an ARRAY(ROW(time_update TIMESTAMP, value T)) — each row records when a value took effect. The function returns the most recent value where time_update <= point_in_time, or NULL if the timestamp is before the first update.
In this example, order 100 was marked DELAYED at 15:51, then CANCELLED at 15:56. Querying at 15:50 returns NULL (no status yet), at 15:55 returns DELAYED, and at 16:00 returns CANCELLED:
WITH
ORDERS := ENTITY(),
ORDER_ID := INPUT(BIGINT#ORDERS),
POINT_IN_TIME := INPUT(TIMESTAMP),
ORDER_SOURCE := EXTERNAL_SQL(
`SELECT * FROM (VALUES
(100, 12.51, ARRAY[
ROW(TIMESTAMP '2022-05-05 15:51:48', 'DELAYED'),
ROW(TIMESTAMP '2022-05-05 15:56:01', 'CANCELLED')
]),
(101, 13.20, ARRAY[ROW(TIMESTAMP '2022-05-05 15:52:06', 'DELIVERED')]),
(102, 15.06, ARRAY[ROW(TIMESTAMP '2022-05-05 15:57:09', 'DELIVERED')])
) AS t(order_id, price, order_status_scd)`
ON `SELF.ORDER_ID = %ORDER_ID`
AS ROW(order_id BIGINT#ORDERS, price DECIMAL(10, 2), order_status_scd ARRAY(ROW(time_update TIMESTAMP, value VARCHAR)))
),
ORDER_STATUS_SCD := ORDER_SOURCE[order_status_scd],
(ORDER_ID, POINT_IN_TIME) := BIND_VALUES(ARRAY[
ROW(100, TIMESTAMP '2022-05-05 15:50:00'),
ROW(100, TIMESTAMP '2022-05-05 15:55:00'),
ROW(100, TIMESTAMP '2022-05-05 16:00:00'),
ROW(101, TIMESTAMP '2022-05-05 15:50:00'),
ROW(101, TIMESTAMP '2022-05-05 16:00:00'),
]),
SELECT
ORDER_ID,
POINT_IN_TIME,
SCD_AT_TIME(ORDER_STATUS_SCD, POINT_IN_TIME) as ORDER_STATUS_AT_TIME,
ORDER_STATUS_SCD
ORDER BY ORDER_ID, POINT_IN_TIME;| ORDER_ID BIGINT | POINT_IN_TIME TIMESTAMP | ORDER_STATUS_AT_TIME VARCHAR | ORDER_STATUS_SCD VARCHAR |
|---|---|---|---|
| 100 | 2022-05-05T15:50:00 | NULL | [{time_update: 2022-05-05T15:51:48, value: DELAYED}, {time_update: 2022-05-05T15:56:01, value: CANCELLED}] |
| 100 | 2022-05-05T15:55:00 | DELAYED | [{time_update: 2022-05-05T15:51:48, value: DELAYED}, {time_update: 2022-05-05T15:56:01, value: CANCELLED}] |
| 100 | 2022-05-05T16:00:00 | CANCELLED | [{time_update: 2022-05-05T15:51:48, value: DELAYED}, {time_update: 2022-05-05T15:56:01, value: CANCELLED}] |
| 101 | 2022-05-05T15:50:00 | NULL | [{time_update: 2022-05-05T15:52:06, value: DELIVERED}] |
| 101 | 2022-05-05T16:00:00 | DELIVERED | [{time_update: 2022-05-05T15:52:06, value: DELIVERED}] |
Lifecycle tracking with ACCUMULATION_AT_TIME()
Where SCD_AT_TIME() tracks a single value changing over time, ACCUMULATION_AT_TIME() tracks a collection of items with independent lifecycles. Each item has a time_create, time_update, and time_delete — and the function returns only the items that were active at a given moment.
ACCUMULATION_AT_TIME(events_array, point_in_time) An item is "active" at point_in_time if it was created on or before that time and either hasn't been deleted or was deleted after that time. This is the right tool for questions like "how many active orders did this customer have on January 10th?" or "which subscriptions were valid at the time of this charge?"
Here, customer 1 has two orders: order 10 (created May 5, deleted May 12) and order 13 (created May 8, never deleted). Watch how the active set changes across four points in time:
WITH
ORDERS := ENTITY(),
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
POINT_IN_TIME := INPUT(TIMESTAMP),
CUSTOMER_SOURCE := EXTERNAL_SQL(
`SELECT
customer_id,
order_ids_details
FROM (VALUES
(1, ARRAY[
ROW(10, TIMESTAMP '2022-05-05 15:52:06', TIMESTAMP '2022-05-05 15:52:06', TIMESTAMP '2022-05-12 15:52:06'),
ROW(13, TIMESTAMP '2022-05-08 15:52:06', TIMESTAMP '2022-05-08 15:52:06', NULL)
])
) AS t(customer_id, order_ids_details)
WHERE %CUSTOMER_ID=customer_id`
AS ROW(customer_id BIGINT#CUSTOMERS, order_ids_details ARRAY(ROW(
order_id BIGINT#ORDERS,
time_create TIMESTAMP,
time_update TIMESTAMP,
time_delete TIMESTAMP
)))
),
ORDER_IDS_DETAILS := CUSTOMER_SOURCE[order_ids_details],
(CUSTOMER_ID, POINT_IN_TIME) := BIND_VALUES(ARRAY[
ROW(1, TIMESTAMP '2022-05-05 15:50:00'),
ROW(1, TIMESTAMP '2022-05-06 15:50:00'),
ROW(1, TIMESTAMP '2022-05-10 15:50:00'),
ROW(1, TIMESTAMP '2022-05-15 15:50:00')
])
SELECT
CUSTOMER_ID,
DATE_FORMAT(POINT_IN_TIME, '%Y-%m-%dT%H:%M:%S') as POINT_IN_TIME_F,
ACCUMULATION_AT_TIME(ORDER_IDS_DETAILS, POINT_IN_TIME)[ORDER_ID] AS ORDER_IDS_AT_TIME,
;| CUSTOMER_ID BIGINT | POINT_IN_TIME_F VARCHAR | ORDER_IDS_AT_TIME VARCHAR |
|---|---|---|
| 1 | 2022-05-05T15:50:00 | [] |
| 1 | 2022-05-06T15:50:00 | [10] |
| 1 | 2022-05-10T15:50:00 | [10, 13] |
| 1 | 2022-05-15T15:50:00 | [13] |
Point-in-time feature engineering with EXTEND
The real power of these functions emerges when you combine them with EXTEND() to compute features at the time of an event. This is the foundation of point-in-time correct feature engineering — computing "what did we know when this happened?" rather than "what do we know now?"
A common pattern: for each order, compute how many orders the customer had placed before this one. This requires crossing entity boundaries (order → customer) while carrying a temporal context:
WITH
-- ORDER LEVEL INFORMATION (PRIMARY KEY)
ORDERS := ENTITY(),
ORDER_ID := INPUT(BIGINT#ORDERS),
POINT_IN_TIME_ORDER := DATE_ADD(ORDER_DATE, 'SECOND', -1),
ORDER_SOURCE := EXTERNAL_SQL(
`SELECT * FROM (VALUES
(10, 1, TIMESTAMP '2024-01-01 11:11:00'),
(13, 1, TIMESTAMP '2024-03-03 11:11:00'),
(11, 2, TIMESTAMP '2024-01-02 12:12:00'),
(14, 2, TIMESTAMP '2024-01-17 12:12:00'),
(15, 2, TIMESTAMP '2024-03-02 12:12:00'),
(12, 3, TIMESTAMP '2024-01-03 13:13:00')
) AS t(order_id, order_customer_id, time_create) WHERE order_id=%ORDER_ID`
AS ROW(order_id BIGINT#ORDERS, order_customer_id BIGINT#CUSTOMERS, time_create TIMESTAMP)
),
ORDER_CUSTOMER_ID := ORDER_SOURCE[order_customer_id],
ORDER_DATE := ORDER_SOURCE[time_create],
-- CUSTOMER LEVEL INFORMATION (FOREIGN KEY)
CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),
POINT_IN_TIME_CUSTOMER := INPUT(TIMESTAMP),
CUSTOMER_SOURCE := EXTERNAL_SQL(
`SELECT
customer_id,
order_ids_details
FROM (VALUES
(1, ARRAY[
ROW(10, TIMESTAMP '2024-01-01 11:11:00', TIMESTAMP '2024-01-01 11:11:00', NULL),
ROW(13, TIMESTAMP '2024-03-03 11:11:00', TIMESTAMP '2024-03-03 11:11:00', NULL)
]),
(2, ARRAY[
ROW(11, TIMESTAMP '2024-01-02 12:12:00', TIMESTAMP '2024-01-02 12:12:00', NULL),
ROW(14, TIMESTAMP '2024-01-17 12:12:00', TIMESTAMP '2024-01-17 12:12:00', NULL),
ROW(15, TIMESTAMP '2024-03-02 12:12:00', TIMESTAMP '2024-03-02 12:12:00', NULL)
]),
(3, ARRAY[
ROW(12, TIMESTAMP '2024-01-03 13:13:00', TIMESTAMP '2024-01-03 13:13:00', NULL)
])
) AS t(customer_id, order_ids_details) WHERE customer_id=%CUSTOMER_ID`
AS ROW(customer_id BIGINT#CUSTOMERS, order_ids_details ARRAY(ROW(
order_id BIGINT#ORDERS,
time_create TIMESTAMP,
time_update TIMESTAMP,
time_delete TIMESTAMP
)))
),
ORDER_IDS_DETAILS := CUSTOMER_SOURCE[order_ids_details],
NUM_ORDERS_AT_TIME := ARRAY_LENGTH(ACCUMULATION_AT_TIME(ORDER_IDS_DETAILS, POINT_IN_TIME_CUSTOMER))
SELECT
ORDER_ID := BIND_VALUES(ARRAY[10, 11, 12, 13, 14, 15]),
ORDER_CUSTOMER_ID,
ORDER_DATE,
EXTEND(
ROW(ORDER_CUSTOMER_ID as ocid, POINT_IN_TIME_ORDER as pito)
WITH NUM_ORDERS_AT_TIME AS NUM_ORDERS_AT_TIME
VIA ocid, pito BIND TO CUSTOMER_ID, POINT_IN_TIME_CUSTOMER
)[NUM_ORDERS_AT_TIME] AS NUM_ORDERS_AT_TIME_OF_ORDER,
;| ORDER_ID BIGINT | ORDER_CUSTOMER_ID BIGINT | ORDER_DATE TIMESTAMP | NUM_ORDERS_AT_TIME_OF_ORDER VARCHAR |
|---|---|---|---|
| 10 | 1 | 2024-01-01T11:11:00 | 0 |
| 11 | 2 | 2024-01-02T12:12:00 | 0 |
| 12 | 3 | 2024-01-03T13:13:00 | 0 |
| 13 | 1 | 2024-03-03T11:11:00 | 1 |
| 14 | 2 | 2024-01-17T12:12:00 | 1 |
| 15 | 2 | 2024-03-02T12:12:00 | 2 |
The query uses ACCUMULATION_AT_TIME() at the customer level to count active orders, then EXTEND() passes each order's timestamp into the customer-level computation. The result: order 10 sees 0 prior orders (it's the first), while order 15 sees 2 (orders 11 and 14 were already placed).
When to use each function
| Function | Use when... | Example |
|---|---|---|
SCD_AT_TIME() | A single attribute changes over time | Customer tier, order status, product price |
ACCUMULATION_AT_TIME() | A collection of items has create/delete lifecycles | Active orders, valid subscriptions, open tickets |
EXTEND() + temporal functions | You need point-in-time features across entity boundaries | "Customer's order count when they placed this order" |