Simulation functions
Simulation functions generate deterministic pseudo-random values for Monte Carlo simulations, synthetic data generation, and probabilistic modeling. All functions use stable hashing to ensure reproducibility across runs.
Core principle: Deterministic randomness
All simulation functions require a string key that gets hashed to produce consistent "random" values. The same key always produces the same result, enabling:
- Reproducible simulations
- Distributed computation without coordination
- Time travel queries with consistent results
- A/B testing with stable assignments
Discrete sampling
SAMPLE_VALUE()
Samples from a discrete distribution with custom probabilities.
SAMPLE_VALUE(
key: VARCHAR, -- Unique identifier for deterministic sampling
values: ARRAY(T), -- Possible values to sample from
weights: ARRAY(NUMERIC) -- Relative weights (not required to sum to 1)
) -> T
sql
Parameters
key
: String that determines the random seed (e.g., user_id + salt)values
: Array of possible outcomesweights
: Relative probabilities (automatically normalized)
Example: Single sampling
Example: Distribution validation
Use cases
- User segmentation with custom proportions
- Weighted random selection for recommendations
- Simulating categorical outcomes with known distributions
Continuous distributions
SAMPLE_UNIFORM()
Samples from a uniform distribution between min and max values.
SAMPLE_UNIFORM(
key: VARCHAR,
min_value: DOUBLE,
max_value: DOUBLE
) -> DOUBLE
sql
Parameters
key
: Deterministic seed stringmin_value
: Lower bound (inclusive)max_value
: Upper bound (exclusive)
Example: Individual samples
Example: Distribution verification
Use cases
- Random price variations within a range
- Simulating uniform arrival times
- Bootstrap sampling for confidence intervals
SAMPLE_GAUSSIAN()
Samples from a truncated Gaussian (normal) distribution.
SAMPLE_GAUSSIAN(
key: VARCHAR,
mean: DOUBLE, -- Distribution center
std_dev: DOUBLE, -- Standard deviation
min_value: DOUBLE, -- Truncation lower bound
max_value: DOUBLE -- Truncation upper bound
) -> DOUBLE
sql
Parameters
key
: Deterministic seed stringmean
: Center of the distributionstd_dev
: Standard deviation (spread)min_value
: Minimum possible value (truncation)max_value
: Maximum possible value (truncation)
Example: Individual samples
Example: Bell curve validation
Use cases
- Simulating measurement errors
- Generating realistic human characteristics (height, weight)
- Modeling response times with natural variation
Probabilistic events
PROBABILITY_TO_EVENT()
Converts a probability into a deterministic boolean outcome.
PROBABILITY_TO_EVENT(
key: VARCHAR,
probability: DOUBLE -- Between 0.0 and 1.0
) -> BOOLEAN
sql
Parameters
key
: Deterministic seed stringprobability
: Event probability (0.0 = never, 1.0 = always)
Example: Individual events
Example: Probability validation
Use cases
- Conversion modeling with known probabilities
- Simulating binary outcomes (churn, purchase, click)
- Monte Carlo simulations for risk assessment
Practical patterns
Customer lifetime value simulation
WITH
CUSTOMER_ID := INPUT(BIGINT),
-- Simulate purchase probability based on segment
WILL_PURCHASE := PROBABILITY_TO_EVENT(
'purchase_' || CUSTOMER_ID::VARCHAR,
CASE customer_segment
WHEN 'high_value' THEN 0.8
WHEN 'medium_value' THEN 0.5
ELSE 0.2
END
),
-- Simulate order value if purchasing
ORDER_VALUE := CASE
WHEN WILL_PURCHASE THEN
SAMPLE_GAUSSIAN(
'value_' || CUSTOMER_ID::VARCHAR,
100.0, -- mean
30.0, -- std dev
10.0, -- min
500.0 -- max
)
ELSE 0.0
END
sql
A/B test simulation
-- Assign users to variants with specific proportions
VARIANT := SAMPLE_VALUE(
'test_v1_' || USER_ID::VARCHAR,
ARRAY['control', 'variant_a', 'variant_b'],
ARRAY[50, 25, 25] -- 50% control, 25% A, 25% B
)
sql
Synthetic data generation
-- Generate realistic user attributes
WITH
USER_ID := BIND_TABLE(SEQUENCE(1, 10000)),
SELECT
USER_ID,
AGE := ROUND(SAMPLE_GAUSSIAN(
'age_' || USER_ID::VARCHAR,
35.0, 12.0, 18.0, 80.0
))::INT,
INCOME := ROUND(SAMPLE_GAUSSIAN(
'income_' || USER_ID::VARCHAR,
75000.0, 25000.0, 20000.0, 250000.0
)),
IS_SUBSCRIBER := PROBABILITY_TO_EVENT(
'sub_' || USER_ID::VARCHAR,
0.15 -- 15% subscription rate
)
sql
Monte Carlo risk assessment
-- Simulate 1000 scenarios for project completion
WITH
SCENARIO_ID := BIND_TABLE(SEQUENCE(1, 1000)),
-- Each task has uncertain duration
TASK1_DAYS := SAMPLE_GAUSSIAN(
'task1_' || SCENARIO_ID::VARCHAR,
10.0, 2.0, 5.0, 20.0
),
TASK2_DAYS := SAMPLE_GAUSSIAN(
'task2_' || SCENARIO_ID::VARCHAR,
15.0, 3.0, 8.0, 30.0
),
TOTAL_DAYS := TASK1_DAYS + TASK2_DAYS
SELECT
APPROX_PERCENTILE(TOTAL_DAYS, 0.5) AS p50,
APPROX_PERCENTILE(TOTAL_DAYS, 0.9) AS p90,
APPROX_PERCENTILE(TOTAL_DAYS, 0.95) AS p95
sql
Best practices
- Key design: Always include a unique salt in keys to ensure independence between different simulations
- Validation: Test distributions with large samples to verify expected behavior
- Truncation: Use reasonable bounds for Gaussian sampling to avoid extreme outliers
- Weight normalization: Weights in SAMPLE_VALUE don't need to sum to 1; they're automatically normalized
- Reproducibility: Store keys used in simulations for audit and debugging purposes
On this page
Core principle: Deterministic randomnessDiscrete samplingSAMPLE_VALUE()ParametersExample: Single samplingExample: Distribution validationUse casesContinuous distributionsSAMPLE_UNIFORM()ParametersExample: Individual samplesExample: Distribution verificationUse casesSAMPLE_GAUSSIAN()ParametersExample: Individual samplesExample: Bell curve validationUse casesProbabilistic eventsPROBABILITY_TO_EVENT()ParametersExample: Individual eventsExample: Probability validationUse casesPractical patternsCustomer lifetime value simulationA/B test simulationSynthetic data generationMonte Carlo risk assessmentBest practices