APPROX_DISTINCT() GROUP BY ...
Returns the approximate number of distinct values in the group.
Syntax
`APPROX_DISTINCT(expr) [FILTER (WHERE condition)] [GROUP BY feature [, ...]]`

| Parameter | Type | Required | Description |
|---|---|---|---|
| expr | T | Yes | The expression whose distinct values are counted |
| condition | BOOLEAN | No | Condition used to filter values before aggregation |
| feature | FEATURE | No | One or more features to group by |
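The following exact reference in Python shows the value that APPROX_DISTINCT estimates for each group. `exact_distinct_by_group` is a hypothetical helper and not part of the product. It assumes rows are plain dicts and treats each grouping feature as a column name, which simplifies the FEATURE type. A group whose values are all removed by the filter is kept with a count of 0, consistent with "Returns 0 for empty groups".

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Optional, Sequence, Tuple

Row = Dict[str, Any]


def exact_distinct_by_group(
    rows: Iterable[Row],
    expr: str,
    features: Sequence[str],
    condition: Optional[Callable[[Row], bool]] = None,
) -> Dict[Tuple[Any, ...], int]:
    """Exact count of distinct non-NULL `expr` values per group (hypothetical helper)."""
    groups: Dict[Tuple[Any, ...], set] = defaultdict(set)
    for row in rows:
        key = tuple(row[f] for f in features)
        bucket = groups[key]  # register the group even if no value survives the filter
        if condition is not None and not condition(row):
            continue  # FILTER (WHERE condition): skip the row for this aggregate only
        value = row[expr]
        if value is not None:  # NULL values are excluded from the count
            bucket.add(value)
    return {key: len(values) for key, values in groups.items()}


rows = [
    {"country": "US", "user_id": "a", "amount": 10},
    {"country": "US", "user_id": "b", "amount": 5},
    {"country": "US", "user_id": "a", "amount": 20},
    {"country": "FR", "user_id": None, "amount": 7},
    {"country": "FR", "user_id": "c", "amount": 1},
]

# Roughly: APPROX_DISTINCT(user_id) FILTER (WHERE amount > 3) GROUP BY country
print(exact_distinct_by_group(rows, "user_id", ["country"], lambda r: r["amount"] > 3))
# {('US',): 2, ('FR',): 0}
```

The helper records every group key before applying the filter. This matches how a SQL FILTER clause restricts one aggregate without removing rows from the grouping. The approximate function swaps each per-group set for a fixed-size sketch.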

| Parameter | Type | Required | Description |
|---|---|---|---|
| expr | T | Yes | The expression to count distinct values of |
| precision | DOUBLE | Yes | Precision parameter for accuracy control |
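This page does not specify how the `precision` value maps to accuracy. As a reference point only, the standard HyperLogLog analysis gives a sketch with $m$ registers a relative standard error of about $1.04/\sqrt{m}$. If `precision` is read as a target standard error $\varepsilon$, which is an assumption the page does not state, the implied register count follows directly:

$$
\varepsilon \approx \frac{1.04}{\sqrt{m}}
\quad\Longrightarrow\quad
m \approx \left(\frac{1.04}{\varepsilon}\right)^{2},
\qquad
m = 2^{11} = 2048 \;\Rightarrow\; \varepsilon \approx \frac{1.04}{45.25} \approx 0.023
$$

Under that reading, 2,048 registers give roughly a 2.3% standard error, which falls in the same range as the typical accuracy listed in the notes. Memory grows linearly in $m$, so halving the error costs about four times the memory.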
Notes
- Uses the HyperLogLog algorithm for memory-efficient approximate counting
- Typically much faster than COUNT(DISTINCT) on large datasets, because it keeps a fixed-size sketch instead of the full set of distinct values
- Returns a probabilistic estimate whose error rate is controlled by the precision parameter
- NULL values are excluded from the count
- The precision parameter trades accuracy against memory: higher accuracy requires more memory
- Typical accuracy: within 2-3% of the exact count
- Returns 0 for empty groups
- Can be combined with a FILTER (WHERE condition) clause to filter values before aggregation
- Can be combined with a GROUP BY clause over one or more features for grouped aggregation
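The following minimal HyperLogLog sketch in Python illustrates the notes above. It is a generic textbook version: the original raw estimator plus the linear-counting correction for small cardinalities. It is not this engine's implementation. The class name, the SHA-1 based 64-bit hash, and the integer `p` (index bits, giving `m = 2**p` registers) are assumptions for illustration and may not match how the DOUBLE `precision` parameter is interpreted. Like the documented function, the sketch skips NULLs (`None`), and an empty sketch estimates 0.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch; not the engine's actual implementation."""

    def __init__(self, p: int = 11) -> None:
        # p index bits give m = 2**p registers: larger p = lower error, more memory.
        if not 4 <= p <= 16:
            raise ValueError("p must be between 4 and 16")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value) -> None:
        # NULLs (None) are ignored, matching the documented NULL handling.
        if value is None:
            return
        # 64-bit hash; SHA-1 is used here only as a stable, well-mixed hash.
        h = int.from_bytes(hashlib.sha1(repr(value).encode()).digest()[:8], "big")
        width = 64 - self.p
        idx = h >> width               # top p bits pick the register
        rest = h & ((1 << width) - 1)  # remaining bits determine the rank
        # Rank = 1-based position of the leftmost 1-bit in `rest` (width + 1 if all zero).
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> int:
        m = self.m
        alpha = {16: 0.673, 32: 0.697, 64: 0.709}.get(m, 0.7213 / (1 + 1.079 / m))
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            # Small-range correction (linear counting); an empty sketch yields 0.
            return round(m * math.log(m / zeros))
        return round(raw)


sketch = HyperLogLog(p=11)
for user_id in [1, 2, 2, None, 3, 3, 3]:
    sketch.add(user_id)
print(sketch.estimate())  # an estimate of the 3 distinct non-NULL values
```

Each register only keeps a running maximum, so two sketches can be merged register by register with `max`. This property keeps per-group and distributed aggregation cheap compared with exact COUNT(DISTINCT).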