APPROX_DISTINCT() GROUP BY ...

Returns the approximate number of distinct values in the group.

Syntax

Diagram(
  Sequence(
    Terminal("APPROX_DISTINCT"),
    Terminal("("),
    NonTerminal("expr"),
    Terminal(")"),
    Choice(0, Skip(),
      Sequence(
        Terminal("FILTER"),
        Terminal("("),
        Terminal("WHERE"),
        NonTerminal("condition"),
        Terminal(")")
      )
    ),
    Choice(0, Skip(),
      Sequence(
        Terminal("GROUP BY"),
        OneOrMore(NonTerminal("feature"), Terminal(","))
      )
    )
  )
)

Parameter	Type	Required	Description
`expr`	`T`	Yes	The expression to count distinct values of
`condition`	`BOOLEAN`	No	The condition to filter the values before aggregation
`feature`	`FEATURE`	No	The features to group by (one or more, comma-separated)

Parameter	Type	Required	Description
`expr`	`T`	Yes	The expression to count distinct values of
`precision`	`DOUBLE`	Yes	Precision parameter for accuracy control
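
To make the roles of `expr`, `condition`, and `feature` concrete, here is a minimal Python sketch of the exact result that APPROX_DISTINCT estimates. It is not the engine's implementation. The function name `exact_distinct_by_group`, the dictionary row layout, and the column names `country`, `user`, and `active` are hypothetical. It also assumes standard SQL FILTER semantics, where a group whose rows are all filtered out still appears with a count of 0.

```python
from collections import defaultdict


def exact_distinct_by_group(rows, expr, group_by, condition=None):
    """Exact distinct count of expr(row) per group (the value being approximated)."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[f] for f in group_by)
        bucket = seen[key]  # registers the group even if this row is filtered out
        if condition is not None and not condition(row):
            continue  # FILTER (WHERE condition): drop the row before aggregating
        value = expr(row)
        if value is not None:  # NULL values are excluded from the count
            bucket.add(value)
    return {key: len(values) for key, values in seen.items()}


# Hypothetical sample data
rows = [
    {"country": "FR", "user": "a", "active": True},
    {"country": "FR", "user": "a", "active": True},
    {"country": "FR", "user": "b", "active": False},
    {"country": "DE", "user": None, "active": True},
]
result = exact_distinct_by_group(
    rows,
    expr=lambda r: r["user"],
    group_by=["country"],
    condition=lambda r: r["active"],
)
print(result)  # {('FR',): 1, ('DE',): 0}
```

In the `FR` group, user `b` is removed by the condition and the duplicate `a` is counted once. The `DE` group contains only a NULL value, so its count is 0.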

Notes

Uses the HyperLogLog algorithm for efficient approximate counting
Typically much faster than COUNT(DISTINCT) on large datasets
Provides a probabilistic estimate with a controllable error rate
NULL values are excluded from the count
The precision parameter controls the tradeoff between accuracy and memory use
Typical accuracy: within 2-3% of exact count
Returns 0 for empty groups
Can be used with a FILTER (WHERE ...) clause to filter values before aggregation
Can be used with a GROUP BY clause for grouped aggregation
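
The notes above can be illustrated with a small Python sketch of a textbook HyperLogLog estimator. This page does not say which HyperLogLog variant or hash function the engine uses, nor how the DOUBLE `precision` value maps to the size of the sketch, so everything here is an assumption made for illustration: the names `approx_distinct` and `_hash64`, the integer parameter `p` (the sketch keeps 2**p registers), the BLAKE2b-based 64-bit hash, and the linear-counting correction for small cardinalities.

```python
import hashlib
import math


def _hash64(value):
    """Stable 64-bit hash of a value's repr (an illustrative choice of hash)."""
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def approx_distinct(values, p=12):
    """Estimate the number of distinct non-None values using 2**p registers."""
    if not 7 <= p <= 16:
        raise ValueError("p must be between 7 and 16 in this sketch")
    m = 1 << p
    registers = [0] * m
    for v in values:
        if v is None:  # NULL values are excluded from the count
            continue
        h = _hash64(v)
        idx = h >> (64 - p)                 # top p bits select a register
        rest = h & ((1 << (64 - p)) - 1)    # remaining 64 - p bits
        # 1-based position of the leftmost 1-bit within the remaining bits
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    alpha = 0.7213 / (1 + 1.079 / m)        # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: linear counting on the empty registers
        return round(m * math.log(m / zeros))
    return round(raw)


print(approx_distinct([]))                 # 0
print(approx_distinct([1, 1, 1, None]))    # 1
print(approx_distinct(range(100_000)))     # an estimate close to 100000
```

In this sketch, memory is fixed at 2**p small registers regardless of input size, and the standard error of the textbook estimator is roughly 1.04 / sqrt(2**p), or about 1.6% for `p=12`. Raising `p` buys accuracy at the cost of memory, which is the tradeoff the precision parameter is described as controlling.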

Last updated: 2026/03/03 16:47:38
