Windowed aggregates are a SQL 2003 feature for computing aggregates in moving windows. Common examples include cumulative sums, local maxima and moving quantiles. With the advent over the last few years of easy-to-use data analytics tools, these functions are becoming widely used by more and more analysts, but some aggregates (such as local maxima) are much easier to compute than others (such as moving quantiles). Nevertheless, aggregates that are more difficult to compute, like quantile and mode (or “most frequent”) provide more appropriate statistical summaries in the common situation when a distribution is not Gaussian and are an essential part of a data analysis toolkit.
Recent work has described highly efficient windowed implementations of the most common aggregate function categories, including distributive aggregates such as cumulative sums and algebraic aggregates such as moving averages. But little has been published on either the implementation or the performance of the more complex holistic windowed aggregates such as moving quantiles.
This paper provides the first in-depth study of how to efficiently implement the three most common holistic windowed aggregates (count distinct, mode and quantile) by reusing the aggregate state between consecutive frames. Our measurements show that these incremental algorithms generally achieve improvements of about 10× over naïve implementations, and that they can effectively detect when to reset the internal state during extreme frame variation.