Fang et al. (1999) Computing Iceberg Queries Efficiently (VLDB'98)

Find the elements of a multiset (a set with duplicates) that have the top-\(K\) frequencies. Two approaches are proposed: sampling and coarse counting. Sampling draws \(s\) samples from a pool of \(N\) elements and counts the frequency of each element within the sample. Each count is then scaled by \(N/s\) to estimate the element's frequency in the full pool. Afterwards, report those with scaled frequency larger than...
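
The sampling estimate described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names `sampled_frequencies` and `report_above` are hypothetical, sampling without replacement is an assumption, and the cutoff is left as a caller-supplied `threshold` because the exact value is not given in the text above.

```python
import random
from collections import Counter


def sampled_frequencies(items, s, seed=None):
    """Estimate element frequencies in `items` from a random sample of size s.

    Assumption: the sample is drawn uniformly without replacement.
    Each sample count is scaled by N/s, where N = len(items).
    """
    rng = random.Random(seed)
    n = len(items)
    sample = rng.sample(items, s)  # requires 0 <= s <= n
    scale = n / s
    return {x: count * scale for x, count in Counter(sample).items()}


def report_above(estimates, threshold):
    """Return the elements whose scaled frequency exceeds `threshold`.

    `threshold` is a hypothetical parameter chosen by the caller.
    """
    return sorted(x for x, freq in estimates.items() if freq > threshold)


if __name__ == "__main__":
    pool = ["a"] * 6 + ["b"] * 3 + ["c"]
    estimates = sampled_frequencies(pool, s=5, seed=0)
    print(estimates)
    print(report_above(estimates, threshold=2))
```

Because the estimate depends on which elements happen to be sampled, a small \(s\) can both miss frequent elements and overstate rare ones; taking \(s = N\) reduces to exact counting.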