Fang et al. (1999) Computing Iceberg Queries Efficiently (VLDB'98)

Find the elements of a multiset (a set with duplicates) that have the top-$K$ frequencies. Two approaches are proposed: sampling and coarse counting. Sampling draws $s$ samples from a pool of $N$ elements and counts the frequency of each element within the sample. Each count is then scaled by $N/s$ to estimate the frequency in the full pool. Afterwards, report those with scaled frequency larger than...
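The sampling approach can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name `sample_and_scale`, the `seed` argument, and sampling without replacement are assumptions made here, and since the cutoff is cut off in the text above, `threshold` is a hypothetical parameter supplied by the caller.

```python
import random
from collections import Counter


def sample_and_scale(items, s, threshold, seed=0):
    """Estimate frequencies from s samples and keep the heavy elements.

    items:     list holding the full pool (N = len(items)), duplicates allowed
    s:         number of samples to draw, with s <= N
    threshold: hypothetical cutoff on the scaled frequency (assumption)
    seed:      fixes the random draw so runs are repeatable (assumption)
    """
    rng = random.Random(seed)
    n = len(items)

    # Draw s elements without replacement and count them within the sample.
    sample = rng.sample(items, s)
    counts = Counter(sample)

    # Scale each sample count by N/s to estimate the count in the full pool.
    scale = n / s
    estimates = {x: c * scale for x, c in counts.items()}

    # Report only elements whose scaled frequency exceeds the cutoff.
    return {x: est for x, est in estimates.items() if est > threshold}


if __name__ == "__main__":
    pool = ["a"] * 600 + ["b"] * 300 + ["c"] * 100
    print(sample_and_scale(pool, s=100, threshold=200))
```

Because the estimates come from a sample, an element near the cutoff can be wrongly reported or wrongly missed; a larger $s$ reduces that error at the cost of more counting work.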