count min sketch
TIL about Count-Min Sketch, an algorithm used to answer questions like:
- What’s the frequency of our samples?
- What’s our most frequent samples?
Similar to bloom filters, it uses
k distinct hash functions. Every observed
value serves as an input for these functions and the output for each function
is a number corresponding to a bucket.After finding the buckets we just need
to increment a counter in each of these buckets.
In order to retrieve the frequency of an observed value we find the buckets for
this value and return the minimum counter among all the buckets:
min(bucket_x, bucket_y, bucket_z)
A short version the paper can be found here.