What are probabilistic data structures?

There are probably a lot of different (and good) answers, but in my humble opinion, the common characteristics of probabilistic data structures is that they provide you with approximate, not precise answer.

How many items are here? About 1523425 with probability of 99%

Update: Quick search produced link to decent article on the issue:

https://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/


If you are interested in probabilistic data structures, you might want to read my recently published book "Probabilistic Data Structures and Algorithms for Big Data Applications" (ISBN: 9783748190486, available at Amazon) where I have explained many of such space-efficient data structures and fast algorithms that are extremely useful in modern Big Data applications.

In this book, you can find the state of the art algorithms and data structures that help to handle such common problems in Big Data processing as

  • Membership querying (Bloom filter, Counting Bloom filter, Quotient filter, Cuckoo filter).
  • Cardinality (Linear counting, probabilistic counting, LogLog, HyperLogLog, HyperLogLog++).
  • Frequency (Majority algorithm, Frequent, Count Sketch, Count-Min Sketch).
  • Rank (Random sampling, q-digest, t-digest).
  • Similarity (LSH, MinHash, SimHash).

You can get a free preview and all related information about the book at https://pdsa.gakhov.com