Category: Probabilistic data structures

Locality-sensitive hashing
In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same "buckets" with high probability. (The number of buckets is much smaller
Approximate Membership Query Filter
Approximate Membership Query Filter (AMQ-Filter) is a group of space-efficient probabilistic data structures that support approximate membership queries. An approximate membership query answers if an
Random binary tree
In computer science and probability theory, a random binary tree is a binary tree selected at random from some probability distribution on binary trees. Two different distributions are commonly used:
Count–min sketch
In computing, the count–min sketch (CM sketch) is a probabilistic data structure that serves as a frequency table of events in a stream of data. It uses hash functions to map events to frequencies, bu
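The idea of mapping events to frequencies through hash functions can be illustrated with a minimal sketch. The class name `CountMinSketch`, the `width` and `depth` parameters, and the use of salted SHA-256 as the hash family are assumptions made for illustration only; they are not taken from any particular library.

```python
import hashlib


class CountMinSketch:
    """Illustrative count-min sketch: `depth` rows of `width` counters."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Assumption: one hash function per row, simulated by salting
        # SHA-256 with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over the
        # rows is the tightest available estimate; it never undercounts.
        return min(
            self.table[row][self._index(row, item)]
            for row in range(self.depth)
        )


cms = CountMinSketch()
for event in ["a", "b", "a", "c", "a"]:
    cms.add(event)
```

With this sketch, `cms.estimate("a")` is at least 3: the estimate may exceed the true count when items collide, but it is never lower.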
MinHash
In computer science and data mining, MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are. The scheme was
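A rough sketch of the estimation step, under stated assumptions: each set is reduced to a signature holding the minimum hash value under several hash functions, and the fraction of positions where two signatures agree estimates the Jaccard similarity of the sets. The function names and the salted SHA-256 hashes (used here in place of true random permutations) are hypothetical choices for illustration.

```python
import hashlib


def _hash(seed, item):
    # Assumption: salted SHA-256 stands in for a family of random
    # permutations of the universe of items.
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=128):
    """Return the per-hash minimum over a non-empty collection of items."""
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions on which the two sets agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


set_a = {"bloom", "cuckoo", "quotient", "treap"}
set_b = {"bloom", "cuckoo", "quotient", "skip list"}
similarity = estimate_similarity(
    minhash_signature(set_a), minhash_signature(set_b)
)
```

The two example sets share 3 of 5 distinct items, so the estimate should land near 0.6, with an error that shrinks as `num_hashes` grows.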
Bloom filter
A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false neg
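The membership test can be illustrated with a minimal sketch. The class name `BloomFilter`, the `num_bits` and `num_hashes` parameters, and the salted SHA-256 hash family are assumptions made for illustration; a practical implementation would pack bits and size the filter from a target false-positive rate.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter backed by a byte-per-bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Assumption: k hash functions simulated by salting SHA-256
        # with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True means "possibly in the set"; False means "definitely not".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ("treap", "skip list", "minhash"):
    bf.add(word)
```

Every added item is always reported as present. An item that was never added is usually reported as absent, but it can be reported as present if all of its positions happen to have been set by other items.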
Cuckoo filter
A cuckoo filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set, like a Bloom filter does. False positive matches are possible, but false
Kinetic hanger
A kinetic hanger is a randomized version of a kinetic heap whose performance is easy to analyze tightly. A kinetic hanger satisfies the heap property (the priority of each element is higher than the p
Skip list
In computer science, a skip list (or skiplist) is a probabilistic data structure that allows O(log n) average complexity for search as well as O(log n) average complexity for insertion within an ordered sequence of ele
Kinetic heater
A kinetic heater is a kinetic priority queue, similar to a kinetic heap, that makes use of randomization to simplify its analysis in a way similar to a treap. Specifically, each element has a random ke
Treap
In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary se
Quotient filter
A quotient filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set (an approximate membership query filter, AMQ). A query will elicit a reply spec
SimHash
In computer science, SimHash is a technique for quickly estimating how similar two sets are. The algorithm is used by the Google Crawler to find near-duplicate pages. It was created by Moses Charikar.
Random tree
In mathematics and computer science, a random tree is a tree or arborescence that is formed by a stochastic process. Types of random trees include: * Uniform spanning tree, a spanning tree of a given
HyperLogLog
HyperLogLog is an algorithm for the count-distinct problem, approximating the number of distinct elements in a multiset. Calculating the exact cardinality of the unique elements of a multiset requires
Rapidly-exploring random tree
A rapidly exploring random tree (RRT) is an algorithm designed to efficiently search nonconvex, high-dimensional spaces by randomly building a space-filling tree. The tree is constructed incrementally