Category: Probabilistic data structures

Locality-sensitive hashing

In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same "buckets" with high probability. (The number of buckets is much smaller

Approximate Membership Query Filter

Approximate Membership Query Filter (AMQ-Filter) is a group of space-efficient probabilistic data structures that supports approximate membership queries. An approximate membership query answers if an

Random binary tree

In computer science and probability theory, a random binary tree is a binary tree selected at random from some probability distribution on binary trees. Two different distributions are commonly used:

Count–min sketch

In computing, the count–min sketch (CM sketch) is a probabilistic data structure that serves as a frequency table of events in a stream of data. It uses hash functions to map events to frequencies, bu

MinHash

In computer science and data mining, MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are. The scheme was

Bloom filter

A Bloom filter is a space-efficient probabilistic data structure, conceived by in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false neg

Cuckoo filter

A cuckoo filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set, like a Bloom filter does. False positive matches are possible, but false

Kinetic hanger

A Kinetic hanger is a randomized version of a kinetic heap whose performance is easy to analyze tightly. A kinetic hanger satisfies the heap property (the priority of each element is higher than the p

Skip list

In computer science, a skip list (or skiplist) is a probabilistic data structure that allows average complexity for search as well as average complexity for insertion within an ordered sequence of ele

Kinetic heater

A Kinetic Heater is a kinetic priority queue similar to a kinetic heap, that makes use of randomization to simplify its analysis in a way similar to a treap. Specifically, each element has a random ke

Treap

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary se

Quotient filter

A quotient filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set (an approximate membership query filter, AMQ). A query will elicit a reply spec

SimHash

In computer science, SimHash is a technique for quickly estimating how similar two sets are. The algorithm is used by the Google Crawler to find near duplicate pages. It was created by Moses Charikar.

Random tree

In mathematics and computer science, a random tree is a tree or arborescence that is formed by a stochastic process. Types of random trees include: * Uniform spanning tree, a spanning tree of a given

HyperLogLog

HyperLogLog is an algorithm for the count-distinct problem, approximating the number of distinct elements in a multiset. Calculating the exact cardinality of the unique elements of a multiset requires

Rapidly-exploring random tree

A rapidly exploring random tree (RRT) is an algorithm designed to efficiently search nonconvex, high-dimensional spaces by randomly building a space-filling tree. The tree is constructed incrementally