Category: Statistical distance

Normalized compression distance
Normalized compression distance (NCD) is a way of measuring the similarity between two objects, be it two documents, two letters, two emails, two music scores, two languages, two programs, two pictures, and so on.
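A minimal sketch of the idea, using zlib as a stand-in for an ideal compressor; the function name is illustrative, and any real-world compressor (gzip, bzip2, lzma) could be substituted.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Approximate normalized compression distance using zlib as the compressor."""
    cx = len(zlib.compress(x))          # C(x)
    cy = len(zlib.compress(y))          # C(y)
    cxy = len(zlib.compress(x + y))     # C(xy): compressed size of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)

# Similar strings compress well together, so their NCD is small.
print(ncd(b"the quick brown fox" * 20, b"the quick brown fox!" * 20))
print(ncd(b"the quick brown fox" * 20, b"lorem ipsum dolor sit" * 20))
```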
Stein's method
Stein's method is a general method in probability theory to obtain bounds on the distance between two probability distributions with respect to a probability metric. It was introduced by Charles Stein, who first published it in 1972.
Cramér–von Mises criterion
In statistics, the Cramér–von Mises criterion is a criterion used for judging the goodness of fit of a cumulative distribution function compared to a given empirical distribution function, or for comparing two empirical distributions.
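A sketch of the one-sample statistic T = n·ω², computed directly from its standard order-statistic form; recent SciPy versions also expose a ready-made test (scipy.stats.cramervonmises), which this hand-rolled helper is not meant to replace.

```python
import numpy as np
from scipy.stats import norm

def cramer_von_mises_T(sample, cdf):
    """One-sample Cramér–von Mises statistic T = n*omega^2 against a hypothesized CDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    return 1.0 / (12 * n) + np.sum(((2 * i - 1) / (2 * n) - cdf(x)) ** 2)

rng = np.random.default_rng(0)
print(cramer_von_mises_T(rng.normal(size=200), norm.cdf))   # small: good fit to N(0, 1)
print(cramer_von_mises_T(rng.uniform(size=200), norm.cdf))  # larger: poor fit
```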
Hellinger distance
In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions.
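A short sketch for the discrete case, where the distance is the Euclidean norm of the difference of the square-root probability vectors, scaled by 1/√2 so it lies in [0, 1]; the function name is illustrative.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as probability vectors."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

print(hellinger([0.1, 0.4, 0.5], [0.1, 0.4, 0.5]))   # 0.0 for identical distributions
print(hellinger([1.0, 0.0], [0.0, 1.0]))             # 1.0 for disjoint supports
```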
Normalized Google distance
The Normalized Google Distance (NGD) is a semantic similarity measure derived from the number of hits returned by the Google search engine for a given set of keywords. Keywords with the same or similar meanings in a natural language sense tend to be "close" in units of Normalized Google Distance, while words with dissimilar meanings tend to be farther apart.
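A sketch of the distance as a pure formula over hit counts; it does not query any search engine, so the counts below are hypothetical values the caller would have to supply.

```python
from math import log

def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google distance from raw hit counts:
    fx, fy  - number of pages containing each term alone
    fxy     - number of pages containing both terms
    n       - (approximate) total number of pages indexed
    """
    lfx, lfy, lfxy = log(fx), log(fy), log(fxy)
    return (max(lfx, lfy) - lfxy) / (log(n) - min(lfx, lfy))

# Hypothetical counts: terms that frequently co-occur get a small distance.
print(ngd(fx=8e6, fy=5e6, fxy=3e6, n=5e10))
```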
Earth mover's distance
In statistics, the earth mover's distance (EMD) is a measure of the distance between two probability distributions over a region D. In mathematics, this is known as the Wasserstein metric. Informally, if the distributions are interpreted as two ways of piling up a certain amount of earth over the region, the EMD is the minimum cost of turning one pile into the other, where the cost is the amount of earth moved multiplied by the distance over which it is moved.
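In one dimension the optimal plan has a closed form: for two histograms on the same equally spaced bins with equal total mass, the EMD is the L1 distance between their cumulative sums. A minimal sketch under exactly those assumptions (the function name is illustrative):

```python
import numpy as np

def emd_1d(p, q):
    """Earth mover's distance between two 1-D histograms defined on the same
    equally spaced bins and carrying equal total mass."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.abs(np.cumsum(p - q)).sum()

print(emd_1d([0.0, 1.0, 0.0], [0.0, 0.0, 1.0]))  # move all mass one bin -> 1.0
print(emd_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))  # shift everything right by one bin -> 1.0
```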
Kolmogorov–Smirnov test
In statistics, the Kolmogorov–Smirnov test (K–S test or KS test) is a nonparametric test of the equality of continuous (or discontinuous), one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test).
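Both variants are available off the shelf in SciPy; a small usage sketch with synthetic data:

```python
import numpy as np
from scipy.stats import kstest, ks_2samp

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = rng.normal(loc=0.3, size=500)

print(kstest(x, "norm"))   # one-sample test of x against a standard normal
print(ks_2samp(x, y))      # two-sample test: are x and y drawn from the same distribution?
```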
Jensen–Shannon divergence
In probability theory and statistics, the Jensen–Shannon divergence is a method of measuring the similarity between two probability distributions. It is also known as information radius (IRad) or total divergence to the average. It is based on the Kullback–Leibler divergence, but unlike the latter it is symmetric and always has a finite value.
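A sketch for discrete distributions: average the KL divergence of each distribution against the mixture M = (P + Q)/2. With base-2 logarithms the result lies in [0, 1].

```python
import numpy as np

def kl(p, q):
    """Kullback–Leibler divergence for discrete distributions (0*log 0 treated as 0)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def jsd(p, q):
    """Jensen–Shannon divergence: symmetrized, smoothed KL against the mixture M."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(jsd([1.0, 0.0], [0.0, 1.0]))   # maximal: 1 bit with log base 2
print(jsd([0.5, 0.5], [0.5, 0.5]))   # identical distributions: 0
```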
Process Window Index
Process Window Index (PWI) is a statistical measure that quantifies the robustness of a manufacturing process, e.g. one which involves heating and cooling, known as a thermal process. In manufacturing, PWI measures how well a process fits within user-defined specification limits.
DiShIn
DiShIn (Disjunctive Shared Information) is a method to calculate shared information content by complementing the value of the most informative common ancestor (MICA) with that of the disjunctive common ancestors.
Canonical divergence
No description available.
Minimum-distance estimation
Minimum-distance estimation (MDE) is a conceptual method for fitting a statistical model to data, usually the empirical distribution. Often-used estimators such as ordinary least squares can be thought of as special cases of minimum-distance estimation.
Semantic similarity
Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity.
Chentsov's theorem
In information geometry, Chentsov's theorem states that the Fisher information metric is, up to rescaling, the unique Riemannian metric on a statistical manifold that is invariant under sufficient statistics.
Bregman divergence
In mathematics, specifically statistics and information geometry, a Bregman divergence or Bregman distance is a measure of difference between two points, defined in terms of a strictly convex function.
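A generic sketch: given a strictly convex, differentiable generator F with gradient ∇F, the divergence is D_F(p, q) = F(p) − F(q) − ⟨∇F(q), p − q⟩. The two generators below recover squared Euclidean distance and generalized KL divergence respectively; the helper names are illustrative.

```python
import numpy as np

def bregman(F, gradF, p, q):
    """Bregman divergence D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return F(p) - F(q) - np.dot(gradF(q), p - q)

# Generator F(x) = ||x||^2 recovers the squared Euclidean distance.
sq = lambda x: np.dot(x, x)
sq_grad = lambda x: 2 * x
print(bregman(sq, sq_grad, [1.0, 2.0], [3.0, 0.0]))   # (1-3)^2 + (2-0)^2 = 8

# Generator F(x) = sum x*log x (negative entropy) recovers generalized KL divergence.
negent = lambda x: np.sum(x * np.log(x))
negent_grad = lambda x: np.log(x) + 1
print(bregman(negent, negent_grad, [0.5, 0.5], [0.9, 0.1]))
```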
Cook's distance
In statistics, Cook's distance or Cook's D is a commonly used estimate of the influence of a data point when performing a least-squares regression analysis. In a practical ordinary least squares analysis, Cook's distance can be used to flag influential data points that are particularly worth checking for validity.
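A sketch of the textbook formula D_i = (e_i² / (p·s²)) · h_ii / (1 − h_ii)², computed from the hat matrix of an OLS fit; in practice a statistics package (e.g. statsmodels' influence diagnostics) would be the usual route, and the function name here is illustrative.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each observation of an OLS fit of y on X
    (X is the design matrix and should already contain an intercept column)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat (projection) matrix
    h = np.diag(H)                              # leverages
    s2 = resid @ resid / (n - p)                # estimated error variance
    return (resid ** 2 / (p * s2)) * h / (1 - h) ** 2

rng = np.random.default_rng(2)
x = rng.normal(size=30)
y = 2 * x + rng.normal(scale=0.5, size=30)
y[0] += 8                                       # inject one gross outlier
X = np.column_stack([np.ones_like(x), x])
print(cooks_distance(X, y).round(3))            # the first observation stands out
```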
Deviation (statistics)
In mathematics and statistics, deviation is a measure of difference between the observed value of a variable and some other value, often that variable's mean. The sign of the deviation reports the direction of that difference: the deviation is positive when the observed value exceeds the reference value.
Similarity measure
In statistics and related fields, a similarity measure or similarity function or similarity metric is a real-valued function that quantifies the similarity between two objects. Although no single definition of similarity exists, such measures are usually, in some sense, the inverse of distance metrics: they take large values for similar objects and small or negative values for very dissimilar objects.
Fisher information metric
In information geometry, the Fisher information metric is a particular Riemannian metric which can be defined on a smooth statistical manifold, i.e., a smooth manifold whose points are probability measures defined on a common probability space.
Bhattacharyya distance
In statistics, the Bhattacharyya distance measures the similarity of two probability distributions. It is closely related to the Bhattacharyya coefficient, which is a measure of the amount of overlap between two statistical samples or populations.
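A sketch for the discrete case: the coefficient is the sum of the elementwise geometric means of the two probability vectors, and the distance is its negative logarithm (the function name is illustrative). Taking arccos of the same coefficient instead of −ln gives the Bhattacharyya angle listed further down.

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient and distance for two discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    bc = np.sum(np.sqrt(p * q))        # coefficient: amount of overlap, in [0, 1]
    return bc, -np.log(bc)             # distance: -ln(coefficient)

print(bhattacharyya([0.2, 0.5, 0.3], [0.1, 0.4, 0.5]))
```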
Optimal matching
Optimal matching is a sequence analysis method used in social science to assess the dissimilarity of ordered arrays of tokens that usually represent a time-ordered sequence of socio-economic states that individuals have experienced.
Discrepancy (statistics)
No description available.
Pitman closeness criterion
In statistical theory, the Pitman closeness criterion, named after E. J. G. Pitman, is a way of comparing two candidate estimators for the same parameter. Under this criterion, estimator A is preferred to estimator B if A is closer to the true value of the parameter more often than B is.
Mahalanobis distance
The Mahalanobis distance is a measure of the distance between a point P and a distribution D, introduced by P. C. Mahalanobis in 1936. Mahalanobis's definition was prompted by the problem of identifying the similarities of skulls based on measurements.
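A sketch using the sample mean and covariance of a data cloud: d(x) = √((x − μ)ᵀ Σ⁻¹ (x − μ)). The two test points are equally far in Euclidean terms, but the one lying against the correlation structure is much farther in Mahalanobis terms.

```python
import numpy as np

def mahalanobis(x, data):
    """Mahalanobis distance from point x to the distribution estimated from `data`
    (rows = observations) via its sample mean and covariance."""
    x, data = np.asarray(x, dtype=float), np.asarray(data, dtype=float)
    mu = data.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
    d = x - mu
    return np.sqrt(d @ cov_inv @ d)

rng = np.random.default_rng(3)
cloud = rng.multivariate_normal([0, 0], [[2.0, 1.5], [1.5, 2.0]], size=1000)
print(mahalanobis([1.0, 1.0], cloud))   # along the correlation axis: close
print(mahalanobis([1.0, -1.0], cloud))  # against the correlation: much farther
```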
Wasserstein metric
In mathematics, the Wasserstein distance or Kantorovich–Rubinstein metric is a distance function defined between probability distributions on a given metric space. It is named after Leonid Vaseršteĭn.
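For one-dimensional samples of equal size, the order-1 Wasserstein distance reduces to the mean absolute difference of the sorted values; a sketch comparing that shortcut against SciPy's general-purpose routine:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(4)
x = rng.normal(loc=0.0, size=1000)
y = rng.normal(loc=0.5, size=1000)

# For equal-size 1-D samples, W1 is the mean absolute difference of sorted values.
w1_manual = np.mean(np.abs(np.sort(x) - np.sort(y)))
print(w1_manual, wasserstein_distance(x, y))   # both approximately 0.5
```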
Information distance
Information distance is the distance between two finite objects (represented as computer files) expressed as the number of bits in the shortest program which transforms one object into the other, or vice versa, on a universal computer.
Signal-to-noise statistic
In mathematics, the signal-to-noise statistic distance between two vectors a and b with mean values μ_a and μ_b and standard deviations σ_a and σ_b respectively is D_sn = (μ_a − μ_b) / (σ_a + σ_b). In the case of Gaussian-distributed data and unbiased class distributions, this statistic can be related to classification accuracy under ideal linear discrimination.
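A two-line sketch of the statistic; using the unbiased (n − 1) standard deviation is an assumption here, and the function name is illustrative.

```python
import numpy as np

def snr_distance(a, b):
    """Signal-to-noise statistic distance: difference of means over sum of standard deviations."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return (a.mean() - b.mean()) / (a.std(ddof=1) + b.std(ddof=1))

rng = np.random.default_rng(5)
print(snr_distance(rng.normal(1.0, 1.0, 500), rng.normal(0.0, 1.0, 500)))  # about 0.5
```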
Circular error probable
In the military science of ballistics, circular error probable (CEP) (also circular error probability or circle of equal probability) is a measure of a weapon system's precision. It is defined as the radius of a circle, centred on the mean point of impact, that is expected to enclose the landing points of 50% of the rounds.
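A simple empirical estimate under those terms is the median radial miss distance from the mean point of impact; this is a sketch of that estimator, not the parametric CEP formulas used in ballistics practice.

```python
import numpy as np

def cep(impacts):
    """Empirical circular error probable: the median radial distance of the
    impact points from their mean point of impact (a 50% containment radius)."""
    impacts = np.asarray(impacts, dtype=float)
    radii = np.linalg.norm(impacts - impacts.mean(axis=0), axis=1)
    return np.median(radii)

rng = np.random.default_rng(6)
shots = rng.normal(loc=[100.0, 50.0], scale=3.0, size=(1000, 2))
print(cep(shots))   # roughly 1.18 * 3 ≈ 3.5 for circular normal scatter
```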
Distance correlation
In statistics and in probability theory, distance correlation or distance covariance is a measure of dependence between two paired random vectors of arbitrary, not necessarily equal, dimension. The population distance correlation coefficient is zero if and only if the random vectors are independent, so it captures nonlinear as well as linear dependence.
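A sketch of the sample version: double-center each pairwise distance matrix, take the mean of their elementwise product as the squared distance covariance, and normalize by the distance standard deviations. The helper names are illustrative.

```python
import numpy as np

def _centered_dists(z):
    """Double-centered pairwise Euclidean distance matrix of one sample."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation between paired samples x and y
    (same number of rows; the number of columns may differ)."""
    A, B = _centered_dists(x), _centered_dists(y)
    dcov2_xy = (A * B).mean()
    dcov2_xx = (A * A).mean()
    dcov2_yy = (B * B).mean()
    return np.sqrt(dcov2_xy / np.sqrt(dcov2_xx * dcov2_yy))

rng = np.random.default_rng(7)
t = rng.uniform(-1, 1, 500)
print(distance_correlation(t, t ** 2))                 # nonlinear dependence is detected
print(distance_correlation(t, rng.normal(size=500)))   # near zero for independent data
```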
Stein discrepancy
A Stein discrepancy is a statistical divergence between two probability measures that is rooted in Stein's method. It was first formulated as a tool to assess the quality of Markov chain Monte Carlo samplers.
Second-order co-occurrence pointwise mutual information
In computational linguistics, second-order co-occurrence pointwise mutual information is a semantic similarity measure. To assess the degree of association between two given words, it uses pointwise mutual information to sort lists of important neighbour words of the two target words from a large corpus.
Squared Euclidean distance
No description available.
Divergence (statistics)
In information geometry, a divergence is a kind of statistical distance: a binary function which establishes the separation from one probability distribution to another on a statistical manifold. Unlike a metric, a divergence need not be symmetric and need not satisfy the triangle inequality.
Bhattacharyya angle
In statistics, the Bhattacharyya angle, also called the statistical angle, is a measure of distance between two probability measures defined on a finite probability space. It is defined as Δ(p, q) = arccos(Σ_i √(p_i q_i)), where p_i and q_i are the probabilities assigned to the i-th point of the space, so that the argument of the arccosine is the Bhattacharyya coefficient.
Statistical distance
In statistics, probability theory, and information theory, a statistical distance quantifies the distance between two statistical objects, which can be two random variables, two probability distributions or samples, or the distance between an individual sample point and a population or a wider sample of points.
Kendall tau distance
The Kendall tau rank distance is a metric (distance function) that counts the number of pairwise disagreements between two ranking lists. The larger the distance, the more dissimilar the two lists are.
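A straightforward O(n²) sketch that counts discordant pairs between two rankings (the function name is illustrative; efficient implementations use a merge-sort-style counting approach).

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Number of pairwise disagreements between two rankings.
    rank_a[i] and rank_b[i] are the positions item i receives in each ranking."""
    n = len(rank_a)
    return sum(
        1
        for i, j in combinations(range(n), 2)
        if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) < 0
    )

# Identical rankings except the last two items are swapped: one discordant pair.
print(kendall_tau_distance([1, 2, 3, 4, 5], [1, 2, 3, 5, 4]))   # -> 1
print(kendall_tau_distance([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))   # -> 10 (all pairs disagree)
```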
Energy distance
Energy distance is a statistical distance between probability distributions. If X and Y are independent random vectors in Rd with cumulative distribution functions (cdf) F and G respectively, then the energy distance between the two distributions is defined as the square root of D²(F, G) = 2E‖X − Y‖ − E‖X − X′‖ − E‖Y − Y′‖, where X′ and Y′ are independent copies of X and Y.
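A sketch of the sample version, replacing each expectation with the mean of the corresponding pairwise Euclidean distances between observations; the helper names are illustrative.

```python
import numpy as np

def _pairwise_mean(a, b):
    """Mean Euclidean distance over all pairs drawn from samples a and b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).mean()

def energy_distance(x, y):
    """Sample energy distance between two samples (rows = observations)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    d2 = 2 * _pairwise_mean(x, y) - _pairwise_mean(x, x) - _pairwise_mean(y, y)
    return np.sqrt(d2)

rng = np.random.default_rng(8)
print(energy_distance(rng.normal(0, 1, 400), rng.normal(0, 1, 400)))  # near zero
print(energy_distance(rng.normal(0, 1, 400), rng.normal(2, 1, 400)))  # clearly positive
```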