Dunn index

The Dunn index (DI), introduced by J. C. Dunn in 1974, is a metric for evaluating clustering algorithms. It is part of a group of validity indices that also includes the Davies–Bouldin index and the Silhouette index.
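
For illustration, a minimal sketch of the index on a toy dataset (the data and function names here are illustrative, not from any particular library): the Dunn index is the smallest inter-cluster distance divided by the largest intra-cluster diameter, so larger values indicate better separation.

```python
from itertools import combinations

def euclid(a, b):
    """Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def dunn_index(clusters):
    """Dunn index: min inter-cluster distance / max intra-cluster diameter."""
    # Smallest distance between points lying in different clusters.
    inter = min(
        euclid(p, q)
        for ci, cj in combinations(clusters, 2)
        for p in ci for q in cj
    )
    # Largest distance between two points within the same cluster.
    intra = max(
        (euclid(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    return inter / intra

clusters = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 5.0), (5.0, 6.0)]]
print(dunn_index(clusters))  # larger values indicate better-separated clusters
```

This pairwise formulation is the original 1974 definition; practical implementations often substitute cheaper inter/intra distance estimates such as centroid distances.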

P4-metric

The P4 metric enables performance evaluation of a binary classifier. It is calculated from precision, recall, specificity and NPV (negative predictive value). P4 is designed in a similar way to the F1 metric, but it also accounts for performance on the negative class.
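
A minimal sketch, assuming P4 is the harmonic mean of the four component metrics (the function name and example counts are illustrative):

```python
def p4_metric(tp, fp, tn, fn):
    """P4: harmonic mean of precision, recall, specificity and NPV."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    npv = tn / (tn + fn)
    return 4 / (1 / precision + 1 / recall + 1 / specificity + 1 / npv)

# Confusion-matrix counts: 50 TP, 10 FP, 30 TN, 10 FN.
print(p4_metric(50, 10, 30, 10))
```

Like F1, P4 goes to zero whenever any one of its components goes to zero, so a classifier that ignores either class is penalized.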

Automatic clustering algorithms

Automatic clustering algorithms are algorithms that can perform clustering without prior knowledge of data sets. In contrast with other cluster analysis techniques, automatic clustering algorithms can determine the optimal number of clusters even in the presence of noise and outliers.

MinHash

In computer science and data mining, MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are. The scheme was invented by Andrei Broder in 1997 and was initially used in the AltaVista search engine to detect duplicate web pages.
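
A minimal sketch of the idea (the hash family here is a simple `a*x + b mod p` construction for illustration; production implementations use stronger hash functions): each set is reduced to a short signature of per-hash minima, and the fraction of matching signature positions estimates the Jaccard similarity.

```python
import random

def make_hashers(k, seed=0, prime=2_147_483_647):
    """Build k random hash functions of the form (a*hash(x) + b) mod prime."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, prime), rng.randrange(prime)) for _ in range(k)]
    return [lambda x, a=a, b=b: (a * hash(x) + b) % prime for a, b in params]

def minhash_signature(s, hashers):
    """Signature: for each hash function, the minimum hash over the set."""
    return [min(h(x) for x in s) for h in hashers]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

hashers = make_hashers(512)
a = minhash_signature(set(range(100)), hashers)
b = minhash_signature(set(range(50, 150)), hashers)
print(estimated_jaccard(a, b))  # true Jaccard similarity is 50/150 = 1/3
```

The estimate's standard error shrinks as 1/sqrt(k), so the signature length trades accuracy against space.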

Variation of information

In probability theory and information theory, the variation of information or shared information distance is a measure of the distance between two clusterings (partitions of elements). It is closely related to mutual information; indeed, it is a simple linear expression involving the mutual information.
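
A minimal sketch of the computation (partitions given as lists of element lists; names are illustrative): VI(X; Y) = H(X) + H(Y) − 2 I(X; Y), which the sum below accumulates term by term from the joint cell probabilities.

```python
from math import log

def variation_of_information(part_a, part_b):
    """Variation of information between two partitions of the same set."""
    n = sum(len(c) for c in part_a)
    vi = 0.0
    for a in part_a:
        for b in part_b:
            r = len(set(a) & set(b)) / n  # joint probability of the cell
            if r > 0:
                p, q = len(a) / n, len(b) / n
                # Each nonempty cell contributes -r*(log(r/p) + log(r/q)).
                vi -= r * (log(r / p) + log(r / q))
    return vi

print(variation_of_information([[1, 2], [3, 4]], [[1, 2], [3, 4]]))  # identical partitions give 0
```

Unlike raw mutual information, this quantity is a true metric on the space of partitions, which makes it convenient for comparing clusterings.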

Silhouette (clustering)

Silhouette refers to a method of interpretation and validation of consistency within clusters of data. The technique provides a succinct graphical representation of how well each object has been classified. It was proposed by Belgian statistician Peter Rousseeuw in 1987.
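
A minimal sketch of the per-point silhouette value (the helper names are illustrative): with a the mean distance from a point to the other members of its own cluster and b the mean distance to the nearest other cluster, s = (b − a) / max(a, b), ranging from −1 (badly placed) to +1 (well placed).

```python
def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def silhouette(point, own, others):
    """Silhouette value s = (b - a) / max(a, b) for a single point."""
    # a: mean distance to the other members of the point's own cluster.
    a = sum(dist(point, p) for p in own if p != point) / (len(own) - 1)
    # b: mean distance to the members of the nearest neighbouring cluster.
    b = min(sum(dist(point, p) for p in c) / len(c) for c in others)
    return (b - a) / max(a, b)

s = silhouette((0.0, 0.0), [(0.0, 0.0), (0.0, 1.0)], [[(5.0, 0.0), (6.0, 0.0)]])
print(s)
```

Averaging s over all points gives the overall silhouette score often used to compare clusterings or choose the number of clusters.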

Determining the number of clusters in a data set

Determining the number of clusters in a data set, a quantity often labelled k as in the k-means algorithm, is a frequent problem in data clustering, and is a distinct issue from the process of actually solving the clustering problem.

SimHash

In computer science, SimHash is a technique for quickly estimating how similar two sets are. The algorithm is used by the Google Crawler to find near duplicate pages. It was created by Moses Charikar.

Elbow method (clustering)

In cluster analysis, the elbow method is a heuristic used in determining the number of clusters in a data set. The method consists of plotting the explained variation as a function of the number of clusters and picking the elbow of the curve as the number of clusters to use.

Jaccard index

The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used for gauging the similarity and diversity of sample sets. It was developed by Grove Karl Gilbert in 1884 as his ratio of verification (v), and was later developed independently by Paul Jaccard.
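
A minimal sketch (the edge-case convention for two empty sets is a common choice, not part of the original definition): the index is the size of the intersection divided by the size of the union.

```python
def jaccard(a, b):
    """Jaccard index: |A ∩ B| / |A ∪ B| for two finite sets."""
    a, b = set(a), set(b)
    if not (a | b):
        return 1.0  # convention: two empty sets are identical
    return len(a & b) / len(a | b)

print(jaccard({1, 2, 3}, {2, 3, 4}))  # intersection 2, union 4
```

The complement 1 − J is the Jaccard distance, a proper metric on finite sets.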

Adjusted mutual information

In probability theory and information theory, adjusted mutual information, a variation of mutual information, may be used for comparing clusterings. It corrects for the effect of agreement solely due to chance between clusterings.

Similarity measure

In statistics and related fields, a similarity measure or similarity function or similarity metric is a real-valued function that quantifies the similarity between two objects. Although no single definition of similarity exists, such measures are usually in some sense the inverse of distance metrics: they take on large values for similar objects and small or negative values for very dissimilar objects.

Fowlkes–Mallows index

The Fowlkes–Mallows index is an external evaluation method that is used to determine the similarity between two clusterings (clusters obtained after a clustering algorithm), and also a metric to measure confusion matrices.
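
A minimal sketch over label vectors (the O(n²) pair loop is for clarity; real implementations use contingency-table counts): counting element pairs that the two clusterings place together or apart, FM = TP / sqrt((TP + FP)(TP + FN)).

```python
from itertools import combinations

def fowlkes_mallows(labels_a, labels_b):
    """FM index: geometric mean of pairwise precision and recall."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        tp += same_a and same_b       # pair together in both clusterings
        fp += same_a and not same_b   # together in A only
        fn += same_b and not same_a   # together in B only
    return tp / ((tp + fp) * (tp + fn)) ** 0.5

print(fowlkes_mallows([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabelled
```

Because it only compares which pairs are grouped together, the index is invariant to permuting cluster labels.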

Dasgupta's objective

In the study of hierarchical clustering, Dasgupta's objective is a measure of the quality of a clustering, defined from a similarity measure on the elements to be clustered. It is named after Sanjoy Dasgupta, who formulated it in 2016.

Simple matching coefficient

The simple matching coefficient (SMC) or Rand similarity coefficient is a statistic used for comparing the similarity and diversity of sample sets. Given two objects, A and B, each with n binary attributes, the SMC is defined as the number of matching attributes divided by the total number of attributes.
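
A minimal sketch over equal-length binary vectors: unlike the Jaccard index, the SMC counts matching 0s as well as matching 1s.

```python
def smc(a, b):
    """Simple matching coefficient over two equal-length binary vectors."""
    matches = sum(x == y for x, y in zip(a, b))  # counts 0-0 and 1-1 matches
    return matches / len(a)

print(smc([1, 0, 1, 1], [1, 1, 0, 1]))  # matches at positions 0 and 3
```

Counting shared absences makes the SMC a better fit for symmetric binary attributes, where 0 and 1 are equally informative.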

Rand index

The Rand index or Rand measure (named after William M. Rand) in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings. A form of the Rand index that is adjusted for the chance grouping of elements is known as the adjusted Rand index.
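
A minimal sketch over label vectors (pairwise loop kept for clarity): the index is the fraction of element pairs on which the two clusterings agree, i.e. pairs grouped together in both or separated in both.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand index: fraction of element pairs treated the same by both clusterings."""
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in combinations(range(len(labels_a)), 2)
    )
    return agree / (len(labels_a) * (len(labels_a) - 1) // 2)

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabelled
```

The raw index tends toward high values even for random labelings, which is what motivates the chance-corrected adjusted Rand index.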

F-score

In statistical analysis of binary classification, the F-score or F-measure is a measure of a test's accuracy. It is calculated from the precision and recall of the test, where the precision is the number of true positive results divided by the number of all positive results, including those not identified correctly, and the recall is the number of true positive results divided by the number of all samples that should have been identified as positive.
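
A minimal sketch of the general F_beta family, of which F1 (beta = 1, the plain harmonic mean of precision and recall) is the most common member:

```python
def f_score(precision, recall, beta=1.0):
    """F_beta: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta < 1 weights precision.
    """
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_score(1.0, 0.5))  # F1 for a precise but low-recall classifier
```

Because the harmonic mean is dominated by the smaller argument, a classifier cannot achieve a high F-score by excelling at only one of the two quantities.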

Davies–Bouldin index

The Davies–Bouldin index (DBI), introduced by David L. Davies and Donald W. Bouldin in 1979, is a metric for evaluating clustering algorithms. This is an internal evaluation scheme, where the validation of how well the clustering has been done is made using quantities and features inherent to the dataset.
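
A minimal sketch on toy data (helper names are illustrative): for each cluster the index takes the worst ratio of summed within-cluster scatters to between-centroid distance, then averages over clusters, so lower values indicate tighter, better-separated clusters.

```python
def davies_bouldin(clusters):
    """DBI: mean over clusters of the worst (S_i + S_j) / M_ij ratio."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def centroid(c):
        return tuple(sum(coord) / len(c) for coord in zip(*c))

    cents = [centroid(c) for c in clusters]
    # S_i: mean distance of each cluster's points to its own centroid.
    scatter = [sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst similarity ratio of cluster i against any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / dist(cents[i], cents[j])
            for j in range(k) if j != i
        )
    return total / k

print(davies_bouldin([[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]))
```

Because only centroids and scatters are needed, the index is cheap to compute and is a common companion to the elbow method when choosing k.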

Balanced clustering

Balanced clustering is a special case of clustering where, in the strictest sense, cluster sizes are constrained to ⌈n/k⌉ or ⌊n/k⌋, where n is the number of points and k is the number of clusters. A typical algorithm is balanced k-means.

Hopkins statistic

The Hopkins statistic (introduced by Brian Hopkins and John Gordon Skellam) is a way of measuring the cluster tendency of a data set. It belongs to the family of sparse sampling tests. It acts as a statistical hypothesis test where the null hypothesis is that the data are generated by a Poisson point process and are thus uniformly randomly distributed.

© 2023 Useful Links.