Category: Classification algorithms

Calibration (statistics)

There are two main uses of the term calibration in statistics that denote special types of statistical inference problems. "Calibration" can mean * a reverse process to regression, where instead of a

Random forest

Random forests or random decision forests is an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time. Fo

Rules extraction system family

The rules extraction system (RULES) family is a family of inductive learning that includes several covering algorithms. This family is used to build a predictive model based on given observation. It w

Types of artificial neural networks

There are many types of artificial neural networks (ANN). Artificial neural networks are computational models inspired by biological neural networks, and are used to approximate functions that are gen

Feature Selection Toolbox

Feature Selection Toolbox (FST) is software primarily for feature selection in the machine learning domain, written in C++, developed at the Institute of Information Theory and Automation (UTIA), of t

Kernel method

In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is the support-vector machine (SVM). The general task of pattern analysis is to find and st

Decision tree learning

Decision tree learning is a supervised learning approach used in statistics, data mining and machine learning. In this formalism, a classification or regression decision tree is used as a predictive m

Latent class model

In statistics, a latent class model (LCM) relates a set of observed (usually discrete) multivariate variables to a set of latent variables. It is a type of latent variable model. It is called a latent

Cascading classifiers

Cascading is a particular case of ensemble learning based on the concatenation of several classifiers, using all information collected from the output from a given classifier as additional information

Multispectral pattern recognition

Multispectral remote sensing is the collection and analysis of reflected, emitted, or back-scattered energy from an object or an area of interest in multiple bands of regions of the electromagnetic sp

Gradient boosting

Gradient boosting is a machine learning technique used in regression and classification tasks, among others. It gives a prediction model in the form of an ensemble of weak prediction models, which are

Normal discriminant analysis

No description available.

Classifier chains

Classifier chains is a machine learning method for problem transformation in multi-label classification. It combines the computational efficiency of the method while still being able to take the label

Information gain ratio

In decision tree learning, Information gain ratio is a ratio of information gain to the intrinsic information. It was proposed by Ross Quinlan, to reduce a bias towards multi-valued attributes by taki

Alternating decision tree

An alternating decision tree (ADTree) is a machine learning method for classification. It generalizes decision trees and has connections to boosting. An ADTree consists of an alternation of decision n

Linear discriminant analysis

Linear discriminant analysis (LDA), normal discriminant analysis (NDA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics and other fie

Jackknife variance estimates for random forest

In statistics, jackknife variance estimates for random forest are a way to estimate the variance in random forest models, in order to eliminate the bootstrap effects.

Nearest centroid classifier

In machine learning, a nearest centroid classifier or nearest prototype classifier is a classification model that assigns to observations the label of the class of training samples whose mean (centroi

Averaged one-dependence estimators

Averaged one-dependence estimators (AODE) is a probabilistic classification learning technique. It was developed to address the attribute-independence problem of the popular naive Bayes classifier. It

Nearest neighbor search

Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest (or most similar) to a given point. Closeness is typically

Textual case-based reasoning

Textual case-based reasoning (TCBR) is a subtopic of case-based reasoning, in short CBR, a popular area in artificial intelligence. CBR suggests the ways to use past experiences to solve future simila

Logic learning machine

Logic learning machine (LLM) is a machine learning method based on the generation of intelligible rules. LLM is an efficient implementation of the Switching Neural Network (SNN) paradigm, developed by

CoBoosting

CoBoost is a semi-supervised training algorithm proposed by Collins and Singer in 1999. The original application for the algorithm was the task of Named Entity Classification using very weak learners.

Winnow (algorithm)

The winnow algorithm is a technique from machine learning for learning a linear classifier from labeled examples. It is very similar to the perceptron algorithm. However, the perceptron algorithm uses

Generalization error

For supervised learning applications in machine learning and statistical learning theory, generalization error (also known as the out-of-sample error or the risk) is a measure of how accurately an alg

LogitBoost

In machine learning and computational learning theory, LogitBoost is a boosting algorithm formulated by Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The original paper casts the AdaBoost alg

Decision boundary

In a statistical-classification problem with two classes, a decision boundary or decision surface is a hypersurface that partitions the underlying vector space into two sets, one for each class. The c

Naive Bayes classifier

In statistics, naive Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naive) independence assumptions between the features (see Bayes

AdaBoost

AdaBoost, short for Adaptive Boosting, is a statistical classification meta-algorithm formulated by Yoav Freund and Robert Schapire in 1995, who won the 2003 Gödel Prize for their work. It can be used

Statistical classification

In statistics, classification is the problem of identifying which of a set of categories (sub-populations) an observation (or observations) belongs to. Examples are assigning a given email to the "spa

Margin-infused relaxed algorithm

Margin-infused relaxed algorithm (MIRA) is a machine learning algorithm, an online algorithm for multiclass classification problems. It is designed to learn a set of parameters (vector or matrix) by p

Conceptual clustering

Conceptual clustering is a machine learning paradigm for unsupervised classification that has been defined by Ryszard S. Michalski in 1980 (Fisher 1987, Michalski 1980) and developed mainly during the

Group method of data handling

Group method of data handling (GMDH) is a family of inductive algorithms for computer-based mathematical modeling of multi-parametric datasets that features fully automatic structural and parametric o

Hyper basis function network

In machine learning, a Hyper basis function network, or HyperBF network, is a generalization of radial basis function (RBF) networks concept, where the Mahalanobis-like distance is used instead of Euc

ALOPEX

ALOPEX (an acronym from "ALgorithms Of Pattern EXtraction") is a correlation based machine learning algorithm first proposed by Tzanakou and Harth in 1974.

Multiclass classification

In machine learning and statistical classification, multiclass classification or multinomial classification is the problem of classifying instances into one of three or more classes (classifying insta

One-class classification

In machine learning, one-class classification (OCC), also known as unary classification or class-modelling, tries to identify objects of a specific class amongst all objects, by primarily learning fro

Syntactic pattern recognition

Syntactic pattern recognition or structural pattern recognition is a form of pattern recognition, in which each object can be represented by a variable-cardinality set of symbolic, nominal features. T

K-nearest neighbors algorithm

In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It i

Gesture Description Language

Gesture Description Language (GDL or GDL Technology) is a method of describing and automatic (computer) syntactic classification of gestures and movements createdby doctor Tomasz Hachaj (PhD) and prof

Soft independent modelling of class analogies

Soft independent modelling by class analogy (SIMCA) is a statistical method for supervised classification of data. The method requires a training data set consisting of samples (or objects) with a set

Large margin nearest neighbor

Large margin nearest neighbor (LMNN) classification is a statistical machine learning algorithm for metric learning. It learns a pseudometric designed for k-nearest neighbor classification. The algori

Variable kernel density estimation

In statistics, adaptive or "variable-bandwidth" kernel density estimation is a form of kernel density estimation in which the size of the kernels used in the estimate are varieddepending upon either t

Linear classifier

In the field of machine learning, the goal of statistical classification is to use an object's characteristics to identify which class (or group) it belongs to. A linear classifier achieves this by ma

Multinomial logistic regression

In statistics, multinomial logistic regression is a classification method that generalizes logistic regression to multiclass problems, i.e. with more than two possible discrete outcomes. That is, it i

(1+ε)-approximate nearest neighbor search

(1+ε)-approximate nearest neighbor search is a special case of the nearest neighbor search problem. The solution to the (1+ε)-approximate nearest neighbor search is a point or multiple points within d

Support vector machine

In machine learning, support vector machines (SVMs, also support vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression

Probit model

In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau, coming from probability + un

Evolving classification function

Evolving classification functions (ECF), evolving classifier functions or evolving classifiers are used for classifying and clustering in the field of machine learning and artificial intelligence, typ

C4.5 algorithm

C4.5 is an algorithm used to generate a decision tree developed by Ross Quinlan. C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classific

Deductive classifier

A deductive classifier is a type of artificial intelligence inference engine. It takes as input a set of declarations in a frame language about a domain such as medical research or molecular biology.

Case-based reasoning

In artificial intelligence and philosophy, case-based reasoning (CBR), broadly construed, is the process of solving new problems based on the solutions of similar past problems. In everyday life, an a

BrownBoost

BrownBoost is a boosting algorithm that may be robust to noisy datasets. BrownBoost is an adaptive version of the algorithm. As is true for all boosting algorithms, BrownBoost is used in conjunction w

Optimal discriminant analysis and classification tree analysis

Optimal Discriminant Analysis (ODA) and the related classification tree analysis (CTA) are exact statistical methods that maximize predictive accuracy. For any specific sample and exploratory or confi

Margin classifier

In machine learning, a margin classifier is a classifier which is able to give an associated distance from the decision boundary for each example. For instance, if a linear classifier (e.g. perceptron

Artificial neural network

Artificial neural networks (ANNs), usually simply called neural networks (NNs) or neural nets, are computing systems inspired by the biological neural networks that constitute animal brains. An ANN is

Locality-sensitive hashing

In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same "buckets" with high probability. (The number of buckets is much smaller

Relevance vector machine

In mathematics, a Relevance Vector Machine (RVM) is a machine learning technique that uses Bayesian inference to obtain parsimonious solutions for regression and probabilistic classification.The RVM h

Automated Pain Recognition

Automated Pain Recognition (APR) is a method for objectively measuring pain and at the same time represents an interdisciplinary research area that comprises elements of medicine, psychology, psychobi

Multi-label classification

In machine learning, multi-label classification or multi-output classification is a variant of the classification problem where multiple nonexclusive labels may be assigned to each instance. Multi-lab

Compositional pattern-producing network

Compositional pattern-producing networks (CPPNs) are a variation of artificial neural networks (ANNs) that have an architecture whose evolution is guided by genetic algorithms. While ANNs often contai

Quadratic classifier

In statistics, a quadratic classifier is a statistical classifier that uses a quadratic decision surface to separate measurements of two or more classes of objects or events. It is a more general vers

Information gain (decision tree)

In information theory and machine learning, information gain is a synonym for Kullback–Leibler divergence; the amount of information gained about a random variable or signal from observing another ran

Operational taxonomic unit

An Operational Taxonomic Unit (OTU) is an operational definition used to classify groups of closely related individuals. The term was originally introduced in 1963 by and Robert R. Sokal and Peter H.

Information fuzzy networks

Information fuzzy networks (IFN) is a greedy machine learning algorithm for supervised learning.The data structure produced by the learning algorithm is also called Info Fuzzy Network.IFN construction

Ordinal regression

In statistics, ordinal regression, also called ordinal classification, is a type of regression analysis used for predicting an ordinal variable, i.e. a variable whose value exists on an arbitrary scal

Co-training

Co-training is a machine learning algorithm used when there are only small amounts of labeled data and large amounts of unlabeled data. One of its uses is in text mining for search engines. It was int

ID3 algorithm

In decision tree learning, ID3 (Iterative Dichotomiser 3) is an algorithm invented by Ross Quinlan used to generate a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm, and is t

Hierarchical classification

Hierarchical classification is a system of grouping things according to a hierarchy. In the field of machine learning, hierarchical classification is sometimes referred to as instance space decomposit

Learning vector quantization

In computer science, learning vector quantization (LVQ) is a prototype-based supervised classification algorithm. LVQ is the supervised counterpart of vector quantization systems.

Multilayer perceptron

A multilayer perceptron (MLP) is a fully connected class of feedforward artificial neural network (ANN). The term MLP is used ambiguously, sometimes loosely to mean any feedforward ANN, sometimes stri

Least-squares support vector machine

Least-squares support-vector machines (LS-SVM) for statistics and in statistical modeling, are least-squares versions of support-vector machines (SVM), which are a set of related supervised learning m

Multifactor dimensionality reduction

Multifactor dimensionality reduction (MDR) is a statistical approach, also used in machine learning automatic approaches, for detecting and characterizing combinations of attributes or independent var

Analogical modeling

Analogical modeling (AM) is a formal theory of exemplar based analogical reasoning, proposed by Royal Skousen, professor of Linguistics and English language at Brigham Young University in Provo, Utah.

IDistance

In pattern recognition, the iDistance is an indexing and query processing technique for k-nearest neighbor queries on point data in multi-dimensional metric spaces. The kNN query is one of the hardest

Boosting (machine learning)

In machine learning, boosting is an ensemble meta-algorithm for primarily reducing bias, and also variance in supervised learning, and a family of machine learning algorithms that convert weak learner

Radial basis function network

In the field of mathematical modeling, a radial basis function network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear c

Tsetlin machine

A Tsetlin Machine is an Artificial Intelligence algorithm based on propositional logic.

Multiple discriminant analysis

Multiple Discriminant Analysis (MDA) is a multivariate dimensionality reduction technique. It has been used to predict signals as diverse as neural memory traces and corporate failure. MDA is not dire

Elastic matching

Elastic matching is one of the pattern recognition techniques in computer science. Elastic matching (EM) is also known as deformable template, flexible matching, or nonlinear template matching. Elasti

Probabilistic latent semantic analysis

Probabilistic latent semantic analysis (PLSA), also known as probabilistic latent semantic indexing (PLSI, especially in information retrieval circles) is a statistical technique for the analysis of t

Whitening transformation

A whitening transformation or sphering transformation is a linear transformation that transforms a vector of random variables with a known covariance matrix into a set of new variables whose covarianc

Mathematics of artificial neural networks

An artificial neural network (ANN) combines biological principles with advanced statistics to solve problems in domains such as pattern recognition and game-play. ANNs adopt the basic model of neuron

Chi-square automatic interaction detection

Chi-square automatic interaction detection (CHAID) is a decision tree technique based on adjusted significance testing (Bonferroni correction, Holm-Bonferroni testing). The technique was developed in

Perceptron

In machine learning, the perceptron (or McCulloch-Pitts neuron) is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an inpu

Random subspace method

In machine learning the random subspace method, also called attribute bagging or feature bagging, is an ensemble learning method that attempts to reduce the correlation between estimators in an ensemb