Category: Data mining

Formal concept analysis

In information science, formal concept analysis (FCA) is a principled way of deriving a concept hierarchy or formal ontology from a collection of objects and their properties. Each concept in the hier

Association rule learning

Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in data

Archetypal analysis

Archetypal analysis in statistics is an unsupervised learning method similar to cluster analysis and introduced by Adele Cutler and Leo Breiman in 1994. Rather than "typical" observations (cluster cen

International Journal of Data Warehousing and Mining

The International Journal of Data Warehousing and Mining (IJDWM) is a quarterly peer-reviewed academic journal covering data warehousing and data mining. It was established in 2005 and is published by

Instance selection

Instance selection (or dataset reduction, or dataset condensation) is an important data pre-processing step that can be applied in many machine learning (or data mining) tasks. Approaches for instance

Automatic clustering algorithms

Automatic clustering algorithms are algorithms that can perform clustering without prior knowledge of data sets. In contrast with other cluster analysis techniques, automatic clustering algorithms can

Cyborg data mining

Cyborg data mining is the practice of collecting data produced by an implantable device that monitors bodily processes for commercial interests. As an android is a human-like robot, a cyborg, on the o

Astrostatistics

Astrostatistics is a discipline which spans astrophysics, statistical analysis and data mining. It is used to process the vast amount of data produced by of the cosmos, to characterize complex dataset

Data stream mining

Data Stream Mining (also known as stream learning) is the process of extracting knowledge structures from continuous, rapid data records. A data stream is an ordered sequence of instances that in many

Latent space

A latent space, also known as a latent feature space or embedding space, is an embedding of a set of items within a manifold in which items resembling each other are positioned closer to one another i

Receiver operating characteristic

A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The method

Biomedical text mining

Biomedical text mining (including biomedical natural language processing or BioNLP) refers to the methods and study of how text mining may be applied to texts and literature of the biomedical and mole

Multiple kernel learning

Multiple kernel learning refers to a set of machine learning methods that use a predefined set of kernels and learn an optimal linear or non-linear combination of kernels as part of the algorithm. Rea

Data Mining and Knowledge Discovery

Data Mining and Knowledge Discovery is a bimonthly peer-reviewed scientific journal focusing on data mining published by Springer Science+Business Media. It was started in 1996 and launched in 1997 by

Frequent pattern discovery

Frequent pattern discovery (or FP discovery, FP mining, or Frequent itemset mining) is part of knowledge discovery in databases, Massive Online Analysis, and data mining; it describes the task of find

Cluster analysis

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in

K-optimal pattern discovery

K-optimal pattern discovery is a data mining technique that provides an alternative to the frequent pattern discovery approach that underlies most association rule learning techniques. Frequent patter

Molecule mining

This page describes mining for molecules. Since molecules may be represented by molecular graphs this is strongly related to graph mining and structured data mining. The main problem is how to represe

Concept drift

In predictive analytics and machine learning, concept drift means that the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. Thi

Elastic map

Elastic maps provide a tool for nonlinear dimensionality reduction. By their construction, they are a system of elastic springs embedded in the dataspace. This system approximates a low-dimensional ma

PatientsLikeMe

PatientsLikeMe is the world’s largest integrated community, health management, and real-world data platform. Through PatientsLikeMe, a growing community of more than 830,000 people with over 2,900 con

Total operating characteristic

The total operating characteristic (TOC) is a statistical method to compare a Boolean variable versus a rank variable. TOC can measure the ability of an index variable to diagnose either presence or a

Lift (data mining)

In data mining and association rule learning, lift is a measure of the performance of a targeting model (association rule) at predicting or classifying cases as having an enhanced response (with respe

Evolutionary data mining

Evolutionary data mining, or genetic data mining is an umbrella term for any data mining using evolutionary algorithms. While it can be used for mining data from DNA sequences, it is not limited to bi

Anomaly detection

In data analysis, anomaly detection (also referred to as outlier detection and sometimes as novelty detection) is generally understood to be the identification of rare items, events or observations wh

Profiling (information science)

In information science, profiling refers to the process of construction and application of user profiles generated by computerized data analysis. This is the use of algorithms or other mathematical te

Rexer's Annual Data Miner Survey

Rexer Analytics’s Annual Data Miner Survey is the largest survey of data mining, data science, and analytics professionals in the industry. It consists of approximately 50 multiple choice and open-end

Action model learning

Action model learning (sometimes abbreviated action learning) is an area of machine learning concerned with creation and modification of software agent's knowledge about effects and preconditions of t

Software mining

Software mining is an application of knowledge discovery in the area of software modernization which involves understanding existing software artifacts. This process is related to a concept of reverse

Wiener connector

In network theory, the Wiener connector is a means of maximizing efficiency in connecting specified "query vertices" in a network. Given a connected, undirected graph and a set of query vertices in a

Affinity analysis

Affinity analysis falls under the umbrella term of data mining which uncovers meaningful correlations between different entities according to their co-occurrence in a data set. In almost all systems a

Spatial embedding

Spatial embedding is one of feature learning techniques used in spatial analysis where points, lines, polygons or other spatial data types. representing geographic locations are mapped to vectors of r

Feature (machine learning)

In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon. Choosing informative, discriminating and independent features is a crucia

Social media mining

Social media mining is the process of obtaining big data from user-generated content on social media sites and mobile apps in order to extract actionable patterns, form conclusions about users, and ac

Bibliomining

Bibliomining is the use of a combination of data mining, data warehousing, and bibliometrics for the purpose of analyzing library services. The term was created in 2003 by Scott Nicholson, Assistant P

Data dredging

Data dredging (also known as data snooping or p-hacking) is the misuse of data analysis to find patterns in data that can be presented as statistically significant, thus dramatically increasing and un

Multifactor dimensionality reduction

Multifactor dimensionality reduction (MDR) is a statistical approach, also used in machine learning automatic approaches, for detecting and characterizing combinations of attributes or independent var

Technology mining

Tech mining or technology mining refers to applying text mining methods to technical documents. For patent analysis purposes, it is named ‘patent mining’. Porter, as one of the pioneers in technology

Uncertain data

In computer science, uncertain data is data that contains noise that makes it deviate from the correct, intended or original values. In the age of big data, uncertainty or data veracity is one of the

Sequential pattern mining

Sequential pattern mining is a topic of data mining concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence. It is usually presumed th

Structure mining

Structure mining or structured data mining is the process of finding and extracting useful information from semi-structured data sets. Graph mining, sequential pattern mining and molecule mining are s

ROUGE (metric)

ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language

Web intelligence

Web intelligence is the area of scientific research and development that explores the roles and makes use of artificial intelligence and information technology for new products, services and framework

Outline of machine learning

The following outline is provided as an overview of and topical guide to machine learning. Machine learning is a subfield of soft computing within computer science that evolved from the study of patte

Document classification

Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. Thi

Documenting Hate

Documenting Hate is a project of ProPublica, in collaboration with a number of journalistic, academic, and computing organizations, for systematic tracking of hate crimes and bias incidents. It uses a

Co-occurrence network

Co-occurrence network, sometimes referred to as a semantic network, is a method to analyze text that includes a graphic visualization of potential relationships between people, organizations, concepts

Local outlier factor

In anomaly detection, the local outlier factor (LOF) is an algorithm proposed by Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng and Jörg Sander in 2000 for finding anomalous data points by measu

Optimal matching

Optimal matching is a sequence analysis method used in social science, to assess the dissimilarity of ordered arrays of tokens that usually represent a time-ordered sequence of socio-economic states t

Contrast set learning

Contrast set learning is a form of association rule learning that seeks to identify meaningful differences between separate groups by reverse-engineering the key predictors that identify for each part

AMiner (database)

AMiner (formerly ArnetMiner) is a free online service used to index, search, and mine big scientific data.

Novelty detection

Novelty detection is the mechanism by which an intelligent organism is able to identify an incoming sensory pattern as being hitherto unknown. If the pattern is sufficiently salient or associated with

Nearest neighbor search

Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest (or most similar) to a given point. Closeness is typically

Adamic–Adar index

The Adamic–Adar index is a measure introduced in 2003 by Lada Adamic and to predict links in a social network, according to the amount of shared links between two nodes. It is defined as the sum of th

Concept mining

Concept mining is an activity that results in the extraction of concepts from artifacts. Solutions to the task typically involve aspects of artificial intelligence and statistics, such as data mining

Discovery system (AI research)

A discovery system is an artificial intelligence system that attempts to discover new scientific concepts or laws. The aim of discovery systems is to automate scientific data analysis and the scientif

Social profiling

Social profiling is the process of constructing a social media user's profile using his or her social data. In general, profiling refers to the data science process of generating a person's profile wi

Wrapper (data mining)

Wrapper in data mining is a procedure that extracts regular subcontent of an unstructured or loosely-structured information source and translates it into a relational form, so it can be processed as s

Automatic summarization

Automatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original conten

Agent mining

Agent mining is an interdisciplinary area that synergizes multiagent systems with data mining and machine learning. The interaction and integration between multiagent systems and data mining have a lo

Special Interest Group on Knowledge Discovery and Data Mining

SIGKDD, representing the Association for Computing Machinery's (ACM) Special Interest Group (SIG) on Knowledge Discovery and Data Mining, hosts an influential annual conference.

Data mining

Data mining is the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an inte

Weighted correlation network analysis

Weighted correlation network analysis, also known as weighted gene co-expression network analysis (WGCNA), is a widely used data mining method especially for studying biological networks based on pair

Domain driven data mining

Domain driven data mining is a data mining methodology for discovering actionable knowledge and deliver actionable insights from complex data and behaviors in a complex environment. It studies the cor

Argument mining

Argument mining, or argumentation mining, is a research area within the natural-language processing field. The goal of argument mining is the automatic extraction and identification of argumentative s