Category: Data mining

Formal concept analysis
In information science, formal concept analysis (FCA) is a principled way of deriving a concept hierarchy or formal ontology from a collection of objects and their properties. Each concept in the hier
Association rule learning
Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in data
Archetypal analysis
Archetypal analysis in statistics is an unsupervised learning method similar to cluster analysis and introduced by Adele Cutler and Leo Breiman in 1994. Rather than "typical" observations (cluster cen
International Journal of Data Warehousing and Mining
The International Journal of Data Warehousing and Mining (IJDWM) is a quarterly peer-reviewed academic journal covering data warehousing and data mining. It was established in 2005 and is published by
Instance selection
Instance selection (or dataset reduction, or dataset condensation) is an important data pre-processing step that can be applied in many machine learning (or data mining) tasks. Approaches for instance
Automatic clustering algorithms
Automatic clustering algorithms are algorithms that can perform clustering without prior knowledge of data sets. In contrast with other cluster analysis techniques, automatic clustering algorithms can
Cyborg data mining
Cyborg data mining is the practice of collecting data produced by an implantable device that monitors bodily processes for commercial interests. As an android is a human-like robot, a cyborg, on the o
Astrostatistics
Astrostatistics is a discipline which spans astrophysics, statistical analysis and data mining. It is used to process the vast amount of data produced by automated scanning of the cosmos, to characterize complex dataset
Data stream mining
Data stream mining (also known as stream learning) is the process of extracting knowledge structures from continuous, rapid data records. A data stream is an ordered sequence of instances that in many
Latent space
A latent space, also known as a latent feature space or embedding space, is an embedding of a set of items within a manifold in which items resembling each other are positioned closer to one another i
Receiver operating characteristic
A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The method
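As a rough illustration of varying the discrimination threshold, the sketch below computes the (false positive rate, true positive rate) pairs that make up an ROC curve. The function name `roc_points` and the toy labels and scores are assumptions made for this example, not part of any particular library.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per candidate threshold.

    labels: 0/1 ground-truth values; scores: classifier scores, where a
    higher score means "more likely positive". Both names are illustrative.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Start at the strictest threshold, where nothing is predicted positive.
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; predict positive when score >= t.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
points = roc_points(labels, scores)
```

Plotting the second coordinate against the first gives the curve; the list always starts at (0, 0) and ends at (1, 1).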
Biomedical text mining
Biomedical text mining (including biomedical natural language processing or BioNLP) refers to the methods and study of how text mining may be applied to texts and literature of the biomedical and mole
Multiple kernel learning
Multiple kernel learning refers to a set of machine learning methods that use a predefined set of kernels and learn an optimal linear or non-linear combination of kernels as part of the algorithm. Rea
Data Mining and Knowledge Discovery
Data Mining and Knowledge Discovery is a bimonthly peer-reviewed scientific journal focusing on data mining published by Springer Science+Business Media. It was started in 1996 and launched in 1997 by
Frequent pattern discovery
Frequent pattern discovery (or FP discovery, FP mining, or Frequent itemset mining) is part of knowledge discovery in databases, Massive Online Analysis, and data mining; it describes the task of find
Cluster analysis
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in
K-optimal pattern discovery
K-optimal pattern discovery is a data mining technique that provides an alternative to the frequent pattern discovery approach that underlies most association rule learning techniques. Frequent patter
Molecule mining
This page describes mining for molecules. Since molecules may be represented by molecular graphs this is strongly related to graph mining and structured data mining. The main problem is how to represe
Concept drift
In predictive analytics and machine learning, concept drift means that the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. Thi
Elastic map
Elastic maps provide a tool for nonlinear dimensionality reduction. By their construction, they are a system of elastic springs embedded in the dataspace. This system approximates a low-dimensional ma
PatientsLikeMe
PatientsLikeMe is the world’s largest integrated community, health management, and real-world data platform. Through PatientsLikeMe, a growing community of more than 830,000 people with over 2,900 con
Total operating characteristic
The total operating characteristic (TOC) is a statistical method to compare a Boolean variable versus a rank variable. TOC can measure the ability of an index variable to diagnose either presence or a
Lift (data mining)
In data mining and association rule learning, lift is a measure of the performance of a targeting model (association rule) at predicting or classifying cases as having an enhanced response (with respe
Evolutionary data mining
Evolutionary data mining, or genetic data mining is an umbrella term for any data mining using evolutionary algorithms. While it can be used for mining data from DNA sequences, it is not limited to bi
Anomaly detection
In data analysis, anomaly detection (also referred to as outlier detection and sometimes as novelty detection) is generally understood to be the identification of rare items, events or observations wh
Profiling (information science)
In information science, profiling refers to the process of construction and application of user profiles generated by computerized data analysis. This is the use of algorithms or other mathematical te
Rexer's Annual Data Miner Survey
Rexer Analytics’s Annual Data Miner Survey is the largest survey of data mining, data science, and analytics professionals in the industry. It consists of approximately 50 multiple choice and open-end
Action model learning
Action model learning (sometimes abbreviated action learning) is an area of machine learning concerned with the creation and modification of a software agent's knowledge about the effects and preconditions of t
Software mining
Software mining is an application of knowledge discovery in the area of software modernization which involves understanding existing software artifacts. This process is related to a concept of reverse
Wiener connector
In network theory, the Wiener connector is a means of maximizing efficiency in connecting specified "query vertices" in a network. Given a connected, undirected graph and a set of query vertices in a
Affinity analysis
Affinity analysis falls under the umbrella term of data mining which uncovers meaningful correlations between different entities according to their co-occurrence in a data set. In almost all systems a
Spatial embedding
Spatial embedding is one of the feature learning techniques used in spatial analysis, in which points, lines, polygons or other spatial data types representing geographic locations are mapped to vectors of r
Feature (machine learning)
In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon. Choosing informative, discriminating and independent features is a crucia
Social media mining
Social media mining is the process of obtaining big data from user-generated content on social media sites and mobile apps in order to extract actionable patterns, form conclusions about users, and ac
Bibliomining
Bibliomining is the use of a combination of data mining, data warehousing, and bibliometrics for the purpose of analyzing library services. The term was created in 2003 by Scott Nicholson, Assistant P
Data dredging
Data dredging (also known as data snooping or p-hacking) is the misuse of data analysis to find patterns in data that can be presented as statistically significant, thus dramatically increasing and un
Multifactor dimensionality reduction
Multifactor dimensionality reduction (MDR) is a statistical approach, also used in machine learning automatic approaches, for detecting and characterizing combinations of attributes or independent var
Technology mining
Tech mining or technology mining refers to applying text mining methods to technical documents. For patent analysis purposes, it is named ‘patent mining’. Porter, as one of the pioneers in technology
Uncertain data
In computer science, uncertain data is data that contains noise that makes it deviate from the correct, intended or original values. In the age of big data, uncertainty or data veracity is one of the
Sequential pattern mining
Sequential pattern mining is a topic of data mining concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence. It is usually presumed th
Structure mining
Structure mining or structured data mining is the process of finding and extracting useful information from semi-structured data sets. Graph mining, sequential pattern mining and molecule mining are s
ROUGE (metric)
ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language
Web intelligence
Web intelligence is the area of scientific research and development that explores the roles and makes use of artificial intelligence and information technology for new products, services and framework
Outline of machine learning
The following outline is provided as an overview of and topical guide to machine learning. Machine learning is a subfield of soft computing within computer science that evolved from the study of patte
Document classification
Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. Thi
Documenting Hate
Documenting Hate is a project of ProPublica, in collaboration with a number of journalistic, academic, and computing organizations, for systematic tracking of hate crimes and bias incidents. It uses a
Co-occurrence network
Co-occurrence network, sometimes referred to as a semantic network, is a method to analyze text that includes a graphic visualization of potential relationships between people, organizations, concepts
Local outlier factor
In anomaly detection, the local outlier factor (LOF) is an algorithm proposed by Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng and Jörg Sander in 2000 for finding anomalous data points by measu
Optimal matching
Optimal matching is a sequence analysis method used in social science, to assess the dissimilarity of ordered arrays of tokens that usually represent a time-ordered sequence of socio-economic states t
Contrast set learning
Contrast set learning is a form of association rule learning that seeks to identify meaningful differences between separate groups by reverse-engineering the key predictors that identify for each part
AMiner (database)
AMiner (formerly ArnetMiner) is a free online service used to index, search, and mine big scientific data.
Novelty detection
Novelty detection is the mechanism by which an intelligent organism is able to identify an incoming sensory pattern as being hitherto unknown. If the pattern is sufficiently salient or associated with
Nearest neighbor search
Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest (or most similar) to a given point. Closeness is typically
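A minimal sketch of the problem statement above, assuming Euclidean distance as the closeness measure and a brute-force linear scan. The function name `nearest_neighbor` and the sample points are made up for illustration; practical systems usually rely on index structures rather than scanning every point.

```python
import math


def nearest_neighbor(points, query):
    """Return the point in `points` closest to `query` (Euclidean distance).

    This is an O(n) linear scan: every candidate is compared to the query.
    """
    if not points:
        raise ValueError("points must be non-empty")
    return min(points, key=lambda p: math.dist(p, query))


points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
result = nearest_neighbor(points, (0.9, 1.2))
```

Swapping `math.dist` for another dissimilarity function changes the notion of "closest" without changing the search itself.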
Adamic–Adar index
The Adamic–Adar index is a measure introduced in 2003 by Lada Adamic and Eytan Adar to predict links in a social network, according to the amount of shared links between two nodes. It is defined as the sum of th
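The index is commonly computed as the sum, over the neighbors two nodes share, of 1 / log(degree of that neighbor). The sketch below assumes a simple undirected graph stored as a dictionary of neighbor sets; the function name `adamic_adar` and the toy graph are illustrative only.

```python
import math


def adamic_adar(graph, u, v):
    """Adamic-Adar score of nodes u and v.

    graph: dict mapping each node to the set of its neighbors (undirected,
    no self-loops). A shared neighbor is adjacent to both u and v, so its
    degree is at least 2 and log(degree) is strictly positive.
    """
    common = graph[u] & graph[v]
    return sum(1.0 / math.log(len(graph[w])) for w in common)


graph = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "e"},
    "d": {"a", "b"},
    "e": {"c"},
}
score = adamic_adar(graph, "a", "b")  # shared neighbors: c (degree 3), d (degree 2)
```

Shared neighbors with few connections contribute more to the score than highly connected hubs, which is the weighting the index is designed to capture.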
Concept mining
Concept mining is an activity that results in the extraction of concepts from artifacts. Solutions to the task typically involve aspects of artificial intelligence and statistics, such as data mining
Discovery system (AI research)
A discovery system is an artificial intelligence system that attempts to discover new scientific concepts or laws. The aim of discovery systems is to automate scientific data analysis and the scientif
Social profiling
Social profiling is the process of constructing a social media user's profile using his or her social data. In general, profiling refers to the data science process of generating a person's profile wi
Wrapper (data mining)
Wrapper in data mining is a procedure that extracts regular subcontent of an unstructured or loosely-structured information source and translates it into a relational form, so it can be processed as s
Automatic summarization
Automatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original conten
Agent mining
Agent mining is an interdisciplinary area that synergizes multiagent systems with data mining and machine learning. The interaction and integration between multiagent systems and data mining have a lo
Special Interest Group on Knowledge Discovery and Data Mining
SIGKDD, representing the Association for Computing Machinery's (ACM) Special Interest Group (SIG) on Knowledge Discovery and Data Mining, hosts an influential annual conference.
Data mining
Data mining is the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an inte
Weighted correlation network analysis
Weighted correlation network analysis, also known as weighted gene co-expression network analysis (WGCNA), is a widely used data mining method especially for studying biological networks based on pair
Domain driven data mining
Domain driven data mining is a data mining methodology for discovering actionable knowledge and delivering actionable insights from complex data and behaviors in a complex environment. It studies the cor
Argument mining
Argument mining, or argumentation mining, is a research area within the natural-language processing field. The goal of argument mining is the automatic extraction and identification of argumentative s