Category: Data mining and machine learning software

SenseTime (Chinese: 商汤科技) is a Hong Kong-headquartered artificial intelligence company with offices in China, Indonesia, Japan, South Korea, Macau, Malaysia, the Philippines, Saudi Arabia, Singapore,
Feature Selection Toolbox
Feature Selection Toolbox (FST) is software primarily for feature selection in the machine learning domain, written in C++, developed at the Institute of Information Theory and Automation (UTIA), of t
PolyAnalyst is a data science software platform developed by Megaputer Intelligence that provides an environment for text mining, data mining, machine learning, and predictive analytics. It is used by
Gremlin (query language)
Gremlin is a graph traversal language and virtual machine developed by Apache TinkerPop of the Apache Software Foundation. Gremlin works for both OLTP-based graph databases as well as OLAP-based graph
Shogun (toolbox)
Shogun is a free, open-source machine learning software library written in C++. It offers numerous algorithms and data structures for machine learning problems. It offers interfaces for Octave, Python
Apache Spark
Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Origin
Anne O'Tate
Anne O'Tate is a free, web-based application that analyses sets of records identified on PubMed, the bibliographic database of articles from over 5,500 biomedical journals worldwide. While PubMed has
Jubatus is an open-source online machine learning and distributed computing framework developed at Nippon Telegraph and Telephone. Its features include classification, recommendation, regression,
Sketch Engine
Sketch Engine is a corpus manager and text analysis software developed by Lexical Computing CZ s.r.o. since 2003. Its purpose is to enable people studying language behaviour (lexicographers, researche
Apache Giraph
Apache Giraph is an Apache project to perform graph processing on big data. Giraph utilizes Apache Hadoop's MapReduce implementation to process graphs. Facebook used Giraph with some performance impro
Apache Mahout
Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra. In th
CellCognition is a free open-source computational framework for quantitative analysis of high-throughput fluorescence microscopy (time-lapse) images in the field of bioimage informatics and systems mi
General Architecture for Text Engineering
General Architecture for Text Engineering or GATE is a Java suite of tools originally developed at the University of Sheffield beginning in 1995 and now used worldwide by a wide community of scientist
SPSS Modeler
IBM SPSS Modeler is a data mining and text analytics software application from IBM. It is used to build predictive models and conduct other analytic tasks. It has a visual interface which allows users
Vowpal Wabbit
Vowpal Wabbit (VW) is an open-source fast online interactive machine learning system library and program developed originally at Yahoo! Research, and currently at Microsoft Research. It was started an
Wolfram Mathematica
Wolfram Mathematica is a software system with built-in libraries for several areas of technical computing that allow machine learning, statistics, symbolic computation, data manipulation, network anal
NetOwl is a suite of multilingual text and identity analytics products that analyze big data in the form of text data – reports, web, social media, etc. – as well as structured entity data about peopl
Dlib is a general purpose cross-platform software library written in the programming language C++. Its design is heavily influenced by ideas from design by contract and component-based software engine
VIGRA is the abbreviation for "Vision with Generic Algorithms". It is a free open-source computer vision library which focuses on customizable algorithms and data structures. VIGRA components can be ea
Scikit-learn (formerly scikits.learn and also known as sklearn) is a free software machine learning library for the Python programming language. It features various classification, regression and clust
LIBSVM and LIBLINEAR are two popular open source machine learning libraries, both developed at the National Taiwan University and both written in C++ though with a C API. LIBSVM implements the Sequent
Linguamatics, headquartered in Cambridge, England, with offices in the United States and UK, is a provider of text mining systems through software licensing and services, primarily for pharmaceutical
Turi is a graph-based, high-performance, distributed computation framework written in C++. The GraphLab project was started by Prof. Carlos Guestrin of Carnegie Mellon University in 2009. It is an ope
Megvii (Chinese: 旷视; pinyin: Kuàngshì) is a Chinese technology company that designs image recognition and deep-learning software. Based in Beijing, the company develops artificial intelligence (AI) te
Automated Artificial Intelligence (AutoAI) is a variation of the automated machine learning, or AutoML, technology, which extends the automation of model building towards automation of the full life c
L-1 Identity Solutions
L-1 Identity Solutions, Inc. is a large American defense contractor in Connecticut. It was formed on August 29, 2006, from a merger of Viisage Technology, Inc. and Identix Incorporated. It specializes
Picollator is an Internet search engine that performs searches for web sites and multimedia by visual query (image) or text, or a combination of visual query and text. Picollator recognizes objects in
Yooreeka is a library for data mining, machine learning, soft computing, and mathematical analysis. The project started with the code of the book "Algorithms of the Intelligent Web". Although the term
MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plott
RapidMiner is a data science platform designed for enterprises that analyses the collective impact of organizations’ employees, expertise and data. RapidMiner's data science platform is intended to s
Information Harvesting
Information Harvesting (IH) was an early data mining product from the 1990s. It was invented by Ralphe Wiggins and produced by the Ryan Corp, later Information Harvesting Inc., of Cambridge, Massachus
Amazon Rekognition
Amazon Rekognition is a cloud-based software as a service (SaaS) computer vision platform that was launched in 2016. It has been sold to, and used by, a number of United States government agencies, inc
Stata (/ˈsteɪtə/, STAY-ta, alternatively /ˈstætə/, occasionally stylized as STATA) is a general-purpose statistical software package developed by StataCorp for data manipulation, visualization, statis
Angoss Software Corporation, headquartered in Toronto, Ontario, Canada, with offices in the United States and UK, acquired by Datawatch and now owned by Altair, was a provider of predictive analytics
Apache SystemDS
Apache SystemDS (previously Apache SystemML) is an open source ML system for the end-to-end data science lifecycle. SystemDS's distinguishing characteristics are: algorithm customizability via
GNU Octave
GNU Octave is a high-level programming language primarily intended for scientific computing and numerical computation. Octave helps in solving linear and nonlinear problems numerically, and for perfor
ELKI (for Environment for DeveLoping KDD-Applications Supported by Index-Structures) is a data mining (KDD, knowledge discovery in databases) software framework developed for use in research and teach
UIMA (/juˈiːmə/ yoo-EE-mə), short for Unstructured Information Management Architecture, is an OASIS standard for content analytics, originally developed at IBM. It provides a component software archit
The QLattice is a software library which provides a framework for symbolic regression in Python. It works on Linux, Windows, and macOS. The QLattice algorithm is developed by the Danish/Spanish AI res
Julia (programming language)
Julia is a high-level, dynamic programming language. Its features are well suited for numerical analysis and computational science. Distinctive aspects of Julia's design include a type system with par
Oracle Data Mining
Oracle Data Mining (ODM) is an option of Oracle Database Enterprise Edition. It contains several data mining and data analysis algorithms for classification, prediction, regression, associations, feat
CatBoost is an open-source software library developed by Yandex. It provides a gradient boosting framework which among other features attempts to solve for Categorical features using a permutation dri
Tanagra (machine learning)
Tanagra is a free suite of machine learning software for research and academic purposes, developed at the Lumière University Lyon 2, France. Tanagra supports several standard data mining tasks such as
Deep Web Technologies
Deep Web Technologies is a software company that specializes in mining the Deep Web — the part of the Internet that is not directly searchable through ordinary web search engines. The company produces
FICO (legal name: Fair Isaac Corporation), originally Fair, Isaac and Company, is a data analytics company based in Bozeman, Montana, focused on credit scoring services. It was founded by Bill Fair and Earl Isa
Folding@home (FAH or F@h) is a volunteer computing project that aims to help scientists develop new therapeutics for a variety of diseases by simulating protein dynamics. This includes the pr
Self-Service Semantic Suite
The Self-Service Semantic Suite (S4) provides on-demand access to text mining and linked open data technology in the cloud. The S4 stack is based on enterprise-grade technology from Ontotext including
SPSS Statistics is a statistical software suite developed by IBM for data management, advanced analytics, multivariate analysis, business intelligence, and criminal investigation. Long produced by SPS
XGBoost (eXtreme Gradient Boosting) is an open-source software library which provides a regularizing gradient boosting framework for C++, Java, Python, R, Julia, Perl, and Scala. It works on Linux, Wi
World Programming System
The World Programming System, also known as WPS Analytics or WPS, is a software product developed by a company called World Programming (acquired by Altair Engineering). WPS Analytics supports users o
Waffles (machine learning)
Waffles is a collection of command-line tools for performing machine learning operations developed at Brigham Young University. These tools are written in C++, and are available under the GNU Lesser G
Apache Flume
Apache Flume is a distributed, reliable, and available software for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on stream
Data Mining Extensions
Data Mining Extensions (DMX) is a query language for data mining models supported by Microsoft's SQL Server Analysis Services product. Like SQL, it supports a data definition language, data manipulati
Renjin is an implementation of the R programming language atop the Java Virtual Machine. It is free software released under the GPL. Renjin is tightly integrated with Java to allow the embedding of th
BigDL is a distributed deep learning framework for Apache Spark, created by Jason Dai at Intel. BigDL has its source code hosted on GitHub.
mlpy is an open-source Python machine learning library built on top of NumPy/SciPy and the GNU Scientific Library, and it makes extensive use of the Cython language. mlpy provides a wide range of stat
SolveIT Software
SolveIT Software Pty Ltd is a provider of advanced planning and scheduling enterprise software for supply and demand optimisation and predictive modelling. Based in Adelaide, South Australia, 70% of i
Mallet (software project)
MALLET is a Java "Machine Learning for Language Toolkit".
LanguageWare is a natural language processing (NLP) technology developed by IBM, which allows applications to process natural language text. It comprises a set of Java libraries which provide a range
Distributed R
Distributed R is an open source, high-performance platform for the R language. It splits tasks between multiple processing nodes to reduce execution time and analyze large data sets. Distributed R enh
Never-Ending Language Learning
Never-Ending Language Learning system (NELL) is a semantic machine learning system developed by a research team at Carnegie Mellon University, and supported by grants from DARPA, Google, NSF, and CNPq
Programming with Big Data in R
Programming with Big Data in R (pbdR) is a series of R packages and an environment for statistical computing with big data by using high-performance statistical computation. The pbdR uses the same pro
Piranha (software)
Piranha is a text mining system developed for the United States Department of Energy (DOE) by Oak Ridge National Laboratory (ORNL). The software processes large volumes of unrelated free-text document
SAS (software)
SAS (previously "Statistical Analysis System") is a statistical software suite developed by SAS Institute for data management, advanced analytics, multivariate analysis, business intelligence, crimina
Fluentd is a cross-platform, open-source data collection software project originally developed at Treasure Data. It is written primarily in the Ruby programming language.
Massive Online Analysis
Massive Online Analysis (MOA) is a free open-source software project specifically for data stream mining with concept drift. It is written in Java and developed at the University of Waikato, New Zealand.
Aphelion (software)
The Aphelion Imaging Software Suite is a software suite that includes three base products (Aphelion Lab, Aphelion Dev, and Aphelion SDK) for addressing image processing and image analysis applications
Orange (software)
Orange is an open-source data visualization, machine learning and data mining toolkit. It features a visual programming front-end for explorative rapid qualitative data analysis and interactive data v
Kubeflow is a free and open-source platform for machine learning on Kubernetes. The Kubeflow project has multiple distinct software components which each address specific stages of the machine learnin
mlpack is a machine learning software library for C++, built on top of the Armadillo library and the ensmallen numerical optimization library. mlpack has an emphasis on scalability, speed, and ease-of
DADiSP (Data Analysis and Display, pronounced day-disp) is a numerical computing environment developed by DSP Development Corporation which allows one to display and manipulate data series, matrices a
Probabilistic Action Cores
PRAC (Probabilistic Action Cores) is an interpreter for natural-language instructions for robotic applications developed at the Institute for Artificial Intelligence at the University of Bremen, Germa
KXEN was an American software company which existed from 1998 to 2013 when it was acquired by SAP AG.
Rattle GUI
Rattle GUI is a free and open source software (GNU GPL v2) package providing a graphical user interface (GUI) for data mining using the R statistical programming language. Rattle is used in a variety
scikit-multiflow (also known as skmultiflow) is a free and open source software machine learning library for multi-output/multi-label and stream data written in Python.
ML.NET is a free software machine learning library for the C# and F# programming languages. It also supports Python models when used together with NimbusML. The preview release of ML.NET included tran
VITAL (machine learning software)
VITAL (Validating Investment Tool for Advancing Life Sciences) was proprietary board-management machine learning software developed by Aging Analytics, a company registered in Bristol (Engl
ilastik is user-friendly, free, open-source software for image classification and segmentation. No previous experience in image processing is required to run the software.
MeeMix Ltd is a company specializing in personalizing media-related content recommendations, discovery and advertising for the telecommunication industry, founded in 2006. On January 1, 2008, MeeMix l
Zeroth (software)
Zeroth is a platform for brain-inspired computing from Qualcomm. It is based around a neural processing unit (NPU) AI accelerator chip and a software API to interact with the platform. It makes a form
List of text mining software
Text mining computer programs are available from many commercial and open source companies and sources.
Maple (software)
Maple is a symbolic and numeric computing environment as well as a multi-paradigm programming language. It covers several areas of technical computing, such as symbolic mathematics, numerical analysis
Data Version Control
DVC is a free and open-source, platform-agnostic version system for data, machine learning models, and experiments. It is designed to make ML models shareable, experiments reproducible, and to track v
Pipeline Pilot
Pipeline Pilot is a desktop software program sold by Dassault Systèmes for processing and analyzing data. Originally used in the natural sciences, the product's basic ETL (Extract, transform, load) an
KNIME (/naɪm/), the Konstanz Information Miner, is a free and open-source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining t
R (programming language)
R is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. Created by statisticians Ross Ihaka and Robert Gentleman
LightGBM, short for light gradient-boosting machine, is a free and open-source distributed gradient-boosting framework for machine learning, originally developed by Microsoft. It is based on decision
Weka (machine learning)
Waikato Environment for Knowledge Analysis (Weka), developed at the University of Waikato, New Zealand, is free software licensed under the GNU General Public License, and the companion software to th
Lattice Miner
Lattice Miner is a formal concept analysis software tool for the construction, visualization and manipulation of concept lattices. It allows the generation of formal concepts and association rules as