Tools and Technologies for Data Cleaning
Spreadsheet Applications
Microsoft Excel
Built-in Functions
Text Functions
Date Functions
Lookup Functions
Statistical Functions
Data Validation Tools
Input Restrictions
Custom Validation Rules
Error Alerts
Data Lists
Advanced Features
Conditional Formatting
Pivot Tables
Power Query
VBA Macros
Google Sheets
Native Functions
Add-ons and Extensions
Google Apps Script
Collaboration Features
LibreOffice Calc
Open Source Alternative
Macro Capabilities
Extension Support
Programming Languages and Libraries
Python Ecosystem
Core Libraries
Pandas
DataFrame Operations
Data Manipulation
Missing Data Handling
Grouping and Aggregation
NumPy
Array Operations
Mathematical Functions
Broadcasting
Linear Algebra
Specialized Libraries
Polars
High-Performance DataFrames
Lazy Evaluation
Memory Efficiency
Dask
Parallel Computing
Out-of-Core Processing
Scalable Analytics
Modin
Pandas Acceleration
Distributed Computing
Text Processing
Regular Expressions (re)
String Methods
Natural Language Toolkit (NLTK)
spaCy
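
The regular-expression and string-method entries above can be illustrated with a minimal sketch that uses only Python's standard `re` module. The helper names `normalize_phone` and `clean_name` are hypothetical, and the 10-digit phone format (with an optional leading country code 1) is an assumption made for the example, not a rule from this outline.

```python
import re
from typing import Optional


def normalize_phone(raw: str) -> Optional[str]:
    """Return a phone number as NNN-NNN-NNNN, or None if it cannot be parsed.

    Assumption for this sketch: valid numbers have 10 digits, optionally
    preceded by a country code of 1.
    """
    digits = re.sub(r"\D", "", raw)  # strip everything that is not a digit
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return None  # flag as unparseable instead of guessing
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def clean_name(raw: str) -> str:
    """Collapse runs of whitespace, trim the ends, and apply title case."""
    collapsed = re.sub(r"\s+", " ", raw).strip()
    return collapsed.title()
```

Returning `None` for values that do not fit the expected pattern keeps bad records visible for later review rather than silently coercing them.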
Data Quality Libraries
Great Expectations
Pandera
Cerberus
Schema
Utility Libraries
pyjanitor
missingno
FuzzyWuzzy
Dedupe
R Programming
Core Packages
dplyr
Data Manipulation Grammar
Pipe Operations
Grouping Functions
tidyr
Data Reshaping
Missing Data Tools
Nested Data Handling
data.table
High-Performance Operations
Memory Efficiency
Fast Aggregations
String Processing
stringr
stringi
Regular Expressions
Data Quality Packages
janitor
VIM (Visualization and Imputation of Missing values)
mice (Multiple Imputation)
Hmisc
Specialized Packages
lubridate (Date/Time)
forcats (Categorical Data)
readr (Data Import)
Database Systems and SQL
SQL Data Manipulation
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Query Language (DQL)
Data Control Language (DCL)
Advanced SQL Features
Window Functions
Common Table Expressions (CTEs)
Stored Procedures
User-Defined Functions
String and Date Functions
Text Processing Functions
Pattern Matching
Date Arithmetic
Format Conversion
Data Quality Constraints
Primary Key Constraints
Foreign Key Constraints
Unique Constraints
Check Constraints
Not Null Constraints
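
As a rough illustration of the five constraint types listed above, the following sketch uses Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and the helpers `build_demo_db` and `try_insert` are hypothetical names chosen for the example. Other database systems enforce the same constraints, but their syntax and error types may differ.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database whose schema rejects bad rows."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customers (
            id    INTEGER PRIMARY KEY,      -- primary key constraint
            email TEXT NOT NULL UNIQUE      -- not null and unique constraints
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL
                        REFERENCES customers(id),       -- foreign key constraint
            quantity    INTEGER NOT NULL
                        CHECK (quantity > 0)            -- check constraint
        );
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Run one INSERT; return False if a constraint rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

Declaring the rules in the schema means invalid rows are rejected at write time, so less cleanup is needed afterwards.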
Database-Specific Features
PostgreSQL
Advanced Data Types
Full-Text Search
JSON Support
MySQL
String Functions
Date Functions
Regular Expressions
SQL Server
T-SQL Extensions
Data Quality Services
Integration Services
Oracle
PL/SQL
Advanced Analytics
Data Mining
Specialized Data Cleaning Tools
Open Source Tools
OpenRefine
Interactive Data Cleaning
Faceting and Filtering
Clustering and Reconciliation
Expression Language
Apache Spark
Distributed Processing
MLlib for Data Quality
Structured Streaming
Talend Open Studio
ETL Processes
Data Integration
Job Design
Commercial Tools
Trifacta Wrangler
Visual Data Preparation
Machine Learning Suggestions
Collaboration Features
Alteryx Designer
Drag-and-Drop Interface
Predictive Analytics
Spatial Analytics
Informatica Data Quality
Enterprise Data Quality
Data Profiling
Data Standardization
IBM InfoSphere QualityStage
Data Standardization
Matching and Deduplication
Data Investigation
Cloud-Based Solutions
AWS Glue DataBrew
Google Cloud Dataprep
Microsoft Power BI Dataflows
Databricks Data Engineering
Workflow and Pipeline Tools
Apache Airflow
Workflow Orchestration
Task Dependencies
Monitoring and Alerting
Prefect
Modern Workflow Engine
Dynamic Workflows
Error Handling
Luigi
Batch Job Pipeline
Dependency Resolution
Failure Recovery
Dagster
Data Pipeline Framework
Type System
Testing Framework
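
The task-dependency and dependency-resolution ideas listed for the workflow tools above can be sketched with the standard-library `graphlib` module (Python 3.9+). This is not the API of Airflow, Prefect, Luigi, or Dagster. The `run_pipeline` helper and the three cleaning tasks are hypothetical and only show how declared dependencies determine execution order.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run tasks so that every task executes after its upstream tasks.

    tasks:        maps a task name to a callable
    dependencies: maps a task name to the list of tasks it depends on
    Each task receives the results of its upstream tasks as arguments.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        upstream = [results[dep] for dep in dependencies.get(name, ())]
        results[name] = tasks[name](*upstream)
    return order, results


# Hypothetical three-step cleaning pipeline.
tasks = {
    "load": lambda: [" a ", None, "b", "b"],
    "drop_missing": lambda rows: [r for r in rows if r is not None],
    "dedupe": lambda rows: sorted({r.strip() for r in rows}),
}
dependencies = {"drop_missing": ["load"], "dedupe": ["drop_missing"]}

order, results = run_pipeline(tasks, dependencies)
```

Real orchestrators add scheduling, retries, and monitoring on top of this ordering step.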