Common Types of Data Quality Issues
Missing Data Problems
  Types of Missingness
    Missing Completely at Random (MCAR)
    Missing at Random (MAR)
    Missing Not at Random (MNAR)
  Missingness Patterns
    Univariate Missingness
    Monotone Missingness
    Arbitrary Missingness
  Missing Data Mechanisms
    System Failures
    Data Collection Issues
    User Input Errors
    Processing Errors
  Impact Assessment
    Analysis Bias
    Statistical Power Reduction
    Model Performance Degradation
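
As a rough illustration of the missing-data topics listed above, the sketch below measures how much of a column is missing overall and within each group of another column. The records, the column names (`age_group`, `income`), and both helper functions are hypothetical, and `None` is assumed to mark a missing value. Missing rates that differ sharply between groups suggest the data are not missing completely at random; this check alone cannot separate MAR from MNAR.

```python
from collections import defaultdict

# Hypothetical survey records; None is assumed to mark a missing value.
rows = [
    {"age_group": "18-34", "income": 42000},
    {"age_group": "18-34", "income": 38000},
    {"age_group": "35-54", "income": None},
    {"age_group": "35-54", "income": 61000},
    {"age_group": "55+", "income": None},
    {"age_group": "55+", "income": None},
]

def missing_rate(records, column):
    """Fraction of records whose value in `column` is None."""
    if not records:
        return 0.0
    return sum(r.get(column) is None for r in records) / len(records)

def missing_rate_by_group(records, column, group_column):
    """Missing rate of `column` within each value of `group_column`."""
    groups = defaultdict(list)
    for r in records:
        groups[r.get(group_column)].append(r)
    return {g: missing_rate(members, column) for g, members in groups.items()}

overall = missing_rate(rows, "income")
by_group = missing_rate_by_group(rows, "income", "age_group")
print(overall, by_group)
```
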
Inaccurate and Invalid Data
  Outliers and Anomalies
    Statistical Outliers
    Contextual Outliers
    Collective Outliers
  Data Entry Errors
    Typographical Errors
    Transcription Mistakes
    Copy-Paste Errors
  Measurement Errors
    Instrument Calibration Issues
    Human Measurement Errors
    Environmental Factors
  Factual Inaccuracies
    Outdated Information
    Incorrect References
    Misattributed Data
  Logical Inconsistencies
    Cross-Field Contradictions
    Temporal Inconsistencies
    Business Rule Violations
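
Two of the categories above lend themselves to simple automated checks. The sketch below flags statistical outliers with Tukey's interquartile-range rule and flags one kind of temporal inconsistency, an end date that precedes its start date. The data, the field names (`start`, `end`), and the helper functions are hypothetical. The multiplier `k=1.5` is a common convention and not a universal threshold. Contextual and collective outliers need domain knowledge that this check does not capture.

```python
from datetime import date
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

def date_order_violations(records):
    """Return records whose end date precedes their start date."""
    return [r for r in records if r["end"] < r["start"]]

# Hypothetical sensor readings with one implausible value.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(readings)

# Hypothetical bookings; the second one ends before it starts.
bookings = [
    {"id": 1, "start": date(2024, 3, 1), "end": date(2024, 3, 5)},
    {"id": 2, "start": date(2024, 4, 10), "end": date(2024, 4, 2)},
]
bad_bookings = date_order_violations(bookings)
print(outliers, [b["id"] for b in bad_bookings])
```
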
Inconsistent and Redundant Data
  Duplicate Records
    Exact Duplicates
    Near Duplicates
    Partial Duplicates
  Contradictory Information
    Conflicting Values
    Version Conflicts
    Source Disagreements
  Format Inconsistencies
    Date Format Variations
    Number Format Differences
    Text Case Variations
  Categorical Label Inconsistencies
    Spelling Variations
    Abbreviation Differences
    Synonym Usage
  Encoding Issues
    Character Encoding Problems
    Special Character Handling
    Unicode Inconsistencies
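
Near duplicates often differ only in case, whitespace, or Unicode representation. The sketch below groups such values under a normalized key built from Unicode normalization (NFKC), whitespace collapsing, and case folding. The sample names and the `normalize_key` helper are hypothetical. This approach deliberately does not merge spelling variants such as a missing accent, which would need fuzzy matching or a lookup table.

```python
import unicodedata
from collections import defaultdict

def normalize_key(text):
    """Build a comparison key: NFKC form, single spaces, case-folded."""
    text = unicodedata.normalize("NFKC", text)
    text = " ".join(text.split())  # trim and collapse runs of whitespace
    return text.casefold()

# Hypothetical customer names. The second uses "e" plus a combining accent,
# which renders like the precomposed "\u00e9" but is a different code sequence.
names = [
    "Caf\u00e9 Roma",
    "cafe\u0301  roma ",
    "CAF\u00c9 ROMA",
    "Cafe Roma",  # no accent: a spelling variant, kept separate here
]

groups = defaultdict(list)
for name in names:
    groups[normalize_key(name)].append(name)

# Keys that collect more than one raw value are candidate duplicates.
candidates = {key: vals for key, vals in groups.items() if len(vals) > 1}
print(candidates)
```
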
Structural and Formatting Problems
  Schema Issues
    Inconsistent Column Names
    Variable Data Types
    Missing Columns
  Data Type Mismatches
    Numeric Data as Text
    Date Data as Text
    Boolean Data Inconsistencies
  Text and String Issues
    Unstructured Text in Structured Fields
    Embedded Delimiters
    Leading and Trailing Whitespace
  Delimiter and Separator Problems
    Inconsistent Delimiters
    Escaped Characters
    Nested Separators
  Column Alignment Issues
    Shifted Columns
    Missing Headers
    Extra Columns
  Multi-Value Fields
    Lists in Single Fields
    Concatenated Values
    Nested Structures
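
Several of the structural problems above can be detected or repaired with small parsing helpers. The sketch below converts numeric and boolean text to typed values, splits a multi-value field, and reports CSV rows whose field count differs from the header. All helper names, label sets, and sample data are hypothetical. It assumes commas inside numbers are thousands separators, which is wrong for locales that use the comma as a decimal mark.

```python
import csv
import io

TRUE_LABELS = {"true", "yes", "y", "1"}    # assumed label set
FALSE_LABELS = {"false", "no", "n", "0"}   # assumed label set

def parse_number(raw):
    """Parse numeric text; return None when float() cannot parse it."""
    cleaned = raw.strip().replace(",", "")  # assumes "," is a thousands separator
    try:
        return float(cleaned)
    except ValueError:
        return None

def parse_bool(raw):
    """Map common boolean spellings to True/False; None if unrecognized."""
    label = raw.strip().casefold()
    if label in TRUE_LABELS:
        return True
    if label in FALSE_LABELS:
        return False
    return None

def split_multi(raw, sep=";"):
    """Split a multi-value field, trimming whitespace and dropping empty parts."""
    return [part.strip() for part in raw.split(sep) if part.strip()]

def misaligned_rows(csv_text):
    """Return 1-based data-row numbers whose field count differs from the header's."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return [i for i, row in enumerate(reader, start=1) if len(row) != len(header)]

# The quoted comma in row 1 is handled by the csv module; rows 2 and 3 are misaligned.
sample = 'id,name,city\n1,"Smith, Ann",Oslo\n2,Lee,Bob,Lima\n3,Kim\n'
print(misaligned_rows(sample))
```

Unparseable values come back as `None` instead of raising, so a caller can count and inspect them before deciding how to handle them.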
Irrelevant and Noisy Data
  Unnecessary Features
    Redundant Columns
    Derived Variables
    Constant Values
  Out-of-Scope Records
    Temporal Misalignment
    Geographic Misalignment
    Population Misalignment
  Noise and Artifacts
    Random Noise
    Systematic Noise
    Processing Artifacts
  Obsolete Information
    Deprecated Fields
    Historical Artifacts
    Legacy System Remnants
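
Constant and redundant columns are among the easier kinds of irrelevant data to find mechanically. The sketch below lists columns that hold a single value across all records and pairs of columns whose values match in every record. The records, column names, and helper functions are hypothetical. It assumes every record has the same keys and hashable values. Whether a flagged column is truly unnecessary remains a judgment about the analysis at hand.

```python
def constant_columns(records):
    """Columns that hold one distinct value across all records."""
    columns = records[0].keys()
    return [c for c in columns if len({r[c] for r in records}) == 1]

def duplicate_columns(records):
    """Pairs of columns whose values are identical in every record."""
    columns = list(records[0].keys())
    pairs = []
    for i, first in enumerate(columns):
        for second in columns[i + 1:]:
            if all(r[first] == r[second] for r in records):
                pairs.append((first, second))
    return pairs

# Hypothetical product records: "country" never varies and
# "price_copy" repeats "price".
records = [
    {"id": 1, "country": "NO", "price": 9.5, "price_copy": 9.5},
    {"id": 2, "country": "NO", "price": 12.0, "price_copy": 12.0},
    {"id": 3, "country": "NO", "price": 9.5, "price_copy": 9.5},
]

constants = constant_columns(records)
duplicates = duplicate_columns(records)
print(constants, duplicates)
```
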