Data Cleaning

  1. Core Concepts of Data Quality
    1. Dimensions of Data Quality
      1. Accuracy
        1. Correctness of Values
        2. Precision of Measurements
        3. Factual Accuracy
      2. Completeness
        1. Missing Value Assessment
        2. Coverage Analysis
        3. Required Field Completeness
      3. Consistency
        1. Internal Consistency
        2. Cross-Field Consistency
        3. Format Consistency
      4. Timeliness
        1. Currency of Data
        2. Freshness Requirements
        3. Update Frequency
      5. Uniqueness
        1. Duplicate Detection
        2. Entity Resolution
        3. Record Deduplication
      6. Validity
        1. Format Validation
        2. Range Validation
        3. Business Rule Compliance
      7. Relevance
        1. Business Context Alignment
        2. Use Case Appropriateness
        3. Feature Relevance
      8. Integrity
        1. Referential Integrity
        2. Entity Integrity
        3. Domain Integrity
    2. Data Profiling and Assessment
      1. Purpose of Data Profiling
        1. Initial Data Exploration
        2. Data Discovery Process
      2. Summary Statistics Generation
        1. Central Tendency Measures
          1. Mean
          2. Median
          3. Mode
        2. Dispersion Measures
          1. Standard Deviation
          2. Variance
          3. Range
          4. Interquartile Range
        3. Distribution Characteristics
          1. Skewness
          2. Kurtosis
          3. Percentiles
        4. Frequency Analysis
          1. Value Counts
          2. Frequency Distributions
          3. Categorical Frequencies
      3. Data Visualization for Inspection
        1. Univariate Visualizations
          1. Histograms
          2. Box Plots
          3. Density Plots
        2. Bivariate Visualizations
          1. Scatter Plots
          2. Correlation Heatmaps
          3. Cross-Tabulations
        3. Multivariate Visualizations
          1. Pair Plots
          2. Parallel Coordinates
          3. Dimensionality Reduction Plots
      4. Data Type and Structure Analysis
        1. Identifying Data Types
        2. Schema Validation
        3. Structural Patterns
        4. Nested Data Structures
      5. Anomaly and Pattern Detection
        1. Statistical Anomalies
        2. Pattern Recognition
        3. Trend Analysis
        4. Seasonal Patterns