Statistics for Data Science

  1. Descriptive Statistics: Summarizing Data
    1. Measures of Central Tendency
      1. Mean
        1. Arithmetic Mean
          1. Calculation for Ungrouped Data
          2. Calculation for Grouped Data
          3. Weighted Mean
        2. Properties of the Mean
          1. Sensitivity to Outliers
          2. Mathematical Properties
          3. When to Use vs. Avoid
        3. Alternative Means
          1. Geometric Mean
          2. Harmonic Mean
          3. Trimmed Mean
      2. Median
        1. Calculation Methods
          1. Calculation for Odd Data Sets
          2. Calculation for Even Data Sets
          3. Interpolation Methods
        2. Properties of the Median
          1. Robustness to Outliers
          2. Positional Nature
          3. When to Prefer Over Mean
      3. Mode
        1. Identification Methods
          1. Unimodal Distributions
          2. Bimodal Distributions
          3. Multimodal Distributions
        2. Applications of Mode
          1. Categorical Data Analysis
          2. Peak Identification
          3. Distribution Shape Assessment
      4. Choosing Appropriate Measures
        1. Data Type Considerations
        2. Distribution Shape Impact
        3. Outlier Presence
        4. Business Context Relevance
    2. Measures of Variability and Dispersion
      1. Range
        1. Calculation and Interpretation
        2. Limitations and Weaknesses
        3. When Range is Useful
      2. Interquartile Range (IQR)
        1. Calculation Steps
          1. First Quartile (Q1)
          2. Third Quartile (Q3)
          3. IQR Computation
        2. Use in Outlier Detection
          1. IQR Rule for Outliers
          2. Box Plot Construction
        3. Robustness Properties
      3. Variance
        1. Population Variance
          1. Formula and Calculation
          2. Degrees of Freedom Concept
        2. Sample Variance
          1. Bessel's Correction
          2. Unbiased Estimation
        3. Units and Interpretation
          1. Squared Units Problem
          2. Relative Magnitude Assessment
      4. Standard Deviation
        1. Relationship to Variance
          1. Square Root Transformation
          2. Unit Restoration
        2. Interpretation in Context
          1. Typical Deviation from Mean
          2. Distribution Spread Assessment
        3. Population vs. Sample Standard Deviation
      5. Coefficient of Variation
        1. Calculation and Formula
        2. Use Cases and Applications
          1. Relative Variability Comparison
          2. Scale-Independent Comparison
        3. Comparing Variability Across Datasets
          1. Different Units Handling
          2. Different Scales Normalization
      6. Mean Absolute Deviation
        1. Calculation and Properties
        2. Comparison with Standard Deviation
        3. Robustness Characteristics
    3. Measures of Position
      1. Percentiles
        1. Definition and Concept
        2. Calculation Methods
          1. Linear Interpolation
          2. Nearest Rank Method
        3. Interpretation and Applications
          1. Performance Benchmarking
          2. Distribution Analysis
        4. Applications in Data Science
          1. Feature Scaling
          2. Outlier Detection Thresholds
      2. Quartiles
        1. First Quartile (Q1)
          1. 25th Percentile
          2. Lower Quartile Interpretation
        2. Second Quartile (Q2)
          1. Median Relationship
          2. 50th Percentile
        3. Third Quartile (Q3)
          1. 75th Percentile
          2. Upper Quartile Interpretation
        4. Five-Number Summary
          1. Minimum Value
          2. Q1, Median, Q3
          3. Maximum Value
          4. Box Plot Foundation
      3. Z-scores (Standard Scores)
        1. Standardization Formula
          1. Calculation Process
          2. Interpretation Guidelines
            1. Distance from Mean
            2. Standard Deviation Units
        2. Applications
          1. Identifying Outliers
          2. Comparing Across Distributions
          3. Data Normalization
      4. Deciles and Other Quantiles
        1. Decile Calculations
        2. Custom Quantile Selection
        3. Business Applications
    4. Understanding Data Shape
      1. Skewness
        1. Definition and Measurement
        2. Positive Skew (Right-Skewed)
          1. Characteristics and Examples
          2. Tail Direction
          3. Mean vs. Median Relationship
        3. Negative Skew (Left-Skewed)
          1. Characteristics and Examples
          2. Tail Direction
          3. Mean vs. Median Relationship
        4. Symmetrical Distributions
        5. Impact on Statistical Measures
          1. Central Tendency Measures
          2. Variability Measures
          3. Inference Implications
        6. Skewness Coefficients
          1. Pearson's Skewness
          2. Sample Skewness Formula
      2. Kurtosis
        1. Definition and Measurement
        2. Types of Kurtosis
          1. Leptokurtic Distributions
          2. Mesokurtic Distributions
          3. Platykurtic Distributions
        3. Excess Kurtosis
          1. Comparison to Normal Distribution
          2. Interpretation Guidelines
        4. Interpretation in Data Analysis
          1. Tail Behavior Assessment
          2. Outlier Propensity
          3. Risk Assessment Applications
      3. Distribution Comparison
        1. Normal Distribution Benchmarking
        2. Empirical vs. Theoretical Distributions
        3. Goodness-of-Fit Assessment
    5. Data Visualization for EDA
      1. Histograms
        1. Construction Principles
          1. Bin Selection Strategies
          2. Frequency vs. Density
        2. Choosing Bin Widths
          1. Sturges' Rule
          2. Scott's Rule
          3. Freedman-Diaconis Rule
        3. Interpretation Guidelines
          1. Shape Assessment
          2. Outlier Identification
          3. Distribution Comparison
      2. Box Plots (Box-and-Whisker Plots)
        1. Components of a Box Plot
          1. Box Construction
          2. Whisker Calculation
          3. Outlier Marking
        2. Variations
          1. Notched Box Plots
          2. Violin Plots
          3. Multiple Box Plots
        3. Identifying Outliers
          1. IQR Method
          2. Visual Identification
          3. Statistical vs. Practical Outliers
      3. Bar Charts
        1. Categorical Data Visualization
          1. Frequency Representation
          2. Proportion Display
        2. Chart Variations
          1. Grouped Bar Charts
          2. Stacked Bar Charts
          3. Horizontal vs. Vertical
        3. Best Practices
          1. Ordering Strategies
          2. Color Usage
          3. Label Clarity
      4. Scatter Plots
        1. Construction Principles
          1. Variable Assignment
          2. Point Representation
        2. Visualizing Relationships
          1. Linear Relationships
          2. Non-Linear Patterns
          3. No Relationship Patterns
        3. Detecting Correlation and Patterns
                                                                                                                                                                                                                                                                                        1. Positive Correlation
                                                                                                                                                                                                                                                                                          1. Negative Correlation
                                                                                                                                                                                                                                                                                            1. Correlation Strength Assessment
                                                                                                                                                                                                                                                                                            2. Enhancements
                                                                                                                                                                                                                                                                                              1. Color Coding
                                                                                                                                                                                                                                                                                                1. Size Mapping
                                                                                                                                                                                                                                                                                                  1. Trend Lines
                                                                                                                                                                                                                                                                                                2. Density Plots
                                                                                                                                                                                                                                                                                                  1. Kernel Density Estimation
                                                                                                                                                                                                                                                                                                    1. Bandwidth Selection
                                                                                                                                                                                                                                                                                                      1. Kernel Function Types
                                                                                                                                                                                                                                                                                                        1. Smoothing Concepts
                                                                                                                                                                                                                                                                                                        2. Comparison with Histograms
                                                                                                                                                                                                                                                                                                          1. Continuous vs. Discrete Representation
                                                                                                                                                                                                                                                                                                            1. Smoothness vs. Granularity
                                                                                                                                                                                                                                                                                                              1. Interpretation Differences
                                                                                                                                                                                                                                                                                                              2. Multiple Distribution Comparison
                                                                                                                                                                                                                                                                                                              3. Additional Visualization Types
                                                                                                                                                                                                                                                                                                                1. Stem-and-Leaf Plots
                                                                                                                                                                                                                                                                                                                  1. Dot Plots
                                                                                                                                                                                                                                                                                                                    1. Q-Q Plots
                                                                                                                                                                                                                                                                                                                      1. Heat Maps for Correlation