Statistics for Data Science

  1. Hypothesis Testing
    1. The Framework of Hypothesis Testing
      1. Formulating Hypotheses
        1. Null Hypothesis (H₀)
          1. Definition and Characteristics
            1. Status Quo Assumption
            2. Equality Statements
            3. Burden of Proof Concept
        2. Alternative Hypothesis (H₁ or Hₐ)
          1. Definition and Characteristics
            1. Research Hypothesis
            2. What We Want to Prove
            3. Complement of Null
        3. Hypothesis Formulation Guidelines
          1. Clear and Testable Statements
          2. Population Parameter Focus
          3. Mutually Exclusive and Exhaustive
      2. Types of Tests
        1. One-tailed Tests
          1. Right-Tailed Tests
          2. Left-Tailed Tests
          3. Directional Hypotheses
          4. When to Use
        2. Two-tailed Tests
          1. Non-directional Hypotheses
          2. "Not Equal To" Alternatives
          3. Conservative Approach
          4. When to Use
      3. Test Statistics
        1. Definition and Purpose
          1. Sample Data Summarization
          2. Standardized Measures
          3. Distribution Under H₀
        2. Common Test Statistics
          1. Z-statistic
          2. t-statistic
          3. Chi-square statistic
          4. F-statistic
        3. Calculation and Interpretation
          1. Formula Applications
          2. Degrees of Freedom
          3. Critical Value Comparison
    2. Decision Making in Hypothesis Testing
      1. Errors in Hypothesis Testing
        1. Type I Error (α)
          1. Definition: Rejecting True H₀
          2. False Positive
          3. Consequences and Examples
          4. Significance Level Setting
        2. Type II Error (β)
          1. Definition: Failing to Reject False H₀
          2. False Negative
          3. Consequences and Examples
          4. Factors Affecting β
        3. Relationship Between Error Types
          1. Trade-off Nature
          2. Sample Size Impact
          3. Effect Size Influence
      2. Statistical Power (1 - β)
        1. Definition and Interpretation
        2. Factors Affecting Power
          1. Sample Size
          2. Effect Size
          3. Significance Level
          4. Population Variability
        3. Power Analysis
          1. Prospective Power Analysis
          2. Retrospective Power Analysis
          3. Sample Size Determination
      3. The p-value Approach
        1. Definition and Interpretation
          1. Probability Under H₀
          2. Strength of Evidence
          3. Not Probability of H₀ Being True
        2. Calculation Methods
          1. Area in Tail(s)
          2. Test Statistic Comparison
          3. Software Implementation
        3. Decision Making with p-values
          1. Comparison with α
          2. Rejecting H₀ (p < α)
          3. Failing to Reject H₀ (p ≥ α)
          4. Strength of Evidence Interpretation
        4. Common Misconceptions
          1. p-hacking
          2. Multiple Testing Issues
          3. Practical vs. Statistical Significance
      4. Critical Value Approach
        1. Critical Region Definition
        2. Critical Value Determination
        3. Decision Rule Application
        4. Relationship to p-value Approach
      5. Significance Level (α)
        1. Common Choices (0.05, 0.01, 0.10)
        2. Context-Dependent Selection
        3. Relationship to Confidence Level
        4. Multiple Testing Adjustments
    3. Common Statistical Tests
      1. Tests for a Single Population Mean
        1. One-Sample Z-test
          1. Assumptions
            1. Known Population Variance
            2. Normal Distribution or Large Sample
            3. Random Sampling
          2. Test Statistic Formula
          3. Critical Values and p-values
          4. Applications and Examples
        2. One-Sample t-test
          1. Assumptions
            1. Unknown Population Variance
            2. Normal Distribution or Large Sample
            3. Random Sampling
          2. Test Statistic Formula
          3. Degrees of Freedom
          4. Critical Values and p-values
          5. Applications and Examples
          6. Robustness Considerations
            1. Normality Assumption Violations
            2. Sample Size Requirements
            3. Alternative Non-parametric Tests
      2. Tests for Two Population Means
        1. Independent Samples t-test
          1. Assumptions
            1. Independent Groups
            2. Normal Distributions
            3. Equal Variances (Pooled) vs. Unequal Variances (Welch's)
          2. Pooled Variance t-test
            1. Equal Variances Assumption
            2. Pooled Standard Error
            3. Degrees of Freedom Calculation
          3. Welch's t-test
            1. Unequal Variances
            2. Separate Variance Estimates
            3. Adjusted Degrees of Freedom
          4. Effect Size Measures
            1. Cohen's d
            2. Practical Significance
        2. Paired Samples t-test
          1. Assumptions
            1. Paired Observations
            2. Normal Distribution of Differences
            3. Random Sampling
          2. Difference Score Analysis
          3. Test Statistic Formula
          4. Applications
            1. Before-After Studies
            2. Matched Pairs Design
            3. Repeated Measures
      3. Tests for Population Proportions
        1. One-Proportion Z-test
          1. Assumptions
            1. Large Sample Size
            2. Success-Failure Condition
            3. Random Sampling
          2. Test Statistic Formula
          3. Continuity Correction
          4. Applications and Examples
        2. Two-Proportion Z-test
          1. Assumptions
            1. Independent Samples
            2. Large Sample Sizes
            3. Success-Failure Conditions
          2. Pooled Proportion Estimate
          3. Test Statistic Formula
          4. Applications
            1. Comparing Success Rates
            2. A/B Testing
      4. Analysis of Variance (ANOVA)
        1. One-Way ANOVA
          1. Purpose and Applications
            1. Comparing Multiple Group Means
            2. Extension of Two-Sample t-test
          2. Assumptions
            1. Independence of Observations
            2. Normality Within Groups
            3. Equal Variances (Homoscedasticity)
          3. ANOVA Table Components
            1. Sum of Squares (SS)
            2. Degrees of Freedom (df)
            3. Mean Squares (MS)
            4. F-statistic
          4. Between-Group vs. Within-Group Variation
          5. Post-Hoc Tests
            1. Tukey's HSD
            2. Bonferroni Correction
            3. Scheffe's Method
        2. Two-Way ANOVA
          1. Purpose and Design
            1. Two Factors Analysis
            2. Interaction Effects
          2. Main Effects vs. Interaction Effects
          3. Assumptions
          4. ANOVA Table for Two-Way Design
          5. Interpretation of Results
        3. F-statistic and F-distribution
          1. Ratio of Variances
          2. F-distribution Properties
          3. Critical Values
                                                                                                                                                                                                                                                                                                1. Interpretation Guidelines
                                                                                                                                                                                                                                                                                              2. Chi-Squared Tests
                                                                                                                                                                                                                                                                                                1. Goodness-of-Fit Test
                                                                                                                                                                                                                                                                                                  1. Purpose and Applications
                                                                                                                                                                                                                                                                                                    1. Testing Distribution Assumptions
                                                                                                                                                                                                                                                                                                      1. Comparing Observed vs. Expected
                                                                                                                                                                                                                                                                                                      2. Assumptions
                                                                                                                                                                                                                                                                                                        1. Independent Observations
                                                                                                                                                                                                                                                                                                          1. Expected Frequency Requirements
                                                                                                                                                                                                                                                                                                            1. Categorical Data
                                                                                                                                                                                                                                                                                                            2. Test Statistic Calculation
                                                                                                                                                                                                                                                                                                              1. Degrees of Freedom
                                                                                                                                                                                                                                                                                                                1. Applications
                                                                                                                                                                                                                                                                                                                  1. Testing Normality
                                                                                                                                                                                                                                                                                                                    1. Uniform Distribution Testing
                                                                                                                                                                                                                                                                                                                      1. Model Validation
                                                                                                                                                                                                                                                                                                                    2. Test for Independence
                                                                                                                                                                                                                                                                                                                      1. Purpose and Applications
                                                                                                                                                                                                                                                                                                                        1. Association Between Variables
                                                                                                                                                                                                                                                                                                                          1. Contingency Table Analysis
                                                                                                                                                                                                                                                                                                                          2. Contingency Tables
                                                                                                                                                                                                                                                                                                                            1. Row and Column Variables
                                                                                                                                                                                                                                                                                                                              1. Expected Frequency Calculation
                                                                                                                                                                                                                                                                                                                                1. Marginal Totals
                                                                                                                                                                                                                                                                                                                                2. Test Statistic Calculation
                                                                                                                                                                                                                                                                                                                                  1. Degrees of Freedom Formula
                                                                                                                                                                                                                                                                                                                                    1. Interpretation of Results
                                                                                                                                                                                                                                                                                                                                      1. Statistical Independence
                                                                                                                                                                                                                                                                                                                                        1. Strength of Association
                                                                                                                                                                                                                                                                                                                                        2. Measures of Association
                                                                                                                                                                                                                                                                                                                                          1. Cramér's V
                                                                                                                                                                                                                                                                                                                                            1. Phi Coefficient
                                                                                                                                                                                                                                                                                                                                              1. Contingency Coefficient
                                                                                                                                                                                                                                                                                                                                          2. Non-Parametric Tests
                                                                                                                                                                                                                                                                                                                                            1. When to Use Non-Parametric Tests
                                                                                                                                                                                                                                                                                                                                              1. Assumption Violations
                                                                                                                                                                                                                                                                                                                                                1. Ordinal Data
                                                                                                                                                                                                                                                                                                                                                  1. Small Sample Sizes
                                                                                                                                                                                                                                                                                                                                                  2. Wilcoxon Signed-Rank Test
                                                                                                                                                                                                                                                                                                                                                    1. Mann-Whitney U Test
                                                                                                                                                                                                                                                                                                                                                      1. Kruskal-Wallis Test
                                                                                                                                                                                                                                                                                                                                                        1. Spearman's Rank Correlation