Statistics for Data Science

  1. Advanced and Modern Statistical Methods
    1. Bayesian Statistics
      1. Philosophical Foundations
        1. Frequentist vs. Bayesian Paradigms
          1. Probability Interpretation Differences
          2. Parameter Treatment
          3. Long-Run vs. Degree of Belief
          4. Objective vs. Subjective Approaches
        2. Bayesian Inference Framework
          1. Prior-to-Posterior Learning
          2. Uncertainty Quantification
          3. Decision Theory Integration
      2. Core Components
        1. Prior Distributions
          1. Subjective Priors
            1. Expert Opinion Incorporation
            2. Personal Belief Representation
          2. Objective Priors
            1. Non-informative Priors
            2. Jeffreys Priors
            3. Reference Priors
          3. Conjugate Priors
            1. Mathematical Convenience
            2. Closed-Form Posteriors
            3. Common Conjugate Pairs
          4. Empirical Priors
            1. Data-Driven Prior Selection
            2. Hierarchical Modeling
        2. Likelihood Function
          1. Data Generation Model
          2. Parameter Dependence
          3. Connection to Frequentist Methods
        3. Posterior Distributions
          1. Bayes' Theorem Application
          2. Prior × Likelihood ∝ Posterior
          3. Normalization Constant
          4. Posterior Inference
      3. Bayesian Estimation
        1. Point Estimation
          1. Posterior Mean
          2. Posterior Median
          3. Maximum A Posteriori (MAP)
        2. Interval Estimation
          1. Credible Intervals
          2. Equal-Tailed Intervals
          3. Highest Posterior Density (HPD)
          4. Interpretation vs. Confidence Intervals
        3. Predictive Distributions
          1. Posterior Predictive Distribution
          2. Prior Predictive Distribution
          3. Model Checking Applications
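A minimal sketch of the conjugate-prior and estimation items, assuming a Binomial likelihood with a Beta prior (the Beta-Binomial conjugate pair). The function names and the data (27 conversions in 100 trials, uniform Beta(1, 1) prior) are hypothetical. The posterior mean and MAP are closed-form; the median and the equal-tailed credible interval are Monte Carlo approximations from posterior draws, not exact quantiles.

```python
import random


def beta_binomial_update(alpha, beta, successes, trials):
    """Conjugate update: Beta(alpha, beta) prior + Binomial data -> Beta posterior."""
    return alpha + successes, beta + (trials - successes)


def beta_mean_and_map(a, b):
    """Closed-form posterior mean and MAP (mode) of Beta(a, b); the mode needs a, b > 1."""
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2) if a > 1 and b > 1 else None
    return mean, mode


def monte_carlo_summary(a, b, level=0.95, draws=100_000, seed=0):
    """Approximate posterior median and equal-tailed credible interval from draws."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lower = samples[int((1 - level) / 2 * draws)]
    upper = samples[int((1 + level) / 2 * draws) - 1]
    median = samples[draws // 2]
    return median, lower, upper


# Hypothetical data: 27 conversions in 100 trials, uniform Beta(1, 1) prior.
a_post, b_post = beta_binomial_update(1, 1, successes=27, trials=100)
post_mean, post_map = beta_mean_and_map(a_post, b_post)
post_median, ci_low, ci_high = monte_carlo_summary(a_post, b_post)
print(f"posterior: Beta({a_post}, {b_post})")
print(f"mean = {post_mean:.3f}, MAP = {post_map:.3f}, median ~ {post_median:.3f}")
print(f"95% equal-tailed credible interval ~ ({ci_low:.3f}, {ci_high:.3f})")
```

With a uniform prior the MAP equals the sample proportion 27/100, which is one concrete link between the likelihood and frequentist point estimation.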
      4. Computational Methods
        1. Markov Chain Monte Carlo (MCMC)
          1. Metropolis-Hastings Algorithm
          2. Gibbs Sampling
          3. Convergence Diagnostics
        2. Variational Inference
          1. Approximate Posterior Distributions
          2. Computational Efficiency
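A random-walk Metropolis-Hastings sketch. To keep the result checkable, the target is the unnormalized posterior for 27 successes in 100 trials under a uniform prior, whose exact posterior mean is 28/102 ≈ 0.275; in practice MCMC is used when no such closed form exists. The step size, chain length, burn-in, and split-half comparison are illustrative assumptions, and the split-half check is only a crude stand-in for formal diagnostics such as R-hat or effective sample size.

```python
import math
import random


def log_target(theta, successes=27, trials=100):
    """Log of Binomial likelihood x uniform prior, without the normalization constant."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # the prior puts no mass outside (0, 1)
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)


def metropolis_hastings(log_density, start, n_steps, step_size=0.05, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current, current_lp = start, log_density(start)
    chain, accepted = [], 0
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_density(proposal)
        # Symmetric proposal, so the Hastings correction cancels and only the
        # target ratio matters; 1 - random() lies in (0, 1], so log() is safe.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain.append(current)
    return chain, accepted / n_steps


chain, acceptance_rate = metropolis_hastings(log_target, start=0.5, n_steps=20_000)
kept = chain[2_000:]  # discard burn-in
half = len(kept) // 2
mcmc_mean = sum(kept) / len(kept)
print(f"acceptance rate = {acceptance_rate:.2f}")
print(f"MCMC posterior mean ~ {mcmc_mean:.3f} (exact: {28 / 102:.3f})")
# Crude convergence check: the two halves of the chain should roughly agree.
print(f"half-chain means: {sum(kept[:half]) / half:.3f}, "
      f"{sum(kept[half:]) / (len(kept) - half):.3f}")
```

Only the unnormalized log posterior is needed, because the normalization constant cancels in the acceptance ratio.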
      5. Applications in Data Science
        1. A/B Testing
        2. Machine Learning Model Uncertainty
        3. Hierarchical Modeling
    2. Resampling Methods
      1. The Bootstrap
        1. Bootstrap Principle
          1. Sampling with Replacement
          2. Empirical Distribution Function
          3. Plug-in Principle
        2. Bootstrap Procedure
          1. Original Sample as Population
          2. Resampling Process
          3. Bootstrap Samples Generation
        3. Bootstrap Applications
          1. Standard Error Estimation
            1. Bootstrap Standard Error
            2. Comparison with Analytical Methods
            3. Complex Statistics
          2. Confidence Interval Construction
            1. Percentile Method
            2. Bias-Corrected and Accelerated (BCa)
            3. Bootstrap-t Method
          3. Hypothesis Testing
            1. Bootstrap p-values
            2. Permutation Tests Connection
        4. Bootstrap Variants
          1. Parametric Bootstrap
          2. Block Bootstrap for Time Series
          3. Balanced Bootstrap
        5. Limitations and Considerations
          1. Sample Size Requirements
          2. Assumption Violations
          3. Computational Intensity
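A nonparametric bootstrap sketch for the standard error and a percentile-method confidence interval of the sample median, a statistic without a simple analytical standard error. The sample values are hypothetical, and the BCa and bootstrap-t intervals are not shown.

```python
import random
import statistics


def bootstrap_distribution(data, statistic, n_boot=5_000, seed=0):
    """Resample `data` with replacement (the plug-in step) and recompute `statistic`."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]


def percentile_interval(boot_stats, level=0.95):
    """Percentile-method confidence interval from the bootstrap distribution."""
    ordered = sorted(boot_stats)
    lo_idx = int((1 - level) / 2 * len(ordered))
    hi_idx = int((1 + level) / 2 * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]


# Hypothetical sample, e.g. session lengths in minutes.
sample = [3.1, 4.7, 2.2, 8.9, 5.0, 4.1, 3.8, 12.4, 4.4, 6.3, 2.9, 5.6, 4.0, 7.2, 3.3]
boot_medians = bootstrap_distribution(sample, statistics.median)
se = statistics.stdev(boot_medians)  # bootstrap standard error
lo, hi = percentile_interval(boot_medians)
print(f"sample median = {statistics.median(sample):.2f}")
print(f"bootstrap SE ~ {se:.2f}, 95% percentile CI ~ ({lo:.2f}, {hi:.2f})")
```

With only 15 observations the bootstrap medians can take few distinct values, which illustrates the sample-size limitation listed above.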
      2. Permutation Tests
        1. Permutation Principle
          1. Null Hypothesis of No Effect
          2. Exchangeability Under H₀
          3. Exact p-value Calculation
        2. Procedure
          1. Test Statistic Calculation
          2. Permutation Distribution Generation
          3. p-value Determination
        3. Applications
          1. Two-Sample Tests
          2. Correlation Testing
          3. Regression Coefficient Testing
        4. Advantages
          1. Distribution-Free
          2. Exact Tests
          3. Robust to Outliers
        5. Computational Considerations
          1. Exhaustive vs. Random Permutations
          2. Monte Carlo Approximation
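A Monte Carlo two-sample permutation test sketch, using the difference in means as the test statistic; the group values are hypothetical. With 8 + 8 observations there are only C(16, 8) = 12,870 distinct splits, so exhaustive enumeration would also be feasible here; random shuffles illustrate the Monte Carlo approximation used when it is not.

```python
import random
import statistics


def permutation_test(x, y, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Under H0 the group labels are exchangeable, so the pooled data are shuffled,
    re-split into groups of the original sizes, and the statistic recomputed.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(x) - statistics.fmean(y)
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:])
        # Small tolerance so ties with the observed value are not lost to rounding.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Counting the observed labelling as one permutation keeps p > 0.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical metric values for two groups of 8 users each.
control = [12.1, 9.8, 11.4, 10.2, 10.9, 9.5, 11.0, 10.4]
treatment = [12.9, 13.4, 11.8, 14.1, 12.2, 13.0, 12.7, 11.9]
obs, p = permutation_test(treatment, control)
print(f"observed difference = {obs:.4f}, permutation p-value ~ {p:.4f}")
```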
      3. Jackknife Method
        1. Leave-One-Out Principle
        2. Bias Reduction
        3. Variance Estimation
      4. Cross-Validation
        1. Model Selection Application
        2. Overfitting Prevention
        3. Performance Estimation
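A leave-one-out jackknife sketch for bias and standard error. Two known identities make it easy to check: for the sample mean the jackknife bias is zero and its standard error equals s/√n, and bias-correcting the plug-in variance (divisor n) recovers the unbiased sample variance (divisor n - 1). The data values are hypothetical.

```python
import math
import statistics


def jackknife(data, statistic):
    """Leave-one-out jackknife: bias-corrected estimate, bias, and standard error."""
    data = list(data)
    n = len(data)
    theta_hat = statistic(data)
    # Recompute the statistic n times, each time leaving out one observation.
    loo = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = statistics.fmean(loo)
    bias = (n - 1) * (loo_mean - theta_hat)
    se = math.sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in loo))
    return theta_hat - bias, bias, se


data = [2.3, 3.1, 4.8, 5.0, 6.7, 7.2, 8.4, 9.9]  # hypothetical sample
_, mean_bias, mean_se = jackknife(data, statistics.fmean)
print(f"mean: jackknife bias = {mean_bias:.2e}, SE = {mean_se:.4f}, "
      f"s/sqrt(n) = {statistics.stdev(data) / math.sqrt(len(data)):.4f}")
corrected_var, var_bias, _ = jackknife(data, statistics.pvariance)
print(f"plug-in variance = {statistics.pvariance(data):.4f}, "
      f"jackknife-corrected = {corrected_var:.4f}, "
      f"unbiased variance = {statistics.variance(data):.4f}")
```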
    3. Experimental Design and A/B Testing
      1. Principles of Experimental Design
        1. Randomization
          1. Random Assignment Importance
          2. Eliminating Selection Bias
          3. Balancing Confounders
          4. Randomization Methods
        2. Control Groups
          1. Treatment vs. Control
          2. Placebo Effects
          3. Historical Controls
        3. Replication
          1. Sample Size Adequacy
          2. Statistical Power
          3. Generalizability
        4. Blocking
          1. Controlling Known Sources of Variation
          2. Matched Pairs Design
          3. Randomized Block Design
      2. A/B Testing Framework
        1. Business Context
          1. Conversion Rate Optimization
          2. User Experience Testing
          3. Product Feature Evaluation
        2. Test Design Components
          1. Primary Metric Selection
          2. Secondary Metrics
          3. Success Criteria Definition
          4. Minimum Detectable Effect
        3. Randomization in A/B Tests
          1. User-Level Randomization
          2. Session-Level Randomization
          3. Cluster Randomization
        4. Sample Size Determination
          1. Power Analysis Application
          2. Effect Size Specification
          3. Type I and Type II Error Rates
          4. Business Impact Considerations
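A sample-size sketch for a conversion-rate A/B test, assuming a two-sided two-proportion z-test under the normal approximation with unpooled variances; other formulas (pooled-variance or continuity-corrected versions) give slightly different numbers. The 10% baseline and the minimum detectable effects are hypothetical inputs.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_baseline, mde_abs, alpha=0.05, power=0.80):
    """Approximate users per arm for a two-sided two-proportion z-test.

    Normal approximation with unpooled variances:
        n = (z_{1-alpha/2} + z_{power})^2 * [p1(1 - p1) + p2(1 - p2)] / (p2 - p1)^2
    """
    p1 = p_baseline
    p2 = p_baseline + mde_abs  # minimum detectable effect on the absolute scale
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # Type I error rate alpha (two-sided)
    z_power = NormalDist().inv_cdf(power)          # power = 1 - Type II error rate
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde_abs ** 2)


# Hypothetical inputs: 10% baseline conversion rate.
n_per_arm = sample_size_per_arm(0.10, 0.01)    # detect a lift to 11%
n_half_mde = sample_size_per_arm(0.10, 0.005)  # detect a lift to 10.5%
print(f"users per arm for a 1.0 pp MDE: {n_per_arm}")
print(f"users per arm for a 0.5 pp MDE: {n_half_mde}")
```

Halving the minimum detectable effect roughly quadruples the required sample size, since n scales with 1/MDE².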
      3. A/B Test Implementation
        1. Pre-Test Phase
          1. Hypothesis Formulation
          2. Metric Definition
          3. Technical Implementation
          4. Quality Assurance
        2. Test Execution
          1. Monitoring and Quality Control
          2. Early Stopping Considerations
          3. External Validity Threats
        3. Post-Test Analysis
          1. Statistical Significance Testing
          2. Practical Significance Assessment
          3. Confidence Interval Construction
          4. Segmentation Analysis
      4. Advanced A/B Testing Concepts
        1. Multiple Testing Problem
          1. Family-Wise Error Rate
          2. False Discovery Rate
          3. Bonferroni Correction
          4. Benjamini-Hochberg Procedure
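A sketch contrasting the two multiple-testing corrections on the same hypothetical p-values: Bonferroni controls the family-wise error rate, while the Benjamini-Hochberg step-up procedure controls the false discovery rate and is typically less conservative.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Sort p-values ascending, find the largest rank k with p_(k) <= (k / m) * alpha,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


p_vals = [0.041, 0.001, 0.30, 0.012, 0.008]  # hypothetical p-values, one per metric
print("Bonferroni:        ", bonferroni(p_vals))
print("Benjamini-Hochberg:", benjamini_hochberg(p_vals))
```

Here Bonferroni rejects only the hypotheses with p = 0.001 and p = 0.008, while Benjamini-Hochberg also rejects p = 0.012.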
        2. Sequential Testing
          1. Early Stopping Rules
          2. Group Sequential Methods
          3. Bayesian Approaches
        3. Multi-Armed Bandits
          1. Exploration vs. Exploitation
          2. Thompson Sampling
          3. Upper Confidence Bound
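A Beta-Bernoulli Thompson sampling sketch for a three-arm bandit. The conversion rates are hypothetical and are used only to simulate rewards; the algorithm itself sees only observed successes and failures. Each round draws one sample from every arm's Beta posterior and plays the largest draw, so each arm is chosen with probability equal to its posterior probability of being best, which is how exploration and exploitation are balanced.

```python
import random


def thompson_sampling(true_rates, n_rounds=5_000, seed=0):
    """Beta-Bernoulli Thompson sampling over arms with unknown conversion rates.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior (uniform prior).
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for _ in range(n_rounds):
        # One posterior draw per arm; play the arm with the largest draw.
        draws = [rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)]
        arm = max(range(k), key=lambda a: draws[a])
        # Simulated environment: the reward is Bernoulli(true_rates[arm]).
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    pulls = [successes[a] + failures[a] for a in range(k)]
    return pulls, successes


pulls, successes = thompson_sampling([0.02, 0.05, 0.10], n_rounds=10_000)
print("pulls per arm:", pulls)
print("successes per arm:", successes)
```

Unlike a fixed 50/50 A/B split, traffic shifts toward the better-performing arm as evidence accumulates.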
      5. Common Pitfalls and Best Practices
        1. Selection Bias
        2. Survivorship Bias
        3. Simpson's Paradox
        4. Novelty Effects
        5. Seasonal Variations
    4. Introduction to Statistical Learning Concepts
      1. Supervised vs. Unsupervised Learning
        1. Problem Type Classification
        2. Prediction vs. Inference Goals
        3. Labeled vs. Unlabeled Data
      2. Bias-Variance Tradeoff
        1. Bias Component
          1. Underfitting Characteristics
                                                                                                                                                                                                                                                                                              1. Model Simplicity
                                                                                                                                                                                                                                                                                                1. Systematic Error
                                                                                                                                                                                                                                                                                                2. Variance Component
                                                                                                                                                                                                                                                                                                  1. Overfitting Characteristics
                                                                                                                                                                                                                                                                                                    1. Model Complexity
                                                                                                                                                                                                                                                                                                      1. Random Error
                                                                                                                                                                                                                                                                                                      2. Tradeoff Implications
                                                                                                                                                                                                                                                                                                        1. Model Selection Guidance
                                                                                                                                                                                                                                                                                                          1. Complexity Optimization
                                                                                                                                                                                                                                                                                                            1. Generalization Performance
                                                                                                                                                                                                                                                                                                            2. Decomposition
                                                                                                                                                                                                                                                                                                              1. Mathematical Framework
                                                                                                                                                                                                                                                                                                                1. Irreducible Error
                                                                                                                                                                                                                                                                                                                  1. Total Expected Error
                                                                                                                                                                                                                                                                                                                2. Model Validation and Selection
                                                                                                                                                                                                                                                                                                                  1. Training Error vs. Test Error
                                                                                                                                                                                                                                                                                                                    1. Optimistic Training Error
                                                                                                                                                                                                                                                                                                                      1. Generalization Gap
                                                                                                                                                                                                                                                                                                                        1. Overfitting Detection
                                                                                                                                                                                                                                                                                                                        2. Cross-Validation Methods
                                                                                                                                                                                                                                                                                                                          1. k-Fold Cross-Validation
                                                                                                                                                                                                                                                                                                                            1. Procedure and Implementation
                                                                                                                                                                                                                                                                                                                              1. Choosing k Value
                                                                                                                                                                                                                                                                                                                                1. Stratified Cross-Validation
                                                                                                                                                                                                                                                                                                                                2. Leave-One-Out Cross-Validation (LOOCV)
                                                                                                                                                                                                                                                                                                                                  1. Extreme Case of k-Fold
                                                                                                                                                                                                                                                                                                                                    1. Computational Considerations
                                                                                                                                                                                                                                                                                                                                      1. Bias-Variance Properties
                                                                                                                                                                                                                                                                                                                                      2. Time Series Cross-Validation
                                                                                                                                                                                                                                                                                                                                        1. Forward Chaining
                                                                                                                                                                                                                                                                                                                                          1. Temporal Dependencies
                                                                                                                                                                                                                                                                                                                                        2. Information Criteria
                                                                                                                                                                                                                                                                                                                                          1. Akaike Information Criterion (AIC)
                                                                                                                                                                                                                                                                                                                                            1. Bayesian Information Criterion (BIC)
                                                                                                                                                                                                                                                                                                                                              1. Model Comparison Framework
                                                                                                                                                                                                                                                                                                                                            2. Regularization Techniques
                                                                                                                                                                                                                                                                                                                                              1. Ridge Regression (L2 Regularization)
                                                                                                                                                                                                                                                                                                                                                1. Penalty Term Addition
                                                                                                                                                                                                                                                                                                                                                  1. Shrinkage Effect
                                                                                                                                                                                                                                                                                                                                                    1. Multicollinearity Handling
                                                                                                                                                                                                                                                                                                                                                      1. Tuning Parameter Selection
                                                                                                                                                                                                                                                                                                                                                      2. Lasso Regression (L1 Regularization)
                                                                                                                                                                                                                                                                                                                                                        1. Penalty Term Characteristics
                                                                                                                                                                                                                                                                                                                                                          1. Variable Selection Property
                                                                                                                                                                                                                                                                                                                                                            1. Sparsity Induction
                                                                                                                                                                                                                                                                                                                                                              1. Geometric Interpretation
                                                                                                                                                                                                                                                                                                                                                              2. Elastic Net
                                                                                                                                                                                                                                                                                                                                                                1. L1 and L2 Combination
                                                                                                                                                                                                                                                                                                                                                                  1. Grouped Variable Selection
                                                                                                                                                                                                                                                                                                                                                                    1. Parameter Tuning
                                                                                                                                                                                                                                                                                                                                                                    2. Regularization Path
                                                                                                                                                                                                                                                                                                                                                                      1. Solution Trajectory
                                                                                                                                                                                                                                                                                                                                                                        1. Cross-Validation for λ Selection
                                                                                                                                                                                                                                                                                                                                                                      2. Dimensionality Reduction
                                                                                                                                                                                                                                                                                                                                                                        1. Curse of Dimensionality
                                                                                                                                                                                                                                                                                                                                                                          1. High-Dimensional Challenges
                                                                                                                                                                                                                                                                                                                                                                            1. Distance Concentration
                                                                                                                                                                                                                                                                                                                                                                              1. Sample Size Requirements
                                                                                                                                                                                                                                                                                                                                                                              2. Principal Component Analysis (PCA)
                                                                                                                                                                                                                                                                                                                                                                                1. Variance Maximization Objective
                                                                                                                                                                                                                                                                                                                                                                                  1. Eigenvalue Decomposition
                                                                                                                                                                                                                                                                                                                                                                                    1. Steps in PCA Implementation
                                                                                                                                                                                                                                                                                                                                                                                      1. Data Standardization
                                                                                                                                                                                                                                                                                                                                                                                        1. Covariance Matrix Computation
                                                                                                                                                                                                                                                                                                                                                                                          1. Eigenvalue and Eigenvector Calculation
                                                                                                                                                                                                                                                                                                                                                                                            1. Component Selection
                                                                                                                                                                                                                                                                                                                                                                                              1. Transformation Application
                                                                                                                                                                                                                                                                                                                                                                                              2. Interpreting Principal Components
                                                                                                                                                                                                                                                                                                                                                                                                1. Loading Interpretation
                                                                                                                                                                                                                                                                                                                                                                                                  1. Variance Explained
                                                                                                                                                                                                                                                                                                                                                                                                    1. Scree Plots
                                                                                                                                                                                                                                                                                                                                                                                                      1. Biplot Visualization
                                                                                                                                                                                                                                                                                                                                                                                                      2. Applications
                                                                                                                                                                                                                                                                                                                                                                                                        1. Data Visualization
                                                                                                                                                                                                                                                                                                                                                                                                          1. Noise Reduction
                                                                                                                                                                                                                                                                                                                                                                                                            1. Feature Engineering
                                                                                                                                                                                                                                                                                                                                                                                                              1. Preprocessing for Machine Learning
                                                                                                                                                                                                                                                                                                                                                                                                            2. Other Dimensionality Reduction Methods
                                                                                                                                                                                                                                                                                                                                                                                                              1. Factor Analysis
                                                                                                                                                                                                                                                                                                                                                                                                                1. Independent Component Analysis (ICA)
                                                                                                                                                                                                                                                                                                                                                                                                                  1. t-SNE for Visualization
                                                                                                                                                                                                                                                                                                                                                                                                                2. Performance Metrics
                                                                                                                                                                                                                                                                                                                                                                                                                  1. Regression Metrics
                                                                                                                                                                                                                                                                                                                                                                                                                    1. Mean Squared Error (MSE)
                                                                                                                                                                                                                                                                                                                                                                                                                      1. Root Mean Squared Error (RMSE)
                                                                                                                                                                                                                                                                                                                                                                                                                        1. Mean Absolute Error (MAE)
                                                                                                                                                                                                                                                                                                                                                                                                                          1. R-squared and Adjusted R-squared
                                                                                                                                                                                                                                                                                                                                                                                                                          2. Classification Metrics
                                                                                                                                                                                                                                                                                                                                                                                                                            1. Accuracy and Error Rate
                                                                                                                                                                                                                                                                                                                                                                                                                              1. Precision and Recall
                                                                                                                                                                                                                                                                                                                                                                                                                                1. F1-Score
                                                                                                                                                                                                                                                                                                                                                                                                                                  1. ROC Curves and AUC
                                                                                                                                                                                                                                                                                                                                                                                                                                  2. Model Comparison
                                                                                                                                                                                                                                                                                                                                                                                                                                    1. Statistical Tests for Model Comparison
                                                                                                                                                                                                                                                                                                                                                                                                                                      1. Practical vs. Statistical Significance
                                                                                                                                                                                                                                                                                                                                                                                                                                        1. Business Impact Assessment