Computational Statistics

  1. High-Dimensional Data Analysis
    1. The Curse of Dimensionality
      1. Effects on Distance Metrics
        1. Concentration of Distances
        2. Nearest Neighbor Problems
      2. Sparsity of Data
        1. Empty Space Phenomenon
        2. Sample Size Requirements
      3. Overfitting Risks
        1. Model Complexity
        2. Generalization Error
    2. Regularization Methods for Regression
      1. Ridge Regression (L2 Penalty)
        1. Shrinkage of Coefficients
        2. Bias-Variance Tradeoff
        3. Geometric Interpretation
      2. Lasso (L1 Penalty)
        1. Variable Selection
        2. Sparse Solutions
        3. Solution Path
      3. Elastic Net
        1. Combination of L1 and L2 Penalties
        2. Tuning Parameter Selection
        3. Grouped Variable Selection
      4. Other Penalty Methods
        1. SCAD Penalty
        2. Adaptive Lasso
        3. Group Lasso
    3. Dimensionality Reduction
      1. Principal Component Analysis (PCA)
        1. Eigenvalue Decomposition
        2. Singular Value Decomposition Approach
        3. Scree Plots
        4. Interpreting Principal Components
        5. Kernel PCA
      2. Independent Component Analysis (ICA)
        1. Non-Gaussian Components
        2. FastICA Algorithm
      3. Multidimensional Scaling (MDS)
        1. Distance Matrices
        2. Metric and Non-metric MDS
        3. Classical MDS
      4. Factor Analysis
        1. Latent Variable Models
        2. Factor Rotation
        3. Maximum Likelihood Estimation
      5. t-SNE and UMAP
        1. Non-linear Dimensionality Reduction
        2. Neighborhood Preservation
  2. Computation for Large Datasets
    1. Subsampling and Data Sketching
      1. Random Sampling
      2. Sketching Algorithms
        1. Count-Min Sketch
        2. Bloom Filters
    2. Online Algorithms
      1. Incremental Learning
      2. Streaming Data Processing
      3. Stochastic Approximation
    3. Parallel and Distributed Computing
      1. MapReduce Paradigm
      2. Implementations
        1. Apache Spark
        2. Dask
        3. Ray
      3. Load Balancing and Fault Tolerance
      4. Communication Overhead