Mathematics for Machine Learning and Data Science

  1. Optimization Techniques for Machine Learning
    1. Fundamentals of Optimization
      1. Optimization Problem Formulation
        1. Objective Functions
        2. Decision Variables
        3. Constraints
      2. Types of Optimization Problems
        1. Unconstrained Optimization
        2. Constrained Optimization
        3. Linear Programming
        4. Nonlinear Programming
        5. Integer Programming
      3. Convex Optimization
        1. Convex Sets
        2. Convex Functions
        3. Properties of Convex Problems
        4. Global vs Local Optima
      4. Loss Functions in Machine Learning
        1. Mean Squared Error
        2. Mean Absolute Error
        3. Cross-Entropy Loss
        4. Hinge Loss
        5. Huber Loss
    2. Unconstrained Optimization
      1. Optimality Conditions
        1. First-Order Necessary Conditions
        2. Second-Order Necessary Conditions
        3. Second-Order Sufficient Conditions
      2. Line Search Methods
        1. Exact Line Search
        2. Inexact Line Search
        3. Armijo Rule
        4. Wolfe Conditions
      3. Gradient Descent
        1. Algorithm Description
        2. Convergence Analysis
        3. Step Size Selection
        4. Batch Gradient Descent
        5. Stochastic Gradient Descent
        6. Mini-Batch Gradient Descent
      4. Newton's Method
        1. Algorithm Description
        2. Convergence Properties
        3. Computational Considerations
      5. Quasi-Newton Methods
        1. BFGS Algorithm
        2. L-BFGS Algorithm
        3. DFP Algorithm
      6. Conjugate Gradient Methods
        1. Linear Conjugate Gradient
        2. Nonlinear Conjugate Gradient
        3. Fletcher-Reeves Method
        4. Polak-Ribière Method
    3. Advanced Gradient-Based Methods
      1. Momentum Methods
        1. Classical Momentum
        2. Nesterov Accelerated Gradient
        3. Heavy Ball Method
      2. Adaptive Learning Rate Methods
        1. AdaGrad
          1. Algorithm Description
          2. Adaptive Learning Rates
          3. Convergence Properties
        2. RMSprop
          1. Exponential Moving Average
          2. Addressing AdaGrad Limitations
        3. Adam
          1. Adaptive Moment Estimation
          2. Bias Correction
          3. Variants of Adam
        4. AdaDelta
        5. Nadam
      3. Learning Rate Scheduling
        1. Step Decay
        2. Exponential Decay
        3. Polynomial Decay
        4. Cosine Annealing
        5. Warm Restarts
      4. Gradient Clipping
        1. Gradient Norm Clipping
        2. Gradient Value Clipping
        3. Applications to RNNs
    4. Constrained Optimization
      1. Equality Constrained Optimization
        1. Lagrange Multipliers
        2. Lagrangian Function
        3. First-Order Optimality Conditions
        4. Second-Order Conditions
      2. Inequality Constrained Optimization
        1. Karush-Kuhn-Tucker Conditions
        2. Complementary Slackness
        3. Constraint Qualification
      3. Penalty Methods
        1. Quadratic Penalty Method
        2. Exact Penalty Methods
        3. Augmented Lagrangian Method
      4. Barrier Methods
        1. Interior Point Methods
        2. Logarithmic Barrier Function
        3. Central Path
      5. Sequential Quadratic Programming
      6. Active Set Methods
    5. Specialized Optimization for Machine Learning
      1. Coordinate Descent
        1. Algorithm Description
        2. Block Coordinate Descent
        3. Applications to Lasso and Ridge Regression
      2. Proximal Methods
        1. Proximal Operators
        2. Proximal Gradient Method
        3. Accelerated Proximal Methods
        4. Applications to Sparse Optimization
      3. Subgradient Methods
        1. Subgradients and Subdifferentials
        2. Subgradient Descent
        3. Convergence Analysis
      4. Mirror Descent
        1. Bregman Divergence
        2. Mirror Maps
        3. Applications to Online Learning
      5. Frank-Wolfe Algorithm
        1. Conditional Gradient Method
        2. Sparse Solutions
        3. Applications to Structured Sparsity
    6. Stochastic Optimization
      1. Stochastic Gradient Descent
        1. Algorithm Variants
        2. Convergence Analysis
        3. Learning Rate Schedules
      2. Variance Reduction Methods
        1. SVRG (Stochastic Variance Reduced Gradient)
        2. SAG (Stochastic Average Gradient)
        3. SAGA
      3. Online Optimization
        1. Online Gradient Descent
        2. Regret Analysis
        3. Follow-the-Regularized-Leader
      4. Evolutionary Algorithms
        1. Genetic Algorithms
        2. Particle Swarm Optimization
        3. Differential Evolution
    7. Optimization in Deep Learning
      1. Challenges in Deep Learning Optimization
        1. Non-Convexity
        2. High Dimensionality
        3. Saddle Points
        4. Vanishing and Exploding Gradients
      2. Batch Normalization
        1. Internal Covariate Shift
        2. Algorithm Description
        3. Effects on Optimization
      3. Dropout as Regularization
      4. Weight Initialization Strategies
        1. Xavier Initialization
        2. He Initialization
        3. Effects on Convergence
      5. Second-Order Methods for Deep Learning
        1. Natural Gradients
        2. K-FAC (Kronecker-Factored Approximate Curvature)
        3. Shampoo Algorithm