Reinforcement Learning

  1. Policy Gradient Methods
    1. Introduction to Policy-Based Methods
      1. Direct Policy Optimization
      2. Advantages over Value-Based Methods
      3. Policy Parameterization
    2. Policy Gradient Theorem
      1. Mathematical Derivation
      2. Score Function Estimator
      3. Gradient Estimation
    3. REINFORCE Algorithm
      1. Monte Carlo Policy Gradient
      2. Algorithm Implementation
      3. Variance Issues
      4. Baseline Methods
        1. State-Value Baselines
        2. Advantage Estimation
    4. Actor-Critic Methods
      1. Combining Policy and Value Learning
        1. Actor Network (Policy)
        2. Critic Network (Value Function)
        3. Training Procedures
      2. Advanced Actor-Critic Methods
        1. Advantage Actor-Critic (A2C)
          1. Advantage Function Estimation
          2. Synchronous Updates
        2. Asynchronous Advantage Actor-Critic (A3C)
          1. Parallel Training
          2. Asynchronous Updates
          3. Exploration Benefits
        3. Generalized Advantage Estimation (GAE)
          1. Bias-Variance Trade-off
          2. λ-Return Estimation
    5. Trust Region Methods
      1. Trust Region Policy Optimization (TRPO)
        1. Monotonic Improvement
        2. KL Divergence Constraints
        3. Natural Policy Gradients
      2. Proximal Policy Optimization (PPO)
        1. Clipped Surrogate Objective
        2. Practical Implementation
        3. Performance Characteristics
    6. Continuous Action Spaces
      1. Gaussian Policies
        1. Policy Parameterization
        2. Exploration in Continuous Spaces
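As a concrete companion to the REINFORCE, variance-reduction, and baseline entries in the outline above, the following is a minimal sketch of a Monte Carlo policy gradient update with a state-value baseline. It assumes a discrete-action task, PyTorch, and a caller that supplies one episode of observations, actions, and rewards; the names PolicyNet, ValueNet, and reinforce_update are illustrative choices for this sketch, not part of the outline.

```python
# Minimal REINFORCE-with-baseline sketch (illustrative, not a reference implementation).
# Assumes: discrete actions, PyTorch installed, and one complete episode of data.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class PolicyNet(nn.Module):
    """pi(a|s): maps a state vector to action logits (the actor)."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, n_actions)
        )
    def forward(self, obs):
        return self.net(obs)

class ValueNet(nn.Module):
    """V(s): state-value baseline used to reduce gradient variance."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return G_t for every step of one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

def reinforce_update(policy, value_fn, pi_opt, v_opt, obs, acts, rews):
    """One REINFORCE step on a single episode:
    advantage_t = G_t - V(s_t), then ascend sum_t log pi(a_t|s_t) * advantage_t."""
    obs = torch.as_tensor(obs, dtype=torch.float32)
    acts = torch.as_tensor(acts)
    returns = torch.as_tensor(discounted_returns(rews), dtype=torch.float32)

    values = value_fn(obs)
    advantages = returns - values.detach()          # baseline subtraction

    log_probs = Categorical(logits=policy(obs)).log_prob(acts)
    policy_loss = -(log_probs * advantages).mean()  # score-function estimator
    value_loss = nn.functional.mse_loss(values, returns)

    pi_opt.zero_grad(); policy_loss.backward(); pi_opt.step()
    v_opt.zero_grad(); value_loss.backward(); v_opt.step()
    return policy_loss.item(), value_loss.item()
```

Subtracting the learned baseline V(s) leaves the score-function gradient estimate unbiased while shrinking its variance, which is the motivation behind the "Variance Issues" and "Baseline Methods" entries and the step from REINFORCE toward the actor-critic methods listed above.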