Reinforcement Learning

  1. Deep Reinforcement Learning
    1. Introduction to Deep RL
      1. Neural Networks in RL
      2. Representation Learning
      3. End-to-End Learning
    2. Deep Q-Networks (DQN)
      1. Neural Network Q-Function Approximation
      2. Network Architecture Design
      3. Loss Function Definition
      4. Training Procedures
    3. Stabilizing Deep RL
      1. Experience Replay
        1. Replay Buffer Implementation
        2. Breaking Sample Correlations
        3. Batch Sampling Strategies
      2. Fixed Q-Targets
        1. Target Network Updates
        2. Stabilizing Training
        3. Update Frequencies
      3. Gradient Clipping
      4. Reward Clipping
    4. DQN Improvements
      1. Double DQN
        1. Overestimation Bias Problem
        2. Double Estimation Solution
        3. Performance Improvements
      2. Dueling DQN
        1. Value and Advantage Decomposition
        2. Network Architecture
        3. Aggregation Methods
      3. Prioritized Experience Replay
        1. TD-Error Based Prioritization
        2. Importance Sampling Corrections
        3. Implementation Details
      4. Rainbow DQN
        1. Combining Multiple Improvements
        2. Distributional RL
        3. Noisy Networks
    5. Deep RL for Continuous Actions
      1. Challenges with Continuous Actions
      2. Deep Deterministic Policy Gradient (DDPG)
        1. Actor-Critic Architecture
        2. Deterministic Policy Gradients
        3. Exploration Strategies
      3. Twin Delayed DDPG (TD3)
        1. Addressing Overestimation
        2. Delayed Policy Updates
        3. Target Policy Smoothing