Reinforcement Learning

  1. Advanced Topics
    1. Exploration Strategies
      1. Exploration-Exploitation Trade-off
        1. Random Exploration
        2. Epsilon-Greedy
        3. Boltzmann Exploration
      2. Optimistic Exploration
        1. Upper Confidence Bounds
        2. Optimism in the Face of Uncertainty
      3. Information-Directed Exploration
        1. Thompson Sampling
        2. Bayesian Approaches
      4. Curiosity-Driven Exploration
        1. Intrinsic Motivation
        2. Count-Based Exploration
        3. Prediction Error Methods
    2. Model-Based Reinforcement Learning
      1. Learning Environment Models
        1. Transition Model Learning
        2. Reward Model Learning
        3. Model Uncertainty
      2. Planning with Learned Models
        1. Tree Search Methods
      3. Dyna Architecture
        1. Integrated Learning and Planning
        2. Dyna-Q Algorithm
        3. Model Updates
      4. Model-Predictive Control
        1. Receding Horizon Control
        2. Optimization-Based Planning
    3. Hierarchical Reinforcement Learning
      1. Temporal Abstraction
        1. Options Framework
        2. Option Definition
        3. Option Discovery
        4. Semi-Markov Decision Processes
      2. Goal-Conditioned RL
        1. Universal Value Functions
        2. Hindsight Experience Replay
      3. Feudal Networks
        1. Manager-Worker Hierarchies
        2. Subgoal Generation
    4. Multi-Agent Reinforcement Learning
      1. Multi-Agent Environments
        1. Independent Learning
        2. Centralized Training, Decentralized Execution
        3. Communication and Coordination
        4. Competitive vs Cooperative Settings
        5. Nash Equilibria
        6. Multi-Agent Policy Gradients
    5. Partial Observability
      1. Partially Observable MDPs (POMDPs)
        1. Belief State Representation
        2. Memory-Augmented Policies
        3. Recurrent Neural Networks
        4. Attention Mechanisms
    6. Inverse Reinforcement Learning
      1. Learning from Demonstrations
        1. Reward Function Recovery
        2. Maximum Entropy IRL
        3. Apprenticeship Learning
        4. Imitation Learning
        5. Behavioral Cloning
        6. Dataset Aggregation (DAgger)
    7. Safe Reinforcement Learning
      1. Safety Constraints
        1. Risk-Sensitive Objectives
        2. Safe Exploration
        3. Constrained Policy Optimization
        4. Worst-Case Analysis
    8. Transfer Learning and Meta-Learning
      1. Domain Adaptation
        1. Multi-Task Learning
        2. Few-Shot Learning
        3. Model-Agnostic Meta-Learning (MAML)