Machine Learning

  1. Reinforcement Learning
    1. Fundamental Concepts
      1. Agent-Environment Interaction
        1. Agent Definition
        2. Environment Definition
        3. Interaction Loop
      2. States and State Spaces
        1. State Representation
        2. Discrete vs. Continuous States
        3. Partially Observable States
        4. State Abstraction
      3. Actions and Action Spaces
        1. Discrete Actions
        2. Continuous Actions
        3. Action Selection
        4. Action Constraints
      4. Rewards and Reward Functions
        1. Reward Signal
        2. Reward Shaping
        3. Sparse vs. Dense Rewards
        4. Delayed Rewards
      5. Policies
        1. Deterministic Policies
        2. Stochastic Policies
        3. Policy Representation
        4. Optimal Policies
      6. Value Functions
        1. State Value Function
        2. Action Value Function
        3. Advantage Function
        4. Bellman Equations
      7. Exploration vs. Exploitation
        1. Exploration Strategies
        2. Epsilon-Greedy
        3. Upper Confidence Bound
        4. Thompson Sampling
    2. Markov Decision Processes
      1. Markov Property
        1. Memoryless Property
        2. State Transition Probabilities
      2. MDP Components
        1. State Space
        2. Action Space
        3. Transition Function
        4. Reward Function
        5. Discount Factor
      3. Bellman Equations
        1. Bellman Expectation Equations
        2. Bellman Optimality Equations
        3. Value Iteration
        4. Policy Iteration
      4. Partially Observable MDPs
        1. Belief States
        2. Observation Model
        3. POMDP Solving Methods
    3. Dynamic Programming
      1. Value Iteration
        1. Algorithm Steps
        2. Convergence Properties
        3. Computational Complexity
      2. Policy Iteration
        1. Policy Evaluation
        2. Policy Improvement
        3. Convergence Guarantees
      3. Generalized Policy Iteration
        1. Interleaving Evaluation and Improvement
        2. Asynchronous Updates
    4. Model-Free Methods
      1. Monte Carlo Methods
        1. First-Visit MC
        2. Every-Visit MC
        3. MC Control
        4. Exploring Starts
        5. On-Policy vs. Off-Policy
      2. Temporal Difference Learning
        1. TD Prediction
        2. TD Error
        3. TD(0) Algorithm
        4. TD(λ) and Eligibility Traces
      3. Q-Learning
        1. Off-Policy Learning
        2. Q-Table Updates
        3. Convergence Properties
        4. Exploration Strategies
      4. SARSA
        1. On-Policy Learning
        2. State-Action-Reward-State-Action
        3. Comparison with Q-Learning
      5. Expected SARSA
        1. Expected Value Updates
        2. Reduced Variance
    5. Function Approximation
      1. Linear Function Approximation
        1. Feature Vectors
        2. Weight Updates
        3. Convergence Issues
      2. Non-Linear Function Approximation
        1. Neural Network Approximators
        2. Deep Q-Networks
        3. Experience Replay
        4. Target Networks
        5. Double DQN
        6. Dueling DQN
    6. Policy Gradient Methods
      1. REINFORCE Algorithm
      2. Policy Gradient Theorem
      3. Baseline Methods
      4. Actor-Critic Methods
      5. Advantage Actor-Critic
      6. Asynchronous Advantage Actor-Critic
      7. Proximal Policy Optimization
      8. Trust Region Policy Optimization
    7. Continuous Control
      1. Deep Deterministic Policy Gradient
      2. Twin Delayed DDPG
      3. Soft Actor-Critic
    8. Multi-Agent Reinforcement Learning
      1. Game Theory Basics
      2. Nash Equilibrium
      3. Cooperative vs. Competitive Settings
      4. Independent Learning
      5. Centralized Training with Decentralized Execution
    9. Applications
      1. Game Playing
        1. Board Games
        2. Video Games
        3. Real-Time Strategy
      2. Robotics
        1. Robot Control
        2. Manipulation Tasks
      3. Autonomous Systems
        1. Autonomous Driving
        2. Drone Control
      4. Resource Management
        1. Network Routing
        2. Energy Management
        3. Portfolio Management