Apache Spark

  1. Core Spark Concepts
    1. Spark Application Architecture
      1. Driver Program
        1. Driver Responsibilities
        2. SparkContext Management
        3. Cluster Communication
      2. SparkSession
        1. Unified Entry Point
        2. Configuration Management
        3. Resource Access
      3. Cluster Manager Integration
        1. Resource Allocation
        2. Cluster Manager Types
        3. Deployment Coordination
      4. Executors and Workers
        1. Task Execution Model
        2. Memory Management
        3. Resource Utilization
        4. Inter-Executor Communication
    2. Resilient Distributed Datasets
      1. RDD Fundamentals
        1. Core Abstraction Concept
        2. Distributed Collection Model
        3. Immutability Principle
      2. RDD Properties
        1. Fault Tolerance through Lineage
        2. Lazy Evaluation
        3. Partitioning Strategy
          1. Hash Partitioning
          2. Range Partitioning
          3. Custom Partitioning
        4. Persistence Options
      3. RDD Creation Methods
        1. Parallelizing Collections
          1. Local Data Parallelization
          2. Collection Distribution
        2. External Data Sources
          1. Text Files
          2. Sequence Files
          3. Hadoop InputFormats
          4. Database Connections
    3. RDD Operations
      1. Transformations
        1. Narrow Transformations
          1. map Operations
          2. filter Operations
          3. flatMap Operations
          4. sample Operations
          5. union Operations
        2. Wide Transformations
          1. groupByKey Operations
          2. reduceByKey Operations
          3. sortByKey Operations
          4. join Operations
          5. cogroup Operations
          6. repartition Operations
      2. Actions
        1. Collection Actions
          1. collect Operations
          2. take Operations
          3. first Operations
        2. Aggregation Actions
          1. count Operations
          2. reduce Operations
          3. aggregate Operations
        3. Output Actions
          1. saveAsTextFile Operations
          2. foreach Operations
    4. Shared Variables
      1. Broadcast Variables
        1. Read-Only Shared Data
        2. Memory Efficiency
        3. Implementation Patterns
        4. Best Practices
      2. Accumulators
        1. Write-Only Variables
        2. Distributed Counters
        3. Custom Accumulators
        4. Usage Limitations