Data Science

  1. Big Data and Distributed Computing
    1. Big Data Concepts
      1. The Four Vs of Big Data
        1. Volume
        2. Velocity
        3. Variety
        4. Veracity
      2. Big Data Challenges
        1. Storage
        2. Processing
        3. Analysis
        4. Visualization
      3. Big Data Architecture
        1. Lambda Architecture
        2. Kappa Architecture
        3. Data Lake vs Data Warehouse
    2. Distributed Computing Fundamentals
      1. Parallel vs Distributed Computing
      2. CAP Theorem
      3. Consistency Models
      4. Fault Tolerance
      5. Load Balancing
    3. Hadoop Ecosystem
      1. Hadoop Distributed File System
        1. Architecture
        2. Data Replication
        3. Fault Tolerance
      2. MapReduce
        1. Programming Model
        2. Job Execution
        3. Optimization Techniques
      3. YARN
        1. Resource Management
        2. Application Scheduling
      4. Hadoop Ecosystem Tools
        1. Hive
        2. Pig
        3. HBase
        4. Sqoop
        5. Flume
    4. Apache Spark
      1. Spark Architecture
        1. Driver and Executors
        2. Cluster Managers
        3. Memory Management
      2. Resilient Distributed Datasets
        1. RDD Operations
        2. Transformations vs Actions
        3. Lazy Evaluation
        4. Caching and Persistence
      3. Spark SQL
        1. DataFrames and Datasets
        2. Catalyst Optimizer
        3. SQL Interface
      4. Spark MLlib
        1. Machine Learning Pipelines
        2. Feature Engineering
        3. Model Training and Evaluation
      5. Spark Streaming
        1. DStreams
        2. Structured Streaming
        3. Real-time Processing
    5. NoSQL Databases
      1. Document Stores
        1. MongoDB
        2. CouchDB
      2. Key-Value Stores
        1. Redis
        2. Amazon DynamoDB
      3. Column-Family
        1. Apache Cassandra
        2. HBase
      4. Graph Databases
        1. Neo4j
        2. Amazon Neptune
    6. Stream Processing
      1. Apache Kafka
        1. Topics and Partitions
        2. Producers and Consumers
        3. Stream Processing
      2. Apache Storm
        1. Topology Design
        2. Spouts and Bolts