Big Data Technologies

  1. Data Warehousing and Analytics on Big Data
    1. Data Lakes
      1. Concept and Architecture
        1. Centralized Storage
        2. Scalability
        3. Cost-Effective Storage
      2. Storing Raw Data
        1. Data Ingestion
        2. Data Formats
        3. Metadata Management
      3. Schema-on-Read
        1. Flexibility
        2. Querying Raw Data
        3. Data Discovery
      4. Data Lake Challenges
        1. Data Swamps
        2. Data Quality
        3. Governance
    2. Data Warehouses
      1. Concept and Architecture
        1. Structured Data Storage
        2. Performance Optimization
        3. Star and Snowflake Schemas
      2. Storing Structured, Processed Data
        1. ETL Processes
        2. Data Modeling
        3. Data Quality Assurance
      3. Schema-on-Write
        1. Data Validation
        2. Query Performance
        3. Consistency Guarantees
      4. Traditional Data Warehouse Limitations
        1. Scalability Constraints
        2. Cost Considerations
        3. Flexibility Issues
    3. The Data Lakehouse Concept
      1. Hybrid Architecture
      2. Unified Analytics
      3. ACID Transactions on Data Lakes
        1. Delta Lake
        2. Apache Iceberg
        3. Apache Hudi
    4. SQL-on-Hadoop Engines
      1. Apache Hive
        1. Hive Architecture
          1. Metastore
          2. Execution Engines
          3. Driver
        2. HiveQL (HQL)
          1. Query Syntax
          2. Data Definition Language
          3. User-Defined Functions
        3. Hive Optimization
          1. Vectorization
          2. Cost-Based Optimizer
      2. Presto / Trino
        1. Distributed SQL Query Engine
        2. Federated Querying
        3. Connector Architecture
        4. Query Optimization
      3. Apache Impala
        1. Low-Latency SQL Queries
        2. Integration with Hadoop Ecosystem
        3. Columnar Processing
      4. Apache Drill
        1. Schema-Free SQL
        2. Nested Data Support
    5. Columnar Storage Formats
      1. Apache Parquet
        1. Columnar Compression
        2. Schema Evolution
        3. Predicate Pushdown
        4. Nested Data Support
      2. Apache ORC
        1. Optimized Row Columnar
        2. Lightweight Indexing
        3. ACID Support
      3. Apache Avro
        1. Row-Based Storage
        2. Schema Definition
        3. Schema Evolution
      4. Apache Arrow
        1. In-Memory Columnar Format
        2. Cross-Language Support
    6. Data Modeling for Big Data
      1. Dimensional Modeling
      2. Data Vault Modeling
      3. Denormalization Strategies
      4. Partitioning Strategies