Data Warehousing and Analytics on Big Data
Data Lakes
  Concept and Architecture
    Centralized Storage
    Scalability
    Cost-Effective Storage
  Storing Raw Data
    Data Ingestion
    Data Formats
    Metadata Management
  Schema-on-Read
    Flexibility
    Querying Raw Data
    Data Discovery
  Data Lake Challenges
    Data Swamps
    Data Quality
    Governance
Data Warehouses
  Concept and Architecture
    Structured Data Storage
    Performance Optimization
    Star and Snowflake Schemas
  Storing Structured, Processed Data
    ETL Processes
    Data Modeling
    Data Quality Assurance
  Schema-on-Write
    Data Validation
    Query Performance
    Consistency Guarantees
  Traditional Data Warehouse Limitations
    Scalability Constraints
    Cost Considerations
    Flexibility Issues
The Data Lakehouse Concept
  Hybrid Architecture
  Unified Analytics
  ACID Transactions on Data Lakes
    Delta Lake
    Apache Iceberg
    Apache Hudi
SQL-on-Hadoop Engines
  Apache Hive
    Hive Architecture
      Metastore
      Execution Engines
      Driver
    HiveQL (HQL)
      Query Syntax
      Data Definition Language
      User-Defined Functions
    Hive Optimization
      Vectorization
      Cost-Based Optimizer
  Presto / Trino
    Distributed SQL Query Engine
    Federated Querying
    Connector Architecture
    Query Optimization
  Apache Impala
    Low-Latency SQL Queries
    Integration with Hadoop Ecosystem
    Columnar Processing
  Apache Drill
    Schema-Free SQL
    Nested Data Support
Columnar Storage Formats
  Apache Parquet
    Columnar Compression
    Schema Evolution
    Predicate Pushdown
    Nested Data Support
  Apache ORC
    Optimized Row Columnar
    Lightweight Indexing
    ACID Support
  Apache Avro
    Row-Based Storage
    Schema Definition
    Schema Evolution
  Apache Arrow
    In-Memory Columnar Format
    Cross-Language Support
Data Modeling for Big Data
  Dimensional Modeling
  Data Vault Modeling
  Denormalization Strategies
  Partitioning Strategies