Modern Data Storage Architectures
Data Lake Concepts
    Centralized Storage Architecture
        Raw Data Preservation
        Multi-Format Support
        Scalable Storage Solutions
    Schema-on-Read vs. Schema-on-Write
        Flexibility vs. Structure Trade-offs
        Data Discovery Challenges
        Query Performance Implications
    Data Lake Zones
        Raw Data Zone
        Processed Data Zone
        Curated Data Zone
        Sandbox Zone
Big Data File Formats
    Apache Parquet
        Columnar Storage Benefits
        Compression Efficiency
        Query Performance Optimization
    Apache Avro
        Schema Evolution Support
        Serialization Efficiency
        Cross-Language Compatibility
    ORC Format
        Optimized Row Columnar Structure
        Hive Integration
        Compression and Indexing
    Traditional Formats
        CSV File Handling
        JSON Data Processing
        XML Data Management
    Compression Techniques
        Compression Algorithm Selection
        Storage vs. Processing Trade-offs
        Compression Ratio Analysis
Data Lakehouse Architecture
    Unified Storage and Processing
        ACID Transaction Support
        Schema Enforcement Options
        Time Travel Capabilities
    Delta Lake Implementation
        Versioning and Rollback
        Concurrent Read/Write Operations
        Data Quality Enforcement
    Apache Iceberg Features
        Table Format Specifications
        Partition Evolution
        Hidden Partitioning
Data Mesh Principles
    Domain-Oriented Data Ownership
        Business Domain Alignment
        Decentralized Data Management
        Domain Team Responsibilities
    Data as a Product
        Product Thinking for Data
        Data Product Lifecycle
        Consumer-Centric Design
    Self-Serve Data Infrastructure
        Platform Abstraction
        Developer Experience
        Automated Data Operations
    Federated Computational Governance
        Global Standards and Policies
        Local Implementation Flexibility
        Automated Compliance Monitoring