Big Data Technologies
1. Introduction to Big Data
2. Core Principles of Distributed Systems
3. The Hadoop Ecosystem
4. Modern Data Processing with Apache Spark
5. Stream Processing Technologies
6. NoSQL Databases
7. Data Warehousing and Analytics on Big Data
8. Cloud-Based Big Data Platforms
9. Supporting Ecosystem and Tools
10. Big Data Governance and Security
11. Performance Optimization and Best Practices
12. Emerging Trends and Future Directions
Supporting Ecosystem and Tools
  Data Ingestion and Integration
    Apache Sqoop
      Importing Data from RDBMS
      Exporting Data to RDBMS
      Incremental Imports
      Parallel Processing
    Apache Flume
      Log Data Collection
      Event Delivery
      Agent Configuration
      Reliability Mechanisms
    Logstash
      Data Pipeline Configuration
      Plugin Ecosystem
      Input, Filter, Output Plugins
    Apache NiFi
      Data Flow Management
      Visual Interface
      Provenance Tracking
    Talend
      ETL Tool
      Data Integration Platform
  Workflow Orchestration and Scheduling
    Apache Airflow
      Directed Acyclic Graphs (DAGs)
      Task Scheduling
      Operators
      Sensors
      XComs
    Oozie
      Workflow Definition
      Integration with Hadoop
      Coordinator Jobs
    Luigi
      Python-Based Workflow
      Dependency Resolution
    Prefect
      Modern Workflow Engine
      Dynamic Workflows
  Cluster Management and Monitoring
    Apache Ambari
      Cluster Provisioning
      Service Monitoring
      Configuration Management
    Cloudera Manager
      Configuration Management
      Performance Monitoring
      Health Checks
    Ganglia
      Distributed Monitoring
      Metrics Collection
    Nagios
      Infrastructure Monitoring
      Alerting
  Data Catalogs and Metadata Management
    Apache Atlas
      Metadata Collection
      Data Lineage Tracking
      Data Classification
    LinkedIn DataHub
      Metadata Platform
      Data Discovery
    AWS Glue Data Catalog
      Managed Metadata Repository
    Apache Hive Metastore
      Schema Repository
      Table Metadata