Python for Data Science

  1. Advanced and Specialized Topics
     1. Text Data Processing and Analysis
        1. Text Preprocessing Fundamentals
           1. Text Cleaning
              1. Lowercasing
              2. Punctuation Removal
              3. Special Character Handling
              4. HTML Tag Removal
           2. Tokenization
              1. Word Tokenization
              2. Sentence Tokenization
              3. Custom Tokenizers
           3. Stop Word Removal
              1. Standard Stop Word Lists
              2. Custom Stop Words
              3. Language-specific Considerations
           4. Text Normalization
              1. Stemming
                 1. Porter Stemmer
                 2. Snowball Stemmer
              2. Lemmatization
                 1. WordNet Lemmatizer
                 2. Part-of-speech Tagging
        2. Feature Extraction from Text
           1. Bag-of-Words Model
              1. CountVectorizer
              2. Vocabulary Building
              3. N-gram Features
              4. Binary vs Frequency Counts
           2. TF-IDF Representation
              1. Term Frequency
              2. Inverse Document Frequency
              3. TfidfVectorizer
              4. Normalization Options
           3. Advanced Text Features
              1. Character N-grams
              2. Word Embeddings Integration
              3. Syntactic Features
        3. Text Classification
           1. Document Classification Pipeline
           2. Feature Selection for Text
           3. Model Selection for Text Data
           4. Handling Imbalanced Text Data
     2. Introduction to Deep Learning
        1. Deep Learning Fundamentals
           1. Neural Network Basics
           2. Deep Learning vs Traditional ML
           3. Common Architectures Overview
        2. TensorFlow and Keras
           1. TensorFlow Ecosystem
              1. TensorFlow Core
              2. TensorFlow Extended (TFX)
              3. TensorFlow Lite
           2. Keras High-level API
              1. Sequential Models
              2. Functional API
              3. Model Subclassing
           3. Basic Neural Network Implementation
              1. Layer Types
              2. Activation Functions
              3. Loss Functions
              4. Optimizers
           4. Model Training Process
              1. Compiling Models
              2. Training Loop
              3. Validation and Monitoring
              4. Callbacks
        3. Common Deep Learning Tasks
           1. Image Classification
           2. Text Classification
           3. Regression with Neural Networks
     3. Web APIs and Data Collection
        1. HTTP and Web APIs
           1. HTTP Protocol Basics
              1. Request Methods
              2. Status Codes
              3. Headers
           2. RESTful API Concepts
              1. Resource-based URLs
              2. Stateless Communication
              3. JSON Data Format
        2. Making API Requests
           1. requests Library
              1. GET Requests
              2. POST Requests
              3. Request Parameters
              4. Headers and Authentication
           2. Handling API Responses
              1. Response Status Codes
              2. JSON Parsing
              3. Error Handling
           3. Rate Limiting and Ethics
              1. API Rate Limits
              2. Respectful Data Collection
              3. Terms of Service
        3. Working with JSON Data
           1. JSON Structure
           2. Parsing JSON in Python
           3. Nested JSON Handling
           4. Converting to DataFrames
        4. Web Scraping Basics
           1. HTML Structure
           2. BeautifulSoup Library
           3. Ethical Considerations
           4. robots.txt Compliance
     4. Performance Optimization and Scaling
        1. Python Performance Optimization
           1. Profiling Code
              1. cProfile Module
              2. line_profiler
              3. Memory Profiling
           2. Optimization Strategies
              1. Algorithmic Improvements
              2. Data Structure Selection
              3. Vectorization
        2. Pandas Performance
           1. Efficient DataFrame Operations
              1. Avoiding Loops
              2. Vectorized Operations
              3. Method Chaining
           2. Memory Management
              1. Data Type Optimization
              2. Categorical Data
              3. Chunking Large Files
           3. Query Optimization
              1. Boolean Indexing vs query()
              2. Index Usage
        3. NumPy Performance
           1. Broadcasting Optimization
           2. Memory Layout Considerations
           3. Avoiding Copies
           4. Compiled Extensions
        4. Parallel and Distributed Computing
           1. Dask Framework
              1. Dask Arrays
                 1. Chunked Arrays
                 2. Lazy Evaluation
                 3. Parallel Operations
              2. Dask DataFrames
                 1. Partitioned DataFrames
                 2. Distributed Operations
              3. Dask Delayed
                 1. Task Graphs
                 2. Custom Workflows
              4. Dask Client and Cluster
                 1. Local Clusters
                 2. Distributed Clusters
           2. Multiprocessing
              1. Process Pools
              2. Shared Memory
              3. Inter-process Communication
           3. Threading Considerations
              1. Global Interpreter Lock (GIL)
              2. I/O-bound vs CPU-bound Tasks
        5. Big Data Integration
           1. Apache Spark with PySpark
              1. Spark DataFrames
              2. RDD Operations
              3. MLlib Integration
           2. Database Integration
              1. SQL Databases
              2. NoSQL Databases
              3. Connection Pooling
           3. Cloud Computing Platforms
              1. AWS Services
              2. Google Cloud Platform
              3. Azure Machine Learning