3. Feature Engineering and Text Representation
Vectorization Fundamentals
Converting Text to Numbers
Vector Space Models
Dimensionality Considerations
Sparse vs Dense Representations
Basic Vectorization Methods
One-Hot Encoding
Binary Representation
Vocabulary Size Limitations
Frequency-Based Encoding
Count Vectors
Normalized Frequencies
Bag-of-Words Models
Document-Term Matrix
Matrix Construction
Sparse Matrix Representation
Dimensionality Issues
Memory Considerations
Count Vectorization
Binary vs Count Representation
Vocabulary Filtering
N-gram Integration
Term Frequency-Inverse Document Frequency
Term Frequency Calculation
Raw Count
Log Normalization
Double Normalization
Inverse Document Frequency
IDF Formula Variations
Smoothing Techniques
TF-IDF Weight Calculation
Standard TF-IDF
Normalized TF-IDF
Advantages and Limitations of TF-IDF
Strengths in Information Retrieval
Weaknesses with Semantic Similarity
N-gram Features
Unigrams
Single Word Features
Vocabulary Size Considerations
Bigrams
Two-Word Combinations
Phrase Capture
Trigrams
Three-Word Sequences
Context Enhancement
Higher-Order N-grams
Computational Complexity
Sparsity Issues
Applications of N-grams
Language Modeling
Text Classification
Authorship Attribution
Word Embeddings
Static Embeddings
Word2Vec
Continuous Bag of Words
Skip-gram Model
Hierarchical Softmax
Negative Sampling
GloVe
Global Matrix Factorization
Co-occurrence Statistics
Training Process
FastText
Subword Information
Out-of-Vocabulary Handling
Character N-grams
Contextualized Embeddings
ELMo
Bidirectional LSTM
Context-Dependent Representations
BERT
Transformer Architecture
Pre-training Objectives
Fine-tuning Process
Tokenization in BERT
GPT Models
Autoregressive Language Modeling
Decoder-Only Architecture
Comparison of Static and Contextualized Embeddings
Semantic Richness
Computational Requirements
Task Performance
Embedding Operations
Vector Arithmetic
Similarity Calculations
Analogy Tasks
Embedding Evaluation
Intrinsic Evaluation
Extrinsic Evaluation
Bias Detection in Embeddings
Embedding Visualization
Dimensionality Reduction
t-SNE
PCA
UMAP
Interactive Visualization Tools