Natural Language Processing (NLP)
1. Introduction to Natural Language Processing
2. Linguistic Foundations
3. Text Processing and Preprocessing
4. Language Modeling
5. Feature Representation
6. Word Embeddings and Distributed Representations
7. Classical Machine Learning for NLP
8. Deep Learning Foundations
9. Recurrent Neural Networks
10. Attention Mechanisms and Transformers
11. Pre-trained Language Models
12. Core NLP Applications
13. Advanced Topics
14. Evaluation and Benchmarking
15. Ethics and Responsible AI
Text Processing and Preprocessing
Data Acquisition
Text Corpora
Corpus Types and Characteristics
Corpus Annotation
Corpus Licensing
Web Scraping
HTML Parsing
API-Based Collection
Ethical Considerations
Data Quality Assessment
Text Cleaning and Normalization
Character Encoding
Unicode Handling
Encoding Detection
HTML and Markup Removal
Special Character Processing
Case Normalization
Whitespace Normalization
Number and Symbol Handling
Tokenization
Word Tokenization
Rule-Based Methods
Statistical Methods
Language-Specific Challenges
Sentence Segmentation
Boundary Detection
Abbreviation Handling
Subword Tokenization
Byte-Pair Encoding
WordPiece
SentencePiece
Unigram Language Model
Lexical Processing
Stop Word Removal
Standard Stop Lists
Domain-Specific Stop Words
Impact on Tasks
Stemming
Porter Stemmer
Snowball Stemmer
Language-Specific Stemmers
Lemmatization
Dictionary-Based Methods
Rule-Based Methods
Statistical Methods
Text Normalization
Spelling Correction
Abbreviation Expansion
Slang and Informal Language
Social Media Text Processing