Search Engines
3. Indexing
Purpose of an Index
Fast Retrieval
Efficient Storage
Query Processing Support
Inverted Index
Structure and Concept
Term-to-Document Mapping
Index Organization
Terms and Tokens
Tokenization Process
Handling Special Characters
Unicode Support
Posting Lists
Document Identifiers
Position Information
Term Frequency Storage
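The posting-list items above can be sketched as a small in-memory inverted index. This is a minimal sketch that assumes documents are identified by their position in an input list and that a simple regex tokenizer is good enough. Each posting stores the document identifier, the term frequency, and the token positions. Real engines use richer tokenization and compact on-disk encodings.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Simplifying assumption: lowercase, then take runs of word characters (Unicode-aware in Python 3).
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    """Map each term to a posting list of (doc_id, term_frequency, positions)."""
    postings = defaultdict(dict)  # term -> {doc_id: [positions]}
    for doc_id, text in enumerate(docs):  # doc IDs are list positions (assumption)
        for pos, term in enumerate(tokenize(text)):
            postings[term].setdefault(doc_id, []).append(pos)
    # Sort postings by doc ID; the term frequency is simply the number of recorded positions.
    return {
        term: [(doc_id, len(pos_list), pos_list) for doc_id, pos_list in sorted(by_doc.items())]
        for term, by_doc in postings.items()
    }

docs = ["the quick brown fox", "the lazy dog", "quick quick fox"]
index = build_index(docs)
print(index["quick"])  # [(0, 1, [1]), (2, 2, [0, 1])]
```

The length of a term's posting list is its document frequency, so the index supports the statistics listed below without an extra pass over the collection.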
Document Frequency
Importance in Ranking
Collection Statistics
Term Frequency
Weighting Terms
Normalization Methods
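One common way to combine term frequency, document frequency, and normalization is log-scaled tf multiplied by idf, followed by cosine (L2) length normalization. The outline does not name a specific scheme, so this is an assumed choice, shown as a minimal sketch over pre-tokenized documents.

```python
import math
from collections import Counter

def tf_idf_vectors(tokenized_docs):
    """Return one {term: weight} dict per document: (1 + ln tf) * ln(N / df), L2-normalized."""
    n_docs = len(tokenized_docs)
    # Document frequency: the number of documents containing each term at least once.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    vectors = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        # Sublinear tf damps repeated terms; idf down-weights terms that occur in many documents.
        weights = {t: (1 + math.log(c)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Cosine normalization keeps long documents from scoring higher merely by length.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm > 0 else weights)
    return vectors

docs = [["the", "quick", "fox"], ["the", "lazy", "dog"], ["the", "fox", "fox"]]
vectors = tf_idf_vectors(docs)
print(vectors[0]["the"])  # 0.0, because "the" occurs in every document
```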
Indexing Pipeline
Content Parsing
HTML Parsing
PDF and Other Formats
Metadata Extraction
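HTML parsing and metadata extraction can be illustrated with Python's standard-library html.parser. This minimal sketch pulls out the title, named meta tags, and visible text while skipping script and style contents. The class name PageParser, the helper parse_page, and the tag sets are illustrative assumptions; PDFs and other formats would need dedicated parsers.

```python
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collect the <title>, named <meta> tags, and visible text of an HTML page."""
    SKIP = {"script", "style", "noscript"}  # contents never shown to users
    VOID = {"area", "base", "br", "col", "embed", "hr", "img",
            "input", "link", "meta", "source", "track", "wbr"}  # elements without end tags

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.text_parts = []
        self._open = []  # stack of currently open tags

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name, content = attrs.get("name"), attrs.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content
        if tag not in self.VOID:
            self._open.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner elements.
        if tag in self._open:
            while self._open.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text or any(t in self.SKIP for t in self._open):
            return
        if self._open and self._open[-1] == "title":
            self.title += text
        else:
            self.text_parts.append(text)

def parse_page(html):
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser.title, parser.meta, " ".join(parser.text_parts)

page = ('<html><head><title>Inverted Indexes</title>'
        '<meta name="Description" content="Intro to indexing">'
        '<style>p { color: red }</style></head>'
        '<body><p>Posting lists map terms to documents.</p>'
        '<script>var x = 1;</script><p>Second<br>line</p></body></html>')
title, meta, text = parse_page(page)
```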
Text Extraction
Removing Boilerplate
Extracting Main Content
Content Quality Assessment
Tokenization
Word Segmentation
Handling Multilingual Content
Compound Word Processing
Linguistic Processing
Stemming
Porter Stemmer
Snowball Stemmer
Language-specific Stemmers
Lemmatization
Differences from Stemming
Morphological Analysis
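The difference between stemming and lemmatization shows up even in a tiny example. This sketch implements only step 1a of the Porter stemmer (its plural-suffix rules), not the full algorithm. It contrasts that step with a lemmatizer backed by a hypothetical lookup table that stands in for real morphological analysis. Stems need not be real words ("poni"), while lemmas are dictionary forms ("pony").

```python
def porter_step_1a(word):
    """Step 1a of the Porter stemmer only: strip plural suffixes, longest match first."""
    if word.endswith("sses"):
        return word[:-2]  # SSES -> SS: caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # IES -> I: ponies -> poni
    if word.endswith("ss"):
        return word       # SS -> SS: caress -> caress
    if word.endswith("s"):
        return word[:-1]  # S -> (nothing): cats -> cat
    return word

# Hypothetical lemma dictionary standing in for real morphological analysis.
LEMMAS = {"ponies": "pony", "mice": "mouse", "ran": "run"}

def lemmatize(word):
    """Return the dictionary form if it is known, else the word unchanged."""
    return LEMMAS.get(word, word)

for w in ["caresses", "ponies", "cats", "mice", "ran"]:
    print(w, porter_step_1a(w), lemmatize(w))
```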
Stop Word Removal
Common Stop Words
Impact on Index Size
Language-specific Stop Words
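Stop word removal is a simple filter, but it shows why the index shrinks: every dropped token is a posting that never has to be stored. The stop list below is a tiny illustrative English sample, not a recommended list. Real lists are language-specific and much longer.

```python
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}  # illustrative English sample

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop list."""
    return [t for t in tokens if t not in stop_words]

tokens = "the index of a search engine is built in the background".split()
kept = remove_stop_words(tokens)
print(kept)                                # ['index', 'search', 'engine', 'built', 'background']
print(f"{len(kept)}/{len(tokens)} tokens kept")  # 5/11 tokens kept
```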
Synonym Handling
Synonym Dictionaries
Automatic Synonym Discovery
Spelling Correction
Edit Distance Algorithms
Statistical Methods
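Edit distance and statistical signals can be combined in a simple spelling corrector. This sketch uses Levenshtein distance to find the closest vocabulary terms and breaks ties by corpus frequency. The vocabulary, its counts, and the max_distance threshold of 2 are made-up assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]

def correct(word, vocabulary, max_distance=2):
    """Closest vocabulary term; ties go to the more frequent term (a simple statistical signal)."""
    dist, _, best = min((edit_distance(word, term), -freq, term)
                        for term, freq in vocabulary.items())
    return best if dist <= max_distance else word

vocabulary = {"index": 120, "indexes": 40, "indent": 5}  # hypothetical corpus counts
print(correct("indx", vocabulary))    # index
print(correct("indexs", vocabulary))  # index (ties with "indexes", wins on frequency)
```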
Index Construction and Updates
Batch Indexing
Offline Processing
Merge-based Construction
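Merge-based batch construction can be sketched in the spirit of blocked sort-based indexing. Each block of documents is inverted into a sorted run of (term, doc_id) pairs, and the runs are then k-way merged into posting lists. Here the runs are in-memory lists and the block size is tiny; both are assumptions, since real systems write runs to disk.

```python
import heapq
from itertools import groupby

def invert_block(block):
    """Invert one in-memory block of (doc_id, tokens) into a sorted run of (term, doc_id) pairs."""
    return sorted({(term, doc_id) for doc_id, tokens in block for term in tokens})

def merge_runs(runs):
    """K-way merge of sorted runs into a term -> sorted doc-ID posting list mapping."""
    merged = heapq.merge(*runs)  # lazily merges already-sorted iterables
    return {term: [doc for _, doc in group]
            for term, group in groupby(merged, key=lambda pair: pair[0])}

docs = [(0, ["fox", "dog"]), (1, ["dog"]), (2, ["cat", "fox"]), (3, ["dog", "cat"])]
runs = [invert_block(docs[:2]), invert_block(docs[2:])]  # pretend memory fits two docs per block
index = merge_runs(runs)
print(index)  # {'cat': [2, 3], 'dog': [0, 1, 3], 'fox': [0, 2]}
```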
Real-time Indexing
Incremental Updates
Stream Processing
Index Merging and Compression
Delta Indexes
Compression Techniques
Block-based Compression
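A classic compression technique for posting lists is to store gaps between sorted document IDs instead of the IDs themselves, then encode each gap with variable-byte encoding. Each byte carries 7 bits of payload, and the high bit marks the last byte of a number. This minimal sketch shows that one technique. Block-based schemes that bit-pack whole blocks of gaps work differently and are not shown.

```python
def vbyte_encode_number(n):
    """Encode a non-negative int as 7-bit groups; the high bit flags the final byte."""
    out = []
    while True:
        out.insert(0, n % 128)
        if n < 128:
            break
        n //= 128
    out[-1] += 128  # mark the last byte of this number
    return out

def encode_postings(doc_ids):
    """Gap-encode a sorted doc-ID list, then variable-byte encode the gaps."""
    if not doc_ids:
        return b""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return bytes(b for g in gaps for b in vbyte_encode_number(g))

def decode_postings(data):
    """Invert encode_postings: rebuild gaps from bytes, then prefix-sum them into doc IDs."""
    doc_ids, n, last = [], 0, 0
    for byte in data:
        if byte < 128:
            n = 128 * n + byte
        else:
            n = 128 * n + (byte - 128)
            last += n
            doc_ids.append(last)
            n = 0
    return doc_ids

postings = [824, 829, 215406]
encoded = encode_postings(postings)
print(len(encoded), "bytes")  # 6 bytes, versus 12 as fixed 32-bit integers
```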
Handling Updates and Deletions
Document Versioning
Tombstone Records
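Delta indexes, tombstones, and versioning fit together in one small structure. In this minimal sketch, a batch-built main index is paired with a small delta index for new documents and a tombstone set for deletions. Deleted postings are filtered at query time and physically dropped only when the delta is merged. The class name SegmentedIndex and the "new version gets a new doc ID" policy are illustrative assumptions.

```python
class SegmentedIndex:
    """Main index + delta index for additions + tombstones for deletions (hypothetical design)."""

    def __init__(self, main):
        self.main = main        # term -> sorted list of doc IDs (batch-built)
        self.delta = {}         # term -> doc IDs added since the last merge
        self.tombstones = set() # deleted doc IDs, filtered out at query time

    def add(self, doc_id, terms):
        for term in set(terms):
            self.delta.setdefault(term, []).append(doc_id)

    def delete(self, doc_id):
        self.tombstones.add(doc_id)  # no posting list is rewritten yet

    def update(self, old_id, new_id, terms):
        # Versioning by replacement: tombstone the old version, index the new one under a new ID.
        self.delete(old_id)
        self.add(new_id, terms)

    def postings(self, term):
        ids = self.main.get(term, []) + self.delta.get(term, [])
        return sorted(d for d in ids if d not in self.tombstones)

    def merge(self):
        """Fold the delta into the main index and drop tombstoned postings for good."""
        for term in set(self.main) | set(self.delta):
            self.main[term] = self.postings(term)
        self.delta.clear()
        self.tombstones.clear()

idx = SegmentedIndex({"fox": [0, 2], "dog": [1]})
idx.add(3, ["fox", "cat"])
idx.delete(0)
idx.update(1, 4, ["dog"])
print(idx.postings("fox"), idx.postings("dog"))  # [2, 3] [4]
```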
Data Structures for Indexing
Hash Tables
Fast Lookup
Collision Handling
B-Trees and B+ Trees
Range Queries
Disk-based Storage
Tries and Prefix Trees
String Matching
Autocomplete Support
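A trie supports autocomplete by walking down to the node for the typed prefix and collecting every term below it. This minimal sketch ranks completions by a stored frequency count. The counts are hypothetical. For simplicity, the sketch gathers all completions before sorting, whereas production systems often cache the top completions at each node.

```python
class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0  # > 0 marks the end of an inserted term and stores its frequency

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, term, count=1):
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def complete(self, prefix, limit=3):
        """Return up to `limit` terms starting with `prefix`, most frequent first."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results, stack = [], [(node, prefix)]
        while stack:  # depth-first walk over the prefix's subtree
            current, text = stack.pop()
            if current.count:
                results.append((current.count, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        results.sort(key=lambda r: (-r[0], r[1]))
        return [text for _, text in results[:limit]]

trie = Trie()
for term, freq in [("index", 50), ("indexing", 30), ("indent", 5), ("inverted", 20)]:
    trie.insert(term, freq)
print(trie.complete("ind"))  # ['index', 'indexing', 'indent']
```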
Skip Lists
Probabilistic Data Structure
Search Efficiency
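A skip list keeps keys (for example doc IDs or terms) in sorted order. Each node is promoted to higher "express" levels with probability P, which gives expected logarithmic search and insert time without rebalancing. This is a minimal sketch of the standard probabilistic structure. MAX_LEVEL = 16 and P = 0.5 are common but assumed parameters, and a seeded generator makes runs reproducible.

```python
import random

class _Node:
    __slots__ = ("key", "forward")

    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] is the next node on level i

class SkipList:
    MAX_LEVEL = 16
    P = 0.5  # probability of promoting a node one level higher

    def __init__(self, seed=None):
        self.head = _Node(None, self.MAX_LEVEL)
        self.level = 1
        self.rng = random.Random(seed)

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < self.P:
            level += 1
        return level

    def _find_predecessors(self, key):
        """Return, for each level, the last node whose key is < key."""
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def insert(self, key):
        update = self._find_predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            return  # key already present
        level = self._random_level()
        self.level = max(self.level, level)
        new = _Node(key, level)
        for i in range(level):  # splice the new node in on each of its levels
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, key):
        nxt = self._find_predecessors(key)[0].forward[0]
        return nxt is not None and nxt.key == key

    def __iter__(self):
        node = self.head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]

sl = SkipList(seed=42)
for k in [30, 10, 50, 20, 40, 10]:
    sl.insert(k)
print(list(sl))  # [10, 20, 30, 40, 50]
```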
Distributed Indexing
Index Partitioning
Horizontal Partitioning
Vertical Partitioning
Sharding Strategies
Document-based Sharding
Term-based Sharding
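The two sharding strategies differ in what a shard owns. In document-based sharding, each shard indexes a subset of documents, so every query fans out to all shards but updates stay local. In term-based sharding, each shard owns the complete posting lists for a subset of terms, so a query touches only the shards holding its terms, but popular terms can create hot spots. This minimal sketch assigns shards with a stable CRC32 hash; the hash choice and function names are assumptions.

```python
import zlib

def shard_for(key, num_shards):
    """Stable hash (unlike Python's per-process randomized str hash) mapped to a shard number."""
    return zlib.crc32(key.encode("utf-8")) % num_shards

def document_sharding(docs, num_shards):
    """Each shard holds posting lists covering only its own documents."""
    shards = [{} for _ in range(num_shards)]
    for doc_id, terms in docs.items():
        shard = shards[shard_for(doc_id, num_shards)]
        for term in set(terms):
            shard.setdefault(term, []).append(doc_id)
    return shards

def term_sharding(docs, num_shards):
    """Each shard holds the full posting list for the terms it owns."""
    shards = [{} for _ in range(num_shards)]
    for doc_id, terms in docs.items():
        for term in set(terms):
            shards[shard_for(term, num_shards)].setdefault(term, []).append(doc_id)
    return shards

docs = {"d1": ["fox", "dog"], "d2": ["dog"], "d3": ["cat", "fox"]}
doc_shards = document_sharding(docs, 2)
term_shards = term_sharding(docs, 2)
# Document sharding: gather "dog" from every shard. Term sharding: ask only the owning shard.
print(sorted(d for s in doc_shards for d in s.get("dog", [])))
print(term_shards[shard_for("dog", 2)]["dog"])
```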
Replication and Consistency
Master-slave Replication
Eventual Consistency
Load Balancing
Query Distribution
Hot Spot Management