Searching
- Aho-Corasick. Search in a text by building a table from words.
- Bitap (or shift-or, shift-and, Baeza-Yates-Gonnet). Fuzzy string searching algorithm developed by Udi Manber and Sun Wu.
- Boyer-Moore string search. Search in text by skipping sub-string not containing letters in the searched input.
- Knuth-Morris-Pratt. Build a table when searching to skip sub-string.
- Rabin-Karp string search. Use hashing for multiple searches.
- Longest common subsequence problem. Haskell's algorithm. Of two sequences.
- Longest increasing subsequence problem. Of two sequences. It also reduces to find the longest path in a directed acyclic graph.
- Shortest common supersequence. Of two sequences.
- Horspool. Simplification of the Boyer-Moore algorithm. O(mn).
Approximate matching
- Levenshtein distance (or edit distance). Minimum number of operations (insertion, deletion, replacement) needed to transform one string into the other.
- Soundex. Phonetic algorithm for indexing words by their sound (in English).
- Metaphone. Indexing words by their sound (in English).
- NYSIIS. (New York State Identification and Intelligence System). Phonetic algorithm that improves soundex.
Word processing
- Latent Dirichlet Allocation (LDA). Analysis of documents to associate the content with a topic. Used by search engines.
- Latent Semantic Indexing (LSI). Automation of methods to attach a text to a topic from the words that occur commonly in this context.
- Stemming. A method of reducing words to their stem, or root form.