POS Tagging
Assign a grammatical role to each token: noun, verb, adjective, and more.
Overview
Part-of-Speech (POS) tagging assigns a grammatical label to each token in a sentence, such as noun (NN), verb (VB), or adjective (JJ). It is a core NLP task that feeds into dependency parsing, named entity recognition, and more accurate lemmatization.
Why It Matters
POS tags provide context that pure bag-of-words models miss. Whether 'book' is a noun ('read a book') or a verb ('book a flight') changes its meaning and how a model should handle it. This makes POS tagging especially important for information extraction tasks.
POS Tag Categories
NN / NNS
Noun, singular or plural (e.g., 'dog', 'dogs').
VB / VBG / VBD
Verb: base form, gerund or present participle, past tense (e.g., 'run', 'running', 'ran').
JJ
Adjective (e.g., 'quick', 'tall').
Penn Treebank Tagset
The standard 36-tag English POS annotation scheme, which NLTK's pos_tag function uses by default.
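As a rough illustration of the categories above, the sketch below stores a subset of Penn Treebank tags in a dictionary and groups them by prefix. The names `PENN_TAGS` and `coarse_category` are hypothetical helpers introduced here for illustration, not part of NLTK, and the table is deliberately incomplete.

```python
# Illustrative subset of the Penn Treebank tagset (not the full scheme).
PENN_TAGS = {
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "NNPS": "proper noun, plural",
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBG": "verb, gerund or present participle",
    "VBN": "verb, past participle",
    "VBP": "verb, non-3rd person singular present",
    "VBZ": "verb, 3rd person singular present",
    "JJ": "adjective",
    "JJR": "adjective, comparative",
    "JJS": "adjective, superlative",
}


def coarse_category(tag: str) -> str:
    """Collapse a fine-grained Penn tag into a coarse word class by prefix."""
    if tag.startswith("NN"):
        return "noun"
    if tag.startswith("VB"):
        return "verb"
    if tag.startswith("JJ"):
        return "adjective"
    # Everything else (determiners, adverbs, punctuation, ...) is lumped together.
    return "other"
```

Grouping by prefix works because related Penn tags share their first two letters, so 'dogs' (NNS) and 'dog' (NN) both fall under "noun".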
Library Note
This module uses NLTK's `pos_tag` function and optionally spaCy's pipeline for comparison.
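A minimal sketch of how the two libraries might be called side by side is shown below. It assumes NLTK and spaCy are installed and that the spaCy model `en_core_web_sm` has been downloaded; the NLTK resource names vary between versions, so the list in the code is an assumption rather than a definitive set. The helpers `tag_with_nltk`, `tag_with_spacy`, `format_tagged`, and `tag_disagreements` are hypothetical names introduced for this example.

```python
from typing import List, Tuple

TaggedToken = Tuple[str, str]


def tag_with_nltk(text: str) -> List[TaggedToken]:
    """Tokenize and tag text with NLTK's default English tagger."""
    import nltk  # imported lazily so the helpers below work without NLTK

    # Assumption: resource names differ across NLTK releases, so several
    # candidates are requested; which ones are actually needed depends on the version.
    for resource in ("punkt", "punkt_tab",
                     "averaged_perceptron_tagger",
                     "averaged_perceptron_tagger_eng"):
        nltk.download(resource, quiet=True)
    return nltk.pos_tag(nltk.word_tokenize(text))


def tag_with_spacy(text: str, model: str = "en_core_web_sm") -> List[TaggedToken]:
    """Tag text with a spaCy pipeline, returning fine-grained tags."""
    import spacy  # imported lazily; requires the model to be installed

    nlp = spacy.load(model)
    # token.tag_ holds the fine-grained tag; token.pos_ holds the coarse one.
    return [(token.text, token.tag_) for token in nlp(text)]


def format_tagged(tagged: List[TaggedToken]) -> str:
    """Render tagged tokens in the compact word/TAG notation."""
    return " ".join(f"{word}/{tag}" for word, tag in tagged)


def tag_disagreements(a: List[TaggedToken],
                      b: List[TaggedToken]) -> List[Tuple[str, str, str]]:
    """List tokens that two taggers labelled differently.

    Only valid when both taggers produced the same token sequence.
    """
    if [w for w, _ in a] != [w for w, _ in b]:
        raise ValueError("token sequences differ; align them before comparing")
    return [(w, tag_a, tag_b) for (w, tag_a), (_, tag_b) in zip(a, b) if tag_a != tag_b]
```

One way to try it is to tag "I want to book a flight" and "I read a book" with both functions and pass the results to `tag_disagreements`. The two tokenizers can split text differently, which is why the comparison checks alignment first.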
What You'll Learn
- What POS tags are and why they matter
- Penn Treebank tagset used by NLTK
- How POS tagging improves lemmatization and parsing
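The last point, that POS tags improve lemmatization, can be sketched as follows. NLTK's `WordNetLemmatizer` takes a `pos` argument and treats words as nouns when none is given, so passing the tagger's output through a small mapping lets it handle verbs and adjectives correctly. The helpers `penn_to_wordnet` and `lemmatize_tagged` are hypothetical names for this example, and the mapping is a common simplification rather than an official NLTK utility.

```python
from typing import List, Tuple


def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag to a WordNet POS code ('n', 'v', 'a', 'r')."""
    if tag.startswith("JJ"):
        return "a"  # adjective
    if tag.startswith("VB"):
        return "v"  # verb
    if tag.startswith("RB"):
        return "r"  # adverb
    # Nouns and every other tag fall back to the lemmatizer's default, noun.
    return "n"


def lemmatize_tagged(tagged: List[Tuple[str, str]]) -> List[str]:
    """Lemmatize (word, Penn tag) pairs using the tag as a hint."""
    import nltk  # imported lazily; requires NLTK and the WordNet corpus
    from nltk.stem import WordNetLemmatizer

    nltk.download("wordnet", quiet=True)
    lemmatizer = WordNetLemmatizer()
    return [
        lemmatizer.lemmatize(word.lower(), pos=penn_to_wordnet(tag))
        for word, tag in tagged
    ]
```

Without the tag, a verb form such as 'running' is looked up as a noun and is typically left unchanged; with `pos="v"` the lemmatizer can reduce it to its base form.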