NLTK · spaCy · Penn Treebank · 15 min

POS Tagging

Assign grammatical roles to each token — noun, verb, adjective, and more.

Overview

Part-of-Speech (POS) tagging assigns a grammatical label to each token in a sentence — noun (NN), verb (VB), adjective (JJ), etc. It is a core NLP task that feeds into dependency parsing, named entity recognition, and more accurate lemmatization.

💡 Why It Matters

POS tags provide context that pure bag-of-words models miss. Knowing whether 'book' is a noun or a verb changes its meaning and how a model should handle it. This matters most in information extraction tasks, where the grammatical role of a word often decides whether it is extracted at all.
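The 'book' ambiguity can be checked directly with NLTK. The sketch below is a minimal illustration, not a reference implementation: `tag_sentence`, `tags_for_word`, and `demo` are hypothetical helper names, and it assumes the NLTK tokenizer and tagger resources are already installed (their names differ between NLTK versions).

```python
from collections import defaultdict


def tag_sentence(sentence):
    """Tokenize a sentence and tag it with NLTK's default English tagger."""
    # Imported here so the pure-Python helper below is usable without NLTK.
    import nltk

    # Assumption: the tokenizer and tagger resources were installed beforehand
    # with nltk.download(); the resource names vary across NLTK versions.
    return nltk.pos_tag(nltk.word_tokenize(sentence))


def tags_for_word(tagged_sentences, word):
    """Map each tag given to `word` to the indices of the sentences using it."""
    seen = defaultdict(list)
    for index, tagged in enumerate(tagged_sentences):
        for token, tag in tagged:
            if token.lower() == word.lower():
                seen[tag].append(index)
    return dict(seen)


def demo():
    """Tag two sentences that use 'book' in different grammatical roles."""
    sentences = ["I read a good book.", "They book flights online."]
    tagged = [tag_sentence(s) for s in sentences]
    for pairs in tagged:
        print(pairs)
    print(tags_for_word(tagged, "book"))
```

Calling `demo()` prints the (token, tag) pairs for each sentence and then the tags assigned to 'book'. A statistical tagger can mislabel a word, so treat the output as a prediction rather than ground truth.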

POS Tag Categories

NN / NNS

Noun, singular or plural (e.g., 'dog', 'dogs').

VB / VBG / VBD

Verb: base form, gerund or present participle, past tense (e.g., 'run', 'running', 'ran').

JJ

Adjective (e.g., 'quick', 'tall').

Penn Treebank Tagset

The standard 36-tag POS annotation scheme (punctuation and symbol tags are counted separately) that NLTK's `pos_tag` function uses by default for English.
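Penn Treebank tags in the same family share a prefix (NN, NNS; VB, VBG, VBD), so fine-grained tags can be collapsed into the broad categories listed above by checking that prefix. The following is a small sketch under that assumption; `COARSE_BY_PREFIX`, `coarse_category`, and `count_categories` are illustrative names, not part of NLTK.

```python
from collections import Counter

# Illustrative mapping: covers only the three tag families discussed here.
COARSE_BY_PREFIX = {"NN": "noun", "VB": "verb", "JJ": "adjective"}


def coarse_category(tag):
    """Collapse a Penn Treebank tag into a broad category by its prefix."""
    for prefix, name in COARSE_BY_PREFIX.items():
        if tag.startswith(prefix):
            return name
    return "other"


def count_categories(tagged):
    """Count broad categories in a list of (token, tag) pairs."""
    return Counter(coarse_category(tag) for _, tag in tagged)
```

For example, `coarse_category("NNS")` returns `"noun"` and `coarse_category("DT")` returns `"other"`. For full tag descriptions, NLTK provides `nltk.help.upenn_tagset`, which prints them once the tagset resource has been downloaded.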

🛠 Library Note

This module uses NLTK's `pos_tag` function and optionally spaCy's pipeline for comparison.
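For the spaCy side of the comparison, a sketch along these lines may help. It assumes the `en_core_web_sm` model is installed; `spacy_tags` and `disagreements` are hypothetical helpers introduced only for illustration. In spaCy, `token.tag_` holds the fine-grained tag (Penn Treebank style for the English models) and `token.pos_` holds the coarse Universal POS label.

```python
def spacy_tags(text, model="en_core_web_sm"):
    """Return (token, fine-grained tag, coarse tag) triples from spaCy."""
    # Imported here so the comparison helper below works without spaCy.
    import spacy

    # Assumption: the model was installed beforehand, for example with
    # `python -m spacy download en_core_web_sm`.
    nlp = spacy.load(model)
    return [(token.text, token.tag_, token.pos_) for token in nlp(text)]


def disagreements(pairs_a, pairs_b):
    """List (token, tag_a, tag_b) wherever two taggers assign different tags.

    Both inputs are lists of (token, tag) pairs and must share the same
    tokenization; NLTK and spaCy can split text differently.
    """
    tokens_a = [token for token, _ in pairs_a]
    tokens_b = [token for token, _ in pairs_b]
    if tokens_a != tokens_b:
        raise ValueError("token sequences differ; align them before comparing")
    return [
        (token, tag_a, tag_b)
        for (token, tag_a), (_, tag_b) in zip(pairs_a, pairs_b)
        if tag_a != tag_b
    ]
```

To compare against NLTK, pass the `pos_tag` output as one argument and `[(text, tag) for text, tag, _ in spacy_tags(sentence)]` as the other. The explicit tokenization check is a deliberate choice: comparing tags position by position is meaningless if the two libraries split the sentence differently.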

External Documentation

POS Tagging with spaCy
NLTK POS Documentation

What You'll Learn

What POS tags are and why they matter
Penn Treebank tagset used by NLTK
How POS tagging improves lemmatization and parsing
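As a preview of the lemmatization point, NLTK's `WordNetLemmatizer.lemmatize` accepts a `pos` argument and treats words as nouns when it is omitted. The sketch below converts Penn Treebank tags to WordNet's single-letter codes; `penn_to_wordnet` and `lemmatize_tagged` are hypothetical helpers, and the code assumes the WordNet corpus has been downloaded.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS code ('a', 'v', 'r', or 'n')."""
    if tag.startswith("JJ"):
        return "a"  # adjective
    if tag.startswith("VB"):
        return "v"  # verb
    if tag.startswith("RB"):
        return "r"  # adverb
    return "n"  # fall back to noun, the lemmatizer's own default


def lemmatize_tagged(tagged):
    """Lemmatize (token, tag) pairs using each token's POS tag."""
    # Imported here so penn_to_wordnet stays usable without NLTK installed.
    from nltk.stem import WordNetLemmatizer

    # Assumption: the WordNet corpus was installed with nltk.download().
    lemmatizer = WordNetLemmatizer()
    # Tokens are lowercased because WordNet lookups expect lowercase forms.
    return [
        lemmatizer.lemmatize(token.lower(), pos=penn_to_wordnet(tag))
        for token, tag in tagged
    ]
```

Without a tag, 'running' is looked up as a noun and comes back unchanged; tagged as VBG, it is looked up as a verb and can be reduced to 'run'.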
