NLTK · spaCy · Penn Treebank · 15 min

POS Tagging

Assign grammatical roles to each token: noun, verb, adjective, and more.

Overview

Part-of-Speech (POS) tagging assigns a grammatical label to each token in a sentence, such as noun (NN), verb (VB), or adjective (JJ). It is a core NLP task that feeds into dependency parsing, named entity recognition, and more accurate lemmatization.

💡 Why It Matters

POS tags provide context that pure bag-of-words models miss. Knowing whether 'book' is a noun or a verb changes its meaning and how a model should handle it. POS tagging is especially important for information extraction tasks.
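
As a small illustration of this point, the sketch below compares a plain bag-of-words count with a count keyed on (token, tag) pairs. The sentence and its tags are hand-labelled assumptions chosen for demonstration; they are not output from a tagger.

```python
from collections import Counter

# Hand-labelled (token, tag) pairs for "I book a flight and read a book".
# These tags are an illustrative assumption, not tagger output.
tagged = [
    ("I", "PRP"), ("book", "VBP"), ("a", "DT"), ("flight", "NN"),
    ("and", "CC"), ("read", "VBP"), ("a", "DT"), ("book", "NN"),
]

# Bag-of-words: the two uses of "book" collapse into a single count.
bag_of_words = Counter(token for token, _ in tagged)

# Keeping the tag separates the verb reading from the noun reading.
bag_of_tagged = Counter(tagged)

print(bag_of_words["book"])            # both occurrences counted together
print(bag_of_tagged[("book", "VBP")])  # verb use only
print(bag_of_tagged[("book", "NN")])   # noun use only
```

With the tags attached, a downstream model can treat the verb and the noun as different features instead of one ambiguous word.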

POS Tag Categories

NN / NNS

Noun, singular or plural (e.g., 'dog', 'dogs').

VB / VBG / VBD

Verb: base form, gerund or present participle, past tense (e.g., 'run', 'running', 'ran').

JJ

Adjective (e.g., 'quick', 'tall').

Penn Treebank Tagset

The standard 36-tag POS annotation scheme, which NLTK's pos_tag function uses by default.
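
The categories above share tag prefixes, so fine-grained Penn Treebank tags can be collapsed into coarse classes when the distinction is not needed. The following is a minimal sketch under that assumption; `COARSE_BY_PREFIX` and `coarse_category` are hypothetical names introduced here, and the adverb row (RB) goes beyond the three categories listed above.

```python
# Hypothetical helper: map a Penn Treebank tag to a coarse class
# using its first two characters.
COARSE_BY_PREFIX = {
    "NN": "noun",       # NN, NNS, NNP, NNPS
    "VB": "verb",       # VB, VBD, VBG, VBN, VBP, VBZ
    "JJ": "adjective",  # JJ, JJR, JJS
    "RB": "adverb",     # RB, RBR, RBS
}


def coarse_category(tag):
    """Return a coarse class for a Penn Treebank tag, or 'other' if unmapped."""
    return COARSE_BY_PREFIX.get(tag[:2], "other")


for tag in ["NNS", "VBG", "JJ", "DT"]:
    print(tag, "->", coarse_category(tag))
```

Tags outside the table, such as determiners (DT) or punctuation, fall through to 'other' rather than raising an error.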

🛠 Library Note

This module uses NLTK's `pos_tag` function and optionally spaCy's pipeline for comparison.
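
A minimal sketch of calling NLTK's `pos_tag` might look like the following. Several details are assumptions rather than part of this module: the helper names `tag_sentence` and `format_tagged` are hypothetical, the tagger resource name differs between NLTK releases (so both known names are requested), and tokens are split on whitespace to avoid needing separate tokenizer data.

```python
def tag_sentence(sentence):
    """Return (token, tag) pairs from NLTK's default Penn Treebank tagger."""
    # Imported here so format_tagged below stays usable without NLTK installed.
    import nltk

    # Assumption: the resource is named differently across NLTK versions,
    # so both names are requested; an unknown name is skipped quietly.
    for resource in ("averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
        nltk.download(resource, quiet=True)

    # Whitespace splitting is a simplification; a real tokenizer would
    # also separate punctuation from words.
    tokens = sentence.split()
    return nltk.pos_tag(tokens)


def format_tagged(tagged):
    """Render (token, tag) pairs in the conventional token/TAG notation."""
    return " ".join(f"{token}/{tag}" for token, tag in tagged)
```

Calling `format_tagged(tag_sentence("The quick dog ran home"))` should return one token/TAG pair per word. The exact tags depend on the installed tagger model, so no expected output is shown here.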

What You'll Learn

  • What POS tags are and why they matter
  • Penn Treebank tagset used by NLTK
  • How POS tagging improves lemmatization and parsing