Intermediate

NLP Tasks

Explore the core tasks that NLP systems perform, from classifying text and extracting entities to translating languages and generating content.

Text Classification

Text classification assigns predefined categories to text. It is one of the most widely used NLP tasks.

Common Classification Tasks

  • Sentiment Analysis: Determining whether text is positive, negative, or neutral
  • Spam Detection: Classifying emails or messages as spam or not spam
  • Topic Classification: Categorizing news articles into topics (sports, politics, technology)

Python - Sentiment Analysis
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

results = classifier([
    "I love this product! It's amazing.",
    "This is terrible. Complete waste of money.",
    "The movie was okay, nothing special."
])

for r in results:
    print(f"{r['label']}: {r['score']:.3f}")
# POSITIVE: 0.999
# NEGATIVE: 0.998
# NEGATIVE: 0.876

Named Entity Recognition (NER)

NER identifies and classifies named entities in text into predefined categories such as person names, organizations, locations, dates, and more.

Python - spaCy NER
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple Inc. was founded by Steve Jobs in Cupertino, California in 1976.")

for ent in doc.ents:
    print(f"{ent.text:20s} {ent.label_:10s} {spacy.explain(ent.label_)}")
# Apple Inc.           ORG        Companies, agencies, institutions
# Steve Jobs           PERSON     People, including fictional
# Cupertino            GPE        Countries, cities, states
# California           GPE        Countries, cities, states
# 1976                 DATE       Absolute or relative dates

Part-of-Speech Tagging

POS tagging assigns grammatical labels (noun, verb, adjective, etc.) to each word in a sentence. It is a fundamental step for many downstream NLP tasks.

Python - spaCy POS
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The quick brown fox jumps over the lazy dog.")

for token in doc:
    print(f"{token.text:10s} {token.pos_:8s} {token.dep_}")
# The        DET      det
# quick      ADJ      amod
# brown      ADJ      amod
# fox        NOUN     nsubj
# jumps      VERB     ROOT
# ...

Dependency Parsing

Dependency parsing analyzes the grammatical structure of a sentence, establishing relationships between words. Each word is connected to its "head" word by a specific relationship (subject, object, modifier, etc.).

Machine Translation

Machine translation automatically converts text from one language to another. Modern neural machine translation (NMT) uses encoder-decoder architectures with attention mechanisms.

Python - Translation
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")

result = translator("Natural Language Processing is fascinating.")
print(result[0]['translation_text'])
# "Le traitement du langage naturel est fascinant."

Text Summarization

Summarization condenses long texts into shorter versions while preserving key information. There are two main approaches:

| Approach | Method | Pros | Cons |
| --- | --- | --- | --- |
| Extractive | Selects important sentences from the original text | Preserves original wording; factually accurate | Can be choppy; may miss context |
| Abstractive | Generates new sentences that capture the meaning | More fluent and natural summaries | May hallucinate or introduce errors |

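The extractive approach can be illustrated without any model at all: score each sentence by the average frequency of the words it contains and keep the top-scoring ones. This is a toy sketch (the sample `text` is made up for illustration); real extractive systems such as TextRank use more sophisticated scoring:

Python - Extractive Summarization (toy)

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=1):
    """Toy extractive summarizer: rank sentences by average word-frequency score."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(sentences, key=score, reverse=True)
    chosen = set(ranked[:n_sentences])
    # Emit the selected sentences in their original order
    return " ".join(s for s in sentences if s in chosen)

text = ("Language models process language data. "
        "Models learn patterns from data. "
        "The cat sat quietly.")
print(extractive_summary(text, n_sentences=1))
# Language models process language data.
```

Because the summary reuses whole sentences verbatim, it can never hallucinate, which is exactly the trade-off the table above describes.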
Python - Summarization
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = """
Natural language processing (NLP) is a subfield of linguistics, computer science,
and artificial intelligence concerned with the interactions between computers and
human language, in particular how to program computers to process and analyze large
amounts of natural language data. The goal is a computer capable of understanding
the contents of documents, including the contextual nuances of the language within
them. Challenges in NLP frequently involve speech recognition, natural language
understanding, and natural language generation.
"""

summary = summarizer(article, max_length=50, min_length=20)
print(summary[0]['summary_text'])

Question Answering

QA systems find answers to questions from a given context or knowledge base:

Python - QA
from transformers import pipeline

qa = pipeline("question-answering")

context = "BERT was developed by Google in 2018. It uses a bidirectional transformer architecture and was trained on Wikipedia and BookCorpus."

result = qa(question="Who developed BERT?", context=context)
print(f"Answer: {result['answer']}, Score: {result['score']:.3f}")
# Answer: Google, Score: 0.972

Text Generation

Text generation creates new text based on a prompt or context. Modern LLMs like GPT-4 and Claude excel at this task; the example below uses the much smaller, freely downloadable GPT-2:

Python - Text Generation
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

result = generator(
    "The future of NLP is",
    max_length=50,
    num_return_sequences=1,
    do_sample=True,   # sampling must be enabled for temperature to take effect
    temperature=0.7
)

print(result[0]['generated_text'])
Key takeaway: NLP encompasses a wide range of tasks, from understanding text (classification, NER) to generating it (translation, summarization). Modern transformer models have unified many of these tasks under a single architecture, making it easier than ever to build powerful NLP applications.