NLP Tasks
Explore the core tasks that NLP systems perform, from classifying text and extracting entities to translating languages and generating content.
Text Classification
Text classification assigns predefined categories to text. It is one of the most widely used NLP tasks.
Common Classification Tasks
- Sentiment Analysis: Determining whether text is positive, negative, or neutral
- Spam Detection: Classifying emails or messages as spam or not spam
- Topic Classification: Categorizing news articles into topics (sports, politics, technology)
```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

results = classifier([
    "I love this product! It's amazing.",
    "This is terrible. Complete waste of money.",
    "The movie was okay, nothing special."
])

for r in results:
    print(f"{r['label']}: {r['score']:.3f}")
# POSITIVE: 0.999
# NEGATIVE: 0.998
# NEGATIVE: 0.876
```
Named Entity Recognition (NER)
NER identifies and classifies named entities in text into predefined categories such as person names, organizations, locations, dates, and more.
```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple Inc. was founded by Steve Jobs in Cupertino, California in 1976.")

for ent in doc.ents:
    print(f"{ent.text:20s} {ent.label_:10s} {spacy.explain(ent.label_)}")
# Apple Inc.           ORG        Companies, agencies, institutions
# Steve Jobs           PERSON     People, including fictional
# Cupertino            GPE        Countries, cities, states
# California           GPE        Countries, cities, states
# 1976                 DATE       Absolute or relative dates
```
Part-of-Speech Tagging
POS tagging assigns grammatical labels (noun, verb, adjective, etc.) to each word in a sentence. It is a fundamental step for many downstream NLP tasks.
```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The quick brown fox jumps over the lazy dog.")

for token in doc:
    print(f"{token.text:10s} {token.pos_:8s} {token.dep_}")
# The        DET      det
# quick      ADJ      amod
# brown      ADJ      amod
# fox        NOUN     nsubj
# jumps      VERB     ROOT
# ...
```
Dependency Parsing
Dependency parsing analyzes the grammatical structure of a sentence, establishing relationships between words. Each word is connected to its "head" word by a specific relationship (subject, object, modifier, etc.).
Machine Translation
Machine translation automatically converts text from one language to another. Modern neural machine translation (NMT) uses encoder-decoder architectures with attention mechanisms.
```python
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")

result = translator("Natural Language Processing is fascinating.")
print(result[0]['translation_text'])
# "Le traitement du langage naturel est fascinant."
```
Text Summarization
Summarization condenses long texts into shorter versions while preserving key information. There are two main approaches:
| Approach | Method | Pros | Cons |
|---|---|---|---|
| Extractive | Selects important sentences from the original text | Preserves original wording; factually accurate | Can be choppy; may miss context |
| Abstractive | Generates new sentences that capture the meaning | More fluent and natural summaries | May hallucinate or introduce errors |
```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = """
Natural language processing (NLP) is a subfield of linguistics, computer
science, and artificial intelligence concerned with the interactions between
computers and human language, in particular how to program computers to
process and analyze large amounts of natural language data. The goal is a
computer capable of understanding the contents of documents, including the
contextual nuances of the language within them. Challenges in NLP frequently
involve speech recognition, natural language understanding, and natural
language generation.
"""

summary = summarizer(article, max_length=50, min_length=20)
print(summary[0]['summary_text'])
```
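The BART pipeline above is abstractive. The extractive approach from the table can be sketched in plain Python with simple word-frequency scoring; the `extractive_summary` function below is our illustration, not a library API, and the naive regex tokenization is an assumption made for brevity:

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=2):
    # Naive sentence and word segmentation (illustrative only)
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    words = re.findall(r'\w+', text.lower())
    freq = Counter(words)
    # Score each sentence by the total corpus frequency of its words
    scores = [(sum(freq[w] for w in re.findall(r'\w+', s.lower())), i, s)
              for i, s in enumerate(sentences)]
    # Take the top-scoring sentences, then restore original order
    top = sorted(sorted(scores, reverse=True)[:num_sentences],
                 key=lambda t: t[1])
    return " ".join(s for _, _, s in top)
```

Because the output reuses whole sentences from the input, it preserves the original wording exactly, which is the extractive trade-off described in the table.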
Question Answering
QA systems find answers to questions from a given context or knowledge base:
```python
from transformers import pipeline

qa = pipeline("question-answering")

context = ("BERT was developed by Google in 2018. It uses a bidirectional "
           "transformer architecture and was trained on Wikipedia and BookCorpus.")

result = qa(question="Who developed BERT?", context=context)
print(f"Answer: {result['answer']}, Score: {result['score']:.3f}")
# Answer: Google, Score: 0.972
```
Text Generation
Text generation creates new text based on a prompt or context. Modern LLMs like GPT-4 and Claude excel at this task:
```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

result = generator(
    "The future of NLP is",
    max_length=50,
    num_return_sequences=1,
    temperature=0.7
)
print(result[0]['generated_text'])
```