A Practical Guide to Text Modeling with PyTorch

Natural Language Processing (NLP) is at the heart of many AI-powered applications—sentiment analysis, chatbots, document summarizers, and much more. In this guide, we’ll explore how to build powerful NLP models using PyTorch, one of the most popular deep learning frameworks. We’ll focus on four core tasks, providing code samples and showing how they’re used in real-world projects.

  • Text Classification
  • Text Generation
  • Summarization/Translation (with Transformers)
  • Transfer Learning (with Pretrained Models)

Let’s roll up our sleeves and dive in!


1. Text Classification: Predicting Sentiment with PyTorch

Use-case: “Is this movie review positive or negative?”
Popular dataset: IMDB Movie Reviews

How does it work?
We preprocess reviews by tokenizing and converting words to indices, pad sequences to a uniform length, and batch our data. The model learns from batches, predicting class labels (e.g., positive=1, negative=0).

Sample dataset entry:

"review": "The plot was dull and predictable.", "label": 0  # 0=negative
"review": "A wonderful movie with excellent performances.", "label": 1  # 1=positive

PyTorch Model (LSTM-based):

import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.dropout = nn.Dropout(0.3)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        x = self.embedding(x)
        _, (h_n, _) = self.lstm(x)
        out = self.dropout(h_n[-1])
        return self.fc(out)

# Example usage:
# model = TextClassifier(vocab_size=10000, embed_dim=128, hidden_dim=64, num_classes=2)

Key steps in training:

  • Tokenize and numerically encode reviews.
  • Batch and pad sequences.
  • Use nn.CrossEntropyLoss() to train (a full loop sketch follows this list).
  • Predict with:
  outputs = model(X_batch)        # forward pass
  preds = outputs.argmax(dim=1)   # predicted class
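
Putting those steps together, a bare-bones training loop might look like the sketch below. Here train_loader is an assumed DataLoader that yields batches of padded index tensors and integer labels; the names are illustrative, not part of any library:

import torch
import torch.nn as nn

model = TextClassifier(vocab_size=10000, embed_dim=128, hidden_dim=64, num_classes=2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):
    model.train()
    for X_batch, y_batch in train_loader:   # assumed: batches of (padded indices, labels)
        optimizer.zero_grad()
        outputs = model(X_batch)            # logits of shape (batch, num_classes)
        loss = criterion(outputs, y_batch)
        loss.backward()
        optimizer.step()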

2. Text Generation: Creating Natural Language with PyTorch

Use-case: “Autocomplete this sentence: ‘The weather today is…’”
Popular dataset: WikiText-2

How does it work?
The model is trained to predict the next word, given a sequence. To generate text, we input a prompt, then repeatedly sample the next word using the model’s predictions.

Sample data:

Input: "The weather today is"
Output (model): "The weather today is sunny and warm."

PyTorch Model (LSTM-based language model):

class TextGenerator(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        x = self.embedding(x)
        x, hidden = self.lstm(x, hidden)
        out = self.fc(x)
        return out, hidden

# Example usage:
# model = TextGenerator(vocab_size=10000, embed_dim=128, hidden_dim=256)

Training steps:

  • Create input-target pairs: e.g., input=[‘the’,‘weather’,‘today’], target=[‘weather’,‘today’,‘is’].
  • Use teacher forcing during training (model gets correct token as next input).
  • For generation, sample one token at a time using the trained model, as sketched after this list.
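
Here is what that token-by-token sampling can look like, as a minimal sketch. The trained model is the TextGenerator above; itos is an assumed index-to-word list built alongside the vocabulary:

import torch

@torch.no_grad()
def generate(model, prompt_ids, itos, max_new_tokens=20, temperature=1.0):
    model.eval()
    ids = torch.tensor(prompt_ids).unsqueeze(0)           # shape (1, prompt_len)
    hidden = None
    generated = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits, hidden = model(ids, hidden)               # logits: (1, seq_len, vocab_size)
        probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1).item()   # sample the next token
        generated.append(next_id)
        ids = torch.tensor([[next_id]])                   # feed only the new token back in
    return " ".join(itos[i] for i in generated)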

3. Summarization or Translation: Transformers to the Rescue

Use-case: Summarize or translate long texts (“NASA launches new Mars rover to search for signs of life.” → “NASA sends rover to find life on Mars.”)
Popular datasets: CNN/DailyMail (summarization), WMT (translation)

How does it work?
Transformers process the entire source text in parallel, using self-attention to model long-distance dependencies. Both input and output are sequences.

Sample data:

Input:  "NASA launches new Mars rover to search for signs of life."
Target: "NASA sends rover to find life on Mars."

PyTorch Model (Toy Transformer):

class TransformerSeq2Seq(nn.Module):
    def __init__(self, vocab_size, embed_dim, nhead, num_layers):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.transformer = nn.Transformer(
            d_model=embed_dim,
            nhead=nhead,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            batch_first=True
        )
        self.fc = nn.Linear(embed_dim, vocab_size)

    def forward(self, src, tgt):
        src_emb = self.embedding(src)
        tgt_emb = self.embedding(tgt)
        # Causal mask so the decoder cannot peek at future target tokens
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        out = self.transformer(src_emb, tgt_emb, tgt_mask=tgt_mask)
        return self.fc(out)
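
One caveat: nn.Transformer does not add positional information by itself, so a real (non-toy) model would add positional encodings to the embeddings before the transformer call. A sketch of the standard sinusoidal version from the original Transformer paper could look like this:

import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len).unsqueeze(1).float()
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)      # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)      # odd dimensions
        self.register_buffer("pe", pe.unsqueeze(0))       # shape (1, max_len, d_model)

    def forward(self, x):                                 # x: (batch, seq_len, d_model)
        return x + self.pe[:, : x.size(1)]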

Workflow:

  • Tokenize and index both source and target texts.
  • During training, shift the target sequence by one for teacher forcing.
  • During inference, generate the output sequence token by token (see the sketch after this list).
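
To make the shift and the decoding loop concrete, here is a hedged sketch of one training step and a greedy decoder. PAD_ID, BOS_ID, and EOS_ID are assumed special-token indices from your vocabulary, and the targets are assumed to start with BOS and end with EOS:

import torch
import torch.nn as nn

PAD_ID, BOS_ID, EOS_ID, VOCAB_SIZE = 0, 1, 2, 10000       # hypothetical ids/sizes

model = TransformerSeq2Seq(vocab_size=VOCAB_SIZE, embed_dim=256, nhead=8, num_layers=3)
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(src, tgt):
    tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]             # decoder input vs. prediction targets
    logits = model(src, tgt_in)                           # (batch, tgt_len - 1, vocab_size)
    loss = criterion(logits.reshape(-1, VOCAB_SIZE), tgt_out.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def greedy_decode(src, max_len=50):
    ys = torch.full((src.size(0), 1), BOS_ID, dtype=torch.long)
    for _ in range(max_len):
        next_tok = model(src, ys)[:, -1].argmax(dim=-1, keepdim=True)
        ys = torch.cat([ys, next_tok], dim=1)
        if (next_tok == EOS_ID).all():                    # stop once every sequence emits EOS
            break
    return ys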

4. Transfer Learning: Fine-tuning Pretrained Transformers (e.g., BERT)

Use-case: Achieve state-of-the-art performance on text classification with only a small amount of labeled data.
Popular datasets: SST-2, IMDB, AG News

How does it work?
Pre-trained language models like BERT have already “read” the internet (or lots of books, etc.) and learned rich language representations. With transfer learning, you can quickly fine-tune these models on your own data, often achieving outstanding results with minimal effort.

Sample from SST-2

"sentence": "An exhilarating experience!", "label": 1
"sentence": "Not my cup of tea.", "label": 0

Typical fine-tuning workflow:

  1. Load a pre-trained model (e.g., BERT) and its tokenizer.
  2. Tokenize your text to get input IDs and attention masks compatible with the pre-trained model.
  3. Attach a new classification head (a simple linear/fully connected layer).
  4. Train only the classification head (optionally, some top Transformer layers) on your labeled dataset, as sketched below.
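
In practice, step 4 is just a matter of toggling requires_grad. A minimal sketch, assuming the BertClassifier model defined in the next code block (bert-base has 12 encoder layers, so unfreezing the top two is one reasonable choice):

import torch

model = BertClassifier(num_classes=2)          # defined in the next code block

# Freeze the entire pretrained encoder, then train only the new classification head.
for param in model.bert.parameters():
    param.requires_grad = False

# Optionally unfreeze the top encoder layers as well.
for param in model.bert.encoder.layer[-2:].parameters():
    param.requires_grad = True

# Only parameters with requires_grad=True are handed to the optimizer.
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=2e-5)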

PyTorch Model Example (using Hugging Face Transformers):

from transformers import BertTokenizer, BertModel
import torch.nn as nn

class BertClassifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.dropout = nn.Dropout(0.3)
        self.fc = nn.Linear(self.bert.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids, attention_mask=attention_mask)
        pooled = outputs.pooler_output
        out = self.dropout(pooled)
        return self.fc(out)

Typical usage:
You tokenize with BertTokenizer. For each text, you get input_ids, attention_mask (handled automatically by tokenizer(..., padding=True, truncation=True, return_tensors='pt')). You pass those to your model; during training, you compute a loss (usually cross-entropy) against your actual labels and optimize the model.
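
Concretely, one training step with that pipeline might look like the following sketch (the sentences and labels are just illustrative):

import torch
import torch.nn as nn
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertClassifier(num_classes=2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)

texts = ["An exhilarating experience!", "Not my cup of tea."]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
logits = model(batch['input_ids'], batch['attention_mask'])   # (batch, num_classes)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()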


Summary Table: Tasks, Workflows, and Datasets

| Task | Workflow | Popular Dataset | Example Input/Output |
| --- | --- | --- | --- |
| Text Classification | Tokenize → Index → Pad & Batch → Model → Predict | IMDB, AG News, SST-2 | Review → Positive/Negative |
| Text Generation | Tokenize → Index → Batch → Model → Sample Output | WikiText-2, Penn Treebank | Prompt → Model completes the sentence |
| Summarization/Translation (Transformer) | Tokenize Source/Target → Seq2Seq Model | CNN/DailyMail, WMT | Article/English → Summary/French |
| Transfer Learning | Tokenize → Pretrained Model + Head → Fine-tune | SST-2, IMDB, custom datasets | Text → Class (e.g., sentiment/intent/NER) |

Getting Started

  • Data loading: Use torchtext, datasets (Hugging Face), or write your own code to preprocess tokenized text into numerical format, pad sequences (torch.nn.utils.rnn.pad_sequence), and batch; a collate_fn sketch follows this list.
  • Model training: Standard PyTorch routines: instantiate the model, set up optimizer (e.g., Adam), train with batches, evaluate accuracy or generation quality.
  • Inference: Pass your new text through the preprocessing pipeline, predict categories/sequences, and interpret results.
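
If you write your own loader, the padding and batching step usually lives in a collate_fn, roughly as sketched here (my_dataset is an assumed Dataset that yields (token_id_list, label) pairs):

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

def collate_fn(batch):
    # batch is a list of (list_of_token_ids, label) pairs from the Dataset
    sequences = [torch.tensor(ids) for ids, _ in batch]
    labels = torch.tensor([label for _, label in batch])
    padded = pad_sequence(sequences, batch_first=True, padding_value=0)   # pad with index 0
    return padded, labels

# loader = DataLoader(my_dataset, batch_size=32, shuffle=True, collate_fn=collate_fn)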

Here’s an overview of how the example PyTorch models are actually used in practice, along with concrete sample datasets for each. You’ll find a brief workflow for each use case, a typical dataset example, and a corresponding sample data snippet.


  • Text Classification
    • In practice: predicts the category or sentiment of input text (e.g., positive/negative review, topic classification). Preprocess (tokenize, lower-case, convert words to indices), build a vocabulary, use dataloaders to batch the indexed sequences, pass them through an embedding layer, an LSTM/GRU/Transformer, and an output layer, and train with cross-entropy loss. At inference, feed in user text and output the predicted class.
    • Typical dataset: IMDB Movie Review Dataset[1][2]
    • Sample data: "review": "A wonderful movie with excellent performances.", "label": 1 and "review": "The plot was dull and predictable.", "label": 0 (1=positive, 0=negative)
  • Text Generation
    • In practice: learns to generate new, syntactically/semantically correct text. Preprocessing is similar; the model is usually trained as a language model (predict the next word given the previous words). During inference, input a prompt and sample the next word repeatedly. Used in chatbots, story generators, and auto-complete.
    • Typical datasets: Penn Treebank, WikiText-2
    • Sample data: Input: "The weather today is"; the model might generate: "The weather today is sunny and clear."
  • Summarization / Translation (Transformer)
    • In practice: turns a source text into a target summary or translation using a sequence-to-sequence Transformer with attention. Tokenize source and target texts, split into train/val/test, use DataLoaders, and train the output to match the target summary/translation. At inference, input new source text and generate the output sequence.
    • Typical datasets: CNN/DailyMail (summarization), WMT (translation)
    • Sample data: Summarization: "NASA launches new Mars rover to search for signs of life." → "NASA launches rover to Mars to search for life." Translation: "Hello, how are you?" → "Bonjour, comment ça va ?"
  • Transfer Learning (BERT/XLM-R, etc.)
    • In practice: fine-tunes a pre-trained model on a smaller, task-specific corpus. Use the tokenizer that ships with the pretrained model, preprocess so tokens match the pre-trained vocabulary, and usually freeze the lower layers while training a classifier head. Used for quick, highly accurate adaptation to tasks like sentiment, QA, and NER.
    • Typical datasets: SST-2, IMDB, AG News[4]
    • Sample data (SST-2, Stanford Sentiment Treebank v2): "sentence": "The movie was captivating!", "label": 1 and "sentence": "Not my cup of tea.", "label": 0

Additional Details

  • Text Classification Sample Workflow (IMDB):
    1. Tokenization: 'I loved the plot.' → ['i', 'loved', 'the', 'plot']
    2. Vocabulary Mapping: {'i':1, 'loved':2, ...}
    3. Sequence Padding/Truncation: [1, 2, 3, 4] (padded as needed)
    4. Label Mapping: positive → 1, negative → 0
    5. Batched Data Sample: X_batch = [[1,2,3,4],[...]], y_batch = [1,0]
    6. Model Forward Pass: outputs = model(X_batch)
    7. Loss Computation: loss = criterion(outputs, y_batch)
  • Text Generation Sample Workflow (WikiText-2):
    • Input: “Once upon a”
    • Model predicts probability distribution over vocabulary for each next token, sampled sequentially to build a sentence.
  • Summarization/Translation:
    • Inputs and targets are both sequences; preprocessing ensures both are indexed, and sentences are batched.
    • During inference, models typically generate one token at a time, feeding the previous output back in as input (autoregressive sampling).
  • Transfer Learning Example (SST-2):
    • Data: Each instance is a sentence and a label (binary sentiment).
    • Tokenizer matches pre-trained model’s vocabulary (e.g., WordPiece for BERT).
    • Fine-tune new classifier head with a small, labeled corpus—fast and effective[4].

For datasets:

  • IMDB for sentiment (positive/negative)—classic for text classification[1][2].
  • Penn Treebank, WikiText-2 for language modeling / text generation.
  • CNN/DailyMail for summarization.
  • WMT for translation.
  • SST-2 for transfer learning and binary sentiment[4].

These datasets are widely available and easy to start with using PyTorch and/or Hugging Face data loaders.


Conclusion

Whether you’re classifying movie reviews, generating stories, summarizing articles, or packing state-of-the-art performance into your app with transfer learning, PyTorch lets you build and experiment with cutting-edge NLP models easily. Try out the examples above, plug in your own data, and see the power of modern deep learning for text unfold!


Feel free to leave comments or questions below, and happy building with PyTorch! 🚀

Jesus Saves

By JCharisAI
