NLP: Text Summarisation with Python

Open Source For You | March 2025

Here's a simple Python method based on the Natural Language Toolkit for extractive text summarisation in natural language processing.

- Dr Dipankar Ray

In natural language processing (NLP), frequency-based summarisation is a straightforward extractive text summarisation technique that selects sentences based on the frequency of important words in the text. The approach is based on the assumption that frequently occurring words represent the core themes of the text. Let's discuss a simplified algorithm using this approach.

Steps in frequency-based summarisation

Preprocessing:

  • Tokenization: Split the text into sentences and words.

  • Stop word removal: Remove common words like 'and', 'the', or 'is' that do not contribute to meaning.

  • Stemming: Reduce words to their stems, so that variants such as 'learning' and 'learns' are counted as the same word.
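The stop word removal and stemming steps above can be sketched in plain Python. This is only an illustration under stated assumptions: `STOP_WORDS` is a tiny hand-written set and `crude_stem` is a hypothetical suffix stripper. Neither is part of NLTK, and a real pipeline would more likely rely on NLTK's stop word corpus and its `PorterStemmer`.

```python
# Assumption: a tiny illustrative stop word list, not NLTK's stopwords corpus.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "that"}


def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(words):
    """Lowercase the tokens, drop stop words and punctuation, then stem."""
    cleaned = []
    for w in words:
        w = w.lower()
        if w.isalpha() and w not in STOP_WORDS:
            cleaned.append(crude_stem(w))
    return cleaned


tokens = ["The", "models", "are", "learning", "the", "patterns", "."]
print(preprocess(tokens))  # ['model', 'learn', 'pattern']
```

A suffix stripper this simple will mis-stem many words, which is why a tested stemmer is preferable outside of a demonstration.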

Word frequency calculation:

  • Count the occurrences of each word in the text.

  • Normalise frequencies if needed, e.g., by dividing by the total number of words.
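As a minimal sketch of this step, the counting and optional normalisation can be done with `collections.Counter` from the standard library. The function name `word_frequencies` and the sample word list are assumptions made for illustration; the input is taken to be a list of already preprocessed words.

```python
from collections import Counter


def word_frequencies(words, normalise=True):
    """Count each word; optionally divide by the total number of words."""
    counts = Counter(words)
    if not normalise:
        return dict(counts)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {word: count / total for word, count in counts.items()}


words = ["model", "learn", "pattern", "model", "data", "model"]
freqs = word_frequencies(words)
print(freqs["model"])  # 0.5
```

Dividing by the count of the most frequent word instead of the total is another common choice; either way the ranking of words is unchanged.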

Sentence scoring:

  • Assign scores to sentences based on the cumulative frequency of the words they contain.

  • Sentences with more frequent words score higher.
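The scoring rule above might look like the following sketch, which assumes each sentence has already been reduced to a list of preprocessed words and that `freqs` maps words to their frequencies. The name `score_sentences` and the sample values are hypothetical.

```python
def score_sentences(sentences, freqs):
    """Score each sentence as the sum of the frequencies of its words."""
    # Words missing from the frequency table contribute nothing.
    return [sum(freqs.get(word, 0.0) for word in sentence) for sentence in sentences]


freqs = {"model": 0.5, "data": 0.25, "train": 0.25}
sentences = [
    ["model", "train", "data"],
    ["weather", "nice"],
    ["model", "model"],
]
scores = score_sentences(sentences, freqs)
print(scores)  # [1.0, 0.0, 1.0]
```

A plain sum favours long sentences; dividing each score by the sentence length is one possible adjustment, though the steps above do not require it.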

Sentence selection:

  • Rank sentences by their scores.

  • Select the top n sentences (based on a predefined ratio or word count) to form the summary.
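A minimal sketch of the selection step, assuming the scores have already been computed: rank the sentence indices by score, keep the top n, and then restore the original order so the summary reads naturally. Restoring the order is a design choice of this sketch, and `select_top_sentences` and the sample scores are made up for illustration.

```python
def select_top_sentences(sentences, scores, n):
    """Return the n highest-scoring sentences in their original order."""
    # Rank sentence indices from highest to lowest score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Keep the top n, then sort the indices to restore document order.
    chosen = sorted(ranked[:n])
    return [sentences[i] for i in chosen]


sentences = [
    "NLP is fun.",
    "The weather is nice.",
    "NLP models learn patterns.",
    "Summaries save time.",
]
scores = [0.6, 0.0, 0.9, 0.4]
summary = select_top_sentences(sentences, scores, n=2)
print(" ".join(summary))  # NLP is fun. NLP models learn patterns.
```

To select by ratio rather than a fixed count, n could be derived from the number of sentences, for example `max(1, int(len(sentences) * 0.3))`.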

The implementation relies on the Natural Language Toolkit (NLTK) package, which provides all the required modules. The following imports are used here.

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.stem import PorterStemmer

Tokenization

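In natural language processing, tokenization divides a string into a list of tokens, typically sentences or words. Tokens are the basic units for later steps such as counting frequencies and finding patterns in the text. (In data security, the same term refers to replacing sensitive data with non-sensitive placeholders; that is a different technique and is not used here.)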
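To make the idea concrete without depending on NLTK's downloadable tokenizer data, here is a rough sketch using regular expressions from the standard library. The helpers `split_sentences` and `split_words` are hypothetical names. NLTK's `sent_tokenize` and `word_tokenize` handle abbreviations, quotes and punctuation far more carefully than these patterns do.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]


def split_words(sentence):
    """Extract runs of letters, digits and apostrophes as word tokens."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


text = "Tokenization splits text. It yields sentences and words!"
sentences = split_sentences(text)
words = split_words(sentences[0])
print(sentences)  # ['Tokenization splits text.', 'It yields sentences and words!']
print(words)      # ['Tokenization', 'splits', 'text']
```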
