NLP: Text Summarisation with Python
Open Source For You
March 2025
Here's a simple Python method based on the Natural Language Toolkit for extractive text summarisation in natural language processing.

In natural language processing (NLP), frequency-based summarisation is a straightforward extractive text summarisation technique that selects sentences based on the frequency of important words in the text. The approach is based on the assumption that frequently occurring words represent the core themes of the text. Let's discuss a simplified algorithm using this approach.
Steps in frequency-based summarisation
Preprocessing:
- Tokenization: Split the text into sentences and words.
- Stop word removal: Remove common words like 'and', 'the', or 'is' that do not contribute to meaning.
- Stemming: Reduce words to their stems, e.g., 'running' and 'runs' both become 'run'.
Word frequency calculation:
- Count the occurrences of each word in the text.
- Normalise frequencies if needed, e.g., by dividing by the total number of words.
Sentence scoring:
- Assign scores to sentences based on the cumulative frequency of the words they contain.
- Sentences with more frequent words score higher.
Sentence selection:
- Rank sentences by their scores.
- Select the top n sentences (based on a predefined ratio or word count) to form the summary.
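The steps above can be sketched in plain Python. This is a minimal illustration, not the article's implementation: it assumes a naive regex sentence splitter and a tiny hand-written stop-word set in place of NLTK's tokenizers and stop-word corpus, and it skips stemming. The names `STOP_WORDS`, `split_sentences`, `content_words` and `summarise` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word set; a real stop-word corpus is much larger.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on",
              "it", "that", "this", "for", "with", "as", "by", "be", "or"}

def split_sentences(text):
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(sentence):
    # Lowercase, pull out word-like tokens, drop stop words (no stemming here).
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]

def summarise(text, n=2):
    sentences = split_sentences(text)
    # Word frequency calculation over the whole text.
    freq = Counter(w for s in sentences for w in content_words(s))
    if not freq:
        return ""
    # Normalise by the total number of counted words.
    total = sum(freq.values())
    norm = {w: c / total for w, c in freq.items()}
    # Sentence scoring: cumulative normalised frequency of the words in each sentence.
    scores = [sum(norm[w] for w in content_words(s)) for s in sentences]
    # Sentence selection: take the n best-scoring sentences, then restore text order.
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return " ".join(sentences[i] for i in sorted(best))

text = ("Python is popular. "
        "Python makes text processing in Python easy. "
        "Dogs bark.")
print(summarise(text, n=1))
```

Because scores are cumulative, longer sentences tend to rank higher; dividing each score by the sentence's word count is one common way to offset that, if it matters for the text at hand.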
The Natural Language Toolkit (NLTK) package provides all the modules required for this text processing. The following imports are used here.
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.stem import PorterStemmer
Tokenization
In natural language processing, tokenization divides a string into a list of tokens, such as sentences or words. Tokens are the units in which useful patterns are then found. (In data security, the same term refers to replacing sensitive data components with non-sensitive ones, which is a different technique.)
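To make the idea concrete, here is a small sketch of sentence-level and word-level tokenization using only regular expressions. It is an assumption-laden stand-in for NLTK's `sent_tokenize` and `word_tokenize`, which need NLTK's tokenizer data downloaded; the function names `sentence_tokens` and `word_tokens` are hypothetical.

```python
import re

def sentence_tokens(text):
    # Split after '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def word_tokens(text):
    # A token is a word (optionally with an internal apostrophe part)
    # or a single punctuation character.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

print(sentence_tokens("It works. Does it scale? Yes!"))
print(word_tokens("Don't panic, it works."))
```

A regex splitter like this will wrongly break sentences at abbreviations such as "Dr.", which is one reason to prefer a trained tokenizer for real text.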
This story is from the March 2025 edition of Open Source For You.