NLP: Text Summarisation with Python
Open Source For You | March 2025
Here's a simple Python method based on the Natural Language Toolkit for extractive text summarisation in natural language processing.
In natural language processing (NLP), frequency-based summarisation is a straightforward extractive text summarisation technique that selects sentences based on the frequency of important words in the text. The approach is based on the assumption that frequently occurring words represent the core themes of the text. Let's discuss a simplified algorithm using this approach.
Steps in frequency-based summarisation
Preprocessing:
- Tokenization: Split the text into sentences and words.
- Stop word removal: Remove common words like 'and', 'the', or 'is' that do not contribute to meaning.
- Stemming: Reduce words to their stems, e.g., 'running' becomes 'run'.
Word frequency calculation:
- Count the occurrences of each word in the text.
- Normalise frequencies if needed, e.g., by dividing by the total number of words.
Sentence scoring:
- Assign scores to sentences based on the cumulative frequency of the words they contain.
- Sentences with more frequent words score higher.
Sentence selection:
- Rank sentences by their scores.
- Select the top n sentences (based on a predefined ratio or word count) to form the summary.
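The four steps above can be sketched in plain Python. This is a minimal illustration, not the article's own implementation: the regex-based sentence splitter, the small `STOP_WORDS` set and the function names (`split_sentences`, `content_words`, `summarise`) are assumptions made so the example runs without downloading any NLTK data. Stemming is left out for the same reason; with NLTK installed, a stemmer could be applied inside `content_words`.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption). NLTK's English
# stop-word corpus is much more complete.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "with",
}


def split_sentences(text):
    """Naive sentence split: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercase the sentence, keep alphabetic tokens, drop stop words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def summarise(text, n=2):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)

    # Word frequency calculation over the whole text.
    freq = Counter(w for s in sentences for w in content_words(s))
    total = sum(freq.values())
    if total == 0:
        return ""

    # Normalise each count by the total number of counted words.
    weight = {w: c / total for w, c in freq.items()}

    # Sentence scoring: sum the weights of the words in each sentence.
    scores = [sum(weight[w] for w in content_words(s)) for s in sentences]

    # Sentence selection: rank by score, keep the top n, restore text order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n]))
```

Calling `summarise(article_text, n=3)` returns the three top-scoring sentences joined into one string. Because scores are plain sums, longer sentences tend to score higher; dividing each score by the sentence's word count is one common way to offset that.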
The Natural Language Toolkit (NLTK) package provides all the modules required for this kind of text processing. The following imports are used here.
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.stem import PorterStemmer
Tokenization
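In natural language processing, tokenization divides a string into a list of tokens, such as sentences or words. Tokens are the units on which later steps operate, which makes them useful for finding patterns in text such as word frequencies. (The term is also used in data security, where tokenization means replacing sensitive data components with non-sensitive ones; that is a separate sense and not the one meant here.)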
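To make the idea concrete, here is a small sketch of sentence-level and word-level tokenization using only Python's `re` module. The helper names `sentence_tokens` and `word_tokens` are hypothetical stand-ins for illustration. NLTK's `sent_tokenize` and `word_tokenize` serve the same purpose but rely on a trained model that has to be downloaded first, and they handle abbreviations and contractions far better than these regular expressions do.

```python
import re


def sentence_tokens(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokens(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


text = "NLTK is a Python library. It helps with text processing!"
print(sentence_tokens(text))
# ['NLTK is a Python library.', 'It helps with text processing!']
print(word_tokens("It helps with text processing!"))
# ['It', 'helps', 'with', 'text', 'processing', '!']
```

A splitter this simple breaks on abbreviations such as 'e.g.' followed by a space, which is one reason a trained sentence tokenizer is usually preferred for real text.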
This story is from the March 2025 edition of Open Source For You.
