NLP: Text Summarisation with Python
Open Source For You
March 2025
Here's a simple Python method based on the Natural Language Toolkit for extractive text summarisation in natural language processing.
In natural language processing (NLP), frequency-based summarisation is a straightforward extractive text summarisation technique that selects sentences based on the frequency of important words in the text. The approach is based on the assumption that frequently occurring words represent the core themes of the text. Let's discuss a simplified algorithm using this approach.
Steps in frequency-based summarisation
Preprocessing:
- Tokenization: Split the text into sentences and words.
- Stop word removal: Remove common words like 'and', 'the', or 'is' that do not contribute to meaning.
- Stemming: Reduce words to their stems (for example, 'running' becomes 'run') so that inflected forms of a word are counted together.
Word frequency calculation:
- Count the occurrences of each word in the text.
- Normalise frequencies if needed, e.g., by dividing by the total number of words.
Sentence scoring:
- Assign scores to sentences based on the cumulative frequency of the words they contain.
- Sentences with more frequent words score higher.
Sentence selection:
- Rank sentences by their scores.
- Select the top n sentences (based on a predefined ratio or word count) to form the summary.
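The four steps above can be sketched end to end as follows. This is a minimal illustration, not the article's own implementation: it relies only on the Python standard library, with regular expressions standing in for NLTK's tokenizers, a tiny hand-written stop word set standing in for NLTK's stop word corpus, and no stemming. The names `STOP_WORDS`, `split_sentences`, `content_words` and `summarise` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop word set (an assumption); NLTK's stop word corpus is much larger.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "or", "the", "to"}


def split_sentences(text):
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    # Lowercase, keep alphabetic tokens, drop stop words. Stemming is omitted here.
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def summarise(text, n=2):
    sentences = split_sentences(text)
    words = [w for s in sentences for w in content_words(s)]
    if not words:
        return ""
    # Word frequency calculation, normalised by the total number of words.
    total = len(words)
    freq = {w: c / total for w, c in Counter(words).items()}
    # Sentence scoring: cumulative frequency of the words in each sentence.
    scores = [sum(freq[w] for w in content_words(s)) for s in sentences]
    # Sentence selection: take the n best, then restore the original order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return " ".join(sentences[i] for i in sorted(top))


text = (
    "Cats sleep a lot. "
    "Cats and dogs are popular pets. "
    "The weather is nice today. "
    "Dogs and cats need daily care."
)
summary = summarise(text, n=2)
print(summary)
```

On this sample text the two sentences that mention both 'cats' and 'dogs' are selected. Because the score is a plain sum, longer sentences tend to score higher; dividing each score by the sentence's word count is one possible adjustment if that bias matters.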
The implementation uses the Natural Language Toolkit (NLTK) package, which provides all the required modules. The following imports are used here.
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.stem import PorterStemmer
Tokenization
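In natural language processing, tokenization divides a string into a list of tokens, such as sentences or words. Tokens are the units on which later steps, such as stop word removal and frequency counting, operate. (In data security, the same term refers to something unrelated: replacing sensitive data elements with non-sensitive substitutes.)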
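As a rough illustration of what sentence and word tokens look like, the sketch below uses standard-library regular expressions as a stand-in for NLTK's `sent_tokenize` and `word_tokenize`, which rely on tokenizer data that has to be downloaded separately. The patterns are simplifying assumptions and will mishandle cases such as abbreviations ('e.g.') that NLTK's tokenizers are designed to cope with.

```python
import re

text = "NLP is fun. Tokenization splits text into tokens!"

# Sentence tokens: split after sentence-ending punctuation followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", text.strip())

# Word tokens: runs of word characters, or single punctuation marks.
words = re.findall(r"\w+|[^\w\s]", text)

print(sentences)
print(words)
```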
This story is from the March 2025 edition of Open Source For You.

