NLP: Text Summarisation with Python

Open Source For You | March 2025

Here's a simple Python method based on the Natural Language Toolkit for extractive text summarisation in natural language processing.

- Dr Dipankar Ray

In natural language processing (NLP), frequency-based summarisation is a straightforward extractive text summarisation technique that selects sentences based on the frequency of important words in the text. The approach is based on the assumption that frequently occurring words represent the core themes of the text. Let's discuss a simplified algorithm using this approach.

Steps in frequency-based summarisation

Preprocessing:

  • Tokenization: Split the text into sentences and words.

  • Stop word removal: Remove common words like 'and', 'the', or 'is' that do not contribute to meaning.

  • Stemming: Reduce words to their stems, so that variants such as 'running' and 'runs' are both counted as 'run'.
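
As a rough illustration of the last two preprocessing steps, here is a minimal sketch that assumes the text has already been split into words. The stop-word set and the `crude_stem` helper are hypothetical stand-ins written for this example; they are far simpler than NLTK's `stopwords` corpus and `PorterStemmer`, and only show the shape of the step.

```python
# A tiny illustrative stop-word set (an assumption; NLTK's English list is much longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "or", "the", "to"}

# Suffixes stripped by the crude stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common suffix; a rough stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(words):
    """Lowercase the words, drop stop words, and stem what remains."""
    lowered = (w.lower() for w in words)
    return [crude_stem(w) for w in lowered if w not in STOP_WORDS]


words = ["The", "cats", "are", "jumping", "and", "the", "dog", "jumped"]
print(preprocess(words))
```

Lowercasing before the stop-word check means 'The' and 'the' are treated alike. A real stemmer applies many more rules than this suffix list does.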

Word frequency calculation:

  • Count the occurrences of each word in the text.

  • Normalise frequencies if needed, e.g., by dividing by the total number of words.
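
The counting step can be sketched with the standard library's `collections.Counter`, assuming the words have already been preprocessed. The function name `word_frequencies` and its `normalise` flag are illustrative choices for this example, not part of NLTK.

```python
from collections import Counter


def word_frequencies(words, normalise=True):
    """Count each word; optionally divide by the total number of words."""
    counts = Counter(words)
    if not normalise:
        return dict(counts)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


words = ["cat", "jump", "dog", "jump"]
print(word_frequencies(words, normalise=False))  # raw counts
print(word_frequencies(words))                   # relative frequencies
```

Dividing by the total keeps the values between 0 and 1, which makes scores from texts of different lengths easier to compare.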

Sentence scoring:

  • Assign scores to sentences based on the cumulative frequency of the words they contain.

  • Sentences with more frequent words score higher.
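
One way to sketch the scoring step is to sum the frequency of every word in a sentence, assuming a frequency table has already been built. The `score_sentences` function and the sample frequencies below are made up for illustration; words missing from the table (for instance, removed stop words) simply contribute zero.

```python
import re


def score_sentences(sentences, frequencies):
    """Score each sentence as the sum of the frequencies of its words."""
    scores = {}
    for index, sentence in enumerate(sentences):
        # Lowercase word tokens; punctuation is ignored.
        words = re.findall(r"[a-z']+", sentence.lower())
        scores[index] = sum(frequencies.get(word, 0) for word in words)
    return scores


sentences = ["Cats jump.", "Dogs bark at cats and other dogs."]
frequencies = {"cats": 2, "dogs": 2, "jump": 1, "bark": 1}
print(score_sentences(sentences, frequencies))
```

Because the score is a plain sum, longer sentences tend to score higher. Dividing by the number of words in the sentence is a common adjustment if that bias is unwanted.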

Sentence selection:

  • Rank sentences by their scores.

  • Select the top n sentences (based on a predefined ratio or word count) to form the summary.
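
The selection step might look like the following sketch, which assumes each sentence already has a score. The function name `select_top_sentences` is hypothetical, and returning the chosen sentences in their original order is a design choice of this example rather than something the steps above require.

```python
def select_top_sentences(sentences, scores, n):
    """Return the n highest-scoring sentences in their original order."""
    # Rank sentence indices from highest to lowest score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Keep the top n, then restore document order for readability.
    chosen = sorted(ranked[:n])
    return [sentences[i] for i in chosen]


sentences = ["A first.", "B second.", "C third.", "D fourth."]
scores = [0.2, 0.9, 0.1, 0.5]
print(" ".join(select_top_sentences(sentences, scores, n=2)))
```

To work from a ratio instead of a fixed count, n could be derived from the number of sentences, for example `max(1, round(len(sentences) * 0.3))`.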

The implementation uses the Natural Language Toolkit (NLTK) package, which provides all the required modules. The following imports are used here.

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.stem import PorterStemmer

Tokenization

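In natural language processing, tokenization divides a string into a list of tokens, such as sentences or words. Tokens are the basic units on which later steps, such as counting word frequencies and finding patterns, operate. (In data security, the same term refers to replacing sensitive data components with non-sensitive ones, which is a different technique.)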
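
In NLTK, `sent_tokenize` and `word_tokenize` perform this step, and they rely on NLTK's downloadable tokenizer data. The sketch below uses regular expressions as a rough stand-in so that it runs with the standard library alone; the helper names `split_sentences` and `split_words` are invented for this example.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(sentence):
    """Return lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


text = "NLP is fun. Tokens are the building blocks! Are they useful?"
sentences = split_sentences(text)
print(sentences)
print([split_words(s) for s in sentences])
```

A simple pattern like this will wrongly split after abbreviations such as 'Dr.', which is one reason to prefer a trained sentence tokenizer for real text.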
