Adam's Apple moment for AI models
Voice and Data | January 2025
Traditional language models struggled with voice, losing time, accuracy, and nuance. Are voice-driven models the game-changing twist the world needs?
Dolphins, lyrebirds, bats, mockingbirds, whales, and elephants may live in entirely different environments, but they have one thing in common: the power to communicate through sound. Their calls may be ultrasonic or infrasonic, mimicry or something akin to baby babble, but sound is always a crucial part of their existence, and sometimes even of their survival.
Meta Spirit LM, GPT-4o, Gnani, DeepL and Sutra HiFi: these names seem to belong to different AI forests altogether, but they too share the common denominator of sound. In one way or another, many players, small and big, have now thrown a voice-dominant model into the Language Model (LM) ring. It is not hard to understand why when one looks at the apparent advantages. But are these models also good enough to fix some of the deep-seated issues their elder siblings faced? Or would the throat simply cough in a different way?
NO MORE TONE-DEAF LMS
The challenge with existing AI models was that they first had to convert speech to text through direct or multimodal approaches, feed that text to a language model to generate a response, and then convert the response back to audio with text-to-speech techniques. This consumed time. This took up compute power. This needed data inputs. But above everything else, this process still missed subtle aspects of the human voice such as pitch, tone, emotion and other sub-textual cues, not to mention the sheer diversity of accents, dialects and vernacular speech, especially in a multicultural country like India.

This story is from the January 2025 issue of Voice and Data.

