Large multimodal models: Another step towards AGI
PCQuest | September 2024
Large Multimodal Models (LMMs) represent the next leap in AI, combining text, images, and audio into a single system that understands the world more like humans do. This advancement moves us closer to AI that can perform complex tasks across various domains, from healthcare to entertainment, and brings us a step nearer to Artificial General Intelligence.
Excitement around large language models (LLMs) is growing rapidly, with industries exploring a wide range of use cases. As a transformative technology, LLMs are being closely watched for their potential to revolutionize and optimize everything from customer service to complex data analysis to health care. Bill Gates recently wrote a blog post on how agents will be the next big thing in software. He further claimed that within the next five years, anyone who is online will be able to have a personal assistant powered by artificial intelligence.
While industry and the user community are still caught up in the euphoria over LLMs, the high-tech industry has already started work on the next stage of evolution, Large Multimodal Models (LMMs), a step towards extending the ‘emergent’ abilities of LLMs beyond text-only input and output.
Large Multimodal Models
Human beings are blessed with multiple sensory and cognitive capabilities, and our intelligence is a collective intelligence derived from multiple sources. As we grow, we learn to use one or more of these ‘modes of interaction’ to engage with the world around us. The future of AI will likely follow the same path, integrating multiple data modalities at the input and/or output of AI models, which leads to the development of LMMs. The input or output modes of interest could be text/language, images, video, audio, sensor data, actuator data, and so on. Until recently, the focus was on unimodal models, which could process only one data mode (such as text, speech, or image) at a time.
By combining these different types of data, LMMs can achieve a more holistic understanding of the world, enabling them to perform complex tasks. For instance, an LMM could analyze a video, recognize objects, understand spoken language, and generate descriptive text all in one seamless iteration.
This story is from the September 2024 edition of PCQuest.
