Large multimodal models: Another step towards AGI

PCQuest | September 2024

Large Multimodal Models (LMMs) represent the next leap in AI, combining text, images, and audio into a single system that understands the world more like humans do. This advancement moves us closer to AI that can perform complex tasks across various domains, from healthcare to entertainment, and brings us a step nearer to Artificial General Intelligence.

- Amit Gupta


Excitement about large language models (LLMs) is growing rapidly, and industries are exploring a wide range of use cases. LLMs are being closely watched for their potential to transform everything from customer service to complex data analysis to healthcare. Bill Gates recently wrote a blog post arguing that agents will be the next big thing in software. He further claimed that within the next five years, anyone who is online will be able to have a personal assistant powered by artificial intelligence.

While industry and users are still caught up in the euphoria around Large Language Models (LLMs), the high-tech industry has already begun work on Large Multimodal Models (LMMs), a step towards extending the 'emergent' abilities of LLMs beyond text-only input and output.

Large Multimodal Models

Human beings are blessed with multiple sensory and cognitive capabilities, and our intelligence is a collective one, derived from multiple sources. As we grow, we learn to use one or more of these 'modes of interaction' to engage with the world around us. The future of AI will likely follow the same path, integrating multiple data modalities at the input, the output, or both, which leads to the development of LMMs. The modes of interest include text/language, images, video, audio, sensor data, and actuator data, among others. Until recently, the focus was on unimodal models, which could process only one data mode (such as text, speech, or images) at a time.

By combining these different types of data, LMMs can achieve a more holistic understanding of the world, enabling them to perform complex tasks. For instance, an LMM could analyze a video, recognize objects, understand spoken language, and generate descriptive text, all in one seamless pass.
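
To make the idea of combining modalities a little more concrete, here is a minimal Python sketch. It is a toy illustration under stated assumptions, not how any particular LMM is built: the names `ModalityInput`, `toy_encoder`, `ENCODERS`, and `fuse` are hypothetical, and the "encoders" are simple stand-ins for the real text, image, and audio models a production system would use. The only point it shows is the general pattern of encoding each mode separately into vectors of the same width and lining them up as one sequence for a single downstream model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

EMBED_DIM = 4  # toy embedding width, chosen only for illustration


@dataclass
class ModalityInput:
    mode: str             # e.g. "text", "image", "audio" (hypothetical labels)
    payload: List[float]  # toy raw features; real systems hold tokens, pixels, waveforms


def toy_encoder(scale: float) -> Callable[[List[float]], List[float]]:
    """Return a stand-in encoder mapping raw features to a fixed-width vector."""
    def encode(raw: List[float]) -> List[float]:
        mean = sum(raw) / len(raw) if raw else 0.0
        return [scale * mean] * EMBED_DIM
    return encode


# One encoder per modality; the scales are arbitrary placeholders.
ENCODERS: Dict[str, Callable[[List[float]], List[float]]] = {
    "text": toy_encoder(1.0),
    "image": toy_encoder(0.5),
    "audio": toy_encoder(2.0),
}


def fuse(inputs: List[ModalityInput]) -> List[List[float]]:
    """Encode each input with its own encoder and return one combined sequence."""
    sequence = []
    for item in inputs:
        if item.mode not in ENCODERS:
            raise ValueError(f"unsupported modality: {item.mode}")
        sequence.append(ENCODERS[item.mode](item.payload))
    return sequence


if __name__ == "__main__":
    mixed = [
        ModalityInput("text", [1.0, 3.0]),
        ModalityInput("image", [8.0]),
        ModalityInput("audio", [1.0, 2.0]),
    ]
    for vector in fuse(mixed):
        print(vector)
```

A unimodal model, in this picture, would have only one entry in `ENCODERS`; the multimodal case differs in that every mode ends up in the same shared sequence before any further processing.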
