
Large multimodal models: Another step towards AGI

PCQuest | September 2024

Large Multimodal Models (LMMs) represent the next leap in AI, combining text, images, and audio into a single system that understands the world more like humans do. This advancement moves us closer to AI that can perform complex tasks across various domains, from healthcare to entertainment, and brings us a step nearer to Artificial General Intelligence.

- Amit Gupta


The excitement surrounding large language models (LLMs) is growing rapidly, with industries exploring a wide range of use cases. As a transformative technology, LLMs are being closely watched for their potential to revolutionize and optimize everything from customer service to complex data analysis to advanced healthcare. Bill Gates recently wrote a blog post arguing that agents will be the next big thing in software, claiming that within the next five years anyone who is online will be able to have a personal assistant powered by artificial intelligence.

While industry and the user community are still embracing the euphoria around Large Language Models (LLMs), the hi-tech industry has already started working on the evolution of Large Multimodal Models (LMMs) - a step towards extending the 'emergent' abilities of LLMs beyond text-only input and output.

Large Multimodal Models

We human beings are blessed with multiple sensory and cognitive capabilities, and our intelligence is a collective intelligence derived from multiple sources. As we grow, we learn to use one or more of these 'modes of interaction' to engage with the world around us. The future of AI will likely follow the same path, integrating multiple data modalities at the input and/or output of AI models, leading to the development of LMMs. The modalities of interest include text/language, images, video, audio, sensor data, actuator data, and more. Until recently, the focus was on unimodal models, which could process only one data mode (such as text, speech, or images) at a time.

By combining these different types of data, LMMs can achieve a more holistic understanding of the world, enabling them to perform complex tasks. For instance, an LMM could analyze a video, recognize objects, understand spoken language, and generate descriptive text, all in a single pass, as the sketch below illustrates.
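To make the idea concrete, here is a minimal sketch of what a multimodal request can look like in code, assuming an OpenAI-style chat-completions endpoint that accepts mixed text and image content parts. The model name, prompt, and image URL below are illustrative placeholders, not details from this article, and other LMM providers expose similar but not identical interfaces.

# Minimal sketch: sending a mixed text + image prompt to a multimodal model.
# Assumes the OpenAI Python SDK and an API key in the environment; the model
# name and image URL are placeholders chosen for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # any model that accepts both text and image inputs
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe what is happening in this frame and list the objects you can see."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/video-frame.jpg"}},
            ],
        }
    ],
)

# The reply is ordinary text, even though the input combined two modalities.
print(response.choices[0].message.content)

The point of the sketch is that the caller no longer routes each modality to a separate model; a single request carries the image alongside the text instruction, and the model reasons over both together.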
