Large multimodal models: Another step towards AGI
PCQuest | September 2024
Large Multimodal Models (LMMs) represent the next leap in AI, combining text, images, and audio into a single system that understands the world more like humans do. This advancement moves us closer to AI that can perform complex tasks across various domains, from healthcare to entertainment, and brings us a step nearer to Artificial General Intelligence.
The excitement surrounding large language models (LLMs) is growing rapidly, with industries exploring a wide range of use cases. As a transformative technology, LLMs are being closely watched for their potential to revolutionize and optimize everything from customer service to complex data analysis to healthcare. Bill Gates recently wrote a blog post on how agents will be the next big thing in software. He further claimed that within the next five years, anyone who is online will be able to have a personal assistant powered by artificial intelligence.
While industry and the user community are still caught up in the euphoria around LLMs, the hi-tech industry has already begun working on the evolution toward Large Multimodal Models (LMMs), a step towards extending the 'emergent' abilities of LLMs beyond text-only input and output.
Large Multimodal Models
Human beings are blessed with multiple sensory and cognitive capabilities, and our intelligence is a collective one, derived from multiple sources. As we grow, we learn to use one or more of these 'modes of interaction' to engage with the world around us. The future of AI will likely follow the same path, integrating multiple data modalities at the input and/or output of AI models, leading to the development of LMMs. The modalities of interest include text/language, images, video, audio, sensor data, actuator data, and so on. Until recently, the focus was on unimodal models, which can process only one data mode (such as text, speech, or images) at a time.
By combining these different types of data, LMMs can achieve a more holistic understanding of the world, enabling them to perform complex tasks. For instance, an LMM could analyze a video, recognize objects, understand spoken language, and generate descriptive text, all in one seamless pass.
This story is from the September 2024 edition of PCQuest.
