
Large multimodal models: Another step towards AGI

PCQuest | September 2024

Large Multimodal Models (LMMs) represent the next leap in AI, combining text, images, and audio into a single system that understands the world more like humans do. This advancement moves us closer to AI that can perform complex tasks across various domains, from healthcare to entertainment, and brings us a step nearer to Artificial General Intelligence (AGI).

- Amit Gupta


The excitement surrounding large language models (LLMs) is rapidly increasing, with industries widely exploring diverse use cases. As a transformative technology, LLMs are being closely watched for their potential to revolutionize and optimize everything from customer service to complex data analysis to advanced healthcare. Bill Gates recently wrote a blog post arguing that agents will be the next big thing in software. He further claimed that within the next five years, anyone who's online will be able to have a personal assistant powered by artificial intelligence.

While industries and the user community are still embracing the euphoria of Large Language Models (LLMs), the hi-tech industry has already started to work on the evolution of Large Multimodal Models (LMMs) - a step towards extending the 'emergent' abilities of LLMs beyond text-only input/output models.

▾ Large Multimodal Models

We human beings are blessed with multiple sensory and cognitive capabilities, and our intelligence is a collective intelligence derived from multiple sources. As we grow, we learn to use one or more of these 'modes of interaction' to engage with the world around us. The future of AI will likely follow the same path, integrating multiple data modalities at input and/or output into AI models, leading to the development of LMMs. The input or output modes of interest could be text/language, images, video, audio, sensor data, actuator data, and so on. Until recently, the focus was on unimodal models, which could process only one data mode (such as text, speech, or images) at a time.

By combining these different types of data, LMMs can achieve a more holistic understanding of the world, enabling them to perform complex tasks. For instance, an LMM could analyze a video, recognize objects, understand spoken language, and generate descriptive text, all in one seamless interaction.
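One common way such models combine modalities, at a very high level, is to map each input type into a shared embedding space and let a single backbone reason over the combined sequence. The toy sketch below illustrates only that idea; the encoders here are stand-ins (random and averaging functions, not real vision or audio models), and the names and dimensions are invented for illustration.

```python
import numpy as np

EMBED_DIM = 8  # toy embedding dimensionality, chosen for illustration


def encode_text(tokens):
    # Stand-in text encoder: deterministically hash each token
    # into a toy vector (a real LMM would use a learned tokenizer
    # and embedding table).
    rngs = [np.random.default_rng(abs(hash(t)) % (2**32)) for t in tokens]
    return np.stack([r.standard_normal(EMBED_DIM) for r in rngs])


def encode_image(pixels):
    # Stand-in vision encoder: pool the "image" into one patch embedding
    # (a real LMM would use a vision transformer over image patches).
    return pixels.reshape(-1, EMBED_DIM).mean(axis=0, keepdims=True)


def encode_audio(samples):
    # Stand-in audio encoder: summarize the waveform as one embedding.
    return np.tile(samples.mean(), (1, EMBED_DIM))


def fuse(*embeddings):
    # Early fusion: concatenate all modality embeddings into one
    # sequence that a shared backbone would then attend over.
    return np.concatenate(embeddings, axis=0)


text = encode_text(["describe", "this", "scene"])       # 3 token embeddings
image = encode_image(np.ones((4, EMBED_DIM)))           # fake 4-patch image
audio = encode_audio(np.sin(np.linspace(0, 1, 16)))     # fake waveform

sequence = fuse(text, image, audio)
print(sequence.shape)  # one unified sequence: (5, 8)
```

The point of the sketch is the shape of the pipeline, not the encoders themselves: once every modality lands in the same embedding space, "video plus speech plus text" becomes a single sequence the model can process in one pass.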
