Multimodal AI

Multimodal AI understands and generates several kinds of content (text, images, audio, and video) within a single model. Rather than processing only text, it can, for example, take a photo as input and describe what it shows or answer questions about it.

Related terms

Large Language Model
Computer Vision
Gemini
