ImageBind is a new AI model from Meta that binds information from six modalities in a single joint embedding space: text, image/video, audio, depth (3D), thermal (infrared), and inertial measurement unit (IMU) data. Because the modalities share one representation, machines can analyze different forms of information together, with potential applications in content recognition, creative design, and multimodal search. ImageBind is part of Meta’s effort to build multimodal AI systems that learn from all the types of data available to them.
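
To make the multimodal search use case more concrete, here is a minimal sketch of cross-modal retrieval. It assumes every item, whatever its modality, has already been mapped to a vector in one shared space, and it ranks items by cosine similarity to a query vector. The three-dimensional vectors, the `INDEX` contents, and the `search` helper are hypothetical and hand-written for illustration; this is not ImageBind's API, and a real system would obtain the vectors from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (assumption: all modalities share one space).
# Keys are (modality, item name); values are made-up 3-d vectors.
INDEX = {
    ("image", "dog_photo"): [0.9, 0.1, 0.0],
    ("audio", "barking_clip"): [0.8, 0.2, 0.1],
    ("image", "beach_photo"): [0.0, 0.2, 0.9],
    ("audio", "waves_clip"): [0.1, 0.1, 0.8],
}


def search(query_vec, index, top_k=2):
    """Return the top_k index keys most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:top_k]]


# Stand-in for the embedding of a text query such as "a dog" (made up).
text_query = [1.0, 0.1, 0.0]
results = search(text_query, INDEX)
print(results)
```

With these toy vectors, a single text query retrieves both an image and an audio clip, which is the kind of cross-modal lookup a shared representation makes possible.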
