The next generation of large language models (LLMs) will be multimodal, able to work with text, images, and video. This opens up many new applications, but it also presents infrastructure challenges. OpenAI has demonstrated these capabilities but has yet to ship them. Broad adoption of AI will require breakthroughs in energy production, GPU efficiency, and the speed and cost of running models. The killer use case for multimodal AI is robotics, and self-driving cars may be one of the first places it is rolled out.
