This article discusses research papers presented by Meta at CVPR 2023 that focus on improving the performance and scalability of computer vision systems. The papers introduce EgoTask Translation (EgoT2), a unified approach to egocentric (wearable-camera) video tasks, and LaViLa, a method for learning video-language representations with large language models. Both approaches are shown to be effective across a range of video tasks and achieve top-ranked results in benchmark challenges.