Recent advances in machine learning, particularly in natural language processing (NLP), have produced state-of-the-art large language models (LLMs) such as BERT, GPT-2, BART, T5, GPT-3, and GPT-4. These models have been applied to a wide range of tasks, including text generation, machine translation, sentiment analysis, and question answering. In-context learning, the ability of an LLM to learn a task from examples supplied in its prompt without any parameter updates, has been widely investigated in NLP, but applications in computer vision remain scarce. This article discusses the challenges of establishing in-context learning as a standard technique for computer vision, as well as recent attempts to address them.
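As a concrete illustration of what in-context learning looks like in NLP, the sketch below builds a few-shot prompt for sentiment classification: the "training" signal is a handful of labeled demonstrations placed directly in the prompt, and the model's weights are never updated. The task, the example reviews, and the `build_prompt` helper are all illustrative assumptions, not details from this article.

```python
# Minimal sketch of in-context (few-shot) learning: labeled demonstrations
# are concatenated into the prompt, followed by the query to be classified.
# No gradient updates occur; the model infers the task from the examples.

def build_prompt(examples, query):
    """Assemble a few-shot prompt from (input, label) demonstration pairs."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)

demos = [
    ("The plot was gripping from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
]
prompt = build_prompt(demos, "A beautifully shot, moving film.")
print(prompt)
```

The resulting string would be sent as-is to an LLM, whose completion after the final "Sentiment:" serves as the prediction; extending the same idea to computer vision, where the "prompt" must carry image examples rather than text, is precisely the difficulty the article examines.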
