
Researchers have developed CheckMate, an open-source platform for interactively evaluating AI-powered chatbots. The platform was tested in an experiment in which human participants used three large language models (LLMs) as assistants for solving undergraduate-level mathematics problems. The results showed that LLMs can be genuinely helpful but also make mistakes, highlighting the need for models that communicate their uncertainty and respond well to user corrections. The findings could inform AI literacy training and guide improvements that make LLMs more reliable across a wider range of uses.
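The article does not detail CheckMate's internals, but the interactive evaluation it describes follows a recognizable pattern: a participant converses with a model and rates each response as the conversation unfolds. Below is a minimal, hypothetical sketch of such a loop; the function name query_model, the rating scales, and the data layout are illustrative assumptions, not CheckMate's actual API.

```python
# Illustrative sketch of an interactive evaluation loop in the spirit of
# CheckMate: a participant converses with a model and rates each response.
# The model call is a stub; `query_model` and the 0-6 rating scales are
# hypothetical assumptions, not CheckMate's real interface.

from dataclasses import dataclass, field


@dataclass
class RatedTurn:
    prompt: str
    response: str
    correctness: int   # assumed 0-6 scale for mathematical correctness
    helpfulness: int   # assumed 0-6 scale for perceived helpfulness


@dataclass
class Transcript:
    model_name: str
    turns: list[RatedTurn] = field(default_factory=list)


def query_model(prompt: str) -> str:
    """Stand-in for a real LLM call (e.g. an API request)."""
    return f"[model response to: {prompt!r}]"


def run_session(model_name: str) -> Transcript:
    """Collect prompts and per-response ratings from one participant."""
    transcript = Transcript(model_name)
    while True:
        prompt = input("Your question (blank line to finish): ").strip()
        if not prompt:
            break
        response = query_model(prompt)
        print(response)
        correctness = int(input("Rate correctness (0-6): "))
        helpfulness = int(input("Rate helpfulness (0-6): "))
        transcript.turns.append(
            RatedTurn(prompt, response, correctness, helpfulness))
    return transcript


if __name__ == "__main__":
    session = run_session("example-model")
    print(f"Collected {len(session.turns)} rated turns.")
```

Recording per-turn ratings alongside the full transcript is what makes an evaluation like this "interactive" rather than static: it captures how usefulness and correctness evolve as the user probes and corrects the model over a conversation.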