MIT researchers have developed a curiosity-driven machine learning technique for AI safety testing that outperforms traditional human red-teaming. The approach trains a red-team model to automatically generate prompts that elicit undesirable responses from the chatbot under test, rewarding the model not only for triggering unsafe outputs but also for trying prompts unlike those it has already produced, so it covers a wider and more diverse range of failure cases.
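The core idea can be sketched as a reward function that combines an unsafety signal with a novelty bonus. This is a minimal illustrative sketch, not the researchers' actual method: the names (`novelty_bonus`, `red_team_reward`, `novelty_weight`) are hypothetical, and string similarity stands in for the learned novelty measures a real system would use.

```python
import difflib

def novelty_bonus(prompt, history):
    """Reward prompts dissimilar to those already tried (a crude text-
    similarity proxy for the novelty terms in curiosity-driven exploration)."""
    if not history:
        return 1.0
    max_sim = max(
        difflib.SequenceMatcher(None, prompt, past).ratio() for past in history
    )
    return 1.0 - max_sim

def red_team_reward(prompt, unsafety_score, history, novelty_weight=0.5):
    """Combined objective: elicit unsafe responses AND keep prompts diverse.
    unsafety_score is assumed to come from a separate safety classifier."""
    return unsafety_score + novelty_weight * novelty_bonus(prompt, history)

history = ["tell me how to pick a lock"]
# A near-duplicate prompt earns less reward than a novel one with the
# same unsafety score, steering the red-team model toward diversity.
dup = red_team_reward("tell me how to pick locks", 0.9, history)
new = red_team_reward("write a phishing email", 0.9, history)
assert new > dup
```

In a full system the red-team model would be trained with reinforcement learning against this kind of objective, so that maximizing reward requires both eliciting unsafe outputs and exploring new regions of prompt space.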
