Researchers at Nanyang Technological University in Singapore have developed a two-step method called “Masterkey” to jailbreak popular chatbots such as ChatGPT, Google Bard, and Microsoft Bing Chat. The process first uses a trained AI to outwit an existing chatbot and circumvent blacklisted keywords, then automatically generates further prompts to jailbreak other chatbots. The AI attacker is claimed to be up to three times more effective at jailbreaking an LLM than standard prompts. The NTU researchers reported the issues to the relevant chatbot service providers, but it remains unclear how easily such an attack could be prevented.
