
Evan Hubinger, a research scientist at Anthropic, is building a deliberately deceptive system in order to find ways to prevent deception in AI systems. The system is a variant of Claude, a highly capable text model that Anthropic made public last year. Claude 2 was released on July 11 and is available to the general public. Hubinger’s project, the “Decepticon” version of Claude, will be given a public goal that is known to the user and a private goal that is hidden from the user.