Researchers have created a new dataset and benchmark to improve how language models understand protein sequences. The work addresses the gap in integrated protein-text data and aims to enable accurate generation of protein knowledge. The dataset contains 17.46 billion tokens, and the benchmark includes 944 multiple-choice questions for evaluation.