This article presents DeepMSA2, a hierarchical approach for exploring how much optimal multiple sequence alignments (MSAs) can contribute to protein structure prediction. DeepMSA2 couples newly developed MSA generation pipelines with a deep learning-driven MSA scoring strategy to create multiple MSAs from large genomic and metagenomic sequence databases. Benchmarks on large-scale datasets from the CASP13-15 experiments demonstrate that the pipeline substantially improves the accuracy of protein tertiary and quaternary structure modeling. DeepMSA2 and its associated structural databases are freely available to the community.
