DIPS-Plus [10] is a feature-expanded version of the DIPS dataset, released under a CC-BY 4.0 license to support reproducibility and extensibility. The dataset can be used with most deep learning algorithms, particularly geometric deep learning algorithms (e.g., CNNs, GNNs), to study protein structures, protein complexes, and their inter- and intra-protein interactions at scale. It was constructed in three steps: PDB file archives were downloaded from the RCSB following the instructions in the original dataset's GitHub repository, the archives were extracted with a Python extraction script, and a Python collation script then converted each complex into a pairwise representation of its protein chains.
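
To make the idea of a pairwise representation concrete, the following is a minimal sketch of how the chains of one complex could be grouped and paired. It is not the dataset's actual extraction or collation script: the function names `group_atoms_by_chain` and `chain_pairs` are hypothetical, and the sketch assumes plain-text PDB `ATOM` records, in which the chain identifier occupies column 22 of the fixed-width format.

```python
from collections import defaultdict
from itertools import combinations


def group_atoms_by_chain(pdb_lines):
    """Group ATOM records by chain ID (hypothetical helper, not from DIPS-Plus).

    Assumes the fixed-width PDB format, where the chain identifier is the
    22nd character of each ATOM record (index 21). HETATM and all other
    record types are ignored.
    """
    chains = defaultdict(list)
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) > 21:
            chains[line[21]].append(line.rstrip("\n"))
    return dict(chains)


def chain_pairs(chains):
    """Return every unordered pair of distinct chain IDs within one complex."""
    return list(combinations(sorted(chains), 2))
```

Under these assumptions, a complex with chains A, B, and C yields the three pairs (A, B), (A, C), and (B, C), each of which would become one pairwise example. A real pipeline would additionally attach coordinates and features to each chain in a pair.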
