Our team has developed a text-mining pipeline and an advanced extraction pipeline that use machine learning and natural language processing to extract information about electrocatalytic CO2 reduction from the scientific literature. We also designed a set of synthesis actions and a deep-learning model that converts unstructured experimental-procedure text into structured action sequences. The result is an open-source corpus for electrocatalytic CO2 reduction, comprising a benchmark corpus and an extended corpus with over 100,000 records. Additionally, we extracted 476 synthesis procedures for catalytic materials from full-text documents.
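
To make the idea of a structured action sequence concrete, the following is a minimal sketch of what such an output might look like. It is not our deep-learning model: it swaps in a naive keyword lookup purely for illustration, and the action names (`Dissolve`, `Stir`, `Heat`, `Wash`, `Dry`), the `SynthesisAction` class, and the `to_action_sequence` function are all hypothetical and are not taken from the actual action set.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical action vocabulary (assumption): maps a trigger word to an
# action label. The real action set is not specified in the text above.
ACTION_KEYWORDS = {
    "dissolved": "Dissolve",
    "stirred": "Stir",
    "heated": "Heat",
    "washed": "Wash",
    "dried": "Dry",
}


@dataclass
class SynthesisAction:
    """One step of a structured action sequence (illustrative only)."""
    name: str      # action label, e.g. "Stir"
    sentence: str  # source sentence in which the action was found


def to_action_sequence(procedure: str) -> List[SynthesisAction]:
    """Convert free-text procedure into an ordered list of actions.

    A keyword-matching stand-in for a learned model: it splits the text
    into sentences and emits one action per trigger word, in the order
    the trigger words appear.
    """
    actions: List[SynthesisAction] = []
    # Naive sentence split: break on whitespace that follows a period.
    for sentence in re.split(r"(?<=\.)\s+", procedure.strip()):
        hits = []
        for keyword, name in ACTION_KEYWORDS.items():
            pattern = rf"\b{keyword}\b"
            for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
                hits.append((match.start(), name))
        # Sort by character position so actions keep their textual order.
        for _, name in sorted(hits):
            actions.append(SynthesisAction(name=name, sentence=sentence))
    return actions


example = (
    "The precursor was dissolved in water and stirred for 2 h. "
    "The solid was washed and dried at 60 C."
)
sequence = to_action_sequence(example)
print([action.name for action in sequence])
```

A learned model would replace the keyword lookup and would typically also extract action parameters such as temperature, duration, and reagents, which this sketch deliberately omits.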
