Gigantic database of building blocks will help artificial intelligence uncover new organocatalysts

Publicly available dataset containing thousands of structures could help chemists develop data-driven reaction optimisation methods for organic synthesis

Researchers have constructed a public database of 4000 experimentally derived organocatalysts. The database also contains several thousand molecular fragments and combinatorially enriched structures based on the experimentally derived entries. It ‘represents the first steps towards an extensive mapping of organocatalyst space with large chemical diversity,’ says database co-creator Clémence Corminboeuf from the Swiss Federal Institute of Technology in Lausanne (EPFL). Researchers will be able to use the Organic structures for catalysis repository database, known as Oscar, ‘to train machine learning models and predict the properties of new catalysts,’ comments EPFL team member Simone Gallarati. The team also hope the database will serve as a starting point for organic chemists designing new catalysts.