Posts by Collection

portfolio

publications

Impact of the Characteristics of Quantum Chemical Databases on Machine Learning Prediction of Tautomerization Energies

Published in Journal of Chemical Theory and Computation, 2021

Tautomers and Machine Learning

Recommended citation: Vazquez-Salazar, L. I., Boittier, E., Unke, O. T., and Meuwly, M. (2021). "Impact of the characteristics of quantum chemical databases on machine learning predictions of tautomerization energies." Journal of Chemical Theory and Computation, 17(8), 4769-4785. https://pubs.acs.org/doi/10.1021/acs.jctc.1c00363

talks

More data or better data? How the training data influences machine learned predictions in Chemistry

Published:

Machine learning (ML) methods now routinely achieve high accuracy in short times and are becoming a standard tool for computational and theoretical chemists. However, ML methods require large quantities of training data to achieve the desired results.
Generating data for chemical applications is not a trivial task and requires hours of computation with ab initio methods. Nevertheless, the rule of thumb from computer science that large amounts of data will beat the best algorithms is still followed. Keeping in mind that generating data is not always feasible or practical, we reviewed the influence of common databases on the training of ML models. Our results indicate that common databases contain redundancies that reduce the prediction quality of an ML model. Therefore, there is a need to ‘clean’ and augment databases to ensure the best predictions and, at the same time, to obtain a procedure for creating databases with the minimum computational effort.

Towards better chemical databases for atomistic machine learning

Published:

Machine learning (ML) has revolutionized the field of atomistic simulations. It is now possible to obtain high-quality predictions of chemical properties such as total energies, forces, or dipoles at a low computational cost. Currently, the field is at a stage at which atomistic simulations in the gas phase on the sub-microsecond time scale with ab initio MP2 quality can be carried out routinely. Given that the computational effort to evaluate such a statistical model is independent of the quality of the input data, the most significant bottleneck for devising yet better ML models is the considerable amount of data required to train them. Although the community consensus is that more data naturally leads to better performance, it has been found that this working hypothesis is not necessarily correct for predicting chemical properties with models trained on commonly used databases such as QM9 or ANI-1. Consequently, there is a need to identify how to obtain suitable data for training ML models and, for established databases, how to add or remove information while retaining the best performance of the model.

teaching

Estructura de la Materia

Undergraduate course, Facultad de Quimica, UNAM, 2017

The course is an introduction to quantum mechanics in chemistry for undergraduate students in chemistry, chemical engineering, pharmaceutical chemistry, and food chemistry.

Fundamentos de Espectroscopia

Undergraduate course, Facultad de Quimica, UNAM, 2017

I taught the practical (laboratory) part of the fundamentals of spectroscopy course. The course is an introduction to the physics of oscillations and waves; some topics in acoustics and optics are also revisited. It is aimed at BSc Chemistry students in their 2nd year (3rd semester).