
@whitead
Created February 15, 2023 20:50
RoboReview
The paper by Caldas (2023) explored hosting a model as static files on sites like GitHub to avoid the maintenance and cost of running a web server. The application is a JavaScript implementation, built on the TensorFlow framework, that predicts the solubility of small molecules. The model uses a deep ensemble to report uncertainty alongside each prediction. Evaluated with RMSE, MAE, and the correlation coefficient, it outperformed the baseline models (Caldas2023 pages 6-7). The paper also reviews methods for calculating solution free energies and modelling systems in solution (Caldas2023 pages 11-12). The authors' model, kde10LSTM Aug, achieved an RMSE of 0.983 and a %±0.5log of 40.0% on the solubility challenge 1 dataset, outperforming 62% of the published RMSE values and 50% of the published %±0.5log values (Caldas2023 pages 9-10). The paper is significant because it provides an efficient, cost-effective way to predict the solubility of small molecules with improved accuracy (Caldas2023 pages 2-3).
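The evaluation metrics mentioned above (RMSE, MAE, correlation coefficient, and the solubility challenges' %±0.5log score) can be sketched in plain Python. This is a sketch of the standard definitions, not code from the paper; the function name is mine:

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, Pearson correlation, and the %±0.5log score
    (the percentage of predictions within 0.5 log units of the
    measured value, as used in the solubility challenges)."""
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    r = cov / math.sqrt(var_t * var_p)
    pct_half_log = 100.0 * sum(
        1 for t, p in zip(y_true, y_pred) if abs(t - p) <= 0.5
    ) / n
    return rmse, mae, r, pct_half_log
```

On a solubility challenge dataset, `rmse` and `pct_half_log` computed this way would be the two numbers the paper compares against published values.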
Various methods have been used to predict the solubility of small molecules, with results published in J. Chem. Inf. Model. Huuskonen developed two models, one based on multilinear regression and one on an artificial neural network, with good correlation between predicted properties and labels for both the training (r² = 0.94) and test (r² = 0.92) data (Caldas2023 pages 2-3). Delaney developed a multilinear regression model called ESOL, also with good correlation (r² = 0.87) (Caldas2023 pages 2-3). Against these, kde10LSTM Aug achieved an RMSE of 0.983 and a %±0.5log of 40.0% on the first solubility challenge dataset, outperforming 62% of the published RMSE values and 50% of the published %±0.5log values (Caldas2023 pages 9-10).
A deep ensemble approach was used, consisting of deep neural networks with bidirectional recurrent (LSTM) layers (Caldas2023 pages 5-6). This choice was motivated by previous work showing that LSTM layers improved model performance for the prediction of peptide properties (Caldas2023 pages 12-13). A variety of approaches have been used to predict the solubility of small molecules, including chemical potentials from the density of states (J. Chem. Phys., 2019), quantum mechanical continuum solvation models (Chem. Rev., 2005), QSPR models (QSAR Comb. Sci., 2006), and deep ensemble neural networks (J. Chem. Inf. Model., 2008). In the paper's tables, the best-performing model in each dataset has its RMSE value shown in bold (Caldas2023 pages 9-10). kde10LSTM Aug achieved an RMSE of 0.983 and a %±0.5log of 40.0%, better than 62% of the published RMSE values and 50% of the published %±0.5log values. Deep ensemble models can predict target properties with associated uncertainties, and compare favorably to more expensive methods such as Bayesian neural networks (Caldas2023 pages 2-).
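In a deep ensemble, each member network predicts a mean and a variance, and the ensemble is treated as a uniform mixture of Gaussians. A minimal sketch of that aggregation step (the function name is mine; the mixture formulas are the standard deep-ensemble ones, not taken verbatim from the paper):

```python
def ensemble_predict(member_outputs):
    """Combine per-member (mean, variance) predictions of a deep
    ensemble into a single mixture mean and variance.

    member_outputs: list of (mu_i, sigma2_i) tuples, one per network.
    Mixture mean:     mu     = (1/M) * sum(mu_i)
    Mixture variance: sigma2 = (1/M) * sum(sigma2_i + mu_i**2) - mu**2
    """
    m = len(member_outputs)
    mu = sum(mu_i for mu_i, _ in member_outputs) / m
    sigma2 = sum(s2_i + mu_i ** 2 for mu_i, s2_i in member_outputs) / m - mu ** 2
    return mu, sigma2
```

The mixture variance grows when the members disagree (spread of the `mu_i`), which is exactly how the ensemble reports model uncertainty alongside the prediction.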
For training, the paper by Caldas2023 (pages 6-7) used a fixed learning rate of 0.0001, constrained the predicted variance to lie between 10⁻⁶ and 10⁴, and used dropout layers with a rate of 35%. These hyperparameter choices appear subjective: the authors did not provide any justification for the specific values selected (Caldas2023 pages 6-7).
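The variance constraint above typically appears inside the training loss. A sketch of a per-sample Gaussian negative log-likelihood with the variance clamped to the paper's reported bounds (the loss form itself is the standard deep-ensemble choice, assumed here rather than quoted from the paper):

```python
import math

# Variance bounds reported in the paper (Caldas2023 pages 6-7).
VAR_MIN, VAR_MAX = 1e-6, 1e4

def gaussian_nll(y, mu, var):
    """Per-sample Gaussian negative log-likelihood, with the predicted
    variance clamped to [VAR_MIN, VAR_MAX] before use so the log and
    division stay numerically stable."""
    var = min(max(var, VAR_MIN), VAR_MAX)
    return 0.5 * (math.log(2 * math.pi * var) + (y - mu) ** 2 / var)
```

Clamping the variance is one common way to keep the network from collapsing the variance to zero (which would make the NLL unbounded below) or inflating it to ignore errors.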
Overall, the paper by Caldas (2023) is an important contribution to the field of molecular solubility prediction. It presents a promising approach with excellent results, and its use of LSTM recurrent layers is well motivated and efficient. However, there is room for improvement in evaluation metrics, scalability, and robustness. The paper would also benefit from more detailed descriptions and analysis of the models and datasets used, and from broader comparison of the performance of different models. Suggestions for improvement include:
• A more in-depth discussion of existing approaches and the effectiveness of the proposed model for other target properties
• A higher level of detail in the descriptions and analysis of the models and datasets used
• Greater comparison of the performance of different models
• Evaluation of the model using additional metrics, such as AUC-ROC
• A discussion of the scalability and robustness of the model