Prediction of Blood-Brain Barrier Permeability of Compounds by Machine Learning Algorithms

Authors

  • Tan Wei Feng Faculty of Chemical and Process Engineering Technology (FTKKP), Universiti Malaysia Pahang, 26300 Gambang, Pahang Darul Makmur, Malaysia
  • Raihana Edros Faculty of Chemical and Process Engineering Technology (FTKKP), Universiti Malaysia Pahang, 26300 Gambang, Pahang Darul Makmur, Malaysia
  • Ngahzaifa Ab Ghani Faculty of Computing, Universiti Malaysia Pahang, 26600 Pekan, Pahang Darul Makmur, Malaysia
  • Siti Umairah Mokhtar Faculty of Industrial Science and Technology, Universiti Malaysia Pahang, 26300 Gambang, Pahang Darul Makmur, Malaysia
  • RuiHai Dong The Insight Centre for Data Analytics, School of Computer Science, University College Dublin, Dublin, Ireland

DOI:

https://doi.org/10.37934/araset.33.2.269276

Keywords:

Machine Learning, Blood-Brain Barrier, Classification

Abstract

In drug development for the Central Nervous System (CNS), identifying compounds that can cross the Blood-Brain Barrier (BBB) into the brain is the most challenging assessment. Almost 98% of small molecules are unable to permeate the BBB, which limits the pharmacokinetics of drugs in the CNS by affecting their absorption, distribution, metabolism, and excretion (ADME). Because the CNS is often inaccessible to many complex procedures, and in-vitro permeability studies on thousands of compounds are laborious, Machine Learning (ML) was applied to predict the permeation of compounds through the BBB. In this work, four predictive models were developed on the KNIME Analytics Platform with four ML algorithms, evaluated with ten-fold cross-validation, and then used to predict an external validation set. Among the four algorithms, Extreme Gradient Boosting (XGBoost) performed best in BBB permeability prediction and was chosen as the prediction model for deployment. Data pre-processing and feature selection improved the model's predictions. Overall, the model achieved accuracies of 86.7% and 88.5% and AUC values of 0.843 and 0.927 on the training set and the external validation set, respectively, indicating that the model is highly stable in prediction.
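The evaluation scheme named in the abstract (ten-fold cross-validation scored by accuracy and AUC) can be sketched in plain Python as follows. This is only an illustration, not the authors' KNIME workflow: the helper names k_fold_indices, accuracy, and roc_auc are hypothetical, the sample count of 25 is arbitrary, and the model-fitting step is left as a comment because the abstract does not specify the descriptors or hyperparameters used.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive (label 1) scores
    higher than a random negative (label 0); ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Each fold is held out once while the remaining nine form the training set.
folds = k_fold_indices(25, k=10)  # 25 is an arbitrary, illustrative sample count
for held_out, test_idx in enumerate(folds):
    train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    # A classifier would be fitted on train_idx and scored on test_idx here.
```

Averaging the per-fold accuracy and AUC gives the cross-validated estimate; the external validation set is scored once, separately, with the model chosen from that comparison.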

Author Biographies

Tan Wei Feng, Faculty of Chemical and Process Engineering Technology (FTKKP), Universiti Malaysia Pahang, 26300 Gambang, Pahang Darul Makmur, Malaysia

weifengtan.wf@gmail.com

Raihana Edros, Faculty of Chemical and Process Engineering Technology (FTKKP), Universiti Malaysia Pahang, 26300 Gambang, Pahang Darul Makmur, Malaysia

rzahirah@umpsa.edu.my

Ngahzaifa Ab Ghani, Faculty of Computing, Universiti Malaysia Pahang, 26600 Pekan, Pahang Darul Makmur, Malaysia

zaifa@umpsa.edu.my

Siti Umairah Mokhtar, Faculty of Industrial Science and Technology, Universiti Malaysia Pahang, 26300 Gambang, Pahang Darul Makmur, Malaysia

umairah@umpsa.edu.my

RuiHai Dong, The Insight Centre for Data Analytics, School of Computer Science, University College Dublin, Dublin, Ireland

ruihai.dong@ucd.ie

Published

2023-11-04
