Development of a Machine Learning Model to Design Target-specific Ligands




Mathew, Ezek
Liu, Jin
Wang, Duen-Shian
Liu, Kevin


0000-0001-8365-2984 (Wang, Duen-Shian)



Background: Drug discovery is a lengthy and expensive process: the estimated cost of bringing a drug to market ranges from $314 million to $2.8 billion. Moreover, completion of Phase 3 trials does not guarantee FDA approval. For most drugs, the probability of receiving FDA approval ranges from 9% to 14%, depending on the time period. Researchers have therefore turned to machine learning (ML) to reduce the burden of drug discovery for multiple targets. In the central nervous system (CNS), the metabotropic glutamate receptor subtype 2 (mGlu2) and metabotropic glutamate receptor subtype 3 (mGlu3) play various roles in normal physiology. Ligands of these receptors therefore hold potential for the treatment of various pathologies, such as Alzheimer's disease, schizophrenia, and other neurological disorders. Currently, no literature describes a machine learning model capable of distinguishing drug ligands based on their affinity for mGlu2 or mGlu3. To fill this gap in knowledge, we designed a machine learning algorithm capable of making associations across the entire dataset, identifying patterns that the human eye cannot detect.
Methods: We used a dataset of two-dimensional (2D) images of drug ligands belonging to two classes, mGlu2 or mGlu3. The images were resized, converted to grayscale, and then processed as numerical NumPy arrays with their associated labels. A Convolutional Neural Network (CNN) and a Functional API architecture were tested to determine the optimal model. Hyperparameters were optimized throughout this process.
Results: The CNN and the Functional API both reached 100% accuracy within 20 epochs, successfully classifying ligands as mGlu2 or mGlu3 based on 2D structure alone. However, the Functional API reached 100% accuracy in under 5 epochs, outperforming the CNN.
Conclusion: While the CNN is one of the most popular ML architectures for image classification, the Functional API can perform a similar role. As datasets expand, it may be beneficial to consider more efficient models, especially for image classification in the realm of drug discovery.
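
The preprocessing described in the Methods (resize, convert to grayscale, store as a labeled NumPy array) can be illustrated with a minimal sketch. This is not the authors' code: the 64-pixel target size, the nearest-neighbor resizing, the luma weights used for grayscale conversion, the scaling to [0, 1], the 0/1 label encoding, and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

# Assumed grayscale weights (ITU-R BT.601 luma); the abstract does not
# say which grayscale conversion was used.
LUMA = np.array([0.299, 0.587, 0.114])

# Assumed label encoding for the two ligand classes.
CLASS_INDEX = {"mGlu2": 0, "mGlu3": 1}


def resize_nearest(img, size):
    """Nearest-neighbor resize of an (H, W, C) array to (size, size, C)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return img[rows[:, None], cols]


def to_grayscale(rgb):
    """Collapse an (H, W, 3) RGB array into an (H, W) array scaled to [0, 1]."""
    return (rgb[..., :3].astype(np.float32) @ LUMA) / 255.0


def build_dataset(images, labels, size=64):
    """Resize, then grayscale, each image and stack the results.

    images: list of (H, W, 3) uint8 arrays (hypothetical ligand drawings)
    labels: list of "mGlu2" / "mGlu3" strings, one per image
    Returns x with shape (N, size, size, 1) and y with shape (N,).
    """
    x = np.stack([to_grayscale(resize_nearest(img, size)) for img in images])
    x = x[..., np.newaxis].astype(np.float32)  # add a single channel axis
    y = np.array([CLASS_INDEX[label] for label in labels], dtype=np.int64)
    return x, y
```

Arrays shaped `(N, size, size, 1)` with integer labels are a common input format for image classifiers, which is why the sketch adds an explicit channel axis; the actual layout used in the study is not stated.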


Research Appreciation Day Award Winner - 2022 School of Biomedical Sciences, Department of Pharmacology & Neuroscience - 2nd Place