Design of Man-made Miniature CRISPR-Cas Proteins Using Computational and Artificial Intelligence Technologies




Jayasinghe-Arachchige, Vindi
Madugula, Sita Sirisha
Nammi, Bharani
Nukala, Nihitha
Wang, Shouyi
Liu, Jin


0000-0002-5493-6328 (Jayasinghe-Arachchige, Vindi)




Purpose: The CRISPR-Cas system is a widely used genome editing technique that relies on a guide RNA (gRNA) and CRISPR-associated (Cas) proteins. A major challenge in harnessing CRISPR-Cas technology for applications in living organisms is the lack of an efficient delivery system. Because the available Cas proteins are large, it is difficult to encapsulate all CRISPR components in a single delivery vehicle. To address this issue, we used computational and artificial intelligence (AI) tools to design compact Cas proteins that have a similar function to, and are more efficient than, the available Cas proteins.

Methods: The available crystal structures of the smallest CRISPR-Cas systems were used as starting points and reduced further. A novel method termed the "Blocks and Gaps" approach was employed to design new mini-Cas proteins 450-500 amino acids in length. The generated protein sequences (1 million) were then passed through two machine learning-based classification models to filter out non-Cas proteins. The 3D structures of the remaining Cas protein sequences were predicted with homology modeling (SWISS-MODEL) and an AI-based method (AlphaFold2). The global and local structural features and the solubility of these proteins were then analyzed, and the top candidates were subjected to molecular dynamics (MD) simulations that included substrate DNA and gRNA.
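The classification step in the Methods can be illustrated with a minimal sketch, assuming a sequence is kept only when both classifiers label it as Cas (a consensus rule). The abstract does not describe the two models, so `toy_classifier_a` and `toy_classifier_b` below are hypothetical stand-ins with arbitrary rules, not the authors' models. The 450-500 residue length check is likewise an added sanity check based on the stated design range, not a documented step of the pipeline.

```python
from typing import Callable, Iterable, List

# Design range stated in the abstract (amino acids).
MIN_LEN, MAX_LEN = 450, 500

# A classifier maps a protein sequence to True ("Cas") or False ("non-Cas").
Classifier = Callable[[str], bool]


def consensus_filter(
    sequences: Iterable[str],
    classifier_a: Classifier,
    classifier_b: Classifier,
) -> List[str]:
    """Keep sequences in the design length range that BOTH classifiers accept."""
    kept = []
    for seq in sequences:
        # Assumed sanity check: drop sequences outside the design range.
        if not (MIN_LEN <= len(seq) <= MAX_LEN):
            continue
        # Consensus rule: both models must predict "Cas".
        if classifier_a(seq) and classifier_b(seq):
            kept.append(seq)
    return kept


# Hypothetical placeholder classifiers. The rules are arbitrary and carry no
# biological meaning; real models would be trained on labeled sequences.
def toy_classifier_a(seq: str) -> bool:
    return seq.count("K") + seq.count("R") >= 0.1 * len(seq)


def toy_classifier_b(seq: str) -> bool:
    return "D" in seq and "E" in seq


if __name__ == "__main__":
    candidates = [
        "K" * 50 + "D" * 200 + "E" * 200,  # accepted by both toy rules
        "A" * 460,                          # rejected by both toy rules
        "K" * 100 + "A" * 380,              # accepted by A only
    ]
    survivors = consensus_filter(candidates, toy_classifier_a, toy_classifier_b)
    print(f"{len(survivors)} of {len(candidates)} candidates kept")
```

Requiring agreement between two independent models trades recall for precision: fewer candidates survive, but each survivor has been accepted twice before the more expensive structure prediction and MD steps.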

Results/Conclusions: A library of man-made miniature Cas proteins was generated; these proteins are less than half the size of widely used CRISPR-Cas proteins such as Cas9 or Cas12a. Of these, 50% were predicted to be Cas proteins by both machine learning-based classification models, and 90% of them show 3D structures similar to those of their original counterparts. 10% of these passed the final validations. The activity of the designed proteins has not yet been tested experimentally and remains to be investigated.