Molecular Genetics
Permanent URI for this collection: https://hdl.handle.net/20.500.12503/32085
Browsing Molecular Genetics by Author "Liu, Jin"
Computational Design of Compact CRISPR-Cas Enzymes of Lachnospiraceae bacterium Cas12a Utilizing Bioinformatic Tools (2023)
Mashburn, Dominic; Arachchige, Vindi; Liu, Jin

Purpose: Nature has provided us with a popular genome editing tool, the CRISPR (clustered regularly interspaced short palindromic repeats)-Cas system, which has shown promise in both plants and animals. The CRISPR-Cas system uses a guide RNA (gRNA) and specific proteins, known as Cas proteins, to carry out its function. A major limitation of the CRISPR-Cas system, as of any gene therapy, is how it is delivered within the organism. The most common "vehicle" for delivering gene therapies is the adeno-associated viral (AAV) vector, which has a maximum effective capacity of approximately 4.7 kb. The main issue is that most Cas enzymes, together with the other CRISPR components they require, are much larger than this capacity. The most widely characterized system is CRISPR-Cas9. However, Cas12a's ability to process its own crRNA arrays without requiring a tracrRNA makes it a promising candidate as well. In other CRISPR-Cas systems, the RNA components must be synthesized and packaged into an AAV, whereas the Cas12a family does not need some of these components. Lachnospiraceae bacterium Cas12a (LbCas12a) shows higher activity than Cas12a enzymes from other species. To address the size issue, we used various bioinformatic tools to computationally design compact LbCas12a proteins with similar functionality and comparable efficiency.

Methods: The best available crystal structure of LbCas12a was chosen from the Protein Data Bank (PDB). A structure reduction process was carried out using Yasara and UCSF ChimeraX.
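As a rough sketch of the size constraint behind this reduction, the arithmetic below converts a residue count into coding-sequence length (3 nucleotides per codon, plus a stop codon) and checks it against the approximately 4.7 kb AAV capacity cited above. The function names and the `other_elements_bp` values are hypothetical illustrations, not part of the study; only the 4.7 kb capacity and the 292-residue reduction come from the abstract, and the sketch assumes each deleted residue corresponds to one removed codon.

```python
# Approximate effective AAV packaging capacity cited in the abstract (~4.7 kb).
AAV_CAPACITY_BP = 4700


def cds_length_bp(n_residues: int, include_stop: bool = True) -> int:
    """Coding-sequence length for a protein of n_residues amino acids (3 nt per codon)."""
    return 3 * n_residues + (3 if include_stop else 0)


def fits_in_aav(n_residues: int, other_elements_bp: int,
                capacity_bp: int = AAV_CAPACITY_BP) -> bool:
    """True if the Cas coding sequence plus the other cassette elements fit the capacity.

    other_elements_bp is a hypothetical placeholder for everything else in the
    cassette (for example promoter, polyA signal, guide RNA cassette).
    """
    return cds_length_bp(n_residues) + other_elements_bp <= capacity_bp


# Removing 292 residues shortens the coding sequence by 292 codons,
# assuming the deleted residues map one-to-one onto removed codons.
saved_bp = cds_length_bp(292, include_stop=False)
print(saved_bp)  # 876

# Hypothetical protein sizes and cassette overhead, for illustration only.
print(fits_in_aav(1200, other_elements_bp=1000))  # True  (3603 + 1000 = 4603 bp)
print(fits_in_aav(1300, other_elements_bp=1000))  # False (3903 + 1000 = 4903 bp)
```

With these made-up numbers, a 1,200-residue protein fits and a 1,300-residue protein does not, which shows why removing 292 residues (876 bp of coding sequence) can matter for AAV delivery.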
The intermediate steps of the structure reduction were verified with the homology-based modeling tool SWISS-MODEL and the AI-based modeling tool AlphaFold2 to ensure that the protein still folded similarly to the original structure. The global and local structural features were then analyzed, and the best candidate was subjected to molecular dynamics (MD) simulations together with gRNA and substrate DNA to assess its functional efficiency under realistic dynamic conditions and to compare it with the original structure.

Results/Conclusions: A compact variant of LbCas12a was generated that is 292 residues smaller than the original crystal structure. This engineered miniature protein retains all the regions needed for DNA cleavage activity. MD simulations confirm its stability in the presence of DNA and gRNA. Further validation and experimental testing of the designed protein are under way at this stage of the study.

Machine Learning Based Classification of CRISPR-Cas Proteins Using Complete Protein Spectrum (2023)
Madugula, Sita Sirisha; Arachchige, Vindi Mahesha Jayasinghe; Pham, Tyler; Nammi, Bharani; Wang, Shouyi; Liu, Jin

Purpose: Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and the CRISPR-associated (Cas) proteins together form the CRISPR-Cas system. In prokaryotes, the CRISPR-Cas system typically provides an innate defense mechanism against foreign genetic elements such as phages and plasmids. The recent development of this mechanism into a gene editing technology holds promise for correcting gene-level defects in several genetic diseases. The key element of the CRISPR-Cas system is the Cas protein, a nuclease that can edit a gene of interest. Different types of Cas proteins are involved in different CRISPR-Cas systems. However, Cas proteins suffer from inherent limitations, such as limited specificity and off-target effects, which restrict their widespread application as gene editing tools.
In the current study, we developed a novel method for classifying the Cas9 and Cas12 families. Existing classification tools have low overall accuracy and are usually built using only a few types of protein features. We also attempt to understand which protein features govern the Cas9 and Cas12 classes by using a multitude of protein features.

Method: We built Random Forest (RF) binary classifiers for Cas12 and Cas9 proteins, respectively, using the complete spectrum of protein features (13,495 features) encoding physicochemical, constitutional, and evolutionary information. We also built multiclass RF classifiers that differentiate among Cas9, Cas12, and non-Cas proteins. The performance of all models was evaluated using 5-fold cross-validation and six evaluation metrics: accuracy, precision, recall, F1-score, AUC score, and specificity. We also tested our models on the respective independent datasets, which were compiled in-house from various public databases.

Results: The Cas12 and Cas9 models achieved high overall accuracies of 0.97 and 0.96, respectively, on their independent datasets, while the multiclass classifier achieved an F1 score of 1.0. We observed that amino acid composition, quasi-sequence-order, and composition-based protein features are particularly important for the Cas12 and Cas9 protein families.

Conclusions: We successfully built classification models for the Cas12 and Cas9 protein families and identified the protein features that are unique to each family. These findings enhance the understanding of the structure and function of Cas9 and Cas12 proteins and provide valuable insights into plausible structural modifications of these proteins to achieve enhanced specificity and reduced off-target effects.
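The evaluation protocol in the Method section (5-fold cross-validation scored with six metrics) can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the authors' pipeline: all function names are hypothetical, `midpoint_threshold` is a toy stand-in for the Random Forest (in practice the classifier would come from a library), scores are thresholded at 0.5, per-fold metrics are averaged rather than pooled, AUC is computed with the rank-based (Mann-Whitney) formulation, and the toy data are invented.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true, y_pred, y_score) -> Dict[str, float]:
    """The six metrics named in the abstract, with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
        "auc": roc_auc(y_true, y_score),
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def stratified_folds(y: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


# fit_predict(X_train, y_train, X_test) -> positive-class scores for X_test
FitPredict = Callable[[list, list, list], List[float]]


def cross_validate(X: list, y: List[int], fit_predict: FitPredict,
                   k: int = 5, seed: int = 0) -> Dict[str, float]:
    """Average the six metrics over k stratified folds."""
    per_fold = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        scores = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                             [X[i] for i in test_idx])
        y_true = [y[i] for i in test_idx]
        y_pred = [1 if s >= 0.5 else 0 for s in scores]
        per_fold.append(binary_metrics(y_true, y_pred, scores))
    return {name: mean(m[name] for m in per_fold) for name in per_fold[0]}


def midpoint_threshold(X_train, y_train, X_test) -> List[float]:
    """Hypothetical stand-in for the Random Forest: threshold feature 0 between class means."""
    m0 = mean(x[0] for x, t in zip(X_train, y_train) if t == 0)
    m1 = mean(x[0] for x, t in zip(X_train, y_train) if t == 1)
    cut = (m0 + m1) / 2
    return [1.0 if x[0] >= cut else 0.0 for x in X_test]


# Toy data: 10 "non-Cas12" (label 0) and 10 "Cas12" (label 1) samples, one feature each.
X = [[float(v)] for v in list(range(10)) + list(range(100, 110))]
y = [0] * 10 + [1] * 10
print(cross_validate(X, y, midpoint_threshold, k=5))
```

The toy classes are perfectly separable, so every averaged metric comes out as 1.0. With real protein feature vectors, `fit_predict` would wrap a trained classifier's predicted probabilities for the positive class.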