Browsing by Author "Mashburn, Dominic"
Item Computational Design of Compact CRISPR-Cas Enzymes of Lachnospiraceae bacterium Cas12a Utilizing Bioinformatic Tools (2023)
Mashburn, Dominic; Arachchige, Vindi; Liu, Jin
Purpose: Nature has provided us with a popular genome editing tool known as the CRISPR (clustered regularly interspaced short palindromic repeats)-Cas system, which has shown promise in both plants and animals. The CRISPR-Cas system relies on a guide RNA (gRNA) and specific proteins known as Cas proteins to carry out its function. A major limitation of the CRISPR-Cas system, as of any gene therapy, is how it is delivered within the organism. The most common “vehicle” for delivering gene therapies is adeno-associated viral vectors (AAVs), which have a maximum effective capacity of approximately 4.7 kb. Most Cas enzymes, together with the other CRISPR components needed, exceed this packaging capacity. The most widely characterized CRISPR-Cas system is Cas9. However, Cas12a's ability to process its own crRNA arrays without requiring a tracrRNA makes it a promising candidate as well. In other CRISPR-Cas systems, the RNA components need to be synthesized and packaged into an AAV, whereas in the Cas12a family some of these components are not needed. Lachnospiraceae bacterium Cas12a (LbCas12a) shows higher activity than Cas12a enzymes from other species. To address the size issue, we used various bioinformatic tools to computationally design compact variants of LbCas12a with similar functionality and comparable efficiency.
Methods: The best available crystal structure of LbCas12a was chosen from the Protein Data Bank (PDB). A structure reduction process was carried out using YASARA and UCSF ChimeraX.
The intermediate steps of this process were verified using the homology-based modeling tool SWISS-MODEL and the AI-based modeling tool AlphaFold2 to ensure that the protein still folded similarly to the original structure. Furthermore, the global and local structural features were analyzed, and the best candidate was subjected to molecular dynamics (MD) simulations along with gRNA and substrate DNA to determine its functional efficiency under realistic dynamic conditions and to compare it with the original structure.
Results/Conclusions: A compact variant of LbCas12a was generated that is 292 residues smaller than the original crystal structure. This engineered miniature protein contains all the regions needed for DNA cleavage activity. MD simulations confirm its stability in the presence of DNA and gRNA. Further validation and experimental testing of the designed protein are ongoing.

Item Identification of Family-Specific Features in Cas9 and Cas12 Proteins: A Machine Learning Approach Using Complete Protein Feature Spectrum (Cold Spring Harbor Laboratory, 2024-02-08)
Madugula, Sita S.; Pujar, Pranav; Bharani, Nammi; Wang, Shouyi; Jayasinghe-Arachchige, Vindi M.; Pham, Tyler; Mashburn, Dominic; Artilis, Maria; Liu, Jin
The recent development of CRISPR-Cas technology holds promise to correct gene-level defects for genetic diseases. The key element of the CRISPR-Cas system is the Cas protein, a nuclease that edits the gene of interest with the assistance of a guide RNA. However, these Cas proteins suffer from inherent limitations such as large size, low cleavage efficiency, and off-target effects, which hinder their widespread application as a gene editing tool. Therefore, there is a need to identify novel Cas proteins with improved editing properties, for which it is necessary to understand the underlying features governing the Cas families.
In the current study, we aim to elucidate the unique protein attributes associated with the Cas9 and Cas12 families and identify the features that distinguish each family from the other. Here, we built Random Forest (RF) binary classifiers to distinguish Cas12 and Cas9 proteins from non-Cas proteins, respectively, using the complete protein feature spectrum (13,495 features) encoding various physicochemical, topological, constitutional, and coevolutionary information of Cas proteins. Furthermore, we built multiclass RF classifiers differentiating Cas9, Cas12, and non-Cas proteins. All the models were evaluated rigorously on the test and independent datasets. The Cas12 and Cas9 binary models achieved a high overall accuracy of 95% and 97% on their respective independent datasets, while the multiclass classifier achieved a high F1 score of 0.97. We observed that Quasi-sequence-order descriptors like Schneider-lag descriptors and Composition descriptors like charge, volume, and polarizability are essential for the Cas12 family. More interestingly, we discovered that Amino Acid Composition descriptors, especially the Tripeptide Composition (TPC) descriptors, are important for the Cas9 family. Four of the identified important descriptors for Cas9 classification are the tripeptides PWN, PYY, HHA, and DHI, which are conserved across all the Cas9 proteins and are located within different catalytically important domains of the Cas9 protein structure. Among these four tripeptides, DHI and HHA are well known to be involved in the DNA cleavage activity of the Cas9 protein. We therefore propose that the other two tripeptides, PWN and PYY, may also be essential for the Cas9 family. Our identified important descriptors enhance the understanding of the catalytic mechanisms of Cas9 and Cas12 proteins and provide valuable insights into the design of novel Cas systems with enhanced gene-editing properties.
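
For readers unfamiliar with Tripeptide Composition (TPC) descriptors, the following is a minimal sketch of how such a descriptor is commonly computed: the fraction of overlapping three-residue windows in a sequence that match each tripeptide. The function name `tripeptide_composition`, the normalization by the number of valid windows, and the toy sequence are assumptions made for illustration; the abstract does not state which descriptor package or exact normalization the authors used.

```python
from collections import Counter

# The 20 standard amino acid one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def tripeptide_composition(sequence: str) -> dict[str, float]:
    """Return, for each tripeptide seen, its share of all valid 3-residue windows.

    Assumed convention: overlapping windows, normalized by the number of
    windows made up only of standard residues.
    """
    seq = sequence.upper()
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    # Skip windows containing non-standard codes such as X or B.
    valid = [w for w in windows if set(w) <= AMINO_ACIDS]
    if not valid:
        return {}
    counts = Counter(valid)
    total = len(valid)
    return {tri: n / total for tri, n in counts.items()}


# Toy sequence invented for illustration; it is NOT a real Cas9 fragment.
toy_sequence = "MDHIPWNAAHHAPYYDHI"
tpc = tripeptide_composition(toy_sequence)

# Look up the four tripeptides named in the abstract.
for tri in ("PWN", "PYY", "HHA", "DHI"):
    print(tri, tpc.get(tri, 0.0))
```

A full TPC feature vector would have 8,000 entries (20 x 20 x 20), with zeros for tripeptides that do not occur; the dictionary above stores only the non-zero entries.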