CRISPR Protein Engineering
Data-Driven Engineering of CRISPR-Cas12a for PAM Recognition
Master of Science in Chemical Engineering Thesis
College of Engineering, University of Wisconsin-Madison
Abstract
Proteins are chains of amino acids whose sequences encode an enormous range of biological functions. Having evolved over millions of years, proteins are now repurposed for human applications in medicine, food, and chemical production. CRISPR enzymes are emerging as a highly versatile workhorse for targeting specific DNA sequences in biomedicine and biotechnology. However, the space of possible protein sequences is too vast to explore with the traditional protein engineering approaches of rational design and directed evolution. Data-driven methods, which leverage the vast and exponentially growing volume of biological data, can greatly accelerate protein engineering, including the engineering of CRISPR enzymes.
Here we design an experimental and computational pipeline to investigate the binding function of CRISPR-Cas12a. CRISPR-Cas12a acts as a pair of molecular scissors that is programmed by an RNA molecule to target a DNA site with a matching sequence. An important limitation for human applications is that, before Cas12a can bind its target DNA site, it must also recognize a protospacer adjacent motif (PAM) next to that site, which restricts the sequences it can target. We design a library of mutant CRISPR-Cas12a proteins with chimeric sequences generated by DNA recombination. To investigate PAM binding function, we develop an assay based on a green fluorescent protein (GFP) reporter system presented by collaborators in the Beisel lab. We apply this assay to our chimeric library using fluorescence-activated cell sorting (FACS) and then sequence the sorted populations with long-read (nanopore) DNA sequencing, generating data on the order of millions of sequences. Enrichment analysis of the chimeric sequences yields a consensus protein sequence across three sorting replicates, showing that our assay is reproducible. We further demonstrate machine learning methods for building a generalized model of CRISPR-Cas12a-PAM binding.
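The enrichment and consensus analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis code: it assumes each chimera is encoded as a string of parent labels, one per recombination block (so "1213" would mean block 1 from parent 1, block 2 from parent 2, and so on), that reads have already been tallied per chimera for the sorted and unsorted populations, and that a replicate's "consensus" is the most frequent parent at each block among its top-enriched chimeras. The function names (log2_enrichment, top_enriched, consensus) and all counts are hypothetical toy values.

```python
import math
from collections import Counter


def log2_enrichment(sorted_counts, unsorted_counts, pseudocount=1.0):
    """Log2 ratio of each chimera's read frequency in the sorted vs. unsorted pool.

    Counts are normalized by sequencing depth; the pseudocount keeps chimeras
    that are missing from one population from causing division by zero.
    """
    seqs = set(sorted_counts) | set(unsorted_counts)
    n = len(seqs)
    total_sorted = sum(sorted_counts.values()) + pseudocount * n
    total_unsorted = sum(unsorted_counts.values()) + pseudocount * n
    scores = {}
    for s in seqs:
        f_sorted = (sorted_counts.get(s, 0) + pseudocount) / total_sorted
        f_unsorted = (unsorted_counts.get(s, 0) + pseudocount) / total_unsorted
        scores[s] = math.log2(f_sorted / f_unsorted)
    return scores


def top_enriched(scores, k):
    """The k highest-scoring chimeras (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [s for s, _ in ranked[:k]]


def consensus(chimeras):
    """Most frequent parent label at each block (ties go to the lowest label)."""
    result = []
    for i in range(len(chimeras[0])):
        counts = Counter(c[i] for c in chimeras)
        best = max(counts.values())
        result.append(min(label for label, v in counts.items() if v == best))
    return "".join(result)


# Hypothetical toy read counts: 4 blocks, 3 parents, one unsorted library.
unsorted = {"1111": 100, "2222": 100, "3333": 100,
            "1212": 100, "1213": 100, "2213": 100}
# Hypothetical sorted (GFP-positive) counts for three sorting replicates.
rep1 = {"1111": 10, "2222": 5, "3333": 5, "1212": 20, "1213": 200, "2213": 150}
rep2 = {"1111": 8, "2222": 6, "3333": 4, "1212": 25, "1213": 170, "2213": 160}
rep3 = {"1111": 12, "2222": 4, "3333": 6, "1212": 18, "1213": 210, "2213": 140}

for rep in (rep1, rep2, rep3):
    scores = log2_enrichment(rep, unsorted)
    print(consensus(top_enriched(scores, 3)))  # prints "1213" for each replicate
```

Agreement of the per-replicate consensus strings is one simple way to check reproducibility; a real analysis would also need to handle sequencing errors when assigning nanopore reads to chimeras, which this sketch skips.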
See my full report and/or presentation for further details.
References
Greenhalgh, Jonathan, Apoorv Saraogee, and Philip A. Romero. 2021. “Data-Driven Protein Engineering.” In Protein Engineering, 133–51. John Wiley & Sons, Ltd. https://doi.org/10.1002/9783527815128.ch6.
Saraogee, Apoorv. 2020. “Data-Driven Engineering of CRISPR-Cas12a for PAM Recognition.” Master's thesis, University of Wisconsin-Madison.