HKUMed research team combines artificial intelligence and protein engineering technology to improve gene editing efficiency


A research team from the University of Hong Kong’s LKS Faculty of Medicine (HKUMed) has discovered more efficient CRISPR-Cas9 variants that could be useful for gene therapy applications. By establishing a new pipeline that applies machine learning to high-throughput screening data to accurately predict the activity of protein variants, the team can analyze up to 20 times more variants at a time without acquiring additional experimental data, which dramatically accelerates protein engineering. The research team successfully applied the pipeline to several Cas9 optimizations and engineered novel Staphylococcus aureus Cas9 (SaCas9) variants with improved gene-editing efficiency. The results are now published in Nature Communications, and a patent application has been filed based on this work.


Staphylococcus aureus Cas9 (SaCas9) is an excellent candidate for in vivo gene therapy because its small size allows it to be packaged into adeno-associated viral vectors for delivery into human cells. However, its gene-editing activity might be insufficient at some disease loci. Further optimization of SaCas9 is therefore crucial before it can be used as a reliable precision-medicine tool to treat human diseases. Such optimization aims to increase its efficiency and accuracy by modifying the Cas9 protein. The standard protocol for modifying the protein involves saturation mutagenesis, in which the number of possible modifications that could be introduced into the protein exceeds the experimental screening capacity of even state-of-the-art high-throughput platforms by orders of magnitude.
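To illustrate why saturation mutagenesis outgrows screening capacity so quickly, the following sketch counts the variants produced when a handful of sites are fully randomized. The site counts are hypothetical and chosen only for illustration; they are not figures from the study.

```python
AMINO_ACIDS = 20  # the 20 standard amino acids


def library_size(num_sites: int, residues_per_site: int = AMINO_ACIDS) -> int:
    """Number of distinct variants when every chosen site is fully randomized."""
    return residues_per_site ** num_sites


# Hypothetical site counts, for illustration only.
for sites in (2, 4, 6, 8):
    print(f"{sites} sites -> {library_size(sites):,} variants")
```

Each additional randomized site multiplies the library by 20, so a library that is screenable at a few sites becomes intractable after only a few more.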

In this work, the research team explored whether combining machine learning with structure-guided mutagenesis library screening could enable virtual screening of many more modifications and accurately identify rare, top-performing variants for further validation.

Research findings

The research team tested the machine learning framework on several previously published mutagenesis screens on Cas9 variants and illustrated that machine learning could robustly identify the best performing variants using only 5-20% of the experimentally determined data.
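The idea of training on a small measured fraction of a library and ranking the rest can be sketched as follows. This is a toy illustration under stated assumptions, not the team's pipeline: the library, the residue set, the activity values, and the simple additive model are all hypothetical stand-ins, and the data are synthetic rather than real Cas9 measurements.

```python
import itertools
import random
from collections import defaultdict

random.seed(0)

# Hypothetical toy library: 4 mutated sites, 5 candidate residues per site.
SITES = 4
RESIDUES = "AKRHN"
library = ["".join(v) for v in itertools.product(RESIDUES, repeat=SITES)]

# Synthetic "measured" activity: hidden additive effects plus noise.
# This stands in for screening data and is NOT real Cas9 data.
hidden = {(s, r): random.uniform(-1, 1) for s in range(SITES) for r in RESIDUES}


def measure(variant):
    return sum(hidden[(s, r)] for s, r in enumerate(variant)) + random.gauss(0, 0.2)


activity = {v: measure(v) for v in library}

# "Screen" only 10% of the library; the rest is predicted.
train = random.sample(library, k=len(library) // 10)
train_set = set(train)
held_out = [v for v in library if v not in train_set]

# Fit a simple additive model: mean activity of the training variants that
# carry each (site, residue), relative to the overall training mean.
overall = sum(activity[v] for v in train) / len(train)
sums, counts = defaultdict(float), defaultdict(int)
for v in train:
    for s, r in enumerate(v):
        sums[(s, r)] += activity[v] - overall
        counts[(s, r)] += 1


def predict(variant):
    # (site, residue) pairs never seen in training contribute zero effect.
    return overall + sum(
        sums[(s, r)] / counts[(s, r)] for s, r in enumerate(variant) if counts[(s, r)]
    )


# Rank the unmeasured variants by predicted activity.
ranked = sorted(held_out, key=predict, reverse=True)

# How many of the truly best held-out variants land in the predicted top 20?
TOP = 20
true_top = set(sorted(held_out, key=activity.get, reverse=True)[:TOP])
recall = len(true_top & set(ranked[:TOP])) / TOP
print(f"trained on {len(train)} of {len(library)} variants; top-{TOP} recall = {recall:.2f}")
```

A real application would replace the additive model with a learned regressor and the synthetic values with measured activities. The workflow shape stays the same: measure a subset, fit, rank the remainder, then validate the top candidates experimentally.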

The Cas9 protein contains several domains, including the protospacer adjacent motif (PAM)-interacting (PI) domain and the wedge (WED) domain, which facilitate its interaction with the target DNA duplex. The research team coupled machine learning with high-throughput screening to design an SaCas9 protein with enhanced activity by combining mutations in the PI and WED domains, which surround the PAM-containing DNA duplex. The PAM is essential for Cas9 to modify the target DNA, and the idea was to relax the PAM constraint for broader genome targeting while stabilizing the protein structure by enhancing the WED domain's interaction with the PAM-containing DNA duplex.

In the screen and subsequent validations, the researchers identified new variants, including one named KKH-SaCas9-plus, with up to 33% improved activity at specific genomic loci. Subsequent protein modeling analysis revealed new interactions between the WED and PI domains and the PAM-containing DNA duplex at several locations, which contribute to the increased efficiency of KKH-SaCas9-plus.

Importance of research

Structure-guided design dominates the field of Cas9 engineering; however, it explores only a small number of sites, amino acid residues, and combinations. In this study, the research team showed that a multi-domain combinatorial mutagenesis screening approach coupled with machine learning allows screening on a larger scale with less experimental effort, time and cost, which led them to identify a novel high-efficiency variant, KKH-SaCas9-plus.

“This approach will significantly accelerate the optimization of Cas9 proteins, which could allow genome editing to be applied to the treatment of genetic diseases more efficiently,” said Dr. Alan Wong Siu-lun, assistant professor at the School of Biomedical Sciences, HKUMed.

About the research team

This research was led by Dr. Alan Wong Siu-lun, Assistant Professor, School of Biomedical Sciences, HKUMed, as corresponding author. Ms. Dawn Thean Gek-lian, Research Assistant, and Dr. Athena Chu Hoi-yee, Postdoctoral Fellow, School of Biomedical Sciences, HKUMed, were co-first authors, with assistance from Dr. Fong Hoi-chun, PhD student; Mrs. Becky Chan Ka-ching, doctoral student; Dr. Zhou Peng, Postdoctoral Fellow; Ms. Cynthia Kwok Chui-shan, Research Assistant; and Dr. Gigi Choi Ching-gee, Postdoctoral Fellow, School of Biomedical Sciences, HKUMed. Other collaborators included Dr. Joshua Ho Wing-kei, Associate Professor, School of Biomedical Sciences, HKUMed, and Dr. Zheng Zongli and his team from the Ming Wai Lau Center for Reparative Medicine, Karolinska Institutet, Hong Kong Node.


This work was supported by the Excellent Young Scientists Fund, the National Natural Science Foundation of China (32022089), the Hong Kong Research Grants Council (17104619) and the Center for Oncology and Immunology Limited under the [email protected] Program initiated by the Innovation and Technology Commission, Hong Kong Special Administrative Region (HKSAR) government. This work was also supported in part by the Associate Member Program of the Ming Wai Lau Center for Reparative Medicine and the [email protected] launched by the Innovation and Technology Commission, HKSAR Government.