Transcriptome profiling of neurodegenerative disorders modelled with IPSCs and computational methods for CRISPR/Cas9 editing

PhD student: Giulia Corsi, giulia@rth.dk
Thesis defended: 3rd September 2021

Background for the project

For several decades the study of neurodegenerative disorders was limited by the lack of disease models. More recently, technologies that make it possible to generate induced pluripotent stem cells (IPSCs), edit their genome, and differentiate them into brain cells have allowed scientists to study the impact of genomic variants linked to neurodegenerative disorders in different types of brain cells.

The effect of a disease-causing genomic variant can be studied by analysing and comparing the transcriptome of diseased cells bearing the variant and healthy controls, in which the variant is removed via genome editing. Such analysis provides an overview of how gene expression is altered by the presence of the variant. Further analysis of networks of functionally related genes that present altered expression in the presence of the variant can highlight cellular processes and disease mechanisms in the cells. A better understanding of the molecular changes that lead to neurodegeneration is key for finding biomarkers and potential targets for therapies to cure such devastating diseases.
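The comparison described above can be illustrated with a minimal sketch. It assumes already-normalised expression values and computes a log2 fold change per gene between diseased and control samples. The gene names, values, and the `log2_fold_change` helper are hypothetical and serve only as an illustration; a real analysis would rely on a dedicated statistical package that also handles normalisation, variance modelling, and multiple-testing correction.

```python
import math


def log2_fold_change(disease_values, control_values, pseudocount=1.0):
    """Return log2 of the ratio of mean expression (disease over control).

    A pseudocount is added to both means to avoid division by zero
    for genes that are not expressed in one of the conditions.
    """
    mean_disease = sum(disease_values) / len(disease_values)
    mean_control = sum(control_values) / len(control_values)
    return math.log2((mean_disease + pseudocount) / (mean_control + pseudocount))


# Hypothetical normalised expression values: (diseased replicates, control replicates)
expression = {
    "GENE_A": ([31.0, 31.0, 31.0], [7.0, 7.0, 7.0]),   # higher in diseased cells
    "GENE_B": ([7.0, 7.0, 7.0], [7.0, 7.0, 7.0]),      # unchanged
    "GENE_C": ([3.0, 3.0, 3.0], [15.0, 15.0, 15.0]),   # lower in diseased cells
}

for gene, (disease, control) in expression.items():
    print(gene, round(log2_fold_change(disease, control), 2))
```

A positive value indicates higher expression in the cells bearing the variant, a negative value indicates lower expression, and a value near zero indicates no change.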

Editing the genome to study disease-related variants has been made easier by the discovery of the CRISPR/Cas9 RNA-guided endonuclease system, a bacterial immune-defence machinery that cleaves DNA. Obtaining cleavage in the genome via CRISPR/Cas9 comes, however, with some challenges and risks, the main ones being on-target efficiency and off-target cleavage. The former relates to the fact that the efficiency of the system varies depending on the target site, while the latter refers to the possibility that DNA cleavage will happen at non-target sites, because Cas9 and the gRNA tolerate mismatches when binding to a target sequence.
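The idea of mismatch tolerance can be made concrete with a small sketch that counts mismatches between a gRNA spacer and candidate genomic sites of the same length. The sequences, the threshold of three mismatches, and the function names are assumptions made for illustration; this is not the method developed in the thesis, and real off-target assessment also has to consider factors such as the PAM, the position of each mismatch, and binding energies.

```python
def count_mismatches(grna, site):
    """Count positions at which the gRNA spacer and a candidate site differ."""
    if len(grna) != len(site):
        raise ValueError("gRNA and candidate site must have the same length")
    return sum(1 for a, b in zip(grna.upper(), site.upper()) if a != b)


def candidate_off_targets(grna, sites, max_mismatches=3):
    """Return (site, mismatches) pairs that are imperfect but close matches.

    Perfect matches (0 mismatches) are treated as the intended on-target
    and are therefore excluded.
    """
    hits = []
    for site in sites:
        n = count_mismatches(grna, site)
        if 0 < n <= max_mismatches:
            hits.append((site, n))
    return hits


# Hypothetical 20-nt spacer (written as DNA) and candidate sites
grna = "GACGTTACCGGATCAAGCTT"
sites = [
    "GACGTTACCGGATCAAGCTT",  # identical: the intended target
    "GACGTTACCGGATCAAGCTA",  # 1 mismatch
    "TACGTAACCGGATCAAGCTT",  # 2 mismatches
    "CTGCAATGGCCTAGTTCGAA",  # differs at every position
]

for site, n in candidate_off_targets(grna, sites):
    print(site, n)
```

With these example sequences only the second and third sites are reported, since the first is a perfect match and the last differs far too much to be bound.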

Purpose of the project

The primary goal of the project was to better understand the cellular mechanisms related to two neurodegenerative disorders, namely Alzheimer’s disease (AD) and frontotemporal dementia linked to chromosome 3 (FTD3). For this, RNA-seq data was produced by our collaborators, who employed patient-derived IPSCs to generate 1) neurons carrying one of two missense mutations in the Presenilin-1 protein linked to AD and corresponding controls and 2) astrocytes carrying a mutation at the splice acceptor of the Charged Multivesicular Body Protein 2B that leads to the production of a truncated protein and corresponding controls. To analyse this RNA-seq data we aimed to develop a computational pipeline for the evaluation of gene expression changes. Further analysis of the functional relations between the differentially expressed genes was also planned, to help drive and support the additional molecular and phenotypical analyses carried out by our collaborators on these cells, with the final aim of improving our understanding of the disease mechanisms.

As a second major goal, we intended to facilitate and enhance the design of future experiments using CRISPR/Cas9 for genome editing. For this, the binding mechanism of CRISPR/Cas9 and its associated guide RNA molecule was to be investigated using cleavage profiling data. The same data could then be exploited to develop improved machine learning methods for on-target efficiency prediction.

Results so far

The analysis of RNA-seq data derived from AD neurons, FTD3 astrocytes, and corresponding isogenic controls was carried out with a computational pipeline developed in Snakemake, a workflow management system that supports modular, reusable, and scalable workflows and reproducible results. We report the altered expression of hundreds of genes in the cells carrying the disease-linked variants, some of which relate to specific pathways or organelles that, based on our analysis, play a central role in the disease.

As part of this project, we developed a computational pipeline for the analysis of RNA-seq data from experiments in which CRISPR/Cas9-edited cells and corresponding non-edited controls are compared. The pipeline is named CRISPRroots (CRISPR RNA-seq based on-target and off-target assessment) and is available via https://rth.dk/resources/crispr/.

To facilitate the design of experiments that make use of CRISPR/Cas9 for genome editing, we analysed efficiency data related to the CRISPR/Cas9-guide RNA (gRNA) system to explain the mechanism of binding between the CRISPR/Cas9-gRNA complex and target DNA sites, focusing on the binding and folding energies of nucleic acid molecules. Moreover, by combining existing datasets of CRISPR/Cas9 cleavage activity with new ones generated by our collaborators, we developed CRISPRon, a deep learning model that predicts gRNA on-target efficiency. On independent test datasets, our model outperforms other currently available tools that predict gRNA efficiency. The model is implemented in a webserver that is accessible via https://rth.dk/resources/crispr/. To help researchers design their CRISPR/Cas9-mediated genome editing experiments, we organized an online course on gRNA design, in which CRISPRon and other tools for CRISPR/Cas9-gRNA design were presented (https://eventsignup.ku.dk/crispr-tsunami-sep2021).