Building effective mRNA vaccines or personalized therapeutics requires designing mRNA sequences that exhibit several desired biological properties. Among these are mRNA expression, stability, and immunogenicity.
Current sequence and codon optimization algorithms are not well adapted to these purposes, because they rely on parameters and statistics derived from the host genome. However, the codon adaptation index (CAI) and other parameters of highly expressing RNA viruses, for example, differ significantly from those of their hosts. The idea of this project is to combine data mining of existing gene sequences in public databases with emerging algorithmic approaches for predicting gene phenotypes.
The ideal outcome of the project is that we not only discover parameters and target values that correlate with the desired properties of the genes we want to use, but also find scientific explanations for their behavior.
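To make the role of the reference set concrete, the following is a minimal sketch of how a CAI value could be computed, assuming the classical definition (the geometric mean of the relative adaptiveness of each codon, measured against a reference set of genes). The function names, the pseudocount of 0.5 for unobserved codons, and the exclusion of Met, Trp, and stop codons are illustrative assumptions, not part of the project specification.

```python
from collections import Counter, defaultdict
from itertools import product
from math import exp, log

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons(seq):
    """Split a coding sequence into codons (RNA input is mapped to DNA)."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Weight of each codon relative to the most used synonymous codon.

    The pseudocount (an assumption of this sketch) avoids zero weights
    for codons that never occur in the reference set.
    """
    counts = Counter(
        c for s in reference_seqs for c in codons(s) if c in CODON_TABLE
    )
    by_amino_acid = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        by_amino_acid[aa].append(codon)

    weights = {}
    for aa, synonymous in by_amino_acid.items():
        # Stop codons and single-codon amino acids (Met, Trp) carry no signal.
        if aa == "*" or len(synonymous) == 1:
            continue
        top = max(counts[c] + pseudocount for c in synonymous)
        for c in synonymous:
            weights[c] = (counts[c] + pseudocount) / top
    return weights


def cai(seq, weights):
    """Geometric mean of the codon weights over the scored codons of seq."""
    logs = [log(weights[c]) for c in codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence contains no scorable codons")
    return exp(sum(logs) / len(logs))
```

Under these assumptions, the same sequence receives different CAI values depending on whether `relative_adaptiveness` is given host genes or viral genes as the reference set, which is the kind of discrepancy the paragraph above refers to.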
PhD project description
A main aim of the project is to develop an optimization algorithm based on the obtained results, and to test it in vitro and in vivo against other available (classical) optimization algorithms to assess how much it improves on their performance.
Required profile of the candidate
The candidate should be a bioinformatician who has ideally worked on some form of DNA or RNA sequence design before and/or has analyzed large datasets of DNA or RNA sequences. The candidate should be familiar with statistical analyses of data (from NCBI or other sources).
Lab work is not required from the candidate. Instead, the candidate is expected to design several “experiments” (in the form of DNA/RNA sequences), which will be handed over to our relevant departments; these departments will produce the sequences and evaluate the required biological parameters. The candidate should therefore have good communication skills.
Finally, the candidate should be willing to use (existing) machine learning models and other approaches as tools. A deep understanding of machine learning is thus not required.
Publications relevant to the project
Mahendran N, Durai Raj Vincent PM, Srinivasan K, Chang CY. Machine Learning Based Computational Gene Selection Models: A Survey, Performance Evaluation, Open Issues, and Future Research Directions. Front Genet. 2020 Dec 10;11:603808. doi: 10.3389/fgene.2020.603808.
Mordstein et al. Codon Usage and Splicing Jointly Influence mRNA Localization. Cell Syst. 2020 Apr 22;10(4):351-362.e8. doi: 10.1016/j.cels.2020.03.001.
Mordstein et al. Transcription, mRNA Export, and Immune Evasion Shape the Codon Usage of Viruses. Genome Biol Evol. 2021 Sep 1;13(9):evab106. doi: 10.1093/gbe/evab106.
Zhang H, Zhang L, Lin A, et al. Algorithm for Optimized mRNA Design Improves Stability and Immunogenicity. Nature. 2023. doi: 10.1038/s41586-023-06127-z.