Deep Learning for Alzheimer's Disease Genomics

Oxford MSc Dissertation Project

I completed my Oxford MSc in Computer Science dissertation under the supervision of Prof. Alejo Nevado-Holgado. In this project, I applied transformer neural network models and support vector machine (SVM) models to whole genome sequencing data to predict the presence of Alzheimer’s disease. The transformer models were based on the BERT model for natural language processing (110M parameters). The transformer models did not find a signal in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, but the SVM models found a predictive signal in many single nucleotide polymorphisms (SNPs) within the genes tested. These results suggest that SVM models are more effective on small datasets with limited signal, and that transformers need a larger dataset or a stronger signal to be effective. To our knowledge, this is the first study to apply neural networks to unbroken stretches of DNA sequencing data in an attempt to predict a phenotypic trait.
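
The summary above does not say how raw DNA was turned into input for a BERT-style model. As an illustration only, the sketch below shows one common option: splitting an unbroken DNA string into overlapping k-mers and mapping each to an integer id. The function names, the choice of k = 3, and the special tokens are all assumptions made for this example, not details taken from the dissertation.

```python
from itertools import product

# Hypothetical tokenization: the dissertation's actual scheme is not stated.

def kmer_tokens(sequence, k=3, stride=1):
    """Split a DNA string into overlapping k-mers."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

def build_vocab(k=3, alphabet="ACGT"):
    """Map BERT-style special tokens and every possible k-mer to an integer id."""
    specials = ["[PAD]", "[UNK]", "[CLS]", "[SEP]"]
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    return {tok: idx for idx, tok in enumerate(specials + kmers)}

def encode(sequence, vocab, k=3):
    """Convert a DNA string to input ids laid out as [CLS] k-mers [SEP]."""
    unk = vocab["[UNK]"]
    ids = [vocab["[CLS]"]]
    # k-mers containing bases outside the alphabet (e.g. N) fall back to [UNK]
    ids += [vocab.get(tok, unk) for tok in kmer_tokens(sequence, k)]
    ids.append(vocab["[SEP]"])
    return ids

vocab = build_vocab(k=3)
input_ids = encode("ACGTNACG", vocab)
```

With k = 3 the vocabulary holds 4 special tokens plus 64 k-mers. A larger k gives each token more context but grows the vocabulary exponentially, which is worth weighing against a small dataset such as the one described here.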