# Interpretable Deep Learning to Predict Cancer Outcomes
**Name: Eoghan McGlinchey**
**Supervisor: Dr. Colm Ryan**
This repository contains the code, data and figures that were written, used and generated as part of this project. The raw and processed data can be found in their respective folders, although the files in these folders are very large and may be difficult to view. Similarly, the training, validation and test data used for the models can be found in their respective clinical and binary mutation matrix folders; these files are also large and may be difficult to view.
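
Because the data files are large, it can help to look at only the first few rows rather than opening a whole file. The following is a minimal sketch, not part of the project code. It assumes the file of interest is a delimited text file such as a CSV, which this README does not confirm. The function name `preview_rows` and the path in the commented usage line are hypothetical.

```python
import csv
from itertools import islice


def preview_rows(path, n_rows=5, delimiter=","):
    """Return the first n_rows rows of a delimited text file.

    Rows are read lazily, so the rest of the file is never loaded into memory.
    Assumes a plain-text, delimited file; adjust `delimiter` for TSV files.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        return list(islice(reader, n_rows))


# Hypothetical usage; replace the path with a real file from the data folders:
# for row in preview_rows("path/to/large_file.csv", n_rows=3):
#     print(row[:10])  # show only the first ten columns of each row
```
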
The Python files contain the code that was used to preprocess the raw data and to create the training, validation and test datasets. The Jupyter notebooks contain the code that was used to train the models. The "training_dataset_selection.ipynb" notebook contains the code that was used to determine which training dataset from the binary mutation matrix would be used to train each model. Many of the model training notebooks contain hyperparameter tuning cells which, if run, will take quite a while (at least an hour), so I would advise skipping these cells.
To run the Python scripts and the Jupyter notebooks, install the packages listed in the "requirements.txt" file, which records the packages that were installed in the project's virtual environment. These packages can be installed using the "pip install -r requirements.txt" command.
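
After installing, it may be useful to confirm that every package named in "requirements.txt" is present in the active environment. The sketch below is an optional check and is not part of the project code. The function name `missing_packages` is hypothetical, and the parsing assumes simple requirement lines such as `name==1.2.3`; URL-based or editable requirements are not handled.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def missing_packages(requirements_path="requirements.txt"):
    """Return the names listed in a requirements file that are not installed.

    Only the package name is checked, not whether the installed version
    matches the pinned one.
    """
    missing = []
    with open(requirements_path) as handle:
        for line in handle:
            # Drop comments and surrounding whitespace.
            line = line.split("#", 1)[0].strip()
            # Skip blank lines and pip options such as "-r other.txt".
            if not line or line.startswith("-"):
                continue
            # Keep only the name, before any version specifier, extra or marker.
            name = re.split(r"[<>=!~;\[ ]", line, maxsplit=1)[0]
            if not name:
                continue
            try:
                version(name)
            except PackageNotFoundError:
                missing.append(name)
    return missing


# Usage, run from the repository root after installing the requirements:
# print(missing_packages("requirements.txt"))
```
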