Many disease risks can be seen in a person’s genes, if only you know where to look. The challenge is that human DNA is a hard-to-read chain of more than three billion nucleobase pairs, with a total length of roughly two meters.
The development of a single disease can be influenced by hundreds of thousands of different gene variants, each located somewhere along the DNA chain. The significance of any single variant is typically minor, however: diseases are caused by a combination of genetic factors.
The FinnGen project for charting the genes of the Finns is exceptionally ambitious. It is also one of only a few European scientific projects of this scale whose data is uploaded to the cloud and analyzed through cloud services. Qvik has built a system for the secure storage and use of the FinnGen project’s genome data on Google Cloud Platform.
Challenges in data security and the enormous volume of data
The six-year FinnGen project is a joint effort by the public sector and pharmaceutical companies. It has the potential for groundbreaking results by combining genome information with disease history and drug prescription data drawn from national health records.
Drawing connections between specific genetic variations and diseases requires a large volume of research data, which places special demands on the system.
“Even though the genetic information has not been linked to personal data, genomic data is always classified as sensitive data”, says Jarmo Harju, the IT and data lead of the FinnGen project.
The processing of this type of data is subject to strict regulations.
“The scientists are only permitted to access a limited dataset, which must be impossible to copy or download. We were very pleased that the implementation of the system stayed on schedule, even though the specifications were being detailed in real time during the project.”
Use of cloud services in scientific projects still novel
The system built by Qvik allows the FinnGen project’s scientists to access the data required for core analyses. The participating pharmaceutical companies, on the other hand, have their own sandboxes, through which they can access the results of the core research performed by FinnGen’s own group of scientists.
The cloud service makes the work of the scientists easier in many ways, and the project has made some waves in the scientific community.
“Many medical research teams have approached us to ask about the implementation of the project, the types of agreements involved and our user experiences”, Harju says. “For example, the unlimited capacity and uninterrupted service offered by cloud services make a scientist’s life easier in a multitude of ways.”
The principal concern in medical research projects, however, is data security.
“The FinnGen project is fascinating because of the wide network of academic research institutes and pharmaceutical companies involved. Opening an internal network for so many external users would not be a feasible solution, but cloud services offer secure solutions for this”, Harju says. “Many research teams have been interested in how Qvik has implemented its sandboxes, for example.”
The research environment for the FinnGen project has now been set up, and the actual research can begin.
Illustration: Aija Malmioja