Big data promises better and faster diagnoses. But managing the increasing volume of data, creating common standards and guaranteeing patient privacy will require medical researchers to coordinate their efforts. Many Swiss organizations are working on it.

Florian Fisch
Swiss National Science Foundation | Science editor


Why data is continuously getting bigger

On first hearing, ‘big data’ sounds great. The more we know about the human body, brain or gene expression, the better we can treat patients. So, many researchers and companies try to sell their projects or products using this buzzword. But on closer examination, it quickly becomes clear that bigger data does not mean better knowledge. Often people are overwhelmed by the flood of data and struggle to make sense of it – especially in the life sciences.

Take the intensive care unit. According to Switzerland’s National Research Program Big Data (NRP 75), a single, critically ill patient generates up to 100 gigabytes of data every day. This data comes from patient monitoring, computed tomography and magnetic resonance imaging of the brain, laboratory results and biosensors. The monitoring system triggers about 700 alarms a day – one alarm every two minutes – and most of them are false. It is clear that patient safety can be improved by getting to grips with that amount of information.

As we continue to use more devices and more assays, the data keeps growing. It ranges from gene expression data to daily activities recorded on smartwatches worn by patients or trial participants. Even environmental information about where these people live is recorded. It is an enormous quantity of data from different sources and of varying quality. The data mountain gets bigger every day. That is a simple fact.


Better and faster diagnoses

Once it becomes possible to make sense of this flood of data – and currently the life sciences are struggling to stay afloat – there are many promising uses. Take the safety of critically ill patients in the intensive care unit. A research project by the University Hospital Zurich, ETH Zurich and IBM Research is developing procedures to filter out the false alarms that occur almost every two minutes, and thereby to enable early detection of epileptic seizures and the diagnosis of secondary brain damage caused by cellular processes.

With data mining and machine learning, the researchers want to improve the quality of the alarm system and rapidly propose innovations to the medical community. “With this project, we want to initiate a fundamental development in emergency and intensive care medicine – and thus significantly improve the way hospitals work in day-to-day practice”, says Emanuela Keller, professor at the University Hospital Zurich, in the NRP 75 press release.
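As an illustration only – the project’s actual methods are not described in this level of detail – a machine-learning alarm filter can be caricatured in a few lines of Python. Every feature, number and threshold below is invented:

```python
# Toy sketch of an alarm filter: learn to separate clinically relevant
# alarms from false ones using labelled monitoring snapshots.
# All features and distributions are synthetic, for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400

# Invented features per alarm: heart rate, blood pressure, signal quality.
true_alarms = np.column_stack([
    rng.normal(140, 10, n),    # tachycardia
    rng.normal(80, 8, n),
    rng.normal(0.9, 0.05, n),  # clean sensor signal
])
false_alarms = np.column_stack([
    rng.normal(90, 10, n),
    rng.normal(115, 8, n),
    rng.normal(0.4, 0.1, n),   # noisy signal, e.g. a detached sensor
])
X = np.vstack([true_alarms, false_alarms])
y = np.array([1] * n + [0] * n)  # 1 = clinically relevant alarm

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# A new alarm with a noisy sensor signal is classified as likely false.
print(clf.predict([[95, 110, 0.35]]))  # prints [0]
```

In a real intensive care unit the hard part is not the classifier but the labelled data and the cost of a missed true alarm, which is why such systems are research projects rather than products.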

Less urgent, but no less important, is deciding which therapy is suitable for a specific type of cancer or another disease. This means looking for informative biomarkers – these include “omics” data (such as DNA sequences, gene expression profiles and metabolite levels), images, data from biobanks, doctors’ diagnoses and environmental data. “Thanks to bioinformatics methods and tools, and because we now have hundreds of thousands of data points, researchers can use tailor-made algorithms to identify biomarkers or common patterns. We can ask the algorithm: ‘please find a difference between patients with disease X and healthy individuals’”, says Valérie Barbié from the Swiss Institute of Bioinformatics (SIB).

This could help make treatment decisions for a cancer more precise, find unknown environmental factors that influence a lung problem, or make it possible to diagnose a very rare bowel disease. SIB is working on the technical infrastructure, analysis methods, software tools and knowledge bases to make that dream come true.
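The “please find a difference” request can be sketched with synthetic data. A real analysis would use far more sophisticated methods and correct for multiple testing and confounders; the gene, the numbers and the simple per-gene test here are purely illustrative:

```python
# Toy differential-expression sketch: find the gene whose expression
# best separates patients from healthy controls. Data is synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_genes, n_samples = 100, 30
controls = rng.normal(0, 1, (n_genes, n_samples))
patients = rng.normal(0, 1, (n_genes, n_samples))
patients[42] += 3.0  # gene 42 is strongly up-regulated in "disease X"

# One t-test per gene: which gene differs most between the groups?
pvals = np.array([stats.ttest_ind(patients[g], controls[g]).pvalue
                  for g in range(n_genes)])
print(int(pvals.argmin()))  # → 42
```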


Biology is complex and delivers noisy data

The term big data comes from information technology: all mobile phone connection data, or the petabytes (10¹⁵ bytes) CERN has gathered. Such data is very well structured. Not so in the life sciences, where the data is highly heterogeneous – it varies between healthy individuals and is very sensitive to small perturbations. In addition, there is an astronomical number of possible interactions with molecules or physical factors. “And you can easily trick your data to get a significant p-value with almost any data. Whatever you are putting into your algorithm, you will get the answer you want”, warns Valérie Barbié.
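This warning is easy to demonstrate with pure noise: test enough hypotheses and “significant” results appear by chance alone. A toy sketch, with every number invented:

```python
# Demonstration of the multiple-testing trap: 1000 "biomarkers" that are
# pure random noise, compared between two groups of 20 samples each.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group_a = rng.normal(0, 1, (1000, 20))  # no real effect anywhere
group_b = rng.normal(0, 1, (1000, 20))

pvals = [stats.ttest_ind(a, b).pvalue for a, b in zip(group_a, group_b)]
hits = sum(p < 0.05 for p in pvals)
print(hits)  # roughly 50 "significant discoveries", all of them spurious
```

At a 5% significance threshold, about one in twenty noise comparisons passes, which is exactly why untreated big data can be made to say almost anything.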

To base conclusions on a solid foundation and deliver on these promises, standards have to be established for how data is generated, handled and analyzed. The way data is produced has to be described clearly; the exact type of assay might also matter. Algorithms need to be able to distinguish between medium-quality data from a smartwatch, for example, and high-quality data from a medical brain scan. One of the challenges for the Swiss Personalized Health Network (SPHN) is therefore to implement standards, which are essential for aggregating, comparing and analyzing data from different sources, such as hospitals in different parts of the country. Such standards have long been established in other industries, such as banking.


Physicians lack clear standards

“If the data you put into the analysis is not well characterized, the answers you get out will lead to wrong diagnoses”, says Valérie Barbié. That is a problem, because every country, every hospital and every medical field has its own vocabulary and its own categories. A condition called colorectal cancer in one place may be recorded as colorectal adenocarcinoma in another, to name one simple example. This means the data is not comparable. And only now are hospitals across Switzerland starting to introduce electronic health records (EHRs).
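The vocabulary problem has a conceptually simple, if practically hard, fix: map every local label to one shared code before pooling the data. A toy sketch, with invented codes rather than any real terminology standard:

```python
# Toy terminology harmonization: translate each hospital's local diagnosis
# label into one shared code. The "DX:" codes are invented stand-ins,
# not real clinical terminology identifiers.
LOCAL_TO_STANDARD = {
    "colorectal cancer": "DX:CRC",
    "colorectal adenocarcinoma": "DX:CRC",
    "Kolorektales Karzinom": "DX:CRC",  # German-speaking hospital
}

def harmonize(records):
    """Replace local diagnosis labels with the shared standard code."""
    return [
        {**r, "diagnosis": LOCAL_TO_STANDARD.get(r["diagnosis"], "DX:UNMAPPED")}
        for r in records
    ]

pooled = harmonize([
    {"hospital": "A", "diagnosis": "colorectal cancer"},
    {"hospital": "B", "diagnosis": "colorectal adenocarcinoma"},
])
print({r["diagnosis"] for r in pooled})  # → {'DX:CRC'}
```

Only after this step do records from hospital A and hospital B count as the same condition, which is the precondition for any cross-hospital analysis.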

The way hospitals are run adds to the confusion. Huldrych Günthard of the University Hospital Zurich heads the Swiss HIV Cohort Study, which collects data that is comprehensive, well structured and consistent over long periods of time. Even he struggles with this. As he explained to the Swiss research magazine Horizons in 2016: “Specific diagnoses in hospitals are sometimes distorted by economic factors, such as codifying invoices according to flat-rate payments.”

Here again, the SPHN comes into play: meetings are held to agree on international standards for how information is named and stored. To allow the exchange of data among researchers, the Swiss Academy of Medical Sciences (SAMS) has introduced a general consent form by which patients agree to the use of their data. In principle, therefore, the way is open for large studies.


It's difficult to guarantee privacy

Genomic data contains a lot of information – information we hope to use to decide how best to treat many conditions. But more is hidden in there: for example, information about other family members who may not have agreed to be part of the research. Many risk factors are genetic too, and might tell people something they do not want to know, perhaps because it scares them or because they cannot do anything about it. Or they may not want their insurer to know, as it might increase their premium or exclude them from coverage.

In a connected world, data is easily exchanged but also easily stolen. Many companies have fallen victim to attacks on their IT systems – even places considered safe. Hospitals, with their decentralized structures and disparate systems, are not the safest places for data to start with. And with the introduction of electronic health records, it becomes easier to access large quantities of data – to the benefit of research, but to the detriment of security.

How can we guarantee that the data is safe? Of course, the data has to be anonymized and encrypted. Researchers should not be allowed to download data but must work on secured research platforms. This is possible: Nordic and Baltic countries such as Estonia, Sweden and Denmark have shown that a health system can go digital without suffering major security breaches. This matters, because research relies on the general consent of patients and study participants.
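One building block of such platforms can be sketched in a few lines: pseudonymization, where direct identifiers are replaced by a keyed hash so records remain linkable across datasets without exposing identities. This is a minimal illustration, not a description of any actual Swiss platform; the key and identifier format are invented:

```python
# Toy pseudonymization: replace a patient identifier with a keyed hash
# (HMAC-SHA256). The same patient always maps to the same pseudonym, so
# records can be linked, but the name cannot be read off the data.
import hashlib
import hmac

# Invented key; in practice it would be held by a trusted third party.
SECRET_KEY = b"kept-by-a-trusted-third-party"

def pseudonym(patient_id: str) -> str:
    """Deterministic, key-dependent pseudonym for a patient identifier."""
    return hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256).hexdigest()[:16]

# The research record carries the pseudonym, never the identifier itself.
record = {"patient": pseudonym("756.1234.5678.97"), "lab_value": 4.2}
print(record["patient"] != "756.1234.5678.97")  # → True
```

Real platforms layer encryption, access control and audit trails on top of this, and guard the key carefully: whoever holds it can re-identify every record.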


THE PARTICIPANTS: Switzerland is developing big data

  • Swiss Personalized Health Network (SPHN)
    The federal initiative brings all decision-makers to the table and contributes to the development, implementation and validation of coordinated infrastructures for personalized health research across Switzerland. The goal is to make health data accessible nationwide while preserving individuals’ privacy. The initiative is coordinated by the Swiss Academy of Medical Sciences (SAMS) and the Swiss Institute of Bioinformatics (SIB).
  • Swiss Biobanking Platform (SBP)
    The national coordination platform for human and non-human biobanks aims at improving the quality and the interconnectedness of biobanks for research purposes. SBP was initiated by the Swiss National Science Foundation (SNSF). The SNSF also issues BioLink grants to network biobanks for research purposes in collaboration with SBP.
  • National Research Program Big Data (NRP 75)
    The program supports research projects that provide foundations for the effective and appropriate use of big data. Commissioned by the Swiss Federal Council, it is run by the Swiss National Science Foundation (SNSF). The research projects have a total budget of 25 million Swiss francs and will run until 2021.
  • Swiss Clinical Trial Organization (SCTO)
    The organization coordinates a network of seven Clinical Trial Units (CTUs) across Switzerland and is the central cooperation platform for patient-oriented clinical research. The aim is to improve the quality, efficiency and transparency of clinical research and to make it more visible. The SCTO was jointly founded by the Swiss National Science Foundation (SNSF) and the Swiss Academy of Medical Sciences (SAMS) and is today funded by the State Secretariat for Education, Research and Innovation (SERI) and the SNSF.
  • Longitudinal studies
    This funding instrument of the Swiss National Science Foundation (SNSF) supports multi-centric, population-based or disease-oriented studies with a longitudinal design. Such studies follow a group of people who share a defining characteristic over a long period of time. They are dedicated to topics of high relevance to the Swiss health system.