Is Medicine a Big Data Problem?

Ted Driscoll

This post was originally published at The Health Care Blog; please check there to see comments.


Human beings are big data. We aren’t just 175 pounds of meat and bone. We aren’t just piles of hydrogen and carbon and oxygen. What makes us all different is how it’s all organized, and that organization is information.

We can no longer treat people based on simple numbers like weight, pulse, blood pressure, and temperature. What makes us different is much more complicated than that.

We’ve known for decades that we are all slightly different genetically, but now we can increasingly see those differences. The Hippocratic oath will require doctors to take this genetic variability into account.

I’m not saying there isn’t a place for hands-on medicine, empathy, psychology and moral support. But the personalized handling of each patient is becoming much more complicated. The more data we can gather, the more clearly each individual differs from every other.

In our genome, we have approximately 3 billion base pairs in each of our trillions of cells. We have roughly 20,000 protein-coding genes in that genome, and their coding regions are sometimes called the exome. Each gene contains instructions on how to make a useful protein. And then there are long stretches of our genomes that regulate those protein-manufacturing genes.
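To put a number on that, here is a back-of-the-envelope sketch. It assumes the simplest possible encoding, two bits per base (A, C, G, T), and counts only one copy of the 3 billion base pairs; the function names are mine, purely for illustration. Real sequencing files are far larger because they also carry quality scores and redundant reads.

```python
# Illustrative sketch only: a 2-bits-per-base encoding is an assumption,
# not how sequencing pipelines actually store genomes.
ENCODING = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack(seq: str) -> int:
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | ENCODING[base]
    return value


def raw_genome_megabytes(base_pairs: int = 3_000_000_000,
                         bits_per_base: int = 2) -> float:
    """Raw size of one genome copy in megabytes (10**6 bytes)."""
    return base_pairs * bits_per_base / 8 / 1_000_000


if __name__ == "__main__":
    print(bin(pack("GATTACA")))    # seven bases fit in fourteen bits
    print(raw_genome_megabytes())  # raw size of one copy, in MB
```

Under those assumptions a single copy of the genome comes to about 750 megabytes, which is tiny next to the epigenome, microbiome and virome data discussed below.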

In the early days, some researchers called this “junk DNA” because they didn’t know what it did. But this was foolish because why would evolution conserve these DNA sequences between genes if they did nothing? Now we know they too do things that make us unique.

Recent science has found there are variations in each individual’s genome — in raw numbers more than 60 million locations where individual base pairs vary from one person to another. They are called SNPs or single nucleotide polymorphisms. This is a large part of what makes us different. Science has so far figured out what a few million of those SNPs do.

And it isn’t even as simple as that. Overlaying our incredibly complex genomes are our epigenomes. This is a “layer” that turns our genetic code up or down, on or off. It is affected by our past and even our ancestors’ past. So that makes us even more individually unique. Even identical twins who have identical genomes can have distinct epigenomes.

And we are “villages”, not just individual creatures. By a long-quoted estimate, we carry ten times as many bacterial cells in our bodies as we have cells of our own, though more recent estimates put the ratio closer to one to one. This collection of bacteria we each carry is called our microbiome. And each individual bacterial cell has its own genome, albeit smaller than ours. The population distribution of those bacteria is important to our health.

We can’t survive without some of them, but others, or distorted populations of others, can make us sick. They have to be taken into account in evaluating health and determining personalized treatment.

And there are even more viruses floating around in our village. This is called our virome. Many of those viruses infect our bacterial inhabitants, not us. Some affect us but benignly. Others threaten us.

And I haven’t even touched on the metabolome, the immunome, our T-cell repertoire…

The point is: we are all defined by the information content held in our bodies. This is analogous to computer code. It is these instructions that make us unique, that run our systems, and that help us cooperate with our bacterial villagers or fight them if they attack us. In short, we are big piles of code, not in bits and bytes, but in base pairs. And those bacterial cells and viruses are also just stray pieces of code that direct how they operate or infect.

This big pile of various codes is measured in many terabytes — trillions of bytes of information. And it’s unique for each individual human being. The medical profession will soon be unable to diagnose and treat us without using that information, and it’s too voluminous to be processed in a doctor’s mind. It requires IT, Information Technology, to find the signal in the vast noise, to personalize the treatment for every patient.

This is the world we are going to be living in very soon.

This is the world I am investing in as an early-stage digital health venture capitalist. My focus is simple: helping doctors make better, more personalized treatment decisions. And that means gathering the vast amount of data encoded in our bodies and using it to personalize diagnosis and treatment.

I’m focusing my attention on three general areas right now, because I think they will have an effect in the time frame I’m looking for: prenatal testing, personalized drug selection, and cancer detection and stratification.

Curiously, I’m coming at this problem area as a computer scientist who has been playing with big datasets my entire career, mainly in imaging, where I was pushing giant pixel datasets. And interestingly, many of the startups I’ve funded are run by founders who come from CS or EE and have limited medical backgrounds. The digital world is upon us, and perhaps no field needs it more than medicine.

Finally, we are coming to the point where we don’t diagnose disease by a rash or temperature, by a lump or even a stained microscope slide. We are coming to know the actual molecular pathways of disease, the actual atomic mechanisms that are going wrong or being hijacked by a pathogen.

We can now see a future where we will detect the earliest molecular signs of a disease and can molecularly address it before it is even symptomatic, before there is a rash or a lump.

This transition has been going on at least since Watson and Crick discovered the double helix structure of DNA. It will continue being unveiled for at least another fifty years. But the magnitude of our understanding has gotten so big that computers must be involved going forward. And this is a true revolution. I would argue it is the greatest revolution humanity has ever experienced.

So how is medicine going to adapt to this changing world? Initially our electronic medical records are going to have to capture this giant dataset for each individual. We haven’t even finished digitizing all those handwritten paper medical records, and they are now going to grow exponentially with all this new “-omic” data.

Some of that dataset is constant through a person’s life, for example our genome. Yes, it slowly ages and our telomeres slowly shorten, and rarely we are struck by a cosmic ray that knocks a base pair out of our genetic sequence in one cell. But most of the individual genome we are born with is the same as the one we die with.

In my opinion, our genomes will be captured at or even before birth in the near future, and will inform our doctors on what will work and what won’t, throughout our lives.

But the other sources of the big data we carry are changing as we age. Our epigenome is slowly adapting to experiences we’ve had. Our microbiome and virome are shifting every day, depending on what we’ve eaten, and where we’ve been. So doctors will likely have to collect more data throughout our lives, as we age and change.

And it won’t all be reading our “-omic” data and interpreting it. Soon we will be writing that data, changing our genomes, and changing the genomes of the villagers we live with. This will raise powerful ethical questions that will be the most complex ones humanity has ever encountered.

Life has been automatic for billions of years, very slowly evolving to adapt and survive. In the next century, we will be able to take over that automatic process, to create life forms, to bypass evolution, to improve our own genomes, to hijack others. But should we? Some changes will save lives. Some will improve them.

Where will we draw the line?

The revolution we are in the midst of is huge. It’s taking us a century to figure this out and we are still a few decades from having the complete picture. But we are coming to understand the workings and programming of our bodies, of life itself, at the atomic level. And soon we will be able to write the code of life, not just read it.

There truly is no lower level. This is a momentous transition. Life has been on this planet for 3 or 4 billion years. Humanity is going from using leeches and tourniquets to genetic engineering in a couple of hundred years. That’s roughly 0.000006% of the time life has been evolving on this planet. Compress that history into a single metaphorical year and it’s a flash lasting about two seconds.
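The arithmetic behind that comparison is easy to check. This is a rough sketch that assumes “a couple of hundred years” means 200 and tries both ends of the 3 to 4 billion year range; the function names are mine, for illustration only.

```python
# Rough sanity check; the 200-year figure is an assumed reading of
# "a couple of hundred years".
SECONDS_PER_YEAR = 365 * 24 * 3600


def percent_of_lifes_history(years: float, life_age_years: float) -> float:
    """Share of life's history covered by `years`, as a percentage."""
    return years / life_age_years * 100


def seconds_in_compressed_year(years: float, life_age_years: float) -> float:
    """Length of `years` if all of life's history were squeezed into one year."""
    return years / life_age_years * SECONDS_PER_YEAR


if __name__ == "__main__":
    for life_age in (3e9, 4e9):
        pct = percent_of_lifes_history(200, life_age)
        secs = seconds_in_compressed_year(200, life_age)
        print(f"life age {life_age:.0e} years: {pct:.7f}% or {secs:.1f} s")
```

Either way the answer is a few millionths of a percent, or a second or two out of the compressed year.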

But we will have to confront this revolution because it is happening around us as we sit here. It will have many outcomes and changes. It will save lives and reduce suffering. But it will also raise big ethical questions, and they won’t be uniformly decided across the planet in every country, in every culture and religion.

For billions of years, we’ve been passengers on the bus of life. Now we are seizing the steering wheel. Interesting times ahead.