Big Data – Driving Precision Medicine

I heard the term Big Data for the first time three years ago, when I was participating in the American Association for Cancer Research’s (AACR) Annual Meeting as part of the Scientist-Survivor Program (SSP). Each survivor participant in the SSP is placed in a working group whose task is to answer a question relevant to the meeting and present its findings to the entire SSP group. My working group’s question was…

“As a direct consequence of the exponential increase in the number of cancer patients having their genome sequenced via WGS/NGS – and the phenotypic (clinical) data from patients that is being collected in tandem – there is a data ‘tsunami’ that is already overwhelming our ability to manage and analyze it. From what you know and what you observed at this Annual Meeting, do you think this is a major barrier to realizing what is being touted as the future of medicine – i.e. molecularly targeted cancer medicine (precision/personalized medicine)? What should be done on a national basis to address this problem – look around you, are we dealing with big data elsewhere?”

When I received this question weeks before the actual meeting, I thought, “What did I get myself into?” I had no idea what WGS or NGS stood for, and I wasn’t sure I fully understood what “phenotypic” and “Big Data” meant. Before the meeting, I made a few phone calls to my buddies at the International Myeloma Foundation and the Multiple Myeloma Research Foundation, hoping they would help me get a preliminary understanding of these terms.

During the SSP’s opening session at the AACR’s conference in San Diego, I was introduced to my working group. As a group, we attended various sessions and examined posters related to our question. We worked together before and after the general and poster sessions to answer it. In retrospect, I realize it was a forward-thinking question: we were examining the issues associated with Big Data before President Obama announced his Precision Medicine Initiative on January 20, 2015. Precision Medicine is dependent on Big Data.


What are WGS, NGS, and Big Data, and what challenges must be overcome when using Big Data to guide Precision Medicine? WGS and NGS stand for Whole Genome Sequencing and Next Generation Sequencing, respectively. Genome sequencing means determining the order of the DNA bases in a genome. Sequencing can help identify an error, or mutation, in a gene that may cause a disease. It can also reveal larger changes in the chromosomes themselves.

One such change is a translocation, a chromosomal rearrangement in which a segment of genetic material from one chromosome becomes attached to another chromosome. An example is the t(4;14) translocation in myeloma. At other times, part of a chromosome or a sequence of DNA is lost, for example during DNA replication. This is called a deletion. Deletion of part of chromosome 17, known as del(17p), is a high-risk marker in myeloma. Two other kinds of chromosomal change are duplications and inversions. In a duplication, a segment of a chromosome is copied, so the same genetic material appears twice. In an inversion, a segment of a chromosome breaks off, flips around, and reattaches in the reverse orientation.
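For readers who think in code, the four kinds of chromosomal change described above can be sketched as simple string operations. This is a toy illustration only. It assumes each letter stands for a whole chromosome segment rather than a single DNA base. The function names and example "chromosomes" are hypothetical and are not part of any real sequencing tool.

```python
# Toy model (hypothetical): each letter stands for a chromosome *segment*
# (a block of genes), not a single DNA base, so an inversion is just a reversal.

def deletion(chrom: str, start: int, end: int) -> str:
    """Remove segments start..end-1 (a piece of the chromosome is lost)."""
    return chrom[:start] + chrom[end:]

def duplication(chrom: str, start: int, end: int) -> str:
    """Copy segments start..end-1 so they appear twice in a row."""
    return chrom[:end] + chrom[start:end] + chrom[end:]

def inversion(chrom: str, start: int, end: int) -> str:
    """Flip segments start..end-1 so they read in the opposite direction."""
    return chrom[:start] + chrom[start:end][::-1] + chrom[end:]

def translocation(chrom_a: str, chrom_b: str, cut_a: int, cut_b: int) -> tuple[str, str]:
    """Swap the tail ends of two different chromosomes (a reciprocal translocation)."""
    return chrom_a[:cut_a] + chrom_b[cut_b:], chrom_b[:cut_b] + chrom_a[cut_a:]

chrom1 = "ABCDEFG"   # made-up chromosome, uppercase segments
chrom2 = "uvwxyz"    # a second made-up chromosome, lowercase segments

print(deletion(chrom1, 2, 4))                # ABEFG
print(duplication(chrom1, 2, 4))             # ABCDCDEFG
print(inversion(chrom1, 2, 5))               # ABEDCFG
print(translocation(chrom1, chrom2, 4, 3))   # ('ABCDxyz', 'uvwEFG')
```

In the translocation example, the tails of the two chromosomes trade places, so segments from each one end up attached to the other.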

Next Generation Sequencing

NGS was once very expensive and time-consuming. Because of the cost, few people had their tumors sequenced for mutations. But the cost and time needed to complete a sequence have decreased dramatically, making this an option for more people. More genomes are being sequenced than ever before, and more data is being produced. This is a good thing: larger sets of NGS data matched to clinical responses will lead to more predictive models for targeted therapies.

On the other hand, genome sequencing creates so much data that we currently don’t know what to do with all of it, but we are learning! Michael Schatz, a professor at Cold Spring Harbor Laboratory in New York, said that if all the genomic data collected so far, together with all the extra information that comes with sequencing genomes, were recorded on typical 4-gigabyte DVDs, the stack would be about half a mile high. Now that’s BIG!
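Schatz’s comparison can be turned around to estimate how much data it implies. The quote doesn’t state the underlying figure, so this is only a rough back-of-envelope sketch. It assumes each DVD is 1.2 mm thick (the standard disc thickness) and that sizes use decimal units.

```python
# Back-of-envelope estimate of the data implied by a half-mile stack of DVDs.
# Assumptions (not from the original quote): 1.2 mm per disc, decimal units
# (1 PB = 1,000,000 GB). Integer micrometres keep the division exact.

MICROMETRES_PER_MILE = 1_609_344_000   # 1 mile = 1,609.344 m exactly
DISC_THICKNESS_UM = 1_200              # 1.2 mm per disc
GB_PER_DISC = 4                        # "typical 4-gigabyte DVDs"

stack_height_um = MICROMETRES_PER_MILE // 2        # half a mile
discs = stack_height_um // DISC_THICKNESS_UM       # number of discs in the stack
total_gb = discs * GB_PER_DISC                     # total capacity in gigabytes
total_pb = total_gb / 1_000_000                    # same total in petabytes

print(f"{discs:,} discs ≈ {total_gb:,} GB ≈ {total_pb:.2f} PB")
```

Under these assumptions, the stack works out to 670,560 discs, or roughly 2.7 petabytes of data.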

Big Data and Precision Medicine

The National Institutes of Health (NIH) defines precision medicine as an emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle for each person. How is this done? This is where Big Data comes into play. Big Data refers to extremely large sets of data drawn from multiple sources. In cancer research, this data comes from genetic testing, electronic health records, doctors’ notes, responses to treatment, imaging studies, lifestyle information, and countless other sources. The data is aggregated and analyzed in the hope of building better predictive models of how to treat a particular individual.

Managing Big Data comes with challenges. Some of these challenges can be described using the four V’s: volume, variety, velocity, and veracity. Volume refers to the amount of data being generated. Variety refers to the different types of data that can be gathered. Velocity describes the speed at which new data is generated, and veracity signifies the trustworthiness of the data. In recent years, advances in technology have caused the volume, variety, and velocity of data to grow exponentially. Finding effective ways to tackle these challenges will pave the way for the new era of Precision Medicine.

The medical community doesn’t need to start from ground zero in dealing with the challenges of Big Data. Data scientists working in healthcare can learn from businesses such as Amazon, Google, and Netflix, which have been successfully mining Big Data for quite a while. When I attended the Cancer Moonshot Summit in Washington, DC, I learned of several collaborations in this area. For example, IBM’s Watson, of Jeopardy! fame, will help doctors expand access to precision medicine for 10,000 American veterans. Using Watson’s advanced computing technology, Veterans Affairs (VA) doctors will be able to quickly identify genomic treatment options for veterans. VA doctors will sequence a cancer patient’s DNA, and the resulting files will be fed into Watson. In return, Watson will generate a report that lists the likely cancer-causing mutations and potential treatment options.

Making personalized medicine a reality requires accessing and processing vast volumes of structured and unstructured data about individual patients. Cloud-based storage is needed to save and access these extremely large data sets. Data needs to be shared and open for all to study (including patients, in my opinion). Interoperability problems between different electronic health record systems need to be overcome. Partnerships must be forged among data scientists, supercomputing experts, pharmaceutical and biotech companies, foundations, advocacy organizations, philanthropists, and government agencies. Patients need to be willing to share their tissue samples, bone marrow samples, and genomic data. We have the tools to do this. Now is the time to forge ahead.

Cindy Chmielewski Precision Medicine Advocate

Cynthia Chmielewski was diagnosed with multiple myeloma, a blood cancer, in 2008. Cindy’s induction therapy stopped working after a few cycles, and she proceeded with a stem cell transplant, which failed to put her into remission. Depressed and scared, she continued her fight with newly FDA-approved targeted therapies, which eventually put her into remission. Cynthia continues treatment with a maintenance protocol. She is using her passion for education to teach a new group of “students”: myeloma patients, their caregivers, and others interested in myeloma. She is a trained mentor, advocate, and Patient Ambassador.