Recommendations for the White House Cancer Moonshot Task Force
Earlier this year, Vice President Biden gave me a homework assignment in his dining room.
He asked me, along with a few other data and analytics industry folks, to join him for a discussion about the significant role that data could play in doubling the pace of progress in cancer research and accelerating the discovery of treatments.
Now, you might be wondering, what did I stand to contribute to the Vice President’s Cancer Moonshot initiative? It’s a reasonable question, as cancer isn’t commonly thought of as a data problem, and I’m not a doctor or a medical researcher. But I have spent my career using data to improve outcomes that matter, including bringing cutting-edge analytics to President Obama’s 2012 re-election campaign and working to identify the Americans most likely to lack health insurance.
It turns out that cancer is — increasingly — a big data problem.
Gene sequencing, the critical tool for identifying cancer-causing genetic mutations, can produce over 100 gigabytes of data per person, roughly the amount of information your personal laptop can store. At the same time, your hospital and insurance company are generating large amounts of important digital medical history. All that data could be used to better diagnose cancer patients (who’s sick?), to improve precision delivery (what treatment should they receive?), and to identify new types of therapy (what new treatment will help this subset of patients?). Right now, though, it’s not being used to its full potential. As cancer data collection technology improves, research centers need to learn how to store, aggregate, integrate, analyze, and interpret these vast data sets to improve research and delivery. And it’s not easy.
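To give a rough sense of the scale, here is a minimal back-of-envelope sketch in Python. The roughly 100 gigabytes per sequenced person comes from the paragraph above; the patient counts and center counts are illustrative assumptions of mine, not figures from our report.

```python
# Back-of-envelope estimate of raw sequencing storage at a cancer research center.
# The ~100 GB per sequenced patient figure is from the text above; the patient and
# center counts below are illustrative assumptions, not report findings.

GB_PER_PATIENT = 100  # approximate raw sequencing output per person

def storage_needed_tb(patients_per_year, years=1):
    """Return total storage in terabytes for a given sequencing volume."""
    total_gb = GB_PER_PATIENT * patients_per_year * years
    return total_gb / 1_000  # 1,000 GB per TB (decimal convention)

# A single hypothetical center sequencing 5,000 patients a year...
print(f"One center, one year:   {storage_needed_tb(5_000):,.0f} TB")
# ...versus pooling data from ten such centers over five years.
print(f"Ten centers, five years: {storage_needed_tb(5_000 * 10, years=5):,.0f} TB")
```

Under those assumed numbers, a single center accumulates hundreds of terabytes a year, and pooling across institutions quickly reaches the petabyte range, before any clinical records are added.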
The Vice President’s charge to us was simple: “Tell me what I need to know. Tell me where we can make the most progress.”
So we got to work.
Together with a dedicated team at Civis Analytics, we worked to craft a set of data-centric policy recommendations for the Moonshot team. Over the course of four months, we talked to representatives from over 40 institutions, including pharmaceutical companies, academic research groups, insurers, legal teams, commercial and non-profit groups working on data systems, and survivors.
Then, we put together a report on what we learned. To read our recommendations, download the full report.
Good data science requires three big things: a comprehensive dataset to analyze, technology infrastructure to store and analyze that dataset, and skilled personnel to structure the data and carry out the analysis.
Right now, however, there are major systemic barriers that prevent the U.S. cancer research system from meeting these foundational requirements for good data science. As a result, there is a consensus within the community that the system is under-delivering on its potential to turn new data sources into better research. Our conversations identified challenges that range from complex technical problems, like how to efficiently store and analyze vast amounts of genetic sequencing data, to bureaucratic barriers around information sharing that slow innovation.
While it’s tempting to endorse a specific solution, like the newest subfield of research or a given institution, we recommend systemic reform in three key areas: data sharing, data infrastructure, and people and skills.
Despite these challenges, the field has never been more hopeful about its future. Our conversations suggest that there is enormous promise in bringing together individual-level genomic and clinical data and using data science techniques to uncover patterns. By supplementing existing basic and clinical research with data science, researchers can accelerate the development of new treatments.