Researchers need guidance as they navigate a jungle of biomedical data in their search for therapies, prevention techniques and cures for diseases.

To assist them, the National Institutes of Health has awarded USC a three-year, $6.3 million grant to build Big Data U, the nation’s first so-called Training Coordination Center. It will teach people from different backgrounds how to translate astronomical amounts of data into compatible and comparable statistics, with the goal of uncovering trends, interesting relationships and clustering effects.

“A lot of the big data we are dealing with haven’t even been collected yet,” said the project’s lead investigator, John Van Horn, PhD, associate professor of neurology and education, and director of the new Master of Science program in neuroimaging and informatics at the Keck School of Medicine of USC. “It’s still off in the future. What we do now and how we train people to be able to deal with that will prepare us for the time when getting many terabytes worth of data is considered trivial — a relatively small or even ‘cute’ little study.”

Big data science has moved away from the traditional reductionist model, in which a hypothesis is formed and then tested by isolating a single variable in a controlled experiment.

Disorders such as Alzheimer’s disease involve many interrelated components. Isolating a single variable in conditions involving the brain may provide one answer, but not necessarily the complete one, said Arthur Toga, PhD, a provost professor with joint appointments at the Keck School of Medicine and the USC Viterbi School of Engineering.

“We’re letting the data lead us to the discovery. It’s kind of an upside down way of thinking about things,” he said. “Big data allows us to look at all these variables simultaneously and put together a comprehensive picture. Only in concert do they produce the function and structure that you’re trying to understand. If you study only one variable at a time, you may never fully understand how it works.”

Big Data U, tentatively set to launch in the spring of next year, will be a hybrid of massive open online courses (MOOCs) and YouTube video tutorials. It will be a free resource for anyone who wants self-guided or semi-structured study of topics relevant to biomedical science. Social media tools will let users rate course content and help guide the selection of relevant training media.

“We will promote opportunities for big data research rotations, host ‘innovation labs’ for new grant proposal development, develop hackathons and other training activities,” Van Horn said. “Some of these activities will be up to the user to complete, but others will have an expectation of required completion and will entail a report or tangible product.”

The Training Coordination Center is part of the NIH’s Big Data to Knowledge (BD2K) initiative, launched in 2012 to transform how science is done. BD2K has 11 Centers of Excellence for Big Data Computing, two of which are at USC: the Big Data for Discovery Science Center, with Toga as principal investigator, and the ENIGMA Consortium, with Paul Thompson, PhD, as principal investigator. Stanford University, Harvard Medical School and UCLA also host Centers of Excellence.

While each Center of Excellence has its own training responsibilities, Big Data U at USC is the only center tasked with coordinating these efforts into a concerted whole.

— Zen Vuong