Thanks to rapid advancements in imaging and genetics technologies, researchers are collecting more neuroscience data than ever. But storing, managing and disseminating these data is a challenge that requires immense storage and computing capabilities. Fortunately, a highly advanced data storage infrastructure exists at the USC Mark and Mary Stevens Neuroimaging and Informatics Institute (INI), which houses one of the world’s largest collections of brain data.

Institutions and scientists around the world rely on INI’s resources to conduct research. Those launching new investigations conserve time and energy by using the institute’s data repository to review the results of completed studies. With access to data from more than 500,000 subjects, researchers can also query existing datasets in new ways to obtain valuable insights. And when new data has been collected, scientists can securely store and share it with their colleagues.

“In order to understand the complexities of diseases like Alzheimer’s, researchers need vast numbers of observations to obtain adequate statistical power,” said INI Director Arthur W. Toga, PhD, Provost Professor of Ophthalmology, Neurology, Psychiatry and the Behavioral Sciences, Radiology and Engineering, and Ghada Irani Chair in Neuroscience at the Keck School of Medicine of USC. “The technology we use at the institute is revolutionary because it allows our collaborators to access and download massive datasets almost instantaneously. This enables big data approaches and solutions and is already accelerating the pace of discovery in ways that seemed impossible a decade ago.”


Data from every continent

INI currently stores more than 4,800 terabytes of information — roughly the equivalent of 50,000 movies in 4K — and has the capacity for nearly twice that amount. This includes data collected on every continent except Antarctica, from hundreds of different studies and laboratories.

Brain images, genetic sequences, clinical assessments and biological specimens are some of the data types available to researchers. Many studies in the archive focus on disease progression, as well as the processes of development and aging, with data across the lifespan from subjects aged 1 to 89.

The institute hosts data from its own leading research on Parkinson’s disease — the Parkinson’s Progression Markers Initiative — and Alzheimer’s disease, including the worldwide Alzheimer’s Disease Neuroimaging Initiative. Paul Thompson, PhD, professor of ophthalmology, neurology, psychiatry and the behavioral sciences, radiology and engineering and associate director of the institute, leads the Enhancing Neuro Imaging Genetics through Meta-Analysis (ENIGMA) consortium, which studies brain diseases and disorders such as schizophrenia and autism. ENIGMA relies on the institute’s resources to share hundreds of thousands of files with investigators around the world. Other USC researchers also store data at INI: Paul Aisen, PhD, professor of neurology, director of the Alzheimer’s Therapeutic Research Institute and one of the leaders of the National Institutes of Health’s new Alzheimer’s Clinical Trials Consortium, collaborates with INI to store and share massive quantities of data on the disease and its possible treatments.

In the past year, the data archive has facilitated more than 9 million downloads to investigators in 62 countries. “We’ve also welcomed 16 new studies into the data repository this year. The studies investigate a broad range of neurological diseases and disorders, including vascular contributions to dementia, pediatric deafness and health and aging in the Latino community,” said Karen Crawford, MLIS, management information systems manager at INI.

The archive, which receives a combination of federal and private foundation funding, lets investigators upload, download, search, monitor and visualize its diverse collection of datasets.
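For investigators who prefer scripting to a web interface, a programmatic workflow might look something like the sketch below. The endpoint, query parameters and response fields are illustrative placeholders only, not the institute’s actual API, and the access token stands in for whatever credentials the archive issues.

```python
# Hypothetical sketch of a programmatic archive workflow: search, then download.
# The URL, parameters and response fields are placeholders, not INI's real API.
import requests

ARCHIVE_URL = "https://archive.example.org/api"  # placeholder endpoint
TOKEN = "your-access-token"                      # archives typically require authorization


def search_datasets(disease: str, modality: str) -> list[dict]:
    """Search the archive for datasets matching a disease and imaging modality."""
    resp = requests.get(
        f"{ARCHIVE_URL}/search",
        params={"disease": disease, "modality": modality},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["results"]


def download_file(file_id: str, dest_path: str) -> None:
    """Stream a single file from the archive to a local path."""
    resp = requests.get(
        f"{ARCHIVE_URL}/files/{file_id}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        stream=True,
        timeout=300,
    )
    resp.raise_for_status()
    with open(dest_path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
            f.write(chunk)


if __name__ == "__main__":
    for record in search_datasets("Alzheimer's disease", "MRI"):
        download_file(record["file_id"], f"{record['file_id']}.nii.gz")
```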


Maintaining the system

Maintaining a world-renowned data archive requires more than just a vast amount of storage space. INI also has technology in place to facilitate access for authorized users and to protect data from potential intruders.

To guarantee reliable data access and reasonable download speeds, the institute also maintains a top-of-the-line supercomputing system specifically designed for big data applications. An onsite high-performance computing (HPC) cluster contains 4,096 processor cores and 38,912 gigabytes of memory. To put this in perspective, most laptops have just two to four processor cores and eight gigabytes of memory.
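That comparison can be made concrete with a little arithmetic using the figures above; the short sketch below measures the cluster against a typical four-core, eight-gigabyte laptop (the upper end of the range cited in the previous paragraph).

```python
# Compare the cluster's stated specs with a typical laptop, using the article's figures.
cluster_cores = 4_096
cluster_memory_gb = 38_912

laptop_cores = 4       # "two to four processor cores": using the upper end
laptop_memory_gb = 8

print(f"Cores:  {cluster_cores / laptop_cores:,.0f}x a typical laptop")        # ~1,024x
print(f"Memory: {cluster_memory_gb / laptop_memory_gb:,.0f}x a typical laptop")  # ~4,864x
print(f"Memory per core: {cluster_memory_gb / cluster_cores:.1f} GB")          # ~9.5 GB
```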

To safeguard sensitive information, the institute uses sophisticated security systems and biometric checkpoints to ensure that only authorized personnel can enter the data center. The archive also features an advanced virtual firewall system, along with multiple redundant switches, routers and Internet connections to guard against system failure.

— Zara Abrams