Achieving lead authorship on a peer-reviewed publication — which Shaina Sta. Cruz did for the first time in May — is a milestone for any aspiring researcher. Her study of automatic error correction in brain scan data, which she conducted as a research assistant at the USC Mark and Mary Stevens Neuroimaging and Informatics Institute (INI), was published in the journal Neuroinformatics.

Sta. Cruz, who completed her bachelor’s degree in communicative disorders at California State University, Fullerton in 2018, was part of the Big Data Discovery and Diversity through Research Education Advancement and Partnerships (BD3-REAP) program, which trains underrepresented students in big data science. Through the National Institutes of Health-funded BD3-REAP program, a group of undergraduates studying big data, computation and analytics receive training and mentorship from CSU Fullerton faculty in neuroscience, coding and research methods. The students then conduct research at INI each summer with guidance from the institute’s faculty and postdoctoral fellows.

“Part of our mission is to connect a broad range of students to research opportunities early in their careers, and to train them in the tools and methodologies they’ll need to be successful,” said Arthur W. Toga, PhD, Provost Professor of Ophthalmology, Neurology, Psychiatry and the Behavioral Sciences, Radiology and Engineering at USC and director of the INI. “We’re proud that our efforts are helping shape the careers of students like Shaina.”

Rethinking error correction for brain scans

In the paper, “Imputation Strategy for Reliable Regional MRI Morphological Measurements,” Sta. Cruz and her coauthors used machine learning to test an alternative to the long and laborious process of correcting errors that occur during the automated processing of brain scan data. Her coauthors include INI’s Farshid Sepehrband, PhD, assistant professor of research neurology at the Keck School of Medicine of USC; Hosung Kim, PhD, assistant professor of neurology at the Keck School; graduate student Clio Gonzalez-Zacarias; and Toga.

One part of the processing phase, called segmentation, separates brain imaging data into labeled regions that make it easier to study — but requires painstaking manual correction and saps researchers’ valuable time. As a possible fix, Sta. Cruz and her team tested a new strategy: Throw out the error-ridden part of the imaging data and use machine learning to predict the correct values for those missing parts. They compared this new method to the slower, manual error-correction process and found that machine learning could accurately estimate data values in large datasets.
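To make the idea concrete, here is a minimal sketch of what "discard the bad value, then predict it" can look like. It is an illustration only, not the method used in the paper: the article does not name the algorithm, so a simple nearest-neighbor imputation stands in for it, and the function name `knn_impute`, the parameter `k` and the regional measurements are all hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column among the k most similar rows.

    Illustrative stand-in only; not the algorithm from the paper.
    """
    def distance(a, b):
        # Compare only the regions that were measured in both subjects.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]  # copy, so the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: every other subject that has a value for this region.
            donors = sorted(
                (distance(row, other), other[j])
                for n, other in enumerate(rows)
                if n != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if math.isfinite(d)]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical regional measurements, one row per subject, one column per region.
# None marks a region whose segmentation was judged faulty and thrown out.
rows = [
    [2.50, 3.00, 2.80],
    [2.60, 3.10, 2.90],
    [2.00, 2.40, 2.20],
    [2.55, 3.05, None],
]

filled = knn_impute(rows, k=2)
print(round(filled[3][2], 2))
```

In this toy dataset, the fourth subject most closely resembles the first two, so the discarded value is estimated as the average of their measurements for that region, about 2.85. A real study would also need to check such estimates against manually corrected values, which is the comparison the article describes.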

“What started as a small summer research project grew into a full-fledged and innovative investigation, largely because of Shaina’s dedication and discipline,” said Sepehrband, one of Sta. Cruz’s mentors at INI. “Her work in machine learning and big data analytics is particularly impressive considering they did not relate directly to her undergraduate training.”

A bright future

For Sta. Cruz, studying statistics and data analytics was one of the main benefits of her work with INI. Now, as a PhD student in public health at the University of California, Merced, she’s looking for other ways to apply what she learned.

“I’m already finding that in public health research, we have the same questions about which statistical techniques are best for handling missing data,” she said. “It will be interesting to see how this line of research can apply more broadly beyond correcting errors in brain scan data.”

But the most meaningful takeaway from her work at USC involves a newfound confidence in her own research abilities.

“Being able to see the research process firsthand made it a lot less daunting,” Sta. Cruz said. “This program gave me the opportunity to problem-solve on my own in a mentored setting. It helped me trust that I really am cut out for research.”

— Zara Greenbaum