Jason Ernst: Computational Biology

courtesy of STEM to the Sky

Jason Ernst, Class of '98, is an Associate Professor of Biological Chemistry, Computer Science, and Computational Medicine at UCLA. His research group’s interdisciplinary work focuses on using computational methods to understand the human genome and interpret epigenomic data.

What are some of the research projects your lab is working on?

The projects we work on aim to understand the genome. The human genome contains 3 billion base pairs, and only about 1% codes for protein. The vast majority of the genome doesn’t code for proteins and is less well understood, but it can be very important to disease. A lot of genetic variation associated with disease falls in these regions, and there is a lot of interest in understanding these parts of the genome, which are involved in regulating genes.

My work has a lot to do with methods to interpret the non-coding genome and with epigenomic data, which is a type of data that looks at chemical markers on top of the DNA or on the proteins around which the DNA is packaged. In my group, we are developing and applying machine learning methods to take advantage of the high-throughput biological data that’s emerging.

More specifically, one project looked at comparing humans and mice at the epigenetic level. There might be thousands of datasets collected in humans, each mapping different chemical markers in different cell and tissue types. The situation is similar in mice, which are model organisms for a lot of human research. Researchers would like to be able to say whether a given human region is, in some sense, similar to a given mouse region.

Traditionally, there are systematic ways to do this at the sequence level, but we came up with a computational strategy that could score this at an epigenomic level by integrating information from lots of different datasets from humans and mice. It’s based on machine learning that automatically finds the relevant patterns to classify two regions that have evidence of this conservation across species. This is a project that was recently completed.
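As a rough illustration of the general idea only, and not the lab's actual method, the sketch below trains a small classifier to score how alike a human region and a mouse region look across several epigenomic signals. Everything in it is an assumption made for the example: the data are synthetic, the choice of four marks is arbitrary, and the function names (make_pair, pair_features, train, score) are hypothetical.

```python
import math
import random

def make_pair(rng, related, n_marks=4):
    """Return toy signal vectors for one human and one mouse region (synthetic)."""
    human = [rng.gauss(0, 1) for _ in range(n_marks)]
    if related:
        # Assumed "conserved" pair: mouse signal tracks the human signal plus noise.
        mouse = [h + rng.gauss(0, 0.5) for h in human]
    else:
        # Assumed "unrelated" pair: mouse signal is independent of the human one.
        mouse = [rng.gauss(0, 1) for _ in range(n_marks)]
    return human, mouse

def pair_features(human, mouse):
    """Element-wise products: tend to be positive when the two signals agree."""
    return [h * m for h, m in zip(human, mouse)]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, labels, epochs=100, lr=0.02):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def score(weights, bias, human, mouse):
    """Score in [0, 1]; higher means the pair looks more alike to the model."""
    x = pair_features(human, mouse)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

rng = random.Random(0)

# Training set: alternate unrelated (0) and related (1) synthetic pairs.
train_labels = [i % 2 for i in range(400)]
train_pairs = [make_pair(rng, bool(y)) for y in train_labels]
train_features = [pair_features(h, m) for h, m in train_pairs]
weights, bias = train(train_features, train_labels)

# Held-out set: compare average scores for related and unrelated pairs.
test_labels = [i % 2 for i in range(200)]
test_pairs = [make_pair(rng, bool(y)) for y in test_labels]
test_scores = [score(weights, bias, h, m) for h, m in test_pairs]

related = [s for s, y in zip(test_scores, test_labels) if y == 1]
unrelated = [s for s, y in zip(test_scores, test_labels) if y == 0]
mean_related = sum(related) / len(related)
mean_unrelated = sum(unrelated) / len(unrelated)
print(f"mean score, related pairs:   {mean_related:.3f}")
print(f"mean score, unrelated pairs: {mean_unrelated:.3f}")
```

In a real setting the feature vectors would come from many experimental datasets per species, and the training pairs would be derived from sequence alignments rather than simulated; the toy version only shows how a learned score can separate pairs that share a signal pattern from pairs that do not.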

Other work we do involves trying to understand whole-genome sequencing data, in particular rare non-coding variation in psychiatric disorders.

What are the benefits of working in an interdisciplinary field?

The field that I work in is inherently interdisciplinary. When you have these large biological datasets, you need both the computational researchers, who are familiar with computer science, statistics, and the approaches that best analyze the data, and the experimental researchers, who generate the data and focus on the biological questions.

What does a typical workday look like for you?

On a typical day, I usually meet with a few of my graduate students or attend a research seminar. I also might spend some time working on a manuscript, like going over a draft a student has sent me. I might spend time reviewing a paper that a journal asked me to review. I might be having meetings with other faculty on a committee. I might have collaboration calls with colleagues outside the university, and there are emails to handle as well.

Is there anything that surprised you about your current role or field?

When I started, probably the biggest surprise was how much impact I could have on biologists while knowing so little biology at the time: I actually hadn’t taken biology since high school. When I started in this field as a graduate student, my first project already had a large audience among biologists. Although I wasn’t trained specifically as a biologist, I could still have a lot of impact.

Similarly, when I later became faculty, my primary appointment was in a biology-type department, even though none of my training was through biology departments.

What advancements do you foresee happening in the future of genomics?

There’s an overall shift in genomics toward having more of an impact on health. Now, you hear people using terms like precision medicine, where health care is more personalized to individuals based on genomic profiling. I think a better understanding of the genome, which still requires a lot more basic work, provides the foundation for more of that translation to health care.

I also think, in general, the accumulation of large datasets is a major advance. New assays continue to be developed that allow us to probe biological systems at larger scales and in ways we haven’t been able to before. Each of these new assays often leads to new computational challenges and opportunities. In the genetics space, the move to large-scale biobanks is having a big impact: they collect genetic data on large cohorts of individuals and then characterize them for a wide range of phenotypes all at once.