Understanding how life works through computation

Rice computer scientists help develop methods to further understand cells at the molecular level.

This article originally appeared in the 2020 issue of Rice Engineering Magazine.

Luay Nakhleh uses the word “dogma” when explaining his chief research interest and a well-defined sub-discipline within his department: computational molecular biology.

“The central dogma of modern biology is that DNA encodes RNA which encodes proteins. Out of that basic assumption emerges our understanding of how life works. Computational biologists use computation to interrogate different aspects of the dogma,” said Nakhleh, the J.S. Abercrombie Professor and Chair of Computer Science (CS), and professor of biosciences.
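For readers who think in code, the flow Nakhleh describes can be sketched in a few lines of Python. This is only an illustration, not a tool used by any of the Rice groups: the function names `transcribe` and `translate` are made up for this example, the input is assumed to be the coding strand of a gene, and the codon table is the standard genetic code.

```python
# Illustrative sketch of the central dogma: DNA -> RNA -> protein.
# Function names are hypothetical; the table is the standard genetic code.

BASES = "UCAG"
# Amino acids in standard-table order for codons enumerated over U, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG start codon up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTTTTAA")
print(mrna)             # AUGGCCUUUUAA
print(translate(mrna))  # MAF
```

Real cells add many layers this sketch ignores, such as splicing and regulation, and those layers are part of what computational biologists interrogate.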

Over the last several decades, computer scientists have developed sophisticated computational methods to further understand cells at the molecular level. In Rice’s CS department, faculty members are devoted to each of the research focuses outlined by Nakhleh: he and Todd Treangen work at the DNA level; Vicky Yao, RNA; Yao and Lydia Kavraki, protein and interaction networks.

Nakhleh works in the fields of phylogenomics, population genomics and cellular networks. His group conducts computational research related to evolution and develops tools that examine genetic data to discover previously unknown connections among genes and species.

A major thrust in his work is statistical inference of the evolution of genes and genomes under increasingly complex models of evolution. More recently, Nakhleh and his group have begun developing computational methodologies for evolutionary analyses of single-cell genomic data, particularly in cancer.

Treangen, assistant professor of CS, notes that the COVID-19 pandemic and advances in synthetic biology have increased the demand for better computational tools to identify and characterize genes of interest from short sequence fragments. Major hurdles in characterizing such sequences are the limitations of current ontologies, the sheer size of existing databases and the lack of software specifically designed to detect poorly characterized pathogenic proteins and emerging pathogens such as SARS-CoV-2.

“The objective of our work in this area is to develop computational methods that flag functions of concern and biological processes of interest such as neurotoxins and genes implicated in antibiotic resistance. The overarching goal is to provide an automated computational platform for characterizing any short nucleotide or protein sequence that is likely harmful to human health,” Treangen said.

Vicky Yao, assistant professor of CS, uses machine learning and statistical methods to improve understanding of the biological circuitry that underlies living organisms and how its dysfunction leads to neurological diseases, cancer and autoimmune disorders.

One of her central research interests is how to translate findings across species, most importantly between model organisms and humans. She also develops methods to integrate heterogeneous data sources and build network models of biological processes.

For example, she has identified specific genes and pathways in the brain associated with Alzheimer’s disease. More specifically, her study focused on understanding why memory-forming neurons in the brain are the first to die when Alzheimer’s strikes. Using mouse datasets collected by collaborators at Rockefeller University, Yao employed computational methods to identify candidate molecular drivers of the disease in the human brain.

Kavraki’s work focuses on modeling proteins and biomolecular complexes. In a recent paper, the Noah Harding Professor of Computer Science and of bioengineering examines human leukocyte antigens, or HLAs, which are proteins or markers on most cells in the body. The immune system uses HLAs to distinguish the body’s own healthy cells from diseased or foreign ones.

“We created a customizable environment called HLA-Arena with user-friendly computational workflows that allow for various structure-based analyses of peptide-HLA complexes,” she said.

Researchers can use HLA-Arena to perform geometry prediction of peptide binding modes, peptide binding energy prediction and structure-based virtual screening of tumor-derived peptides.

“HLA-Arena can be integrated into computational pipelines to support basic cancer research or to inform physicians in pre-clinical settings. It can be used to perform structure-based selection of peptides for T-cell-based immunotherapy, neoantigen discovery and vaccine development,” Kavraki said.