Applications in the life sciences are primary research drivers for the Center for Science of Information. Given the ever-expanding repository of diverse datasets, the complexity of the underlying processes, and the importance of spatio-temporal context, life sciences applications serve as arguably the most interesting and challenging test beds for models and methods. Broadly, the challenges in the life sciences targeted by the Center may be viewed in four categories.
We address a few problems below:
Dynamical Data: A key problem in the life sciences is the development of models at different levels of granularity from time series measurements of cellular constituents, such as proteins, nucleic acids, and metabolites, and of phenotypes, such as gene expression profiles, cellular proliferation, and cellular death. From measurements of cellular components at different time points following a stimulus, it would be desirable to build a biochemical pathway model. Such models may be correlative or causal and can contain myriad nodes and edges. If one considers modules that vary with time, hypergraphs can be constructed based on a correlation metric or on interaction data. These hypergraphs provide a glimpse of the dynamics of the system. However, it would be desirable to convert these hypergraphs into models that are necessary and sufficient to describe the cellular phenotypes quantitatively. This is a major challenge for the Center.
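As a rough illustration of the correlative end of this spectrum, the sketch below builds a graph from time series by linking any two constituents whose measurements are strongly correlated. It is a simplification under stated assumptions, not the Center's method: it produces pairwise edges rather than hyperedges, it uses Pearson correlation as the metric, and the names pearson, correlation_edges, the threshold of 0.9, and the toy measurements are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_edges(series, threshold=0.9):
    """Return {(name_a, name_b): r} for every pair with |r| >= threshold."""
    edges = {}
    for a, b in combinations(sorted(series), 2):
        r = pearson(series[a], series[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges


# Made-up measurements at five time points after a stimulus (illustrative only).
series = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = correlation_edges(series)
for (a, b), r in edges.items():
    print(f"{a} -- {b}: r = {r:.3f}")
```

With these toy values only geneA and geneB are linked. A graph of this kind is correlative only; it says nothing about causality or about which edges are necessary to explain a phenotype.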
Many-to-many Network and Biochemical Pathways: Shannon's methods deal with point-to-point communication. However, all biological systems involve many-point-to-many-point communication, and there are no algorithms for understanding the information complexity of such systems. We will develop methods to pose the following questions: What are the minimal networks that provide quantitative information on phenotypes? How sensitive is a given phenotype to each connection? Entirely new methods need to be developed to address this problem.
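For reference, the point-to-point quantity that the many-to-many setting would need to generalize is Shannon's mutual information between a single input and a single output. The sketch below estimates it from paired discrete samples using empirical frequencies. The function name mutual_information and the toy samples are assumptions made for illustration, and the plug-in estimator shown is biased for small samples.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    total = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        total += p_xy * log2(p_xy / (p_x * p_y))
    return total


# Hypothetical binarized readouts: a signal and a phenotype over four samples.
signal = [0, 0, 1, 1]
copied = [0, 0, 1, 1]      # phenotype fully determined by the signal
unrelated = [0, 1, 0, 1]   # phenotype independent of the signal

mi_copied = mutual_information(signal, copied)
mi_unrelated = mutual_information(signal, unrelated)
print(mi_copied, mi_unrelated)
```

A measure of this form scores one source against one receiver. It does not by itself indicate which subset of a network is minimal for a phenotype, which is the gap described above.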
Modularity in Networks: We will develop algorithms for deciphering modularity in systems. Within interaction networks, biologists have painstakingly identified cliques that are relevant to chosen phenotypes. However, there are few methods that can predict modules in networks. One quest of the Center is to identify modules in complex pathways.
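To make the notion of a clique concrete, the sketch below enumerates the maximal cliques of a small undirected interaction network with the classic Bron-Kerbosch recursion (without pivoting). This is a standard textbook procedure offered as an illustration, not the Center's algorithm, and cliques are only one possible definition of a module. The names build_adjacency and maximal_cliques and the toy edge list are hypothetical.

```python
def build_adjacency(edges):
    """Turn an undirected edge list into {node: set of neighbours}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        # No node can extend the clique and none was skipped: it is maximal.
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Made-up interaction network: a triangle A-B-C with a tail C-D-E.
interactions = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
modules = maximal_cliques(build_adjacency(interactions))
print(modules)
```

Enumerating maximal cliques takes exponential time in the worst case, so on networks of realistic size this approach would need pivoting, size bounds, or a relaxed notion of module.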
Genome Encoding and Evolution: A large fraction of the human genome codes for the control of gene expression during the lifetime of an organism. Driven by exciting new technologies, the field of genomics is now beginning to decipher the language of gene control. This process holds many challenges related to information theory. At a pragmatic level, it requires the integration of large amounts of heterogeneous, noisy, and incomplete data, which nonetheless describe the action of robust networks. There are also fascinating questions about the classification and identification of the different functional components of the regulatory networks. In addition, by comparing the genomes of different individuals and different species, we stand to learn about modes of information transmission across generations. In many ways the genome is the ultimate information repository, and using information theory to better understand it is a major challenge.