Recent advances in high-throughput next-generation sequencing (NGS) have drastically reduced the cost of sequencing a genome. As a result, an unprecedented amount of genomic data is being generated that must be stored, processed, and transmitted. To facilitate this effort, compression techniques that allow for more efficient storage as well as fast exchange and dissemination of these data have been proposed in the literature.
The aim of this talk is to give an overview of the current state of the art in genomic data compression, together with the main challenges that the community is currently facing. We will start by describing the types of data commonly used in practice and the characteristics of each. We will then review the existing compression techniques and discuss what we believe the future trends will be.
Idoia Ochoa is currently a Ph.D. student in the Electrical Engineering department at Stanford University, working under the supervision of Prof. Tsachy Weissman. She received her MSc from the same department in 2012. Before Stanford, she earned a BS and MSc from the Telecommunications Engineering (Electrical Engineering) department at TECNUN in Donostia, Spain, which included a six-month stay at Lulea Tekniska Universitet, Lulea, Sweden, as part of the Erasmus program. She then worked as a researcher at CEIT, also in Donostia, in the INTECOM group (Communication Systems and Mathematical Principles of Information).
Her main interests are in the fields of information theory, genetics, compression, coding, communications, and signal processing.
Her research focuses mainly on helping the bio community handle the massive amounts of genomic data being generated, for example by designing new and more effective compression programs for genomic data (see "EE information theory is guiding improved ways to model and compress data").