PiCo talks are structured lightning talks consisting of 3 slides to share 1 idea in 4 minutes. Talks will be videocast and archived. You'll also get a chance to vote for your favorite talk. The winner of the "people's choice award" will be invited to give a talk at a joint meeting of the Data Science in Biomedicine and Bioinformatics Special Interest Groups.
Talks will be given in the order listed here.
Computing a brain atlas using high resolution gene expression data
Harold Burgess, NICHD
Brain atlases are critically important for integrating information from studies of neuronal function and connectivity. Because zebrafish are increasingly popular for mapping neuronal circuits, there is great interest in establishing a comprehensive brain atlas for this model. Historically, most neuroanatomical atlases have been constructed manually, by identifying regions with salient differences in cell composition or in activity in functional studies. However, a particular challenge in the larval zebrafish is that many brain regions lack conspicuous nuclear organization, making accurate segmentation of distinct regions difficult and subjective. To solve this problem, we implemented an automated computational procedure that segments distinct neuroanatomical regions by clustering voxels with similar genetic identity. The procedure takes advantage of a large gene expression dataset obtained from 130 transgenic zebrafish lines. For each line, we imaged the entire brain at single-cell resolution and registered the images to a common reference brain, yielding a 3D representation of transgene expression. To reduce the dimensionality of the feature space, we applied linear discriminant analysis based on existing manually segmented brain regions. We next clustered many millions of voxels by genetic identity, using a weighting factor for adjacency information to delineate independent brain regions. We tested both agglomerative and divisive clustering strategies, multiple methods for normalizing expression data, and differential weighting of expression patterns. Finally, we evaluated the results by calculating the homogeneity and completeness of the computationally identified regions relative to a set of 30 conservatively outlined neuroanatomical regions identified by a human expert. The top-ranking brain map derived by this procedure accurately delineated known areas and revealed previously unrecognized brain regions.
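Homogeneity and completeness have standard entropy-based definitions. These are the same definitions behind scikit-learn's `homogeneity_score` and `completeness_score`. The Python sketch below computes both scores from per-voxel labels. It assumes that expert regions and computed regions arrive as two equal-length label lists. It is an illustration of the metrics, not the authors' code.

```python
from collections import Counter
from math import log


def _entropy(counts, total):
    """Shannon entropy (nats) of a label distribution given raw counts."""
    return -sum((n / total) * log(n / total) for n in counts if n > 0)


def _conditional_entropy(joint, condition_counts, total):
    """H(A | B) from joint counts keyed (a, b) and marginal counts n(b)."""
    return -sum(
        (n_ab / total) * log(n_ab / condition_counts[b])
        for (_, b), n_ab in joint.items()
    )


def homogeneity_completeness(expert_labels, cluster_labels):
    """Score computed clusters against expert regions, one label per voxel.

    Homogeneity is 1 when every cluster contains voxels of a single expert
    region; completeness is 1 when every expert region falls in one cluster.
    """
    if len(expert_labels) != len(cluster_labels):
        raise ValueError("label lists must have the same length")
    total = len(expert_labels)
    n_class = Counter(expert_labels)
    n_cluster = Counter(cluster_labels)
    joint = Counter(zip(expert_labels, cluster_labels))  # keys: (class, cluster)

    h_class = _entropy(n_class.values(), total)
    h_cluster = _entropy(n_cluster.values(), total)
    # H(class | cluster): the cluster is the second element of each key.
    h_class_given_cluster = _conditional_entropy(joint, n_cluster, total)
    # H(cluster | class): swap key order so the class becomes the condition.
    swapped = {(k, c): n for (c, k), n in joint.items()}
    h_cluster_given_class = _conditional_entropy(swapped, n_class, total)

    homogeneity = 1.0 if h_class == 0 else 1.0 - h_class_given_cluster / h_class
    completeness = 1.0 if h_cluster == 0 else 1.0 - h_cluster_given_class / h_cluster
    return homogeneity, completeness
```

Here is an example. Suppose the expert outlines two regions of four voxels each, and the clustering splits the second region into two clusters. Homogeneity stays at 1.0 because every cluster is pure. Completeness drops to about 0.67, which flags the over-segmentation.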
By using this unbiased and accurate segmentation technique, we have obtained a new neuroanatomical map of the zebrafish brain that facilitates mapping of neuronal circuits and enables comparative studies with mammals.
Image segmentation in renal biopsy whole slide images
Jonathan Street, NIDDK
Accurately detecting and counting glomeruli in renal biopsies is essential for diagnosis; the percentage of sclerotic glomeruli is one important example. A recent evaluation indicates that reported counts are only 50% of the actual number present. To improve accuracy, we are developing a machine-learning-based image segmentation utility to support pathologists.
We have developed a custom interface for manually labeling a training dataset, and we have adapted concepts from geographic information systems (GIS) for efficient storage of the segmentations. Despite the challenges of a modest sample size and images exceeding 5 billion pixels, a convolutional neural network is showing great promise. Although false positives are currently higher than desired, our model has also caught glomeruli that were missed during human annotation.
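One reading of "concepts adapted from geographic information systems" is vector storage. Each segmented glomerulus is kept as a polygon outline in slide coordinates instead of a raster mask, and a bounding box allows fast lookup of objects inside the current viewport. The Python sketch below assumes that design. The class and function names are hypothetical, and the abstract does not specify the actual storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Polygon:
    """One segmented object stored as a vector outline, as in GIS vector layers."""
    label: str                       # hypothetical class label, e.g. "glomerulus"
    vertices: list                   # [(x, y), ...] in full-resolution slide pixels
    bbox: tuple = field(init=False)  # (min_x, min_y, max_x, max_y), computed below

    def __post_init__(self):
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        self.bbox = (min(xs), min(ys), max(xs), max(ys))

    def area(self):
        """Polygon area in square pixels via the shoelace formula."""
        n = len(self.vertices)
        s = 0
        for i in range(n):
            x1, y1 = self.vertices[i]
            x2, y2 = self.vertices[(i + 1) % n]
            s += x1 * y2 - x2 * y1
        return abs(s) / 2


def query_viewport(polygons, x0, y0, x1, y1):
    """Return polygons whose bounding box overlaps the viewport rectangle."""
    return [
        p for p in polygons
        if p.bbox[0] <= x1 and p.bbox[2] >= x0 and p.bbox[1] <= y1 and p.bbox[3] >= y0
    ]
```

A square outline of 100 × 100 pixels needs only four vertices, compared with 10,000 pixels in a mask. Slides with billions of pixels and many objects would normally replace the linear scan in `query_viewport` with a spatial index such as an R-tree.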
Papers as Apps: Bidirectional Data Flow
David McGaughey, NEI
Traditionally, scientists share results through publications. Because of their paper-based origins, these are one-directional interactions: figures and tables are static, and reader-led investigations require laborious copying of tables or digging through inconsistently formatted supplementary information. With articles like 'How Much Warmer Was Your City in 2015?', The New York Times has demonstrated the value of letting information flow easily in both directions. With the development of frameworks like R's Shiny, it is becoming feasible for small teams or even individual researchers to make their results bidirectional. As a case study, we present a web app that enables user-led investigation of data from a large human eye tissue RNA-Seq meta-analysis.
PubMed Labs: A Sandbox Toward PubMed 2.0
Nicolas Fiorini, NLM
PubMed is a widely used search engine that provides access to the MEDLINE database and its 27 million articles in biomedicine. Here we describe PubMed Labs, an experimental system we recently developed for searching the biomedical literature, with the goals of improving search quality and the overall user experience.
MetaMapLite in Excel: Named-Entity Recognition for Non-technical Users
Ravi Teja Bhupatiraju
With current tools, natural language processing (NLP) can be daunting for biomedical researchers who are not trained in computer science. We developed an easy-to-use tool that lets non-technical biomedical researchers conduct named-entity recognition (NER) on biomedical text in a familiar spreadsheet environment. A simple standalone installer deploys a client-server system: a web service that an Excel spreadsheet calls through an embedded macro. The system is highly responsive for interactive use and can be further scripted both from spreadsheet macros and from any external script.
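The architecture is a web service that a spreadsheet macro calls. It can be sketched as one request handler that takes a batch of cell texts and returns entities per cell. MetaMapLite itself maps text to UMLS concepts, but this Python sketch substitutes a toy greedy longest-match dictionary lookup. The lexicon, the JSON field names, and the function names are all hypothetical.

```python
import json
import re

# Hypothetical toy lexicon: lower-cased term -> semantic type.
# A real system such as MetaMapLite maps text to UMLS concepts instead.
LEXICON = {
    "alzheimer's disease": "Disease",
    "renal biopsy": "Procedure",
    "nitric oxide": "Chemical",
    "kidney": "Anatomy",
}
MAX_TERM_TOKENS = max(len(term.split()) for term in LEXICON)
TOKEN_RE = re.compile(r"[\w']+")


def tag(text):
    """Greedy longest-match dictionary lookup; returns (start, end, text, type)."""
    tokens = [(m.start(), m.end()) for m in TOKEN_RE.finditer(text)]
    entities, i = [], 0
    while i < len(tokens):
        match = None
        # Try the longest token window first so multi-word terms win.
        for n in range(min(MAX_TERM_TOKENS, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            candidate = " ".join(text[s:e] for s, e in tokens[i:i + n]).lower()
            if candidate in LEXICON:
                match = (start, end, text[start:end], LEXICON[candidate], n)
                break
        if match:
            entities.append(match[:4])
            i += match[4]  # skip past the matched tokens
        else:
            i += 1
    return entities


def handle_request(body):
    """Service handler: JSON {"cells": [...]} -> JSON {"results": [...]}.

    Returns one result list per spreadsheet cell so that a macro can write the
    entities back next to the input column.
    """
    cells = json.loads(body)["cells"]
    results = [
        [{"start": s, "end": e, "text": t, "type": ty} for s, e, t, ty in tag(cell)]
        for cell in cells
    ]
    return json.dumps({"results": results})
```

In a deployment, `handle_request` would sit behind an HTTP endpoint. The Excel macro would send one request per selected range and write each cell's entities into an adjacent column.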
Striking Back at Alzheimer’s Disease: From Bioenergetics to Behavioral Modification
Robert Pawlosky, NIAAA
Patients who develop Alzheimer’s disease (AD) show low fluorodeoxyglucose uptake in the brain years before the onset of clinical symptoms. Because of brain glucose hypometabolism, AD patients and AD transgenic mice are unable to maintain a favorable energetic status, which results, for example, in the accumulation of biomolecules and organelles damaged by reactive oxygen species (ROS). Mild ketosis, which supplies ketone bodies as an alternative fuel, can potentially overcome glucose hypometabolism, improve the cellular redox environment, and limit the buildup of oxidized biomolecules. Moreover, ketones can supply key intermediates for the synthesis of amino acids and lipids. Hippocampal N-acetylaspartate (NAA) is a biomarker associated with anxiolysis and exploratory behavior, and it is also a signature of mitochondrial function. Using quantitative mass spectrometry, we investigated the effects of mild ketosis on bioenergetic metabolites, the associated cellular redox potentials, the energy of ATP hydrolysis, and oxidized biomolecules in older AD mice. Mildly ketotic mice had significantly higher concentrations of glycolytic and Krebs cycle intermediates, an improved mitochondrial redox potential, a greater energy of ATP hydrolysis, and lower amounts of oxidized lipids and proteins. Importantly, higher concentrations of hippocampal NAA in these same animals were associated with potent anxiolytic outcomes that corresponded with increased exploratory behavior. These results strongly suggest that ketosis is a practical therapy for correcting energy deficiencies and modifying behavior patterns in Alzheimer’s disease.
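The "energy of ATP hydrolysis" and "redox potentials" are usually computed from measured metabolite concentrations using two standard thermodynamic relations. The first is ΔG′ = ΔG°′ + RT ln([ADP][Pi]/[ATP]). The second is the Nernst equation for a redox couple. The Python sketch below makes two assumptions: the redox potential is computed for the NAD+/NADH couple, and ΔG°′ takes a textbook value of -30.5 kJ/mol. The abstract gives neither the study's exact constants nor the couples it measured.

```python
from math import log

R = 8.314     # gas constant, J/(mol*K)
F = 96485.0   # Faraday constant, C/mol
T = 310.15    # 37 degrees C in kelvin (assumed body temperature)


def delta_g_atp_hydrolysis(atp, adp, pi, dg0_prime=-30.5e3):
    """Free energy of ATP hydrolysis (J/mol) from molar concentrations.

    dG' = dG0' + R*T*ln([ADP][Pi]/[ATP]). The default dG0' of -30.5 kJ/mol is a
    common textbook value (an assumption); the study may use a different one.
    A more negative dG' means more energy is released per mole of ATP.
    """
    return dg0_prime + R * T * log(adp * pi / atp)


def nad_redox_potential(nad, nadh, e0_prime=-0.32):
    """Half-cell potential (V) of the NAD+/NADH couple via the Nernst equation.

    E = E0' + (R*T / (n*F)) * ln([NAD+]/[NADH]), with n = 2 electrons.
    A higher NAD+/NADH ratio gives a more positive (more oxidized) potential.
    """
    return e0_prime + (R * T / (2 * F)) * log(nad / nadh)
```

As an example, take hypothetical concentrations of 3 mM ATP, 0.03 mM ADP, and 3 mM Pi. `delta_g_atp_hydrolysis(3e-3, 3e-5, 3e-3)` then returns roughly -57.4 kJ/mol. A more negative value means more free energy is available per mole of ATP, which is what "greater energy of ATP hydrolysis" refers to.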
Integrated analysis of genomics data with OmicPath
Chunhua Yan, NCI
Cancer is a complex group of diseases caused, in large part, by genetic, genomic, transcriptomic, epigenetic, and epigenomic alterations. Historically, studies have identified sets of signature genes involved in the development and progression of cancer. However, these approaches are not sufficient to capture the complex nature of the disease. Comprehensive, integrated pathway- and network-based computational methods are essential for understanding cancer biology and identifying optimal treatments. To this end, we developed OmicPath, which extends capabilities beyond those of traditional pathway software. OmicPath integrates database manipulation, visualization, statistical analysis, topology analysis, machine learning (including deep learning), and de novo network construction into a single R package. It can be used to identify significantly altered pathways, determine highly co-regulated gene modules, discover networks associated with disease subtypes, and predict clinical outcomes based on pathway/network structures and omics data. In addition to statistical analysis, OmicPath offers advanced graphics functionality to produce publication-quality pathway diagrams with flexible layouts and rich annotations.
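A common way to identify significantly altered pathways is a one-sided hypergeometric over-representation test. Suppose N genes are measured, n of them are altered, and a pathway contains K genes. The test asks how surprising it is to find k or more altered genes in that pathway. The abstract does not say which statistics OmicPath (an R package) uses, so this Python sketch illustrates only the generic technique. It also omits multiple-testing correction, which any real analysis across many pathways would need.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable X.

    population: genes measured; successes: genes in the pathway;
    draws: genes called altered; k: altered genes observed in the pathway.
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / total


def pathway_enrichment(altered, pathways, universe):
    """Rank pathways by over-representation of altered genes (smallest p first)."""
    universe = set(universe)
    altered = set(altered) & universe
    results = []
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(altered & members)
        p = hypergeom_sf(overlap, len(universe), len(members), len(altered))
        results.append((name, overlap, len(members), p))
    return sorted(results, key=lambda r: r[3])
```

For example, take 20 measured genes, a 5-gene pathway, and 4 altered genes that all fall in the pathway. The p-value is then 5/4845, about 0.001.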
The Cancer Genomics Cloud Pilots
KanakaDurga Addepalli, NCI
Cancer data are complex, owing to the complex etiology of the disease. Novel translational methods and next-generation sequencing (NGS) technologies, coupled with advances in computational analytical tools, have revolutionized our ability to explore, investigate, and characterize genomic changes in cancer cells. These technologies can produce huge volumes of complex data from small amounts of sample. The exponential growth of genomic big data has created many challenges for high-throughput data storage, management, and analysis. Meeting them requires powerful computational infrastructure, high-quality bioinformatics software, skilled personnel to operate the tools, and data management and analytical solutions that simplify handling petabyte-scale data beyond the capabilities of traditional methods. Combined with associated clinical and phenotypic features, these data enable in-depth study of tumor biology at multiple levels and support targeted, personalized treatment for cancer patients.
The Cancer Genomics Cloud pilots are an attempt to move from traditional ways of analyzing genomic big data to an innovative, scalable, secure, one-stop model for computational analysis with high compute capacity. Using virtualization to provision and better utilize computational resources appears to be a better solution than expensive distributed systems with shared resources that require extensive support and maintenance.
The Cancer Genomics Cloud (CGC) Pilots program was conceived in 2013 to democratize access to NCI-generated genomic data and to provide efficient computational support to the cancer research community by co-locating the huge genomic datasets and the compute power on the same platform. The three Cloud Pilot awardees developed the architectures and released their platforms in 2016: Seven Bridges Genomics (cgc.sbgenomics.com), the Broad Institute (FireCloud, https://software.broadinstitute.org/firecloud), and the Institute for Systems Biology (ISB-CGC). To demonstrate their use, the pilots host one of the richest and most complete genomics datasets, The Cancer Genome Atlas (TCGA).
The NCI CGC Pilots have attracted considerable interest from the scientific community since their release in 2016. To assess the research utility of the Cloud Pilots, our team and the awardees have explored the possibilities for seamless analysis that emerge from the concurrent development of these cloud environments. Here at NCI, these efforts include evaluating and advocating for the three pilots in several ways. We introduce the Cloud Pilots to the NCI intramural research community, offer hands-on workshops and training, and orchestrate system setup. We also implement analysis pipelines that make the platforms easier to use and expand the utility of the CGC for ‘-omics’ analyses such as RNA-Seq, microbiome, whole-exome sequencing (WXS), metagenomics, and pathogen detection.
As a novel approach, the Cloud Pilots will further our knowledge of the biology of cancer and ultimately provide additional insights into cancer diagnosis, prevention, and treatment.
Integrating Multiple Sources of Data for Better Individualized Treatment of Cancer
Uma Shankavaram, NCI
Traditionally, tumors have been classified and treated based on where in the body they originated, rather than on their molecular origins. Cancer is an incredibly complex disease: a single tumor can have more than 100 billion cells, and each cell can acquire mutations individually. The disease is always changing, evolving, and adapting. Cancer research and treatment are evolving rapidly thanks to new technologies and large-scale, government-led sequencing initiatives. One of these is NIH’s The Cancer Genome Atlas (TCGA), which holds data from more than 11,000 patients and is open to researchers everywhere for analysis in the cloud. Building on these resources, we used big data analytics and high-performance computing to integrate multiple data streams, from sequenced genomes and gene expression to protein, chemical, and genetic interactions. We created a database of mutations that drive tumor cell proliferation and of their synthetic lethal interactions. The database identifies targets and drugs that could potentially kill tumor cells, leading toward personalized cancer treatment.
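The database links driver mutations to synthetic lethal partners, and those partners to drugs. A minimal Python sketch of that lookup follows, assuming simple dictionaries. The entries are illustrative only. Loss of BRCA1 or BRCA2 combined with PARP inhibition (for example, by olaparib) is a well-established synthetic lethal pair, but the abstract does not describe the authors' database contents or schema.

```python
from collections import defaultdict

# Illustrative toy tables, not the authors' database.
SYNTHETIC_LETHAL = {          # driver gene -> partners whose inhibition is lethal
    "BRCA1": {"PARP1"},
    "BRCA2": {"PARP1"},
}
DRUG_TARGETS = {              # drug -> genes it inhibits
    "olaparib": {"PARP1", "PARP2"},
}


def candidate_drugs(tumor_drivers, synthetic_lethal, drug_targets):
    """Map a tumor's driver mutations to drugs that hit a synthetic lethal partner.

    Returns {drug: sorted list of (driver, partner) pairs supporting it}.
    """
    # Invert drug -> targets into target -> drugs for direct lookup.
    drugs_by_target = defaultdict(set)
    for drug, targets in drug_targets.items():
        for target in targets:
            drugs_by_target[target].add(drug)

    hits = defaultdict(set)
    for driver in tumor_drivers:
        for partner in synthetic_lethal.get(driver, ()):
            for drug in drugs_by_target.get(partner, ()):
                hits[drug].add((driver, partner))
    return {drug: sorted(pairs) for drug, pairs in hits.items()}
```

`candidate_drugs(["BRCA2", "TP53"], SYNTHETIC_LETHAL, DRUG_TARGETS)` returns `{'olaparib': [('BRCA2', 'PARP1')]}`. TP53 contributes nothing because the toy table has no partner for it.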
Which Alpha Globin Gene Is Primarily Expressed in the Vascular Endothelium?
Steven D. Brooks, NHLBI
Alpha globin was recently discovered in the endothelium of resistance arteries, where it interacts with endothelial nitric oxide synthase (NOS3) to regulate the diffusion of nitric oxide. In human red blood cells, the alpha globin locus HBA2 contributes 65% of total alpha globin, with HBA1 contributing 35%. However, the expression of each locus in the vasculature is unknown. Determining total and relative locus-specific expression of alpha globin in resistance arteries is critical to understanding the consequences of HBA1 or HBA2 locus deletion. Perfused vessels and whole blood were collected from C57BL/6J mice. Middle cerebral arteries, skeletal muscle arterioles, mesenteric arteries, and renal arterioles were isolated, dissected, and placed in RNAlater. Total mRNA was isolated from homogenized vessels and converted to cDNA. Absolute expression of Hba-a1 (mouse homolog of HBA2), Hba-a2 (mouse homolog of HBA1), Nos3, and Ae1 was quantified by droplet digital PCR (ddPCR). Abundant expression of Hba-a1 and Hba-a2 was observed in whole blood, as well as in all four vascular tissues. In whole blood, the Hba-a1/Hba-a2 ratio was 2.55:1, consistent with expression reported in human blood. However, in all four groups of arteries, the Hba-a1/Hba-a2 ratio was inverted (0.60:1). We report robust, locus-specific expression of alpha globin in four anatomically distinct arteries. The inversion of the Hba-a1/Hba-a2 expression ratio between vascular tissue and whole blood in mice suggests differential regulation of alpha globin transcription.
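Two pieces of arithmetic underlie these results. First, droplet digital PCR gives absolute quantities by counting positive droplets and using Poisson statistics to correct for droplets that received more than one template. Second, a ratio r:1 converts to shares of total alpha globin as r/(1+r) and 1/(1+r). The Python sketch below assumes a droplet volume of 0.85 nL, which is instrument-specific. It is not the authors' analysis pipeline.

```python
from math import log

DROPLET_VOLUME_UL = 0.00085   # assumed droplet volume (0.85 nL); instrument-specific


def ddpcr_copies_per_ul(positive, total, droplet_volume_ul=DROPLET_VOLUME_UL):
    """Absolute target concentration (copies per uL of reaction) from droplet counts.

    Templates are Poisson-distributed across droplets, so the mean number of
    copies per droplet is lambda = -ln(1 - positive/total). Dividing by the
    droplet volume gives copies per uL.
    """
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    lam = -log(1 - positive / total)
    return lam / droplet_volume_ul


def locus_fractions(ratio):
    """Convert an Hba-a1:Hba-a2 ratio r:1 into fractions of total alpha globin."""
    return ratio / (1 + ratio), 1 / (1 + ratio)
```

By this conversion, the 2.55:1 ratio in whole blood corresponds to about 72% Hba-a1. The 0.60:1 ratio in arteries corresponds to 37.5% Hba-a1. For comparison, the homologous human HBA2 accounts for 65% of alpha globin in red blood cells.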