The Pi Day Networking Event will feature posters, demos, and exhibit tables immediately following the PiCo talks. All NIH staff are invited to attend, as well as vote for their favorite poster. The winner of the "people's choice award" will be invited to give a talk at a joint meeting of the Data Science in Biomedicine and Bioinformatics Special Interest Groups.
The following posters will be displayed on the FAES Terrace in Building 10, organized according to their poster number.
1. Content-based fMRI activation maps retrieval
Alba García Seco de Herrera, NLM
Functional Magnetic Resonance Imaging (fMRI) is a powerful tool used in the study of brain function. It can non-invasively detect signal changes of cerebral blood flow in areas of the brain where neuronal activity is varying. Statistical analysis of fMRI data is used to locate brain activity and generate brain activation maps. These maps are used to determine how a task is correlated with a particular perceptual or cognitive state that is encoded by active brain regions. Neuroimaging data sharing is becoming increasingly common. Currently, some efforts have been made to develop fMRI repositories. However, there is a need for content-based (CB-) fMRI retrieval methods that can retrieve studies relevant to a “query” brain activation. One approach is to take into account the full spatial pattern of brain activity to retrieve similar activity maps. This approach could also be extended to support cognitive state-based retrieval. This work presents an approach for CB-fMRI activation map retrieval that returns activation maps with activation patterns similar to a given one. The proposed method develops a similarity score that matches activation maps.
2. Demonstration of analysis of high-throughput single cell RNA-Seq data using open-source R-packages
Michael Kelly, NIDCD
A rich set of open-source analysis tools has been and continues to be developed for single-cell genomics datasets. Here we demonstrate the use of at least two tools, Seurat and Monocle2, to analyze high-throughput single-cell RNA-Seq datasets that we have generated from cells of the developing mouse auditory sensory end organ. We show the ability to reliably perform unbiased clustering of cells in a biologically relevant manner using transcriptional profiles and the generation of novel marker gene lists that can be further validated by in situ hybridization and immunohistochemistry. Furthermore, we utilize these highly multidimensional datasets to model differentiation of individual cell types and infer key regulatory transcriptional events. We hope to help increase the awareness of what is currently possible with single-cell RNA-Seq analysis, and help promote opportunities for more individuals at NIH to gain experience with the open-source analysis tools currently being developed.
3. MeSHgram: A Tool to Visually Browse Co-occurrence of MeSH Terms in PubMed
Ravi Teja, NLM
With the current tools, Natural Language Processing can be daunting for biomedical researchers untrained in computer science. We developed an easy-to-use tool for non-technical biomedical researchers to conveniently conduct Named-Entity Recognition (NER) on biomedical text in a familiar spreadsheet system. With a simple standalone installer, the system deploys a client-server system that presents a web service that is consumed from an Excel spreadsheet with an embedded macro. The system is highly responsive for interactive use and can be further scripted from both spreadsheet macros and any external scripts.
4. Awesome new stuff from NCBI!
Ben Busby, NCBI
Come check out new and nascent computational biology resources from NCBI!
5. Scientific Supercomputing at NIH
Susan Chacko, CIT
The NIH HPC group plans, manages, and supports high-performance computing systems for the intramural NIH community. An overview of the recently expanded Biowulf compute cluster, and near-term additional upgrades to compute and storage will be presented. A few of the 500+ scientific applications and databases that are installed and maintained on the HPC systems will also be highlighted.
6. Forecasting Grant Application Counts
Darren Schneider, CSR
The Center for Scientific Review's core activity is reviewing grant applications; however, fluctuations in the number of applications received from month to month and from year to year present a number of challenges. Fluctuations in applications received make it difficult to adequately staff and distribute workload among Scientific Review Officers and present challenges for financial management as well. Our team developed a solution to forecast the number of unsolicited applications the client will receive on a month-to-month basis to better inform their resource and budgetary planning. The forecast is based on machine learning and time series forecasting techniques, including NNAR and ARIMA, and leverages both historical data from IRDB and supplementary data (including Google Analytics and Twitter). Our poster will focus on the pilot implementation of our model, a mock-up of our production solution (UI/visualizations), and the CSR impact.
7. Krona - interactive metagenomic visualization and comparison in a Web browser
Brian Ondov, NHGRI
Krona allows the complex, quantitative hierarchies of a metagenomic dataset to be visualized and explored in an intuitive way using layered, interactive pi(e) charts. However, its radial nature has limited its utility for the comparison of multiple samples, a commonly desired task for controlled experiments. Here we present a method for visualizing two datasets within one chart, while preserving the depth and density of information that is characteristic of Krona charts.
8. The Cancer Genomic Cloud Pilots
Hsinyi Tsang, NCI
Cancer data is complex owing to its etiology. Novel translational methods and next-generation sequencing (NGS) technologies coupled with advances in computational analytical tools have revolutionized our ability to explore, investigate, and characterize genomic changes in cancer cells. These technologies are able to produce huge volumes of large and complex data with smaller amounts of sample. The exponential growth of genomic big data has put forth many challenges with high throughput data storage, management and analytics. This advocates the necessity for a powerful computational infrastructure, high quality bioinformatics software, skilled personnel to operate the tools, data management and analytical solutions that simplify handling petabyte-scale data, exceeding the capabilities of traditional methods. These data, coupled with associated clinical and phenotypic features, help in-depth study of tumor biology at different levels and targeted and personalized treatment for cancer patients. The Cancer Genomics Cloud pilots are an attempt to transition from the traditional ways of analyzing genomics big data to an innovative, scalable, secure and high compute-capable one-stop model for computational analysis of this data. Leveraging virtual technology to provide and better utilize computational resources seems to be the solution compared to expensive distributed systems with shared resources requiring a lot of support and maintenance. The Cancer Genomic Cloud (CGC) Pilots program was conceptualized in 2013 to democratize access to NCI-generated genomic data and to efficiently provide computational support to the cancer research community by co-locating the huge genomic data and the compute power on the same platform. 
The three Cloud Pilot awardees – Seven Bridges Genomics (cgc.sbgenomics.com), the Broad Institute (FireCloud – https://software.broadinstitute.org/firecloud), and the Institute for Systems Biology (ISB-CGC) – developed their architectures and rolled out their products in 2016. To exemplify the usage, the pilots host one of the richest and most complete genomics datasets, The Cancer Genome Atlas (TCGA). The NCI CGC Pilots have seen a lot of interest from the scientific community since their release in 2016. As part of assessing the research utility of the Cloud Pilots, our team, together with the awardees, has undertaken efforts to explore the seamless analytical possibilities emerging from the concurrent development of these cloud environments. These efforts include evaluating and advocating for the three Pilot teams here at NCI by introducing the cloud pilots to the NCI intramural research community, offering hands-on workshops and training, orchestrating system set-up, implementing analysis pipelines to facilitate the use of these platforms, and expanding the utility of the CGC to perform various ‘-omics’ analyses such as RNA-Seq, microbiome, WXS, metagenomics, and pathogen detection. As a novel approach, the Cloud Pilots will further our knowledge of the biology of cancer and ultimately provide additional insights into cancer diagnosis, prevention, and treatment.
9. Detection of homozygous deletions in tumor suppressor genes ranging from dozens to hundreds of nucleotides in cancer models
Suleyman Vural, NCI
Tumor suppressor genes can be inactivated by several mechanisms, and in the majority of cases both alleles need to be affected. One mechanism of inactivation is deletions ranging from dozens to hundreds of nucleotides in size; such deletions are often missed by variant callers. HomDelDetect is a method that we developed to detect such homozygous deletions in cancer models such as cancer cell lines and patient tumor-derived xenografts. This method can be applied to partial exome, whole exome, whole genome, and RNA-seq sequencing data. We applied our method across a panel of cancer cell lines and detected deletions that had been missed by variant callers, demonstrating the ability of HomDelDetect to improve annotations of tumor suppressor genes in cancer models.
10. Analysis of APOBEC3A and APOBEC3B mutational signatures using next-generation sequencing data from cancer cell lines
Suleyman Vural, NCI
The APOBEC (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like) gene family of cytidine deaminases includes evolutionarily conserved genes that play important roles in DNA repair and mRNA editing. It has been suggested that activity of at least two APOBEC family members, APOBEC3A and APOBEC3B, may lead to kataegis, a mutagenic process in cancer cells that generates clusters of closely spaced, single strand specific C->T DNA substitutions. APOBEC mutagenesis has a characteristic signature, most commonly represented by the 5’-Tp(C->T)pW-3’ sequence motif, with additional substitutions also reported. This hypermutation signature and high mRNA expression of APOBEC3A and APOBEC3B have been associated with several cancer types. Most previous studies of APOBEC signatures have examined tumor sequence data from clinical samples, for which limited or no information about drug response was available. We investigated the presence of the mutational signature and mRNA expression patterns of the APOBEC3A and APOBEC3B genes in extensively characterized cell lines, in order to identify those cell lines that carry mutations generated by kataegis, with the aim of establishing associations between the APOBEC mutational signature, individual cancer types, and the patterns of sensitivity to antitumor agents. For this purpose, we analyzed whole exome sequencing (WES) data and mRNA expression of the APOBEC3A and APOBEC3B genes in two resources with extensive drug response data: the NCI-60 cell line panel, which includes 59 human cancer cell lines representing 9 cancer types and drug response information for thousands of anticancer agents, and the Cancer Cell Line Encyclopedia (CCLE), which provides WES and gene expression information on hundreds of cancer cell lines and drug response data to over 200 agents. We analyzed WES data of 325 CCLE cell lines and 59 NCI-60 cell lines, with variants identified using Varscan2 software. 
The variants in each cell line were filtered to remove common polymorphisms in the dbSNP and 1000 Genomes Project databases. We searched the discovered variants for the presence of APOBEC signatures, 5’-Tp(C->K)pW-3’, 5’-Tp(C->D)pR-3’, and 5’-Tp(C->D)pD-3’, in closely spaced (1000 and 10,000 bp) windows that appeared on the same DNA strand. We discuss the use of optimal filters for detecting APOBEC mutational signatures and present the analyses of associations between APOBEC signatures, mutational load of the tumor cell lines, APOBEC gene expression, and chemosensitivity to treatment. These results contribute to additional characterization of available cell lines by providing information about specific mutational signatures in different categories of cancer. Our findings may assist with identifying antitumor agents that would be appropriate for treatment of cancer cells with specific signature patterns generated by APOBEC mutagenesis.
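As a rough illustration of the motif search described above, the following Python sketch flags substitutions that fit the 5’-Tp(C->K)pW-3’ signature and groups hits that fall within a fixed window on the same chromosome. The variant tuple layout, the function names, and the minimum cluster size are assumptions made for illustration only; strand orientation and the filters used in the actual analysis are not reproduced here.

```python
from collections import defaultdict

# IUPAC codes used in the signature: K = G or T, W = A or T.
K = {"G", "T"}
W = {"A", "T"}

def matches_tcw(five_prime, ref, alt, three_prime):
    """True if a substitution fits 5'-Tp(C->K)pW-3' on the given strand."""
    return five_prime == "T" and ref == "C" and alt in K and three_prime in W

def cluster_hits(variants, window=1000, min_size=2):
    """Group motif-matching variants lying within `window` bp of the previous hit.

    `variants` is a list of (chrom, pos, five_prime, ref, alt, three_prime)
    tuples, a hypothetical layout assumed for this sketch.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, five, ref, alt, three in variants:
        if matches_tcw(five, ref, alt, three):
            by_chrom[chrom].append(pos)

    clusters = []
    for chrom, positions in by_chrom.items():
        positions.sort()
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= window:
                current.append(pos)      # still inside the running cluster
            else:
                if len(current) >= min_size:
                    clusters.append((chrom, current))
                current = [pos]          # start a new candidate cluster
        if len(current) >= min_size:
            clusters.append((chrom, current))
    return clusters
```

Calling `cluster_hits(variants, window=10000)` applies the wider of the two window sizes mentioned in the abstract; the other two signatures could be handled by adding analogous predicate functions.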
11. New multiple sclerosis disease severity scale (MS-DSS) predicts future accumulation of disability
Ann Marie Weideman, NINDS
The search for the genetic foundation of MS severity remains elusive. It is, in fact, controversial whether MS severity is a stable feature that predicts future disability progression. If MS severity is not stable, it is unlikely that genotype decisively determines disability progression. An alternative explanation is that apparent instability of MS severity is caused by inaccuracies of its current measurement. We applied statistical learning techniques to a 739 patient-years longitudinal cohort of MS patients, divided into training (n=118) and validation (n=62) sub-cohorts, to test four hypotheses: 1) There is an intra-individual stability in the rate of accumulation of MS-related disability, which is also influenced by extrinsic factors; 2) previous results from observational studies are negatively affected by the insensitive nature of the Expanded Disability Status Scale (EDSS). The EDSS-based MS Severity Score (MSSS) is further disadvantaged by the inability to reliably measure MS onset and, consequently, disease duration (DD); 3) replacing EDSS with a sensitive scale, i.e. Combinatorial Weight-adjusted Disability Score (CombiWISE), and substituting age for DD will significantly improve predictions of future accumulation of disability; and 4) adjusting measured disability for the efficacy of applied therapies and other relevant external features will further strengthen predictions of future MS course. The result is a statistical model of MS severity with a significantly enhanced ability to predict future disability progression in comparison to MSSS. This represents a much needed tool for genotype-phenotype correlations, for identifying biological processes that underlie MS progression, and for aiding therapeutic decisions.
12. Extrapolate lifetime exposure of sexual violence victimization using past-year recall among Malawian girls and young women
Amy Fan, NIAAA
Lifetime exposure to sexual violence (SV) is underestimated to a greater extent with aging. While we lack a gold standard to calibrate the underestimation, self-report of exposure during the past 12 months may be a more reliable alternative. This study presents a novel approach to obtain a calibrated estimate of lifetime exposure to SV for young women using past 12-month recall and other information obtained from the Malawi 2013 Violence against Children and Youth Survey (VACS). This was the first national survey of violence against children in the Republic of Malawi and was implemented during September and October of 2013. VACS Malawi was a nationally representative household survey of females (n=1029, response rate=83.4%) and males 13 to 24 years of age based on a multi-stage cluster design. The respondent was considered to have been exposed to sexual violence if he/she reported any of the following experiences during his/her lifetime: unwanted sexual touching; attempted unwanted sex; physically forced sex; and pressured sex. For each form of SV, the respondent was further queried about whether the first or most recent incident happened within the past 12 months. We made the following assumptions when we extrapolated to estimate the lifetime SV exposure: (1) self-report of lifetime SV victimization from 13 to 18 years of age is valid; (2) self-report of SV victimization during the past 12 months is valid; (3) an individual has a different risk of victimization based on whether or not he/she was victimized before; (4) the proportion of new cases among 12-month female victims each year from 19 to 24 years of age is stable. Compared with extrapolated results, we found that the under-reporting of lifetime exposure for young women aged 19 to 24 years ranged from 82% to 273% under the most conservative to the most liberal assumptions.
13. Assessing the Bioenergetics of Alzheimer’s Disease
Robert Pawlosky, NIAAA
Several years prior to development of clinical symptoms, patients with Alzheimer’s disease (AD) often exhibit low brain glucose metabolism, which may result from decreased glucose uptake or glycolytic deficiencies. One consequence of glucose hypometabolism in both patients and AD transgenic animals is a chronically unfavorable energetic state whereupon oxidized biomolecules accumulate and mitochondrial function is damaged. As a means to counter glucose hypometabolism, we investigated the effects of mild ketosis in an AD (3xTgAD) mouse model employing highly sensitive and definitive techniques utilizing stable isotope dilution mass spectrometry to quantify a broad range of neural chemicals from the hippocampus, a brain region prone to ROS (reactive oxygen species) damage. From these determinations we then derived the cytosolic and mitochondrial redox potentials and the energy of ATP hydrolysis. We also investigated the biosynthesis of amino acids and lipids, as ketones also supply key biosynthetic intermediates and reducing equivalents needed in their production. In the hippocampi of mildly ketotic mice there were higher concentrations of glycolytic and Krebs cycle intermediates, a more reduced mitochondrial redox potential (lower free [NAD+]/[NADH] ratio), a greater energy of ATP hydrolysis, and lower amounts of oxidized lipids and proteins compared to non-ketotic controls. There, we also observed higher concentrations of aspartate, N-acetyl aspartate (an anxiolytic biomarker), and 24-S-hydroxycholesterol. These results demonstrate that ketosis is a practical therapy for diminishing the effects of Alzheimer’s disease: overcoming glucose hypometabolism, correcting energy deficiencies, diminishing damaging ROS products, and maintaining lipid and amino acid production.
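For readers unfamiliar with the quantities named above, the Python sketch below shows the textbook relations that link a redox couple's concentration ratio to its potential (the Nernst equation) and metabolite concentrations to the free energy of ATP hydrolysis. It is not the derivation used in the study: the standard values (about -0.32 V for NAD+/NADH and about -30.5 kJ/mol for ATP hydrolysis), the temperature, and the function names are assumptions chosen for illustration.

```python
import math

R = 8.314      # gas constant, J/(mol*K)
F = 96485.0    # Faraday constant, C/mol

def redox_potential(ox_over_red, e0=-0.32, n=2, temp_k=310.15):
    """Nernst potential (V) for a redox couple given its [ox]/[red] ratio.

    e0 is an assumed textbook standard potential for NAD+/NADH.
    """
    return e0 + (R * temp_k) / (n * F) * math.log(ox_over_red)

def atp_hydrolysis_energy(atp, adp, p_i, dg0=-30.5e3, temp_k=310.15):
    """Free energy of ATP hydrolysis (J/mol) from molar concentrations.

    dg0 is an assumed textbook standard free energy, not a study value.
    """
    return dg0 + R * temp_k * math.log(adp * p_i / atp)

# A lower [NAD+]/[NADH] ratio gives a more negative (more reduced) potential.
print(redox_potential(10.0) > redox_potential(1.0))
```

Under these relations, a lower free [NAD+]/[NADH] ratio corresponds to a more negative potential, which matches the direction of the change reported for the ketotic mice.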
14. Robust Spike Sorting of Retinal Ganglion Cells Tuned to Spot and Whole Field Stimuli
Alireza Ghahari, NEI
We propose a spike sorting approach for the data recorded from a microelectrode array during visual stimulation of wild type retinas with spot and whole field square wave stimuli. The purpose of this study is threefold: (1) sorting spikes of retinal ganglion cells into different clusters by using two feature extraction techniques; (2) finding the number of active cells during each whole field stimulation; and (3) classifying those cells into three classical light-sensitive types. The analysis consisted initially of five steps applied individually to each electrode. In the data preparation step, we reduced baseline fluctuations caused by local field potentials, and enhanced the signal-to-noise ratio using an eigendecomposition technique. Second, an adaptive thresholding method finds spike waveforms, irrespective of their amplitudes and modes of firing. Third, we compare two feature extraction techniques by projecting each detected spike waveform into two feature spaces: i) a spike-shape parametric space and ii) a principal components space. Fourth, the cluster analysis based on leader-follower and minimum-squared-error clustering algorithms sorts the spikes into different clusters. Subsequently, a template waveform for each cluster type per electrode is defined. Next, to detect the false-positive (FP) and false-negative (FN) errors due to clustering, a number of reliability tests are performed in three iterations on each cluster’s template and its corresponding spikes. The results construct a diagnostic table (D-table) that outlines different failures (FP and FN) that occurred during the initial clustering. Depending on the number of clusters, failures and spikes, a logical decision is made whether a post hoc re-clustering is required. If so, a conjugate gradient method is used to iteratively lessen the FP and FN errors arising from the initial clustering. Then, we re-run the reliability tests to see the extent of decay of the errors after re-clustering.
The new D-table is examined next to decide, upon little improvement in failures, which cluster or electrode (severe failures) has to be discarded for the rest of the analysis. The remaining electrodes are checked by the last iteration of reliability tests, which aims to remedy the FP and FN failures. After completing the cluster analysis, we deal with the spatial correlation between clusters of nearby electrodes by a divisive hierarchical clustering algorithm, resulting in the detection of cells that appear as clusters on multiple neighboring electrodes. This allows us to report the number of detected cells. Through the completion of this part, the exact temporal features of each spike per cell per electrode are known. Lastly, we calculate the post-stimulus time histogram for spikes per cluster type, and define a ratio index that indicates whether the light-responsiveness of a cell is on, off, or on-off. At completion, we quantified all FP and FN errors incurred from the full cluster analysis. The results show that the median of total error is below 2.5% using parametric features and below 6% when applying principal components. In addition, in most spot and whole field experiments, these approaches detect the number of cells and identify their types closely. Given that the visual stimulation paradigm used here results in massive, highly correlated RGC bursting combined with robust field potential responses from the retina, we are confident that our algorithm will perform well when extended to more complex stimulation conditions. The approach presented here is robust and automatic, considers measures for the correlation of spikes in an electrode and among electrodes, and sorts cell units emitting bursts of spikes. It can also be used for sequential, real-time spike sorting, which is critical for retinal prosthetics.
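To give a feel for the spike detection step described above, here is a minimal Python sketch of threshold-based detection on a single electrode trace. It swaps in a widely used simple rule (a threshold set at a multiple of a robust noise estimate, with a short dead time between detections) as a stand-in; it is not the adaptive thresholding method from the poster, and the function name and the parameters `k` and `refractory` are assumptions.

```python
from statistics import median

def detect_spikes(signal, k=4.0, refractory=3):
    """Return sample indices where |signal| exceeds k times a robust noise level.

    The noise level is estimated as median(|x|) / 0.6745, a common robust
    estimate of the standard deviation of background noise. This is an
    assumed stand-in, not the adaptive method described in the abstract.
    """
    sigma = median(abs(x) for x in signal) / 0.6745
    threshold = k * sigma
    spikes = []
    last = -refractory - 1           # index of the most recent detection
    for i, x in enumerate(signal):
        # Accept a crossing only if it is outside the dead time.
        if abs(x) > threshold and i - last > refractory:
            spikes.append(i)
            last = i
    return spikes
```

Waveform snippets cut around the returned indices would then feed the feature extraction and clustering stages; those stages are not sketched here.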
15. Traveling waves in the human cortex facilitate associative memory
Vishnu Sreekumar, NINDS
Recent studies have quantified the emergent spatiotemporal patterns of population neuronal activity in the primate cortex during rest. Here, we draw upon the physics of wave propagation to characterize the spatiotemporal organization of activity in the human cortex during memory encoding and retrieval. Thirteen patients with medically refractory epilepsy engaged in a paired associates task. We examined the instantaneous phase of bandpass filtered LFP signals during encoding and retrieval trials at each site on a grid of electrodes implanted on the cortical surface. We calculated phase velocity fields to capture changes in the spatial organization of activity across time and detected both plane waves and synchronous waves. The dominant axes of propagation of plane waves were anterior-posterior and left-right. Beta-band and theta-band plane wave velocities were significantly different, supporting a weakly-coupled oscillator model of propagating waves. Pre-stimulus plane wave directions were related to the likelihood of correctly encoding/retrieving an association. Furthermore, a greater proportion of plane waves before the test cue was related to faster reaction times for successfully retrieved associations. We conclude that pre-stimulus spatiotemporal patterns of brain activity influence learning and memory.
16. p53 Represses Global Gene Expression via c-Myc in the DNA Double-Strand Break Response
Joshua Porter, NCI
In response to DNA double-strand breaks (DSBs), levels of the transcription factor p53 increase and decrease in a series of pulses. p53 pulses have been shown to impact cell fate decisions, but the mechanisms by which the pulses affect target gene expression to generate such cell fate changes are not well understood. We recently showed that p53 generates pulses in the mRNA levels of many of its target genes, including the proto-oncogene MYC. The mRNA and protein levels encoded by MYC mirror those of p53: as p53 levels rise, MYC mRNA and the associated c-Myc protein levels fall, and vice versa. c-Myc has recently been shown to act as a universal amplifier of expressed genes, which suggests that its pulsatile dynamics during the DSB response may function to redistribute global transcription. Using RNA-seq of newly synthesized transcripts in MCF-7 cells, we show that, indeed, 77% of transcripts are produced at a lower rate during the peak of a p53 pulse compared to cells without p53 pulsing, and this effect is attenuated when c-Myc expression is maintained above basal levels. Moreover, preventing MYC repression during the DSB response leads to less cell cycle arrest and more apoptosis, suggesting that MYC repression is necessary for properly regulating cell fate decisions in response to DSBs. Finally, we present evidence that p53 represses MYC by binding to a distal enhancer. The mechanism described here forms a direct, functionally significant link between two of the genes most frequently mutated in cancer.
17. Understanding cancer mortality estimation: Comparison of three data sources in the Caribbean
Camille Morgan, NCI
Estimating cancer mortality and burden in low- and middle-income countries is challenged by difficulties in data collection, data reporting, accurate diagnosis, and data coding, among others. Two major institutions, the World Health Organization (WHO) and the Institute for Health Metrics and Evaluation (IHME), have undertaken efforts to estimate cancer incidence and mortality statistics for countries and regions around the world, by using available data from cancer registries and vital statistics and through modeling. In December 2016, a report published in Morbidity and Mortality Weekly Report, “Leading Causes of Cancer Mortality in the Caribbean Region,” described a detailed analysis of the number of deaths and age-standardized mortality rate (ASMR) of the top 10 cancer sites for males and for females of 23 islands and territories in the Caribbean and the United States. This report offers a unique opportunity to compare cancer statistics estimated by the MMWR article, the WHO’s GLOBOCAN project, and the IHME’s Global Burden of Disease Study, within the context of the Caribbean region. We compared ASMR for all cancers and for five cancers for males (colorectal, esophageal, lung, pancreatic, and prostate) and five cancers for females (breast, cervical, colorectal, lung, and pancreatic). We find distinct differences in the estimates, most notably a consistent pattern of higher values published by the IHME compared to values published in the MMWR and WHO. We attribute these differences to a few causes: first, the use of two different age-standardization methods between the IHME and MMWR/WHO; second, differences in redistribution of inaccurate cancer codes; and third, noise reduction methods used, given the Caribbean presents the challenge of small populations. This comparison is important for end-users of these data for several reasons.
First, as each source presents different estimates, it is important for the range of users (clinicians, epidemiologists, policy-makers, public health practitioners) to understand why these differences exist. Secondly, this analysis serves as an important reminder that estimates are not true values, and users must know the limitations of estimates. Lastly, this analysis offers a unique opportunity to discuss the strengths and weaknesses of different epidemiological methods.
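To make the role of age standardization concrete, the Python sketch below computes a directly age-standardized mortality rate and shows that the same age-specific rates yield different summary values under different reference populations. The numbers and the two reference weightings are made up for illustration and are not taken from the MMWR, WHO, or IHME; the function name is likewise an assumption.

```python
def asmr(deaths, population, standard_weights, per=100_000):
    """Directly age-standardized mortality rate.

    deaths, population: per-age-group counts for the study population.
    standard_weights: share of each age group in a reference population.
    """
    total_weight = sum(standard_weights)
    rate = 0.0
    for d, n, w in zip(deaths, population, standard_weights):
        # Weight each age-specific rate by the reference population share.
        rate += (w / total_weight) * (d / n)
    return rate * per

# Made-up numbers for three age groups (young, middle, old); not real data.
deaths = [2, 10, 40]
population = [50_000, 30_000, 20_000]

younger_standard = [0.5, 0.3, 0.2]   # hypothetical reference population
older_standard = [0.3, 0.3, 0.4]     # hypothetical reference population

print(round(asmr(deaths, population, younger_standard), 1))
print(round(asmr(deaths, population, older_standard), 1))
```

With these made-up inputs the rate is about 52 per 100,000 under the younger reference population and about 91 per 100,000 under the older one, which illustrates why estimates standardized to different references are not directly comparable.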
18. Pharmacogenomics Prediction Pipeline (P3) Predicts Potential for Auranofin in Myeloma
Sayeh Gorjifard, NCI
Multiple Myeloma (MM) is a cancer of plasma cells that is largely incurable due to tumor heterogeneity and therapeutic resistance over time. In order to accelerate novel therapeutic development for MM and identify molecular markers predictive of drug response, we have been developing an analysis pipeline, referred to as the pharmacogenomics prediction pipeline (P3), for prediction of drug sensitivity using high-throughput in vitro drug screening (HTS) and large, diverse genomic datasets. The MIPE (Mechanism Interrogation Plates) compound library of ~1900 small molecules (NCATS) was screened in 45 MM cell lines. Cell lines were treated in 1,536-well plates and drug sensitivity measures were calculated based on CellTiter-Glo® Luminescent Cell Viability Assay after 48 hours of drug exposure. A series of genomic profiles (exome and RNA sequencing, CGH copy number variation) and metadata (gene function and pathway enrichment scores) were utilized to generate multivariate predictors of drug sensitivity in the MM cell lines. A predictive modeling strategy with leave-one-out cross-validation (LOOCV) was developed based on the super learning approach (R SuperLearner), which allows identification of an optimal predictor out of a collection of machine-learning algorithms. A total of 34 algorithms were analyzed with respect to prediction of AC50 (the half maximal response concentration) and AUC (the area under concentration-response curve) for each tested compound. Among the best performing algorithms were elastic net and regression trees, where the relative risk index (1-RRI) reached 0.67-0.75. There were 15 drugs that had at least one predictor performing at 1-RRI > 0.5. Auranofin was selected as a top candidate compound for further investigation to delineate molecular markers of drug sensitivity since it had the largest number (6) of well performing predictors. Auranofin is a gold complex that has been used as an anti-rheumatic agent.
Recent studies have shown that Auranofin can act as a thioredoxin reductase inhibitor, with activity against hematopoietic tumors. In small-scale follow-up, we are studying the effect of Auranofin on cell viability of several human myeloma cell lines to validate the predictive results of the algorithms. Using the algorithms, we plan to find and test the biomarkers that Auranofin targets. We are also validating additional drugs predicted by the pipeline. Our approach, combining genomic datasets with high-throughput drug screens, allows for the modeling of drug sensitivity for a cancer type, namely multiple myeloma. Thus, the development of the P3 pipeline facilitates the identification of promising treatments for further evaluation. Since many of the compounds in the MIPE library have known mechanisms of action and are FDA-approved, they are readily available for clinical evaluation.
19. Machine learning identifies predictive gene expression signatures of controlled and resistant hypertension using RNA-Seq data
Cihan Oguz, NHGRI
Hypertension (HTN), or persistently high blood pressure (BP), is a medical condition that substantially increases the risk for heart attack, congestive heart failure, chronic kidney disease, and stroke, if left untreated. African-Americans are disproportionately affected by HTN and are more prone to its earlier onset compared to other ethnicities in the United States. In this study, we used random forests (RF) and neural networks (NN) for predictive modeling of HTN with clinical data and RNA-Seq based gene expression data from the peripheral whole blood samples of 180 African-American patients in the Minority Health Genomics and Translational Research Bio-Repository Database (MH-GRID) Network. This cohort was composed of healthy controls, severe controlled hypertensive (SCH) cases with well controlled BP under two or more antihypertensives, and severe resistant hypertensive (SRH) cases with inadequately controlled BP despite three or more antihypertensives. With the aim of identifying the differences between the predictive signatures of SRH and SCH, we first built RF models of HTN by using 28 clinical variables and expression levels of 440 genes (Gene Panel 1) previously identified as putative BP regulators in the literature. Combining clinical and expression data led to improved predictive performance as opposed to using each layer of data in isolation. We then determined SRH-specific and SCH-specific genes, as well as genes highly predictive of both HTN phenotypes. Biological processes linked to inflammation, including cytokine and collagen production were enriched among SRH-specific genes, whereas fundamental BP regulation processes, such as the regulation of cytosolic calcium and vasoconstriction were enriched among SCH-specific genes. In contrast, transcriptional regulatory processes were highly enriched among genes robustly predictive of both HTN phenotypes. 
Next, we derived an alternative set of genes (Gene Panel 2) that generated significantly more predictive RF models than Gene Panel 1 and compared the predictive processes enriched in the two gene panels to derive biological insights. Finally, we used NN models to verify the performances of the RF-based gene subsets predictive of either HTN phenotype. Our systems-level approach illustrates the potential of multiple machine learning methods as diagnostic and informative tools for modeling HTN. The derived biological insights and the identified phenotype-specific predictive genes have potential implications within the context of HTN treatment for African-Americans.
20. Much too fast? Time-to-event analysis reveals rate of alcohol binge exposure as a marker of risk for alcohol use disorder
Joshua Gowin, NIAAA
Although several risk factors have been identified for alcohol use disorder (AUD), many individuals with these factors do not develop AUD. Identifying early phenotypic differences between vulnerable individuals and healthy controls could help identify those at higher risk. An important factor, the rate of alcohol consumption, particularly to binge levels of exposure, has received little attention. Using a carefully controlled experimental paradigm, we tested the hypothesis that risk factors for AUD, including family history of alcoholism, male sex, impulsivity, and low level of response to alcohol, would predict a faster rate of consumption. This cross-sectional study included 159 young social drinkers who completed a laboratory session in which they self-administered alcohol intravenously. Cox proportional hazards models were used to determine whether risk factors for AUD were associated with the rate of achieving a binge-level exposure, defined as a breath alcohol concentration above 80 mg%, during the session. A greater number of relatives with alcoholism (hazard ratio=1.04, 95% CI 1.02 to 1.07), male sex (hazard ratio=1.74, 95% CI 1.03 to 2.93), and higher impulsivity (hazard ratio=1.17, 95% CI 1.00 to 1.37) were all associated with a higher rate of binging throughout the session. Participants with all 3 risk factors had the highest rate of binging throughout the session compared to the lowest risk group (hazard ratio=5.27, 95% CI 1.81 to 15.30). Rapid consumption of alcohol to binge levels may be an early indicator of AUD vulnerability and should be evaluated as part of a thorough clinical assessment.
For questions, contact Esther Asaki (NIH/CIT) (firstname.lastname@example.org).