Reproducible Research


A research team led by Duke University has been examining the host (body) response to viruses. In the course of that research we have collected blood samples in human challenge studies. Four different Institutional Review Boards have approved these challenge studies. The host response has been investigated in terms of gene expression. To encourage other investigators to reproduce our results, build upon them, and make new discoveries, this webpage provides all data and the software used to produce every figure in our papers. The software is in Matlab, and the data are in the form of associated .mat files. The data posted here have been normalized from the raw expression values using standard techniques. Interested individuals may contact Lawrence Carin to learn the details of the normalization. The raw data are available in GEO (accession no. GSE17156) for those who wish to start from them.


For the following three papers:


A.K. Zaas, M. Chen, J. Varkey, T. Veldman, A.O. Hero III, J. Lucas, R. Turner, A. Gilbert, C. Oien, B. Nicholson, S. Kingsmore, L. Carin, C.W. Woods, and G.S. Ginsburg, Gene Expression Signatures Diagnose Influenza and Other Symptomatic Respiratory Viral Infections in Humans, Cell Host and Microbe, 2009.


M. Chen, D. Carlson, A. Zaas, C. Woods, G. Ginsburg, A. O. Hero III, J. Lucas, and L. Carin, Detection of Viruses via Statistical Gene-Expression Analysis, IEEE Transactions on Biomedical Engineering, 2010.


B. Chen, M. Chen, J. Paisley, A. Zaas, C. Woods, G.S. Ginsburg, A. Hero III, J. Lucas, D. Dunson, and L. Carin, Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies, BMC Bioinformatics, 2010.


The data and Matlab software needed to reproduce every figure in these papers are here (517 MB).
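
As a minimal sketch of how the contents of one of the .mat files might be inspected before running the figure scripts, the following Matlab snippet lists the variables stored in a file. The file name is a placeholder and is not the name of an actual file in the download; the variables inside each file are likewise not assumed here.

```matlab
% Placeholder file name (an assumption); substitute a .mat file from the download.
matFile = 'example_data.mat';
names = {};                              % variable names found in the file

if exist(matFile, 'file') == 2
    whos('-file', matFile);              % list variables, sizes, and classes
    S = load(matFile);                   % load into a struct, not the workspace
    names = fieldnames(S);
    for k = 1:numel(names)
        v = S.(names{k});
        fprintf('%s: %s [%s]\n', names{k}, class(v), num2str(size(v)));
    end
else
    fprintf('File not found: %s\n', matFile);
end
```

Loading into a struct keeps the posted variables from overwriting anything already in the workspace.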


We also present an example here, based on toy data, which shows the ability of the model to infer the number of factors present. In this example we consider a case for which the data are pure noise, and demonstrate that the model is able to infer that no factors are present.
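
To give a feel for what "no factors present" means on pure-noise data, the sketch below generates such data and counts the sample-covariance eigenvalues that stand out from the noise level. This is not the Bayesian model of the papers or the posted toy example; it is a simple eigenvalue heuristic (the Marchenko-Pastur upper edge with an arbitrary 10% margin) swapped in for illustration. All sizes, loadings, and variable names are hypothetical.

```matlab
% Toy check on pure-noise data. NOTE: a simple eigenvalue heuristic,
% not the Bayesian factor model from the papers.
rng(0);                      % fix the seed so the run is repeatable
p = 200;                     % number of "genes" (hypothetical size)
n = 50;                      % number of samples (hypothetical size)
X = randn(p, n);             % pure noise: i.i.d. N(0,1), no factors

% Eigenvalues of the n-by-n Gram matrix equal the nonzero eigenvalues of
% the p-by-p sample covariance (1/n)*X*X'.
ev = sort(eig((X' * X) / n), 'descend');

% Marchenko-Pastur upper edge for unit-variance noise, plus a 10% margin
% (the margin is an arbitrary choice to absorb finite-sample fluctuation).
edge = (1 + sqrt(p / n))^2;
kNoise = sum(ev > 1.1 * edge);
fprintf('Pure noise: %d eigenvalue(s) above threshold\n', kNoise);

% Contrast: plant two strong factors in the same noise and repeat.
A = 3 * randn(p, 2);         % hypothetical factor loadings
S = randn(2, n);             % hypothetical factor scores
Y = A * S + X;
evY = sort(eig((Y' * Y) / n), 'descend');
kPlanted = sum(evY > 1.1 * edge);
fprintf('Two planted factors: %d eigenvalue(s) above threshold\n', kPlanted);
```

Under these assumptions one would expect few or no eigenvalues above the threshold for the pure-noise matrix, and the planted factors to stand out clearly. The model in the paper infers the number of factors within a Bayesian framework rather than by a fixed threshold.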





For the following paper:


C.W. Woods, M.T. McClain, M. Chen, A.K. Zaas, B.P. Nicholson, J. Varkey, T. Veldman, S.F. Kingsmore, Y. Huang, R. Lambkin-Williams, A.G. Gilbert, A.O. Hero III, E. Ramsburg, S. Glickman, J.E. Lucas, L. Carin, and G.S. Ginsburg, A Host Transcriptional Signature for Presymptomatic Detection of Infection in Humans Exposed to Influenza H1N1 or H3N2, PLOS ONE, 2013.


The data and the code are here (205 MB; the main function to run is Flu_Validate.m).