Hans Rosling Center for Population Health
3980 15th Avenue NE
Seattle, WA 98195-1617
In a biomedical setting, understanding how a patient’s genome relates to their clinical information is of great interest. One way to study this relationship is canonical correlation analysis, a statistical method that models the linear relationship between two datasets. However, when scientists seek to explore the relationship between two high-dimensional datasets (n << p), they are unable to test whether there is a true linear relationship between them. Because scientists do not expect all features to contain relevant information, a natural solution might be to select subsets of features from both datasets that are highly correlated with one another. I am currently working with Dr. Daniela Witten and Dr. Arkajyoti Saha to derive a hypothesis test that controls the type I error when testing whether there truly is a linear association between the datasets, by conditioning on this initial feature selection.