Variable Screening


Once pre-processing is complete, one usually begins by examining the univariate relationship between the dependent variable (chd in our case) and each independent variable (every variable other than chd in our case). The main purpose of this examination is dimension reduction: we want to exclude any variable that shows no evidence of a relationship to chd.


Variable screening might be described as a judicious application of Occam's Razor.

From Wikipedia:
Occam's razor (also Ockham's razor or Ocham's razor; Latin: lex parsimoniae "law of parsimony") is the problem-solving principle that, when presented with competing hypothetical answers to a problem, one should select the answer that makes the fewest assumptions. The idea is attributed to William of Ockham (c. 1287–1347), who was an English Franciscan friar, scholastic philosopher, and theologian.



A simple screen, where appropriate, is the t-test.
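As a rough illustration of a t-test screen, the sketch below compares the mean of one continuous predictor between the chd = 1 and chd = 0 groups using Welch's unequal-variance t statistic. The data values and the function names (welch_t, approx_two_sided_p) are hypothetical, and the p-value uses a normal approximation that is only reasonable for large samples; it is a sketch under those assumptions, not the method used in the video.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value (large samples only)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Hypothetical values of a single predictor, split by outcome.
x_chd1 = [5.1, 6.3, 5.8, 6.9, 6.1, 5.7]  # subjects with chd = 1
x_chd0 = [4.2, 4.9, 5.0, 4.4, 5.3, 4.7]  # subjects with chd = 0

t_stat, df = welch_t(x_chd1, x_chd0)
p_approx = approx_two_sided_p(t_stat)
print(f"t = {t_stat:.3f}, df = {df:.1f}, approximate p = {p_approx:.4f}")
```

With samples this small the normal approximation understates the p-value, so in practice the statistic would be referred to a t distribution with the computed degrees of freedom.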



We will see later that Spearman Correlation and Hoeffding's D are helpful screening devices.
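To make the Spearman screen concrete, the sketch below computes the Spearman correlation as the Pearson correlation of average ranks and orders predictors by its absolute value. The predictor names and values are hypothetical, and Hoeffding's D is not implemented here; this is only a minimal sketch of the rank-correlation half of the screen.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2.0 + 1.0  # average of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    ss_u = sum((a - mean_u) ** 2 for a in u)
    ss_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(ss_u * ss_v)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical outcome and predictors for eight subjects.
chd = [0, 0, 1, 0, 1, 1, 0, 1]
predictors = {
    "age": [34, 41, 58, 39, 63, 55, 47, 60],
    "noise": [3, 9, 4, 7, 8, 2, 5, 6],
}

# Order predictors from strongest to weakest monotone association with chd.
screen = sorted(
    ((name, spearman(values, chd)) for name, values in predictors.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, rho in screen:
    print(f"{name}: rho = {rho:.3f}")
```

A predictor with a rank correlation near zero is a candidate for exclusion, although a non-monotone relationship can hide behind a small Spearman value, which is one reason to pair it with a second measure.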



If there are not too many variables, univariate logistic models may be used for screening variables.
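A univariate logistic screen fits a separate model logit(p) = b0 + b1 * x for each candidate predictor and looks at the evidence that b1 differs from zero. The sketch below fits one such model by Newton-Raphson and reports a Wald test; the data and the function names are hypothetical, and a statistical package would normally do this fitting for us.

```python
import math
from statistics import NormalDist


def score_and_information(b0, b1, x, y):
    """Return the score (g0, g1) and information entries (h00, h01, h11)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def fit_univariate_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit logit(p) = b0 + b1 * x; return (b0, b1, standard error of b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system information * step = score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    # Variance of b1 is the (1, 1) entry of the inverse information matrix.
    _, _, h00, h01, h11 = score_and_information(b0, b1, x, y)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b0, b1, se_b1


# Hypothetical predictor and outcome; the groups overlap, so the MLE exists.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
chd = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1, se_b1 = fit_univariate_logistic(x, chd)
wald_z = b1 / se_b1
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(wald_z)))
print(f"b1 = {b1:.3f}, se = {se_b1:.3f}, "
      f"Wald z = {wald_z:.2f}, p = {p_value:.4f}")
```

Screening thresholds for such tests are usually set generously, since the aim at this stage is to discard variables with no sign of a relationship rather than to select a final model.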



Again, if there are not too many variables, a lot can be learned from logit plots.
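The numbers behind a logit plot can be sketched as follows: sort the observations by the predictor, split them into bins of roughly equal size, and compute an empirical logit in each bin. The bin count, the data, and the 0.5 added to each count (a common convention that keeps the logit finite when a bin has no events or no non-events) are assumptions made for this illustration.

```python
import math


def empirical_logits(x, y, n_bins=4):
    """Return (mean x, events, trials, empirical logit) for each bin of x."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # more bins than observations
        trials = len(chunk)
        events = sum(yi for _, yi in chunk)
        mean_x = sum(xi for xi, _ in chunk) / trials
        # Adding 0.5 to both counts avoids log(0) in all-or-nothing bins.
        logit = math.log((events + 0.5) / (trials - events + 0.5))
        rows.append((mean_x, events, trials, logit))
    return rows


# Hypothetical predictor and outcome for twelve subjects.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
chd = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

rows = empirical_logits(x, chd, n_bins=4)
for mean_x, events, trials, logit in rows:
    print(f"mean x = {mean_x:5.1f}  events = {events}/{trials}  "
          f"logit = {logit:6.3f}")
```

Plotting the empirical logit against the bin mean gives the logit plot: a flat pattern suggests little relationship to chd, a roughly straight trend supports entering the variable linearly, and a curved pattern suggests a transformation.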



Smoothed plots will be used in another context later.



An introduction to variable clustering, Part 1



An introduction to variable clustering, Part 2



Homework
Perform variable screening on the lipid2018_b data set.

NOTE:
The submission for this homework should be a text document (Word, RTF, plain text, etc.). It should be in the form of a report and include the programs you used. In the report, explain the methods you used for variable screening. List any variables that were excluded from consideration in the initial model and explain why each was excluded.



The slides used in the videos are found here