Predictive Modeling, Variable Screening

An important idea of modeling with lots of independent variables is to reduce the dimensionality of the data. One method for dimension reduction is variable clustering. Although, I introduced this topic earlier we did not use it on the chd2018 data set. Here we will apply it to the develop data set.

Introduction

Using SQL to define the variables to cluster.

Using Using PROC VARCLUS to cluster the variables.

Define the reduced set of variables based on the clustering results. Here I arbitrarily selected one variable from each cluster that contained multiple variables.

The slides used in the videos are found here