Predictive Modeling, Introduction


Every analysis begins with some pre-processing, and the methods used for predictive modeling often mirror those we used previously. For the develop data set, the first steps are to examine the data, run a simple variable screening, and handle missing data in the numeric variables.



One always starts with an initial examination of the data.



A simple screening of dichotomous variables allows us to drop three variables from consideration.
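The screening criterion is not stated here, so the following is only a minimal sketch of one common approach: flag a dichotomous variable when its rarer level covers less than some share of the rows, or when it takes a single value. The function name `rare_binary_columns`, the `min_share` threshold, and the toy columns are all hypothetical and are not taken from the develop data set.

```python
from collections import Counter

def rare_binary_columns(rows, columns, min_share=0.05):
    """Return the columns whose rarer level is below min_share of the rows.

    A column with only one distinct value is treated as having a rarer
    level of size zero, so it is always flagged.
    """
    flagged = []
    for col in columns:
        counts = Counter(row[col] for row in rows)
        rarer = min(counts.values()) if len(counts) > 1 else 0
        if rarer / len(rows) < min_share:
            flagged.append(col)
    return flagged

# Toy data (assumed for illustration): ten rows, three 0/1 variables.
rows = [{"flag_a": i % 2, "flag_b": int(i == 0), "flag_c": 0} for i in range(10)]

# The threshold of 0.2 is an arbitrary choice for this small example.
to_drop = rare_binary_columns(rows, ["flag_a", "flag_b", "flag_c"], min_share=0.2)
print(to_drop)
```

Here `flag_a` is evenly split and survives, while `flag_b` (one nonzero row in ten) and the constant `flag_c` are flagged. A suitable threshold depends on the data and the modeling goal.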



Next, we examine missing data for the numeric variables.
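As a rough illustration of this step, the sketch below counts missing values per variable in a small toy table. It assumes the data are held as a list of dictionaries with `None` marking a missing value; the column names and values are made up and do not come from the course data sets.

```python
# Toy rows standing in for a data set; None marks a missing value.
rows = [
    {"age": 34, "income": 52000.0, "balance": 1200.0},
    {"age": None, "income": 61000.0, "balance": None},
    {"age": 51, "income": None, "balance": 300.0},
    {"age": 45, "income": 48000.0, "balance": None},
]

def missing_counts(rows):
    """Return {column: number of missing values} for every column."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            # (value is None) is a bool, which adds as 0 or 1.
            counts[name] = counts.get(name, 0) + (value is None)
    return counts

counts = missing_counts(rows)
for name, n in counts.items():
    print(f"{name}: {n} missing of {len(rows)}")
```

Variables with a nonzero count are the candidates for imputation.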



Median imputation with missing indicators
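The idea can be sketched as follows: for each numeric variable, replace missing values with the median of the observed values and add a 0/1 indicator that records which rows were imputed. This is a minimal sketch under stated assumptions, not the course's own program: the function name, the `mi_` indicator prefix, and the toy data are hypothetical, and each listed column is assumed to have at least one observed value.

```python
from statistics import median

def impute_median_with_indicators(rows, columns):
    """Return (imputed_rows, medians); the input rows are left unchanged.

    For each column in `columns`, a new key "mi_<column>" is 1 where the
    original value was missing (None) and 0 otherwise.
    """
    medians = {}
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        medians[col] = median(observed)  # raises if nothing was observed

    imputed = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            is_missing = row[col] is None
            new_row[f"mi_{col}"] = int(is_missing)
            if is_missing:
                new_row[col] = medians[col]
        imputed.append(new_row)
    return imputed, medians

# Toy data (assumed for illustration); None marks a missing value.
rows = [
    {"age": 34, "income": 52000.0},
    {"age": None, "income": 61000.0},
    {"age": 51, "income": None},
    {"age": 45, "income": 48000.0},
]

imputed, medians = impute_median_with_indicators(rows, ["age", "income"])
for row in imputed:
    print(row)
```

Keeping the indicator alongside the imputed value lets a later model pick up any signal carried by the fact that a value was missing, which imputation alone would hide.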



Other possibilities for imputing missing values


Homework
Do an initial examination of the data set pva_raw_data.
Create a copy of the data set in a folder on your computer.
Create a new data set pva_a in the same folder, dropping the variable target_d from the data set.
Examine the numeric variables in pva_a to determine which have missing values.
Create a new version of pva_a in which missing values are imputed and missing value indicators are included. You may use median imputation.
NOTE:
For this assignment, you need to submit only the programs used for the exercise, commented for readability as necessary.



The slides used in the videos are found here