Last week I looked at how to create a decision tree in Base SAS. This week I will look at how to turn this into a random forest. A random forest is a collection of decision trees, each built from a random sample of the in-sample data. When the forest is applied to new data, such as the out-of-sample data, the predictions of the individual trees are combined: averaged for a numeric outcome, or decided by the most common prediction for a categorical one such as an election winner. This usually gives a more accurate prediction than a single decision tree. As with last week, I will be using ONS data to predict the outcome of the 2010 general election.
Below is the code from last week for generating a decision tree. This time I have turned it into a macro that produces 100 trees, each based on a random selection from the in-sample data. Each tree is stored in its own SAS dataset, named tree_ followed by the number of the tree (for example, tree_1).
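To make the sampling step concrete, here is a minimal sketch in Python rather than SAS. It is not the macro described above. It assumes each random selection is drawn with replacement and is the same size as the in-sample data (a bootstrap sample), which the text does not state. The function `draw_samples`, the seed, and the toy `in_sample` rows are all hypothetical.

```python
import random

def draw_samples(in_sample, n_trees=100, seed=2010):
    """Draw one random sample per tree, keyed tree_1 .. tree_<n_trees>.

    Assumption: sampling is with replacement and each sample has as many
    rows as the in-sample data. The original post does not specify this.
    """
    rng = random.Random(seed)
    n = len(in_sample)
    samples = {}
    for i in range(1, n_trees + 1):
        # Pick n row positions at random; the same row may appear twice.
        samples[f"tree_{i}"] = [in_sample[rng.randrange(n)] for _ in range(n)]
    return samples

# Made-up in-sample rows, for illustration only.
in_sample = [
    {"constituency": f"C{k}", "winner": "Con" if k % 2 else "Lab"}
    for k in range(20)
]

samples = draw_samples(in_sample)
print(len(samples), len(samples["tree_1"]))
```

Each entry in `samples` would then be passed to the tree-building step, so that every tree sees a slightly different view of the data.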
Having created my 100 decision trees from random samples of the in-sample data, I now apply them to the out-of-sample data. Below is again the code from last week, which has also been changed into a macro that runs through the prediction from each of the 100 trees.
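The scoring loop can be sketched as follows, again in Python rather than SAS. Each tree is represented by a plain function that maps a row to a predicted party. The three rules, the variable names `unemployment` and `median_age`, and the column names `pred_1`, `pred_2`, ... are invented for illustration and are not taken from the real trees or the ONS data.

```python
def apply_trees(out_sample, trees):
    """Return a copy of out_sample with one prediction column per tree."""
    scored = []
    for row in out_sample:
        new_row = dict(row)  # leave the original row untouched
        for i, tree in enumerate(trees, start=1):
            new_row[f"pred_{i}"] = tree(row)
        scored.append(new_row)
    return scored

# Hypothetical stand-ins for fitted trees: each is a single made-up rule.
trees = [
    lambda r: "Con" if r["unemployment"] < 5.0 else "Lab",
    lambda r: "Con" if r["median_age"] > 40 else "Lab",
    lambda r: "Lab" if r["unemployment"] > 7.0 else "Con",
]

# Made-up out-of-sample rows.
out_sample = [
    {"constituency": "A", "unemployment": 3.2, "median_age": 44},
    {"constituency": "B", "unemployment": 8.1, "median_age": 35},
]

scored = apply_trees(out_sample, trees)
print(scored[0])
```

With 100 trees in the list, the same loop would add 100 prediction columns to every row.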
The output is the out-of-sample data with 100 additional columns, each containing one tree's prediction of the election outcome in the constituency. The code below counts the number of times each prediction appears, so that the most frequent one can be used as the forest's prediction.
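Counting the predictions for one constituency and keeping the most frequent one might look like this in Python. This is a sketch of the idea, not the SAS code from the post. The column names `pred_1` to `pred_5` and the example row are hypothetical, and the post does not say how ties are broken; here a tie goes to whichever party was counted first.

```python
from collections import Counter

def majority_vote(row, n_trees):
    """Count each predicted party across the tree columns and pick the top one."""
    votes = Counter(row[f"pred_{i}"] for i in range(1, n_trees + 1))
    # most_common(1) returns a list holding the single (party, count) pair
    # with the highest count; ties keep first-seen order (an assumption here).
    winner, _count = votes.most_common(1)[0]
    return winner, votes

# Made-up row with five tree predictions instead of 100.
row = {
    "constituency": "A",
    "pred_1": "Con", "pred_2": "Lab", "pred_3": "Con",
    "pred_4": "LD", "pred_5": "Con",
}

prediction, votes = majority_vote(row, 5)
print(prediction, dict(votes))
```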
Finally, the predicted value is compared with the actual result in each constituency, and a table is output counting the correct predictions.
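As a rough illustration of this check, the Python sketch below compares a predicted and an actual column and tallies the matches. The column names `predicted` and `actual` and the five rows are made up, so the share it prints says nothing about the real results.

```python
def accuracy_table(rows):
    """Tally how many rows have a prediction equal to the actual outcome."""
    counts = {"correct": 0, "incorrect": 0}
    for row in rows:
        key = "correct" if row["predicted"] == row["actual"] else "incorrect"
        counts[key] += 1
    return counts

# Made-up rows, for illustration only.
rows = [
    {"constituency": "A", "predicted": "Con", "actual": "Con"},
    {"constituency": "B", "predicted": "Lab", "actual": "Lab"},
    {"constituency": "C", "predicted": "Con", "actual": "LD"},
    {"constituency": "D", "predicted": "Lab", "actual": "Con"},
    {"constituency": "E", "predicted": "LD", "actual": "LD"},
]

table = accuracy_table(rows)
share_correct = table["correct"] / len(rows)
print(table, share_correct)
```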
When applied to the out-of-sample data, the random forest predicts the correct outcome in 80% of the constituencies. This is an improvement of 5 percentage points over the single decision tree from last week.
Thanks for reading, and if you like this article, please remember to hit the like button.