Decision Trees are a popular and surprisingly effective technique, particularly for classification problems.

Iris Decision Tree from Scikit Learn (Image source: sklearn)

But the seemingly intuitive interface hides complexities. The criteria for selecting variables and for building the hierarchy can be tricky to grasp, not to mention the Gini index, entropy (wait, isn't that physics?) and information gain (isn't that information theory?). As you can see, there are lots of tricky problems on which you can get stuck.

The best way to understand Decision Trees is to work through a small example with sufficient complexity to demonstrate some of the common points at which one suddenly goes, "not sure what happens here…?" This post is therefore more like a tutorial or a demo, where I will work through a toy dataset that I have created, to understand the following:

1. What is a decision tree: root node, sub-nodes, terminal/leaf nodes
2. Splitting criteria: entropy and information gain vs. the Gini index
3. Why do trees overfit, and how to stop this

Let's begin at the very beginning, with the core problem. Suppose we are trying to classify whether a patient is diabetic or not based on various predictor variables such as fasting blood sugar, BMI, BP, etc. This is a prediction problem for a new patient, and we have 1,000 patient records to help us develop an understanding of which features are most useful in predicting the outcome.

Unlike other classification algorithms such as Logistic Regression, Decision Trees have a somewhat different way of functioning and of identifying which variables are important. The first thing to understand about Decision Trees is that they split the predictor space into sub-groups that are relatively more homogeneous from the perspective of the target variable. For example, if the target variable is binary, with categories 1 and 0 (shown by green and red dots in the image below), then the decision tree works to split the predictor space into sub-groups that are more homogeneous in terms of containing either 1's or 0's.

Target Variable Splitting process (Image source: author)

Understanding the components of a Decision Tree

Let us begin by understanding the various elements of a decision tree. A decision tree is a branching flow diagram or tree chart. It comprises the following components:

- A target variable, such as diabetic or not, and its initial distribution.
- A root node: this is the node that begins the splitting process by finding the variable that best splits the target variable.
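The idea of "homogeneity" in splitting can be made concrete with the Gini impurity, one of the splitting criteria in the outline above. A minimal sketch in plain Python, using made-up label lists (not the post's actual dataset) to stand in for the green and red dots:

```python
def gini_impurity(labels):
    """Gini impurity of a group of binary labels: 1 - sum(p_k^2).

    0.0 means the group is perfectly homogeneous (all one class);
    0.5 is the worst case for two classes (an even 50/50 mix).
    """
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n        # fraction of 1's (the "green dots")
    p0 = 1 - p1                 # fraction of 0's (the "red dots")
    return 1 - (p0 ** 2 + p1 ** 2)

mixed = [1, 0, 1, 0, 1, 0]      # half and half -> maximally impure
pure = [1, 1, 1, 1]             # all one class -> perfectly homogeneous

print(gini_impurity(mixed))     # 0.5
print(gini_impurity(pure))      # 0.0
```

A split is considered good when it moves the child groups from a mixed state (impurity near 0.5) toward pure states (impurity near 0.0).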
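The root node's job — finding the variable that best splits the target — can be sketched as an exhaustive search over candidate thresholds for each feature, scoring each split by the weighted Gini impurity of the two child groups. Everything here (the feature names `fbs` and `bmi`, the six toy records, the `best_split` helper) is invented for illustration and is not the post's actual dataset or code:

```python
def gini(labels):
    """Gini impurity of a list of binary labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1 - ((1 - p1) ** 2 + p1 ** 2)

def best_split(rows, target, features):
    """Return the (feature, threshold) pair that minimizes the
    weighted Gini impurity of the resulting left/right groups."""
    best_feature, best_threshold, best_score = None, None, float("inf")
    n = len(rows)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [r[target] for r in rows if r[f] <= t]
            right = [r[target] for r in rows if r[f] > t]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best_score:
                best_feature, best_threshold, best_score = f, t, score
    return best_feature, best_threshold

# Toy records: made-up fasting blood sugar (fbs) and BMI values.
patients = [
    {"fbs": 90, "bmi": 22, "diabetic": 0},
    {"fbs": 95, "bmi": 24, "diabetic": 0},
    {"fbs": 100, "bmi": 30, "diabetic": 0},
    {"fbs": 130, "bmi": 25, "diabetic": 1},
    {"fbs": 140, "bmi": 31, "diabetic": 1},
    {"fbs": 150, "bmi": 28, "diabetic": 1},
]

print(best_split(patients, "diabetic", ["fbs", "bmi"]))  # -> ('fbs', 100)
```

In this toy data, splitting on `fbs <= 100` separates the 0's from the 1's perfectly (both children reach impurity 0.0), so it is chosen as the root split; a real tree then repeats the same search recursively inside each child group.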