Decision Tree: Gini Impurity

Gini Impurity is closely related to entropy in decision trees: both are criteria for deciding which feature to split a node on, but they are computed differently. Gini Impurity measures how well a candidate split separates the dataset's samples into classes. A split is called pure if each resulting node contains elements of only one class.

Gini Impurity estimates the likelihood that a randomly selected example would be incorrectly classified at a specific node. It is called an "impurity" because it measures how far the node is from being pure.

Gini Impurity ranges from 0 to a maximum of 1 - 1/n for n classes. A value of 0 means all the elements in the node belong to a single class (a pure node), while the maximum is reached when the elements are uniformly distributed across all classes. For binary classification the maximum is 0.5.

Gini Impurity=1-\sum_{i=1}^{n} p_{i}^{2}

This criterion was introduced by Leo Breiman and his co-authors in their 1984 work on classification and regression trees (CART).
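The formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming the class probabilities of a node are already known and sum to 1:

```python
def gini_impurity(probabilities):
    """Gini Impurity of a node: 1 - sum of squared class probabilities."""
    return 1 - sum(p ** 2 for p in probabilities)

# A pure node (all elements in one class) has impurity 0.
print(gini_impurity([1.0, 0.0]))  # → 0.0

# An evenly split binary node reaches the maximum of 0.5.
print(gini_impurity([0.5, 0.5]))  # → 0.5
```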

Steps to Calculate Gini Impurity

To calculate Gini Impurity, follow these steps:

Firstly, calculate the Gini Impurity of each sub-node:

Gini Impurity=1-\sum_{i=1}^{n} p_{i}^{2}

Equivalently, writing Gini for the sum of squared class probabilities:

Gini Impurity = 1 - Gini

If there are n classes in the dataset, Gini is the sum of the squared probability of each class:

Gini=(p_{1}^{2}+p_{2}^{2}+p_{3}^{2}+p_{4}^{2}+p_{5}^{2}+p_{6}^{2}+\dots+p_{n}^{2})

Once the Gini Impurity of each sub-node has been calculated, compute the weighted Gini Impurity of the sub-nodes. The weight of a sub-node is the number of samples in that node divided by the total number of samples in the parent node.

Then calculate the Gini Impurity of each candidate split as the sum of the weighted impurities of its sub-nodes.

The split producing the minimum weighted Gini Impurity is selected as the final split, since a lower Gini Impurity means the resulting nodes are purer and more homogeneous.
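The steps above can be sketched end to end. The split labels below are hypothetical data invented for illustration; the helper names are assumptions, not a standard API:

```python
def gini_from_labels(labels):
    """Gini Impurity of a node given its list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1 - sum((c / n) ** 2 for c in counts.values())

def weighted_gini(left, right):
    """Weighted Gini Impurity of a split into two sub-nodes.

    Each sub-node is weighted by its share of the parent's samples.
    """
    n = len(left) + len(right)
    return (len(left) / n) * gini_from_labels(left) \
         + (len(right) / n) * gini_from_labels(right)

# Two hypothetical candidate splits of the same 8-sample parent node:
split_a = (["yes"] * 4, ["no"] * 4)                      # perfectly pure sub-nodes
split_b = (["yes", "yes", "yes", "no"],
           ["yes", "no", "no", "no"])                    # mixed sub-nodes

print(weighted_gini(*split_a))  # → 0.0
print(weighted_gini(*split_b))  # → 0.375

# split_a has the lower weighted Gini Impurity, so it is chosen.
best = min([split_a, split_b], key=lambda s: weighted_gini(*s))
```

Each sub-node's impurity (1 - sum p_i^2) is weighted by its sample share before the split values are compared, exactly as in the steps above.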