Decision Tree Optimization
Decision Tree Optimization Parameters Explained
Here are some of the most commonly adjusted parameters with Decision Trees. Let’s take a deeper look at what they are used for and how to change their values:
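criterion: (default: gini) This parameter selects the function used to measure the quality of a split. For classification trees the usual choices are gini and entropy (recent scikit-learn versions also accept log_loss, which is equivalent to entropy). Regression trees use a different set of criteria, such as squared_error.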
While these two values usually yield very similar results, there is a performance difference. Gini is usually the faster route, since entropy requires computing logarithms.
Without diving into the mathematics, gini and entropy can be explained as two similar formulas that measure how mixed the classes in a node are. At every splitting step, the tree compares the impurity of the parent node with that of the child nodes it would create and prefers the split that reduces impurity the most. That splitting step is also what the second important parameter, splitter, controls.
While entropy might have the upper hand in exploratory analysis, gini can be advantageous for reducing misclassifications.
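To make the two measures concrete, here is a minimal sketch that computes both from a list of class proportions in a node. The helper names gini and entropy are chosen for illustration and are not part of scikit-learn's API; the formulas are the standard definitions of Gini impurity and Shannon entropy.

```python
import math


def gini(proportions):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in proportions)


def entropy(proportions):
    """Shannon entropy in bits; classes with proportion 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)


# Compare a pure node, a mostly pure node, and an evenly mixed node.
for dist in ([1.0, 0.0], [0.9, 0.1], [0.5, 0.5]):
    print(dist, round(gini(dist), 3), round(entropy(dist), 3))
```

Both measures are zero for a pure node and reach their maximum when the classes are evenly mixed, which is why they tend to rank candidate splits similarly. Only entropy needs a logarithm per class.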
splitter: (default: best) The splitter parameter defines the split strategy and takes two values: best or random.
According to the scikit-learn documentation, best will choose the best split and random will choose the best random split. But what does this mean?
Best evaluates the candidate splits for the features under consideration and picks the one that provides the most information after the split, while random draws candidate splits at random and keeps the best of those.
This comes with different consequences: best tends to produce more informative splits, whereas random is cheaper to compute per split and adds randomness to the tree. Random might be useful when the analyst is confident that all features are of equal or similar importance to the classification.
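One way to see how the two strategies behave on a given dataset is to cross-validate both. The sketch below assumes scikit-learn is installed and uses its bundled iris dataset purely as a stand-in. The fixed random_state and the 5-fold setup are illustrative choices, and results on other data may differ.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for strategy in ("best", "random"):
    # random_state is fixed only so that the run is repeatable.
    clf = DecisionTreeClassifier(splitter=strategy, random_state=0)
    # Mean accuracy over 5 cross-validation folds.
    scores[strategy] = cross_val_score(clf, X, y, cv=5).mean()
    print(strategy, round(scores[strategy], 3))
```

Comparing the two scores (and, on larger datasets, the fit times) gives a quick indication of whether the random strategy costs any accuracy for the problem at hand.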
max_depth: (default: None) This parameter signifies the maximum depth of the decision tree.
When left at the default (None), nodes are expanded until all leaves are pure or until they contain fewer samples than min_samples_split.
This parameter can also take an integer value.
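from sklearn.tree import DecisionTreeClassifier as DTC

DT = DTC(criterion="entropy")  # measure split quality with entropy
DT = DTC(max_features="sqrt")  # consider sqrt(n_features) features per split
DT = DTC(splitter="random")  # choose the best of randomly drawn splits
DT = DTC(criterion="gini", max_depth=5)  # gini impurity, at most 5 levels deep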
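The effect of max_depth can be checked directly on a fitted tree. This is a small sketch under the same assumptions as before (scikit-learn installed, iris used only as example data). The depth limit of 2 is an arbitrary value picked for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_depth=None (the default): the tree grows until its stopping rules are met.
unlimited = DecisionTreeClassifier(random_state=0).fit(X, y)
# max_depth=2: growth stops after two levels of splits.
capped = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

for name, tree in (("unlimited", unlimited), ("capped", capped)):
    print(name, "depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())
```

A capped tree has fewer leaves and is easier to read, usually at the cost of a looser fit to the training data.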
More Decision Tree Optimization Parameters for fine tuning
The following parameters can be used for further optimization, to avoid overfitting, and to make adjustments based on impurity: