# Random Forest Optimization & Parameters

#### Random Forest Optimization Parameters Explained

- n_estimators
- max_depth
- criterion
- min_samples_split
- max_features
- random_state

Here are some of the most significant optimization parameters you can adjust and experiment with when you’re working with **Random Forests**. Considering the similarities, it’s no surprise that some of the parameters are identical or very similar to those of decision trees.

However, Random Forests also have their own unique parameters, which matter because a **forest** is bigger and more complex than its individual **trees**. **Random Forests** can also be computationally costly. I will share some tips that you can use to make your random forest more efficient and lightweight!

**n_estimators**: (default **100**) This parameter sets the number of trees in the forest. It is probably the most characteristic optimization parameter of a random forest algorithm. More trees generally make predictions more stable, at the cost of longer training and prediction times.
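To make the effect of n_estimators concrete, here is a minimal sketch. The synthetic dataset from `make_classification` and the variable names are assumptions made for illustration only; the fitted trees are read from the forest's `estimators_` attribute.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

small_forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
large_forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Each fitted tree is stored in the estimators_ list.
n_small = len(small_forest.estimators_)
n_large = len(large_forest.estimators_)
print(n_small, n_large)
```

The number of stored trees matches the value passed to n_estimators, which is why training time grows roughly in step with this parameter.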

**max_depth**: (default **None**) Another important parameter, max_depth sets the maximum allowed depth of the individual decision trees.

It can take an **integer** value. It can also take **None**, in which case nodes will continue to be expanded until all leaves are pure or contain fewer samples than **min_samples_split**.
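The difference between a capped and an uncapped depth can be checked on the fitted trees. This is a small sketch on synthetic data (an assumption for illustration); it reads each tree's depth with `get_depth()`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

shallow = RandomForestClassifier(n_estimators=20, max_depth=3, random_state=0).fit(X, y)
unbounded = RandomForestClassifier(n_estimators=20, max_depth=None, random_state=0).fit(X, y)

# Depth of every individual tree in each forest.
shallow_depths = [tree.get_depth() for tree in shallow.estimators_]
unbounded_depths = [tree.get_depth() for tree in unbounded.estimators_]

print("max depth with max_depth=3:   ", max(shallow_depths))
print("max depth with max_depth=None:", max(unbounded_depths))
```

Capping the depth is one of the simplest ways to keep a forest lightweight, since shallower trees are cheaper to build and store.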

**min_samples_split**: (default **2**) This is the minimum number of samples a node must contain before it can be split.

It can take an **integer** or **float** value, the integer being the more straightforward approach. A float is interpreted as a fraction of the training samples.
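The sketch below compares the default with a larger integer and with a float value. The data is synthetic and the `total_nodes` helper is a hypothetical name introduced here; it simply sums the node counts of all trees.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)


def total_nodes(forest):
    """Hypothetical helper: total number of nodes across all trees."""
    return sum(tree.tree_.node_count for tree in forest.estimators_)


# Default: a node with as few as 2 samples may still be split.
default_split = RandomForestClassifier(
    n_estimators=10, min_samples_split=2, random_state=0
).fit(X, y)

# Integer: a node needs at least 50 samples to be split.
int_split = RandomForestClassifier(
    n_estimators=10, min_samples_split=50, random_state=0
).fit(X, y)

# Float: a node needs at least 10% of the training samples to be split.
float_split = RandomForestClassifier(
    n_estimators=10, min_samples_split=0.1, random_state=0
).fit(X, y)

nodes_default = total_nodes(default_split)
nodes_int = total_nodes(int_split)
nodes_float = total_nodes(float_split)
print(nodes_default, nodes_int, nodes_float)
```

Raising min_samples_split stops the trees from splitting small nodes, so the forests come out noticeably smaller.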

**criterion**: (default **gini**) Criterion is the same as in the decision tree algorithm.

It allows choosing between two values, **gini** or **entropy**, *and it’s gini by default.*

**The criterion parameter exists for both regression and classification forests, but gini and entropy apply to classification only; regression forests use error-based criteria such as squared error.**

While these two values usually yield very similar results, there is a performance difference. Gini is usually the faster route since entropy requires computing logarithms.

Without diving into mathematics, gini and entropy can be explained as two similar formulas that measure how mixed the classes in a node are. At every splitting step, the tree picks the split that makes the child nodes purer than their parent node.

**While entropy might have the upper hand in exploratory analysis, Gini can be advantageous for reduced false-classification.**
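For readers who do want a peek at the formulas, here is a minimal sketch of both measures in plain Python. The function names and the example class proportions are assumptions for illustration; scikit-learn computes these internally in optimized code.

```python
import math


def gini(proportions):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in proportions)


def entropy(proportions):
    """Entropy in bits: sum of -p * log2(p) over the non-empty classes."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)


pure_node = [1.0, 0.0]      # every sample belongs to one class
mixed_node = [0.5, 0.5]     # classes are evenly mixed
skewed_node = [0.9, 0.1]    # mostly one class

for name, node in [("pure", pure_node), ("mixed", mixed_node), ("skewed", skewed_node)]:
    print(f"{name:>6}: gini={gini(node):.3f}  entropy={entropy(node):.3f}")
```

Both measures are zero for a pure node and largest for an evenly mixed one, which is why they usually lead to very similar trees. The logarithm in entropy is what makes it the slightly slower option.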

**max_features**: (default **sqrt**; **auto** in scikit-learn versions before 1.1) This parameter concerns the best split scenario. max_features defines how many features should be considered when looking for the best split. It can take these values: None, “sqrt”, “log2”, int or float. Older versions also accepted “auto”, which meant the same as “sqrt” for classification; it was deprecated in version 1.1 and removed in 1.3.

Best split will be considered with max features of:

- Total number of features, if __None__
- Certain number of features, if __int__
- Square root of total features, if __sqrt__ or __auto__
- A fraction of features, if __float__
- log2 of features, if __log2__
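The list above can be verified on a fitted forest. In this sketch the data is synthetic with 64 features (an assumption chosen so that the square root and log2 give different numbers), and the resolved value is read from the `max_features_` attribute of the first tree. `auto` is left out because recent scikit-learn releases no longer accept it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with 64 features, used only for illustration.
X, y = make_classification(
    n_samples=200, n_features=64, n_informative=8, random_state=0
)

considered = {}
for setting in (None, 10, 0.5, "sqrt", "log2"):
    forest = RandomForestClassifier(
        n_estimators=5, max_features=setting, random_state=0
    ).fit(X, y)
    # Each fitted tree stores the resolved number of features per split.
    considered[setting] = forest.estimators_[0].max_features_

for setting, n_features in considered.items():
    print(f"max_features={setting!r}: {n_features} features considered per split")
```

Smaller values make individual trees more different from each other and faster to train, which is a large part of what separates a random forest from plain bagged trees.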

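**random_state**: (default **None**) This parameter sets the seed for the randomness used when building the forest: the bootstrap sampling of the training data for each tree and the sampling of features considered at each split.

*None*: the global random state of numpy’s random module, *numpy.random*, will be used, so results can differ between runs

*int*: the value will be used as the seed of the *random number generator*, making results reproducible

*RandomState* instance: the given *random_state* object will be used as the random number generator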
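A quick way to see what random_state buys you is to fit the same forest twice with the same integer seed. This is a minimal sketch on synthetic data (an assumption for illustration); the seed value 42 is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

first = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)
second = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)

# Same seed, same data: the predicted probabilities should agree.
same_seed_match = np.allclose(first.predict_proba(X), second.predict_proba(X))
print("Identical predictions with the same seed:", same_seed_match)
```

Fixing the seed is useful when comparing parameter settings, because any change in the score then comes from the parameters rather than from random variation.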

## Examples:

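Here `RFC` is assumed to be an alias for scikit-learn’s `RandomForestClassifier`:

`from sklearn.ensemble import RandomForestClassifier as RFC`

`GC = RFC(n_estimators=200)`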

`GC = RFC(max_depth=5)`

`GC = RFC(min_samples_split=3)`

`GC = RFC(n_jobs=-1)`

`GC = RFC(warm_start=True)`

`GC = RFC(verbose=2)`
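The one-liners above can be combined into a single model. The following is a sketch rather than a recommended configuration: the dataset is synthetic, the train/test split is an assumption added for illustration, and `RFC` is assumed to be an alias for `RandomForestClassifier`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier as RFC
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset here.
X, y = make_classification(n_samples=400, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Several parameters can be combined in one constructor call.
GC = RFC(
    n_estimators=200,
    max_depth=5,
    min_samples_split=3,
    n_jobs=-1,        # use all available processors
    random_state=0,   # make the run reproducible
)
GC.fit(X_train, y_train)

accuracy = GC.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```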

# More parameters

#### Some more Random Forest Optimization Parameters for fine tuning

These additional parameters can be used for further optimization: to avoid inefficiency and to adjust how the data is handled or how the forest is constructed:

### bootstrap

*(default: True)*

__True__: Trees are built with bootstrap samples.

__False__: Trees are built with the whole dataset.

### verbose

*(default: 0 )*

Signifies information printed while building trees. (Verbosity)

__0__: Least information

__1__: More information

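__2__: Even more information

### oob_score

*(default: False)*

*Stands for out-of-bag score.*

__True__: generalization accuracy will be estimated with out-of-bag samples, that is, the training samples a tree did not see in its bootstrap sample. This requires **bootstrap** to be True.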
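Here is a minimal sketch of reading the out-of-bag estimate after fitting. The data is synthetic (an assumption for illustration), and the result is read from the fitted `oob_score_` attribute.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,
    bootstrap=True,   # out-of-bag scoring needs bootstrap samples
    oob_score=True,
    random_state=0,
).fit(X, y)

# Accuracy estimated on samples each tree did not see during training.
oob_accuracy = forest.oob_score_
print(f"Out-of-bag accuracy estimate: {oob_accuracy:.3f}")
```

This gives a rough generalization estimate without setting aside a separate validation set, which can be handy on small datasets.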

### warm_start

*(default: False)*

__False__: Fits a new forest

__True__: Reuses the solution of the previous call to fit and adds more estimators to the ensemble
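With warm_start enabled, a forest can be grown in steps instead of being refitted from scratch. This is a small sketch on synthetic data (an assumption for illustration); it raises n_estimators with `set_params` and calls `fit` again.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

forest = RandomForestClassifier(n_estimators=10, warm_start=True, random_state=0)
forest.fit(X, y)
n_before = len(forest.estimators_)

# Ask for more trees; the second fit only trains the additional ones.
forest.set_params(n_estimators=25)
forest.fit(X, y)
n_after = len(forest.estimators_)

print(f"Trees before: {n_before}, trees after: {n_after}")
```

This is useful when you want to check whether more trees still improve the score without paying for a full retrain each time.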

### n_jobs

*(default: None)*

Signifies the number of jobs to be run in parallel.

__None__: 1 job will be run (no parallelism)

__int__: The number of jobs run in parallel will be the integer provided.

__-1__: All processors will be used for the task.

### class_weight

*(default: None)*

Assigns weights to classes.

__None__: All classes have the same weight of 1.

__“balanced”__: Class weights will be set automatically, inversely proportional to the class frequencies in y.

__dict__ or __list of dicts__: Assigns custom class weight values based on the mapping provided (list of dicts for multi-output problems).
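To show what “balanced” works out to, the sketch below computes the weights by hand and compares them with scikit-learn's `compute_class_weight` utility. The imbalanced label array is made up for illustration.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Made-up imbalanced labels: 90 samples of class 0, 10 of class 1.
y = np.array([0] * 90 + [1] * 10)
classes = np.unique(y)

# Balanced weights: n_samples / (n_classes * samples_per_class)
manual_weights = len(y) / (len(classes) * np.bincount(y))

# The same calculation done by scikit-learn's helper.
sklearn_weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

print("manual: ", manual_weights)
print("sklearn:", sklearn_weights)
```

The rare class ends up with a much larger weight than the common one, so mistakes on it count for more during training.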

I hope you found this Random Forest Optimization Tutorial useful. Check out more useful resources about Random Forests we have, or take a look at our special Machine Learning guide: all the different Machine Learning Algorithm Tutorials with Python.

Official Scikit Learn Documentation: sklearn.ensemble.RandomForestClassifier