
entropy



tags: entropy
categories: machine learning


Entropy is a measure of disorder. High entropy means the data is scattered across many different values, while low entropy means nearly all the data is the same.

  • In information theory, information entropy is the expected information content of a message; for \(N\) equally likely outcomes it reduces to \(\log_{2} N\), the log-base-2 of the number of possible outcomes. The information content of a single outcome with probability \(p\) is

    • \(- \log_{2} p\)
  • For a given dataset, we can say:

    • It is an indicator of how messy your data is.
    • It characterizes the (im)purity of an arbitrary collection of examples.
  • Given a discrete random variable \(X\) with possible outcomes \(x_{1},\dots,x_{n}\) that occur with probabilities \(P(x_{1}),\dots,P(x_{n})\), the entropy of \(X\) is formally defined as (see the sketch below this list):

    • \( H(X) = - \sum_{i=1}^{n} P(x_i) \log_{2} P(x_i) \)
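
As a rough illustration of the formula, here is a small Python sketch that estimates \(H(X)\) from the empirical frequencies of a list of class labels. The function name and the input format are just for this example.

    from collections import Counter
    from math import log2

    def entropy(labels):
        """Shannon entropy H(X) = -sum_i P(x_i) * log2 P(x_i), in bits."""
        n = len(labels)
        return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

    print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 -> pure sample, no disorder
    print(entropy(["yes", "yes", "no", "no"]))    # 1.0 -> 50/50 split, maximally mixed

A pure collection has entropy 0, while a perfectly mixed two-class collection has entropy 1 bit, which matches the intuition of (im)purity above.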

What is the point of entropy in a decision tree?

At each step (each branching), you want the entropy to decrease, so the quantity is computed before and after the cut. If it decreases, the split is accepted and we proceed to the next step; otherwise, we try splitting on another feature or stop growing this branch.[3]
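
A minimal sketch of this before/after comparison, reusing the hypothetical entropy() helper from the snippet above; the data and split are illustrative only.

    def information_gain(parent_labels, child_label_subsets):
        """Entropy before the cut minus the weighted entropy after the cut."""
        n = len(parent_labels)
        after = sum(len(child) / n * entropy(child) for child in child_label_subsets)
        return entropy(parent_labels) - after

    # A split is worth keeping only if the gain is positive (entropy decreased).
    parent = ["yes", "yes", "no", "no"]
    print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0 -> perfect split
    print(information_gain(parent, [["yes", "no"], ["yes", "no"]]))  # 0.0 -> useless split
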

Application