
entropy



tags: entropy
categories: machine learning


Entropy is a measure of disorder. High entropy means the data is scattered across many different values, while low entropy means nearly all the data is the same.

  • In information theory, information entropy is the expected information content of a message; for \(N\) equally likely outcomes it reduces to \(\log_{2} N\), the log-base-2 of the number of possible outcomes. The information content of a single outcome with probability \(p\) is

    • \(- \log_{2} p\)
  • For a given dataset, we can say:

    • It is an indicator of how messy your data is.
    • It characterizes the (im)purity of an arbitrary collection of examples.
  • Given a discrete random variable \(X\) with possible outcomes \(x_{1},\dots,x_{n}\) that occur with probabilities \(P(x_{1}),\dots,P(x_{n})\), the entropy of \(X\) is formally defined as (see the sketch below this list):

    • \( H(X) = - \sum_{i=1}^{n} P(x_i) \log_{2} P(x_i) \)
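
As a rough illustration of the formula, here is a small Python sketch that estimates \(H(X)\) from the empirical frequencies of a list of class labels. The function name and the input format are just for this example.

    from collections import Counter
    from math import log2

    def entropy(labels):
        """Shannon entropy H(X) = -sum_i P(x_i) * log2 P(x_i), in bits."""
        n = len(labels)
        return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

    print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 -> pure sample, no disorder
    print(entropy(["yes", "yes", "no", "no"]))    # 1.0 -> 50/50 split, maximally mixed

A pure collection has entropy 0, while a perfectly mixed two-class collection has entropy 1 bit, which matches the intuition of (im)purity above.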

What is the point of entropy in a decision tree?

At each step (each branching), you want the entropy to decrease, so the quantity is computed before and after the cut. If it decreases, the split is accepted and we proceed to the next step; otherwise, we try splitting on another feature or stop growing this branch.[3]
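
A minimal sketch of this before/after comparison, reusing the hypothetical entropy() helper from the snippet above; the data and split are illustrative only.

    def information_gain(parent_labels, child_label_subsets):
        """Entropy before the cut minus the weighted entropy after the cut."""
        n = len(parent_labels)
        after = sum(len(child) / n * entropy(child) for child in child_label_subsets)
        return entropy(parent_labels) - after

    # A split is worth keeping only if the gain is positive (entropy decreased).
    parent = ["yes", "yes", "no", "no"]
    print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0 -> perfect split
    print(information_gain(parent, [["yes", "no"], ["yes", "no"]]))  # 0.0 -> useless split
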

Application