ML Model Evaluation

tags: accuracy f1 score dice score precision recall
categories: Machine Learning



True Positive (TP)

  • An actual positive example that the model correctly predicts as positive

False Negative (FN)

  • An actual positive example that the model incorrectly predicts as negative

False Positive (FP)

  • An actual negative example that the model incorrectly predicts as positive

True Negative (TN)

  • An actual negative example that the model correctly predicts as negative
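The four outcomes can be counted directly from a list of actual labels and a list of predicted labels. The following is a minimal sketch for the binary case; the function name `confusion_counts`, the `positive` parameter, and the sample labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP, TN for a binary problem (illustrative helper)."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1  # positive example predicted as positive
        elif actual == positive:
            fn += 1  # positive example predicted as negative
        elif predicted == positive:
            fp += 1  # negative example predicted as positive
        else:
            tn += 1  # negative example predicted as negative
    return tp, fn, fp, tn


# Made-up labels, 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(tp, fn, fp, tn)  # 3 1 1 3
```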


Accuracy

  • Accuracy is the ratio of correctly predicted observations to the total number of observations.
  • Accuracy = \(\frac{TP + TN}{TP + TN + FN + FP}\)
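As a quick sanity check, the accuracy formula can be evaluated from the four counts. This is only a sketch: the function name `accuracy` and the counts are illustrative, and returning 0.0 for an empty input is an assumed convention.

```python
def accuracy(tp, tn, fn, fp):
    """Accuracy = (TP + TN) / (TP + TN + FN + FP)."""
    total = tp + tn + fn + fp
    # Assumption: report 0.0 when there are no observations at all.
    return (tp + tn) / total if total else 0.0


# Made-up counts: 6 of 8 observations are predicted correctly
print(accuracy(tp=3, tn=3, fn=1, fp=1))  # 0.75
```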


Precision

  • Precision = \(\frac{TP}{TP + FP}\)
  • Of all the positive predictions made by our hypothesis/model, the fraction that are actually positive


Recall

  • Recall = \(\frac{TP}{TP + FN}\)
  • Of all the actual positive examples, the fraction that our hypothesis/model correctly classifies as positive
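Precision and recall share the same numerator and differ only in the denominator, which is easy to see in code. A minimal sketch follows; the function names and counts are illustrative, and returning 0.0 on a zero denominator is an assumed convention rather than part of the definitions.

```python
def precision(tp, fp):
    """Fraction of positive predictions that are truly positive."""
    # Assumption: return 0.0 when the model makes no positive predictions.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual positives that the model finds."""
    # Assumption: return 0.0 when there are no actual positives.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up counts: 4 positive predictions (3 correct), 6 actual positives
tp, fp, fn = 3, 1, 3
print(precision(tp, fp))  # 0.75
print(recall(tp, fn))     # 0.5
```

With these counts the model is fairly precise but misses half of the actual positives, which is why the two numbers are usually reported together.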

F1 Score

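  • The harmonic mean of precision and recall

    • Why?
      • The harmonic mean never exceeds the arithmetic mean and is pulled toward the smaller of the two values
      • A model cannot score well by maximizing only one of precision or recall
      • F1 is high only when precision and recall are both high
  • F1 Score = \(\frac{2}{\frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}\)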

  • From Wikipedia

    • In information retrieval and machine learning, the harmonic mean of the precision and the recall is often used as an aggregated performance score for the evaluation of algorithms and systems: the F-score (or F-measure). This is used in information retrieval because only the positive class is of relevance, while number of negatives, in general, is large and unknown.[14] It is thus a trade-off as to whether the correct positive predictions should be measured in relation to the number of predicted positives or the number of real positives, so it is measured versus a putative number of positives that is an arithmetic mean of the two possible denominators.
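The two views of F1 above (harmonic mean of the two rates, and true positives divided by the arithmetic mean of the two denominators) give the same number. The sketch below checks this on made-up counts and also shows how the harmonic mean stays close to the smaller value; all function names are illustrative.

```python
def f1_from_rates(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    if precision == 0 or recall == 0:
        return 0.0  # assumed convention when either rate is zero
    return 2 / (1 / precision + 1 / recall)


def f1_from_counts(tp, fp, fn):
    """F1 as TP over the arithmetic mean of predicted and real positives."""
    predicted_positives = tp + fp
    real_positives = tp + fn
    putative_positives = (predicted_positives + real_positives) / 2
    return tp / putative_positives if putative_positives else 0.0


tp, fp, fn = 3, 1, 3
p = tp / (tp + fp)  # 0.75
r = tp / (tp + fn)  # 0.5
print(f1_from_rates(p, r))         # approximately 0.6
print(f1_from_counts(tp, fp, fn))  # 0.6

# A lopsided model: the arithmetic mean looks acceptable, F1 does not
print((0.9 + 0.1) / 2)          # 0.5
print(f1_from_rates(0.9, 0.1))  # approximately 0.18
```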

Dice Score

  • For binary classification, the Dice score is the same quantity as the F1 score

  • Dice Score = \(\frac{2 * Intersection}{Union + Intersection}\)

    = \(\frac{2*TP}{2*TP + FP + FN}\)

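  • Commonly used to evaluate image segmentation, where the intersection and union are taken between the predicted mask and the ground-truth mask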
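For segmentation, the Dice score can be computed pixel by pixel on two binary masks. The following is a minimal pure-Python sketch; the function name `dice_score` and the tiny 3x3 masks are made up, and returning 1.0 when both masks are empty is an assumed convention.

```python
def dice_score(pred_mask, true_mask):
    """Dice = 2 * Intersection / (Union + Intersection) on binary masks."""
    pred = [p for row in pred_mask for p in row]
    true = [t for row in true_mask for t in row]
    intersection = sum(1 for p, t in zip(pred, true) if p and t)
    union = sum(1 for p, t in zip(pred, true) if p or t)
    denominator = union + intersection
    # Assumption: two empty masks count as a perfect match.
    return 2 * intersection / denominator if denominator else 1.0


# Made-up masks: 1 = foreground pixel, 0 = background pixel
pred_mask = [[1, 1, 0],
             [0, 1, 0],
             [0, 0, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1],
             [0, 0, 0]]

# Intersection = 2 pixels, Union = 4 pixels, so Dice = 4 / 6
print(dice_score(pred_mask, true_mask))
```

Here TP = 2, FP = 1, and FN = 1, so the second form of the formula gives the same value: 2*2 / (2*2 + 1 + 1) = 4/6.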

Confusion Matrix

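  • The scikit-learn confusion matrix layout can differ from the one shown in many textbook diagrams, as scikit-learn uses
    • the actual target classes as rows
    • the predicted classes as columns
    • sorted label order, so for binary labels 0 and 1 the matrix reads [[TN, FP], [FN, TP]]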
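To make the layout concrete, here is a small pure-Python sketch that builds a matrix in which entry `[i][j]` counts examples whose actual class is `i` and whose predicted class is `j`. This is the convention that `sklearn.metrics.confusion_matrix(y_true, y_pred)` follows. The helper name `confusion_matrix_rows_actual` and the labels are made up for illustration.

```python
def confusion_matrix_rows_actual(y_true, y_pred):
    """Rows index the actual class, columns index the predicted class."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return labels, matrix


# Made-up binary labels: TP = 2, FN = 2, FP = 1, TN = 3
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]

labels, matrix = confusion_matrix_rows_actual(y_true, y_pred)
print(labels)  # [0, 1]
print(matrix)  # [[3, 1], [2, 2]], i.e. [[TN, FP], [FN, TP]]
```

Because labels are sorted, the negative class (0) comes first, so TN lands in the top-left cell rather than TP.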

Classification Report

  • It shows a representation of the main classification metrics on a per-class basis.
  • The classification report displays the precision, recall, and F1 score for each class, along with its support (the number of actual examples of that class).
  • These metrics are defined in terms of true and false positives, and true and false negatives.
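A per-class report treats each class in turn as the positive class and everything else as negative. The sketch below computes the same four columns in pure Python; it is a rough stand-in for scikit-learn's `classification_report(y_true, y_pred)`, not a copy of its output format. The helper name `per_class_report`, the labels, and the 0.0 fallback for empty denominators are assumptions.

```python
def per_class_report(y_true, y_pred):
    """Precision, recall, F1, and support for each class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        report[label] = {
            # Assumption: fall back to 0.0 when a denominator is zero.
            "precision": tp / (tp + fp) if (tp + fp) else 0.0,
            "recall": tp / (tp + fn) if (tp + fn) else 0.0,
            "f1": 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0,
            "support": tp + fn,  # number of actual examples of this class
        }
    return report


# Made-up three-class labels
y_true = ["cat", "cat", "dog", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "bird"]

for label, row in per_class_report(y_true, y_pred).items():
    print(label, row)
```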