
  1. Hoeffding’s Inequality
    1. \mathbb{P}[|\nu - \mu| > \epsilon] \le 2\exp(-2\epsilon^2 N).
    2. Unlike the central limit theorem, it holds for every N. The statement \nu = \mu is probably approximately correct (PAC). The professor simplified the general formula using the fact that each sample lies in the interval [0, 1], so b_i - a_i = 1. (A quick simulation of the bound follows below.)
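
    A minimal Monte Carlo sanity check of the bound, in Python (the values \mu = 0.4, N = 100, \epsilon = 0.1 here are illustrative choices, not from the lecture):

      import numpy as np

      rng = np.random.default_rng(0)
      mu, N, eps, trials = 0.4, 100, 0.1, 100_000

      # Draw `trials` independent samples of size N; nu holds each sample mean.
      nu = rng.binomial(1, mu, size=(trials, N)).mean(axis=1)
      empirical = np.mean(np.abs(nu - mu) > eps)
      bound = 2 * np.exp(-2 * eps**2 * N)

      print(f"empirical P[|nu - mu| > eps] = {empirical:.4f}")  # about 0.04
      print(f"Hoeffding bound              = {bound:.4f}")      # about 0.27

    The bound holds but is loose, which is expected: it must cover every \mu and every distribution bounded in [0, 1].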
  2. Connection to Learning
    1. Concept Mapping
      • The unknown probability \mu: the chance that a fixed hypothesis h(x) disagrees with the target f(x)
      • The sample space: the input space
      • A sample equal to 1: h(x) is wrong (h(x) \ne f(x))
      • A sample equal to 0: h(x) is right (h(x) = f(x))
      • Sampling: evaluate h(x) on the training data set (with i.i.d. inputs)
    2. Conclusion
      For any fixed h, we can probably infer the unknown
      E_{out}(h) = \mathbb{E}_{x \sim P}[h(x) \ne f(x)]
      from the known
      E_{in}(h) = \frac{1}{N}\sum_{n=1}^{N}[h(x_n) \ne y_n],
      because
      \mathbb{P}[|E_{in}(h) - E_{out}(h)| > \epsilon] \le 2\exp(-2\epsilon^2 N).
      (A small numerical check follows below.)
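
    A quick numerical check of this conclusion, in Python (the target f, the hypothesis h, and the uniform input distribution below are all hypothetical choices for illustration):

      import numpy as np

      rng = np.random.default_rng(1)
      f = lambda x: np.sign(x)        # the "unknown" target (hypothetical)
      h = lambda x: np.sign(x - 0.1)  # one fixed hypothesis (hypothetical)

      # E_out(h): error probability under P, approximated on a huge i.i.d. sample.
      x_all = rng.uniform(-1, 1, size=1_000_000)
      E_out = np.mean(h(x_all) != f(x_all))

      # E_in(h): error frequency on N i.i.d. training inputs (here y_n = f(x_n)).
      x_train = rng.uniform(-1, 1, size=200)
      E_in = np.mean(h(x_train) != f(x_train))

      print(f"E_out(h) ~ {E_out:.3f}, E_in(h) = {E_in:.3f}")  # E_in tracks E_out ~ 0.05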
    3. Problem
      The previous discussion assumes a single fixed hypothesis. With multiple hypotheses, there is always some chance that one of them fits the training data set well purely by luck yet fails on the underlying distribution (overfitting). By the union bound, the chance of such a bad event grows with the number of hypotheses M, roughly as 2M\exp(-2\epsilon^2 N), so we need a larger data set N to keep this probability small. The toy experiment below illustrates the effect.
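
      A toy illustration in Python (all numbers are made up): take M = 1000 "hypotheses" that are pure noise, so each has E_{out} = 0.5, and pick the one with the smallest in-sample error on N = 20 examples. The winner looks far better than it really is; increasing N shrinks the gap.

        import numpy as np

        rng = np.random.default_rng(2)
        N, M = 20, 1000

        # errors[m, n] = 1 iff "hypothesis" m errs on example n (prob 1/2 each),
        # i.e., every hypothesis is a coin flip with true error E_out = 0.5.
        errors = rng.integers(0, 2, size=(M, N))
        E_in = errors.mean(axis=1)

        print(f"best E_in among M={M} hypotheses: {E_in.min():.2f}")  # well below 0.5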