
  1. Hoeffding’s Inequality
    1. \mathbb{P}[|\nu - \mu| > \epsilon] \le 2\exp(-2\epsilon^2 N).
    2. Unlike the central limit theorem, it holds for every N. The statement \nu = \mu is probably approximately correct (PAC). The professor simplified the general formula using the fact that each sample lies in the interval [0, 1], so b_i - a_i = 1. (A quick simulation of the bound follows below.)
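
    A minimal Monte Carlo sanity check of the bound, in Python (the values \mu = 0.4, N = 100, \epsilon = 0.1 here are illustrative choices, not from the lecture):

      import numpy as np

      rng = np.random.default_rng(0)
      mu, N, eps, trials = 0.4, 100, 0.1, 100_000

      # Draw `trials` independent samples of size N; nu holds each sample mean.
      nu = rng.binomial(1, mu, size=(trials, N)).mean(axis=1)
      empirical = np.mean(np.abs(nu - mu) > eps)
      bound = 2 * np.exp(-2 * eps**2 * N)

      print(f"empirical P[|nu - mu| > eps] = {empirical:.4f}")  # about 0.04
      print(f"Hoeffding bound              = {bound:.4f}")      # about 0.27

    The bound holds but is loose, which is expected: it must cover every \mu and every distribution bounded in [0, 1].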
  2. Connection to Learning
    1. Concept Mapping
      • The unknown probability \mu: the chance that a fixed hypothesis h(x) disagrees with the target f(x)
      • The sample space: the input space
      • A sample equal to 1: h(x) is wrong (h(x) \ne f(x))
      • A sample equal to 0: h(x) is right (h(x) = f(x))
      • Sampling: evaluate h(x) on the training data set (with i.i.d. inputs)
    2. Conclusion
      For any fixed h, we can probably infer the unknown
      E_{out}(h) = \mathbb{E}_{x \sim P}[h(x) \ne f(x)]
      from the known
      E_{in}(h) = \frac{1}{N}\sum_{n=1}^{N}[h(x_n) \ne y_n],
      because
      \mathbb{P}[|E_{in}(h) - E_{out}(h)| > \epsilon] \le 2\exp(-2\epsilon^2 N).
      (A small numerical check follows below.)
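
    A quick numerical check of this conclusion, in Python (the target f, the hypothesis h, and the uniform input distribution below are all hypothetical choices for illustration):

      import numpy as np

      rng = np.random.default_rng(1)
      f = lambda x: np.sign(x)        # the "unknown" target (hypothetical)
      h = lambda x: np.sign(x - 0.1)  # one fixed hypothesis (hypothetical)

      # E_out(h): error probability under P, approximated on a huge i.i.d. sample.
      x_all = rng.uniform(-1, 1, size=1_000_000)
      E_out = np.mean(h(x_all) != f(x_all))

      # E_in(h): error frequency on N i.i.d. training inputs (here y_n = f(x_n)).
      x_train = rng.uniform(-1, 1, size=200)
      E_in = np.mean(h(x_train) != f(x_train))

      print(f"E_out(h) ~ {E_out:.3f}, E_in(h) = {E_in:.3f}")  # E_in tracks E_out ~ 0.05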
    3. Problem
      The previous discussion assumes a single fixed hypothesis. With multiple hypotheses, there is always some chance that one of them fits the training data set well purely by luck yet fails on the underlying distribution (overfitting). By the union bound, the chance of such a bad event grows with the number of hypotheses M, roughly as 2M\exp(-2\epsilon^2 N), so we need a larger data set N to keep this probability small. The toy experiment below illustrates the effect.
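
      A toy illustration in Python (all numbers are made up): take M = 1000 "hypotheses" that are pure noise, so each has E_{out} = 0.5, and pick the one with the smallest in-sample error on N = 20 examples. The winner looks far better than it really is; increasing N shrinks the gap.

        import numpy as np

        rng = np.random.default_rng(2)
        N, M = 20, 1000

        # errors[m, n] = 1 iff "hypothesis" m errs on example n (prob 1/2 each),
        # i.e., every hypothesis is a coin flip with true error E_out = 0.5.
        errors = rng.integers(0, 2, size=(M, N))
        E_in = errors.mean(axis=1)

        print(f"best E_in among M={M} hypotheses: {E_in.min():.2f}")  # well below 0.5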