Bayes Classifier

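A classifier $g$ is a function that maps a feature vector to a label:

\[g: \mathcal{X} \to \mathcal{Y}, \qquad \text{here with } \mathcal{X} = \mathbb{R}^d \text{ and } \mathcal{Y} = \{0,1\}\]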

To model the learning problem, we use a random pair $(X,Y)$ whose distribution is described by $(\mu, \eta)$, where $\mu$ is the probability measure of $X$:

\[\mu(A) = \mathbb{P}(X \in A)\]

and $\eta$ is the regression function of $Y$ on $X$:

\[\eta(x) = \mathbb{P}(Y=1 | X=x) = \mathbb{E}[Y | X=x]\]

$\eta$ is also called the a posteriori probability.

The Bayes classifier, denoted $g^*$, assigns to each $x$ the label given by the decision function:

\[g^*(x) = \left\{ \begin{array}{ll} 1 & \text{if}\ \eta(x) > 1/2 \\ 0 & \text{otherwise} \end{array} \right.\]

Or, if $\mathcal{Y}$ is $\{-1,1\}$, we can write the decision function on a single line that handles both labels: $g^*(x) = 2 \unicode{x1D7D9} \{ \eta(x)>1/2 \}-1$.

Intuitively, the Bayes classifier simply chooses the most probable label given the observed $x$. With only two classes, that is the label whose a posteriori probability is higher than $0.5$; when $\eta(x)$ is exactly $0.5$ both labels are equally likely, and the convention above picks $0$.
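As a minimal sketch of the decision rule, assume $\eta$ is known and, purely for illustration, given by a logistic curve in one dimension. The function names and the choice of $\eta$ are hypothetical and not part of the definition above.

```python
import math


def eta(x: float) -> float:
    """Assumed posterior P(Y=1 | X=x): a logistic curve, chosen only for illustration."""
    return 1.0 / (1.0 + math.exp(-x))


def bayes_classifier(x: float) -> int:
    """Labels in {0, 1}: return 1 if eta(x) > 1/2, otherwise 0."""
    return 1 if eta(x) > 0.5 else 0


def bayes_classifier_pm1(x: float) -> int:
    """Labels in {-1, 1}: the one-line form 2 * 1{eta(x) > 1/2} - 1."""
    return 2 * int(eta(x) > 0.5) - 1


# Points on either side of the decision boundary eta(x) = 1/2, which sits at x = 0 here.
labels_01 = [bayes_classifier(x) for x in (-2.0, 0.0, 2.0)]
labels_pm1 = [bayes_classifier_pm1(x) for x in (-2.0, 0.0, 2.0)]
```

With this particular $\eta$, the rule reduces to thresholding $x$ at $0$; the tie at $x = 0$ goes to label $0$ (or $-1$) because the inequality is strict.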

Theorem

For any classifier $g: \mathbb{R}^d \to \{0,1\}$, the Bayes classifier $g^*$ satisfies $\mathbb{P}(g^*(X) \neq Y) \le \mathbb{P}(g(X) \neq Y)$.

In other words, the Bayes classifier is theoretically the best classifier: no other classifier has a lower probability of error.
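The inequality can be checked exhaustively on a toy example. The sketch below assumes a hypothetical distribution where $X$ is uniform on four points with made-up values of $\eta$; it computes the exact error probability of all $2^4$ possible classifiers and compares them with the Bayes classifier. It illustrates the theorem and is not a proof.

```python
from itertools import product

# Hypothetical toy distribution: X takes four values with equal probability,
# and eta[i] is the assumed P(Y=1 | X = i-th point).
mu = [0.25, 0.25, 0.25, 0.25]
eta = [0.1, 0.4, 0.6, 0.9]


def risk(g):
    """Exact P(g(X) != Y) for a classifier g given as one label per point.

    Predicting 0 is wrong with probability eta(x); predicting 1 is wrong
    with probability 1 - eta(x).
    """
    return sum(m * (e if label == 0 else 1.0 - e) for m, e, label in zip(mu, eta, g))


# Bayes classifier: predict 1 exactly where eta > 1/2.
bayes = tuple(1 if e > 0.5 else 0 for e in eta)
bayes_risk = risk(bayes)

# Every possible classifier on four points, with its exact risk.
all_risks = {g: risk(g) for g in product((0, 1), repeat=len(eta))}
best_risk = min(all_risks.values())
```

On this toy distribution the Bayes classifier is `(0, 0, 1, 1)` and its risk coincides with the minimum over all sixteen classifiers.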

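Proof: conditionally on $X=x$, express $\mathbb{P}(g(X) \neq Y) - \mathbb{P}(g^*(X) \neq Y)$ in terms of indicator functions (using complements, e.g. $\unicode{x1D7D9}\{g(x)=0\} = 1 - \unicode{x1D7D9}\{g(x)=1\}$) and show that it is nonnegative.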
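One way to carry out these steps is sketched here. For any classifier $g$ with values in $\{0,1\}$, conditioning on $X=x$ gives:

\[\mathbb{P}(g(X) \neq Y | X=x) = 1 - \left[ \unicode{x1D7D9}\{g(x)=1\}\,\eta(x) + \unicode{x1D7D9}\{g(x)=0\}\,(1-\eta(x)) \right]\]

Subtracting the same expression written for $g^*$, and replacing each $\unicode{x1D7D9}\{\cdot = 0\}$ by its complement:

\[\begin{aligned} \mathbb{P}(g(X) \neq Y | X=x) - \mathbb{P}(g^*(X) \neq Y | X=x) &= \eta(x)\left( \unicode{x1D7D9}\{g^*(x)=1\} - \unicode{x1D7D9}\{g(x)=1\} \right) + (1-\eta(x))\left( \unicode{x1D7D9}\{g^*(x)=0\} - \unicode{x1D7D9}\{g(x)=0\} \right) \\ &= (2\eta(x) - 1)\left( \unicode{x1D7D9}\{g^*(x)=1\} - \unicode{x1D7D9}\{g(x)=1\} \right) \end{aligned}\]

This product is nonnegative in both cases: if $\eta(x) > 1/2$, then $g^*(x)=1$, so both factors are nonnegative; otherwise $g^*(x)=0$, so both factors are nonpositive. Integrating over $x$ with respect to $\mu$ gives $\mathbb{P}(g(X) \neq Y) - \mathbb{P}(g^*(X) \neq Y) \ge 0$.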