
Bayes Classifier

A classifier $g$ assigns a label to each observation:

\(g: \mathcal{X} \to \mathcal{Y}\), here with \(\mathcal{X} = \mathbb{R}^d\) and \(\mathcal{Y} = \{0,1\}\).

To model the learning problem, we describe the distribution of the pair $(X,Y)$ by $(\mu, \eta)$, where $\mu$ is the probability measure of $X$:

\[\mu(A) = \mathbb{P}(X \in A)\]

And $\eta$ is the regression function of $Y$ on $X$ (the second equality holds because $Y$ takes values in $\{0,1\}$):

\[\eta(x) = \mathbb{P}(Y=1 \mid X=x) = \mathbb{E}[Y \mid X=x]\]
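To make $\eta$ concrete, here is a minimal Python sketch under a hypothetical model that is not part of the text: $d = 1$, $p_1 = \mathbb{P}(Y=1) = 0.3$, $p_0 = 1 - p_1$, and $X \mid Y=k$ is Gaussian with unit variance and mean $-1$ (for $k=0$) or $+1$ (for $k=1$). Bayes' rule then gives $\eta(x) = \frac{p_1 f_1(x)}{p_1 f_1(x) + p_0 f_0(x)}$, where $f_k$ is the density of $X$ given $Y=k$. The names `P1`, `MEANS`, `SIGMA`, `normal_pdf` and `eta` are illustrative.

```python
import math

# Hypothetical model (assumption for illustration only):
#   P(Y=1) = P1, and X | Y=k ~ Normal(MEANS[k], SIGMA**2), with d = 1.
P1 = 0.3
MEANS = {0: -1.0, 1: 1.0}
SIGMA = 1.0


def normal_pdf(x: float, mean: float, sigma: float) -> float:
    """Density of Normal(mean, sigma**2) evaluated at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def eta(x: float) -> float:
    """Posterior P(Y=1 | X=x), computed with Bayes' rule."""
    joint1 = P1 * normal_pdf(x, MEANS[1], SIGMA)          # p1 * f1(x)
    joint0 = (1.0 - P1) * normal_pdf(x, MEANS[0], SIGMA)  # p0 * f0(x)
    return joint1 / (joint1 + joint0)


for x in (-2.0, 0.0, 2.0):
    print(f"eta({x:+.1f}) = {eta(x):.4f}")
```

At $x = 0$ both class densities coincide, so $\eta(0) = p_1 = 0.3$: the observation carries no information and the posterior equals the prior.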

$\eta(x)$ is also called the a posteriori (posterior) probability of the label $1$ given $X=x$.

The Bayes classifier $g^*$ is defined by the decision function:

\[g^*(x) = \left\{ \begin{array}{ll} 1 & \text{if}\ \eta(x) > 1/2 \\ 0 & \text{otherwise} \end{array} \right.\]
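A minimal Python sketch of this rule, assuming $\eta$ is known. For illustration only, it plugs in a hypothetical posterior: the closed form $\eta(x) = \frac{0.3\,e^{2x}}{0.3\,e^{2x} + 0.7}$, which Bayes' rule gives when $\mathbb{P}(Y=1) = 0.3$ and $X \mid Y=k$ is Gaussian with unit variance and mean $-1$ (for $k=0$) or $+1$ (for $k=1$). The function names are illustrative.

```python
import math


def eta(x: float) -> float:
    """Hypothetical posterior P(Y=1 | X=x) = 0.3 e^{2x} / (0.3 e^{2x} + 0.7).

    Assumed model (illustration only): P(Y=1) = 0.3 and X | Y=k ~ Normal(m_k, 1)
    with m_0 = -1, m_1 = +1, so that log(f_1(x) / f_0(x)) = 2x.
    """
    a = 0.3 * math.exp(2.0 * x)
    return a / (a + 0.7)


def bayes_classifier(x: float, posterior) -> int:
    """g*(x) = 1 if posterior(x) > 1/2, else 0 (a tie at 1/2 goes to label 0)."""
    return 1 if posterior(x) > 0.5 else 0


# Decision boundary: eta(x) > 1/2  <=>  0.3 e^{2x} > 0.7  <=>  x > ln(7/3) / 2.
boundary = 0.5 * math.log(0.7 / 0.3)
for x in (-1.0, 0.0, 0.42, 0.43, 1.0):
    print(f"x = {x:+.2f} -> g*(x) = {bayes_classifier(x, posterior=eta)}")
```

Because the prior favours label $0$, the boundary sits at $x = \ln(7/3)/2 \approx 0.42$ rather than at $0$, the midpoint between the two class means.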

Equivalently, if $\mathcal{Y} = \{-1,1\}$, the decision function can be written as a single expression covering both labels: \(g^*(x) = 2 \unicode{x1D7D9} \{ \eta(x)>1/2 \}-1\), where \(\unicode{x1D7D9}\{\cdot\}\) is the indicator function.
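The single-expression form is easy to check. The sketch below (function names are hypothetical) takes the value $\eta(x)$ directly and confirms that it matches the two-case rule followed by the relabelling $y \mapsto 2y - 1$. Here $\eta(x)$ is still $\mathbb{P}(Y=1 \mid X=x)$; with labels in $\{-1,1\}$ the conditional expectation becomes $\mathbb{E}[Y \mid X=x] = 2\eta(x) - 1$.

```python
def to_pm1(label01: int) -> int:
    """Map a {0,1} label to {-1,+1} via y -> 2y - 1."""
    return 2 * label01 - 1


def bayes_classifier_pm1(eta_x: float) -> int:
    """g*(x) = 2 * 1{eta(x) > 1/2} - 1, given the value eta_x = eta(x)."""
    return 2 * int(eta_x > 0.5) - 1


# The one-line form agrees with the two-case rule followed by the relabelling.
for eta_x in (0.1, 0.5, 0.51, 0.9):
    assert bayes_classifier_pm1(eta_x) == to_pm1(1 if eta_x > 0.5 else 0)
```

As in the two-case rule, a tie $\eta(x) = 1/2$ falls into the "otherwise" branch and yields $-1$.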

Intuitively, the Bayes classifier simply chooses the most probable label given the observation $x$. With only two classes, that is the label whose posterior probability exceeds $0.5$.


For any classifier \(g: \mathbb{R}^d \to \{0,1\}\), the Bayes classifier $g^*$ satisfies \(\mathbb{P}(g^*(X) \neq Y) \le \mathbb{P}(g(X) \neq Y)\).

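In other words, the Bayes classifier has the smallest probability of error among all classifiers: it is theoretically the best classifier. In practice it cannot be computed directly, since $\eta$ depends on the unknown distribution of $(X,Y)$.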
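The theorem can be illustrated numerically under a hypothetical one-dimensional model (an assumption, not part of the text): $p_1 = \mathbb{P}(Y=1) = 0.3$, $p_0 = 1 - p_1$, and $X \mid Y=k$ Gaussian with unit variance and mean $-1$ (for $k=0$) or $+1$ (for $k=1$). The sketch writes $\mathbb{P}(g(X) \neq Y) = \int \big(\unicode{x1D7D9}\{g(x)=1\}\, p_0 f_0(x) + \unicode{x1D7D9}\{g(x)=0\}\, p_1 f_1(x)\big)\, dx$, approximates it with a midpoint rule on $[-10, 10]$, and compares $g^*$ with a few arbitrary competitors. It only checks these particular classifiers, so it illustrates the theorem rather than proving it; all names are illustrative.

```python
import math

# Hypothetical model (assumption for illustration): P(Y=1) = P1,
# X | Y=0 ~ Normal(M0, SIGMA**2) and X | Y=1 ~ Normal(M1, SIGMA**2).
P1, M0, M1, SIGMA = 0.3, -1.0, 1.0, 1.0


def normal_pdf(x, mean, sigma):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def joint0(x):
    """Density of (X=x, Y=0): p0 * f0(x)."""
    return (1.0 - P1) * normal_pdf(x, M0, SIGMA)


def joint1(x):
    """Density of (X=x, Y=1): p1 * f1(x)."""
    return P1 * normal_pdf(x, M1, SIGMA)


def bayes(x):
    """g*(x) = 1{eta(x) > 1/2}; eta(x) > 1/2 iff p1 f1(x) > p0 f0(x)."""
    return 1 if joint1(x) > joint0(x) else 0


def integrate(f, lo=-10.0, hi=10.0, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def error(g):
    """P(g(X) != Y) = int 1{g(x)=1} p0 f0(x) + 1{g(x)=0} p1 f1(x) dx."""
    return integrate(lambda x: joint0(x) if g(x) == 1 else joint1(x))


# Bayes risk L* = E[min(eta(X), 1 - eta(X))] = int min(p0 f0, p1 f1) dx.
bayes_risk = integrate(lambda x: min(joint0(x), joint1(x)))

competitors = {
    "always 0": lambda x: 0,
    "threshold at 0": lambda x: 1 if x > 0 else 0,
    "inside [-1, 1]": lambda x: 1 if abs(x) <= 1 else 0,
}

bayes_error = error(bayes)
print(f"Bayes classifier: {bayes_error:.4f} (Bayes risk {bayes_risk:.4f})")
for name, g in competitors.items():
    print(f"{name}: {error(g):.4f}")
```

The error of $g^*$ coincides with the Bayes risk $\mathbb{E}[\min(\eta(X), 1-\eta(X))]$, because at each $x$ the Bayes classifier pays the smaller of the two joint densities.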

Proof sketch: express $\mathbb{P}(g(X) \neq Y) - \mathbb{P}(g^*(X) \neq Y)$ in terms of indicator functions (using complements, e.g. \(\unicode{x1D7D9}\{g(x)=0\} = 1 - \unicode{x1D7D9}\{g(x)=1\}\)) and show that it is non-negative.
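Filling in the sketch: fix $x$ and condition on $X = x$. Since $Y = 1$ with probability $\eta(x)$ and $Y = 0$ otherwise,

\[\mathbb{P}(g(X) \neq Y \mid X=x) = 1 - \Big( \unicode{x1D7D9}\{g(x)=1\}\, \eta(x) + \unicode{x1D7D9}\{g(x)=0\}\, \big(1-\eta(x)\big) \Big).\]

Subtracting the same expression for $g^*$ and using \(\unicode{x1D7D9}\{g(x)=0\} = 1 - \unicode{x1D7D9}\{g(x)=1\}\) (and likewise for $g^*$):

\[\begin{aligned}
\mathbb{P}(g(X) \neq Y \mid X=x) - \mathbb{P}(g^*(X) \neq Y \mid X=x)
&= \eta(x)\big(\unicode{x1D7D9}\{g^*(x)=1\} - \unicode{x1D7D9}\{g(x)=1\}\big) + \big(1-\eta(x)\big)\big(\unicode{x1D7D9}\{g^*(x)=0\} - \unicode{x1D7D9}\{g(x)=0\}\big) \\
&= \big(2\eta(x) - 1\big)\big(\unicode{x1D7D9}\{g^*(x)=1\} - \unicode{x1D7D9}\{g(x)=1\}\big) \ge 0.
\end{aligned}\]

The product is non-negative: if $\eta(x) > 1/2$ then $g^*(x) = 1$, so both factors are $\ge 0$; otherwise $g^*(x) = 0$, so both factors are $\le 0$. Integrating over $x$ with respect to $\mu$ gives $\mathbb{P}(g(X) \neq Y) - \mathbb{P}(g^*(X) \neq Y) \ge 0$.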