Savoga

Likelihood


This method consists in finding the parameter that maximizes the likelihood of an event, the event here being the observation of some data. It is usually used when we know the distribution family of a random variable (uniform, Gaussian, etc.) and we look for the parameter value that maximizes the likelihood ($\approx$ probability) of the observed event.

$L(\theta; x_1,…,x_n) = \prod_{i=1}^{n}f(x_i;\theta)$, i.e., the product of the densities across all samples (assuming they are independent and identically distributed).

In discrete form: $L(\theta; x_1,…,x_n) = \prod_{i=1}^{n}\mathbb{P}(X = x_i; \theta)$

Note (wording clarification): $L(\theta | X) = \mathbb{P} (X | \theta)$

  • $\mathbb{P} (X | \theta)$: the probability of observing an event with fixed model parameters.

  • $L(\theta | X)$: the likelihood of the parameters taking certain values given that we observe an event.

Intuitively, we want to find the $\theta$ that maximizes the probability of a certain event, namely observing the data $X$ (which is why we write $X | \theta$).

We often take the logarithm, which turns the product into a sum and brings the power coefficients down as factors.
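Beyond algebraic convenience, the log also matters numerically: a product of many probabilities underflows to zero in floating point, while the corresponding sum of logs stays stable. A minimal sketch, using a hypothetical Bernoulli sample (the data and $\theta$ value are illustrative assumptions, not from the text):

```python
import math

# Hypothetical i.i.d. Bernoulli(theta) sample: 1 = tail, 0 = head.
data = [1] * 8000 + [0] * 2000
theta = 0.8

# The raw likelihood is a product of 10,000 probabilities < 1:
# it underflows to 0.0 in double precision...
likelihood = 1.0
for x in data:
    likelihood *= theta if x == 1 else (1 - theta)

# ...while the log-likelihood is a sum and stays finite and usable.
log_likelihood = sum(math.log(theta if x == 1 else 1 - theta) for x in data)

print(likelihood)       # 0.0 (underflow)
print(log_likelihood)   # a finite negative number
```

Since $\log$ is strictly increasing, maximizing the log-likelihood gives the same $\theta$ as maximizing the likelihood itself.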

Likelihood equation: $\frac{d}{d\theta}\ln L(x_1,…,x_n;\theta)=0$

Note: in machine learning, likelihood maximization is used in unsupervised learning when we want to estimate the parameters of the distribution of a sample (generative models).

Example

You toss a coin 10 times and get 8 tails and 2 heads. The coin has probability $\theta$ of landing tails. What is the most likely $\theta$ given the observed results?

Since each toss has a binary outcome, $X \sim \text{Bernoulli}(\theta)$.

\[\begin{align*} L(\theta; x_1,...,x_{10}) &= \prod_{i=1}^{10}\mathbb{P}(X = x_i; \theta) \\ &= \mathbb{P}(X = \text{tail}; \theta) \, \mathbb{P}(X = \text{tail}; \theta) \cdots \mathbb{P}(X = \text{head}; \theta) \\ &= \theta^8 (1-\theta)^{2} \end{align*}\]

Using the log for simplification:

$\log L(\theta; x_1,…,x_{10}) = 8 \log(\theta) + 2\log(1-\theta)$

Since we are looking for the maximum, we set $\frac{d}{d\theta} \log L(\theta; x_1,…,x_{10}) = 0$

$\Rightarrow \frac{8}{\theta} - \frac{2}{1-\theta} = 0$

$\Rightarrow 8 (1-\theta) = 2 \theta$

$\Rightarrow \theta = \frac{8}{10}$

The value of $\theta$ most likely to have produced these observations is $\theta = \frac{8}{10}$. That would be a loaded coin.
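As a quick numerical sanity check of the derivation (not needed for the closed-form result above), a grid search over $\theta \in (0, 1)$ should recover the same maximizer of $L(\theta) = \theta^8 (1-\theta)^2$:

```python
# Likelihood of 8 tails and 2 heads as a function of theta.
def likelihood(theta):
    return theta**8 * (1 - theta)**2

# Evaluate L on a fine grid over (0, 1) and keep the argmax.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=likelihood)

print(best)  # 0.8, matching the analytical solution theta = 8/10
```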