Hidden Markov Models

- Weather example
- Problems
- Forward-backward algorithm: Problems (a) and (b)
- Viterbi algorithm: Problems (c)
- Baum-Welch algorithm: Problem (d)
- Applications

Hidden Markov Models

Let us imagine a dynamic system whose future states only depend on the current one. Such a stochastic process is called a **Markov process**, and it is said to satisfy the **Markov property (“memoryless”)**:

p(h_{t+1} |h_1,..., h_t) = p(h_{t+1}|h_t)

Now, let us assume that the real state is not directly available; instead, we only have an observation of it that works as an indicator. This is a realistic scenario: for example, a system with noise-corrupted measurements, or a process that cannot be completely measured. Because there is uncertainty about the real state of the world, that state is referred to as **hidden**. A **Hidden Markov Model (HMM)** serves as a probabilistic model of such a system.

Let H be the *latent, hidden* variable that evolves over time. Let O be the random variable over its *observations*, also known as the *output sequence*. Graphically, the system at time steps {1, …, T} can be seen as a chain h_{1} → h_{2} → … → h_{T} in which each hidden state h_{t} emits one observation o_{t}.

Parameters of the model are **θ = (π, A, B)**, with:

- the initial state vector π of elements p(h_{1})
- the state transition matrix A of probabilities p(h_{t} | h_{t-1})
- the emission or observation matrix B of elements p(o_{t} | h_{t})
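As a worked equation, the three parameter sets combine into the joint probability of one hidden path and one observation sequence. This is a sketch that assumes, in addition to the Markov property, that each observation depends only on the hidden state at the same time step:

```latex
p(H, O) = p(h_1)\, p(o_1 | h_1) \prod_{t=2}^{T} p(h_t | h_{t-1})\, p(o_t | h_t)
```

The first factor comes from π, the transition factors from A, and the emission factors from B.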

Let us see a simple example. David and his twin brother Eric talk daily over the phone about what they did during the day. Eric is considering moving back to their hometown (he currently lives abroad) and would like to know whether each day there was sunny or rainy. Let H be the random variable that describes the weather on a given day, whose states are hidden from Eric. There are two possible states: rainy and sunny. Let O = {walk, shop, clean} be the possible activities that David performed and later told his brother about. The following figure describes the system with all transition and emission probabilities. For example, given that it was sunny yesterday, there is a 40% probability that it will rain today (transition probability).
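To make the example concrete, the model can be written down and sampled from. The sketch below is illustrative only: the 0.4 sunny-to-rainy transition is the one number stated in the text, while every other probability, along with the names `PI`, `A`, `B`, `draw` and `sample`, is an assumption chosen for illustration.

```python
import random

# Hidden states and observable activities from the weather example.
STATES = ["rainy", "sunny"]
ACTIVITIES = ["walk", "shop", "clean"]

# Only the 0.4 sunny -> rainy transition is stated in the text;
# every other number below is an assumed, illustrative value.
PI = {"rainy": 0.6, "sunny": 0.4}                      # p(h_1)
A = {                                                  # p(h_t | h_{t-1})
    "rainy": {"rainy": 0.7, "sunny": 0.3},
    "sunny": {"rainy": 0.4, "sunny": 0.6},
}
B = {                                                  # p(o_t | h_t)
    "rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}


def draw(dist, rng):
    """Draw one key from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]


def sample(T, seed=0):
    """Generate T days of hidden weather and reported activities."""
    rng = random.Random(seed)
    weather, activities = [], []
    state = draw(PI, rng)
    for _ in range(T):
        weather.append(state)
        activities.append(draw(B[state], rng))
        state = draw(A[state], rng)
    return weather, activities


hidden, observed = sample(5)
print(hidden)    # what Eric cannot see
print(observed)  # what David tells him
```

Eric only ever receives the second list; the inference problems below are about recovering information on the first.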

There are four major types of questions one can ask when modeling a situation using HMMs:

**Inference problems (known θ)**

- a) Likelihood of an observation sequence: p(O) = p(o_{1}, o_{2}, ..., o_{T})
- b) Hidden state inference (most probable state given a set of observations): h_{t}* = argmax_{h_t} p(h_{t} | O)
- c) Most likely explanation (most probable path of hidden states that results in a sequence of observations): H* = argmax_{H} p(H, O)

**Learning (unknown θ)**

- d) Train the model parameters, i.e., given an output sequence, find the best set of transition and emission probabilities: **θ*** = argmax_{θ} p(O | θ)

The HMM has been *designed precisely* such that there are *efficient* ways of computing the answers to all of the above problems.

The forward-backward algorithm is a procedure based on dynamic programming that solves problems (a) and (b). Let us introduce the following auxiliary variables:

\alpha_t (h_t) = p(h_t, o_1, ..., o_t),\ \ \beta_t (h_t)= p(o_{t+1}, ..., o_T | h_t)

The idea is to compute their values for all *t* in two passes. The equations below can be derived using the Markov property. The first pass goes __forward__ in time:

\begin{aligned}
&\alpha_1 (h_1) = p(h_1)\ p(o_1|h_1) \\
&\alpha_{t+1} (h_{t+1}) = \sum_{h_t} \alpha_t(h_t) \ p(h_{t+1} | h_t) \ p(o_{t+1} | h_{t+1})
\end{aligned}
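For reference, the forward recursion can be derived in a few steps. This is a sketch that assumes each observation depends only on the hidden state at the same time step: marginalize over h_t, then factor the joint distribution.

```latex
\begin{aligned}
\alpha_{t+1}(h_{t+1}) &= p(h_{t+1}, o_1, ..., o_{t+1}) \\
&= \sum_{h_t} p(h_t, h_{t+1}, o_1, ..., o_{t+1}) \\
&= \sum_{h_t} p(h_t, o_1, ..., o_t) \ p(h_{t+1} | h_t) \ p(o_{t+1} | h_{t+1}) \\
&= \sum_{h_t} \alpha_t(h_t) \ p(h_{t+1} | h_t) \ p(o_{t+1} | h_{t+1})
\end{aligned}
```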

The second pass goes __backward__:

\begin{aligned}
&\beta_T (h_T) = 1; \\
&\beta_{t-1} (h_{t-1}) = \sum_{h_t} p(h_t|h_{t-1}) \ p(o_t | h_t) \ \beta_t(h_t)
\end{aligned}

__Solution to problem (a):__ (only uses the forward-pass)

p(O) = p(o_1, ..., o_T) = \sum_{h_T} p(h_T, o_1, ..., o_T) = \sum_{h_T} \alpha_T(h_T)
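The forward pass and the likelihood in problem (a) can be sketched in a few lines of plain Python. The parameter values are assumptions for the weather example (only the 0.4 sunny-to-rainy transition comes from the text), and the function name `forward` and the integer encoding of states and activities are illustrative choices.

```python
# Assumed parameters; only the 0.4 sunny -> rainy entry is from the text.
# State order: 0 = rainy, 1 = sunny. Observations: 0 = walk, 1 = shop, 2 = clean.
pi = [0.6, 0.4]                          # pi[i]   = p(h_1 = i)
A = [[0.7, 0.3], [0.4, 0.6]]             # A[i][j] = p(h_{t+1} = j | h_t = i)
B = [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]   # B[i][o] = p(o_t = o | h_t = i)


def forward(obs, pi, A, B):
    """Return alpha, where alpha[t][i] = p(h_t = i, o_1, ..., o_t) (t is 0-based)."""
    n = len(pi)
    # Initialization: alpha_1(h_1) = p(h_1) p(o_1 | h_1)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    # Recursion: sum over the previous state, then weight by the emission.
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


obs = [0, 1, 2]                  # walk, shop, clean
alpha = forward(obs, pi, A, B)
likelihood = sum(alpha[-1])      # p(O) = sum over h_T of alpha_T(h_T)
print(likelihood)
```

The cost is proportional to T times the square of the number of states, whereas summing over every hidden path grows exponentially in T. This sketch does no rescaling, so for long sequences the values would underflow; working in log space is a common remedy.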

__Solution to problem (b)__:

\begin{aligned}
&p(h_t|O)p(O) = p(h_t, O) = \alpha_t(h_t)\beta_t(h_t) \\
&\Rightarrow p(h_t|O) = \dfrac{\alpha_t(h_t)\ \beta_t(h_t)}{\sum_{h_T} \alpha_T(h_T)}
\end{aligned}
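A minimal sketch of problem (b) combines both passes into the posterior p(h_t | O) for every time step. As before, the parameter values are assumed for illustration (only the 0.4 sunny-to-rainy transition is from the text), and the names `forward`, `backward` and `posteriors` are hypothetical.

```python
# Assumed parameters; only the 0.4 sunny -> rainy entry is from the text.
# State order: 0 = rainy, 1 = sunny. Observations: 0 = walk, 1 = shop, 2 = clean.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]             # A[i][j] = p(h_{t+1} = j | h_t = i)
B = [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]   # B[i][o] = p(o_t = o | h_t = i)


def forward(obs, pi, A, B):
    """alpha[t][i] = p(h_t = i, o_1, ..., o_t)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(obs, A, B):
    """beta[t][i] = p(o_{t+1}, ..., o_T | h_t = i), with beta_T = 1."""
    n = len(A)
    beta = [[1.0] * n]
    # Walk from o_T down to o_2, prepending beta_{t-1} each time.
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def posteriors(obs, pi, A, B):
    """gamma[t][i] = p(h_t = i | O) = alpha_t(i) beta_t(i) / p(O)."""
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    p_obs = sum(alpha[-1])
    return [[a * b / p_obs for a, b in zip(a_t, b_t)]
            for a_t, b_t in zip(alpha, beta)]


obs = [0, 1, 2]  # walk, shop, clean
gamma = posteriors(obs, pi, A, B)
for t, row in enumerate(gamma, start=1):
    best = max(range(len(row)), key=lambda i: row[i])
    print(t, row, "most probable state:", ["rainy", "sunny"][best])
```

Taking the argmax of each row answers problem (b) one time step at a time; the resulting sequence is not guaranteed to be the single most likely path, which is what problem (c) asks for.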

For the Viterbi algorithm, let V_{t}(k) be the probability of the most likely path of hidden states that accounts for the first t observations and ends in state h_{t} = k:

\begin{aligned}
&V_1(k) = p(o_1|h_1=k)\ p(h_1=k) &(1) \\
&V_t(k) = p(o_t|h_t=k)\ \max_x p(h_t=k|h_{t-1}=x) \ V_{t-1}(x) &(2)
\end{aligned}

__Solution to problem (c)__:

The Viterbi algorithm works by recursively computing V_{t}(k) up to t = T to find the most likely path based on a sequence of observations:

    # Initialization
    calculate V_{1}(k) using (1)
    # Iteration
    for t = 2, ..., T
        calculate V_{t}(k) by applying (2)
    # Termination
    calculate p(H*, O) as the maximum over all k at step T: max_{k} V_{T}(k)
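The recursion above yields the probability of the best path; recovering the path itself additionally requires remembering which previous state achieved each maximum. The sketch below adds such back-pointers. The parameter values are assumptions for the weather example (only the 0.4 sunny-to-rainy transition is from the text), and the name `viterbi` is illustrative.

```python
# Assumed parameters; only the 0.4 sunny -> rainy entry is from the text.
# State order: 0 = rainy, 1 = sunny. Observations: 0 = walk, 1 = shop, 2 = clean.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]             # A[x][k] = p(h_t = k | h_{t-1} = x)
B = [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]   # B[k][o] = p(o_t = o | h_t = k)


def viterbi(obs, pi, A, B):
    """Return (most likely hidden path, its joint probability with obs)."""
    n = len(pi)
    # Initialization, equation (1)
    V = [[B[k][obs[0]] * pi[k] for k in range(n)]]
    back = []  # back[t][k] = best predecessor of state k at step t + 1
    # Iteration, equation (2)
    for o in obs[1:]:
        prev = V[-1]
        best_prev = [max(range(n), key=lambda x: A[x][k] * prev[x])
                     for k in range(n)]
        V.append([B[k][o] * A[best_prev[k]][k] * prev[best_prev[k]]
                  for k in range(n)])
        back.append(best_prev)
    # Termination: best final state, then follow the back-pointers.
    last = max(range(n), key=lambda k: V[-1][k])
    path = [last]
    for pointers in reversed(back):
        path.insert(0, pointers[path[0]])
    return path, V[-1][last]


obs = [0, 1, 2]  # walk, shop, clean
path, prob = viterbi(obs, pi, A, B)
print([["rainy", "sunny"][k] for k in path], prob)
# With the assumed numbers: ['sunny', 'rainy', 'rainy'], probability about 0.01344
```

The structure mirrors the forward pass, with the sum over previous states replaced by a maximum.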

The Baum-Welch algorithm instantiates the expectation-maximization algorithm, using forward-backward inference in the Expectation step. It is used to __solve the learning problem (d)__, which is approached by deriving the maximum likelihood estimate of the parameters θ.

    Given: (Long) observed sequence O
    Start with θ = transition and emission probabilities = uniform distribution
    1. Compute the expected values of the hidden states given θ [E-step]
    2. Re-estimate θ = maximum likelihood estimate given hidden states [M-step]
    Repeat till convergence

In step 2, the maximum likelihood estimate for each probability distribution is simply obtained by counting and normalizing.
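A minimal, unscaled sketch of this loop for a single short observation sequence follows. Several things here are assumptions rather than part of the text: the observation sequence is made up, the helper names (`normalize`, `baum_welch_step`) are hypothetical, and the start is random rather than uniform, because an exactly uniform start makes all hidden states interchangeable and the updates then have nothing to tell them apart.

```python
import random


def forward(obs, pi, A, B):
    """alpha[t][i] = p(h_t = i, o_1, ..., o_t)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(obs, A, B):
    """beta[t][i] = p(o_{t+1}, ..., o_T | h_t = i)."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def normalize(row):
    total = sum(row)
    return [x / total for x in row]


def baum_welch_step(obs, pi, A, B):
    """One EM iteration; also returns p(O) under the parameters passed in."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    p_obs = sum(alpha[-1])
    # E-step: expected state occupancies and (unnormalized) transition counts.
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(n)]
             for t in range(T)]
    xi_sum = [[sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                   for t in range(T - 1)) for j in range(n)] for i in range(n)]
    # M-step: count and normalize.
    new_pi = gamma[0]
    new_A = [normalize(row) for row in xi_sum]
    new_B = [normalize([sum(gamma[t][i] for t in range(T) if obs[t] == o)
                        for o in range(m)]) for i in range(n)]
    return new_pi, new_A, new_B, p_obs


rng = random.Random(0)
n_states, n_symbols = 2, 3
# Assumed random (not uniform) starting point, to break the symmetry.
pi = normalize([rng.random() + 0.5 for _ in range(n_states)])
A = [normalize([rng.random() + 0.5 for _ in range(n_states)])
     for _ in range(n_states)]
B = [normalize([rng.random() + 0.5 for _ in range(n_symbols)])
     for _ in range(n_states)]

obs = [0, 0, 1, 2, 2, 1, 2, 0, 0, 1, 2, 2, 2, 1, 0, 0]  # made-up sequence
history = []
for _ in range(20):
    pi, A, B, p_obs = baum_welch_step(obs, pi, A, B)
    history.append(p_obs)
print(history[0], history[-1])  # the likelihood should not decrease
```

The procedure converges to a local optimum that depends on the starting point, so in practice it is often run from several initializations. For long sequences the forward and backward values would need rescaling or log-space arithmetic, which this sketch omits.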

HMMs are among the best-known models for:

- __automatic speech recognition__ systems, where the goal is to reconstruct spoken words (hidden states) from an acoustic signal (observations)
- __part-of-speech tagging__ in natural language texts, which aims to assign grammatical categories such as noun, verb, adjective (hidden states) to words in a text (observations)


About the contributors:

- Marta Enesco, Data Scientist, Graduate Research Assistant at University of Potsdam (72%)
- Keshav Dhandhania, MSc in Deep Learning @ MIT (2014) (28%)
