What is the difference between HTK and HMM?

I was wondering, what’s the difference between HTK and HMM? Are they just the same? Or am I wrong?
Maybe a bit too broad, but not exactly…
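HMM stands for Hidden Markov Model. It is a statistical model, not a piece of software. The underlying theory was developed by Leonard E. Baum and colleagues in the late 1960s and was applied to speech recognition in the 1970s, notably by James Baker at CMU and by Frederick Jelinek's group at IBM. The reference most people cite is Rabiner's tutorial (Rabiner 1989, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition). In speech recognition the HMM serves as the acoustic model, and it largely replaced template matching with the Dynamic Time Warping (DTW) algorithm. HTK, on the other hand, is the Hidden Markov Model Toolkit: a software toolkit, originally developed at the Cambridge University Engineering Department, for building and manipulating HMMs. So the HMM is the model, and HTK is one tool for training and using such models.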
An HMM can be defined as a Markov model, i.e. a system modeled as a set of states and probabilistic transitions between them, in which the states themselves cannot be observed directly. In the context of speech recognition, each state can be thought of as a hypothesis about what the speaker is saying at that moment, for example a part of a phoneme. The states are called hidden because you never observe them; you only observe what the system emits, which in speech is the acoustic signal.
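To make the split between hidden states and emissions concrete, here is a minimal Python sketch that samples from a toy HMM. Everything in it is made up for illustration: the state names, the symbols, all the probabilities, and the helper names draw and sample. It is not taken from HTK or from any real speech model.

```python
import random

# Hypothetical toy HMM: two hidden states, three observable symbols.
# All numbers below are invented for illustration only.
STATES = ["s1", "s2"]
SYMBOLS = ["a", "b", "c"]
INITIAL = {"s1": 0.6, "s2": 0.4}
TRANSITION = {
    "s1": {"s1": 0.7, "s2": 0.3},
    "s2": {"s1": 0.4, "s2": 0.6},
}
EMISSION = {
    "s1": {"a": 0.5, "b": 0.4, "c": 0.1},
    "s2": {"a": 0.1, "b": 0.3, "c": 0.6},
}


def draw(dist, rng):
    """Draw one key from a {key: probability} dictionary."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def sample(length, rng):
    """Generate (hidden_states, observations), each of the given length."""
    hidden, observed = [], []
    state = draw(INITIAL, rng)
    for _ in range(length):
        hidden.append(state)
        observed.append(draw(EMISSION[state], rng))  # emit from the current state
        state = draw(TRANSITION[state], rng)         # then move to the next state
    return hidden, observed


rng = random.Random(0)  # fixed seed so repeated runs give the same sequence
hidden, observed = sample(5, rng)
print("hidden  :", hidden)    # what a recognizer never gets to see
print("observed:", observed)  # what a recognizer actually receives
```

A recognizer only ever receives the second list and has to reason about the first one from it.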
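An HMM has two kinds of probabilities, which should not be confused. The transition probability a[i][j] is the probability of moving from hidden state i to hidden state j. The emission probability b[j][o] is the probability of observing o while in state j. The probability of being in state j given what was heard so far is then computed recursively: for every previous state i, take the probability of what happened so far ending in i and weight it by a[i][j]; sum over i; multiply by b[j][o] for the current observation; finally normalize over all states.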
Let’s see some pseudo-code, to give you an idea:
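# pi[j]: initial probability, a[i][j]: transition probability, b[j][o]: emission probability
# alpha[t][j] = p(o[0..t], state at time t = j)
alpha[0][j] = pi[j] * b[j][o[0]]
alpha[t][j] = b[j][o[t]] * sum over i of (alpha[t-1][i] * a[i][j])
p(state at time t = j | o[0..t]) = alpha[t][j] / sum over k of alpha[t][k]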

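Therefore, for each time step and each state, you want the probability of the utterance so far together with being in that state, which combines the emission probability of the current observation with the probability of transitioning into that state. Computed naively, by enumerating every possible state sequence, this takes on the order of N^T terms for N states and T observations, which quickly becomes infeasible. The problem is not actually intractable, though: the forward algorithm reuses the result from the previous time step and needs only about N^2 * T operations, and the Viterbi algorithm finds the single most likely state sequence at the same cost.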
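As a rough illustration of that calculation, here is a small Python sketch of the standard forward recursion next to a brute-force sum over every state sequence. The model parameters pi, a, and b and the observation sequence obs are invented toy values, and the function names are mine, not HTK's. The point is only that the two approaches give the same likelihood, while the recursion does far less work.

```python
from itertools import product

# Hypothetical toy model: 2 hidden states, 3 observation symbols (0, 1, 2).
# All numbers are invented for illustration only.
states = [0, 1]
pi = [0.6, 0.4]                          # initial state probabilities
a = [[0.7, 0.3], [0.4, 0.6]]             # a[i][j]: transition i -> j
b = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # b[j][o]: emit symbol o in state j


def forward(obs):
    """Return alpha, where alpha[t][j] = p(obs[0..t], state at time t = j)."""
    alpha = [[pi[j] * b[j][obs[0]] for j in states]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append(
            [b[j][o] * sum(prev[i] * a[i][j] for i in states) for j in states]
        )
    return alpha


def brute_force_likelihood(obs):
    """Sum p(path, obs) over every possible state path: N**T terms."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p = pi[path[0]] * b[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= a[path[t - 1]][path[t]] * b[path[t]][obs[t]]
        total += p
    return total


obs = [0, 1, 2, 2]
alpha = forward(obs)
likelihood = sum(alpha[-1])                       # p(obs) under the model
posterior = [x / likelihood for x in alpha[-1]]   # p(last state | obs)

print("forward    :", likelihood)
print("brute force:", brute_force_likelihood(obs))
print("posterior over last state:", posterior)
```

For long utterances the products underflow, so real implementations usually work with log probabilities or rescale alpha at every step; the sketch leaves that out to stay short.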
