Notes on the making of this code/R markdown file, and total run-time are presented at the end of this document. To download the data and the source code, click here.
Part I presents the implementation of the EM algorithm from scratch for a \(p\)-dimensional Gaussian mixture model with \(G\) components:
\[\sum_{k=1}^{G} p_k \cdot N(x; \mu_k, \Sigma)\]
This implementation consists of four functions:

- `Estep` function: returns an \(n \times G\) matrix whose \((i, k)\)-th entry is the conditional probability \(P(Z_i = k \mid x_i)\), where \(i\) ranges from 1 to \(n\) and \(k\) ranges from 1 to \(G\).
- `Mstep` function: returns the updated parameters for the Gaussian mixture model.
- `loglik` function: computes the log-likelihood of the data given the parameters.
- `myEM` function (main function): calls `Estep`, `Mstep`, and `loglik`; it takes the following inputs and returns the estimated parameters and log-likelihood:
  - `data`: the dataset.
  - `itmax`: the number of iterations.
  - `prob`: a \(G\)-dimensional probability vector \((p_1, \dots, p_G)\).
  - `mean`: a \(p \times G\) matrix whose \(k\)-th column is \(\mu_k\), the \(p\)-dimensional mean of the \(k\)-th Gaussian component.
  - `Sigma`: a \(p \times p\) covariance matrix \(\Sigma\) shared by all \(G\) components.
  - `loglik`: a number equal to \(\sum_{i=1}^{n} \log \left[ \sum_{k=1}^{G} p_k \cdot N(x_i; \mu_k, \Sigma) \right]\).
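For reference, the four functions implement the standard EM updates for a Gaussian mixture with a shared covariance matrix. Writing them out, with \(w_{ik}\) denoting the E-step responsibilities:

\[ w_{ik} = P(Z_i = k \mid x_i) = \frac{p_k \cdot N(x_i; \mu_k, \Sigma)}{\sum_{j=1}^{G} p_j \cdot N(x_i; \mu_j, \Sigma)} \]

\[ p_k^{new} = \frac{1}{n} \sum_{i=1}^{n} w_{ik}, \qquad \mu_k^{new} = \frac{\sum_{i=1}^{n} w_{ik} \, x_i}{\sum_{i=1}^{n} w_{ik}}, \qquad \Sigma^{new} = \frac{1}{n} \sum_{k=1}^{G} \sum_{i=1}^{n} w_{ik} (x_i - \mu_k^{new})(x_i - \mu_k^{new})^t \]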
Implementation Guidelines: the four functions are implemented as follows.
Estep <- function(X, mu, p, sigma, G) {
  ## Notes
  # INPUT
  #   X:     the [n, d] data matrix, where n is the number of observations.
  #   mu:    a [G, d] matrix of component means, where mu1 = mu[1, ], mu2 = mu[2, ], and so on.
  #   p:     a vector of G entries, where P(Z = 1) = p[1] and so on.
  #   sigma: the [d, d] covariance matrix shared by all components.
  # OUTPUT
  #   W: a matrix of dimensions [n, G], where each row is associated with an
  #      observation and each column with a Gaussian component, so the [i, j]
  #      element is the probability that observation x_i comes from component j.
  # Implementation
  # Step 0: Definition of parameters
  n <- dim(X)[1]
  d <- dim(X)[2]
  W <- matrix(0, nrow = n, ncol = G)
  # Step 1: Calculate the transpose of the data matrix
  A <- t(X)
  for (i in 1:G) {
    # Step 2: Compute Sigma^(-1)(A - mu) and multiply element-wise by (A - mu)
    # Step 3: Sum the columns to obtain the quadratic form for each observation
    tmp1 <- solve(sigma) %*% (A - mu[i, ]) * (A - mu[i, ])
    tmp2 <- colSums(tmp1)
    # Step 4: Calculate the Gaussian/normal densities
    gaussian.density <- exp(-1/2 * tmp2) / sqrt((2 * pi)^d * det(sigma))
    # Step 5: Weight by the mixing coefficient of component i
    W[, i] <- p[i] * gaussian.density
  }
  # Normalize each row so the posterior probabilities sum to one
  W <- W / rowSums(W)
  return(W)
}
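As an optional sanity check, the hand-rolled Gaussian density inside `Estep` can be compared against `dmvnorm()` from the mvtnorm package. This assumes mvtnorm is installed, and the toy data below are made up for illustration:

# Optional sanity check (assumes the mvtnorm package is installed;
# the toy data are illustrative, not part of the assignment)
library(mvtnorm)
set.seed(1)
X.toy <- matrix(rnorm(10), ncol = 2)
mu.toy <- c(0, 0)
sigma.toy <- diag(2)
d <- ncol(X.toy)
# Quadratic form computed the same way as in Estep
tmp2 <- colSums(solve(sigma.toy) %*% (t(X.toy) - mu.toy) * (t(X.toy) - mu.toy))
all.equal(exp(-1/2 * tmp2) / sqrt((2 * pi)^d * det(sigma.toy)),
          dmvnorm(X.toy, mean = mu.toy, sigma = sigma.toy))
## [1] TRUE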
Mstep <- function(X, G, w) {
  ## Notes on the implementation
  # Returns the updated mixing coefficients p, the means mu (stored as a
  # [d, G] matrix, one column per component), and the shared covariance sigma.
  # Step 0: Definition of parameters
  n <- dim(X)[1]
  new.p <- numeric(G)
  new.mu <- matrix(0, nrow = dim(X)[2], ncol = G)
  new.sigma <- array(0, dim = c(dim(X)[2], dim(X)[2]))
  ######
  for (i in 1:G) {
    # Step 1: Updating the parameters
    ## Mixing coefficients p
    new.p[i] <- sum(w[, i]) / n
    ## Means mu
    new.mu[, i] <- colSums(w[, i] * X) / sum(w[, i])
  }
  ## Shared covariance sigma (note that sum(w) = n)
  for (i in 1:G) {
    tmp.s <- as.matrix(sweep(X, 2, new.mu[, i], FUN = "-"))
    new.sigma <- new.sigma + (t(tmp.s) %*% diag(w[, i]) %*% tmp.s) / sum(w)
  }
  # OUTPUT
  return(list(p = new.p, mu = new.mu, sigma = new.sigma))
}
loglik <- function(X, mu, p, sigma, G) {
  # Step 0: Definition of parameters
  n <- dim(X)[1]
  d <- dim(X)[2]
  A <- t(X)
  # Accumulate the mixture density of each observation
  density <- numeric(n)
  for (i in 1:G) {
    tmp1 <- solve(sigma) %*% (A - mu[, i]) * (A - mu[, i])
    tmp2 <- colSums(tmp1)
    gaussian.density <- exp(-1/2 * tmp2) / sqrt((2 * pi)^d * det(sigma))
    density <- density + p[i] * gaussian.density
  }
  log.likelihood <- log(density)
  # Guard against NaN values from numerical underflow
  log.likelihood[is.na(log.likelihood)] <- 0
  # Return the log-likelihood
  return(sum(log.likelihood))
}
myEM <- function(X, mu, p, sigma, G, itmax){
  # First iteration: mu is supplied as a [G, d] matrix (one mean per row)
  w <- Estep(X, mu, p, sigma, G)
  params <- Mstep(X, G, w)
  p <- params$p
  mu <- params$mu   # Mstep returns mu as a [d, G] matrix (one mean per column)
  sigma <- params$sigma
  for (i in 2:itmax) {
    w <- Estep(X, t(mu), p, sigma, G)
    params <- Mstep(X, G, w)
    p <- params$p
    mu <- params$mu
    sigma <- params$sigma
  }
  ll <- loglik(X, mu, p, sigma, G)
  return(list(prob = p, mean = mu, Sigma = sigma, loglik = ll))
}
In this subsection, the code is tested with the provided dataset, faithful.dat, with both \(G = 2\) and \(G = 3\).
## eruptions waiting
## 1 3.600 79
## 2 1.800 54
## 3 3.333 74
## 4 2.283 62
## 5 4.533 85
## 6 2.883 55
## 7 4.700 88
## 8 3.600 85
## 9 1.950 51
## 10 4.350 85
## 11 1.833 54
## 12 3.917 84
## 13 4.200 78
## 14 1.750 47
## 15 4.700 83
For the case when \(G = 2\), the initial values are set as follows: \(p = (10/n,\; 1 - 10/n)\); \(\mu_1\) is the sample mean of the first 10 observations and \(\mu_2\) the sample mean of the remaining \(n - 10\); and \(\Sigma\) is
\[ \frac{1}{n} \left[ \sum_{i=1}^{10} (x_i - \mu_1) (x_i - \mu_1)^t + \sum_{i=11}^{n} (x_i - \mu_2) (x_i - \mu_2)^t \right] \]
Here each \(x_i - \mu_k\) is a 2-by-1 vector, so the resulting \(\Sigma\) matrix is a 2-by-2 matrix.
The EM implementation runs for 20 iterations. The results from `myEM` are expected to look like the following. (Even though the algorithm has not yet reached convergence, matching the expected results below serves as a validation that the code is functioning as intended.)
prob <- c(0.04297883, 0.95702117)
mean <- matrix(c(3.495642, 3.487430, 76.797892, 70.632060), ncol=2, byrow=TRUE)
rownames(mean) <- c("eruptions", "waiting")
colnames(mean) <- c("[,1]", "[,2]")
Sigma <- matrix(c(1.297936, 13.924336, 13.92434, 182.58009), ncol=2, byrow=TRUE)
rownames(Sigma) <- c("eruptions", "waiting")
colnames(Sigma) <- c("eruptions", "waiting")
loglik <- -1289.569
list(prob = prob, mean = mean, Sigma = Sigma, loglik = loglik)
## $prob
## [1] 0.04297883 0.95702117
##
## $mean
## [,1] [,2]
## eruptions  3.495642  3.487430
## waiting   76.797892 70.632060
##
## $Sigma
## eruptions waiting
## eruptions 1.297936 13.924336
## waiting 13.924340 182.580090
##
## $loglik
## [1] -1289.569
Input parameters
Using the following input parameters:
G <- 2
n <- dim(X)[1]
p1 <- 10 / n
p2 <- 1 - p1
p <- c(p1, p2)
mu1 <- colMeans(X[1:10, ])
mu2 <- colMeans(X[11:nrow(X), ])
mu <- t(cbind(mu1, mu2))
Xmatr <- t(as.matrix(X))
Xmu <- matrix(0, nrow = nrow(Xmatr), ncol = ncol(Xmatr))
for (i in 1:ncol(Xmatr)) {
  if (i < 11) {
    Xmu[, i] <- Xmatr[, i] - mu1
  } else {
    Xmu[, i] <- Xmatr[, i] - mu2
  }
}
sigma <- 1/n * (Xmu %*% t(Xmu))
list(G = G, n = n, p = p, mu = mu, sigma = sigma)
## $G
## [1] 2
##
## $n
## [1] 272
##
## $p
## [1] 0.036764706 0.963235294
##
## $mu
## eruptions waiting
## mu1 3.3032000 71.800000
## mu2 3.4948282 70.862595
##
## $sigma
## [,1] [,2]
## [1,] 1.2966385 13.93278
## [2,] 13.9327802 184.11270
With these input parameters, the code runs through 20 iterations, and the results are displayed below:
## $prob
## [1] 0.042978831 0.957021169
##
## $mean
## [,1] [,2]
## [1,] 3.4956419 3.4874302
## [2,] 76.7978915 70.6320585
##
## $Sigma
## eruptions waiting
## eruptions 1.2979361 13.924336
## waiting 13.9243363 182.580092
##
## $loglik
## [1] -1289.5694
For the case when \(G = 3\), the initial values are set analogously: \(p = (10/n,\; 20/n,\; 1 - 30/n)\); \(\mu_1\), \(\mu_2\), and \(\mu_3\) are the sample means of the first 10, the next 20, and the remaining observations, respectively; and \(\Sigma\) is computed the same way from the three groups. Just like in the previous case, the code goes through 20 iterations.
The results from `myEM` are expected to look like the following.
## $prob
## [1] 0.04363422 0.07718656 0.87917922
##
## $mean
## [,1] [,2] [,3]
## eruptions  3.510069  2.816167  3.545641
## waiting   77.105638 63.357526 71.250848
##
## $Sigma
## eruptions waiting
## eruptions 1.260158 13.511538
## waiting 13.511540 177.964190
##
## $loglik
## [1] -1289.351
Input parameters
With the following input parameters:
G <- 3
n <- dim(X)[1]
p1 <- 10 / n
p2 <- 20 / n
p3 <- 1 - p1 - p2
p <- c(p1, p2, p3)
mu1 <- colMeans(X[1:10, ])
mu2 <- colMeans(X[11:30, ])
mu3 <- colMeans(X[31:nrow(X), ])
mu <- t(cbind(mu1, mu2, mu3))
Xmatr <- t(as.matrix(X))
Xmu <- matrix(0, nrow = nrow(Xmatr), ncol = ncol(Xmatr))
for (i in 1:ncol(Xmatr)) {
  if (i < 11) {
    Xmu[, i] <- Xmatr[, i] - mu1
  } else if (i >= 11 && i <= 30) {
    Xmu[, i] <- Xmatr[, i] - mu2
  } else {
    Xmu[, i] <- Xmatr[, i] - mu3
  }
}
sigma <- 1/n * (Xmu %*% t(Xmu))
list(p = p, mu = mu, sigma = sigma)
## $p
## [1] 0.036764706 0.073529412 0.889705882
##
## $mu
## eruptions waiting
## mu1 3.3032000 71.800000
## mu2 3.1750000 68.250000
## mu3 3.5212603 71.078512
##
## $sigma
## [,1] [,2]
## [1,] 1.2884955 13.866263
## [2,] 13.8662627 183.569332
With these input parameters, the implementation returns:
## $prob
## [1] 0.043634215 0.077186562 0.879179223
##
## $mean
## [,1] [,2] [,3]
## [1,] 3.5100692 2.8161667 3.5456408
## [2,] 77.1056381 63.3575263 71.2508480
##
## $Sigma
## eruptions waiting
## eruptions 1.2601577 13.511538
## waiting 13.5115376 177.964191
##
## $loglik
## [1] -1289.351
In this section, the Baum-Welch (i.e., EM) algorithm and the Viterbi algorithm are implemented from scratch for a Hidden Markov Model (HMM) that produces an outcome sequence of discrete random variables with three distinct values.
A Quick Review on Parameters for Discrete HMM:

- `m_x`: count of distinct values \(X\) can take.
- `m_z`: count of distinct values \(Z\) can take.
- `w`: an \(m_z\)-by-1 probability vector representing the initial distribution for \(Z_1\).
- `A`: the \(m_z\)-by-\(m_z\) transition probability matrix that models the progression from \(Z_t\) to \(Z_{t+1}\).
- `B`: the \(m_z\)-by-\(m_x\) emission probability matrix, indicating how \(X\) is produced from \(Z\).

The focus is on updating the parameters `A` and `B` in the algorithm. The value for `m_x` is given, and `m_z` is specified by the user. The parameter `w` is generated uniformly but is not updated in this code. The reason is that `w` denotes the distribution of \(Z_1\), and there is only a single sample; it's analogous to estimating the likelihood of a coin toss resulting in heads by tossing it only once. Given the scant information and the minimal influence on the estimation of other parameters, this implementation skips updating it.
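For reference, under these parameters the complete-data likelihood of the HMM, which the EM algorithm below is built around, factorizes as:

\[ P(x_{1:T}, z_{1:T}) = w_{z_1} \prod_{t=1}^{T-1} A_{z_t, z_{t+1}} \prod_{t=1}^{T} B_{z_t, x_t} \]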
The Baum-Welch algorithm is the EM algorithm for the HMM. A function named `BW.onestep` is designed to carry out the E-step and M-step. This function is then called iteratively within `myBW`.
The inputs of `BW.onestep` are:

- `data`: a \(T\)-by-1 sequence of observations
- the current estimates of `A` and `B` (the recursions it implements are sketched below)
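For reference, the quantities computed inside `BW.onestep` follow the standard forward-backward recursions and update formulas:

\[ \alpha_1(i) = w_i B_{i, x_1}, \qquad \alpha_{t+1}(i) = \Big[ \sum_{j=1}^{m_z} \alpha_t(j) A_{ji} \Big] B_{i, x_{t+1}} \]
\[ \beta_T(i) = 1, \qquad \beta_t(i) = \sum_{j=1}^{m_z} A_{ij} B_{j, x_{t+1}} \beta_{t+1}(j) \]
\[ \gamma_t(i, j) \propto \alpha_t(i) A_{ij} B_{j, x_{t+1}} \beta_{t+1}(j), \qquad A_{ij}^{new} = \frac{\sum_{t=1}^{T-1} \gamma_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad B_{ik}^{new} = \frac{\sum_{t:\, x_t = k} \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)} \]

where \(\gamma_t(i) = \sum_j \gamma_t(i, j)\).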
BW.onestep <- function(data, mx, mz, w, A, B) {
  ### Notes ###
  ## INPUT:
  #   data: a [T, 1] vector containing the value of X at each time step t
  #   mx:   the count of distinct values X can take
  #   mz:   the count of distinct values Z can take
  #   w:    a [mz, 1] vector representing the initial distribution for Z_1
  #   A:    the TRANSITION MATRIX [mz, mz]; the probabilities of going from a state Z_t to a state Z_t+1
  #   B:    the EMISSION MATRIX [mz, mx]; the probabilities of getting a value of X from a value of Z
  ## This function calculates:
  #   the forward-probability matrix alpha [T, mz]
  #   the backward-probability matrix beta [T, mz]
  #   gamma
  ## and updates A and B.
  ##### Implementation ######################################################
  ##### Step 0: Definition of parameters #####
  T <- length(data)
  ##### Step 1: Calculate alpha ##############################################
  # Initialize the alpha matrix
  alpha <- matrix(0, nrow = T, ncol = mz)
  # For t = 1
  for (i in 1:mz) {
    alpha[1, i] <- w[i] * B[i, data[1]]
  }
  # For t = 2:T, recursive computation
  for (t in 1:(T - 1)) {
    for (i in 1:mz) {
      alpha[t + 1, i] <- sum(alpha[t, ] * A[, i]) * B[i, data[t + 1]]
    }
  }
  ##### Step 2: Calculate beta ###############################################
  # Initialize the beta matrix
  beta <- matrix(0, nrow = T, ncol = mz)
  # For t = T
  beta[T, ] <- rep(1, mz)
  # For t = (T-1):1, recursive computation going backwards! (see/run (T-1):1)
  for (t in (T - 1):1) {
    for (i in 1:mz) {
      beta[t, i] <- sum(A[i, ] * B[, data[t + 1]] * beta[t + 1, ])
    }
  }
  ##### Step 3: Calculate gamma ##############################################
  # Initialize the gamma data structures
  gamma <- array(0, dim = c(mz, mz, T - 1))  # pairwise posteriors of (Z_t, Z_t+1)
  gamma.i <- matrix(0, nrow = T, ncol = mz)  # marginal posteriors of Z_t
  for (t in 1:(T - 1)) {
    # GAMMA
    xx <- matrix(0, nrow = mz, ncol = mz)
    for (i in 1:mz) {
      for (j in 1:mz) {
        xx[i, j] <- alpha[t, i] * A[i, j] * B[j, data[t + 1]] * beta[t + 1, j]
      }
    }
    gamma[, , t] <- xx / sum(xx)
    # GAMMA.I
    for (i in 1:mz) {
      gamma.i[t, i] <- sum(gamma[i, , t])
    }
  }
  # Marginal posterior of Z_T, obtained from the last pairwise term
  for (i in 1:mz) {
    gamma.i[T, i] <- sum(gamma[, i, T - 1])
  }
  ##### Step 4: Update A #####################################################
  # Numerator: sum of the pairwise posteriors over t
  numerator <- matrix(0, nrow = mz, ncol = mz)
  for (t in 1:(T - 1)) {
    numerator <- numerator + gamma[, , t]
  }
  # Denominator: row i holds the total posterior mass of state i over t = 1:(T-1)
  denominator <- matrix(0, nrow = mz, ncol = mz)
  for (i in 1:mz) {
    for (j in 1:mz) {
      denominator[i, j] <- sum(gamma.i[1:(T - 1), i])
    }
  }
  new.A <- numerator / denominator
  ##### Step 5: Update B #####################################################
  # Initialize the new B matrix
  new.B <- matrix(0, nrow = mz, ncol = mx)
  for (i in 1:mz) {
    for (k in 1:mx) {
      # Numerator: sum of gamma.i[t, i] where data[t] = k
      num <- sum(gamma.i[data == k, i])
      # Denominator: sum of gamma.i[t, i] over all t
      denom <- sum(gamma.i[, i])
      new.B[i, k] <- num / denom
    }
  }
  return(list(updated.A = new.A, updated.B = new.B))
}
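As a quick smoke test, `BW.onestep` can be run on a short simulated sequence; the data and initial values below are made up for illustration and are not part of the assignment. Each row of the updated `A` and `B` should sum to one:

# Illustrative smoke test with made-up data and initial values
set.seed(42)
toy.data <- sample(1:3, 20, replace = TRUE)
out <- BW.onestep(toy.data, mx = 3, mz = 2,
                  w = c(0.5, 0.5),
                  A = matrix(0.5, nrow = 2, ncol = 2),
                  B = matrix(c(1/9, 3/9, 5/9,
                               1/6, 2/6, 3/6), nrow = 2, byrow = TRUE))
rowSums(out$updated.A)  # each row should equal 1 (up to floating point)
rowSums(out$updated.B)  # each row should equal 1 (up to floating point)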
The `myBW` function then repeatedly calls `BW.onestep` to update the `A` and `B` parameters over a defined number of iterations:
myBW <- function(data, mx, mz, w, A, B, maxIterations) {
  updated.params <- BW.onestep(data, mx, mz, w, A, B)
  # Progress bar (uses the progress package)
  taco <- progress_bar$new(total = maxIterations - 1,
                           format = "[:bar] [:percent] [Elapsed time: :elapsedfull] [ETA: :eta]",
                           clear = FALSE)
  for (i in 2:maxIterations) {
    updated.params <- BW.onestep(data, mx, mz, w,
                                 updated.params$updated.A, updated.params$updated.B)
    taco$tick()
  }
  A <- updated.params$updated.A
  B <- updated.params$updated.B
  return(list(updated.A = A, updated.B = B))
}
This algorithm outputs the most likely latent sequence given the data and the MLE of the parameters. The `myViterbi` function takes:

- `data`: a \(T\)-by-1 sequence of observations
- the parameters \(m_x\), \(m_z\), \(w\), \(A\), and \(B\)

and returns:

- `Z`: a \(T\)-by-1 sequence where each entry is a number ranging from 1 to `m_z`.

Note on Calculations in Viterbi:
Many computations in HMM are based on the product of a sequence of probabilities, resulting in extremely small values. At times, these values are so small that software like R or Python might interpret them as zeros. This poses a challenge, especially for the Viterbi algorithm, where differentiating between magnitudes is crucial. If truncated to zero, making such distinctions becomes impossible. Therefore, these probabilities are evaluated on a logarithmic scale in the Viterbi algorithm.
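A quick illustration of this underflow in R:

# A product of 400 probabilities of 0.1 underflows to exactly zero...
prod(rep(0.1, 400))
## [1] 0
# ...while the equivalent sum of logs stays finite and comparable
sum(log(rep(0.1, 400)))
## [1] -921.034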
myViterbi <- function(data, mx, mz, w, A, B) {
  # Implementation
  # Step 0: Definition of parameters
  T <- length(data)
  delta <- matrix(0, nrow = T, ncol = mz)  # path probabilities, kept on the log scale
  psi <- matrix(0, nrow = T, ncol = mz)    # backpointers
  Z <- numeric(T)
  ## **DELTA** (on the log scale to avoid underflow)
  # For t = 1
  delta[1, ] <- log(w) + log(B[, data[1]])
  # For t = 2:T
  for (t in 2:T) {
    for (j in 1:mz) {
      temp <- delta[t - 1, ] + log(A[, j])
      delta[t, j] <- max(temp) + log(B[j, data[t]])
      psi[t, j] <- which.max(temp)
    }
  }
  ## **Z**
  # For Z at t = T
  prob <- exp(max(delta[T, ]))  # probability of the most likely path (may underflow for long sequences)
  Z[T] <- which.max(delta[T, ])
  # For Z at t = (T-1):1, backtracking!
  for (t in (T - 1):1) {
    Z[t] <- psi[t + 1, Z[t + 1]]
  }
  return(list(Z = Z, prob = prob))
}
The code is put to the test with the provided data sequence, data.txt. To initialize, we set \(m_z = 2\) and start with the following initial values:
\[ w = \left( \begin{array}{cc} 0.5 & 0.5 \end{array} \right) ,\] \[ A = \left( \begin{array}{cc} 0.5 & 0.5 \\ 0.5 & 0.5 \end{array} \right) ,\] \[ B = \left( \begin{array}{ccc} \frac{1}{9} & \frac{3}{9} & \frac{5}{9} \\ \frac{1}{6} & \frac{2}{6} & \frac{3}{6} \end{array} \right) \]
The implemented algorithms run through 100 iterations. The results from the implementation of the Baum-Welch algorithm should match with the following:
A_result <- matrix(c(0.49793938, 0.44883431, 0.50206062, 0.55116569), ncol=2)
B_result <- matrix(c(0.22159897, 0.34175148, 0.20266127, 0.17866665, 0.57573976, 0.47958186), nrow=2)
list(A = A_result, B = B_result)
## $A
## [,1] [,2]
## [1,] 0.49793938 0.50206062
## [2,] 0.44883431 0.55116569
##
## $B
## [,1] [,2] [,3]
## [1,] 0.22159897 0.20266127 0.57573976
## [2,] 0.34175148 0.17866665 0.47958186
The output from the Viterbi algorithm implementation is cross-checked against the complete binary sequence available in Coding4_part2_Z.txt. For reference, the observation sequence read from data.txt is:
## [1] 2 3 3 3 3 3 3 1 3 3 3 3 3 1 1 3 3 3 2 3 3 3 1 3 1 1 1 2 2 3 3 3 3 3 1 3 3
## [38] 2 2 3 3 3 3 1 1 3 3 3 2 3 3 1 3 1 3 3 3 2 1 1 1 1 3 3 3 3 3 3 3 3 1 1 1 3
## [75] 3 1 2 1 3 1 1 3 1 1 2 1 3 1 1 2 1 3 3 3 3 1 3 3 3 3 3 3 2 3 3 2 3 2 3 1 3
## [112] 3 3 3 1 2 2 3 3 3 3 1 2 2 2 1 3 1 3 1 1 3 1 2 3 3 3 2 2 3 1 3 1 2 1 2 1 3
## [149] 1 3 1 3 3 3 1 1 3 2 3 1 3 3 2 3 3 1 1 1 2 3 1 1 2 1 3 3 3 1 2 1 3 2 3 3 3
## [186] 3 3 2 1 2 2 3 1 2 3 2 3 2 2 2
The dataset has 200 observations, and the X values are
distributed as:
## data
## 1 2 3
## 57 38 105
The following are the input parameters to test the HMM algorithms:
mx <- nrow(table(data))
mz <- 2
w <- c(0.5, 0.5)
A <- matrix(0.5, nrow = mz, ncol = mz)
B <- matrix(c(1/9, 3/9, 5/9,
              1/6, 2/6, 3/6),
            nrow = 2, byrow = TRUE)
list(mx = mx, mz = mz, A = A, B = B)
## $mx
## [1] 3
##
## $mz
## [1] 2
##
## $A
## [,1] [,2]
## [1,] 0.5 0.5
## [2,] 0.5 0.5
##
## $B
## [,1] [,2] [,3]
## [1,] 0.11111111 0.33333333 0.55555556
## [2,] 0.16666667 0.33333333 0.50000000
Through 100 iterations, the implementation of the Baum-Welch algorithm is able to achieve:
maxIterations <- 100
my.params <- myBW(data, mx, mz, w, A, B, maxIterations)
my.params
## $updated.A
## [,1] [,2]
## [1,] 0.49793938 0.50206062
## [2,] 0.44883431 0.55116569
##
## $updated.B
## [,1] [,2] [,3]
## [1,] 0.22159897 0.20266127 0.57573976
## [2,] 0.34175148 0.17866665 0.47958186
To compare this to benchmark results, the `baumWelch` function from the `HMM` library can be used to update the parameters A and B as follows:
# Using the library(HMM), function baumWelch()
HMM <- initHMM(c("Z.1", "Z.2"), c("1", "2", "3"), startProbs=w, transProbs=A, emissionProbs=B)
bw_result <- baumWelch(HMM, data, maxIterations)
print(bw_result$hmm$transProbs)
print(bw_result$hmm$emissionProbs)
## to
## from Z.1 Z.2
## Z.1 0.49793938 0.50206062
## Z.2 0.44883431 0.55116569
## symbols
## states 1 2 3
## Z.1 0.22159897 0.20266127 0.57573976
## Z.2 0.34175148 0.17866665 0.47958186
These are the same values referred to in the “Test” subsection.
To test the accuracy of the implemented function, the A and B matrices obtained from the implementation are subtracted from those obtained with the HMM library; the differences are at the level of machine precision:
## to
## from Z.1 Z.2
## Z.1 4.9354965e-13 -4.9282800e-13
## Z.2 -1.6447954e-13 1.6431301e-13
## symbols
## states 1 2 3
## Z.1 -2.8227420e-14 -2.8962943e-13 3.1785685e-13
## Z.2 6.2394534e-14 2.2884472e-13 -2.9121150e-13
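The Viterbi call itself is not echoed above; a sketch of the invocation, assuming the parameters estimated by `myBW`, would be:

# Sketch of the Viterbi invocation (assumes my.params from myBW above)
my.Z <- myViterbi(data, mx, mz, w, my.params$updated.A, my.params$updated.B)
my.Z$Z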
With the implemented Viterbi algorithm, the hidden-state sequence results in:
## [1] 1 1 1 1 1 1 1 2 1 1 1 1 1 2 2 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 1 1 2 1 1
## [38] 1 1 1 1 1 1 2 2 1 1 1 1 1 1 2 2 2 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2
## [75] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
## [112] 1 1 1 2 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
## [149] 2 2 2 1 1 1 2 2 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 1 1 1 2 2 2 1 1 1 1 1
## [186] 1 1 1 2 2 2 2 2 1 1 1 1 1 1 1
## Z
## 1 2
## 112 88
Similar to the test of the BW implementation, the `viterbi` function from the `HMM` library can be used to obtain the most likely sequence of hidden states Z (the sequence referenced in the “Test” subsection earlier):
# Using the library(HMM), function viterbi()
HMM.Z <- initHMM(c("Z1", "Z2"), c("1", "2", "3"), startProbs = w, transProbs = my.params$updated.A, emissionProbs = my.params$updated.B)
Z.hmm <- viterbi(HMM.Z, data)
Z.hmm.encoded <- as.integer(factor(Z.hmm, levels=c("Z1", "Z2")))
Z.hmm.encoded
## [1] 1 1 1 1 1 1 1 2 1 1 1 1 1 2 2 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 1 1 2 1 1
## [38] 1 1 1 1 1 1 2 2 1 1 1 1 1 1 2 2 2 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2
## [75] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
## [112] 1 1 1 2 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
## [149] 2 2 2 1 1 1 2 2 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 1 1 1 2 2 2 1 1 1 1 1
## [186] 1 1 1 2 2 2 2 2 1 1 1 1 1 1 1
## Z.hmm.encoded
## 1 2
## 112 88
To test the accuracy of the implemented function, the two Z vectors are subtracted from one another; the difference is zero at every position:
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0