Notes on the making of this R Markdown file, along with its total run-time, are presented at the end of this document. To download the data and the source code, click here.
The following is an implementation of a linear Support Vector Machine (SVM) classifier from scratch using stochastic gradient descent (SGD).
Traditionally, SVMs are trained by solving the dual problem, which involves a quadratic objective function subject to linear constraints. While this approach can be efficient for small-scale tasks, it becomes less practical for large-scale problems. In such cases, it is convenient to use SGD to solve the primal problem directly.
At each iteration, SGD draws one training sample at random and updates the parameters using the (sub)gradient of the objective evaluated at that single sample. The Pegasos (Primal Estimated sub-GrAdient SOlver for SVM) algorithm, proposed by Shalev-Shwartz et al. (2011), is an application of SGD to this problem.
The primal problem of the linear SVM can be expressed in the following Loss + Penalty format:
\[ \frac{\lambda}{2}{\| \boldsymbol{\beta}\|^2} + \frac{1}{n} \sum_{i=1}^{n} \big[ 1 - y_i (x_i^t \beta + \alpha) \big]_+ \]
where \(\alpha\) is the intercept and \(\boldsymbol{\beta}\) is the \(p\)-dimensional coefficient vector.
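For reference, this objective can be evaluated directly in R. The following is a minimal sketch; svm_objective is a hypothetical helper, not part of the original code, and it assumes the labels are already coded as ±1:

# Hypothetical helper: evaluates the Loss + Penalty objective
# (assumes y is coded as -1/+1, X is an n x p matrix)
svm_objective <- function(X, y, beta, alpha, lambda) {
  margins <- 1 - y * (X %*% beta + alpha)          # 1 - y_i (x_i^t beta + alpha)
  (lambda / 2) * sum(beta^2) + mean(pmax(margins, 0))
}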
The Pegasos algorithm can be summarized as follows: at each step \(t\), a training sample \(i\) is drawn at random and the parameters are updated as \(\beta_{t+1} = \beta_t - \eta_t \Delta_t\) and \(\alpha_{t+1} = \alpha_t - \eta_t \delta_t\) (a one-step numerical sketch follows the equations below). Here \(\eta_t\) is the learning rate, and \(\Delta_t\) and \(\delta_t\) are the (sub)gradients of \(J_i(\beta, \alpha)\) when \(\beta = \beta_t\) and \(\alpha = \alpha_t\):
\[ J_i(\beta, \alpha) = \frac{\lambda}{2}{\| \boldsymbol{\beta}\|^2} + \big [ 1 - y_i(x_i^t \beta + \alpha) \big]_+ \]
\[ \Delta_t = \left \{ \begin{array}{ll} \lambda \beta_t - y_i x_i & \text{ if } \ y_i(x_i^t \beta_t + \alpha_t) < 1 \\ \lambda \beta_t & \quad \text{otherwise} \end{array} \right. \]
\[ \delta_t = \left \{ \begin{array}{ll} - y_i & \text{ if } \ y_i(x_i^t \beta_t + \alpha_t) < 1 \\ 0 & \quad \text{otherwise} \end{array} \right. \]
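To make the update concrete, here is a minimal one-step sketch in R; the values of \(x_i\), \(y_i\), \(\beta_t\), and \(\alpha_t\) are made up purely for illustration:

# One Pegasos update with toy values (all numbers are illustrative)
lambda <- 1; t <- 5
eta    <- 1 / (lambda * t)               # learning rate at step t
beta_t <- c(0.2, -0.1); alpha_t <- 0     # current parameters (toy values)
xi     <- c(1.5, -0.5); yi <- 1          # current sample (toy values)
if (yi * (sum(beta_t * xi) + alpha_t) < 1) {  # margin violated
  Delta_t <- lambda * beta_t - yi * xi
  delta_t <- -yi
} else {                                      # margin satisfied
  Delta_t <- lambda * beta_t
  delta_t <- 0
}
beta_t1  <- beta_t  - eta * Delta_t   # beta_{t+1}
alpha_t1 <- alpha_t - eta * delta_t   # alpha_{t+1}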
In this implementation of the Pegasos algorithm for the SVM classifier, the learning rate is set to \(\eta_t = 1/(\lambda t)\), and the order of the data points is shuffled at the start of every epoch:
# Set the seed for reproducibility
set.seed(0443) # Last four digits of Farid Saud's UIN
# Pegasos algorithm
pegasos_algorithm <- function(X, y, lambda, epochs) {
  n <- nrow(X)
  p <- ncol(X)
  # Binary coding for y, using the smaller label as the -1 case
  y <- ifelse(y == sort(unique(y))[1], -1, 1)
  # Initialize the parameters
  beta <- numeric(p)
  alpha <- 0
  t <- 0
  # Update the parameters
  for (epoch in 1:epochs) {
    # Randomize the order of the data points at each epoch
    idx <- sample(1:n, n, replace = FALSE)
    for (i in idx) {
      t <- t + 1
      eta <- 1 / (lambda * t)  # learning rate at step t
      xi <- X[i, ]
      yi <- y[i]
      # Compute the (sub)gradients Delta_t and delta_t
      if (yi * (sum(beta * xi) + alpha) < 1) {
        gradient <- lambda * beta - yi * xi
        delta <- -yi
      } else {
        gradient <- lambda * beta
        delta <- 0
      }
      # Update beta and alpha
      beta <- beta - eta * gradient
      alpha <- alpha - eta * delta
    }
  }
  return(list(beta = beta, alpha = alpha))
}
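As a quick sanity check, the function can be exercised on a small synthetic dataset before moving on to MNIST. The two-Gaussian toy data below is an illustrative assumption, not part of the original experiment:

# Sanity check on synthetic data (toy example, not part of the MNIST experiment)
set.seed(1)
n_toy <- 100
X_toy <- rbind(matrix(rnorm(n_toy * 2, mean = -1), ncol = 2),
               matrix(rnorm(n_toy * 2, mean =  1), ncol = 2))
y_toy <- rep(c(5, 6), each = n_toy)  # reuse the 5/6 label coding
fit_toy  <- pegasos_algorithm(X_toy, y_toy, lambda = 1, epochs = 20)
pred_toy <- ifelse(sign(X_toy %*% fit_toy$beta + fit_toy$alpha) < 0, 5, 6)
mean(pred_toy == y_toy)  # should be close to 1 for well-separated clouds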
To test the implemented linear SVM classifier, a subset of the MNIST data is used: a training set (200 samples) and a test set (600 samples) containing only 5s and 6s. Each dataset consists of 257 columns, with the first 256 columns representing the features and the last column indicating the label.
The training data is loaded and subdivided into \(X_{train}\) and \(Y_{train}\):
train <- read.csv("coding5_train.csv", header = TRUE)
X_train <- train[, 1:256]
y_train <- train[, 257]
table(y_train)
## y_train
## 5 6
## 100 100
Similarly, the testing data is loaded and subdivided into \(X_{test}\) and \(Y_{test}\):
test <- read.csv("coding5_test.csv", header = TRUE)
X_test <- test[, 1:256]
y_test <- test[, 257]
table(y_test)
## y_test
## 5 6
## 300 300
Tables summarizing the label counts for both the training and testing data are presented as well.
To check whether the implementation is working correctly, it may be useful to look at how the algorithm performs on the training data. With the weights obtained from the algorithm and the \(X_{train}\) matrix, it is possible to classify \(Y_{train}\) as:
# Apply the Pegasos algorithm to the training dataset
result_train <- pegasos_algorithm(as.matrix(X_train), y_train, lambda = 1, epochs = 20)
predicted_labels_train <- sign(as.matrix(X_train) %*% result_train$beta + result_train$alpha)
labels_train <- ifelse(predicted_labels_train < 0, 5, ifelse(predicted_labels_train > 0, 6, NA))
table(labels_train)
## labels_train
## 5 6
## 98 102
In a similar way, the fitted model can now be applied to unseen data, \(X_{test}\), to predict the labels of the test data:
# Predict the test labels using the SVM model fitted on the training dataset
predicted_labels_test <- sign(as.matrix(X_test) %*% result_train$beta + result_train$alpha)
labels_test <- ifelse(predicted_labels_test < 0, 5, ifelse(predicted_labels_test > 0, 6, NA))
table(labels_test)
## labels_test
## 5 6
## 293 307
With the results obtained in the previous section, confusion tables are presented for the training and testing datasets:
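The tables below can be produced with table() calls of the following form; this is a sketch that reuses the labels_train and labels_test objects computed above:

# Confusion tables: rows are the actual labels, columns the predicted labels
table(Actual = y_train, Predicted = labels_train)
table(Actual = y_test, Predicted = labels_test)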
## Predicted
## Actual 5 6
## 5 98 2
## 6 0 100
## Predicted
## Actual 5 6
## 5 285 15
## 6 8 292
The number of misclassified test cases can be obtained by comparing the real labels with the predicted labels.
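A one-line sketch of this count, assuming the objects computed above:

# Count the misclassified test samples
sum(labels_test != y_test)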
## [1] 23
Finally, this implementation of a linear SVM classifier using the Pegasos algorithm achieved a test error of \(23/600 \approx 3.83\%\):
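A one-line sketch of how this percentage can be computed from the predictions above:

# Test error as a percentage of the 600 test samples
paste0(round(100 * mean(labels_test != y_test), 2), "%")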
## [1] "3.83%"