SCD is a C++ implementation of (a variant of) the stochastic coordinate descent algorithm proposed in
It can be used for l1-regularized loss minimization for both classification and regression problems. This means that it (approximately) solves the following minimization problem:
minw (1/m) ∑i=1, … , m L(wT xi, yi) + λ|w|1
Currently supported loss functions are the logistic loss [L(a,b) = log(1+exp(-ab))] and the squared loss [L(a,b) = (a-b)2]. The xi's are example vectors with d coordinates xi,1, … xi,d. The yi's are labels that are either ±1 (classification) or arbitrary floats (regression). The logistic loss only makes sense in a classification context but the squared loss can be used in either. λ is the regularization parameter. SCD is designed to run fast even for large values of m and d and can exploit the sparsity in the examples (i.e. if a lot of coordinates in the example vectors are zero).
The source code is released under the GNU General Public License. The authors are not responsible for any consequences arising out of the use of the code, so please use it at your own risk. We do, however, appreciate receiving bug reports from you. Please direct them to Ambuj Tewari via email.
If you use SCD in your research, please cite it as
After downloading the file scd.tar.gz into a suitable directory, unpack it by typing
tar zxvf scd.tar.gz
This will create a directory called SCD with the following 9 files in it:
This will create an executable named scd (plus a couple of object files).
./scdand the following usage message is displayed:
USAGE: ./scd [options] <data-file> <label-file> L1 regularized loss minimization using stochastic coordinate descent OPTIONS: -lambda regularization parameter (default = 0.001) -loss 0 for logistic, 2 for squared (default = 0) -iters number of iterations (default = 1000) -printout after how many iterations to print w (default 1000)All 4 command line options are optional and receive their default values indicated above if they are not supplied by the user. The options lambda, loss and iters set the regularization parameter, the loss function to use, and the number of iterations for which the stochastic coordinate descent algorithm will be run, respectively. The printout option selects the interval after which a summary of current state of the optimization will be printed on the standard output. The following information about the current weight vector w is included in this summary:
Conceptually, we often think of the data as a huge m x d matrix with the examples sitting as rows of this matrix. Viewed this way, scd expects its input data file to be in a sparse column major format as described below:
Line 1: m d Line 2: i1 v1 i2 v2 ... ⋮ Line d+1: i1 v1 i2 v2 ...
The first line tells scd that there are m example each with d features or coordinates. The second line gives the values for feature 1 for all those examples where feature 1 is non-zero. The pair i1 v1 means that example (i1+1) has value v1. Note that, following C/C++ convention, the indices i1, i2, ... start from zero. For example, if feature 1 is non-zero only for examples 1, 3 and 10 and has values 0.2, 0.9 and -0.7 for these, then line 2 would read:
0 0.2 2 0.9 9 -0.7The example indices need not appear in increasing order. It is assumed that the values v1, v2, ... lie in the interval [-1,1]. You might have to do some preprocessing of your data to ensure this.
Line 3 then gives values for feature 2, Line 4 for feature 3, and so on till Line d+1.
0 0 0 0.2 0.2 0 0 0.3 0 -0.5 0.4 0.6 0 0 0The labels for the 3 examples are 1,1 and -1 respectively. Now, run scd with logistic loss and the default value (0.001) of the regularization parameter by typing
./scd -iters 50000 -printout 5000 test_examples test_labelsThis runs scd for 50,000 iterations printing a summary line every 5,000 iterations. You should see an output that looks something like
0 11972 41.2552 5 0.0164818 0.057737 0 10 24054 43.7355 5 0.012187 0.0559226 0 10 36020 44.4665 4 0.0109725 0.055439 0 20 47932 44.6602 3 0.0104728 0.055133 0 20 59888 44.9534 3 0.0101683 0.0551217 0 30 71858 45.0208 3 0.0101003 0.0551212 0 30 83900 45.0446 3 0.0100765 0.0551211 0 40 95908 45.0546 3 0.0100664 0.055121 0 50 107974 45.0588 3 0.0100623 0.055121 0 50 119818 45.0607 3 0.0100603 0.055121 0The actual output might be slightly different due to the randomness in the algorithm but the objective function should converge to 0.05512 for this toy example.