- DAL can handle your favorite (convex, smooth) loss functions (squared loss, logistic loss, etc).
- DAL can handle several "sparsity" measures in a unified way. Currently L1, grouped L1, nuclear norm (trace norm), and non-negative L1 are supported.
- DAL is efficient when m≪n (m: #samples, n: #unknowns) or the matrix A is poorly conditioned.
- DAL is fast when the solution is sparse but the matrix A can be dense.
- DAL is written in MATLAB.

DAL solves the dual problem of (1) via the augmented Lagrangian method (see Bertsekas 82). It uses the analytic expression (and its derivatives) of the following soft-thresholding operation:

    prox_lambda(y) = argmin_x ( 1/2 ||x - y||^2 + lambda * phi(x) ),

which can be computed in closed form for L1 and grouped L1 (and many other) sparsity-inducing regularizers; for the L1 regularizer, the i-th component is sign(y_i) * max(|y_i| - lambda, 0). If you are interested in our algorithm, you can find more details in our JMLR paper or in my talk at the Optimization for Machine Learning Workshop (NIPS 2009).
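As a concrete illustration, the L1 soft-thresholding operation above can be written in one line of MATLAB (a standalone sketch for clarity, not DAL's internal implementation):

```matlab
% Elementwise L1 soft-thresholding: the prox operator of lambda*||.||_1.
% Shrinks each component toward zero by lambda and clips at zero.
softth = @(y, lambda) sign(y) .* max(abs(y) - lambda, 0);

softth([3; -0.5; 1.5], 1)   % -> [2; 0; 0.5]
```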

- DAL (ver 1.1) ([github])
- DAL (ver 1.05) (released: 2011/5/4; size: 21,429 bytes)
- DAL (ver 1.01) (released: 2009/12/12; size: 13,791 bytes)
- DAL (ver 0.97) (released: 2009/04/13; size: 15,896 bytes)

For L1 regularized regression (aka lasso):

```
[xx,status]=dalsql1(zeros(n,1), A, bb, lambda);
```

`A` (m x n) is the design matrix, `bb` (m x 1) is the target vector, and `lambda` is the regularization constant. `xx` is the regression coefficient (solution), and the first argument of `dalsql1` is its initial value.
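A complete call might look like the following sketch; the synthetic problem and the value of `lambda` are made up for illustration and should be tuned for real data:

```matlab
% Small synthetic lasso problem (illustrative values only)
m = 64; n = 256;                  % #samples << #unknowns
A  = randn(m, n);                 % dense design matrix
x0 = zeros(n, 1); x0(1:8) = 1;    % sparse ground-truth coefficients
bb = A*x0 + 0.01*randn(m, 1);     % noisy targets
lambda = 1;                       % regularization constant (tune as needed)

% Solve min_x 0.5*||A*x - bb||^2 + lambda*||x||_1
[xx, status] = dalsql1(zeros(n, 1), A, bb, lambda);
```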
For grouped L1 regularized regression (aka group lasso):

```
[xx,status]=dalsqgl(zeros(ns,nc), A, bb, lambda);
```

`A` (m x n, or m x ns x nc with ns*nc = n), `bb` (m x 1), and `lambda` are defined as above. `xx` is the regression coefficient (solution), and the first argument of `dalsqgl` is its initial value.
When the initial `xx` is given as a matrix (as above), `dalsqgl` assumes `nc` groups of size `ns`. In order to specify groups of unequal sizes, you can specify the option `blks` as follows:

```
[xx,status]=dalsqgl(zeros(n,1), A, bb, lambda, 'blks', [10 10 20]);
```

`xx` is an (n x 1) vector. Note that you might need to normalize the data appropriately to make groups of different sizes comparable to each other.
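Putting the pieces together, a group-lasso call with unequal group sizes might look like this sketch (group sizes, problem dimensions, and `lambda` are illustrative):

```matlab
% Three groups of sizes 10, 10, and 20 (n = 40 unknowns in total)
blks = [10 10 20];
n = sum(blks); m = 30;
A  = randn(m, n);
bb = randn(m, 1);
lambda = 1;                       % regularization constant (tune as needed)

% Grouped-L1 (group lasso) regression with the 'blks' option
[xx, status] = dalsqgl(zeros(n, 1), A, bb, lambda, 'blks', blks);
```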
For L1 regularized logistic regression:

```
[xx,bias,status]=dallrl1(zeros(n,1), 0, A, yy, lambda);
```

`A` (m x n) is the design matrix, `yy` (m x 1) is the target vector (-1 or +1), and `lambda` is the regularization constant. Note that the bias term is not regularized. `xx` (n x 1) and `bias` are the classifier weight vector and the bias, respectively, and the first two arguments of `dallrl1` are their initial values.
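A complete call might look like the following sketch (the synthetic classification problem and `lambda` are made up for illustration):

```matlab
% Small synthetic binary classification problem (illustrative values only)
m = 100; n = 200;
A  = randn(m, n);
ww = zeros(n, 1); ww(1:5) = 1;        % sparse true weight vector
yy = sign(A*ww + 0.1*randn(m, 1));    % labels in {-1, +1}
lambda = 1;                           % regularization constant (tune as needed)

% L1-regularized logistic regression; the bias term is unregularized
[xx, bias, status] = dallrl1(zeros(n, 1), 0, A, yy, lambda);
```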
See more examples on the github page.

DAL is provided under the MIT license, i.e., you are free to include this software in any project (even a non-open one) as long as the copyright notice is kept. We kindly ask that you cite our paper when you publish work based on this software.

Copyright (c) 2009 Ryota Tomioka

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

I am happy to receive any kind of feedback (bug reports, comments, etc.). E-mail: tomioka@mist.i.u-tokyo.ac.jp

- Augmented Lagrangian Methods for Learning, Selecting, and Combining Features. In Suvrit Sra, Sebastian Nowozin, and Stephen J. Wright, editors, Optimization for Machine Learning. MIT Press, 2011.
- A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices. Ryota Tomioka, Taiji Suzuki, Masashi Sugiyama, and Hisashi Kashima, Proc. of the 27th Annual International Conference on Machine Learning (ICML 2010), Haifa, Israel, 2010. [Slides] [Support page]
- "Super-Linear Convergence of Dual Augmented Lagrangian Algorithm for Sparse Learning", Ryota Tomioka, Taiji Suzuki, and Masashi Sugiyama, arXiv:0911.4046, 2009. [Support Page]
- "Dual Augmented Lagrangian Method for Efficient Sparse Reconstruction", Ryota Tomioka and Masashi Sugiyama, IEEE Signal Processing Letters, 16 (12), pp. 1067-1070, 2009. [preprint]

- High-Dimensional Feature Selection by Feature-Wise Non-Linear Lasso. Makoto Yamada, Wittawat Jitkrittum, Leonid Sigal, and Masashi Sugiyama, arXiv:1202.0515v1 [stat.ML], 2012.
- Multi-population GWA mapping via multi-task regularized regression. Kriti Puniyani, Seyoung Kim and Eric P. Xing, Bioinformatics, 26 (12): pp. i208-i216, 2010.
- Large-scale EEG/MEG source localization with spatial flexibility. S. Haufe, R. Tomioka, T. Dickhaus, C. Sannelli, B. Blankertz, G. Nolte, K.-R. Müller, Neuroimage, 2010.
- Modeling sparse connectivity between underlying brain sources for EEG/MEG. Stefan Haufe, Ryota Tomioka, Guido Nolte, Klaus-Robert Müller, and Motoaki Kawanabe, IEEE Trans. Biomed. Eng. 57(8), pp. 1954-1963, 2010.