About Me
I am a research assistant professor at TTI Chicago, a philanthropically endowed academic computer science institute on the University of Chicago campus. I obtained my Ph.D. from Michigan State University in 2014 under the supervision of Prof. Rong Jin. Before joining MSU in 2009, I spent two years as a Ph.D. student at Sharif University of Technology. I received my M.Sc. from the Computer Engineering Department at Sharif University of Technology, where my advisor was Prof. Mohammad Ghodsi, and my B.Sc. from Amirkabir University of Technology (Tehran Polytechnic). In the past I have worked at Microsoft Research and NEC Laboratories America as a research intern.
Research Interests
I am broadly interested in machine learning and the design and analysis of algorithms, with a focus on:

Online Learning and Online (Stochastic) Convex Optimization

Large Scale Machine Learning and Big Data

Statistical and Computational Learning Theory

Convex and Combinatorial Optimization Theory
Selected Papers [more]
Lower and Upper Bounds on the Generalization of Stochastic Exponentially Concave Optimization [Errata] [Abstract]
with Lijun Zhang and Rong Jin
Conference on Learning Theory (COLT), 2015.
In this paper we derive high-probability lower and upper bounds on the excess risk of stochastic optimization of exponentially concave loss functions. Exponentially concave loss functions encompass several fundamental problems in machine learning, such as the squared loss in linear regression, the logistic loss in classification, and the negative logarithm loss in portfolio management. We demonstrate an $O(d \log T/T)$ upper bound on the excess risk of the stochastic online Newton step algorithm, and an $\Omega(d/T)$ lower bound on the excess risk of general stochastic exponentially concave optimization methods, indicating that the obtained upper bound is optimal up to a logarithmic factor. The analysis of the upper bound is based on a novel concentration inequality for bounding martingales, which is interesting in its own right, and the proof of the lower bound is a probabilistic method that relies on an information-theoretic minimax analysis.
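The stochastic online Newton step can be sketched in a few lines. The following is a minimal toy implementation for the squared loss, not the paper's setup: the data stream, the parameter `gamma`, the regularization `eps`, and the domain radius are all illustrative choices, and the projection here is a plain Euclidean one, whereas exact ONS projects in the norm induced by the matrix A.

```python
import numpy as np

def online_newton_step(stream, d, gamma=0.5, eps=1.0, radius=1.0):
    """Sketch of ONS: Newton-style steps with a running second-moment matrix."""
    w = np.zeros(d)
    A = eps * np.eye(d)                  # regularized sum of gradient outer products
    avg = np.zeros(d)
    for t, (x, y) in enumerate(stream, start=1):
        g = 2.0 * (w @ x - y) * x        # gradient of the squared loss (w.x - y)^2
        A += np.outer(g, g)              # rank-one update of the second-moment matrix
        w = w - (1.0 / gamma) * np.linalg.solve(A, g)
        nrm = np.linalg.norm(w)
        if nrm > radius:                 # cheap Euclidean projection onto the ball
            w *= radius / nrm            # (exact ONS projects in the A-norm)
        avg += (w - avg) / t             # averaged iterate, as in the risk analysis
    return avg

rng = np.random.default_rng(0)
w_star = np.array([0.5, -0.3])
stream = [(x, x @ w_star + 0.01 * rng.normal())
          for x in rng.normal(size=(2000, 2))]
w_hat = online_newton_step(stream, d=2)
print(np.round(w_hat, 2))
```

On this well-conditioned toy problem the averaged iterate lands close to the planted model, consistent with the $O(d \log T/T)$ excess-risk behavior.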
Random Projections for Classification: A Recovery Approach [Abstract]
with Lijun Zhang, Rong Jin, Tianbao Yang, and Shenghuo Zhu
IEEE Transactions on Information Theory, 2014.
Random projection has been widely used in data classification. It maps high-dimensional data into a low-dimensional subspace in order to reduce the computational cost of solving the related optimization problem. While previous studies have focused on analyzing the classification performance in the low-dimensional space, in this work we consider the recovery problem, i.e., how to accurately recover the optimal solution to the original high-dimensional optimization problem from the solution learned after random projection. We present a simple algorithm, termed Dual Random Projection, which uses the dual solution of the low-dimensional optimization problem to recover the optimal solution to the original problem. Our theoretical analysis shows that, with high probability, the proposed algorithm accurately recovers the optimal solution to the original problem, provided that the data matrix is (approximately) low-rank and/or the optimal solution is (approximately) sparse. We further show that the proposed algorithm can be applied iteratively to reduce the recovery error exponentially.
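The recovery step is simple enough to sketch. Below is a toy ridge-regression instance (the problem, the scaling of the projection matrix, and all sizes are illustrative, not from the paper), using the standard primal-dual identity $w = X^\top \alpha$ for ridge regression: the dual variables live in $n$ dimensions regardless of the feature dimension, so the dual of the projected problem can be mapped back to the original space.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, r, m, lam = 200, 1000, 5, 200, 1.0

# Low-rank data matrix X (rank r) and labels from a planted model.
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, d)) / np.sqrt(r)
y = np.sign(X @ rng.normal(size=d))

# Exact ridge solution via its dual: w* = X^T (X X^T + lam I)^{-1} y.
alpha_star = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
w_star = X.T @ alpha_star

# Dual Random Projection sketch: solve the m-dimensional problem,
# then recover a d-dimensional solution from its DUAL variables.
R = rng.normal(size=(d, m)) / np.sqrt(m)                 # random projection
Xr = X @ R                                               # projected data, n x m
alpha = np.linalg.solve(Xr @ Xr.T + lam * np.eye(n), y)  # low-dim dual solution
w_hat = X.T @ alpha                                      # recovery in d dimensions

# Naive baseline: map the low-dim PRIMAL solution back with R.
z = np.linalg.solve(Xr.T @ Xr + lam * np.eye(m), Xr.T @ y)
w_naive = R @ z

rel_err = np.linalg.norm(w_hat - w_star) / np.linalg.norm(w_star)
print(round(float(rel_err), 3))
```

Because X is low-rank, the dual recovery is far more accurate than naively mapping the low-dimensional primal solution back through R.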
Exploiting Smoothness in Statistical Learning, Sequential Prediction, and Stochastic Optimization [Slides]
PhD Thesis, 2014.
Regret Bounded by Gradual Variation for Online Convex Optimization [Abstract]
with Tianbao Yang, Rong Jin, and Shenghuo Zhu
Machine Learning Journal, 2013.
Hazan and Kale (2008) showed that the regret of online linear optimization can be bounded by the total variation of the cost vectors. In this paper, we extend this result to general online convex optimization. We first analyze the limitations of their algorithm when applied to online convex optimization. We then present two algorithms for online convex optimization whose regrets are bounded by the variation of the cost functions. We finally consider the bandit setting and present a randomized algorithm for online bandit convex optimization with a variation-based regret bound. We show that the regret bound for online bandit convex optimization is optimal when the variation of the cost functions is independent of the number of trials.
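The flavor of variation-based bounds can be illustrated with an optimistic two-step update, a simplified relative of the algorithms in the paper rather than a reproduction: use the previous round's gradient as a guess for the current one, then correct. When consecutive losses change slowly the guess is accurate and the played point tracks the moving minimizer. The drifting quadratic losses $f_t(x) = (x - c_t)^2$ and all constants below are toy choices.

```python
import numpy as np

T, eta = 500, 0.3
c = 0.5 * np.sin(np.linspace(0.0, 3.0, T))          # slowly drifting minimizers
x_half, prev_g = 0.0, 0.0                           # secondary iterate and last gradient
played = np.empty(T)
for t in range(T):
    x = np.clip(x_half - eta * prev_g, -1.0, 1.0)   # optimistic step: guess = old gradient
    g = 2.0 * (x - c[t])                            # true gradient is then revealed
    x_half = np.clip(x_half - eta * g, -1.0, 1.0)   # correction step with the true gradient
    prev_g = g
    played[t] = x
print(round(float(np.max(np.abs(played[10:] - c[10:]))), 3))
```

Since the losses deviate only slightly from round to round, the tracking error stays small throughout, which is the regime where variation-based regret bounds beat the worst-case $O(\sqrt{T})$ rate.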
Mixed Optimization for Smooth Functions [Abstract] [SUPP]
with Lijun Zhang and Rong Jin
Advances in Neural Information Processing Systems (NIPS), 2013.
It is well known that the optimal convergence rate for stochastic optimization of smooth functions is $O(1/\sqrt{T})$, the same as for stochastic optimization of Lipschitz continuous convex functions. This is in contrast to optimizing smooth functions using full gradients, which yields a convergence rate of $O(1/T^2)$. In this work, we consider a new setup for optimizing smooth functions, termed Mixed Optimization, which allows access to both a stochastic oracle and a full gradient oracle. Our goal is to significantly improve the convergence rate of stochastic optimization of smooth functions by allowing an additional small number of accesses to the full gradient oracle. We show that, with $O(\ln T)$ calls to the full gradient oracle and $O(T)$ calls to the stochastic oracle, the proposed mixed optimization algorithm is able to achieve an optimization error of $O(1/T)$.
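The oracle pattern can be sketched in a variance-reduction style. The code below is written in the spirit of later variance-reduced methods such as SVRG, not a reproduction of the paper's algorithm: a few expensive full-gradient calls anchor many cheap stochastic steps. The least-squares objective and all constants are toy choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 500, 10
A = rng.normal(size=(n, d))
y = A @ rng.normal(size=d)
w_star, *_ = np.linalg.lstsq(A, y, rcond=None)   # exact minimizer, for reference

def full_grad(w):                    # the expensive full-gradient oracle
    return A.T @ (A @ w - y) / n

def stoch_grad(w, i):                # the cheap stochastic oracle (one sample)
    return (A[i] @ w - y[i]) * A[i]

w = np.zeros(d)
epochs, inner, eta = 20, 200, 0.01   # few full calls, many stochastic calls
for _ in range(epochs):
    anchor, mu = w.copy(), full_grad(w)          # one full-gradient call per epoch
    for _ in range(inner):
        i = rng.integers(n)
        g = stoch_grad(w, i) - stoch_grad(anchor, i) + mu  # variance-reduced step
        w -= eta * g
print(round(float(np.linalg.norm(w - w_star)), 4))
```

With only 20 full-gradient evaluations among 4000 stochastic ones, the iterate converges far faster than plain SGD would, illustrating why a logarithmic number of full-gradient calls changes the achievable rate.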
Stochastic Convex Optimization with Multiple Objectives [Abstract]
with Tianbao Yang and Rong Jin
Advances in Neural Information Processing Systems (NIPS), 2013.
In this paper, we are interested in the development of efficient algorithms for convex optimization problems in the simultaneous presence of multiple objectives and stochasticity in the first-order information. We cast the stochastic multiple-objective optimization problem into a constrained optimization problem by choosing one function as the objective and bounding the other objectives by appropriate thresholds. We first examine a two-stage exploration-exploitation algorithm, which first approximates the stochastic objectives by sampling and then solves a constrained stochastic optimization problem by the projected gradient method. This method attains a suboptimal convergence rate even under strong assumptions on the objectives. Our second approach is an efficient primal-dual stochastic algorithm. It leverages the theory of Lagrangian methods in constrained optimization and attains the optimal convergence rate of $O(1/\sqrt{T})$ with high probability for general Lipschitz continuous objectives.
Passive Learning with Target Risk [Abstract] [Slides]
with Rong Jin
Conference on Learning Theory (COLT), 2013.
In this paper we consider learning in the passive setting but with a slight modification. We assume that the target expected loss, also referred to as the target risk, is provided to the learner in advance as prior knowledge. Unlike most studies in learning theory, which only incorporate prior knowledge into the generalization bounds, we are able to explicitly utilize the target risk in the learning process. Our analysis reveals a surprising result on the sample complexity of learning: by exploiting the target risk in the learning algorithm, we show that when the loss function is both strongly convex and smooth, the sample complexity reduces to $O(\log(\frac{1}{\epsilon}))$, an exponential improvement over the $O(\frac{1}{\epsilon})$ sample complexity for learning with strongly convex loss functions. Furthermore, our proof is constructive and is based on a computationally efficient stochastic optimization algorithm for such settings, which demonstrates that the proposed algorithm is practically useful.
Stochastic Gradient Descent with Only One Projection [Abstract] [SUPP]
with Tianbao Yang, Rong Jin and S. Zhu
Advances in Neural Information Processing Systems (NIPS), 2012.
Although many variants of stochastic gradient descent have been proposed for large-scale convex optimization, most of them require projecting the solution at each iteration to ensure that it stays within the feasible domain. For complex domains (e.g., the positive semidefinite cone), the projection step can be computationally expensive, making stochastic gradient descent unattractive for large-scale optimization problems. We address this limitation by developing a novel stochastic gradient descent algorithm that does not need intermediate projections. Instead, only one projection at the last iteration is needed to obtain a feasible solution in the given domain. Our theoretical analysis shows that, with high probability, the proposed algorithm achieves an $O(1/\sqrt{T})$ convergence rate for general convex optimization, and an $O(\ln T/T)$ rate for strongly convex optimization, under mild conditions on the domain and the objective function.
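The "one projection" pattern is easy to illustrate. The toy below ignores the paper's modified update and simply runs unprojected SGD, averages the iterates, and projects once at the end, on a problem (least squares over the unit ball, with the unconstrained optimum outside the ball) where that final projection happens to suffice. The problem, step sizes, and horizon are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
d, T = 5, 5000
w_true = np.ones(d)                  # unconstrained optimum, outside the unit ball
w, avg = np.zeros(d), np.zeros(d)
for t in range(1, T + 1):
    x = rng.normal(size=d)
    y = x @ w_true + 0.1 * rng.normal()
    g = 2.0 * (w @ x - y) * x        # stochastic gradient of (w.x - y)^2
    w -= 0.1 * g / np.sqrt(t)        # O(1/sqrt(t)) step; NO projection in the loop
    avg += (w - avg) / t             # running average of the iterates
nrm = np.linalg.norm(avg)
w_final = avg if nrm <= 1.0 else avg / nrm   # the single, final projection
print(np.round(w_final, 2))
```

Skipping the per-iteration projection is exactly what makes the method attractive when the feasible domain (e.g., the PSD cone) is expensive to project onto.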
Online Optimization with Gradual Variations [Abstract]
with Chao-Kai Chiang, Tianbao Yang, C. J. Lee, C. J. Lu, Rong Jin and S. Zhu
Conference on Learning Theory (COLT), 2012.
This paper is a merger of two independent COLT submissions that both applied the same idea to obtain regret bounded by gradual variation. Please see the commentary written by Satyen Kale.
Winner of the Mark Fulk Best Student Paper Award
We study the online convex optimization problem, in which an online algorithm has to make repeated decisions with convex loss functions and hopes to achieve a small regret. We consider a natural restriction of this problem in which the loss functions have a small deviation, measured by the sum of the distances between every two consecutive loss functions according to some distance metric. We show that for linear and general smooth convex loss functions, an online algorithm modified from the gradient descent algorithm can achieve a regret that scales only as the square root of the deviation. For the closely related problem of prediction with expert advice, we show that an online algorithm modified from the multiplicative update algorithm can also achieve a similar regret bound for a different measure of deviation. Finally, for loss functions that are strictly convex, we show that an online algorithm modified from the online Newton step algorithm can achieve a regret that is only logarithmic in the deviation, and, as an application, we obtain such a logarithmic regret for the portfolio management problem.
Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints [Abstract]
with Rong Jin and Tianbao Yang
Journal of Machine Learning Research (JMLR), 2012.
In this paper we propose efficient algorithms for solving constrained online convex optimization problems. Our motivation stems from the observation that most algorithms proposed for online convex optimization require a projection onto the convex set $\mathcal{K}$ from which the decisions are made. While the projection is straightforward for simple shapes (e.g., the Euclidean ball), for arbitrary complex sets it is the main computational challenge and may be inefficient in practice. In this paper, we consider an alternative online convex optimization problem. Instead of requiring that decisions belong to $\mathcal{K}$ for all rounds, we only require that the constraints that define the set $\mathcal{K}$ be satisfied in the long run.
By turning the problem into an online convex-concave optimization problem, we propose an efficient algorithm that achieves an $O(\sqrt{T})$ regret bound and an $O(T^{3/4})$ bound on the violation of the constraints. We then modify the algorithm so as to guarantee that the constraints are satisfied in the long run. This gain comes at the price of an $O(T^{3/4})$ regret bound. Our second algorithm, based on the mirror prox method of Nemirovski for solving variational inequalities, achieves an $O(T^{2/3})$ bound for both the regret and the violation of the constraints when the domain $\mathcal{K}$ can be described by a finite number of linear constraints. Finally, we extend the results to the setting where we only have partial access to the convex set $\mathcal{K}$ and propose a multi-point bandit feedback algorithm with the same bounds in expectation as our first algorithm.
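A minimal sketch of the primal-dual idea (not the paper's exact algorithm): instead of projecting onto $K = \{x : g(x) \le 0\}$ every round, take a gradient step on the Lagrangian $L_t(x, \lambda) = f_t(x) + \lambda\, g(x)$ and a matching dual ascent step on $\lambda$, so constraint violations are penalized and controlled only in the long run; the cheap box projection stands in for a "simple shape". The losses, the linear constraint, and the step sizes are toy choices.

```python
import numpy as np

rng = np.random.default_rng(3)
T, d = 2000, 3
a, b = np.ones(d), 1.0                    # long-term constraint g(x) = a.x - b <= 0
x, lam = np.zeros(d), 0.0                 # primal decision and dual multiplier
total_violation, iterates = 0.0, []
eta = 1.0 / np.sqrt(T)
for t in range(T):
    c = rng.normal(size=d) - 0.5          # linear loss f_t(x) = c.x, pushing x up
    g_val = a @ x - b
    total_violation += max(g_val, 0.0)    # violation incurred this round
    x = x - eta * (c + lam * a)           # primal step on the Lagrangian
    x = np.clip(x, -1.0, 1.0)             # cheap projection onto the box only
    lam = max(0.0, lam + eta * g_val)     # dual ascent: pressure grows if violated
    iterates.append(x.copy())
x_avg = np.mean(iterates, axis=0)
print(round(total_violation / T, 3), round(float(a @ x_avg - b), 3))
```

The multiplier rises whenever the constraint is violated and pushes the primal iterate back, so the average violation vanishes over time even though no round ever projects onto $K$ itself.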