Deep Reinforcement Learning

Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation

Mnih et al. Human-level control through deep reinforcement learning (DQN)

Mnih et al. Asynchronous Methods for Deep Reinforcement Learning (A3C)

David Silver's course (UCL) on reinforcement learning

Berkeley course on Deep RL

See Goodfellow et al., Section 20.9, for applications of Deep RL to backpropagation through discrete variables.