Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation
Mnih et al. Human-level control through deep reinforcement learning (DQN)
Mnih et al. Asynchronous Methods for Deep Reinforcement Learning (A3C)
David Silver's course (UCL) on RL
See Goodfellow et al., Section 20.9, for applications of deep RL to backpropagation through discrete variables.