By far my most cited paper in RL is the following. Rich Sutton is the first author, and he understood its context and significance. I helped with some proofs. I have only recently come to appreciate the significance of this paper.

Policy Gradient Methods for Reinforcement Learning with Function Approximation with Yishay Mansour, Rich Sutton, and Satinder Singh, NIPS, 1999.

Here is another RL paper from my time at AT&T:

Approximate planning for factored POMDPs using belief state simplification with Satinder Singh, UAI, 1999.