Experiments In Off-Policy Reinforcement Learning With The Gq(Lambda) Algorithm