With the right small neural network architecture you could probably just repeat random weight initialization until you find one that works. This is known to work for some of the Atari games. The trick is, of course, to engineer a good input encoding and to find the right model complexity and architecture in the first place.