Deep reinforcement learning guy here. You're looking at coding up double DQN, and probably training it for about 5 to 30 minutes. Very likely it would not outperform a simple near-perfect algorithm, and it would fail in weird ways that would annoy you a few tens of thousands of times. You'd have issues like it refusing to ever go left again after dying once, so you'd need to know about epsilon-greedy exploration, epsilon decay, etc.
Coding that from nothing (and understanding it) could take days; weeks to months if you don't know ML stuff.
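The epsilon-decay part mentioned above is only a few lines, for what it's worth. A minimal sketch of epsilon-greedy action selection with exponential decay (the constants and `n_actions` are purely illustrative, not tuned for any particular game):

```python
import random

# Illustrative constants; real values depend on the game.
EPS_START, EPS_MIN, EPS_DECAY = 1.0, 0.05, 0.995
n_actions = 4

def select_action(q_values, eps):
    """Epsilon-greedy: explore with probability eps, else take the greedy action."""
    if random.random() < eps:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q_values[a])

eps = EPS_START
for episode in range(1000):
    # ... run one episode, calling select_action(q, eps) at each step ...
    eps = max(EPS_MIN, eps * EPS_DECAY)  # decay toward the floor
```

Without the decay, the agent either never stops exploring or (with eps fixed near zero) locks in whatever it learned from its first few deaths, which is exactly the "refuses to go left, ever" failure mode.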
In this particular case I think basic tabular Q-learning could play optimally, no neural net required.
I could be wrong though; throw a full convolutional net at it without temporal-difference learning and see what happens! (Probably a middling score.)
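For reference, the tabular Q-learning update mentioned above is tiny. A hedged sketch (hyperparameters and the state/action interface here are made up for illustration):

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.99     # illustrative learning rate and discount
Q = defaultdict(float)       # Q[(state, action)] -> estimated value

def update(state, action, reward, next_state, actions):
    """One temporal-difference backup: Q <- Q + alpha * (target - Q)."""
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])
```

If the game's state space is small enough to enumerate, this table converges to the optimal policy without any of the function-approximation headaches a DQN brings.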
You could probably throw a small fully connected net at it and 'train' it via genetic algorithms: i.e. have a population of nets, let them each run a few games, and pick a few top scorers to seed the next generation with some mutation and crossover.
This random exploration is less efficient than backpropagation, but at least you don't have to muck around with temporal differences. (And it would be hard to figure out how to tweak the weights via backpropagation in this multi-step game without falling back to reinforcement learning.)
The problem seems simple enough that evolution via mutation and artificial selection has a decent chance of doing well in a reasonable amount of computing time.
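That select/mutate/crossover loop is easy to sketch. A minimal version of the scheme described above, with flat weight vectors standing in for a small fully connected net (all sizes are illustrative, and `fitness` is a stand-in for "average score over a few games with these weights"):

```python
import random

N_WEIGHTS, POP, ELITE, SIGMA = 32, 50, 5, 0.1  # illustrative sizes

def random_net():
    """Fresh random weight vector (stand-in for a small net's weights)."""
    return [random.gauss(0, 1) for _ in range(N_WEIGHTS)]

def mutate(w):
    """Add small Gaussian noise to every weight."""
    return [x + random.gauss(0, SIGMA) for x in w]

def crossover(a, b):
    """Per-gene uniform crossover between two parents."""
    return [random.choice(pair) for pair in zip(a, b)]

def evolve(fitness, generations=100):
    pop = [random_net() for _ in range(POP)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)      # top scorers first
        elite = pop[:ELITE]
        pop = elite + [mutate(crossover(random.choice(elite),
                                        random.choice(elite)))
                       for _ in range(POP - ELITE)]
    return max(pop, key=fitness)
```

The elitism (carrying the top scorers over unchanged) keeps the best solution from being lost to an unlucky mutation.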
With the right small neural network architecture you could probably just repeat random weight initialization until you find one that works. This is known to work for some of the Atari games. The trick is, of course, to engineer a good input encoding and to find the right model complexity and architecture in the first place.
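The "repeat random initialization" trick is literally just a loop that keeps the best sample. A hedged sketch, where `score` is a placeholder for playing a few games with the given weights under whatever encoding/architecture you've engineered:

```python
import random

def random_weights(n):
    return [random.gauss(0, 1) for _ in range(n)]

def random_search(score, n_weights, tries=10_000, target=None):
    """Sample fresh weight vectors and keep the best scorer."""
    best, best_score = None, float("-inf")
    for _ in range(tries):
        w = random_weights(n_weights)
        s = score(w)
        if s > best_score:
            best, best_score = w, s
        if target is not None and best_score >= target:
            break  # good enough; stop early
    return best, best_score
```

This only works when the architecture is small enough that a good weight setting has non-negligible probability under the init distribution, which is why the input encoding and model size matter so much.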
NEAT? An added bonus of the evolutionary neural search is that you can still explore the solution space with small networks, which isn't really true of backpropagation without evolution. You run out of states.