Thanks, Keshav.
Did you try experiments with more than |A| added actions, i.e., adding 2*|A| or 3*|A| actions? If so, what were the results like?
Yes, we did try larger numbers of repetition extents, up to 3|A|, and the results did improve. However, a larger action set also demands more exploration to ensure every action is sufficiently tried across the state space for proper value estimates, so this sort of approach does not scale.
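For concreteness, here is a minimal sketch of that kind of augmentation (illustrative Python, not the code from the paper; `augment_action_space` and `step_repeated` are hypothetical helpers, and the environment is assumed to follow the classic gym-style `step` API):

```python
from itertools import product

def augment_action_space(base_actions, extents=(1, 2, 3)):
    """Cross product of base actions and repetition extents.

    With extents {1, 2, 3} over |A| base actions this yields
    3*|A| discrete choices, as discussed above.
    """
    return list(product(base_actions, extents))

def step_repeated(env, action, extent):
    """Apply `action` for `extent` consecutive steps, summing rewards.

    Assumes extent >= 1 and a gym-style (obs, reward, done, info) step.
    """
    total_reward, done = 0.0, False
    for _ in range(extent):
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done, info
```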
As mentioned in the paper, we also experimented with a more general structured policy that predicts the repetition extent (we consider all integers from 1 to 100 as possible extents) along with the action probabilities. This is done with an actor-critic setup similar to DeepMind's A3C paper. We obtained very good results on a few games with this architecture.
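A rough sketch of such a factored actor-critic head (illustrative PyTorch; `FactoredActorCritic`, the layer sizes, and the feature extractor are placeholders, not the exact architecture from the paper):

```python
import torch
import torch.nn as nn

class FactoredActorCritic(nn.Module):
    """One softmax over |A| base actions, a second softmax over
    repetition extents 1..100, and a scalar value estimate."""

    def __init__(self, feature_dim, num_actions, max_extent=100):
        super().__init__()
        self.action_head = nn.Linear(feature_dim, num_actions)
        self.extent_head = nn.Linear(feature_dim, max_extent)  # extents 1..100
        self.value_head = nn.Linear(feature_dim, 1)

    def forward(self, features):
        return (self.action_head(features),
                self.extent_head(features),
                self.value_head(features))

def sample(action_logits, extent_logits):
    """Sample an action and an extent; the joint log-probability used
    in the policy-gradient loss is the sum of the two log-probs."""
    a_dist = torch.distributions.Categorical(logits=action_logits)
    e_dist = torch.distributions.Categorical(logits=extent_logits)
    a, e = a_dist.sample(), e_dist.sample()
    log_prob = a_dist.log_prob(a) + e_dist.log_prob(e)
    return a.item(), e.item() + 1, log_prob  # extent index 0 -> extent 1
```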
Did you try games other than the 3 mentioned in the ...