In an ideal world, a researcher with an idea could easily build on top of what has already been done (the 90%) and have 10% of work left to do in order to test his or her hypothesis. ... In practice, quite the opposite is happening. Researchers spend weeks re-doing data pre- and post-processing and re-implementing and debugging baseline models.
The difficulty of building upon other’s work is a major factor in determining what research is being done. The majority of researchers build on top of their own research, over and over again. ... I believe the main reason for this is that it’s easiest, from an experimental perspective, to build upon one’s own work. .... Baselines are already implemented in familiar code, evaluation is setup, related work is written up, and so on.
It also leads to less competition – nobody else has access to your experimental setup and can easily compete with you. If it were just as easy to build upon somebody else’s work we would probably see more diversity in published research.
In practice, as everyone re-implements techniques using different frameworks and pipelines, comparisons become meaningless. ... As you re-implement your LSTM, use a different framework, pre-process your own data, and write thousands of lines of code, how many confounding variables will you have created? My guess is that it’s in the hundreds to thousands. If you then show a 0.5% marginal improvement over some baseline models ... how can you ever prove causality?
Personally, I do not trust paper results at all. I tend to read papers for inspiration – I look at the ideas, not at the results.
A superb blog post by Denny Britz and, in my opinion, completely on point. A few takeaways to help avoid the problems mentioned above:
- Use platforms like OpenAI Gym and OpenAI Universe, and use datasets with as few pre-processing steps as possible
- Use standardized implementations from TensorFlow or Keras or similar libraries for everything that your model does not seek to innovate on
- When research papers don't do the above, take their results with a grain of salt
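
To make the takeaways above a bit more concrete, here is a minimal sketch of what "share everything you are not innovating on" could look like in code. It is not from Denny's post: the toy dataset, the two models, and every function name (`make_dataset`, `split`, `accuracy`, `run_experiment`, and so on) are hypothetical and only there for illustration. The idea is that data generation, the train/test split, the seed, and the evaluation metric live in one shared harness, so the model is the only thing that differs between runs.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[float, int]       # (feature, label)
Model = Callable[[float], int]    # maps a feature to a predicted label


def make_dataset(seed: int, n: int = 200) -> List[Example]:
    """Hypothetical toy data: label is 1 when the noisy feature exceeds 0.5."""
    rng = random.Random(seed)  # local, seeded RNG so every run sees the same data
    data = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x + rng.gauss(0, 0.1) > 0.5 else 0
        data.append((x, label))
    return data


def split(data: List[Example], train_fraction: float = 0.8):
    """One fixed split, shared by every model under comparison."""
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]


def accuracy(model: Model, test: List[Example]) -> float:
    """One shared metric, so scores are computed identically for all models."""
    return sum(1 for x, y in test if model(x) == y) / len(test)


def fit_majority_baseline(train: List[Example]) -> Model:
    """Baseline: always predict the most common training label."""
    ones = sum(y for _, y in train)
    majority = 1 if ones * 2 >= len(train) else 0
    return lambda x: majority


def fit_threshold_model(train: List[Example]) -> Model:
    """Candidate: pick the threshold with the best training accuracy."""
    def train_hits(t: float) -> int:
        return sum(1 for x, y in train if (1 if x > t else 0) == y)

    best = max(sorted(x for x, _ in train), key=train_hits)
    return lambda x: 1 if x > best else 0


def run_experiment(fit: Callable[[List[Example]], Model], seed: int = 0) -> float:
    """Everything except `fit` is held constant across models."""
    train, test = split(make_dataset(seed))
    return accuracy(fit(train), test)


results = {
    "baseline": run_experiment(fit_majority_baseline),
    "candidate": run_experiment(fit_threshold_model),
}
print(results)
```

In a real project the shared pieces would be a standard dataset loader and a library-provided baseline rather than these toy functions, but the structure is the same: if two numbers come out of the same `run_experiment`, the difference between them can at least be attributed to the model and not to the plumbing.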
For the full blog post, check out "Engineering is the bottleneck in (Deep Learning) Research" on Denny's Blog.