About 10,000 deep learning papers have been written about "hard-coding priors about a specific task into a NN architecture works better than a lack of prior" -- but they're typically being passed as "architecture XYZ offers superior performance for [overly generic task category]"
You can always "buy" performance by either training on more data, better data, or by injecting task information into the architecture or the preprocessing. However, this isn't informative about the generalization power of the techniques used (which is the only thing that matters)
Basically, a lot of papers can be rephrased as "we achieved better performance on this specific task by going to great lengths to inject more information about the task in our training setup"
An extreme case of this: working with a synthetically-generated dataset where samples follow a "template" (e.g. bAbI), and manually hard-coding that template into your NN architecture
Fitting parametric models via gradient descent, unsurprisingly, works best when what you are fitting is already a template of the solution.
Of course, convnets are an instance of this (but in a good way, since their assumptions generalize to all visual data).
You can follow @fchollet.
Tip: mention @threader_app on a Twitter thread with the keyword “compile” to get a link to it.
Enjoy Threader? Sign up.
Since you’re here...
... we’re asking visitors like you to make a contribution to support this independent project. In these uncertain times, access to information is vital. Threader gets 1,000,000+ visits a month and our iOS Twitter client was featured as an App of the Day by Apple. Your financial support will help two developers to keep working on this app. Everyone’s contribution, big or small, is so valuable. Support Threader by becoming premium or by donating on PayPal. Thank you.