François Chollet @fchollet Deep learning @google. Creator of Keras, neural networks library. Author of 'Deep Learning with Python'. Opinions are my own. Jun. 01, 2019 1 min read

About 10,000 deep learning papers have been written about "hard-coding priors about a specific task into a NN architecture works better than a lack of prior" -- but they're typically being passed as "architecture XYZ offers superior performance for [overly generic task category]"

You can always "buy" performance by either training on more data, better data, or by injecting task information into the architecture or the preprocessing. However, this isn't informative about the generalization power of the techniques used (which is the only thing that matters)

Basically, a lot of papers can be rephrased as "we achieved better performance on this specific task by going to great lengths to inject more information about the task in our training setup"

An extreme case of this: working with a synthetically-generated dataset where samples follow a "template" (e.g. bAbI), and manually hard-coding that template into your NN architecture

Fitting parametric models via gradient descent, unsurprisingly, works best when what you are fitting is already a template of the solution.

Of course, convnets are an instance of this (but in a good way, since their assumptions generalize to all visual data).

You can follow @fchollet.


Tip: mention @threader_app on a Twitter thread with the keyword “compile” to get a link to it.

Enjoy Threader? Sign up.

Threader is an independent project created by only two developers. The site gets 500,000+ visits a month and our iOS Twitter client was featured as an App of the Day by Apple. Running this space is expensive and time consuming. If you find Threader useful, please consider supporting us to make it a sustainable project.