François Chollet (@fchollet). Deep learning @google. Creator of Keras, neural networks library. Author of 'Deep Learning with Python'. Opinions are my own. Dec. 26, 2019

What's deep learning?

The "common usage" definition as of 2019 would be "chains of differentiable parametric layers trained end-to-end with backprop".

But this definition seems overly restrictive to me. It describes *how we do DL today*, not *what it is*.

If you have a convnet and you train its weights with ADMM, is that no longer deep learning?

Is an HMAX model (with learned features) not deep learning?

Is a deep neural network trained greedily layer-by-layer not deep learning?

I say they're all deep learning.

Deep learning refers to an approach to representation learning where your model is a chain of modules (typically a stack / pyramid, hence the notion of depth), each of which could serve as a standalone feature extractor if trained as such.

That's also how I define it in my book.
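A minimal sketch of that structure in plain Python, for illustration only. The names `make_layer` and `chain` are hypothetical, and the weights are hand-set stand-ins for parameters that would normally be learned. The point is only that each module is usable as a feature extractor on its own, and that depth comes from chaining them.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Module = Callable[[Vector], Vector]


def make_layer(weights: List[Vector], biases: Vector) -> Module:
    """Return a ReLU feature extractor; the parameters stand in for learned values."""
    def layer(x: Vector) -> Vector:
        return [
            max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)
        ]
    return layer


def chain(modules: Sequence[Module]) -> Module:
    """Compose modules so each one consumes the previous one's representation."""
    def run(x: Vector) -> Vector:
        for module in modules:
            x = module(x)
        return x
    return run


# Hypothetical hand-set parameters, chosen only to keep the arithmetic simple.
layer1 = make_layer([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])
layer2 = make_layer([[1.0, 1.0]], [-1.0])
model = chain([layer1, layer2])

x = [2.0, 1.0]
shallow_features = layer1(x)  # a single extractor used on its own
deep_features = model(x)      # the two-module chain
```

Nothing here says how the parameters were obtained, which is the point: the definition is about the chain, not the training procedure.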

This stands in contrast to:

1) Things that are not representation learning (e.g. manual feature engineering like SIFT, symbolic AI, etc.)

2) "Shallow learning", where there is a single feature extraction layer.

It does not prescribe a specific learning mechanism (e.g. backprop) or a specific use case (e.g. supervised learning or RL), and it does not require end-to-end joint learning (as opposed to greedy learning).

It's the *what* (nature and structure), not the *how*.

This definition draws a clear boundary: some things are DL, some things aren't.

The 2019 flavors of DNNs are DL, of course. So are DNNs trained with backprop alternatives like ES, ADMM, or virtual gradients.

Genetic programming is not DL. Quicksort is not DL. Nor is SVM.

A single Dense layer is not DL. But a Dense stack is DL.

K-means is not DL. But stacking k-means feature extractors is DL.
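To make the k-means case concrete, here is a rough sketch in plain Python. Each layer is fit greedily on the output of the layer below it, with no backprop anywhere. The distance-to-centroid encoding, the layer sizes, the toy data, and all function names are assumptions made for illustration, not a reference implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_kmeans(points, k, n_iter=10, seed=0):
    """Plain Lloyd iterations; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def encode(points, centroids):
    """Represent each point by its distance to every centroid."""
    return [[sq_dist(p, c) ** 0.5 for c in centroids] for p in points]


def fit_stack(points, layer_sizes):
    """Greedy layer-wise training: fit a layer, encode, fit the next layer."""
    layers, feats = [], points
    for k in layer_sizes:
        centroids = fit_kmeans(feats, k)
        layers.append(centroids)
        feats = encode(feats, centroids)
    return layers


def transform(points, layers):
    """Run points through every k-means layer in order."""
    feats = points
    for centroids in layers:
        feats = encode(feats, centroids)
    return feats


# Toy data: three 2-D blobs of 20 points each.
rng = random.Random(42)
data = [
    [rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)]
    for cx, cy in [(0, 0), (3, 3), (0, 3)]
    for _ in range(20)
]
layers = fit_stack(data, layer_sizes=[4, 3])
features = transform(data, layers)
```

A single call to `fit_kmeans` plus `encode` would be shallow learning; it is the second layer, trained on the first layer's representation, that makes the stack deep.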

When in 2011-12 I was doing stacked matrix factorization over matrices of pairwise mutual information of locations in video data, that was deep learning.

Programs typically written by human engineers are not DL. Parametrizing such programs to learn a few constants automatically is still not DL. You need to be doing representation learning with a chain of feature extractors.
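For contrast, a small sketch of a hand-written program with one constant fit from data. The rule, the data, and the names `classify` and `fit_threshold` are all hypothetical. Something is learned here, but no intermediate representation is extracted, so under the definition above this is not DL.

```python
def classify(x, threshold):
    """Hand-written rule; its structure is fixed by the programmer."""
    return x > threshold


def fit_threshold(xs, labels, candidates):
    """Pick the candidate threshold with the highest training accuracy."""
    def accuracy(t):
        return sum(classify(x, t) == y for x, y in zip(xs, labels)) / len(xs)
    return max(candidates, key=accuracy)


xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = [False, False, False, True, True, True]
best = fit_threshold(xs, labels, candidates=[0.0, 2.5, 4.5, 7.5])
```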

By definition, deep learning is a gradual, incremental way to extract representations from data. In its modern incarnation, it's even at least C1 continuous (more typically C∞). That last part isn't essential, but *incrementality* is intrinsic to DL.

So DL is a fundamentally different beast from symbol manipulation and regular programming, which are fundamentally discrete, flow-centric, and don't usually involve intermediate data representations.

You could do symbol manipulation with DL, but it involves lots of extra steps.

These are two entirely different takes on data manipulation.

Deep learning isn't just end-to-end gradient descent, but not every program is deep learning either. In fact, deep learning models only represent a tiny, tiny slice of program space.

It can't hurt to look beyond it.
