Are you a deep learning researcher? Wondering if all this TensorFlow 2.0 stuff you've heard about is relevant to you?
This thread is a crash course on everything you need to know to use TensorFlow 2.0 + Keras for deep learning research. Read on!
2) The `add_weight` method gives you a shortcut for creating weights, instead of instantiating `tf.Variable` objects yourself.
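A minimal sketch of a layer whose weights come from `add_weight`, assuming TensorFlow 2.x with Keras available as `tf.keras`. The `Linear` class, its `units`/`input_dim` arguments and the sizes below are illustrative choices, not code taken from the thread.

```python
import tensorflow as tf


class Linear(tf.keras.layers.Layer):
    """Illustrative dense layer, y = x @ w + b (hypothetical example class)."""

    def __init__(self, units=32, input_dim=32):
        super().__init__()
        # add_weight creates the variable and registers it with the layer.
        self.w = self.add_weight(
            shape=(input_dim, units), initializer="random_normal", trainable=True
        )
        self.b = self.add_weight(shape=(units,), initializer="zeros", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


linear_layer = Linear(units=4, input_dim=2)
y = linear_layer(tf.ones((2, 2)))
print(y.shape)  # (2, 4)
```

Here the input dimension has to be passed to the constructor, because the weights are created before the layer has seen any data.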
3) It’s good practice to create weights in a separate `build` method, which is called lazily with the shape of the first inputs seen by your layer. This pattern means you don't have to specify `input_dim` when constructing the layer.
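The same illustrative `Linear` layer, sketched with weight creation moved into `build` (again assuming TensorFlow 2.x and `tf.keras`; names and sizes are hypothetical). Keras calls `build` once, on the first call, with the shape of those inputs, so the last dimension of `input_shape` replaces the `input_dim` argument.

```python
import tensorflow as tf


class Linear(tf.keras.layers.Layer):
    """Illustrative dense layer whose weights are created lazily (hypothetical)."""

    def __init__(self, units=32):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # input_shape is the shape of the first inputs this layer receives,
        # so the input dimension is read from it instead of passed to __init__.
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units), initializer="random_normal", trainable=True
        )
        self.b = self.add_weight(shape=(self.units,), initializer="zeros", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


linear_layer = Linear(4)           # no input_dim needed
y = linear_layer(tf.ones((2, 3)))  # build() runs here with input_shape (2, 3)
```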
4) You can automatically retrieve the gradients of a layer's weights by calling the layer inside a `tf.GradientTape`. You can then use these gradients to update the weights, either manually or with an optimizer object. Of course, you can also modify the gradients before using them.
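A sketch of a training loop built on `tf.GradientTape`, assuming TensorFlow 2.x. The toy regression task (learning `y = 2x`), the learning rate and the norm clipping are illustrative assumptions; clipping simply stands in for "modifying the gradients before using them".

```python
import tensorflow as tf


class Linear(tf.keras.layers.Layer):
    """Illustrative dense layer with lazily created weights (hypothetical)."""

    def __init__(self, units=32):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units), initializer="random_normal", trainable=True
        )
        self.b = self.add_weight(shape=(self.units,), initializer="zeros", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


tf.random.set_seed(0)
layer = Linear(1)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

# Toy regression data (assumption): learn y = 2x.
x = tf.random.normal((64, 1))
y_true = 2.0 * x

losses = []
for step in range(50):
    with tf.GradientTape() as tape:
        # Calling the layer inside the tape records the ops that read its weights.
        loss = tf.reduce_mean(tf.square(layer(x) - y_true))
    grads = tape.gradient(loss, layer.trainable_weights)
    # Gradients can be modified before use; norm clipping is one example.
    grads = [tf.clip_by_norm(g, 1.0) for g in grads]
    # Optimizer update. A manual alternative would be:
    #   for w, g in zip(layer.trainable_weights, grads): w.assign_sub(0.1 * g)
    optimizer.apply_gradients(zip(grads, layer.trainable_weights))
    losses.append(float(loss))
```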
5) Weights created by layers can be either trainable or non-trainable. They're exposed in the layer properties `trainable_weights` and `non_trainable_weights`. For example, a layer can keep a running sum of its inputs in a non-trainable weight.
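A sketch of a layer with a non-trainable weight, assuming TensorFlow 2.x; `ComputeSum` is a hypothetical name. The weight is updated by the layer itself with `assign_add`, not by an optimizer, and it shows up in `non_trainable_weights` rather than `trainable_weights`.

```python
import tensorflow as tf


class ComputeSum(tf.keras.layers.Layer):
    """Hypothetical layer that accumulates a running sum of its inputs."""

    def __init__(self, input_dim):
        super().__init__()
        # trainable=False: this weight is state, not a parameter to optimize.
        self.total = self.add_weight(
            shape=(input_dim,), initializer="zeros", trainable=False
        )

    def call(self, inputs):
        # Add the batch sum of the inputs to the stored total.
        self.total.assign_add(tf.reduce_sum(inputs, axis=0))
        return tf.identity(self.total)


my_sum = ComputeSum(2)
x = tf.ones((2, 2))
my_sum(x)
result = my_sum(x)
print(result.numpy())  # [4. 4.]
print(len(my_sum.trainable_weights), len(my_sum.non_trainable_weights))  # 0 1
```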
8) Losses created during the forward pass (via `add_loss`) are cleared by the top-level layer at the start of each forward pass -- they don't accumulate. `layer.losses` always contains only the losses created during the *last* forward pass. When writing a training loop, you would typically sum these losses and add them to your main loss.
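A sketch of the clearing behavior, assuming TensorFlow 2.x; `ActivityL2` and `OuterBlock` are hypothetical layers. `OuterBlock` is the top-level layer, so each call to it resets the losses collected from its sublayer before the new forward pass runs.

```python
import tensorflow as tf


class ActivityL2(tf.keras.layers.Layer):
    """Hypothetical layer that penalizes large activations via `add_loss`."""

    def __init__(self, rate=1e-2):
        super().__init__()
        self.rate = rate

    def call(self, inputs):
        # Register a scalar loss for this forward pass; outputs are unchanged.
        self.add_loss(self.rate * tf.reduce_sum(tf.square(inputs)))
        return inputs


class OuterBlock(tf.keras.layers.Layer):
    """Top-level layer wrapping the regularizing layer (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.reg = ActivityL2(rate=1e-2)

    def call(self, inputs):
        return self.reg(inputs)


block = OuterBlock()
block(tf.ones((2, 2)))
block(tf.ones((2, 2)))  # second forward pass: the first pass's loss was cleared
print(len(block.losses))  # 1
# In a training loop, this sum would be added to the main loss.
reg_loss = tf.add_n(block.losses)
```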
9) TF 2.0 runs eagerly by default. Eager execution is great for debugging, but you will get better performance by compiling your computation into static graphs. Static graphs are a researcher's best friend! You can compile a function by wrapping it in `tf.function` (or by using `tf.function` as a decorator).
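A sketch showing both ways of using `tf.function`, assuming TensorFlow 2.x: wrapping an existing function, and decorating a training step. The `Dense` layer, the toy target and `LEARNING_RATE` are illustrative. The layer is called once eagerly first, so its variables already exist when the function is traced.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
LEARNING_RATE = 0.1  # illustrative value

layer = tf.keras.layers.Dense(1)
x = tf.random.normal((64, 3))
y = tf.reduce_sum(x, axis=1, keepdims=True)  # toy target: sum of the features
layer(x)  # build eagerly so no variables are created inside the traced function


def predict(inputs):
    return tf.nn.relu(layer(inputs))


# Wrapping an existing function: same results, computed by a traced graph.
eager_out = predict(x)
graph_out = tf.function(predict)(x)


@tf.function  # traced into a graph on the first call; same-signature calls reuse it
def train_step(inputs, targets):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(layer(inputs) - targets))
    grads = tape.gradient(loss, layer.trainable_weights)
    for w, g in zip(layer.trainable_weights, grads):
        w.assign_sub(LEARNING_RATE * g)  # plain SGD update
    return loss


losses = [float(train_step(x, y)) for _ in range(100)]
```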
10) Some layers, in particular the `BatchNormalization` layer and the `Dropout` layer, have different behaviors during training and inference. For such layers, it is standard practice to expose a `training` (boolean) argument in the `call` method.
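A sketch of a custom layer that exposes a `training` argument and forwards it to a built-in `Dropout` layer, assuming TensorFlow 2.x; `MLPBlock` and its sizes are hypothetical.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)


class MLPBlock(tf.keras.layers.Layer):
    """Hypothetical block whose behavior depends on the `training` flag."""

    def __init__(self, rate=0.5):
        super().__init__()
        self.dense = tf.keras.layers.Dense(16, activation="relu")
        self.dropout = tf.keras.layers.Dropout(rate)

    def call(self, inputs, training=False):
        x = self.dense(inputs)
        # Pass the flag down so Dropout knows which mode it is in.
        return self.dropout(x, training=training)


block = MLPBlock(rate=0.5)
x = tf.ones((4, 8))
infer_1 = block(x, training=False).numpy()
infer_2 = block(x, training=False).numpy()  # inference: dropout is a no-op
train = block(x, training=True).numpy()     # training: random units are zeroed
# Kept units are scaled by 1 / (1 - rate) = 2; dropped units are exactly 0.
consistent = np.all(np.isclose(train, 0.0) | np.isclose(train, 2.0 * infer_1))
```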
11) You have many built-in layers available, from `Dense` to `Conv2D` to `LSTM` to fancier ones like `Conv2DTranspose` or `ConvLSTM2D`. Be smart about reusing built-in functionality.
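A quick sketch of chaining a few of these built-in layers, assuming TensorFlow 2.x; the layer sizes and input shapes are arbitrary, and the comments give the resulting output shapes.

```python
import tensorflow as tf

images = tf.random.normal((2, 28, 28, 1))  # (batch, height, width, channels)
x = tf.keras.layers.Conv2D(8, 3, activation="relu")(images)             # -> (2, 26, 26, 8)
x = tf.keras.layers.Conv2DTranspose(4, 3, strides=2, padding="same")(x)  # -> (2, 52, 52, 4)
x = tf.keras.layers.GlobalAveragePooling2D()(x)                          # -> (2, 4)
logits = tf.keras.layers.Dense(10)(x)                                    # -> (2, 10)

sequences = tf.random.normal((2, 5, 3))  # (batch, timesteps, features)
last_state = tf.keras.layers.LSTM(16)(sequences)                         # -> (2, 16)
```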
12) To build deep learning models, you don't have to use object-oriented programming all the time. All the layers we've seen so far can also be composed functionally; we call this the "Functional API".
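A minimal Functional API sketch, assuming TensorFlow 2.x: layers are called on a symbolic `tf.keras.Input`, and `tf.keras.Model` wraps the resulting graph from inputs to outputs. The layer sizes are arbitrary.

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(16,))  # symbolic input: batches of 16 features
x = tf.keras.layers.Dense(32, activation="relu")(inputs)
x = tf.keras.layers.Dropout(0.5)(x)
outputs = tf.keras.layers.Dense(10)(x)
model = tf.keras.Model(inputs, outputs)  # the DAG of layers from inputs to outputs

y = model(tf.ones((3, 16)))
```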
The Functional API tends to be more concise than subclassing, and it provides a few other advantages (generally the same advantages that functional, typed languages provide over untyped OO development).
Learn more about the Functional API: https://www.tensorflow.org/alpha/guide/keras/functional …
However, note that the Functional API can only be used to define DAGs (directed acyclic graphs) of layers -- recursive networks should be defined as `Layer` subclasses instead.
In your research workflows, you may often find yourself mix-and-matching OO models and Functional models.
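A sketch of mixing the two styles in both directions, assuming TensorFlow 2.x; `Linear` and `Classifier` are hypothetical classes. A subclassed layer is used inside a Functional model, and that Functional model is then used as a sublayer of another subclassed layer.

```python
import tensorflow as tf


class Linear(tf.keras.layers.Layer):
    """Illustrative dense layer with lazily created weights (hypothetical)."""

    def __init__(self, units=32):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units), initializer="random_normal", trainable=True
        )
        self.b = self.add_weight(shape=(self.units,), initializer="zeros", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


# 1) A subclassed layer inside a Functional model.
inputs = tf.keras.Input(shape=(8,))
h = Linear(16)(inputs)
h = tf.keras.layers.Activation("relu")(h)
encoder = tf.keras.Model(inputs, h)


# 2) That Functional model reused as a sublayer of a subclassed layer.
class Classifier(tf.keras.layers.Layer):
    """Hypothetical subclassed layer wrapping a Functional encoder."""

    def __init__(self, encoder, num_classes=3):
        super().__init__()
        self.encoder = encoder
        self.head = Linear(num_classes)

    def call(self, inputs):
        return self.head(self.encoder(inputs))


classifier = Classifier(encoder)
logits = classifier(tf.ones((4, 8)))
```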
That's all you need to get started with reimplementing most deep learning research papers in TensorFlow 2.0 and Keras!
Now let's check out a really quick example: hypernetworks.
A hypernetwork is a deep neural network whose weights are generated by another network (usually a smaller one).
Let's implement a really trivial hypernetwork: we'll take the `Linear` layer we defined earlier, and we'll use it to generate the weights of... another `Linear` layer.
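A minimal hypernetwork sketch under stated assumptions, for TensorFlow 2.x: a `Linear` layer (an illustrative class, defined here so the snippet runs on its own) maps a fixed random conditioning vector `z` to one flat vector, which is split and reshaped into the kernel and bias of a second linear transform. As a simplification, the second transform is applied with `tf.matmul` rather than as a separate `Linear` instance, so the only trainable weights are the hypernetwork's. The toy data, sizes and learning rate are hypothetical.

```python
import tensorflow as tf

tf.random.set_seed(0)


class Linear(tf.keras.layers.Layer):
    """Illustrative dense layer with lazily created weights (hypothetical)."""

    def __init__(self, units=32):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units), initializer="random_normal", trainable=True
        )
        self.b = self.add_weight(shape=(self.units,), initializer="zeros", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


input_dim, units = 4, 3
n_kernel = input_dim * units

# The hypernetwork outputs every parameter of the target transform as one flat vector.
hypernet = Linear(n_kernel + units)
z = tf.random.normal((1, 8))  # fixed conditioning input (an assumption of this sketch)


def target_forward(inputs):
    params = hypernet(z)[0]  # shape (n_kernel + units,)
    kernel = tf.reshape(params[:n_kernel], (input_dim, units))
    bias = params[n_kernel:]
    return tf.matmul(inputs, kernel) + bias


# Toy regression data (hypothetical).
x = tf.random.normal((32, input_dim))
y_true = tf.random.normal((32, units))
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

losses = []
for _ in range(20):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(target_forward(x) - y_true))
    # Only the hypernetwork has trainable weights; gradients flow through the
    # generated kernel and bias back into it.
    grads = tape.gradient(loss, hypernet.trainable_weights)
    optimizer.apply_gradients(zip(grads, hypernet.trainable_weights))
    losses.append(float(loss))
```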
This is the end of this thread. Play with these code examples in this Colab notebook: https://colab.research.google.com/drive/17u-pRZJnKN0gO5XZmq8n5A2bKGrfKEUg … 🦄🚀
You can follow @fchollet.