François Chollet @fchollet Deep learning @google. Creator of Keras, neural networks library. Author of 'Deep Learning with Python'. Opinions are my own. Nov. 18, 2019 1 min read

New Keras feature: the TextVectorization layer. It takes as input strings and takes care of text standardization, tokenization, and vocabulary indexing.

This enables you to create models that process raw strings.

End-to-end text classification example:  https://colab.research.google.com/drive/1RvCnR7h0_l4Ekn5vINWToI9TNJdpUZB3 

Key features:
- Supports sparse outputs (int sequences), to be fed into an Embedding layer
- Supports dense outputs (binary, tf-idf, count)
- Built-in ngram generation

Full credits to Mark Omernick for the code example and doing much of the work on this project.

Such a layer makes your text-processing model end-to-end: ingests strings, outputs classes/etc. You can deploy your model without worrying about the external preprocessing pipeline.


You can follow @fchollet.



Bookmark

____
Tip: mention @threader_app on a Twitter thread with the keyword “compile” to get a link to it.

Enjoy Threader? Sign up.

Threader is an independent project created by only two developers. The site gets 500,000+ visits a month and our iOS Twitter client was featured as an App of the Day by Apple. Running this space is expensive and time consuming. If you find Threader useful, please consider supporting us to make it a sustainable project.