New Keras feature: the TextVectorization layer. It takes as input strings and takes care of text standardization, tokenization, and vocabulary indexing.
This enables you to create models that process raw strings.
End-to-end text classification example: https://colab.research.google.com/drive/1RvCnR7h0_l4Ekn5vINWToI9TNJdpUZB3 …
- Supports sparse outputs (int sequences), to be fed into an Embedding layer
- Supports dense outputs (binary, tf-idf, count)
- Built-in ngram generation
Full credits to Mark Omernick for the code example and doing much of the work on this project.
Such a layer makes your text-processing model end-to-end: ingests strings, outputs classes/etc. You can deploy your model without worrying about the external preprocessing pipeline.
You can follow @fchollet.
Tip: mention @threader_app on a Twitter thread with the keyword “compile” to get a link to it.
Enjoy Threader? Sign up.
Threader is an independent project created by only two developers. The site gets 500,000+ visits a month and our iOS Twitter client was featured as an App of the Day by Apple. Running this space is expensive and time consuming. If you find Threader useful, please consider supporting us to make it a sustainable project.