New Keras feature: the TextVectorization layer. It takes as input strings and takes care of text standardization, tokenization, and vocabulary indexing.
This enables you to create models that process raw strings.
End-to-end text classification example: https://colab.research.google.com/drive/1RvCnR7h0_l4Ekn5vINWToI9TNJdpUZB3 …
- Supports sparse outputs (int sequences), to be fed into an Embedding layer
- Supports dense outputs (binary, tf-idf, count)
- Built-in ngram generation
Full credits to Mark Omernick for the code example and doing much of the work on this project.
Such a layer makes your text-processing model end-to-end: ingests strings, outputs classes/etc. You can deploy your model without worrying about the external preprocessing pipeline.
You can follow @fchollet.
Tip: mention @threader_app on a Twitter thread with the keyword “compile” to get a link to it.
Enjoy Threader? Sign up.
Since you’re here...
... we’re asking visitors like you to make a contribution to support this independent project. In these uncertain times, access to information is vital. Threader gets 1,000,000+ visits a month and our iOS Twitter client was featured as an App of the Day by Apple. Your financial support will help two developers to keep working on this app. Everyone’s contribution, big or small, is so valuable. Support Threader by becoming premium or by donating on PayPal. Thank you.