Based on the observation that the GPT-2 medium-size model has memorized (and can spit back word-for-word) very long extracts from the web, such as the Gorilla Warfare meme, I had an idea for a very simple ML-less text generation algorithm. I spent the past 20 min implementing it.
My algo is to make search queries for the keywords in a prompt, plus the exact sequence of the last words in the prompt (trying different numbers of words until there is at least one match), then stitch together the result snippets, using the last words as a continuity pivot. It works decently!
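A rough sketch of the stitching step, to make the "continuity pivot" idea concrete. This is not the original script: the names `find_continuation`, `generate`, `toy_search`, `max_pivot` and `max_steps` are all made up for illustration, and the web search (Requests plus BeautifulSoup in the original) is replaced by a stand-in that returns snippets from a tiny in-memory corpus and ignores keywords entirely.

```python
def find_continuation(tail, snippets):
    """Return the words following the first exact (case-insensitive)
    occurrence of `tail` in any snippet, or None if there is no match
    with at least one word after it."""
    n = len(tail)
    tail_lc = [w.lower() for w in tail]
    for snippet in snippets:
        words = snippet.split()
        lowered = [w.lower() for w in words]
        # Stop early enough that at least one word follows the match.
        for i in range(len(words) - n):
            if lowered[i:i + n] == tail_lc:
                return words[i + n:]
    return None


def generate(prompt, search, max_pivot=4, max_steps=5):
    """Extend `prompt` by repeatedly pivoting on its last words.

    `search` is any callable mapping a query string to a list of text
    snippets (an assumption; a real version would query a search engine).
    """
    words = prompt.split()
    for _ in range(max_steps):
        # Try the longest pivot first, then shorten until something matches.
        for n in range(min(max_pivot, len(words)), 0, -1):
            tail = words[-n:]
            continuation = find_continuation(tail, search(" ".join(tail)))
            if continuation:
                words.extend(continuation)
                break
        else:
            break  # no pivot length produced a match; stop generating
    return " ".join(words)


# Hypothetical stand-in for a search engine: returns every snippet.
CORPUS = [
    "the quick brown fox jumps over the lazy dog",
    "over the lazy dog sat a small bird",
    "a small bird sang",
]


def toy_search(query):
    return CORPUS


if __name__ == "__main__":
    print(generate("the quick brown fox", toy_search))
```

On this toy corpus the output chains all three snippets together: each step pivots on the last few words of the text so far and appends whatever follows them in the first matching snippet.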
For the unicorn prompt, it just spits back the GPT-2 result (stitched together from multiple news sites!). Same for the Gorilla Warfare meme. For less popular prompts, it gets more creative, combining sentences or sub-sentences from multiple related sources.
I will not be releasing the code, because you guys couldn't handle the power of a Python script cobbled together in 20 minutes with Requests, BeautifulSoup, and regular expressions. It would change algorithmic cyberwar forever.
You can follow @fchollet.