PyCon Israel 2021

Short Text in the Wild
2021-05-02, 10:30–10:55, PyData Track 1

While most of our online lives revolve around short texts, there's very little information on how to apply NLP techniques to such texts. In this talk, I'll share the lessons we learned and the methodology we developed when dealing with short texts.

“Thanks for all the fish”; “Happy bday grandma!”; “Mercedes C-class Cabriolet”. Looks random, right? Well, maybe you know the old saying “one man’s trash is another woman’s treasure”. These texts, while very short, can be a virtual gold mine for many business use cases, some of which we tackle daily in our work. When we began working on unsupervised feature generation from very short texts, we first looked into what had already been done in the field, and to our surprise the answer was: not a lot. In this talk we’ll share some insights from our experience in dealing with short texts. We’ll start by defining what we mean by “short” in our unique case and why short text is interesting in various domains, then show where and why advanced out-of-the-box methods failed, and finally provide practical tips for handling short and unusual types of text.

Session language – English
Target audience – Data Scientists