PyCon Israel 2021

Short Text in the Wild
05-02, 10:30–10:55 (Asia/Jerusalem), PyData Track 1

While most of our online lives revolve around short texts, there's very little information on how to apply NLP techniques to such texts. In this talk, I'll share the lessons we learned and the methodology we developed when dealing with short texts.


“Thanks for all the fish”; “Happy bday grandma!”; “Mercedes C-class Cabriolet”. Looks random, right? Well, maybe you know the old saying “one man’s trash is another woman’s treasure”. These texts, while very short, can be a virtual gold mine for many different business use cases, some of which we tackle daily in our work. When we began working on unsupervised feature generation from very short texts, we first looked into what had already been done in the field, and to our surprise the answer was: not a lot. In this talk we’ll share some insights from our experience in dealing with short texts. We’ll start by defining what we mean by “short” in our unique case and why such texts are interesting in various domains, then show where and why advanced out-of-the-box methods failed, and finally provide practical tips for handling short and unusual types of text.


Session language

English

Target audience

Data Scientists

Gal is a senior data scientist at PayPal, working mostly on NLP applications and NLP variable generation. She currently heads up construction of an internal NLP infrastructure to feed a wide variety of models in diverse domains like fraud detection, marketing, and credit risk. Gal holds a BSc in Electrical Engineering and Physics as well as an MSc in Electrical Engineering, all from Tel Aviv University. Before PayPal, Gal worked as a guidance and control algorithms engineer for aerospace systems and later as a data scientist on cyber applications.