05-03, 14:00–14:25 (Asia/Jerusalem), PyData Track 2
Real-world text analysis often yields unsatisfactory results because of typos, alternate phrasing, abbreviations, and more. In this talk, we'll cover practical and efficient string comparison methods and tackle some commonly faced issues.
Data analysts, data scientists, and many developers who need to analyze and compare data face a common problem: texts are often similar, but not quite identical, to one another.
This can stem from multiple ways of saying the same thing, from typos and abbreviations, and from punctuation and common yet uninformative words (such as "the"), all of which can skew the results.
During this talk, I will walk you through several methods for comparing inexact texts using a few different libraries, cover the usage, advantages, and disadvantages of each method, and tackle some commonly faced issues.
By the end of the talk, you should have a good basis to start comparing texts efficiently and elegantly in your code.
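To make the problem concrete, here is a minimal sketch of inexact comparison using only Python's standard-library `difflib`. It is a stand-in for illustration, not necessarily one of the libraries covered in the talk. The `normalize` and `similarity` helpers and the tiny `STOP_WORDS` set are hypothetical names chosen for this example.

```python
import string
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of"}


def normalize(text: str) -> str:
    """Lowercase the text, strip ASCII punctuation, and drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(word for word in cleaned.split() if word not in STOP_WORDS)


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Punctuation, casing, and stop words no longer count as differences.
print(similarity("The Bank of America, Inc.", "bank america inc"))
# A single typo lowers the score without dropping it to zero.
print(similarity("Jerusalem", "Jerusalm"))
```

Normalizing before comparing keeps the score focused on meaningful differences. Which words count as stop words, and what score counts as a match, depend on the data at hand.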
English
Target audience – Developers, DevOps, Data Scientists, R&D, Data Analysts
I'm a Software Developer at a FinTech company, with previous experience in Risk & Data Analysis.
I'm also a tech blogger at naomikriger.medium.com and an 8200 alumna.
I love programming, data, and everything in between. I also love foreign languages and chocolate.