PyCon Israel 2021

String Comparison In Real Life - Challenges and Various Ways to Resolve Them
2021-05-03, 14:00–14:25, PyData Track 2

Text analysis in real life can often yield unsatisfactory results due to typos, alternate phrasing, abbreviations and more. In this talk, we'll cover practical and efficient string comparison methods, as well as tackle some commonly faced issues.


A common problem for data analysts, data scientists, and many developers who need to analyze and compare data is that texts are often similar but not quite identical to one another.
This can stem from multiple ways of saying the same thing, from typos and abbreviations, and from punctuation and common yet uninformative words (such as "the"), all of which can skew the results.
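As a rough illustration of the problem, here is a minimal sketch that normalizes two strings (lowercasing, stripping punctuation, dropping common words) before scoring them. It uses only the standard library's `difflib`, which is an assumption made for this example and not necessarily one of the libraries covered in the talk; the `STOP_WORDS` set and the `normalize` and `similarity` helpers are hypothetical names.

```python
import string
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and drop stop words."""
    cleaned = text.lower().translate(_PUNCT_TABLE)
    return " ".join(word for word in cleaned.split() if word not in STOP_WORDS)


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between the normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


if __name__ == "__main__":
    # Differences in case, punctuation, and stop words disappear after normalizing.
    print(similarity("The Bank of America", "bank america!"))
    # A typo lowers the score but leaves it well above that of unrelated strings.
    print(similarity("Bank of Amercia", "bank of america"))
```

Normalizing first means that superficial differences do not count against the score, so the remaining gap reflects typos or genuinely different wording.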

During this talk, I will walk you through several methods for comparing inexact texts using a few different libraries, cover the usage, advantages, and disadvantages of each method, and tackle some commonly faced issues.

By the end of the talk, you should have a good basis to start comparing texts efficiently and elegantly in your code.


Session language – English
Target audience – Developers, DevOps, Data Scientists, R&D, Data Analysts