PyCon Israel 2024

Noah Santacruz

Noah has been contributing to Sefaria for the last 10 years, helping build its data science department. His contributions include citation recognition, search, and topic modelling.

He has a master's in NLP from Cooper Union and has been focusing on the intersection of NLP with Hebrew.

Sefaria is a non-profit, open-source organization focusing on uploading and enriching basic texts about Judaism. We uniquely focus on giving third-party developers access to almost all of our underlying data through APIs and documentation.


Sessions

09-16
11:30
20min
Beyond KMeans - using LLMs to improve text clustering
Noah Santacruz

Text clustering is a fundamental process in NLP, but what do you do when your clusters just aren’t right? I will share my journey, which ended with combining sklearn and langchain to reduce duplicate clusters and "Misc" clusters.

Hall 7