PyCon Israel 2024

Noah Santacruz

Noah has been contributing to Sefaria for the last 10 years, helping build its data science department. His contributions include citation recognition, search and topic modelling.

He has a master's in NLP from Cooper Union and has been focusing on the intersection of NLP with Hebrew.

Sefaria is a non-profit, open-source organization focusing on uploading and enriching basic texts about Judaism. We uniquely focus on giving third-party developers access to almost all of our underlying data through APIs and documentation.


Sessions

09-16
11:30
20min
Beyond KMeans - using LLMs to improve text clustering
Noah Santacruz

Text clustering is a fundamental process in NLP, but what do you do when your clusters just aren’t right? I will share my journey, in which I ended up combining sklearn and langchain to reduce duplicate clusters and catch-all "Misc" clusters.

Hall 7