PyCon Israel 2024

Noah Santacruz

Noah has been contributing to Sefaria for the last 10 years, helping build its data science department. His contributions include citation recognition, search, and topic modelling.

He has a master's in NLP from Cooper Union and has been focusing on the intersection of NLP with Hebrew.

Sefaria is a non-profit, open-source organization focusing on uploading and enriching basic texts about Judaism. We uniquely focus on giving third-party developers access to almost all of our underlying data through APIs and documentation.


Sessions

16/09
11:30
20min
Beyond KMeans - using LLMs to improve text clustering
Noah Santacruz

Text clustering is a fundamental process in NLP, but what do you do when your clusters just aren’t right? I will share how I ended up combining sklearn and langchain to reduce duplication and "Misc" clusters.

Hall 7