Noah Santacruz
Noah has been contributing to Sefaria for the last 10 years, helping build its data science department. His contributions include citation recognition, search, and topic modelling.
He holds a master's degree in NLP from Cooper Union and focuses on the intersection of NLP and Hebrew.
Sefaria is a non-profit, open-source organization focused on uploading and enriching foundational Jewish texts. Uniquely, we give third-party developers access to almost all of our underlying data through APIs and documentation.
Connections
16/09
11:30
20min
Beyond KMeans - using LLMs to improve text clustering
Noah Santacruz
Text clustering is a fundamental process in NLP, but what do you do when your clusters just aren’t right? I will share my journey, in which I ended up combining sklearn and langchain to reduce duplicate and "Misc" clusters.
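To make "reducing duplicate clusters" concrete, here is a minimal sketch of one possible post-processing step after KMeans, assuming you already have a cluster id per text. Each cluster is given a short topic name, and clusters that receive the same name are merged. The `name_cluster` callable is hypothetical: in practice it might wrap an LLM prompt (for example, built with langchain), but that part is deliberately left out, and this is not necessarily the approach presented in the talk.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def merge_by_llm_label(
    texts: Sequence[str],
    labels: Sequence[int],
    name_cluster: Callable[[List[str]], str],
) -> List[int]:
    """Merge clusters whose generated topic names match.

    `name_cluster` is a hypothetical callable (e.g. an LLM wrapper) that
    returns a short topic name for a list of member texts.
    """
    # Group the texts by their original KMeans cluster id.
    members: Dict[int, List[str]] = defaultdict(list)
    for text, label in zip(texts, labels):
        members[label].append(text)

    # Name each cluster once; normalize so "Shabbat" and "shabbat " match.
    name_of = {cid: name_cluster(docs).strip().lower() for cid, docs in members.items()}

    # Assign one new id per distinct name, so same-named clusters collapse.
    new_id: Dict[str, int] = {}
    for cid in sorted(name_of):
        new_id.setdefault(name_of[cid], len(new_id))

    return [new_id[name_of[label]] for label in labels]


if __name__ == "__main__":
    # Stand-in for an LLM call, for illustration only.
    def fake_namer(docs: List[str]) -> str:
        return "Shabbat" if any("shabbat" in d for d in docs) else "Passover"

    texts = ["shabbat candles", "shabbat meal", "passover seder", "seder plate"]
    print(merge_by_llm_label(texts, [0, 1, 2, 3], fake_namer))
```

Naming clusters before merging keeps the comparison semantic rather than purely geometric, at the cost of one model call per cluster. The same names could also help flag catch-all "Misc" clusters for re-assignment.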
Hall 7