Teaching Thousands Of CPUs How To Read
2019-06-03, 12:15–12:40, Main Hall

Amenity Analytics does NLP and machine learning using serverless infrastructure almost exclusively. In this talk we'll cover what we do, how we optimize it, and how we scale analysis to meet demand.


Running NLP (Natural Language Processing) and language-based machine learning models can be time- and resource-consuming. For us, it's part of our daily workflow to run new models and new algorithms on data flows consisting of hundreds of thousands of documents.
Since each of the documents we analyze can be about 50 pages long, our usual use case involves analyzing 5-10 million pages of fine-print financial text. With our legacy system, based on Java 8 code running in Docker containers, one analysis cycle can take anywhere from 5 hours to 5 days, depending on the complexity of what we're running.
Four months ago we started two major evolutions at Amenity. The first was moving our entire ETL (Extract Transform Load) process to serverless infrastructure in Python, using AWS Lambda, Kinesis streams, and DynamoDB. The second was rewriting our NLP engines in Python and Cython so they could run easily on AWS Lambda functions. The result is thousands of CPUs waking up in a matter of seconds and doing extremely efficient and accurate ETL and NLP work, finishing workloads 10x to 100x faster than before. As a bonus, we've got a more efficient team working in Python in a CI/CD environment, and reduced cloud costs.
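As a minimal sketch of the fan-out pattern described above (not Amenity's actual code: the record fields and the token-count stand-in for the real Cython NLP engine are hypothetical), each Lambda invocation receives a batch of Kinesis records, decodes the document payloads, and runs the analysis step on each one:

```python
import base64
import json

def handler(event, context):
    """Hypothetical Lambda handler for a Kinesis-triggered NLP step.

    Kinesis delivers records base64-encoded; each payload here is assumed
    to be a JSON document with "doc_id" and "text" fields.
    """
    results = []
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Stand-in for the real NLP engine: count tokens in the document.
        results.append({
            "doc_id": payload["doc_id"],
            "tokens": len(payload["text"].split()),
        })
    # In a real pipeline, results would be written to DynamoDB here.
    return results
```

Because each invocation handles only a small batch, scaling to thousands of concurrent CPUs is just a matter of Kinesis shard count and Lambda concurrency limits.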
In the talk we'll go through the decision process behind moving to Python, Cython, and serverless, how our architecture shifted and evolved over time, and how we plan to solve our next major challenge: a billion articles analyzed in less than an hour!