PyCon Israel 2021

Set your EDA on Autopilot
2021-05-03, 10:30–10:55, PyData Track 2

This session will focus on one of the hottest topics of the past two years in the data science ecosystem: Automated Exploratory Data Analysis (EDA).

Recently, Andrew Ng held a conference where his main claim was that we should be more data-centric in our research. He based this view on various studies and examples that showed significant improvements in model performance once the researchers modified the data.

"If 80% of our work is data preparation, then ensuring data quality is the important work of a machine learning team."
Andrew Ng

To provide the model with strong foundations, we must explore and process the data professionally and meticulously. It can be a very long and exhausting process. To help you get through this part successfully, the new 'Automated EDA' field has emerged.

In the lecture, we will explore the field of automation in ML and how it relates to the variability between projects. We will examine what can be automated in EDA and explore the latest features of two powerful open-source tools: Pandas Profiling and SweetViz.

The audience will receive a link to the slides and to a Colab notebook with examples for:
- Exporting an EDA report using Pandas Profiling and SweetViz.
- Exporting an EDA report that compares two data sets.
- Exporting an EDA report that compares two categories.
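
As a rough preview of what such a notebook might contain, the three exports above can be sketched as follows. This is a minimal sketch, not the notebook from the session: the helper names, file names, and labels are illustrative assumptions, and it assumes the pandas_profiling and sweetviz packages are installed (Pandas Profiling has since been renamed ydata-profiling, so the import name may differ in newer environments).

```python
from pathlib import Path


def export_profiling_report(df, out_path="profiling_report.html"):
    """Write a single-dataset HTML report with Pandas Profiling."""
    # Imported lazily so the SweetViz helpers work without this package.
    from pandas_profiling import ProfileReport

    # minimal=True skips the most expensive computations on large data.
    ProfileReport(df, title="EDA report", minimal=True).to_file(out_path)
    return Path(out_path)


def export_sweetviz_report(df, out_path="sweetviz_report.html"):
    """Write a single-dataset HTML report with SweetViz."""
    import sweetviz as sv

    sv.analyze(df).show_html(filepath=out_path, open_browser=False)
    return Path(out_path)


def export_dataset_comparison(df_a, df_b, names=("Train", "Test"),
                              out_path="compare_datasets.html"):
    """Compare two data sets (for example train vs. test) with SweetViz."""
    import sweetviz as sv

    report = sv.compare([df_a, names[0]], [df_b, names[1]])
    report.show_html(filepath=out_path, open_browser=False)
    return Path(out_path)


def export_category_comparison(df, condition, names=("Group A", "Group B"),
                               out_path="compare_categories.html"):
    """Compare two sub-populations of one data set with SweetViz.

    `condition` is a boolean Series: rows where it is True form the
    first group, the remaining rows form the second group.
    """
    import sweetviz as sv

    report = sv.compare_intra(df, condition, list(names))
    report.show_html(filepath=out_path, open_browser=False)
    return Path(out_path)
```

For a DataFrame df with a hypothetical "sex" column, the category comparison would be called as export_category_comparison(df, df["sex"] == "male", ("Male", "Female")). Each helper returns the path of the HTML file it wrote, which can then be opened in a browser or downloaded from Colab.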

Session language – English
Target audience – Data Scientists