PyCon Israel 2022

Don’t Underestimate the Obvious: Murphy’s Law in Real-life Data Science
06-29, 17:00–17:20 (Asia/Jerusalem), PyData

Murphy’s Law states that if anything can go wrong, it will, and this is particularly true in data science. Drawing on personal experience, I describe how to create an effective model despite data pitfalls, methodological hazards and hidden bugs.


“If anything can go wrong, it will”, states Murphy’s Law, and this holds particularly true in data science. Whereas the algorithms used in data science are mathematically flawless, the path to creating an effective model is often rife with obstacles, such as data deficiencies, methodological pitfalls, hidden bugs and human mistakes. In this talk, I share lessons from nearly a decade of data science work about common obstacles and how you can avoid them in your own projects. Topics include how data often violates our intuitive assumptions and how Python tools can detect such violations; how to build a model with confidence by starting from simple baselines and using synthetic data to your advantage; how to avoid common pitfalls (such as unintentional overfitting, train set contamination and inconsistent package versions) via defensive programming and code reviews; and how to safeguard long-term code correctness with pytest.
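As a rough illustration of two of these topics (checking data assumptions explicitly, and validating a model on synthetic data before comparing it to a simple baseline), here is a minimal sketch. It is not taken from the talk: the function names, the `id` and `y` record fields, and the choice of a one-feature linear model are all assumptions made for the example, and it uses only the Python standard library.

```python
import random
import statistics


def check_assumptions(rows):
    """Return a list of violated assumptions (hypothetical checks for illustration)."""
    problems = []
    ids = [r["id"] for r in rows]
    if len(ids) != len(set(ids)):
        problems.append("duplicate ids")
    if any(r["y"] is None for r in rows):
        problems.append("missing target values")
    return problems


def make_synthetic(n, slope=2.0, intercept=1.0, noise=0.1, seed=0):
    """Synthetic data with a known ground truth: y = slope * x + intercept + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0, noise) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return statistics.fmean([(y - p) ** 2 for y, p in zip(ys, preds)])


# Because the true parameters are known, a bug in the fitting code shows up
# as a failure to recover them.
xs, ys = make_synthetic(200)
slope, intercept = fit_line(xs, ys)

# Simple baseline: always predict the mean of the targets.
baseline_mse = mse(ys, [statistics.fmean(ys)] * len(ys))
model_mse = mse(ys, [slope * x + intercept for x in xs])
```

Checks of this kind can be placed in a function whose name starts with `test_`, in a file whose name starts with `test_`, so that pytest collects and reruns them whenever the code changes. Note that the errors above are computed on the data used for fitting, which is enough for a parameter-recovery check but not for estimating real-world performance.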


Session language

English

Target audience

Data Scientists

Senior Data Scientist at Pendo.io with 15 years of industry experience in software development, PhD in Machine Learning, and MSc and BSc in Computer Science. My previous projects involved Machine Learning under uncertainty, Big Data, Modeling and Prediction, and Autonomous and Green Mobility.