PyCon Israel 2023

🇮🇱 How to kill your PySpark performance with these simple tricks
2023-07-04, 10:30–10:50, Hall 2 (Ground Floor)

The talk will start by explaining what Spark is, what problems it solves, and why you might want to use it. Then I'll describe common anti-patterns, especially in data engineering and data science code, and what you should probably do instead.


PySpark, Spark's Python interface, is a powerful data processing tool that can deliver very high performance. This talk covers PySpark's strong points and how common anti-patterns misuse it and hurt application performance, forcing you to throw more money at the problem and lose many of Spark's benefits. But there is a better way: using the native PySpark tools and patterns that I'll present.


Session language – Hebrew
Target audience – R&D
Other (target audience) – Data scientists, data engineers, and big-data practitioners