our DASK ETL Journey PyCon Israel 2019

our DASK ETL Journey

06-04, 14:00–14:25 (Asia/Jerusalem), Hall 2 (PyData)

Using Dask in an ETL pipeline has some gotchas.
Although Dask has many similarities to pandas, there are some issues to watch for and best practices that can optimize its usage in general.

The presentation agenda:

Intro to Dask framework
Basic Client setup
dask.dataframe
Data manipulation
Read/Write files
Advanced groupby
Debugging

There is a Jupyter notebook (see attachment) to supplement the talk.

See also: Jupyter notebook of the presentation (163.4 KB)

Sephi Berry

Background in Environmental Science and Geographical Systems.
Currently a Data Project Manager in the Israeli Police.
Experienced in Spatial projects in startup companies and large enterprises.
