06-04, 14:00–14:25 (Asia/Jerusalem), Hall 2 (PyData)
Using Dask in an ETL pipeline has some gotchas.
Although Dask has many similarities to pandas, there are some issues to be aware of, as well as best practices that can optimize the use of Dask in general.
The presentation agenda:
- Intro to the Dask framework
- Basic Client setup
- dask.dataframe
- Data manipulation
- Read/Write files
- Advanced groupby
- Debugging
There is a Jupyter notebook (see attachment) to supplement the talk.
- Background in Environmental Science and Geographical Systems.
- Currently a Data Project Manager in the Israeli Police.
- Experienced in Spatial projects in startup companies and large enterprises.