2021-05-02, 11:30–12:15, PyData Track 1
I plan to discuss three archetypal war stories about fitting in memory. For each, I'll describe both the technical challenge and the human biases that had to be overcome to arrive at sound solutions.
A defining aspect of handling big data is that a problem's dataset typically does not fit naively into RAM. The three episodes I'd like to discuss are:
- How to chew through thousands of >1GB JSON files without swallowing them whole.
- Choosing the right in-memory format for a sparse shortest-path matrix when the dense version would be prohibitively large.
- Choosing a data-at-rest format for a large dataset without reinventing the wheel.
I'll discuss the problems, their solutions, and the mistakes I made along the way.