Making our Municipalities more Transparent using Python!
2019-06-04, 14:30–14:55, Hall 3

DataCity is a project aimed at creating a single repository of all municipal data in Israel.
I'll talk about the project and the Python toolset we've built to create and manage this large ETL operation.


Municipalities are probably the branch of government that affects us the most (think education, garbage collection, building permits...).
They are also notoriously opaque, which makes it difficult for us citizens to make sure that the people in charge are making good use of our taxes and that our city is performing well compared to others.

At the beginning of 2019 we (at Hasadna) embarked on a project to make municipalities more transparent: DataCity. In this project we aim to create a single API endpoint for all municipalities' data (normalized, standardized, verified and regularly updated).

There are a few problems along the way, though:

  1. Municipalities don't really want to be transparent.
  2. The data is of low quality and very non-uniform.

To solve (2), we're building a versatile framework for:

  • extracting data from various sources and formats,
  • cleaning it,
  • mapping it to a predefined schema,
  • validating it with domain-specific rules,
  • enriching it, and finally
  • publishing it to our data warehouse.

We're doing all of that in a reusable way, building on open-source tools (mainly the dataflows ETL library, which I'll describe in detail during the talk).
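
As a rough illustration only, here is how the stages listed above could be chained as generator steps. This is plain standard-library Python, not the dataflows API and not the DataCity code; the schema, column names, sample rows and validation rule are all made up for the example.

```python
import csv
import io

# Hypothetical target schema: field name -> type used to cast the value.
SCHEMA = {"municipality": str, "year": int, "budget_nis": float}

# Hypothetical mapping from source column headers to schema field names.
COLUMN_MAP = {"City": "municipality", "Year": "year", "Budget": "budget_nis"}

# Made-up sample input standing in for one municipality's published file.
RAW_CSV = """City,Year,Budget
 Tel Aviv ,2018,1500.5
Haifa,2018,
Haifa,2018,-20
Eilat,2018,300
"""


def extract(text):
    """Extract: read rows (as dicts) from a CSV source."""
    yield from csv.DictReader(io.StringIO(text))


def clean(rows):
    """Clean: strip whitespace and drop rows that have an empty value."""
    for row in rows:
        row = {key: (value or "").strip() for key, value in row.items()}
        if all(row.values()):
            yield row


def map_to_schema(rows):
    """Map: rename source columns and cast values to the schema types."""
    for row in rows:
        yield {
            COLUMN_MAP[key]: SCHEMA[COLUMN_MAP[key]](value)
            for key, value in row.items()
        }


def validate(rows):
    """Validate: apply a domain rule (here: budgets cannot be negative)."""
    for row in rows:
        if row["budget_nis"] >= 0:
            yield row


def enrich(rows):
    """Enrich: attach extra information, here just the source name."""
    for row in rows:
        yield {**row, "source": "example.csv"}


def publish(rows, warehouse):
    """Publish: a list stands in for the data warehouse."""
    warehouse.extend(rows)
    return warehouse


def run_pipeline(text):
    """Chain the steps; each one lazily consumes the previous one's rows."""
    rows = extract(text)
    for step in (clean, map_to_schema, validate, enrich):
        rows = step(rows)
    return publish(rows, [])


for row in run_pipeline(RAW_CSV):
    print(row)
```

Because each step is a small function that takes rows and yields rows, steps can be reused or swapped per data source, which is the kind of reusability the framework is aiming for.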

(We also have a solution for (1), and I'll talk about that too :) )