Automation of feature engineering: pros and cons
2019-06-03, 15:30–15:55, Hall 2 (PyData)

Data scientists spend over 60% of their time doing feature engineering. I will discuss automating the feature engineering process with Featuretools in order to significantly reduce the time invested and to make the process repeatable, more robust and more creative.


Data scientists spend over 60% of their time getting familiar with data, understanding features and the relationships between them, and ultimately creating new features from the data. This process is called feature engineering. It is a fundamental step before using predictive models and directly affects the predictive power of a model.

Traditional feature engineering is often described as an art: it requires both domain knowledge and data manipulation skills. The process is problem-dependent and can be biased by personal skill, by loss of patience during data analysis, and by many other factors that depend on the data scientist's personality and prior experience in the field.
Recently it has been proposed that automating feature engineering would significantly reduce the time invested in this crucial early step of modeling. In addition, the process would become repeatable, more robust and more creative.

Featuretools is an open-source automated feature engineering library created by the developers at Feature Labs.
In my talk I will present the Featuretools library, its concepts and its functions. I will address a very important question: to what extent can feature engineering be completely automated? I will discuss different scenarios and present their pros and cons. Finally, we will implement automated feature engineering and explore code examples.
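
To give a rough idea of what automating feature creation can look like, here is a minimal sketch in plain Python. It is not the Featuretools API and not the code from the talk: the toy tables, the `AGG_PRIMITIVES` dictionary and the `synthesize_features` function are hypothetical names made up for illustration. The sketch assumes one parent table (customers) and one child table (transactions) and mechanically applies every aggregation to every numeric child column.

```python
from statistics import mean

# Hypothetical toy data: a parent table and a related child table.
customers = [{"customer_id": 1}, {"customer_id": 2}]
transactions = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]

# Hypothetical set of aggregation "primitives" applied to every numeric column.
AGG_PRIMITIVES = {"sum": sum, "mean": mean, "max": max, "count": len}


def synthesize_features(parents, children, key, numeric_columns):
    """Build one feature row per parent by aggregating its child rows."""
    # Group child rows by the key they share with the parent table.
    grouped = {}
    for row in children:
        grouped.setdefault(row[key], []).append(row)

    result = []
    for parent in parents:
        features = dict(parent)
        rows = grouped.get(parent[key], [])
        for column in numeric_columns:
            values = [r[column] for r in rows]
            for name, func in AGG_PRIMITIVES.items():
                if values or name == "count":
                    features[f"{name}_{column}"] = func(values)
                else:
                    # No child rows: sum/mean/max are left undefined.
                    features[f"{name}_{column}"] = None
        result.append(features)
    return result


features = synthesize_features(customers, transactions, "customer_id", ["amount"])
for row in features:
    print(row)
```

Every combination of primitive and column yields a candidate feature without any manual work. The open question is how many of these candidates are meaningful for a given problem, which is where domain knowledge still matters.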