PyCon Israel 2024


Accelerating ML Development with Multi-Modal Datasets: Leveraging Python, Parquets, and Daft
09-16, 13:00–13:20 (Asia/Jerusalem), Main Hall (30)
Language: Hebrew

Mobileye accelerates ML development with multi-modal datasets using Python, Parquets, and Daft. We will cover dataset formats, Daft’s capabilities, its usage examples, and its integration into Mobileye's cloud-native architecture.


Large-scale multi-modal datasets play a pivotal role in the development of computer vision solutions based on deep learning algorithms. Historically, different data formats have been used at various stages of the ML development life cycle to optimize specific tasks. During training, a sequential pass over the entire dataset is essential, while validation involves map-reduce operations.
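To make the two access patterns concrete, here is a minimal standard-library sketch. It is not Mobileye's pipeline and does not use Daft. The `SAMPLES` records, their field names, and the helpers `training_pass` and `validation_summary` are all hypothetical, chosen only to contrast an ordered, batched pass with a map-then-reduce aggregation.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy records standing in for rows of a multi-modal dataset.
SAMPLES = [
    {"frame_id": 0, "label": "car", "correct": True},
    {"frame_id": 1, "label": "pedestrian", "correct": False},
    {"frame_id": 2, "label": "car", "correct": True},
    {"frame_id": 3, "label": "pedestrian", "correct": True},
]


def training_pass(samples, batch_size):
    """Training-style access: visit every sample once, in order, in batches."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def validation_summary(samples):
    """Validation-style access: map each sample to a partial count, then reduce."""
    # Map step: one single-entry Counter per sample, keyed by (label, correct).
    mapped = (Counter({(s["label"], s["correct"]): 1}) for s in samples)
    # Reduce step: merge the partial counts into one summary.
    return reduce(lambda a, b: a + b, mapped, Counter())


batches = list(training_pass(SAMPLES, batch_size=3))
summary = validation_summary(SAMPLES)
```

The sequential pass favors formats that stream whole rows cheaply, while the aggregation touches only a few fields per row, which is one reason a single storage format may not suit both stages equally well.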

In this talk, we’ll delve into how Mobileye leverages Python, Parquets, and Daft to streamline the AI development life cycle. We’ll explore the following key aspects:
- Dataset Formats and Reading Options - We’ll discuss various options for representing multi-modal datasets, and how choosing the right format impacts training and validation efficiency.
- What Is Daft - Daft is a high-performance Python query engine designed for handling complex, multimodal data types. Its Rust core engine executes operations lazily via Daft’s Expressions API. Additionally, Daft seamlessly integrates with essential Python libraries like PyTorch and offers efficient cloud storage integration.
- Examples of Daft Usage - Through practical examples, we’ll demonstrate how Daft simplifies working with multi-modal datasets, from loading data to preprocessing.
- Daft in a Cloud-Native Architecture - Learn how Daft seamlessly integrates into Mobileye’s cloud infrastructure. We’ll explore its role in enabling multi-tenant access to datasets, ensuring fast data retrieval, and maintaining a single source of truth.

Throughout the talk, we’ll discuss real-world scenarios showing how Daft accelerates the AI development life cycle. Attendees will gain insights into best practices for using Daft effectively.


Expected experience level of participants

Intermediate

Target audience

R&D

Guy Pozer is an accomplished ML Software Engineering Team Lead at Mobileye, where he works on accelerating the ML life cycle. His main areas of expertise lie in the optimization of the neural network training process, with a keen interest in everything from hardware acceleration to research process optimization. Guy leads the development of an SDK that empowers data scientists at Mobileye to research and deploy their solutions to Mobileye’s SoC. He is passionate about integrating the speed of system-level programming languages into the Python ecosystem. His work is a testament to his commitment to pushing the boundaries of what’s possible in the realm of machine learning.