PyCon Israel 2021

Genomic data - cost-effective scaling in the cloud
05-02, 14:00–14:25 (Asia/Jerusalem), PyData Track 1

Genomic sequencing and processing generate many terabytes of data. We'll present how a single-cell processing pipeline requires strong/eventual consistency trade-offs that differ from those of traditional big-data systems.


Immunai runs a complex single-cell RNA sequencing pipeline. The ecosystem of computational-biology and machine-learning tools revolves around R and Python. We use cost-effective cloud storage for the large sequencing files and combine it with strongly consistent metadata. Users of the R/Python API can retrieve the data indexed by any application-defined set of labels/features. We will discuss the trade-offs compared to other big-data platforms such as Apache Spark, Elasticsearch, etc.
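
To make the storage split concrete, here is a minimal Python sketch of the general pattern: large payloads go to cheap object storage, while labels live in a transactional metadata store that is the only path to discovering data. It is an illustration under stated assumptions, not the pipeline presented in the talk. The class `LabeledStore`, its methods, the file keys and the label names are all hypothetical; a dict stands in for cloud object storage and an in-memory SQLite database stands in for the strongly consistent metadata store.

```python
import sqlite3


class LabeledStore:
    """Toy sketch: blobs in an 'object store', labels in a transactional DB."""

    def __init__(self):
        # Stand-in for cheap cloud object storage (assumption: a plain dict).
        self.blobs = {}
        # Stand-in for a strongly consistent metadata store (assumption: SQLite).
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE labels ("
            "blob_key TEXT, name TEXT, value TEXT, "
            "PRIMARY KEY (blob_key, name))"
        )

    def put(self, key, data, labels):
        # Write the large payload first. Readers discover data only through
        # the metadata table, so the blob is not findable until the commit.
        self.blobs[key] = data
        with self.db:  # commits on success, rolls back on error
            self.db.executemany(
                "INSERT INTO labels VALUES (?, ?, ?)",
                [(key, name, value) for name, value in labels.items()],
            )

    def find(self, **labels):
        # Return the keys whose labels match every given name/value pair.
        if not labels:
            raise ValueError("at least one label is required")
        clauses = " OR ".join("(name = ? AND value = ?)" for _ in labels)
        params = [x for pair in labels.items() for x in pair]
        rows = self.db.execute(
            f"SELECT blob_key FROM labels WHERE {clauses} "
            "GROUP BY blob_key HAVING COUNT(*) = ? ORDER BY blob_key",
            params + [len(labels)],
        )
        return [key for (key,) in rows]

    def get(self, key):
        # Fetch the payload once its key is known from the metadata.
        return self.blobs[key]


store = LabeledStore()
store.put("run1/cells.h5", b"payload-1", {"tissue": "blood", "donor": "d1"})
store.put("run2/cells.h5", b"payload-2", {"tissue": "blood", "donor": "d2"})

blood_runs = store.find(tissue="blood")
donor2_runs = store.find(tissue="blood", donor="d2")
```

Because lookups go through the metadata table only, the object store itself can be eventually consistent without readers ever seeing a half-registered file, which is one way to read the strong/eventual split described above.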


Session language

English

Target audience

Developers, DevOps, Data Scientists, R&D

Working in big-data for more than 10 years, programming for 40 years. Interested in system design, programming languages, big-data and data analytics. Consultant, VP R&D, Xoogler. Various industries - ad-tech, fin-tech, cyber and more.