2021-05-02, 14:00–14:25, PyData Track 1
Genomic sequencing and processing generate many terabytes of data. We'll present how a single-cell processing pipeline requires strong/eventual consistency trade-offs that differ from those of traditional big-data systems.
Immunai runs a complex single-cell RNA sequencing pipeline. The computational-biology and machine-learning tool ecosystem revolves around R and Python. We store the large sequencing files in cost-effective cloud storage and pair them with strongly consistent metadata. Users of the R/Python API can retrieve data indexed by any application-defined set of labels/features. We will discuss the trade-offs compared to other big-data platforms such as Apache Spark and Elasticsearch.
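To make the storage split concrete, here is a minimal Python sketch of the general pattern, not the pipeline presented in the talk. It assumes a plain dict standing in for the cloud object store and an in-memory SQLite database standing in for the strongly consistent metadata store; the names `open_catalog`, `register_sample`, and `find_samples`, and the example labels, are hypothetical.

```python
import sqlite3


def open_catalog():
    """Create the metadata catalog (SQLite stands in for a strongly consistent store)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE labels ("
        "blob_key TEXT, name TEXT, value TEXT, "
        "PRIMARY KEY (blob_key, name))"
    )
    return conn


def register_sample(conn, blob_store, blob_key, payload, labels):
    """Upload the large file first, then publish its labels in one transaction."""
    # Step 1: write the bulky payload to the (assumed) object store.
    blob_store[blob_key] = payload
    # Step 2: commit all labels atomically. Readers discover blobs only
    # through the catalog, so a sample is visible only after this commit.
    with conn:
        conn.executemany(
            "INSERT INTO labels VALUES (?, ?, ?)",
            [(blob_key, name, str(value)) for name, value in labels.items()],
        )


def find_samples(conn, **labels):
    """Return the sorted blob keys that match every given label."""
    keys = None
    for name, value in labels.items():
        rows = conn.execute(
            "SELECT blob_key FROM labels WHERE name = ? AND value = ?",
            (name, str(value)),
        )
        matched = {row[0] for row in rows}
        keys = matched if keys is None else keys & matched
    return sorted(keys or [])


if __name__ == "__main__":
    catalog = open_catalog()
    blobs = {}
    register_sample(catalog, blobs, "runs/a.fastq", b"ACGT", {"tissue": "blood"})
    print(find_samples(catalog, tissue="blood"))
```

Writing the blob before committing its metadata means a failed upload leaves at worst an unreferenced file, never a catalog entry that points at missing data.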