The document discusses genomic analysis tools and algorithms developed for Spark, including the somatic variant caller Guacamole and coverage-depth analysis tools from Hammer Lab and the Broad Institute. It compares in-house Hadoop clusters with Google Cloud Dataproc for genomic data processing and describes technical challenges encountered with file splitting and analysis. Future work includes releasing new tools and exploring suffix arrays in distributed environments.