Phys.com November 10, 2021

Openly shared genomic data consist of genomes assembled with different tools and levels of quality checking, together with large volumes of completely unprocessed raw sequence data, so considerable computational effort is required before biological questions can be addressed. Researchers in the UK assembled and characterized 661,405 bacterial genomes retrieved from the European Nucleotide Archive (ENA) using a uniform, standardized approach and produced a searchable COmpact Bit-sliced Signature (COBS) index that makes it easy to query the entire dataset for a specific sequence. Additional MinHash and pp-sketch indices support genome-wide comparisons and estimates of genomic distance. 639,981 high-quality genomes […]
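
To give a sense of how a MinHash index can estimate genomic distance, here is a minimal sketch in Python. It is an illustration only, not the pipeline the researchers used: the function names, the k-mer length, the sketch size, and the choice of BLAKE2b as the hash are all assumptions made for this example, and the distance formula is the one popularized by the Mash tool. Each genome is reduced to the smallest hashes of its k-mers, and the overlap between two such sketches approximates the Jaccard similarity of the full k-mer sets.

```python
import hashlib
import math


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Stable 64-bit hash of a k-mer (BLAKE2b is an arbitrary choice here)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=5, size=16):
    """Bottom-s MinHash sketch: the `size` smallest k-mer hashes, sorted."""
    return sorted(hash64(km) for km in kmers(seq, k))[:size]


def jaccard_estimate(sketch_a, sketch_b, size=16):
    """Estimate Jaccard similarity of two k-mer sets from their sketches."""
    # Take the `size` smallest hashes of the combined sketches ...
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    # ... and count how many of them occur in both sketches.
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(jaccard, k=5):
    """Convert a Jaccard estimate to a Mash-style distance in [0, 1]."""
    if jaccard <= 0.0:
        return 1.0
    return max(0.0, -math.log(2.0 * jaccard / (1.0 + jaccard)) / k)


if __name__ == "__main__":
    # Toy sequences, invented for this example; real genomes are far longer
    # and would use a larger k and sketch size.
    a = "ACGTACGTTGCAAGCTTAGCCGATCGATTACG"
    b = "ACGTACGTTGCAAGCTTAGCCGATCGTTTACG"
    j = jaccard_estimate(sketch(a), sketch(b))
    print(f"Jaccard estimate: {j:.3f}, distance: {mash_distance(j):.3f}")
```

Because each sketch has a fixed size regardless of genome length, comparing two genomes costs the same small amount of work, which is what makes this kind of index practical across hundreds of thousands of assemblies.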