Legacy Information

Info

The data generated by the following guide is not currently used in the aggregation workflow.

The Illumina manifest file for each array can be downloaded from Illumina's website. The description of the manifest columns can be found here. Comparing the GSA-48v4 and GSA-24v3 arrays for the GRCh38 genome build (function `GenomiccWorkflows.OneTimeChecks.array_overlap`) yields the following table:

| Description | Number of SNPs |
| --- | --- |
| In GSA-48v4 | 650 321 |
| In GSA-24v3 | 654 027 |
| Union | 702 515 |
| Intersection | 601 833 |
| Only in GSA-48v4 | 48 488 |
| Only in GSA-24v3 | 52 194 |

The arrays are thus quite similar: roughly 7 to 8% of the SNPs on each array are absent from the other (48 488 of 650 321 for GSA-48v4, 52 194 of 654 027 for GSA-24v3), and neither is a subset of the other. This motivates the strategy of taking the intersection of the genotyping arrays before merging. Note that in practice the intersection may be smaller because of QC filtering of the input SNPs.
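The counts in the table follow from plain set operations on the variant identifiers of the two arrays. The sketch below is written in Python for illustration only and is not the `array_overlap` implementation; the function name `array_overlap_counts` and the use of SNP names as identifiers are assumptions (the real comparison may key variants differently, for example on chromosome and position).

```python
def array_overlap_counts(snps_a, snps_b):
    """Return overlap statistics for two collections of variant identifiers.

    Hypothetical helper: identifiers are assumed to be hashable and
    comparable across arrays (e.g. SNP names or (chr, pos) tuples).
    """
    a, b = set(snps_a), set(snps_b)
    return {
        "in_a": len(a),
        "in_b": len(b),
        "union": len(a | b),
        "intersection": len(a & b),
        "only_in_a": len(a - b),
        "only_in_b": len(b - a),
    }


# Toy identifiers, not real array content.
counts = array_overlap_counts(["rs1", "rs2", "rs3"], ["rs2", "rs3", "rs4", "rs5"])

# Consistency check on the figures reported in the table:
# each array size equals the intersection plus its private SNPs.
reported = {"in_a": 650_321, "in_b": 654_027, "union": 702_515,
            "intersection": 601_833, "only_in_a": 48_488, "only_in_b": 52_194}
table_is_consistent = (
    reported["in_a"] == reported["intersection"] + reported["only_in_a"]
    and reported["in_b"] == reported["intersection"] + reported["only_in_b"]
    and reported["union"] == reported["in_a"] + reported["only_in_b"]
)
```

The same identities hold for any pair of sets, so they are a cheap sanity check whenever the table is regenerated for new manifest versions.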

Furthermore, because the manifest files are too large to be version controlled, the variants on the minus (`-`) RefStrand were extracted and stored as follows:

  • GSA-MD-24v3-0A1 (genome build GRCh37): `assets/GSA-24v3-0A1-minus-strand.txt`
  • GSA-MD-48v4-0A1 (genome build GRCh38): `assets/GSA-48v4-020085471_D2-minus-strand.txt`

This was done with `make_snps_to_flip_list` in the `bin/one_time_checks.jl` script.
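For readers who want to reproduce such a list outside the package, the extraction amounts to filtering the manifest on its `RefStrand` column. The following Python sketch is not `make_snps_to_flip_list`. It assumes a CSV manifest whose variant table starts after an `[Assay]` line and ends at a `[Controls]` line, with `Name` and `RefStrand` columns; the function name `minus_strand_variants` and the toy manifest are hypothetical, and the actual output format of the assets files may differ.

```python
import csv


def minus_strand_variants(manifest_text, name_col="Name", strand_col="RefStrand"):
    """Return the names of variants whose reference strand is '-'.

    Assumed layout: preamble lines, an "[Assay]" marker, a CSV header,
    the variant rows, then a "[Controls]" marker.
    """
    lines = manifest_text.splitlines()
    # Start right after the [Assay] marker, or at the top if there is none.
    start = next(
        (i + 1 for i, line in enumerate(lines) if line.startswith("[Assay]")), 0
    )
    body = []
    for line in lines[start:]:
        if line.startswith("[Controls]"):
            break  # the control probes are not variants
        body.append(line)
    reader = csv.DictReader(body)
    return [row[name_col] for row in reader if row[strand_col] == "-"]


# Toy manifest, made up for illustration.
toy_manifest = """Illumina, Inc.
[Heading]
Descriptor File Name,toy.bpm
[Assay]
IlmnID,Name,Chr,MapInfo,RefStrand
rs1-0_T_F_1,rs1,1,1000,+
rs2-0_B_R_2,rs2,1,2000,-
rs3-0_T_F_3,rs3,2,3000,-
[Controls]
0001,Staining,Red
"""

to_flip = minus_strand_variants(toy_manifest)
```

Writing one variant name per line of `to_flip` to a text file gives a list small enough to keep under version control, which is the point of storing the extracted variants rather than the full manifests.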