Index of Julia Functions

SequentialGWAS.genotype_gvcf (Method)
genotype_gvcf(gvcf_file, 
    shared_variants_plink, 
    shared_variants_gatk, 
    reference_genome; 
    output_prefix="output"
)

Genotype a GVCF file using GATK and PLINK as follows:

  • First attempts to genotype the GVCF file with GATK. This step can fail; if it does, the individual is skipped.
  • Then converts the resulting VCF to PLINK binary (bed) format.
  • Updates the bim file with the mapped alleles from the shared variants. This can also fail if some variants match neither of the known ref/alt alleles, in which case the individual is also skipped.
  • Writes the new bim, bed and fam files.
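The first two steps (GATK genotyping, then VCF to PLINK conversion) can be sketched as external commands built from Julia. This is a minimal sketch, not the SequentialGWAS implementation: the helpers `genotyping_commands` and `try_genotype` are hypothetical, it assumes `gatk` (GATK4) and `plink` (1.9) are on the PATH, and it leaves out the restriction to shared variants and the bim update. A failing command makes the individual be skipped, mirroring the behaviour described above.

```julia
# Hypothetical sketch of the GATK -> PLINK steps; not SequentialGWAS.genotype_gvcf.
# Assumes `gatk` and `plink` executables are available on the PATH.

function genotyping_commands(gvcf_file::AbstractString, reference_genome::AbstractString;
                             output_prefix::AbstractString = "output")
    vcf_file = string(output_prefix, ".vcf.gz")
    # Joint-genotype the single-sample GVCF into a regular VCF.
    gatk = `gatk GenotypeGVCFs -R $reference_genome -V $gvcf_file -O $vcf_file`
    # Convert the VCF into PLINK binary files: <prefix>.bed/.bim/.fam
    plink = `plink --vcf $vcf_file --make-bed --out $output_prefix`
    return (gatk, plink)
end

# Runs each command in turn; returns false (skip this individual) on the first failure.
function try_genotype(gvcf_file, reference_genome; output_prefix = "output")
    for cmd in genotyping_commands(gvcf_file, reference_genome; output_prefix = output_prefix)
        ok = try
            success(cmd)               # true only if the process exits with code 0
        catch err
            err isa Base.IOError || rethrow()
            false                      # the executable could not be started
        end
        if !ok
            @warn "Skipping individual: command failed" cmd
            return false
        end
    end
    return true
end
```

Building the `Cmd` objects separately from running them keeps the command lines easy to inspect or log before anything is executed.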
SequentialGWAS.get_action (Method)

Compares the variant to the 1000 Genomes Project (KGP) information and returns an action to take, together with a reason.

Potential actions are:

  1. DROP
  2. KEEP
  3. FLIP

Some particularly unexpected ACTION (REASON) pairs are:

  • "KEEP (REVERSE-REF-ALT)"
  • "FLIP (COMPLEMENT-NOT-MATCHING-KGP)"

These are unexpected because they indicate that the minor/major alleles are reversed in our dataset compared with the KGP reference.
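The DROP/KEEP/FLIP decision can be illustrated with a hypothetical `classify_variant` function. This is not the package's `get_action`: it assumes the KGP reference allele is expected in A2 (usually major) and the KGP alternate allele in A1 (usually minor), it drops strand-ambiguous A/T and C/G SNPs, and its reason labels are made up except for REVERSE-REF-ALT and COMPLEMENT-NOT-MATCHING-KGP, which it assigns to the two allele-order-reversed cases to mirror the explanation above.

```julia
# Hypothetical sketch; not SequentialGWAS.get_action.
# Assumption: a2 (usually major) should equal the KGP ref, a1 (usually minor) the KGP alt.
const COMPLEMENT = Dict("A" => "T", "T" => "A", "C" => "G", "G" => "C")

# Returns "" for anything that is not a single-base A/C/G/T allele.
complement(allele::AbstractString) = get(COMPLEMENT, allele, "")

# A/T and C/G SNPs look identical on both strands, so the strand cannot be resolved.
is_ambiguous(a1, a2) = complement(a1) == a2

function classify_variant(a1::AbstractString, a2::AbstractString,
                          kgp_ref::AbstractString, kgp_alt::AbstractString)
    if is_ambiguous(a1, a2)
        return ("DROP", "AMBIGUOUS-STRAND")
    elseif (a2, a1) == (kgp_ref, kgp_alt)
        return ("KEEP", "MATCHING-KGP")
    elseif (a1, a2) == (kgp_ref, kgp_alt)
        # Same strand, but minor/major order reversed relative to KGP.
        return ("KEEP", "REVERSE-REF-ALT")
    elseif (complement(a2), complement(a1)) == (kgp_ref, kgp_alt)
        return ("FLIP", "COMPLEMENT-MATCHING-KGP")
    elseif (complement(a1), complement(a2)) == (kgp_ref, kgp_alt)
        # Opposite strand and reversed minor/major order.
        return ("FLIP", "COMPLEMENT-NOT-MATCHING-KGP")
    else
        return ("DROP", "NO-MATCH")
    end
end
```

For example, with KGP ref `G` and alt `A`, the alleles A1=`C`, A2=`T` only match after complementing and swapping, so the sketch returns `("FLIP", "COMPLEMENT-NOT-MATCHING-KGP")`.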

SequentialGWAS.read_bim (Method)
read_bim(file)

Column descriptions (from https://www.cog-genomics.org/plink/1.9/formats#bim):

  • Chromosome code (either an integer, or 'X'/'Y'/'XY'/'MT'; '0' indicates unknown) or name
  • Variant identifier
  • Position in morgans or centimorgans (safe to use dummy value of '0')
  • Base-pair coordinate (1-based; limited to 2^31 - 2)
  • Allele 1 (corresponding to clear bits in .bed; usually minor)
  • Allele 2 (corresponding to set bits in .bed; usually major)
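The six columns above map naturally onto a named tuple per variant. The following is a minimal, Base-only sketch and not the package's `read_bim`, which may rely on other packages; `parse_bim_line` and `read_bim_sketch` are hypothetical names, and the file is assumed to be whitespace-delimited with exactly six columns.

```julia
# Hypothetical Base-only sketch of reading a PLINK 1.9 .bim file; not SequentialGWAS.read_bim.
const MAX_BP = 2^31 - 2  # upper limit on the base-pair coordinate

function parse_bim_line(line::AbstractString)
    f = split(line)  # splits on any run of whitespace (tabs or spaces)
    length(f) == 6 || error("expected 6 columns, got $(length(f))")
    pos = parse(Int, f[4])
    pos <= MAX_BP || error("base-pair coordinate $pos exceeds $MAX_BP")
    return (chrom = String(f[1]),           # chromosome code or name
            id    = String(f[2]),           # variant identifier
            cm    = parse(Float64, f[3]),   # genetic position, often a dummy 0
            pos   = pos,                    # 1-based base-pair coordinate
            a1    = String(f[5]),           # allele 1, usually minor
            a2    = String(f[6]))           # allele 2, usually major
end

# Skips blank lines; accepts either an open IO or a file path.
read_bim_sketch(io::IO) = [parse_bim_line(l) for l in eachline(io) if !isempty(strip(l))]
read_bim_sketch(path::AbstractString) = open(read_bim_sketch, path)
```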
SequentialGWAS.read_fam (Method)
read_fam(file)

Column descriptions (from https://www.cog-genomics.org/plink/1.9/formats#fam):

  • Family ID ('FID')
  • Within-family ID ('IID'; cannot be '0')
  • Within-family ID of father ('0' if father isn't in dataset)
  • Within-family ID of mother ('0' if mother isn't in dataset)
  • Sex code ('1' = male, '2' = female, '0' = unknown)
  • Phenotype value ('1' = control, '2' = case, '-9'/'0'/non-numeric = missing data if case/control)
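The sex and phenotype codes above can be decoded while reading each row. This is a hypothetical, Base-only sketch rather than the package's `read_fam`; it assumes the phenotype column uses case/control coding, so every value other than '1' or '2' (including '-9', '0' and non-numeric entries) becomes `missing`.

```julia
# Hypothetical Base-only sketch of reading a PLINK 1.9 .fam file; not SequentialGWAS.read_fam.
decode_sex(code::AbstractString) = code == "1" ? :male : code == "2" ? :female : :unknown

# Assumes case/control coding: '2' = case, '1' = control, anything else = missing.
function decode_case_control(value::AbstractString)
    value == "2" && return true
    value == "1" && return false
    return missing
end

function parse_fam_line(line::AbstractString)
    f = split(line)
    length(f) == 6 || error("expected 6 columns, got $(length(f))")
    f[2] == "0" && error("within-family ID (IID) cannot be '0'")
    return (fid = String(f[1]), iid = String(f[2]),
            father = String(f[3]), mother = String(f[4]),  # '0' = parent not in dataset
            sex = decode_sex(f[5]), is_case = decode_case_control(f[6]))
end

read_fam_sketch(io::IO) = [parse_fam_line(l) for l in eachline(io) if !isempty(strip(l))]
```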
SequentialGWAS.read_map (Method)
read_map(file)

Column descriptions (from https://www.cog-genomics.org/plink/1.9/formats):

  • Chromosome code. PLINK 1.9 also permits contig names here, but most older programs do not.
  • Variant identifier
  • Position in morgans or centimorgans (optional; also safe to use dummy value of '0')
  • Base-pair coordinate
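Because the morgans/centimorgans column is optional, a reader may see either three or four columns per line. The following hypothetical sketch (not the package's `read_map`) assumes that "optional" means the column can be omitted entirely, and fills in the dummy value 0 in that case.

```julia
# Hypothetical sketch of parsing a PLINK .map line; not SequentialGWAS.read_map.
# Accepts 4 columns (chrom, id, cm, pos) or 3 columns (chrom, id, pos).
function parse_map_line(line::AbstractString)
    f = split(line)
    if length(f) == 4
        chrom, id, cm, pos = f
    elseif length(f) == 3
        chrom, id, pos = f
        cm = "0"  # genetic position omitted: use the dummy value
    else
        error("expected 3 or 4 columns, got $(length(f))")
    end
    return (chrom = String(chrom), id = String(id),
            cm = parse(Float64, cm), pos = parse(Int, pos))
end

read_map_sketch(io::IO) = [parse_map_line(l) for l in eachline(io) if !isempty(strip(l))]
```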
SequentialGWAS.OneTimeChecks.identify_snps_to_flip (Method)
identify_snps_to_flip(manifest_file)

According to this link, the RefStrand column in the manifest file follows the standard strand designation used for all eukaryotic organisms by HapMap and the 1000 Genomes Project. Variants with RefStrand equal to "-" must be flipped to the "+" strand.
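Selecting the variants to flip amounts to filtering manifest rows on RefStrand. This is a hypothetical sketch, not the package's `identify_snps_to_flip`: it assumes the input has already been reduced to a plain comma-separated table whose header row contains `Name` and `RefStrand` columns, with no quoted commas. Real manifest files may contain additional sections that would need to be skipped first.

```julia
# Hypothetical sketch; not SequentialGWAS.OneTimeChecks.identify_snps_to_flip.
# Assumes a simple CSV table (no quoted commas) with a header row.
function snps_to_flip_sketch(io::IO; id_column = "Name", strand_column = "RefStrand")
    header = split(readline(io), ',')
    id_idx = findfirst(==(id_column), header)
    strand_idx = findfirst(==(strand_column), header)
    (id_idx === nothing || strand_idx === nothing) &&
        error("header must contain $id_column and $strand_column")
    to_flip = String[]
    for line in eachline(io)
        isempty(strip(line)) && continue
        f = split(line, ',')
        length(f) < max(id_idx, strand_idx) && continue  # skip malformed rows
        # RefStrand '-' means the probe is on the minus strand: flip to '+'.
        strip(f[strand_idx]) == "-" && push!(to_flip, String(f[id_idx]))
    end
    return to_flip
end
```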
