Index of Julia Functions
SequentialGWAS.genotype_gvcf
— Method
genotype_gvcf(gvcf_file,
    shared_variants_plink,
    shared_variants_gatk,
    reference_genome;
    output_prefix="output"
)
Genotype a GVCF file using GATK and PLINK as follows:
- First attempts to genotype the GVCF file using GATK (this can fail, in which case the individual is skipped).
- Then converts the VCF to PLINK bed format.
- Updates the bim file with the mapped alleles from the shared variants (this can also fail if some variants have alleles that match none of the known ref/alt alleles, in which case the individual is also skipped).
- Writes the new bim, bed and fam files.
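The first two steps above can be sketched as a pair of external commands wrapped in a skip-on-failure loop. This is only an illustration, not the package's implementation: the helper names `genotyping_commands` and `try_genotype`, the `runner` keyword, the executable names `gatk` and `plink`, and the `.vcf.gz` intermediate file name are all assumptions, and the bim allele update step is left out.

```julia
# Hypothetical helper: builds the two external commands without running them.
function genotyping_commands(gvcf_file, reference_genome; output_prefix="output")
    vcf_file = string(output_prefix, ".vcf.gz")  # assumed intermediate file name
    gatk_cmd = `gatk GenotypeGVCFs -R $reference_genome -V $gvcf_file -O $vcf_file`
    plink_cmd = `plink --vcf $vcf_file --make-bed --out $output_prefix`
    return gatk_cmd, plink_cmd
end

# Runs each command in turn; returns false (individual skipped) as soon as one fails.
# `runner` defaults to `run` and can be swapped out, e.g. for a dry run.
function try_genotype(gvcf_file, reference_genome; output_prefix="output", runner=run)
    for cmd in genotyping_commands(gvcf_file, reference_genome; output_prefix=output_prefix)
        try
            runner(cmd)
        catch err
            @warn "Skipping individual" gvcf_file err
            return false
        end
    end
    return true
end
```

Passing `runner=println` prints the commands instead of executing them, which is a cheap way to inspect what would be run.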
SequentialGWAS.get_action
— Method
Compares the variant to the 1000 Genomes Project (KGP) information and returns an action to take together with a reason.
Potential actions are:
- DROP
- KEEP
- FLIP
Two ACTION (REASON) combinations are particularly unexpected, because they mean the minor/major alleles are reversed in our dataset compared to the reference KGP:
- "KEEP (REVERSE-REF-ALT)"
- "FLIP (COMPLEMENT-NOT-MATCHING-KGP)"
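One way to picture the decision is the minimal sketch below. It is not the actual `get_action` logic: the function name `sketch_action`, its four-allele signature, the order of the checks, and every reason string other than the two quoted above are assumptions made for illustration.

```julia
const COMPLEMENT = Dict("A" => "T", "T" => "A", "C" => "G", "G" => "C")

# Alleles outside A/C/G/T (e.g. indels) map to "N" and never match.
complement(allele) = get(COMPLEMENT, allele, "N")

# a1, a2: minor/major alleles in our dataset; kgp_minor, kgp_major: same in the KGP.
function sketch_action(a1, a2, kgp_minor, kgp_major)
    # Palindromic variants (A/T, C/G) cannot be resolved by strand alone (assumed DROP).
    complement(a1) == a2 && return "DROP (PALINDROMIC)"
    (a1, a2) == (kgp_minor, kgp_major) && return "KEEP (MATCHING-KGP)"
    (a1, a2) == (kgp_major, kgp_minor) && return "KEEP (REVERSE-REF-ALT)"
    c1, c2 = complement(a1), complement(a2)
    (c1, c2) == (kgp_minor, kgp_major) && return "FLIP (COMPLEMENT)"
    (c1, c2) == (kgp_major, kgp_minor) && return "FLIP (COMPLEMENT-NOT-MATCHING-KGP)"
    return "DROP (ALLELES-NOT-MATCHING-KGP)"
end
```

For example, `sketch_action("G", "A", "A", "G")` falls into the reversed case, since the alleles match the KGP pair but in the opposite minor/major order.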
SequentialGWAS.kgp_unrelated_individuals
— Method
Keeps only the first individual within each family and writes them to outfile.
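The "first individual per family" rule could look like the sketch below. The input representation (a vector of `(family_id, individual_id)` tuples in file order), the helper names and the tab-separated output format are assumptions, not the package's actual layout.

```julia
# Keeps the first (family_id, individual_id) pair seen for each family, in input order.
function first_per_family(samples)
    seen = Set{String}()
    kept = Tuple{String,String}[]
    for (fid, iid) in samples
        if !(fid in seen)
            push!(seen, fid)
            push!(kept, (fid, iid))
        end
    end
    return kept
end

# Writes the retained individuals, one "FID<TAB>IID" per line (assumed format).
function write_unrelated(outfile, samples)
    open(outfile, "w") do io
        for (fid, iid) in first_per_family(samples)
            println(io, fid, '\t', iid)
        end
    end
end
```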
SequentialGWAS.read_bim
— Method
read_bim(file)
Column descriptions from: https://www.cog-genomics.org/plink/1.9/formats#bim
- Chromosome code (either an integer, or 'X'/'Y'/'XY'/'MT'; '0' indicates unknown) or name
- Variant identifier
- Position in morgans or centimorgans (safe to use dummy value of '0')
- Base-pair coordinate (1-based; limited to 2^31-2)
- Allele 1 (corresponding to clear bits in .bed; usually minor)
- Allele 2 (corresponding to set bits in .bed; usually major)
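To make the six-column layout concrete, here is a dependency-free sketch that parses a .bim file into named tuples. The function name `read_bim_sketch` and the column names in `BIM_COLUMNS` are hypothetical; the real `read_bim` may return a different structure with different names. All fields are kept as strings, since chromosome codes can be non-numeric.

```julia
# Hypothetical column names, one per .bim column in the order listed above.
const BIM_COLUMNS = (:CHR_CODE, :VARIANT_ID, :POSITION, :BP_COORD, :ALLELE_1, :ALLELE_2)

function read_bim_sketch(file)
    rows = NamedTuple{BIM_COLUMNS,NTuple{6,String}}[]
    for line in eachline(file)
        isempty(strip(line)) && continue
        fields = split(line)  # .bim files are whitespace-delimited
        length(fields) == 6 || error("expected 6 columns, got $(length(fields))")
        push!(rows, NamedTuple{BIM_COLUMNS}(Tuple(String.(fields))))
    end
    return rows
end
```

Each row can then be accessed by field, for instance `rows[1].ALLELE_1`.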
SequentialGWAS.read_fam
— Method
read_fam(file)
Column descriptions from: https://www.cog-genomics.org/plink/1.9/formats#fam
- Family ID ('FID')
- Within-family ID ('IID'; cannot be '0')
- Within-family ID of father ('0' if father isn't in dataset)
- Within-family ID of mother ('0' if mother isn't in dataset)
- Sex code ('1' = male, '2' = female, '0' = unknown)
- Phenotype value ('1' = control, '2' = case, '-9'/'0'/non-numeric = missing data if case/control)
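The sex and case/control codes above can be decoded with two small helpers. These are illustrative only: the names `decode_sex` and `decode_case_control` are not part of the package, and treating any unlisted sex code as unknown is an assumption.

```julia
# '1' = male, '2' = female; anything else is treated as unknown (assumption).
decode_sex(code) = code == "1" ? :male : code == "2" ? :female : :unknown

# '1' = control, '2' = case; '-9', '0' and non-numeric values are missing data.
function decode_case_control(code)
    code == "1" && return :control
    code == "2" && return :case
    return missing
end
```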
SequentialGWAS.read_map
— Method
read_map(file)
Column descriptions from: https://www.cog-genomics.org/plink/1.9/formats
- Chromosome code. PLINK 1.9 also permits contig names here, but most older programs do not.
- Variant identifier
- Position in morgans or centimorgans (optional; also safe to use dummy value of '0')
- Base-pair coordinate
SequentialGWAS.report_qc_effect
— Method
report_qc_effect(input_prefix, output_prefix)
Report the effect of the QC filtering process on SNPs and samples.
SequentialGWAS.write_map
— Method
write_map(file_prefix, array)
SequentialGWAS.write_release_samples_to_drop
— Method
We drop duplicate individuals according to the following priority:
WGS > More Recent Array > Older Array
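The priority rule can be sketched as a rank lookup. The platform labels, the `(individual, platform, sample)` record layout and the function name `samples_to_drop` are assumptions chosen for the example, not the package's data model.

```julia
# Lower rank = higher priority. Platform labels are hypothetical.
const PLATFORM_RANK = Dict("WGS" => 1, "RECENT_ARRAY" => 2, "OLDER_ARRAY" => 3)

# records: vector of (individual_id, platform, sample_id) tuples.
# Returns the sample ids to drop, keeping the best-ranked sample per individual.
function samples_to_drop(records)
    best = Dict{String,Tuple{Int,String}}()  # individual => (rank, sample kept so far)
    dropped = String[]
    for (individual, platform, sample) in records
        rank = PLATFORM_RANK[platform]
        if !haskey(best, individual)
            best[individual] = (rank, sample)
        elseif rank < best[individual][1]
            push!(dropped, best[individual][2])  # previous best is superseded
            best[individual] = (rank, sample)
        else
            push!(dropped, sample)
        end
    end
    return dropped
end
```

In this sketch, ties between samples of equal rank are resolved in favour of the first one encountered.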
SequentialGWAS.OneTimeChecks.array_overlap
— Method
array_overlap(manifest_file1, manifest_file2)
Takes two manifest files from Illumina genotyping arrays and returns the intersection of the SNP names.
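A rough sketch of the intersection follows. It assumes the manifests are CSV files with bracketed section markers (such as `[Assay]`) followed by a header row containing a `Name` column, and it splits naively on commas without handling quoted fields. The helper names `assay_column` and `array_overlap_sketch` are hypothetical.

```julia
# Collects the values of `column` from the [Assay] section of a manifest file.
function assay_column(manifest_file, column)
    values = String[]
    in_assay = false
    col_index = 0
    for line in eachline(manifest_file)
        isempty(strip(line)) && continue
        if startswith(line, "[")
            # A bracketed line opens a new section; only [Assay] is of interest.
            in_assay = startswith(line, "[Assay]")
            col_index = 0
        elseif in_assay
            fields = split(line, ',')
            if col_index == 0
                # The first line of the section is the header row.
                col_index = findfirst(==(column), fields)
                col_index === nothing && error("column $column not found")
            else
                push!(values, String(fields[col_index]))
            end
        end
    end
    return values
end

# SNP names present in both manifests, in the order of the first one.
array_overlap_sketch(manifest_file1, manifest_file2) =
    intersect(assay_column(manifest_file1, "Name"), assay_column(manifest_file2, "Name"))
```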
SequentialGWAS.OneTimeChecks.identify_snps_to_flip
— Method
identify_snps_to_flip(manifest_file)
According to this link, the RefStrand column in the manifest file corresponds to the standard designation for all eukaryotic organisms used by HapMap and the 1000 Genomes Project. Variants with RefStrand equal to - need to be flipped to the + strand.
SequentialGWAS.OneTimeChecks.make_snps_to_flip_list
— Method
make_snps_to_flip_list(output, manifest_file)
Takes a manifest file from an Illumina genotyping array and writes a list of SNPs that are on the - strand and need to be flipped to the + strand.
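The selection and writing steps can be sketched as below, assuming the SNP names and RefStrand values have already been read from the manifest into two parallel vectors. The function name `write_minus_strand_snps` and the one-name-per-line output format are assumptions.

```julia
# Selects SNPs whose RefStrand is "-" and writes their names, one per line.
function write_minus_strand_snps(output, names, ref_strands)
    to_flip = [name for (name, strand) in zip(names, ref_strands) if strand == "-"]
    open(output, "w") do io
        foreach(name -> println(io, name), to_flip)
    end
    return to_flip
end
```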