Workshop: Genetic Association Studies: From SNP design to analysis and interpretation. Training on open-source software
Pr. Dr. Lamjed Mansour (lmansour@ksu.edu.sa)
1- Tools and criteria for selection of informative single nucleotide polymorphisms (SNPs)
2- Methods for SNP genotyping
3- Association analysis and interpretation
4- Functional prediction
In this workshop, we will first focus on the web tools and criteria used to select the SNPs to be included in a study. This includes the use of HapMap or dbSNP data, the determination of each SNP's specifications, and the prediction of its function. We will examine the pipeline for "de novo" SNP discovery after sequencing a target DNA region. We will also discuss the methods for SNP genotyping and the advantages and disadvantages of each. Finally, we will examine the data analysis workflow, from the raw data to the final report, including logistic regression analysis, for reports and publications.
Candidate gene determination tools:
- Bibliographic references
- Genome-wide association studies
- Transcriptomic analysis or protein expression
- NCBI medical genetics portal (MedGen): https://www.ncbi.nlm.nih.gov/medgen/44264
- DisGeNET database: platform containing collections of genes and variants associated with human diseases: https://www.disgenet.org/search
- Pathway analysis: http://www.snps3d.org/
http://www.snps3d.org/modules.php?name=Search&op=advanced%20search
Enter your gene name, for example CTLA4
Select a SNP to be directed to dbSNP
https://www.ncbi.nlm.nih.gov/variation/view/
SNP exploration
To explore a SNP, we should identify it and obtain its physical location (chromosome number and coordinate), the genome project, its major and minor allele frequency (MAF), its region (UTR, intron, exon…), and its effect on the protein or on gene expression.
Enter your gene name, for example PDCD1 or any other gene
Select a SNP to be directed to dbSNP
https://www.ncbi.nlm.nih.gov/snp/
or use the Ensembl platform: https://asia.ensembl.org/index.html
You can predict the variant effect through the Ensembl portal: https://asia.ensembl.org/Homo_sapiens/Tools/VEP
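The SNP lookup described above can also be scripted. The following is a minimal sketch, assuming the public Ensembl REST server at https://rest.ensembl.org and its `variation/:species/:id` endpoint; neither is mentioned in this handout, and the function names are hypothetical. The structure of the returned JSON should be inspected rather than assumed.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Ensembl REST server (not part of this handout).
ENSEMBL_REST = "https://rest.ensembl.org"


def variation_url(rs_id, species="human"):
    """Build the URL of the Ensembl REST 'variation' endpoint for one rsID."""
    path = "/variation/{}/{}".format(
        urllib.parse.quote(species), urllib.parse.quote(rs_id)
    )
    return ENSEMBL_REST + path + "?content-type=application/json"


def fetch_variation(rs_id, species="human", timeout=30):
    """Download the JSON record for one rsID (requires internet access)."""
    with urllib.request.urlopen(variation_url(rs_id, species), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_variation("rs231775")` should return a dictionary describing the SNP; printing `sorted(record.keys())` is a safe first step before relying on any particular field.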
Example: CTLA-4 SNPs. Fill in the table:
SNP ID/assay ID | Common name | Chromosome position (GRCh38) | Nucleotide change | Region | MAF, global (1000 Genomes) | MAF, European (1000 Genomes) |
rs11571317 | | | | | | |
rs231775 | | | | | | |
rs3087243 | | | | | | |
rs11571317,rs231775,rs3087243
Decide which technique you will use for SNP genotyping:
- TaqMan assay: check whether the selected SNP is available on the Thermo Fisher website
- Linkage disequilibrium analysis through the SNiPA website
Methods for SNP genotyping
- PCR-RFLP: design primers flanking the region and choose a suitable restriction enzyme (Gene view option, sequence text view). Select the flanking region, BLAST it, and search for primers using primer6. Choose a restriction enzyme using websites such as Sequence Manipulation Suite (SMS) Restriction Digest, https://www.bioinformatics.org/sms2/rest_map.html, or NEBcutter: http://nc2.neb.com/NEBcutter2/
- PCR-SSO, ARMS-PCR, sequencing, SNaPshot…
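For PCR-RFLP, the key question is whether the two alleles of the SNP differ in the presence of a restriction site. The sketch below illustrates that check. The flanking sequence is a toy example, not a real CTLA-4 sequence, the function names are hypothetical, and only exact recognition sites (no degenerate bases) are handled. EcoRI (GAATTC) is used purely as an illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_sites(seq, site):
    """Return 0-based start positions where the recognition site occurs
    on either strand of the sequence."""
    seq = seq.upper()
    site = site.upper()
    motifs = {site, reverse_complement(site)}
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] in motifs]


def rflp_discriminates(left_flank, right_flank, allele_1, allele_2, site):
    """True if the two alleles give a different number of restriction sites,
    i.e. the enzyme can in principle distinguish them after digestion."""
    sites_1 = find_sites(left_flank + allele_1 + right_flank, site)
    sites_2 = find_sites(left_flank + allele_2 + right_flank, site)
    return len(sites_1) != len(sites_2)


# Toy flanking sequence (hypothetical): allele A completes GAATTC, allele G does not.
LEFT, RIGHT = "ACCTGGA", "TTCGGTCA"
print(find_sites(LEFT + "A" + RIGHT, "GAATTC"))
print(rflp_discriminates(LEFT, RIGHT, "A", "G", "GAATTC"))
```

In practice the amplicon should also be checked for additional sites of the same enzyme, since extra cuts change the expected fragment pattern.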
Association analysis and interpretation
Data analysis and processing of genotypes
- Prepare your XLS (or VCF) file including all genotyping results
Open this XLS file
Open the SNPStats website
Determine the genotype associations by calculating odds ratios (OR) and p values
Determine the haplotype associations and estimate D′ and r²
Determine the Hardy-Weinberg equilibrium and allele frequencies using this website
Analyze the overall LD of the selected SNPs through these web applications
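To make the quantities reported by these web tools less of a black box, here is a minimal sketch of two of them: an odds ratio with a Woolf (log-based) 95% confidence interval from a 2x2 table, and a chi-square test of Hardy-Weinberg equilibrium with one degree of freedom. The function names and the counts in the usage lines are illustrative assumptions, and the web tools may use different methods (for example exact tests or regression-adjusted ORs).

```python
import math


def odds_ratio(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed, z=1.96):
    """Odds ratio and Woolf confidence interval from a 2x2 table.
    'Exposed' means carrying the allele or genotype of interest."""
    a, b, c, d = case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this simple formula")
    or_value = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log_or)
    upper = math.exp(math.log(or_value) + z * se_log_or)
    return or_value, lower, upper


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium from genotype counts.
    Returns (frequency of allele a, chi-square statistic, p value)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value


# Illustrative counts only (not real data).
print(odds_ratio(20, 80, 10, 90))
print(hwe_chi_square(25, 50, 25))
```

Hardy-Weinberg equilibrium is usually checked in the control group; a small p value there can point to genotyping errors or population stratification.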
Functional prediction of SNP
Explore the website
Explore the target 3' UTR region of the gene
For SNPs in a UTR, explore the website
Variant prediction website (Ensembl)
PolyPhen:
Analyze using miRNASNiPer for miRNA interactions
Use the PredictSNP web tool to predict the deleterious effect of the SNP, based on the evaluation of six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA.
Input type for PredictSNP
Chromosome coordinates; accepted format of chromosome names: chr1-chr22, chrX, chrY
(the chr prefix is not required)
PredictSNP queries must be composed of variants specified in only one input type format
Simple format
Definition: CHR,START,[END,]REFERENCE_ALLELE,ALTERNATIVE_ALLELE
Example (chromosome / GRCh37): chr18,21118528,G,C
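Before pasting a list of variants into the tool, it can help to validate each line against the simple format defined above. The following sketch parses one line of the form CHR,START,[END,]REFERENCE_ALLELE,ALTERNATIVE_ALLELE. The function name is hypothetical, and the restriction of alleles to the letters A, C, G and T is an assumption, not a documented rule of the tool.

```python
VALID_CHROMS = {str(i) for i in range(1, 23)} | {"X", "Y"}
BASES = set("ACGT")  # assumption: alleles are written with A, C, G, T only


def parse_simple_variant(line):
    """Parse 'CHR,START,[END,]REF,ALT' into a dictionary.
    The 'chr' prefix is optional; END is None when it is omitted."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) == 4:
        chrom, start, ref, alt = fields
        end = None
    elif len(fields) == 5:
        chrom, start, end, ref, alt = fields
    else:
        raise ValueError("expected 4 or 5 comma-separated fields: %r" % line)

    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    chrom = chrom.upper()
    if chrom not in VALID_CHROMS:
        raise ValueError("unsupported chromosome name: %r" % chrom)

    ref, alt = ref.upper(), alt.upper()
    for allele in (ref, alt):
        if not allele or not set(allele) <= BASES:
            raise ValueError("unexpected allele: %r" % allele)

    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end) if end is not None else None,
        "ref": ref,
        "alt": alt,
    }


print(parse_simple_variant("chr18,21118528,G,C"))
```

Note that the example coordinate refers to GRCh37, so positions taken from GRCh38 resources must be converted before use.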
Explore SNPs3D: enter rs231775, go to KEGG, and explore the PDB, 3D and sequence-3D views
Linkage disequilibrium and SNP interaction analysis using SHEsis
Or, better, use the SHEsisPlus web version or download the software
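The two usual pairwise LD measures, D′ and r², can be computed directly once the four two-locus haplotype frequencies are known. The sketch below assumes those frequencies are already available (for example estimated by an LD tool); estimating them from unphased genotypes requires a method such as the EM algorithm, which is not shown. The function name is hypothetical.

```python
def ld_from_haplotypes(f_AB, f_Ab, f_aB, f_ab):
    """Compute D, |D'| and r^2 for two biallelic loci (alleles A/a and B/b)
    from the four haplotype frequencies, which must sum to 1."""
    total = f_AB + f_Ab + f_aB + f_ab
    if abs(total - 1.0) > 1e-6:
        raise ValueError("haplotype frequencies must sum to 1")

    p_A = f_AB + f_Ab  # frequency of allele A at locus 1
    p_B = f_AB + f_aB  # frequency of allele B at locus 2
    if not (0 < p_A < 1 and 0 < p_B < 1):
        raise ValueError("both loci must be polymorphic")

    d = f_AB - p_A * p_B
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))

    d_prime = abs(d) / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return {"D": d, "D_prime": d_prime, "r2": r2}


# Complete LD: only haplotypes AB and ab exist.
print(ld_from_haplotypes(0.5, 0.0, 0.0, 0.5))
# Linkage equilibrium: all four haplotypes equally frequent.
print(ld_from_haplotypes(0.25, 0.25, 0.25, 0.25))
```

D′ reaches 1 whenever at least one of the four haplotypes is absent, while r² reaches 1 only when the two SNPs carry the same information, which is why r² is the more useful measure for choosing tag SNPs.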
SHEsis input format
1. Case/control data
Sample data for a diploid species
id1 case G A C C 1 1 A1 A2
id2 case A A T C 1 1 A2 A2
id3 ctrl A A T T 2 2 A3 A4
id4 ctrl 0 0 T T 3 3 A5 A3
id5 ctrl G G A A 2 3 A1 A2
id6 case A A C A 0 0 A6 A7
The first column is the sample id. The second column is the disease status: "case" for cases and "ctrl" for controls. The following columns are genotypes. Columns should be delimited by a space, comma or tab; adjacent delimiters are compressed and treated as a single delimiter. Genotypes can be any string (e.g. 1,2,3,4, or A,T,G,C, or A1,A2,A3,A4, or anything else) except 0, which is the code for missing genotypes. Use "NA" for missing phenotypes.
For diploid species, the columns correspond to: sample id, disease status, site1-allele1, site1-allele2, site2-allele1, site2-allele2, and so on.
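The format rules above can be checked with a short script before uploading a file. This is a minimal sketch, assuming the case/control layout described here (sample id, status, then two alleles per site); the function name and the choice to represent a missing genotype as `None` are illustrative, not part of SHEsis.

```python
import re

MISSING_ALLELE = "0"      # code for a missing genotype
MISSING_PHENOTYPE = "NA"  # code for a missing phenotype

SAMPLE = """\
id1 case G A C C 1 1 A1 A2
id2 case A A T C 1 1 A2 A2
id3 ctrl A A T T 2 2 A3 A4
id4 ctrl 0 0 T T 3 3 A5 A3
id5 ctrl G G A A 2 3 A1 A2
id6 case A A C A 0 0 A6 A7
"""


def parse_shesis_case_control(text):
    """Parse SHEsis-style case/control lines for a diploid species.
    Each record holds the sample id, the status (None if 'NA') and one
    (allele1, allele2) tuple per site, or None for a missing genotype."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Space, comma or tab; runs of delimiters count as one.
        tokens = re.split(r"[ ,\t]+", line)
        sample_id, status, alleles = tokens[0], tokens[1], tokens[2:]
        if len(alleles) % 2 != 0:
            raise ValueError("odd number of alleles for sample %s" % sample_id)
        genotypes = []
        for i in range(0, len(alleles), 2):
            pair = (alleles[i], alleles[i + 1])
            genotypes.append(None if MISSING_ALLELE in pair else pair)
        records.append({
            "id": sample_id,
            "status": None if status == MISSING_PHENOTYPE else status,
            "genotypes": genotypes,
        })
    return records


for record in parse_shesis_case_control(SAMPLE):
    print(record["id"], record["status"], record["genotypes"])
```

A useful extra check on real data is that every sample has the same number of sites, since a dropped column shifts all following genotypes.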
Open web tools for bioinformatic analysis
Name | Link | Brief description | Category
1000 Genomes | | A deep catalog of human genetic variation | DNA
AFND | | Allele Frequency Net Database | DNA
dbSNP | | Database of single nucleotide polymorphisms | DNA
DEG | | Database of Essential Genes | DNA
EGA | | European Genome–phenome Archive | DNA
Ensembl | | Ensembl genome browser | DNA
euGenes | | Genomic information for eukaryotic organisms | DNA
GeneCards | | Integrated database of human genes | DNA
IMG/HMP | | Human Microbiome MetaGenomes | DNA
JASPAR | | Transcription factor binding profile database | DNA
JGA | | Japanese Genotype–phenotype Archive | DNA
KEGG | | Kyoto Encyclopedia of Genes and Genomes | DNA
MITOMAP | | Human mitochondrial genome database | DNA
NCBI RefSeq | | NCBI Reference Sequence Database | DNA
PolymiRTS | | Polymorphism in miRNAs and their Target Sites | DNA
UCSC Genome Browser | | UCSC Genome Browser database | DNA
ChIPBase | | Database of transcriptional regulation of lncRNA and miRNA genes | RNA
DARNED | | DAtabase of RNa EDiting in humans | RNA
DIANA-LncBase | http://diana.imis.athena-innovation.gr/DianaTools/index.php?r=lncBase/index | miRNA targets on lncRNAs | RNA
GENCODE | | Encyclopedia of genes and gene variants | RNA
H-DBAS | | Human-transcriptome DataBase for Alternative Splicing | RNA
HEXEvent | | Database of Human EXon splicing Events | RNA
LNCipedia | | Annotated human lncRNA sequences | RNA
LncRNA2Target | | Database of differentially-expressed genes after lncRNA knockdown or overexpression | RNA
lncRNAdb | | lncRNA Database | RNA
lncRNASNP | | Database of SNPs in lncRNAs | RNA
LncRNAWiki | | Human lncRNA Wiki | RNA
miRBase | | miRNA Database | RNA
miRTarBase | | Experimentally-validated miRNA–target interactions | RNA
miRWalk | | Database of miRNA–target interactions | RNA
NONCODE | | Database of ncRNA genes | RNA
NPInter | | Database of ncRNA interactions | RNA
RADAR | | Rigorously Annotated Database of A-to-I RNA editing | RNA
piRNABank | | Database of piwi-interacting RNAs | RNA
RBPDB | | Database of RNA-binding specificities | RNA
RDB | | The nucleic acid database | RNA
Rfam | | Database of ncRNA families | RNA
RNAcentral | | International database of ncRNA sequences | RNA
snoRNABase | | Database of human H/ACA and C/D box snoRNAs | RNA
starBase | | Database of ncRNA interaction networks | RNA
TarBase | http://diana.imis.athena-innovation.gr/DianaTools/index.php?r=tarbase/index | Experimentally-validated miRNA:gene interactions | RNA
TargetScan | | Predicted miRNA targets in mammals | RNA
CATH | | Protein structure classification | Protein
CPLM | | Compendium of Protein Lysine Modifications | Protein
DIP | | Database of Interacting Proteins | Protein
EKPD | | Eukaryotic Kinase and Phosphatase Database | Protein
HPRD | | Human Protein Reference Database | Protein
hUbiquitome | | Ubiquitination sites and cascades | Protein
InterPro | | Protein sequence analysis and classification | Protein
MEROPS | | Database of proteolytic enzymes, their substrates, and inhibitors | Protein
MINT | | Molecular INTeraction Database | Protein
ModBase | | Database of comparative protein structure models | Protein
mUbiSiDa | | Mammalian Ubiquitination Site Database | Protein
PANTHER | | Protein ANalysis THrough Evolutionary Relationships | Protein
PDB | | Protein Data Bank for 3D structures of biological macromolecules | Protein
PDBe | | Protein Data Bank in Europe | Protein
Pfam | | Database of conserved protein families and domains | Protein
PhosSNP | | Genetic polymorphisms that influence protein phosphorylation | Protein
PIR | | Protein Information Resource | Protein
PROSITE | | Database of protein domains, families and functional sites | Protein
SysPTM | | Post-translational modifications | Protein
TreeFam | | Database of phylogenetic trees of animal species | Protein
UniPROBE | | Universal PBM Resource for Oligonucleotide Binding Evaluation | Protein
UniProt | | Universal protein resource | Protein
UUCD | | Ubiquitin and Ubiquitin-like Conjugation Database | Protein
ArrayExpress | | Database of functional genomics experiments | Expression
BioGPS | | Portal for querying and organizing gene annotation resources | Expression
Expression Atlas | | Differential and baseline expression | Expression
Human Protein Atlas | | Tissue-based map of the human proteome | Expression
MOPED | | Multi-Omics Profiling Expression Database | Expression
NCBI GEO | | Gene Expression Omnibus | Expression
NRED | | Database of lncRNA expression | Expression
ONCOMINE | | Cancer microarray database | Expression
PrimerBank | | Public resource for PCR primers | Expression
PRIDE | | PRoteomics IDEntifications | Expression
TiGER | | Tissue-specific Gene Expression and Regulation | Expression
WikiCell | | Unified resource for Human transcriptomics research | Expression
CPDB | | Database of human interaction networks | Pathway
HMDB | | Human Metabolome Database | Pathway
KEGG PATHWAY | | KEGG pathway maps | Pathway
MetaCyc | | Metabolic pathway database | Pathway
Pathway Commons | | Pathway commons | Pathway
PID | | Pathway Interaction Database | Pathway
Reactome | | Curated and peer-reviewed pathway database | Pathway
UniPathway | | Universal Pathway | Pathway
AlzBase | | Database for gene dysregulation in Alzheimer’s disease | Disease
CADgene | | Coronary Artery Disease gene database | Disease
COSMIC | | Catalog Of Somatic Mutations In Cancer | Disease
DiseaseMeth | | Human disease methylation database | Disease
DisGeNET | | Gene–disease associations | Disease
GOBO | | Gene expression-based Outcome for Breast cancer Online | Disease
GWAS Central | | A comprehensive resource for the comparison and interrogation of genome-wide association studies | Disease
GWASdb | | Human genetic variants identified by genome-wide association studies | Disease
HbVar | | Hemoglobin variants and thalassemias | Disease
HGMD | | Human Gene Mutation Database | Disease
ICGC | | International Cancer Genome Consortium | Disease
IDbases | | Immunodeficiency-causing variations | Disease
LncRNADisease | | lncRNA and disease database | Disease
LOVD | | Leiden open (source) Variation Database | Disease
MalaCards | | Human maladies and their annotations | Disease
MethHC | | Database of DNA methylation and gene expression in human cancer | Disease
MethyCancer | | Database of human DNA Methylation and cancer | Disease
miR2Disease | | Database for miRNA deregulation in human disease | Disease
MITOMAP | | Polymorphisms and mutations in human mitochondrial DNA | Disease
NHGRI GWAS Catalog | | Curated resource of SNP-trait associations | Disease
OMIM | | Online Mendelian Inheritance in Man | Disease
T2D@ZJU | | Connections associated with type 2 diabetes | Disease
TCGA | | The Cancer Genome Atlas | Disease
Universal Mutation Database | | Locus-specific database | Disease
ViRBase | | Virus–host ncRNA associated interactions | Disease
GO | | Gene ontology | Standard and ontology
HGNC | | Database of human gene names | Standard and ontology
Europe PMC | | Literature database in Europe | Literature
PubMed | | Database of biomedical literature from MEDLINE | Literature
PubMed Central | | Free full-text literature archive | Literature