Gene-select help

Gene-select description

Gene-select is a tool for picking genes that are likely to be responsible for a pathological condition or a phenotypic feature.

Gene-select compares the genotypes of an affected person or group of persons with the genotypes of healthy individuals from the 1000 Genomes Project. The variation patterns of each gene are compared by a maximum likelihood method. A significance value (P-value) is calculated with a permutation test. Genes that differ significantly between the affected and unaffected populations are output with brief descriptions, sorted by P-value. For each output gene, the variations that contribute to the difference are listed.
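The permutation step can be illustrated with a minimal sketch. It is not Gene-select's actual implementation: the per-person score (here a hypothetical count of qualifying variants in one gene), the test statistic (a plain difference of group means, standing in for the likelihood comparison, whose details are not given here), and the function name permutation_p_value are all assumptions made for illustration.

```python
import random


def permutation_p_value(affected, unaffected, n_permutations=999, seed=0):
    """Estimate a one-sided P-value by shuffling group labels.

    `affected` and `unaffected` are lists of hypothetical per-person scores
    for one gene (for example, counts of qualifying variants). The statistic
    is a simple difference of means, used only as a stand-in for a
    likelihood-based comparison.
    """
    def statistic(group_a, group_b):
        return sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = statistic(affected, unaffected)
    pooled = list(affected) + list(unaffected)
    n_affected = len(affected)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of the group labels
        if statistic(pooled[:n_affected], pooled[n_affected:]) >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

In this sketch the smallest P-value that can be reported is 1 / (n_permutations + 1), so the number of permutations limits the resolution of the result.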

The user can specify the inheritance mode: dominant or recessive. In recessive mode, at least two heterozygous variants or one homozygous variant are needed for a gene to be considered different between the affected and unaffected populations. In dominant mode, one heterozygous variant is enough.
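The rule for the two inheritance modes can be written down as a short sketch. The function name gene_qualifies and the "het"/"hom" labels are hypothetical; the sketch only covers the counting rule for one gene in one person, not the statistical comparison that Gene-select performs. It assumes that a homozygous variant also satisfies dominant mode, which is consistent with the sample output.

```python
def gene_qualifies(zygosities, mode):
    """Apply the inheritance-mode rule to one gene in one person.

    `zygosities` lists the status of each variant found in the gene,
    using "het" for heterozygous and "hom" for homozygous.
    """
    n_het = zygosities.count("het")
    n_hom = zygosities.count("hom")
    if mode == "dominant":
        # A single variant of either kind is enough.
        return n_het + n_hom >= 1
    if mode == "recessive":
        # One homozygous variant, or at least two heterozygous ones.
        return n_hom >= 1 or n_het >= 2
    raise ValueError(f"unknown inheritance mode: {mode!r}")


print(gene_qualifies(["het"], "dominant"))          # True
print(gene_qualifies(["het"], "recessive"))         # False
print(gene_qualifies(["het", "het"], "recessive"))  # True
print(gene_qualifies(["hom"], "recessive"))         # True
```

A gene with a single heterozygous variant therefore passes in dominant mode only.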

Only missense and nonsense SNPs are processed in the current version. Non-point polymorphisms and any variations in non-coding regions are ignored.

Supported file formats:
  1. VCF file format (http://www.1000genomes.org/node/101). A VCF file contains meta-information lines (starting with ##), a header line (starting with #CHROM), and data lines, each containing information about one position in the genome. The first column is the chromosome, the second is the position, the fourth is the reference nucleotide, and the fifth is an observed non-reference nucleotide or a list of such nucleotides. The tenth and further columns contain genotype information, encoded as allele values separated by either “/” (for an unphased genotype) or “|” (for a phased genotype). In the current version, all genotypes are treated as unphased. The allele values are 0 for the reference nucleotide, 1 for the first nucleotide listed in the fifth column, 2 for the second nucleotide in the fifth column, and so on.
  2. ANNOVAR file format (http://www.openbioinformatics.org/annovar/annovar_input.html) with heterozygosity status (“hetero” or “het” means a heterozygous variant; “hom” or “homo” means a homozygous variant). The first six columns are chromosome, start position, end position, reference nucleotide, observed nucleotide, and heterozygosity status. For a SNP, the start position and the end position are the same. Other columns are optional and are ignored.
  3. 23andMe file format (http://snpedia.com/index.php/23andMe). Header lines start with #. There are four columns: ID, chromosome, position, and observed nucleotides.
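As an illustration of the VCF allele encoding described in item 1, the following minimal sketch turns a genotype value into the nucleotides it refers to. The function name decode_genotype is hypothetical and this is not Gene-select's own parser. The sketch assumes that the genotype is the first colon-separated subfield of a sample column and that a "." marks a missing call, as is usual in VCF files.

```python
import re


def decode_genotype(ref, alt, genotype):
    """Translate a VCF genotype such as "0/1" or "1|2" into nucleotides.

    `ref` is the fourth column, `alt` the fifth (comma-separated), and
    `genotype` a value from the tenth or a later column. Anything after
    the first ":" (additional per-sample fields) is dropped.
    Returns None when a call is missing (".").
    """
    alleles = [ref] + alt.split(",")  # index 0 = reference, 1.. = ALT entries
    calls = re.split(r"[/|]", genotype.split(":")[0])
    if "." in calls:
        return None
    # Phasing is ignored, so "0|1" and "1/0" give the same sorted result.
    return sorted(alleles[int(call)] for call in calls)


print(decode_genotype("G", "A", "0/1"))    # ['A', 'G']
print(decode_genotype("T", "C,G", "1|2"))  # ['C', 'G']
print(decode_genotype("C", "A", "1/1"))    # ['A', 'A']
```

Two different nucleotides in the result correspond to a heterozygous genotype, two identical non-reference nucleotides to a homozygous variant.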

All these formats are tab-delimited, but any sequence of tabs or spaces is accepted as a column separator. The reference nucleotide for a variation is taken from the reference genome sequence. Please note that all positions must correspond to the GRCh37/hg19 genome assembly and that base numbering starts from 1.
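The lenient column splitting can be sketched for an ANNOVAR-style row as follows. The function name parse_annovar_line and the dictionary keys are hypothetical, chosen only for this example; Gene-select's own parser may validate input differently.

```python
def parse_annovar_line(line):
    """Split one ANNOVAR-style row into its first six columns."""
    fields = line.split()  # no argument: any run of tabs or spaces separates columns
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    chrom, start, end, ref, obs, status = fields[:6]  # extra columns are ignored
    if status in ("het", "hetero"):
        zygosity = "het"
    elif status in ("hom", "homo"):
        zygosity = "hom"
    else:
        raise ValueError(f"unknown heterozygosity status: {status!r}")
    return {
        "chrom": chrom,
        "start": int(start),  # 1-based position
        "end": int(end),
        "ref": ref,
        "obs": obs,
        "zygosity": zygosity,
    }


row = parse_annovar_line("1\t11087677        11087677\tG   A\thomo")
print(row["start"], row["zygosity"])  # 11087677 hom
```

Calling split() without an argument treats mixed tabs and spaces alike and also discards trailing whitespace at the end of a line.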

ANNOVAR and 23andMe files can store the genotype of only one person; a VCF file can contain genotype information for one or several persons.

This is a short example of input data in ANNOVAR format:

1       9323916 9323916 G       A       hetero     
1       9323991 9323991 T       C       hetero
1       9009352 9009352 A       T       hetero
1       9640291 9640291 T       A       hetero
1       11087677        11087677        G       A       homo
1       11766425        11766425        C       A       homo

Output with dominant inheritance mode:

P-value = 9.139049e-04
Gene ID: uc001apt.3
Description: Homo sapiens hexose-6-phosphate dehydrogenase (glucose 1-dehydrogenase) (H6PD), mRNA.
Type of gene: Protein coding
Clinical significance: polycystic ovary syndrome; Defects in H6PD are a cause of cortisone reductase deficiency (CRD); 
Key variations:
  chromosome 1, position 9323916
  chromosome 1, position 9323991

P-value = 9.139049e-04
Gene ID: uc001apw.3
Description: Homo sapiens solute carrier family 25, member 33 (SLC25A33), nuclear gene encoding mitochondrial protein, mRNA.
Type of gene: Protein coding
Clinical significance: unknown
Key variations:
  chromosome 1, position 9640291

P-value = 9.139049e-04
Gene ID: uc001asr.1
Description: Homo sapiens chromosome 1 open reading frame 187 (C1orf187), mRNA.
Type of gene: Protein coding
Clinical significance: unknown
Key variations:
  chromosome 1, position 11766425

Output with recessive inheritance mode:

P-value = 9.139049e-04
Gene ID: uc001apt.3
Description: Homo sapiens hexose-6-phosphate dehydrogenase (glucose 1-dehydrogenase) (H6PD), mRNA.
Type of gene: Protein coding
Clinical significance: polycystic ovary syndrome; Defects in H6PD are a cause of cortisone reductase deficiency (CRD); 
Key variations:
  chromosome 1, position 9323916
  chromosome 1, position 9323991

P-value = 9.139049e-04
Gene ID: uc001asr.1
Description: Homo sapiens chromosome 1 open reading frame 187 (C1orf187), mRNA.
Type of gene: Protein coding
Clinical significance: unknown
Key variations:
  chromosome 1, position 11766425