This mirror was last updated on 03/02/2011.

What's in a GeneCard?


This page provides information about the various GeneCards sections and tables.

General Comments

GeneCards Categories

GeneCard Header

This section provides the gene's symbol, category, GIFtS score (see below), and GCid in the box on the left-hand side.
Each gene category has its own distinct color: protein-coding, pseudogene, RNA gene, gene cluster, genetic locus, and uncategorized.
The gene's symbol and GCid are shown in the color of the gene's category.
The background color of the box that contains the gene's symbol and GCid indicates which database the symbol is from: HGNC Approved Genes, EntrezGene Database, or Ensembl Gene Database.
The header also contains a short description of the gene and states whether or not the gene symbol is approved by the HUGO Gene Nomenclature Committee (HGNC).


GeneCards Inferred Functionality Scores (GIFtS)

The GIFtS algorithm uses the wealth of GeneCards annotations to produce scores aimed at predicting the degree of a gene's functionality. Since the degree of known functionality is correlated with the amount of research done on a particular gene or its product, we use these annotations in a scoring system aimed at inferring functionality. Note that while the accumulation of data for a specific gene in certain databases is merely correlated with functionality, many GeneCards sources, such as the Gene Ontology (GO) Consortium and Genatlas, provide definitive information about functionality.

Our goal is to use these two types of annotations to measure the functionality of GeneCards genes. Our first step was to produce, for each gene, a binary vector of 67 elements indicating the presence or absence of data in each relevant source. The GIFtS score of a particular gene is a percentage derived from the sum of these binary values divided by the number of sources (the vector length).
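The basic score described above can be sketched in a few lines of Python. This is only an illustration of the stated rule (sum of the binary values divided by the vector length, expressed as a percentage); the function name and the example gene are hypothetical and are not taken from GeneCards.

```python
from typing import Sequence


def gifts_score(presence: Sequence[int]) -> float:
    """Return the percentage of sources that hold data for a gene.

    `presence` is the binary vector: 1 if a source has data, 0 otherwise.
    """
    if not presence:
        raise ValueError("presence vector must not be empty")
    if any(flag not in (0, 1) for flag in presence):
        raise ValueError("presence vector must contain only 0 or 1")
    return 100.0 * sum(presence) / len(presence)


# Hypothetical gene with data in 40 of the 67 sources.
example_vector = [1] * 40 + [0] * 27
example_score = gifts_score(example_vector)
```

A gene present in every source would score 100, and a gene present in none would score 0.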

Improved GIFtS experiments with increased resolution by sub-sectioning data sources and adjusting scores based on the presence or absence of detailed annotations within a source (currently SwissProt). In addition, we have introduced weights related to the quantitative aspects of annotation items, enabling better evaluation of the data relevant to annotation levels (currently orthologs and publications).

To enrich GIFtS with respect to protein data, we selected the pivotal bioinformatics source for such data, namely SwissProt, and dissected it into 6 sub-sources: protein subunit, subcellular location, post-translational modification, function, catalytic activity, and other. Each of these subfields received a binary score as described above, thereby increasing the GIFtS vector size by 5. To weight proteins effectively in the new vectors, the sum of the binary data was still divided by the original number of sources (with SwissProt treated as 1 source for this denominator, in spite of its sub-sources' contributions to the numerator).

To enrich GIFtS with orthologs or publications data, we define a new score for each of those components, which is then added to the default GIFtS. Specifically, the orthologs and publications scores for each gene are calculated as round(log_x(sum(i))), where the logarithm base x equals 3 for orthologs and 5 for publications, and sum(i) is the number of relevant orthologs or publications. Genes with no orthologs or publications receive a score of zero for the relevant component(s); scores that round down to 0 (for low counts) are normalized to 1.
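The orthologs and publications component scores can be sketched as follows. The sketch assumes that "round" means ordinary rounding to the nearest integer (Python's built-in round is used here, which rounds exact halves to the nearest even number); the function and constant names are hypothetical. How the component is combined with the default GIFtS is not detailed above, so the sketch stops at the component score.

```python
import math

ORTHOLOG_BASE = 3      # base x for orthologs, as stated in the text
PUBLICATION_BASE = 5   # base x for publications, as stated in the text


def component_score(count: int, base: int) -> int:
    """Return round(log_base(count)) with the two special cases from the text."""
    if count < 0:
        raise ValueError("count must not be negative")
    if count == 0:
        return 0  # no orthologs or publications: score of zero
    score = round(math.log(count, base))
    # Low counts whose logarithm rounds down to 0 are normalized to 1.
    return max(score, 1)


ortholog_score = component_score(9, ORTHOLOG_BASE)           # log_3(9) = 2
publication_score = component_score(125, PUBLICATION_BASE)   # log_5(125) = 3
```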

GeneCards Sections

Aliases & Descriptions

This section displays synonyms and aliases for the relevant GeneCards gene, as extracted from OMIM, HGNC, Entrez Gene, UniProtKB (Swiss-Prot/TrEMBL), GeneLoc, and Ensembl. Also shown are accessions from HGNC, Entrez Gene, UniProtKB, and/or Ensembl, and previous GC identifiers where relevant (for cases in which GeneLoc deems it necessary to assign a new identifier to a gene based on updated information about its chromosomal location). Such GC ids always remain with their original genes and are not reused with other symbols.

Summaries

This section displays descriptions of a gene's function, its cellular localization, and its effect on phenotype for the relevant GeneCards gene, as extracted from Entrez Gene, UniProtKB (UniProtKB/Swiss-Prot and UniProtKB/TrEMBL), Tocris Bioscience, and Gene Wiki.

Genomic Views

This section displays the chromosome, cytogenetic band and map location of the GeneCards gene as extracted from GeneLoc, HGNC, Entrez Gene, Nature (405, 311-319) and miRBase, as well as genomic views from UCSC and Ensembl, and links to transcription factor binding sites and Pyrosequencing assays at Qiagen and/or SABiosciences. The GeneLoc integrated location is shown in red on the image. If this differs from the location provided by Entrez Gene and/or Ensembl, their locations are shown on the image in green and/or blue respectively. Also provided are links to the GeneLoc gene density information for this gene's chromosome, which shows the number of genes in each 1 Mb interval along the chromosome, and to detailed exon information as provided by GeneLoc.
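As an illustration of the gene density view mentioned above (the number of genes in each 1 Mb interval along a chromosome), here is a minimal sketch. It assumes each gene is assigned to the interval containing its start coordinate, which is an assumption of this example and not a documented GeneLoc rule; the gene names and coordinates are hypothetical.

```python
from collections import Counter
from typing import Dict

INTERVAL_BP = 1_000_000  # 1 Mb intervals, as in the text


def gene_density(gene_starts: Dict[str, int],
                 interval: int = INTERVAL_BP) -> Dict[int, int]:
    """Count genes per interval; keys are interval indexes (0 = first Mb)."""
    counts = Counter(start // interval for start in gene_starts.values())
    return dict(sorted(counts.items()))


# Hypothetical start coordinates on one chromosome.
density = gene_density({
    "geneA": 150_000,
    "geneB": 980_000,
    "geneC": 1_200_000,
    "geneD": 5_000_001,
})
```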

Proteins

This section provides annotated information about the proteins encoded by GeneCards genes according to UniProtKB, neXtProt, and/or Ensembl; the capability to view phosphorylation sites using Phosphosite; reference sequences (RefSeq) according to NCBI; cellular component ontologies visualized by the Gene Ontology Consortium (more information); links for ordering antibodies from Millipore, Cell Signaling Technology, OriGene, GenScript, Novus Biologicals, Sigma-Aldrich, R&D Systems, and/or Epitomics, recombinant proteins from Millipore, Sigma-Aldrich, R&D Systems, Enzo Life Sciences, Novus Biologicals, OriGene, GenScript, Sino Biological, and/or ProSpec, and assays from Millipore, Cell Signaling Technology, R&D Systems, OriGene, GenScript, Enzo Life Sciences, Sigma-Aldrich, and/or Uscn; and direct links to three-dimensional visualizations of PDB structures provided by the OCA browser and Proteopedia. These visualizations are reached via the (3D) link for the OCA Browser or the Proteopedia symbol hyperlink shown next to each PDB identifier.
Genes with similar ontologies can be seen using GeneDecks Partner Hunter (more information)

Protein Domains/Families

This section provides annotated information about protein domains and families according to InterPro, ProtoNet, UniProtKB and Blocks.
Genes with similar domains can be seen using GeneDecks Partner Hunter (more information)

Gene Function

This section provides annotated information about gene function according to MGI, UniProtKB, IUBMB, and Genatlas, including: shRNA from OriGene and Sigma-Aldrich, siRNAs from Sigma-Aldrich, OriGene, and Qiagen, RNAi products from Millipore, microRNA from Sigma-Aldrich, Qiagen, and SABiosciences, Clones from Sigma-Aldrich, OriGene, GenScript, and Sino Biological, Cell Lines from GenScript, as well as molecular function ontologies visualized by the Gene Ontology Consortium (more information).
Genes with similar ontologies can be seen using GeneDecks Partner Hunter (more information)
Information from MGI includes phenotypes for mouse orthologs and a popup table with information on phenotypic alleles of the orthologs. This table presents the following columns:

Genes with similar phenotypes can be seen using GeneDecks Partner Hunter (more information)

Pathways & Interactions

This section provides links to pathways, interactions, and PCR arrays according to information extracted from the Kyoto Encyclopedia of Genes and Genomes (KEGG), Cell Signaling Technology, Millipore, Sigma-Aldrich, SABiosciences, UniProtKB, String and MINT, as well as biological process ontologies visualized by the Gene Ontology Consortium (more information).
Genes with similar ontologies and those in the same pathways can be seen using GeneDecks Partner Hunter (more information)
Links to the SABiosciences Gene Network Central interacting genes and proteins network and the Sigma-Aldrich "Your Favorite Gene" Molecular Interaction Network for the relevant gene are also provided.

Interacting proteins

Each line in this table represents one interacting protein, according to EBI-IntAct, MINT and/or String. The following columns are presented:

Drugs & Chemical Compounds

This section provides relationships between GeneCards genes and both chemical compounds and drugs, as well as links to drugs and compounds for ordering at Sigma-Aldrich, Enzo Life Sciences, and Tocris Bioscience. Chemical compound relationships are from Novoseek. Drug compound relationships are from PharmGKB. Pharmaceutical uses are provided by UniProtKB.

Tocris compounds and pharmacological data.

This table presents the following columns:

Novoseek chemical compound relationships.

This table presents the following columns:

PharmGKB drug compound relationships.

This table presents the following columns:

Genes with similar drug and compound relationships can be seen using GeneDecks Partner Hunter (more information)

Transcripts

This section contains associated UniGene clusters and representative sequences, RefSeq mRNAs, non-coding RNAs from RNAdb, siRNAs from Sigma-Aldrich, OriGene, and Qiagen, RNAi products from Millipore, shRNA from OriGene and Sigma-Aldrich, microRNA from Sigma-Aldrich, Qiagen, and SABiosciences, Clones from Sigma-Aldrich, OriGene, GenScript, and Sino Biological, Primers from OriGene and/or SABiosciences, Assemblies (sorted by a scoring scheme that gives preference to mRNAs over EST associations), GeneTide, the highest scoring ESTs, and transcript and alignment information from AceView. Also included are additional gene/cDNA sequences from GenBank, exon structure information from GeneLoc, alternative splicing information, and transcript links to Ensembl.

Alternative Splicing

This subsection contains alternative splicing information according to ASD, followed by alternative splicing isoforms from ECgene. Exons with alternative splice sites in different isoforms are broken into Exonic Units (ExUns). The letters indicate the order of the ExUns in the exon. The symbol ' ^ ' between ExUns indicates an intron, while ' ' indicates the junction of two ExUns. Mouseovers on the dark blue squares show the ExUn's genomic coordinates, while mouseovers on the light blue squares show its transcript coordinates. When showing ASD's splice variants, GeneCards subtracts the 3000 bp flank that ASD adds to the transcript coordinates.

Expression

This section contains links to experimental results from GeneNote, probeset-to-gene annotations from GeneAnnot and GeneTide, electronic Northern data images and clone count from UniGene, SAGE expression data images and tag counts based on data extracted from CGAP and the Genomics Institute of the Novartis Research Foundation (GNF) BioGPS, followed by links to SOURCE, and/or EXPOLDB, Primers from OriGene and/or SABiosciences, and/or tissue specificity data from UniProtKB. Expression for PCR Arrays from SABiosciences.
Genes with similar binary patterns can be seen using GeneDecks Partner Hunter (more information)

An association of GeneCards genes to Affymetrix probe-sets, through GeneAnnot and GeneTide is presented in a table.

Other columns include data from GeneAnnot and GeneTide, where an asterisk next to the probe set name indicates lower quality annotation, as follows:

GeneAnnot GeneNote GeneTide

After the table, 3 images, for GeneNote/GNF, electronic Northern, and SAGE tissue expression data respectively, are presented:

GeneNote / GNF Normal / GNF Cancer Expression array images

Experimental tissue vectors: Duplicate measurements were obtained for twelve normal human tissues (out of 28 tissues shown) hybridized against Affymetrix GeneChips HG-U95A-E (GeneNote data) and for 22 normal human tissues hybridized against HG-U133A (GNF data **). The intensity values (shown on the y-axis) were first averaged between duplicates; probeset values were then averaged per gene, global median-normalized, and scaled to have the same median of about 70 (half-way between the GeneNote and GNF medians). HG-U133A expression data for 18 NCI60 cancer cell lines, available at GNF BioGPS, were processed and added to the display (a single measurement was taken; values were normalized according to the GNF normal data). The correspondence between cell lines and tissues is given in the table below:

Tissue Legend
TISSUE         NCI60      COMMENT
Kidney         786-0      kidney
Heart          A204       rhabdomyosarcoma
Lung           A549       lung
SalivaryGland  ACC3       salivary_gland
Prostate       ALVA31     prostate
Thymus         CCRT_CEM   blood, T cell leukemia
WholeBlood     DAUDI      blood, lymphoma
Colon          HCT116     colon
Cervix         HELA       cervix
Liver          HEPG2      liver
Spleen         HL60       blood, B cell leukemia
Breast         MDA_MB231  breast
Ovary          OVCAR3     ovary
Pancreas       PANC1      pancreas
BoneMarrow     SAOS2      osteosarcoma
Brain          SF268      glioblastoma
Skin           SKMEL28    melanoma
Bladder        T24        bladder


Note that filled diamonds along the x-axis of each graph indicate that tissue (cell line) expression values are available for a given gene, while empty diamonds indicate either that there is no such tissue for a specific platform (SAGE/e-Northern), or that the current gene has no matching probesets on the microarray (or tags/ESTs for SAGE/e-Northern). A filled diamond along the x-axis with no data shown in the graph indicates that, after thresholding and normalization, there is no meaningful expression data for that tissue.
Normalized intensities are drawn on a root scale, which is an intermediate between log and linear scales. The Affymetrix MAS5 algorithm was used for array processing.
** Reference: Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A 101(16):6062-7.
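The processing of the experimental tissue vectors (averaging duplicates, averaging probesets per gene, scaling to a common median, and drawing on a root scale) can be sketched as below. Several details are assumptions of this example: the median is taken over the gene values of a single tissue sample, the target is exactly 70 (the text says "about 70"), and the root scale is a square root (the text does not say which root is used). All names and numbers are hypothetical.

```python
from statistics import mean, median
from typing import Dict, List, Tuple

TARGET_MEDIAN = 70.0  # assumption: the "about 70" from the text


def gene_values(duplicates: Dict[str, Tuple[float, float]],
                probeset_to_gene: Dict[str, str]) -> Dict[str, float]:
    """Average the two duplicates per probeset, then average probesets per gene."""
    per_gene: Dict[str, List[float]] = {}
    for probeset, (first, second) in duplicates.items():
        gene = probeset_to_gene[probeset]
        per_gene.setdefault(gene, []).append((first + second) / 2.0)
    return {gene: mean(values) for gene, values in per_gene.items()}


def median_scale(values: Dict[str, float],
                 target: float = TARGET_MEDIAN) -> Dict[str, float]:
    """Rescale all values so that their median equals `target`."""
    current = median(values.values())
    if current <= 0:
        raise ValueError("median must be positive to rescale")
    factor = target / current
    return {gene: value * factor for gene, value in values.items()}


def root_scale(value: float, root: float = 2.0) -> float:
    """Map an intensity onto a root scale (square root assumed by default)."""
    return value ** (1.0 / root)


# Hypothetical duplicate intensities for four probesets mapping to three genes.
raw = {"p1": (100.0, 140.0), "p2": (80.0, 80.0), "p3": (30.0, 50.0), "p4": (10.0, 30.0)}
mapping = {"p1": "geneA", "p2": "geneA", "p3": "geneB", "p4": "geneC"}
scaled = median_scale(gene_values(raw, mapping))
```

In this example geneB sits at the median, so after scaling its value is 70 and the other genes are rescaled by the same factor.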

UniGene - electronic Northern Normal / eNorthern Cancer

Electronic Northern: For the shown set of non-fetal normal and cancer human tissues, NCBI's Unigene dataset (Hs.data) is mined for information about the number of unique clones per gene per tissue. Clones are assigned to particular tissues by applying data-mining heuristics to Unigene's library information file (Hs.lib.info). Electronic expression results were calculated by dividing the number of clones per gene by the number of clones per tissue. They were then normalized by multiplying by 1M, and the obtained normalized counts are presented on the same root scale as the experimental tissue vectors.
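The electronic Northern normalization amounts to a per-million clone fraction. A minimal sketch follows, with a hypothetical function name and made-up counts:

```python
ENORTHERN_SCALE = 1_000_000  # "multiplying by 1M" in the text


def enorthern_level(gene_clones: int, tissue_clones: int,
                    scale: int = ENORTHERN_SCALE) -> float:
    """Clones for the gene in a tissue, per `scale` clones in that tissue."""
    if tissue_clones <= 0:
        raise ValueError("tissue must have at least one clone")
    if gene_clones < 0 or gene_clones > tissue_clones:
        raise ValueError("gene clone count must lie between 0 and the tissue total")
    return scale * gene_clones / tissue_clones


# Hypothetical: 12 unique clones for the gene among 48,000 clones in the tissue.
level = enorthern_level(12, 48_000)
```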

CGAP:SAGE Normal / SAGE Cancer

Serial Analysis of Gene Expression: For the same set of normal and cancer human tissues, CGAP datasets Hs.frequencies and Hs.libraries are mined for information about the number of SAGE tags per tissue. Tags are reassigned to a Unigene cluster and after that to a particular gene by mining Hs.best_gene, Hs.best_tag and Hs_GeneData. The expression level of a particular gene in a particular tissue was calculated as the number of appearances of the corresponding tag divided by the total number of tags in libraries derived from that tissue. These fractions were then normalized by multiplying by 1.2M and the obtained normalized counts are presented on the same root scale as that used for the electronic Northern pictures. Please note: Currently, only associations with minimal ambiguity participate in the analysis.
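The SAGE calculation pools all libraries derived from a tissue before taking the fraction. The sketch below assumes each library is available as a mapping from tag sequence to count; that representation, the function name, and the tag sequences are hypothetical, and the tag-to-gene reassignment step is left out.

```python
from typing import Dict, Iterable

SAGE_SCALE = 1_200_000  # "multiplying by 1.2M" in the text


def sage_level(tag: str, libraries: Iterable[Dict[str, int]],
               scale: int = SAGE_SCALE) -> float:
    """Occurrences of `tag` divided by all tags in the tissue's libraries, scaled."""
    pooled = list(libraries)
    total_tags = sum(sum(library.values()) for library in pooled)
    if total_tags == 0:
        raise ValueError("no tags found for this tissue")
    tag_hits = sum(library.get(tag, 0) for library in pooled)
    return scale * tag_hits / total_tags


# Two hypothetical libraries derived from the same tissue (100 tags in total).
tissue_libraries = [
    {"AAACCCGGGT": 3, "GGGTTTAAAC": 57},
    {"AAACCCGGGT": 2, "CCCAAATTTG": 38},
]
tag_level = sage_level("AAACCCGGGT", tissue_libraries)
```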

Orthologs

This section contains Orthologs from HomoloGene, euGenes, SGD, and MGD, with possible further links to Flybase and WormBase.

The table presents the following columns:

Upon clicking the "Species with no ortholog" link, a pop-up window appears. It lists the species that do NOT have an ortholog to the relevant gene.

Superscripts represent the source from which this data was extracted. Data from HomoloGene can have one of two superscripts. If the second one is cited, it means that data for this species exists only in the older version of HomoloGene, which used unfinished genomes and where the homologs found might not be true orthologs.

Following the table is a link to Ensembl gene trees.

Paralogs

This section contains Paralogs from HomoloGene and Ensembl, and Pseudogenes from Pseudogene.org. Genes with similar paralogs can be seen using GeneDecks Partner Hunter (more information).

Genomic Variants

This section contains SNPs/Variants from the NCBI SNP Database, Ensembl, and PupaSUITE/PupaSNP, with descriptions from UniProtKB, Linkage Disequilibrium images from HapMap, Structural Variations (CNVs/InDels/Inversions) from the Database of Genomic Variants, and PCR Resequencing Primers from Qiagen.

NCBI SNPs

SNP information is currently extracted from dbSNP XML files. Filtering is done to include only those that are not artifacts, not connected to gene duplication, not withdrawn by NCBI, fully specified, without ambiguous locations or low map quality, and having single Entrez Gene and contig ids. The order of a gene's displayed SNPs can be determined by the user. By default, SNPs are sorted first (shown in the select box as 1st) by validation status (validated before non-validated), then, within these groups, by ordered location type (first coding non-synonymous, then coding synonymous, followed by coding, splice site, mRNA-UTR, intron, locus, reference, and/or exception), as the secondary (2nd) nested criterion, and finally, by the number of validations (up to 4). The user can change this default sort order and define up to three hierarchical sorting priorities from fields available as select boxes above the relevant columns on the section's button line as follows: rs-numbers (sorted in ascending order), validation status, position on the chromosome (ascending order), location type, allele frequencies (existing info before non-existing), population types (alphabetical order), and total sample size (largest to smallest). Each displayed line includes genomic, expression, and allele frequency data sections. Only the summary is shown for the expression and allele frequency sections, with a link to the detailed information (via the magnifying glass icon).
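The default three-level sort can be expressed as a single composite sort key. In this sketch the record fields and rs-numbers are hypothetical, and the last criterion is assumed to place SNPs with more validations first, since the text does not state the direction.

```python
from dataclasses import dataclass
from typing import List

# Location types in the order given in the text.
LOCATION_ORDER = [
    "coding non-synonymous", "coding synonymous", "coding", "splice site",
    "mRNA-UTR", "intron", "locus", "reference", "exception",
]
LOCATION_RANK = {name: rank for rank, name in enumerate(LOCATION_ORDER)}


@dataclass
class Snp:
    rs_id: str
    validated: bool
    location_type: str
    n_validations: int  # 0 to 4


def default_sort(snps: List[Snp]) -> List[Snp]:
    """Validated first, then by location type, then by number of validations."""
    return sorted(
        snps,
        key=lambda snp: (
            not snp.validated,                 # False sorts before True
            LOCATION_RANK[snp.location_type],
            -snp.n_validations,                # assumption: more validations first
        ),
    )


# Hypothetical SNP records.
snps = [
    Snp("rs3", False, "coding non-synonymous", 0),
    Snp("rs1", True, "intron", 2),
    Snp("rs2", True, "coding synonymous", 1),
    Snp("rs4", True, "coding synonymous", 3),
]
ordered_ids = [snp.rs_id for snp in default_sort(snps)]
```

A user-defined order with up to three priorities would follow the same pattern with a different tuple of key fields.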

This table presents the following columns:

Additional columns in Expression data popup:

Additional columns in Allele Frequency data popup:

This section also provides Linkage Disequilibrium (LD) information from HapMap.

Disorders & Mutations

This section contains Disorders & Mutations in which GeneCards genes are involved, according to OMIM, UniProtKB, Novoseek, PharmGKB, Genatlas, BGMUT, GeneTests, the Human Genome Variation Society's Locus Specific Mutation Databases (LSDB), HGMD, GAD, HuGENet, BCGD, and/or TGDB.

Novoseek disease relationships

This table presents the following columns:

PharmGKB disease relationships

This table presents the following columns:

Genes with similar disease relationships can be seen using GeneDecks Partner Hunter (more information)

Medical News

This section provides links to possibly related articles in Doctor's Guide.

Publications

This section provides titles of and links to research articles in PubMed, as associated via Novoseek, HGNC, Entrez Gene, UniProtKB, PharmGKB, and/or GAD.

The articles are ranked, first according to the number of GeneCards sources that associate the article with this gene, then by date of publication, and then according to the Novoseek score for this article/gene relationship. The year of publication appears in parentheses after the title of each article. Lower ranked articles may also appear in partial results if their titles or authors contain your search term.
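The ranking can be sketched as a three-part sort key. The text does not state the direction of the date and score criteria, so this example assumes newer articles and higher Novoseek scores come first; the record fields and article identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    article_id: str
    n_sources: int        # GeneCards sources associating the article with the gene
    year: int             # year of publication
    novoseek_score: float


def rank_articles(articles: List[Article]) -> List[Article]:
    """Sort by source count, then date, then Novoseek score (all descending)."""
    return sorted(
        articles,
        key=lambda a: (-a.n_sources, -a.year, -a.novoseek_score),
    )


# Hypothetical articles for one gene.
articles = [
    Article("a", 3, 2005, 10.0),
    Article("b", 3, 2009, 5.0),
    Article("c", 1, 2010, 99.0),
    Article("d", 3, 2009, 8.0),
]
ranked_ids = [a.article_id for a in rank_articles(articles)]
```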

External Searches

This section allows the user to search PubMed, OMIM, or NCBI Bookshelf. The current gene's aliases and disorders are provided, as well as the search string that led to the gene, to be used as search fodder. The user can also add new search terms.

How To Search: The search box allows the user to search for aliases and/or free text in PubMed, OMIM, or NCBI Bookshelf. To search for several aliases, select each alias while holding down the Control key. This type of search finds documents containing any of the selected aliases; to require all of the selected aliases, go to the free text box (next to the search button) and manually change each OR to AND. You may also enter free text and combine it with the selected aliases using AND or OR (use the radio buttons to the left of the box to choose). Once again, to find only documents that contain all of the selected aliases, you must change each OR to AND in the Query String box.
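The way the query string is assembled can be illustrated with a small sketch. The exact syntax that GeneCards sends to PubMed, OMIM, or NCBI Bookshelf is not documented here, so the parentheses and quoting below are assumptions of this example, as is the function name.

```python
from typing import Sequence


def build_query(aliases: Sequence[str], free_text: str = "",
                alias_operator: str = "OR", text_operator: str = "AND") -> str:
    """Join the selected aliases, then combine them with the free text."""
    for operator in (alias_operator, text_operator):
        if operator not in ("AND", "OR"):
            raise ValueError("operators must be 'AND' or 'OR'")
    # Assumption: multi-word aliases are quoted so they are searched as phrases.
    terms = [f'"{alias}"' if " " in alias else alias for alias in aliases]
    alias_part = f" {alias_operator} ".join(terms)
    if len(terms) > 1:
        alias_part = f"({alias_part})"
    free_text = free_text.strip()
    if not free_text:
        return alias_part
    if not alias_part:
        return free_text
    return f"{alias_part} {text_operator} {free_text}"


any_alias = build_query(["TP53", "P53"])
all_aliases = build_query(["TP53", "P53"], "apoptosis", alias_operator="AND")
```

The default mirrors the behavior described above: selected aliases are joined with OR, and switching to AND corresponds to editing the query string by hand.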

Databases

These sections provide links to the GeneCards genes in other databases:

Intellectual Property

This section features Patent information from GeneIP and technologies that are available for licensing. Institutions currently featured include the Weizmann Institute of Science, the Salk Institute for Biological Studies, and Tufts University. Also included in this section is IP news from XenneX, Inc.

Products

This section provides links to reagents available from Millipore, and/or R&D Systems, proteins, lysates, and/or antibodies available from Cell Signaling Technology, Millipore, R&D Systems, Sigma-Aldrich, OriGene, GenScript, Novus Biologicals, Epitomics, and/or ProSpec, drugs and compounds available from Tocris Bioscience, Enzo Life Sciences, and/or Sigma-Aldrich, clones and/or primers available from Sigma-Aldrich, OriGene, Qiagen, GenScript, SABiosciences, and/or Sino Biological, and GPCR/Kinase Profiling, Assay development, GPCR & ELISA assays available from GenScript, R&D Systems, Sigma-Aldrich, and/or Uscn.


Gene Ontology (GO) Tables

The Gene Ontology sections in Proteins, Gene Function, and Pathways & Interactions display a table with the following columns:

GeneDecks Partner Hunter

GeneDecks Partner Hunter is available for ontologies, phenotypes, drugs and compounds, sequence-based paralogs, disorders, pathways, binary patterns, and domains. By clicking on the GeneDecks Partner Hunter button for a particular section, one arrives at the GeneDecks home page, where the gene name has been entered and the appropriate fields selected from the attribute list. From this page, changes can be made to the data requested. Submitting this form brings up a result page containing a list of genes similar to the chosen gene and their descriptions.

Selected Algorithms

Novoseek Scoring Algorithm

The relevance scores of elements related to genes (chemical substances and diseases) are based on the analysis of co-occurrences of two elements in Medline documents. The observed number of documents where both elements appear together and the number of documents where each appears independently are compared to an expected value based on a hypergeometric distribution. The more co-occurrences are observed relative to the number expected, the less likely it is that they happened by chance, and the higher the score. The absolute scores are not meaningful in themselves and only give an order of importance: in the list of chemicals related to a gene, the order is meaningful and the first chemicals in the list are, statistically, more strongly related to the gene than the following ones, but the absolute values of the scores may change from one release to another.
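The idea of comparing observed co-occurrences with a hypergeometric expectation can be sketched as follows. This is not Novoseek's actual formula, which is not given here; it is one common way to turn such a comparison into a score, namely the negative log10 of the probability of seeing at least the observed number of co-occurring documents by chance. All names and counts are hypothetical.

```python
from fractions import Fraction
from math import comb, inf, log10


def cooccurrence_p_value(total_docs: int, docs_a: int, docs_b: int,
                         docs_both: int) -> float:
    """P(at least `docs_both` shared documents) under a hypergeometric model."""
    if not (0 <= docs_both <= min(docs_a, docs_b)
            and max(docs_a, docs_b) <= total_docs):
        raise ValueError("inconsistent document counts")
    denominator = comb(total_docs, docs_b)
    tail = sum(
        comb(docs_a, shared) * comb(total_docs - docs_a, docs_b - shared)
        for shared in range(docs_both, min(docs_a, docs_b) + 1)
    )
    return float(Fraction(tail, denominator))


def relevance_score(total_docs: int, docs_a: int, docs_b: int,
                    docs_both: int) -> float:
    """Higher when the observed co-occurrence is less likely to be chance."""
    p_value = cooccurrence_p_value(total_docs, docs_a, docs_b, docs_both)
    return inf if p_value == 0.0 else -log10(p_value)


# Toy corpus: 10 documents, 3 mention the gene, 4 mention the chemical.
weak = relevance_score(10, 3, 4, 2)    # 2 documents mention both
strong = relevance_score(10, 3, 4, 3)  # 3 documents mention both
```

As in the description above, only the ordering of such scores is meaningful. The exact binomial arithmetic used here is suitable for small examples; a corpus of Medline's size would call for a log-space or library implementation.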
















Developed at the Crown Human Genome Center, Department of Molecular Genetics, the Weizmann Institute of Science
