ZFIN Zebrafish Nomenclature Conventions
Based on the Trends in Genetics Genetic Nomenclature Guide (1998)
1. GENE NAMES AND SYMBOLS
Full gene names are written in lowercase italics, and gene symbols consist of three or more lowercase letters, also italicized. Gene symbols should be unique with respect to other named zebrafish mutants and genes. Gene symbols should not be the same as gene abbreviations in mouse or human, except in cases of established orthology, where the zebrafish gene symbol should match that of the orthologue. Zebrafish gene designations should not include any reference to species, such as d, dr, z, or zf. The use of punctuation such as periods and hyphens in gene names or symbols is discouraged, except under the specific circumstances described below.
Gene names should be registered at ZFIN.
1.1. Gene Nomenclature
Genes should be named after their mammalian orthologue whenever possible. When the mammalian orthologue is known, the same name and abbreviation should be used, except that all letters are lowercase and italicized. Members of a gene family are numbered sequentially.
Examples:
Names - engrailed 1a, engrailed 2b
Symbols - eng1a, eng2b
When a zebrafish gene with an older zebrafish name has been renamed after its mammalian orthologue, it is sometimes still preferable within a publication to refer to the previous name as well. To do so, append the previous name in parentheses after the current symbol. Previous names are searchable at ZFIN.
Examples: shha (syu), bmp2b (swr)
1.2. Duplicated genes
The zebrafish genome contains duplicated segments that resulted from a genome-wide duplication in the ray-finned fish lineage after it diverged from the lobe-finned lineage (the lineage that includes avian and mammalian species). For this reason, zebrafish often have two copies of a gene that is present as a single copy in mammals.
In these cases, the symbols for the two zebrafish genes should be the approved symbol of the human or mouse orthologue followed by "a" or "b" to indicate that they are duplicate copies. Before these symbols are assigned, it is important to provide mapping evidence that the two copies reside on duplicated chromosome segments. It is preferable that all genes in one of the duplicated chromosome segments use the same "a" or "b" suffix, although for historical reasons this will not always be possible. The "a" or "b" suffix does not indicate priority of publication; it is assigned purely on the basis of the suffixes of the surrounding genes. This terminology should not be used for duplicates that arose before the divergence of ray-finned and lobe-finned fish; for these, it is preferable to use the terminology most consistent with mammalian nomenclature.
Examples: hoxa13a, hoxa13b
When there is a unique mammalian orthologue but adding the "a" or "b" suffix would conflict with a different mammalian gene symbol, the numerical suffixes .1, .2 should be appended to the orthologous mammalian gene symbol instead.
Tandem duplicate genes with a single mammalian orthologue should use the same symbol as the mammalian orthologue, with .1, .2, and so on appended. The gene name should include the words "tandem duplicate".
Examples: alkaline phosphatase, intestinal, tandem duplicate 1 (alpi.1) and alkaline phosphatase, intestinal, tandem duplicate 2 (alpi.2)
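As a minimal sketch of the tandem-duplicate rule, assuming the mammalian orthologue's name and symbol have already been converted to zebrafish (lowercase) form, the hypothetical helper `tandem_duplicates` below (not a ZFIN tool) builds the name and symbol for each copy:

```python
def tandem_duplicates(base_name: str, base_symbol: str, count: int) -> list[tuple[str, str]]:
    """Return (name, symbol) pairs for `count` tandem duplicates of one gene.

    base_name / base_symbol: the mammalian orthologue's name and symbol,
    already lowercased for zebrafish use (an assumption of this sketch).
    """
    pairs = []
    for i in range(1, count + 1):
        # Name gets ", tandem duplicate N"; symbol gets ".N".
        pairs.append((f"{base_name}, tandem duplicate {i}", f"{base_symbol}.{i}"))
    return pairs


for name, symbol in tandem_duplicates("alkaline phosphatase, intestinal", "alpi", 2):
    print(f"{name} ({symbol})")
```

The printed lines reproduce the alpi.1 and alpi.2 example above.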
When mammalian gene duplications prevent identification of a unique mammalian orthologue, an alternative gene symbol should be chosen. One possible choice is an approved gene symbol from a unique non-mammalian orthologue. When a gene is homologous to a human gene but orthology is ambiguous, the gene should be named after the closest mammalian homologue, with the word "like" appended to the name of the homologue. In some cases, a gene family described in zebrafish is homologous to a mammalian gene family, but the evolution of the gene family is ambiguous. In these circumstances, the zebrafish gene family should be named with the same stem as the mammalian gene family, with numbering beginning after the last number used in the mammalian family and continuing sequentially throughout the zebrafish family. If members of the gene family are on the same chromosome, adjacent genes should be given sequential numbers.
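The numbering rule for ambiguous gene families can be sketched as follows. This assumes the zebrafish genes are supplied in chromosomal order (so adjacent genes receive sequential numbers) and that the highest number used in the mammalian family is known; the helper name and the stem "xyz" are hypothetical:

```python
def number_family(stem: str, mammalian_max: int, count: int) -> list[str]:
    """Symbols for a zebrafish gene family with ambiguous evolution.

    Numbering starts right after the highest mammalian family number and
    continues sequentially. Genes are assumed to be given in chromosomal
    order, so neighbours on a chromosome get consecutive numbers.
    """
    start = mammalian_max + 1
    return [f"{stem}{n}" for n in range(start, start + count)]


# Hypothetical family: mammals have xyz1..xyz5, zebrafish has three members.
print(number_family("xyz", 5, 3))
```

For a mammalian family ending at 5, three zebrafish members would be numbered 6, 7, and 8.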
1.3. Mutant loci with unidentified genes
Mutant loci for which the gene has not yet been identified are given placeholder gene names. When the gene is identified, the locus is renamed following the standard nomenclature guidelines described above. Genes identified by mutation are typically named to reflect the mutant phenotype, and the symbol should be derived from the full name. Numbers should generally not be used in naming a mutant.
Example: touchy feely, tuf
Mutant names should be registered at ZFIN.
1.4. Genes identified only by genomic sequencing projects
Large-scale genome sequencing projects use a variety of prediction methods to identify open reading frames and genes. Some of these genes are already known, while others are new. Novel genes found in this way often cannot be matched to a known gene and are assigned a name composed of a prefix, a clone name, and an integer. The prefix specifies the research institution that identified the gene (e.g., "si" for the Sanger Institute), and a colon separates the prefix from the clone identifier. A single clone often contains multiple predicted reading frames; these genes are distinguished by a full stop (period) between the clone name and an integer. Integers are assigned to genes in the clone as they are identified and do not indicate the order of the genes. If parts of a gene are found in more than one clone, the name of the first clone in which the 5' portion of the gene is found takes precedence.
Examples: si:bz3c13.1, si:bz3c13.2, si:bz3c13.3
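A minimal parsing sketch for the prefix:clone.integer format follows. It assumes prefixes are lowercase letters and clone identifiers are alphanumeric (hyphens are also tolerated, as an assumption); the trailing integer is optional so names without one are accepted too. `parse_clone_name` and `CloneGeneName` are hypothetical names:

```python
import re
from typing import NamedTuple, Optional

# prefix ":" clone ["." integer]; character classes are assumptions of this sketch.
CLONE_NAME = re.compile(
    r"^(?P<prefix>[a-z]+):(?P<clone>[A-Za-z0-9-]+)(?:\.(?P<index>\d+))?$"
)


class CloneGeneName(NamedTuple):
    prefix: str            # institution/project, e.g. "si" for the Sanger Institute
    clone: str             # clone identifier
    index: Optional[int]   # gene number within the clone; says nothing about gene order


def parse_clone_name(symbol: str) -> CloneGeneName:
    m = CLONE_NAME.match(symbol)
    if m is None:
        raise ValueError(f"not a prefix:clone[.n] name: {symbol!r}")
    idx = m.group("index")
    return CloneGeneName(m.group("prefix"), m.group("clone"), int(idx) if idx else None)


for s in ["si:bz3c13.1", "si:bz3c13.2", "si:bz3c13.3"]:
    print(parse_clone_name(s))
```

Because the integer only records the order in which genes were identified, the sketch keeps it as a plain field rather than using it to sort genes along the clone.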
Genes initially identified by genomic sequencing projects are renamed using standard nomenclature guidelines (described above) as more information about them becomes available.
1.5. Genes identified only by other large scale projects
Large-scale sequencing of ESTs or full-length cDNA clone sets often results in large numbers of unidentified genes. These are given placeholder names consisting of the project prefix, a colon, and a clone number, similar to genes identified by genomic sequencing projects. In these cases, each clone usually contains only a single gene or a fragment of one.
Examples: im:7044540, zgc:165514
1.6. Transcript variants
Transcript variants that originate from the same gene are not normally given different gene symbols and names. However, variants from a single gene can be distinguished in publications by adding to the end of the full name a comma, "transcript variant", and a serial number; and by adding to the end of the symbol an underscore, "tv", and a serial number.
Examples:
Names - myosin VIa, transcript variant 1; myosin VIa, transcript variant 2
Symbols - myo6a_tv1, myo6a_tv2
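The transcript variant convention is mechanical enough to express as a small formatter. This is a sketch only; `transcript_variant` is a hypothetical helper, and italics are left to whatever renders the strings:

```python
def transcript_variant(name: str, symbol: str, n: int) -> tuple[str, str]:
    """Publication labels for transcript variant n of one gene.

    Name:   full name + ", transcript variant " + n
    Symbol: symbol + "_tv" + n
    """
    return f"{name}, transcript variant {n}", f"{symbol}_tv{n}"


print(transcript_variant("myosin VIa", "myo6a", 1))
print(transcript_variant("myosin VIa", "myo6a", 2))
```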
1.7. Pseudogenes
Pseudogenes are sequences that are generally untranscribed and untranslated and that have high sequence similarity to identified genes. However, functional activation of pseudogenes has been shown to occur in some organisms or tissues. Pseudogenes are assigned the next number in the relevant symbol series, suffixed by "p" for pseudogene; for example, prf1.9p is the symbol for "perforin 1.9, pseudogene".
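One way to read "the next number in the relevant symbol series" is sketched below. It assumes the series has the form stem.N, that existing pseudogenes (stem.Np) also occupy their numbers, and that "next" means one more than the highest number in use (gaps are not refilled). The helper and the prf1.1 to prf1.8 series in the usage line are hypothetical:

```python
import re


def next_pseudogene(stem_symbol: str, stem_name: str, existing: list[str]) -> tuple[str, str]:
    """Return (name, symbol) for the next pseudogene in a stem.N series."""
    pattern = re.compile(rf"^{re.escape(stem_symbol)}\.(\d+)p?$")
    matches = (pattern.match(s) for s in existing)
    taken = [int(m.group(1)) for m in matches if m]
    n = max(taken, default=0) + 1
    return f"{stem_name}.{n}, pseudogene", f"{stem_symbol}.{n}p"


# Hypothetical series in which prf1.1 through prf1.8 are already taken.
print(next_pseudogene("prf1", "perforin 1", [f"prf1.{i}" for i in range(1, 9)]))
```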
2. PROTEINS
The protein symbol is the same as the gene symbol, but it is not italicized and its first letter is uppercase.
Examples: Ndrw, Brs, Eng1a, Eng2b, Ntl
Note the differences between zebrafish and mammalian naming conventions:
species / gene / protein
zebrafish / shha / Shha
human / SHH / SHH
mouse / Shh / SHH
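The zebrafish gene-to-protein conversion is a single string operation, sketched here with a hypothetical helper. Italics cannot be represented in a plain string, so rendering the gene symbol in italics and the protein symbol in roman type is left to the caller:

```python
def protein_symbol(gene_symbol: str) -> str:
    """Zebrafish protein symbol: the gene symbol with its first letter uppercased."""
    if not gene_symbol:
        raise ValueError("empty gene symbol")
    # Only the first character changes; str.capitalize() is avoided because
    # it would also lowercase every following character.
    return gene_symbol[0].upper() + gene_symbol[1:]


print([protein_symbol(s) for s in ["shha", "eng1a", "eng2b", "ntl"]])
```

Mouse and human protein symbols, which are fully uppercase, follow different conventions and are outside this sketch.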
When a protein has been renamed based on orthology, it is sometimes convenient in publications to give its more commonly known former name in parentheses after the current name.
Examples: Shha (Syu), Bmp2b (Swr)
3. ALLELES and GENOTYPES
3.1 Line designations
When describing alleles, wild-type alleles are indicated by a superscript "+", while mutant alleles are indicated by a superscript line designation. A line designation is composed of an institution-specific designation followed by a number. The full list of institution designations can be found at ZFIN.
Institution-specific designations should be two or three letters long, preferably two. They should not be the same as a gene name in mouse or human. The institution designation should be followed by a unique number specific to a particular line. Other letters should not immediately follow the institution designation, but they may be appended to the end of the line designation to make it unique. Line designations should contain only alphanumeric characters. Dominant and semidominant alleles have a "d" in the first position of the line designation to distinguish them from recessive alleles. An allele is semidominant when the mutant phenotype of a mutant-allele/wild-type-allele heterozygote is less severe than that of the mutant-allele homozygote. Consequently, the letter "d" cannot begin an institution designation. Line designations for transgenic lines follow the same rules, so the same number cannot be given to both a transgenic line and a mutant allele.
Examples: "b" is the Eugene designation; "m" is for MGH, Boston; "t" is Tuebingen, Germany
wild type: lof^+, ndr2^+, brs^+
mutant: lof^dt2, ndr2^b16, ndr2^m101, ndr2^t219
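The structure of a line designation can be checked with a small parser. This is a hypothetical sketch: it assumes lowercase letters, accepts institution codes of any length (the two-or-three-letter rule applies to new designations, while the examples above use historical one-letter codes), and relies on the rule that no institution code begins with "d", which makes a leading "d" unambiguous:

```python
import re
from typing import NamedTuple

# Optional dominant/semidominant "d", an institution code that does not start
# with "d", the line number, then optional letters appended for uniqueness.
LINE = re.compile(r"^(?P<dom>d?)(?P<inst>[a-ce-z][a-z]*)(?P<num>\d+)(?P<suffix>[a-z]*)$")


class LineDesignation(NamedTuple):
    dominant: bool     # True for dominant or semidominant alleles
    institution: str
    number: int
    suffix: str


def parse_line(designation: str) -> LineDesignation:
    m = LINE.match(designation)
    if m is None:
        raise ValueError(f"invalid line designation: {designation!r}")
    return LineDesignation(m["dom"] == "d", m["inst"], int(m["num"]), m["suffix"])


for d in ["dt2", "b16", "m101", "t219"]:
    print(d, parse_line(d))
```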
3.2 Genotype nomenclature for publications
Heterozygous and homozygous genotypes at a single locus are written with the two alleles separated by a slash ("/").
Examples:
ednrb1a^b140/ednrb1a^+ (heterozygote; can be abbreviated ednrb1a^b140/+)
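The slash notation can be sketched as a small formatter. Plain strings cannot carry superscripts, so this sketch assumes "^" marks the start of a superscript; the helper names are hypothetical. The `abbreviate` option writes the gene symbol only once, as in the heterozygote abbreviation shown above:

```python
def allele(gene: str, designation: str) -> str:
    """Gene symbol plus allele; "^" stands in for the superscript."""
    return f"{gene}^{designation}"


def genotype(gene: str, allele1: str, allele2: str, abbreviate: bool = False) -> str:
    """Two alleles of one locus separated by "/"."""
    if abbreviate:
        # Second allele written without repeating the gene symbol.
        return f"{allele(gene, allele1)}/{allele2}"
    return f"{allele(gene, allele1)}/{allele(gene, allele2)}"


print(genotype("ednrb1a", "b140", "+"))
print(genotype("ednrb1a", "b140", "+", abbreviate=True))
```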