ZFIN Zebrafish Nomenclature Conventions

Based on Trends in Genetics Genetic Nomenclature Guide (1998)


The current nomenclature guidelines update rules established during a discussion session at a meeting in Ringberg, Germany, in March 1992, and are accepted by most zebrafish labs.

Zebrafish Nomenclature Committee (ZNC)


Please use one of our submission forms to propose a new name for a gene or mutant and to provide supporting information. Your submission will be sent to the ZFIN nomenclature coordinator for review and will be treated in confidence.

Submit a Proposed GENE Name
Submit a Proposed LOCUS/LINE Name


A Tutorial for Proposing Zebrafish Gene Nomenclature
Laboratory Line Designations

Other nomenclature guidelines: Human, Mouse, Fly (Drosophila), Yeast (Saccharomyces), Gene families


Full gene names are lowercase italic, and gene symbols are three or more lowercase letters, also italicized. The letters should be unique with respect to other named zebrafish mutants and genes. Gene symbols should not be the same as gene abbreviations in mouse or human, except in cases of established orthology, where the gene symbol should match that of the orthologue. Zebrafish gene designations should not include any reference to species, for example d, dr, z or zf. The use of punctuation such as periods and hyphens in gene names or symbols is discouraged, except under the specific circumstances described below.
Gene names should be registered at ZFIN.

1.1. Gene Nomenclature
Genes should be named after the mammalian orthologue whenever possible. When the mammalian orthologue is known, the same name and abbreviation should be used, except that all letters are lowercase and italicized. Members of a gene family are numbered sequentially.

      Names - engrailed 1a, engrailed 2b
      Symbols - eng1a, eng2b

When a zebrafish gene has been renamed from an older zebrafish name to match its mammalian orthologue, it is sometimes still preferable within a publication to refer to the previous name as well. Do this by appending the previous name in parentheses after the current symbol. Previous names are searchable at ZFIN.

     Examples: shha (syu), bmp2b (swr)

1.2. Duplicated genes
The zebrafish genome contains duplicated segments that resulted from a genome-wide duplication in the ray fin fish lineage after it diverged from the lobe fin lineage (which includes avian and mammalian species). For this reason, zebrafish often have two copies of a gene that is present as a single copy in mammals.

In these cases, the symbols for the two zebrafish genes should be the approved symbol of the human or mouse orthologue followed by "a" or "b" to indicate that they are duplicate copies. Before these symbols are assigned, mapping evidence should be provided showing that the two copies reside on duplicated chromosome segments. Preferably, all copies in one of the duplicate chromosome segments use the same "a" or "b" suffix, although this will not always be possible for historical reasons. The "a" or "b" suffix does not indicate priority of publication; it is assigned purely on the basis of the suffixes of the surrounding genes. This terminology should not be used for duplicates that arose before the divergence of ray fin and lobe fin fish; in these cases, it is preferable to use the terminology most consistent with the mammalian nomenclature.

     Examples: hoxa13a, hoxa13b

When there is a unique mammalian orthologue but adding the a or b suffix would produce a symbol that conflicts with a different mammalian gene symbol, the numerical suffixes .1 and .2 should be appended to the orthologous mammalian gene symbol instead of a and b.

Tandem duplicate genes with a single mammalian orthologue should have gene symbols formed by appending .1, .2 to the symbol of the mammalian orthologue. The gene name should include the words "tandem duplicate".

      Examples: alkaline phosphatase, intestinal, tandem duplicate 1 (alpi.1) and alkaline phosphatase, intestinal, tandem duplicate 2 (alpi.2)
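The suffix rules in sections 1.1 and 1.2 can be summarized as a small decision procedure. The Python sketch below is illustrative only: `duplicate_symbols` is a hypothetical helper, the collection of other mammalian symbols it receives is assumed input, and it does not check the mapping evidence or the neighbouring-gene suffixes that actually decide which copy is "a" and which is "b".

```python
from typing import Iterable, List


def duplicate_symbols(orthologue_symbol: str,
                      mammalian_symbols: Iterable[str],
                      tandem: bool = False) -> List[str]:
    """Hypothetical helper: symbols for two zebrafish copies of one mammalian gene.

    orthologue_symbol: approved human or mouse symbol, e.g. "HOXA13".
    mammalian_symbols: other approved mammalian symbols (assumed input),
        used only to detect clashes with the a/b form.
    tandem: True for tandem duplicates, which always take .1/.2.
    """
    base = orthologue_symbol.lower()  # zebrafish symbols are lowercase
    numbered = [f"{base}.1", f"{base}.2"]
    if tandem:
        return numbered
    lettered = [f"{base}a", f"{base}b"]
    known = {symbol.lower() for symbol in mammalian_symbols}
    # Use .1/.2 instead if an a/b symbol would match a different mammalian gene.
    if any(symbol in known for symbol in lettered):
        return numbered
    return lettered


print(duplicate_symbols("HOXA13", []))             # ['hoxa13a', 'hoxa13b']
print(duplicate_symbols("ALPI", [], tandem=True))  # ['alpi.1', 'alpi.2']
```

The conflict check compares lowercased strings, because the zebrafish symbol is lowercase while the mammalian symbol it might collide with is not.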

When mammalian gene duplications prevent identification of a unique mammalian orthologue, an alternative gene symbol should be chosen; one possible choice is an approved gene symbol from a unique non-mammalian orthologue. When a gene is homologous to a human gene but orthology is ambiguous, the gene should be named after the closest mammalian homologue with the word 'like' appended to the name of the homologue. In some cases, a gene family described in zebrafish is homologous to a mammalian gene family, but the evolution of the gene family is ambiguous. In these circumstances, the zebrafish gene family should be named with the same stem as the mammalian gene family, with gene numbers beginning after the end of the mammalian numbering and continuing sequentially throughout the family. If members of the gene family are on the same chromosome, adjacent genes should be given sequential numbers.
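The numbering rule for ambiguous gene families is simple arithmetic: zebrafish numbers pick up where the mammalian series ends. The sketch below assumes the caller already knows the stem and the highest number used in mammals; `extend_family` and the family "abc" are hypothetical.

```python
from typing import List


def extend_family(stem: str, highest_mammalian_number: int, count: int) -> List[str]:
    """Hypothetical helper: number `count` zebrafish members of a family whose
    relationship to the mammalian family is ambiguous.

    Numbering starts after the last mammalian member and continues
    sequentially. Pass members in chromosomal order when they are adjacent
    on one chromosome so that neighbours receive consecutive numbers.
    """
    first = highest_mammalian_number + 1
    return [f"{stem}{number}" for number in range(first, first + count)]


# Hypothetical family "abc" whose mammalian members run from abc1 to abc5.
print(extend_family("abc", 5, 3))  # ['abc6', 'abc7', 'abc8']
```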

1.3. Mutant loci with unidentified genes
Mutant loci for which the gene has not yet been identified are given placeholder gene names. When the gene is identified, it is renamed following the standard nomenclature guidelines described above. Genes identified by mutation are typically named to reflect the mutant phenotype, and the symbol should be derived from the full name. Numbers should generally not be used in naming a mutant.

      Example: touchy feely, tuf

Mutant names should be registered at ZFIN.

1.4. Genes identified only by genomic sequencing projects
Large-scale genome sequencing projects use a variety of prediction methods to identify open reading frames and genes. Some of these genes are already known, while others are new. Novel genes found in this way often cannot be identified and are instead given a name composed of a prefix, a clone name, and an integer. The prefix specifies the research institution that identified the gene (e.g., "si" for the Sanger Institute), and a colon separates the prefix from the clone identifier. A single clone often contains multiple predicted reading frames; these genes are distinguished by a full stop (period) between the clone name and an integer. Integers are assigned to genes in the clone as they are identified and do not indicate the order of the genes. If part of a gene is found in more than one clone, the name of the first clone in which the 5' portion of the gene is found takes precedence.

      Examples: si:bz3c13.1, si:bz3c13.2, si:bz3c13.3

Genes initially identified by genomic sequencing projects are renamed using standard nomenclature guidelines (described above) as more information about them becomes available.

1.5. Genes identified only by other large scale projects
Large-scale sequencing of ESTs or full-length cDNA clone sets often results in large numbers of unidentified genes. These are given placeholder names consisting of the project prefix, a colon and a clone number, similar to genes identified by genomic sequencing projects. In these cases, each clone usually contains a single gene or a fragment of one.

      Examples: im:7044540, zgc:165514
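The placeholder names in sections 1.4 and 1.5 share one shape (prefix, colon, clone identifier and, for genomic clones, a period and an integer), so they can be split mechanically. This is a hypothetical parser, not a ZFIN tool; it assumes a clone identifier never itself ends in a period followed by digits.

```python
from typing import NamedTuple, Optional


class PlaceholderName(NamedTuple):
    prefix: str            # project or institution prefix, e.g. "si", "zgc"
    clone: str             # clone identifier
    index: Optional[int]   # gene number within the clone, if present


def parse_placeholder(name: str) -> PlaceholderName:
    """Hypothetical parser for prefix:clone[.integer] placeholder names."""
    prefix, colon, rest = name.partition(":")
    if not colon or not prefix or not rest:
        raise ValueError(f"not a prefix:clone name: {name!r}")
    # Genomic-clone names add ".integer" to tell genes in one clone apart.
    clone, dot, tail = rest.rpartition(".")
    if dot and tail.isdigit():
        return PlaceholderName(prefix, clone, int(tail))
    return PlaceholderName(prefix, rest, None)


print(parse_placeholder("si:bz3c13.2"))
# PlaceholderName(prefix='si', clone='bz3c13', index=2)
print(parse_placeholder("zgc:165514"))
# PlaceholderName(prefix='zgc', clone='165514', index=None)
```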

1.6. Transcript variants
Transcript variants that originate from the same gene are not normally given different gene symbols and names. However, variants from a single gene can be distinguished in publications by adding to the end of the full name a comma, "transcript variant", and a serial number; and by adding to the end of the symbol an underscore, "tv", and a serial number.

      Names - myosin VIa, transcript variant 1; myosin VIa, transcript variant 2
      Symbols - myo6a_tv1, myo6a_tv2
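The transcript-variant convention is a fixed string pattern, shown below as a hypothetical helper. Starting serial numbers at 1 is an assumption taken from the examples above.

```python
from typing import Tuple


def transcript_variant(gene_name: str, gene_symbol: str, serial: int) -> Tuple[str, str]:
    """Hypothetical helper: (full name, symbol) for one transcript variant."""
    if serial < 1:
        raise ValueError("transcript variant numbers start at 1")
    full_name = f"{gene_name}, transcript variant {serial}"  # comma + phrase + number
    symbol = f"{gene_symbol}_tv{serial}"                     # underscore + "tv" + number
    return full_name, symbol


print(transcript_variant("myosin VIa", "myo6a", 2))
# ('myosin VIa, transcript variant 2', 'myo6a_tv2')
```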

1.7 Pseudogenes

Pseudogenes are sequences that are generally untranscribed and untranslated and that are highly homologous to identified genes. However, functional activation of pseudogenes has recently been shown to occur in some organisms or tissues. A pseudogene is assigned the next number in the relevant symbol series, suffixed with "p" for pseudogene; for example, prf1.9p is the symbol for "perforin 1.9, pseudogene".
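Assuming the relevant series is a stem followed by ".number", as in the prf1.9p example, "the next number in the series" can be computed as below. `next_pseudogene` is hypothetical, and the series prf1.1 to prf1.8 in the usage line is assumed input, not a statement about which perforin genes exist.

```python
from typing import Iterable


def next_pseudogene(stem: str, existing: Iterable[str]) -> str:
    """Hypothetical helper: next member of a numbered series, suffixed with "p".

    existing: symbols already used in the series, e.g. "prf1.1", "prf1.4p".
    """
    numbers = []
    for symbol in existing:
        if symbol.startswith(stem + "."):
            tail = symbol[len(stem) + 1:].removesuffix("p")  # Python 3.9+
            if tail.isdigit():
                numbers.append(int(tail))
    return f"{stem}.{max(numbers, default=0) + 1}p"


# Hypothetical series in which prf1.1 to prf1.8 are already taken.
print(next_pseudogene("prf1", [f"prf1.{n}" for n in range(1, 9)]))  # prf1.9p
```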


The protein symbol is the same as the gene symbol, but it is not italicized and its first letter is uppercase.

     Examples: Ndrw, Brs, Eng1a, Eng2b, Ntl
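Because the protein symbol differs from the gene symbol only in its first letter (and in not being italicized, which a plain string cannot show), the conversion is a one-liner. This hypothetical helper is a sketch, not a ZFIN function.

```python
def protein_symbol(gene_symbol: str) -> str:
    """Hypothetical helper: zebrafish protein symbol from a gene symbol.

    Only the first character is uppercased; str.capitalize() would also
    force the remaining characters to lowercase.
    """
    return gene_symbol[:1].upper() + gene_symbol[1:]


print([protein_symbol(s) for s in ["eng1a", "eng2b", "ntl"]])
# ['Eng1a', 'Eng2b', 'Ntl']
```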

Note the differences between zebrafish and mammalian naming conventions:

species     gene   protein
zebrafish   shha   Shha
human       SHH    SHH
mouse       Shh    SHH

In publications, it is sometimes convenient to refer to a protein that has been renamed on the basis of orthology by giving its more commonly known previous name in parentheses after the current name.

     Examples: Shha (Syu), Bmp2b (Swr)


3.1 Line designations

When describing genes, wild-type alleles are indicated using a superscript "+", while mutant alleles are indicated using a superscript line designation. Line designations are composed of an institution-specific designation followed by a number. The full list of institution designations can be found at ZFIN.

Institution-specific designations should be two or three letters long, preferably two letters, and should not be the same as a gene name in mouse or human. The institution designation should be followed by a unique number specific to a particular line. Other letters should not immediately follow the institution designation, but may be appended to the end of the line designation to make it unique. Line designations should contain only alphanumeric characters. Dominant and semidominant alleles have a "d" in the first position of the line designation to distinguish them from recessive alleles. Semidominance is defined as the situation in which the mutant phenotype of a mutant/wild-type heterozygote is less severe than that of the mutant homozygote. For this reason, the letter "d" cannot begin an institution designation. Line designations for transgenic lines follow the same rules, so the same number cannot be given to both a transgenic line and a mutant allele.

      Examples: "b" is the Eugene designation; "m" is for MGH, Boston; "t" is Tuebingen, Germany

       wild type: lof + , ndr2 + , brs +

       mutant: lof dt2 , ndr2 b16 , ndr2 m101 , ndr2 t219
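The line-designation rules can be checked with a regular expression. The sketch below is hypothetical: it does not enforce the two-to-three-letter length because legacy one-letter designations (b, m, t) exist, and it also accepts the trailing Tg, Gt or Et used for insertion alleles in the transgenic-line section.

```python
import re
from typing import NamedTuple

# Hypothetical pattern for the line-designation rules (not a ZFIN validator).
LINE_RE = re.compile(
    r"(?P<dominant>d?)"          # leading d marks a dominant/semidominant allele
    r"(?P<institution>[a-z]+)"
    r"(?P<number>\d+)"
    r"(?P<extra>[a-z]*)"         # optional letters appended for uniqueness
    r"(?P<insertion>Tg|Gt|Et)?"  # transgenic, gene trap or enhancer trap insertion
)


class LineDesignation(NamedTuple):
    dominant: bool
    institution: str
    number: int
    extra: str
    insertion: str


def parse_line(designation: str) -> LineDesignation:
    match = LINE_RE.fullmatch(designation)
    if match is None:
        raise ValueError(f"not a valid line designation: {designation!r}")
    if match["institution"].startswith("d"):
        raise ValueError("institution designations cannot begin with d")
    return LineDesignation(
        dominant=match["dominant"] == "d",
        institution=match["institution"],
        number=int(match["number"]),
        extra=match["extra"],
        insertion=match["insertion"] or "",
    )


print(parse_line("dt2"))
# LineDesignation(dominant=True, institution='t', number=2, extra='', insertion='')
print(parse_line("hi2639cTg").extra)  # c
```

Reading a leading "d" as the dominance marker is only unambiguous because, as stated above, no institution designation may begin with "d".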

3.2 Genotype nomenclature for publications

For a single locus, heterozygous and homozygous genotypes are written with the two alleles separated by a slash "/".


      ednrb1a b140 / ednrb1a + (heterozygote, can be abbreviated ednrb1a b140/+ )     

      ednrb1a b140 / ednrb1a b140 (homozygote, can be abbreviated ednrb1a b140/b140 or ednrb1a b140 )     

For homozygous genotypes involving multiple loci, the genotype at each locus is listed in order according to chromosome number, from 1 to 25, with a semicolon to separate loci on different chromosomes.


       ednrb1a b140 ; slc24a5 b16  

For heterozygous genotypes, loci on homologous chromosomes are separated by a slash.


         fgf3 t21142 /fgf3 t24149 ; slc24a5 b16 /slc24a5 m592

For linked loci, the haplotype on each chromosome is written sequentially, with a space separating syntenic loci.   Loci are placed in the order they appear on the chromosome, top to bottom. Homologous chromosomes are separated by a slash, and non-homologous chromosomes are separated by semicolons.


      ednrb1a b140 cx41.8 t1 ; slc24a5 b16

Genotypes of unmapped loci are listed alphabetically within braces, following the genotypes of mapped loci on different chromosomes.


      ednrb1a b140 ; mycbp2 tj236 {edi tl35 } (edi is unmapped, all three loci are written as if they are on different chromosomes)

Poorly resolved loci on the same chromosome are listed alphabetically within braces.


       {abcb 000 def m000 } (poorly resolved loci on same chromosome)

       ednrb1a b140 {abcb 000 def m000 } cx41.8 t1 (poorly resolved loci in a known interval between mapped loci, all on same chromosome)
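The rules in section 3.2 (chromosome order, a slash between homologues, spaces between syntenic loci, semicolons between chromosomes, braces for unmapped loci) can be combined into one formatter. This is a hypothetical sketch: "^" stands in for the superscript allele designation, chromosome numbers and positions must come from the caller, the loci in the usage example are invented placeholders, joining several unmapped loci inside one pair of braces with "; " is an assumption, and poorly resolved loci are not handled.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional


@dataclass
class Locus:
    symbol: str
    allele1: str                      # allele on one homologue ("+" = wild type)
    allele2: str                      # allele on the other homologue
    chromosome: Optional[int] = None  # None means unmapped
    position: float = 0.0             # order along the chromosome, top to bottom


def _allele(symbol: str, designation: str) -> str:
    return f"{symbol}^{designation}"  # "^" stands in for the superscript


def _homologue_pair(loci: List[Locus]) -> str:
    # Syntenic loci are separated by spaces, homologues by a slash;
    # identical homologues (homozygotes) use the abbreviated single form.
    first = " ".join(_allele(locus.symbol, locus.allele1) for locus in loci)
    second = " ".join(_allele(locus.symbol, locus.allele2) for locus in loci)
    return first if first == second else f"{first}/{second}"


def format_genotype(loci: List[Locus]) -> str:
    mapped = sorted((locus for locus in loci if locus.chromosome is not None),
                    key=lambda locus: (locus.chromosome, locus.position))
    text = "; ".join(_homologue_pair(list(group))
                     for _, group in groupby(mapped, key=lambda locus: locus.chromosome))
    unmapped = sorted((locus for locus in loci if locus.chromosome is None),
                      key=lambda locus: locus.symbol)
    if unmapped:
        braces = "{" + "; ".join(_homologue_pair([locus]) for locus in unmapped) + "}"
        text = f"{text} {braces}" if text else braces
    return text


# Invented loci and map positions, for illustration only.
example = [
    Locus("def2", "x1", "x1", chromosome=5),
    Locus("abc1", "y2", "+", chromosome=3, position=10),
    Locus("jkl4", "w4", "+", chromosome=3, position=5),
    Locus("xyz9", "z3", "z3"),
]
print(format_genotype(example))
# jkl4^w4 abc1^y2/jkl4^+ abc1^+; def2^x1 {xyz9^z3}
```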

3.3 Genotype displays in ZFIN

Due to technical constraints, genotypes at ZFIN are shown in alphabetical order by gene, and then by allele designation. See below for the display of complex genotypes involving transgenes or chromosomal rearrangements.


The chromosome numbering system corresponds to the old linkage group designations: what was LG1 is now named Chr1. Chromosomes are designated by non-italic numerals, 1 to 25. Note that cytogenetically identified chromosome numbers differ from the ‘Chr’ designations used for linkage groups and the reference genome sequence. Chromosome differences have not been observed between males and females in laboratory strains.

      Chr1 to Chr25

Chromosome rearrangements are indicated with the following prefixes, followed by the details within parentheses. See below for specific examples. Common prefixes include:

Df, deficiency
Dp, duplication
In, inversion
Is, insertion
T, translocation
Tg, transgene

4.1. Deficiencies

A deficiency is defined as a deletion that removes or disrupts two or more adjacent loci. Intragenic deletions are not treated as deficiencies but as small deletions, and are named as alleles of the disrupted gene (see section 3).

The general format for naming a deficiency is:

     Df(Chr##:xxx)line#

Df indicates deficiency. The term xxx should describe the salient features of the deficiency, as determined by the investigator. In cases where the deficiency removes sequences from named genes, the name should contain the standard symbols for those genes. The deleted genes should be listed in order, when known, separated by commas. The line designation should follow standard nomenclature conventions (institution designation followed by line number).

The chromosome where the deficiency maps should be specified by its number (##) using two digits (e.g., 03 for Chr03) so that computers order the names correctly.

     Example: Df(Chr12:dlx3b,dlx4b,tbx24)b380
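The deficiency format, including the two-digit chromosome number, can be assembled as below. This hypothetical helper covers only the case where xxx is a list of deleted genes; the investigator may instead choose another salient feature. Zero padding matters because as plain strings "Chr12" sorts before "Chr3", whereas "Chr03" sorts before "Chr12".

```python
from typing import Sequence


def deficiency_name(chromosome: int, genes: Sequence[str], line: str) -> str:
    """Hypothetical helper for Df(Chr##:xxx)line#, with xxx = deleted genes."""
    if not 1 <= chromosome <= 25:
        raise ValueError("zebrafish chromosomes are numbered 1 to 25")
    # Two-digit chromosome numbers (03, not 3) keep names sorting correctly.
    return f"Df(Chr{chromosome:02d}:{','.join(genes)}){line}"


print(deficiency_name(12, ["dlx3b", "dlx4b", "tbx24"], "b380"))
# Df(Chr12:dlx3b,dlx4b,tbx24)b380
```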

When a gene is disrupted at one of the two breakpoints of the deficiency, please contact the nomenclature coordinator at ZFIN for assistance (nomenclature@zfin.org).

4.2. Translocations
The general format for naming translocations depends upon the type of translocation:

Reciprocal translocations have two separate chromosomal elements, and each element has a distinct name: T(Chr##;Chr##)xxxline#,##U.##L and T(Chr##;Chr##)xxxline#,##U.##L

T indicates translocation. The elements in the parentheses are the chromosomes involved, separated by a semicolon, with the lower-numbered chromosome listed first. The chromosomes should be specified by their numbers (##) using two digits (e.g., 03 for Chr03) so that computers order the names correctly.

The term xxx should describe some salient feature of the translocation, as determined by the investigator. In cases where the translocation moves a named gene primarily studied by the investigator, xxx would usually be the standard symbol for that gene. Alternatively, xxx could just be an experimental series number.

The line designation should follow standard nomenclature conventions (institution designation followed by line number). The line designation is followed by a comma and then a phrase that indicates the new order of the chromosome segments, starting from the top of the chromosome as conventionally displayed. The first number (##) is the Chr number, followed by an uppercase U to indicate the upper arm or an uppercase L to indicate the lower arm of that chromosome. The location of the centromere is indicated by a period, and no spaces are used. A translocation is written as an allele of a gene when the gene is disrupted at one of the breakpoints of the translocation. There can be as many as four alleles of a translocation.


     T(Chr02;Chr12)ndr2b2131,02U.12L02L and T(Chr02;Chr12)ndr2b2131,12U.12L02L

     This example illustrates a reciprocal translocation where a portion of the lower arm of Chr12 was translocated interstitially into the proximal lower arm of Chr2 and a portion of the lower arm of Chr2 was translocated to the distal lower arm of Chr12.
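The two element names of a reciprocal translocation differ only in the segment-order phrase, so a helper that orders and pads the chromosome numbers can produce both. `translocation_name` is hypothetical; it passes the order phrase through verbatim and does not check it against the actual rearrangement.

```python
from typing import Tuple


def translocation_name(chromosomes: Tuple[int, int], label: str,
                       line: str, new_order: str) -> str:
    """Hypothetical helper for one element: T(Chr##;Chr##)xxxline#,order."""
    first, second = sorted(chromosomes)  # lower-numbered chromosome first
    if " " in new_order:
        raise ValueError("the segment-order phrase must not contain spaces")
    return f"T(Chr{first:02d};Chr{second:02d}){label}{line},{new_order}"


# The two elements of the reciprocal translocation described above.
for order in ("02U.12L02L", "12U.12L02L"):
    print(translocation_name((12, 2), "ndr2", "b2131", order))
# T(Chr02;Chr12)ndr2b2131,02U.12L02L
# T(Chr02;Chr12)ndr2b2131,12U.12L02L
```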

In a resolved translocation, the two elements of the translocation have separated and a mutant line carries just one of them. As a result, the animal is monosomic for some chromosome regions and trisomic for others. In these cases, the mutant line is designated with just one of the elements rather than both, as in the reciprocal designation above. The allele name remains the same to indicate the common origin and common breakpoint.



4.3. Transgenic lines and constructs

Transgenic constructs now have their own pages in ZFIN. Transgenic construct names are important because the construct name is used in the transgenic line nomenclature when the insertion is NOT an allele of a gene (see below).

4.3.1 Transgenic constructs

Construct Nomenclature

Tg(regulatory sequence:coding sequence)

Tg indicates transgene. Within the parentheses, the most salient features of the transgene should be described. Brevity and clarity in the transgene name are favored, in general, over exhaustive detail. Regulatory sequences, which can be derived from either an enhancer or promoter, should be listed to the left of the colon. In general, the regulatory sequence is named for the gene from which it was derived or the gene/transcript that it regulates. Coding sequences are placed to the right of the colon. Not all transgenic constructs will have both regulatory and coding elements, and in this case, the colon will not be used. In cases where a construct utilizes sequences from a named gene, it should contain the standard zebrafish lowercase symbol for that gene.  The entire transgene name should be italicized. 

  • Enhancer trap, promoter trap, gene trap constructs: These all use the same nomenclature conventions as described for transgenic constructs, substituting Et, Pt, Gt as necessary.
  • Transgenes with transcripts in constructs: For those cases where a specific transcript or transcript promoter of a gene is used, the transcript number or name should be used. Note that this use of hyphens is distinct from the use of hyphens in regulatory or coding sequence fusions discussed below: the hyphen is an integral part of the transcript name and demarcates the transcript number for a gene.

     Example: Tg(pitx2-002:GFP) In this case an internal pitx2 gene promoter that generates the pitx2-002 transcript is driving expression of GFP.

  • Fusions in constructs: Regulatory or coding sequence fusions should be separated by hyphens.

     Example: Tg(actb2:stk11-mCherry)   This construct codes for a fused protein of stk11 and mCherry under the control of the actb2 promoter.

  • Promoter elements of differing sizes in constructs: In cases where a number of constructs are generated with different sizes of promoter elements, these may be specified within the parentheses using the length of the upstream DNA:

     Examples: These examples represent two constructs that code for a fusion protein of sptb and GFP driven by an upstream enhancer either 3.5kb or 6.0kb 5' to the hhex gene.

However, in many cases the differences between constructs are too small or too complex to be expressed as a change in the number of kilobases, or they cannot be determined. To differentiate such constructs, a sequential number is inserted between the Tg (or Et, Pt, Gt) and the opening parenthesis, instead of adding further details to the name. Details will be provided in the notes field on the construct page.

      Examples: original construct: Tg1(uxs1:GFP); subsequent construct: Tg2(uxs1:GFP); additional constructs: Tg#(uxs1:GFP)

  • Foreign genes used in constructs: When a gene from a different species is used, the three-letter species abbreviation (Homo sapiens [Hsa], Mus musculus [Mmu], Salmo salar [Ssa]) should be used, followed by a period and the gene symbol. For human genes, use the standard gene symbol convention of all capital letters. For mouse and other species, capitalize the first letter of the gene symbol. An exception to the three-letter rule is Chlamydomonas reinhardtii: please use Cr for this organism, because its three-letter abbreviation (Cre) conflicts with the abbreviation for the Cre-Lox system.

     Example: Tg(Hsa.FGF8:GFP)  Here the promoter of the human FGF8 gene is driving expression of GFP.

     Example: Tg(Ssa.Ndr2:GFP)  Here the promoter of the salmon Ndr2 gene is driving expression of GFP.

  • Mutations used in constructs: When a mutated form of a gene is used in a construct, the mutation(s) can be included in the construct name. The variations should be represented at the most basic level, describing either DNA or amino acid changes. Manuscript descriptions of the mutated sequence should always be related to a reference sequence (accession number) in order to be relevant and informative. The accession number will be added to the construct page.

     Example: Tg(cav3:cav3_R26Q-GFP) The mutation replaces arginine with glutamine at position 26.

     Example: Tg(Hsa.MPZ_1026T>A:EGFP) The nucleotide mutation is in human gene MPZ at position 1026 where T has been replaced by A.

  • Clones in constructs: Transgenic constructs using modified clones, such as BACs and PACs, should be named with the clone type inserted between the "Tg" and the "(". The accession number of the clone must be included in the publication, so it can be associated with the construct. A link to the appropriate clone will be added to the construct page.

    Example: TgPAC(tal1:GFP) GFP is inserted within or near the coding sequence of tal1 in the PAC with the GenBank# AL592495.

  • Two or more cassettes in one construct: If there are two or more cassettes in a construct, it is necessary to distinguish between cassettes by using a comma.

     Example: Tg(isl2b:GAL4,UAS:GFP) Here, the isl2b promoter drives GAL4, and UAS drives GFP.

  • Two or more distinct constructs inserted at the same locus: If 2 or more independently injected constructs are experimentally demonstrated to be integrated at the same locus, each construct should be separated by a comma. In this case, the line will be assigned one line designation (allele) number.  Note: if it is later determined that the constructs integrated in different loci, an additional line number will be needed. 

     Example: Tg(sox9a:mCherry),Tg(uxs1:YFP)line#

  • One promoter drives two or more coding sequences in a construct: When one promoter is used to drive more than one coding sequence, a comma is used to separate the gene names. This includes unidirectional and bidirectional promoters.

     Example: Tg(abhd2a:YFP,mCherry)

  • Construct using a regulatory element that regulates more than one gene in vivo: When a construct uses an enhancer or promoter that regulates two or more genes in vivo, only one of those genes should be represented in the name: either the gene with the lowest number or the gene closest to the regulatory element.

     Example: Tg(dlx1a:GFP) This construct uses regulatory elements of dlx1a and dlx2a to drive expression of GFP. In this case the lower-numbered gene is listed in the name.

     Example: Tg(zic4:Gal4TA4,UAS:mCherry) This construct uses an enhancer of both the zic1 and zic4 genes to drive expression of Gal4TA4, with an additional cassette in which UAS drives mCherry expression. In this case, the gene closest to the enhancer is listed in the name.
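The punctuation rules in the list above (colon between regulatory and coding sequence, hyphen inside a fusion, comma between cassettes or between coding sequences driven by one promoter, an optional serial number or clone type before the parenthesis, and a species prefix for foreign genes) can be combined into a small builder. This is a hypothetical sketch, not a ZFIN tool; in particular, the rules above do not say how a serial number and a clone type combine, so placing the serial number first is an assumption.

```python
from typing import Optional, Sequence, Tuple

# A cassette is (regulatory sequence or None, coding sequences). Each coding
# sequence is a list of parts; parts of one fusion are joined with "-".
Cassette = Tuple[Optional[str], Sequence[Sequence[str]]]


def foreign(species: str, symbol: str) -> str:
    """Prefix a non-zebrafish gene with its species abbreviation, e.g. Hsa.FGF8."""
    return f"{species}.{symbol}"


def cassette_text(regulatory: Optional[str], coding: Sequence[Sequence[str]]) -> str:
    coding_text = ",".join("-".join(parts) for parts in coding)
    if regulatory and coding_text:
        return f"{regulatory}:{coding_text}"
    return regulatory or coding_text  # no colon when one side is missing


def construct_name(cassettes: Sequence[Cassette], kind: str = "Tg",
                   serial: Optional[int] = None, clone: str = "") -> str:
    """Hypothetical builder; placing serial before clone is an assumption."""
    if kind not in {"Tg", "Et", "Pt", "Gt"}:
        raise ValueError(f"unknown construct type: {kind!r}")
    number = "" if serial is None else str(serial)
    body = ",".join(cassette_text(reg, coding) for reg, coding in cassettes)
    return f"{kind}{number}{clone}({body})"


print(construct_name([("actb2", [["stk11", "mCherry"]])]))  # Tg(actb2:stk11-mCherry)
print(construct_name([("isl2b", [["GAL4"]]), ("UAS", [["GFP"]])]))
# Tg(isl2b:GAL4,UAS:GFP)
print(construct_name([(foreign("Hsa", "FGF8"), [["GFP"]])]))  # Tg(Hsa.FGF8:GFP)
```

Mutation tags such as cav3_R26Q and transcript names such as pitx2-002 are simply passed through as part strings; the builder does not interpret them.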

4.3.2 Enhancer trap, promoter trap, gene trap constructs

These all use the same nomenclature convention as described for transgenic constructs above, substituting Et, Pt, or Gt as necessary.

4.3.3 Transgenic lines

Transgenic lines are of two types: those known to create alleles of genes and those not known to create alleles of genes. For a line that does not create an allele of a gene, the feature name consists of the construct name followed by a unique line number, with no superscript. The line number should begin with the laboratory designation followed by a unique number.



For lines that do create alleles of a gene, the standard genetic representation is used: the allele designation is written as a superscript on the gene symbol and is suffixed with Tg to indicate that it is a transgenic insertion allele. Details regarding the construct used will be available on the genotype page. Gene traps and enhancer traps known to create alleles of a gene are handled in a similar fashion, with Gt or Et appended to the allele designation.


      arnt2 hi2639cTg

      parga mn2Et
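The two patterns in this subsection differ in where the line number goes: after the construct name for lines that are not alleles, and in the superscript (with a Tg, Gt or Et suffix) for insertion alleles. The hypothetical helpers below make that explicit; "^" stands in for the superscript, and the Tg(-0.7her5:EGFP)ne2067 name is taken from the ZFIN display example later in this section.

```python
def transgenic_line(construct: str, line: str) -> str:
    """Hypothetical: feature name for an insertion that is not an allele of a gene."""
    return f"{construct}{line}"  # construct name + line number, no superscript


def insertion_allele(gene: str, line: str, kind: str = "Tg") -> str:
    """Hypothetical: insertion allele of a gene ("^" marks the superscript)."""
    if kind not in {"Tg", "Gt", "Et"}:
        raise ValueError(f"unexpected insertion type: {kind!r}")
    return f"{gene}^{line}{kind}"


print(transgenic_line("Tg(-0.7her5:EGFP)", "ne2067"))  # Tg(-0.7her5:EGFP)ne2067
print(insertion_allele("arnt2", "hi2639c"))            # arnt2^hi2639cTg
print(insertion_allele("parga", "mn2", kind="Et"))     # parga^mn2Et
```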

4.3.4 Stable transgenic lines derived from another transgenic or founder line

When new lines of fish with unique, stable, and heritable transgenic compositions are generated from another transgenic line, the derived lines should all receive unique allele/line designations.  If the derived line and the original line are generated in different laboratories, the derived line should be assigned the allele/line designation associated with the second laboratory.


     Original allele generated at Caltech: ct1; new lines derived from ct1 and generated at Caltech: ct2, ct3 OR ct1a, ct1b; new lines generated from ct1, but generated at University of Oregon: b###


4.3.5 Display of complex genotypes at ZFIN

Genotypes at ZFIN are shown in alphabetical order with transgenic lines that are not alleles of genes first, then other alleles.


      Tg(-0.7her5:EGFP)ne2067;hmgcrb s617/s617
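The display order described here can be approximated with a two-part sort key: non-allele transgenic features first, then everything else, each group alphabetical by gene and then allele. This is a hypothetical sketch of the stated rule, not ZFIN's actual code; "^" again stands in for the superscript, and recognizing transgenic features by a leading Tg, Et, Gt or Pt followed by "(" is an assumption.

```python
import re
from typing import Iterable, Tuple

# Transgenic features that are not alleles of a gene: Tg/Et/Gt/Pt, an optional
# serial number or clone type, then "(" (e.g. "Tg(...)ne2067", "TgPAC(...)...").
TRANSGENE_FEATURE = re.compile(r"(Tg|Et|Gt|Pt)\w*\(")


def display_key(feature: str) -> Tuple[int, str, str]:
    group = 0 if TRANSGENE_FEATURE.match(feature) else 1
    gene, _, allele = feature.lower().partition("^")  # "^" marks the superscript
    return group, gene, allele


def zfin_display(features: Iterable[str]) -> str:
    """Hypothetical approximation: non-allele transgenes first, then other
    alleles, each group alphabetical by gene, then allele."""
    return ";".join(sorted(features, key=display_key))


print(zfin_display(["hmgcrb^s617/s617", "Tg(-0.7her5:EGFP)ne2067"]))
# Tg(-0.7her5:EGFP)ne2067;hmgcrb^s617/s617
```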


As described above, zebrafish genes are named based on orthology to a human or mouse gene. If an orthologue cannot be identified, the name that appears first in the literature is given priority, provided it follows the other nomenclature guidelines. ZFIN recommends submitting proposed gene names via the ZFIN form or consulting the Zebrafish Nomenclature Committee (nomenclature@zfin.org) for nomenclature assignment.

When a mutation is found in a previously cloned zebrafish gene, the mutant is referred to as an allele of that gene. If a cloned gene and a mutation known by different names are later found to be the same gene, the name of the gene usually takes priority. The exception to this rule is when the mammalian gene symbol is shorter than two characters, such as the mouse gene brachyury, whose symbol is T. In this case the zebrafish gene retained the original name no tail, ntl.


The genome project began in 1994, and by 1996 the genetic map was closed. NIH funded major programs to develop a doubled haploid meiotic mapping panel, deficiency strains and expressed sequence tags (ESTs). The ESTs and anonymous markers have been mapped on two radiation hybrid panels. The Sanger Institute began full genome sequencing in 2001, and a physical map is being constructed from the BAC libraries used for sequencing. Genomic information is updated regularly on ZFIN.


Current Nomenclature Coordinator:
Amy Singer (asinger@zfin.org), ZFIN Database Team, Zebrafish Information Network, University of Oregon, USA

Active Contributors:
Richard Dorsky (richard.dorsky@neuro.utah.edu), Department of Neurobiology and Anatomy, University of Utah, USA
Marc Ekker (mekker@uottawa.ca), Center for Advanced Research in Environmental Genomics, University of Ottawa, Ontario, Canada
Mary Mullins (mullins@mail.med.upenn.edu), Department of Cell and Developmental Biology, University of Pennsylvania, USA
John Postlethwait (jpostle@oregon.uoregon.edu), Institute of Neuroscience, University of Oregon, USA
Monte Westerfield (monte@uoneuro.uoregon.edu), Institute of Neuroscience, University of Oregon, USA
Jeffrey Yoder (Jeff_Yoder@ncsu.edu), Department of Molecular Biomedical Sciences Center for Comparative Medicine and Translational Research, College of Veterinary Medicine, North Carolina State University, USA

Past Contributors:
Erik Segerdell, XenBase, University of Calgary, Canada
Melissa Haendel (haendel@ohsu.edu), Oregon Health & Science University, USA
Ceri Van Slyke (van_slyke@uoneuro.uoregon.edu), Zebrafish Information Network, University of Oregon, USA
Yvonne Bradford (ybradford@zfin.org), Zebrafish Information Network, University of Oregon, USA
Steve Johnson (sjohnson@genetics.wustl.edu), Department of Genetics, Washington University Medical School, USA


  1. The Zebrafish Science Monitor (1992) Sept. 21.
  2. Mullins, M. (1995) Genetic methods: conventions for naming zebrafish genes in The Zebrafish Book (3rd edition, Westerfield, M., ed.), pp 7.1-7.4, University of Oregon Press.
  3. Genetic Nomenclature Guide, Trends in Genetics (1998).


For questions and advice about appropriate nomenclature, contact us at nomenclature@zfin.org.