2422 Nucleotide and/or Amino Acid Sequence Disclosures in Patent Applications

37 CFR 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications.
(a) Nucleotide and/or amino acid sequences as used in §§ 1.821 through 1.825 are interpreted to mean an unbranched sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides. Branched sequences are specifically excluded from this definition. Sequences with fewer than four specifically defined nucleotides or amino acids are specifically excluded from this section. "Specifically defined" means those amino acids other than "Xaa" and those nucleotide bases other than "n" defined in accordance with the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (1998), including Tables 1 through 6 in Appendix 2, herein incorporated by reference. (Hereinafter "WIPO Standard ST.25 (1998)"). This incorporation by reference was approved by the Director of the Federal Register in accordance with 5 U.S.C. 552(a) and 1 CFR part 51. Copies of WIPO Standard ST.25 (1998) may be obtained from the World Intellectual Property Organization; 34 chemin des Colombettes; 1211 Geneva 20 Switzerland. Copies of ST.25 may be inspected at the Patent Search Room; Crystal Plaza 3, Lobby Level; 2021 South Clark Place; Arlington, VA 22202. Copies may also be inspected at the Office of the Federal Register, 800 North Capitol Street, NW, Suite 700, Washington, DC. Nucleotides and amino acids are further defined as follows:
(1) Nucleotides: Nucleotides are intended to embrace only those nucleotides that can be represented using the symbols set forth in WIPO Standard ST.25 (1998), Appendix 2, Table 1. Modifications, e.g., methylated bases, may be described as set forth in WIPO Standard ST.25 (1998), Appendix 2, Table 2, but shall not be shown explicitly in the nucleotide sequence.

(2) Amino acids: Amino acids are those L-amino acids commonly found in naturally occurring proteins and are listed in WIPO Standard ST.25 (1998), Appendix 2, Table 3. Those amino acid sequences containing D-amino acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated using the symbols shown in WIPO Standard ST.25 (1998), Appendix 2, Table 3 with the modified positions; e.g., hydroxylations or glycosylations, being described as set forth in WIPO Standard ST.25 (1998), Appendix 2, Table 4, but these modifications shall not be shown explicitly in the amino acid sequence. Any peptide or protein that can be expressed as a sequence using the symbols in WIPO Standard ST.25 (1998), Appendix 2, Table 3 in conjunction with a description in the Feature section to describe, for example, modified linkages, cross links and end caps, non-peptidyl bonds, etc., is embraced by this definition.
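
The length thresholds in paragraph (a) can be illustrated with a short sketch. It is not an official test. The function name `meets_definition`, the `kind` labels, and the list-of-symbols input are assumptions made for illustration. The sketch also assumes the sequence is already known to be unbranched and that its symbols come from Table 1 or Table 3.

```python
def meets_definition(symbols, kind):
    """Return True if an unbranched sequence falls within 37 CFR 1.821(a).

    symbols: list of residue symbols, e.g. ["a", "c", "n"] or ["Ala", "Xaa"].
    kind: "nucleotide" or "amino_acid" (hypothetical labels).
    """
    if kind == "nucleotide":
        min_length, placeholder = 10, "n"
    elif kind == "amino_acid":
        min_length, placeholder = 4, "Xaa"
    else:
        raise ValueError(f"unknown kind: {kind!r}")

    # "Specifically defined" residues are those other than "n" or "Xaa".
    specifically_defined = sum(1 for s in symbols if s != placeholder)
    return len(symbols) >= min_length and specifically_defined >= 4
```

Under these assumptions, a ten-nucleotide sequence with only three specifically defined bases falls outside the definition even though it meets the length threshold.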

(b) Patent applications which contain disclosures of nucleotide and/or amino acid sequences, in accordance with the definition in paragraph (a) of this section, shall, with regard to the manner in which the nucleotide and/or amino acid sequences are presented and described, conform exclusively to the requirements of §§ 1.821 through 1.825.

(c) Patent applications which contain disclosures of nucleotide and/or amino acid sequences must contain, as a separate part of the disclosure, a paper copy disclosing the nucleotide and/or amino acid sequences and associated information using the symbols and format in accordance with the requirements of §§ 1.822 and 1.823. This paper copy is hereinafter referred to as the "Sequence Listing." Each sequence disclosed must appear separately in the "Sequence Listing." Each sequence set forth in the "Sequence Listing" shall be assigned a separate sequence identifier. The sequence identifiers shall begin with 1 and increase sequentially by integers. If no sequence is present for a sequence identifier, the code "000" shall be used in place of the sequence. The response for the numeric identifier <160> shall include the total number of SEQ ID NOs, whether followed by a sequence or by the code "000."
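
The numbering rule in paragraph (c) can be sketched as follows. The helper name `build_listing` and the use of `None` to mark an identifier with no sequence are assumptions for illustration only. The sketch does not produce the actual "Sequence Listing" format required by §§ 1.822 and 1.823.

```python
def build_listing(sequences):
    """Assign SEQ ID NOs starting at 1; use "000" where no sequence is present.

    sequences: list where each item is a sequence string, or None for an
    identifier that intentionally has no sequence.
    Returns (entries, total), where total is the value reported under <160>.
    """
    entries = []
    for seq_id, seq in enumerate(sequences, start=1):
        entries.append((seq_id, seq if seq is not None else "000"))
    total = len(entries)  # identifiers followed by "000" are counted too
    return entries, total


entries, total = build_listing(["acgtacgtac", None, "ttgacattga"])
```

Here the second identifier carries the code "000", and the total reported under <160> is still 3.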

(d) Where the description or claims of a patent application discuss a sequence that is set forth in the "Sequence Listing" in accordance with paragraph (c) of this section, reference must be made to the sequence by use of the sequence identifier, preceded by "SEQ ID NO:" in the text of the description or claims, even if the sequence is also embedded in the text of the description or claims of the patent application.
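
As a rough illustration of paragraph (d), the sketch below scans description or claim text for "SEQ ID NO:" references and reports any identifier that falls outside the range used in the "Sequence Listing." The function name `unknown_references` and the regular expression are assumptions. Real applications may phrase references in ways this simple pattern does not catch.

```python
import re

# Matches "SEQ ID NO:" followed by an integer, with optional whitespace.
SEQ_REF = re.compile(r"SEQ ID NO:\s*(\d+)")


def unknown_references(text, total):
    """Return cited identifiers that fall outside 1..total, sorted."""
    cited = {int(m) for m in SEQ_REF.findall(text)}
    return sorted(n for n in cited if not 1 <= n <= total)
```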

(e) A copy of the "Sequence Listing" referred to in paragraph (c) of this section must also be submitted in computer readable form in accordance with the requirements of § 1.824. The computer readable form is a copy of the "Sequence Listing" and will not necessarily be retained as a part of the patent application file. If the computer readable form of a new application is to be identical with the computer readable form of another application of the applicant on file in the Patent and Trademark Office, reference may be made to the other application and computer readable form in lieu of filing a duplicate computer readable form in the new application if the computer readable form in the other application was compliant with all of the requirements of these rules. The new application shall be accompanied by a letter making such reference to the other application and computer readable form, both of which shall be completely identified. In the new application, applicant must also request the use of the compliant computer readable "Sequence Listing" that is already on file for the other application and must state that the paper copy of the "Sequence Listing" in the new application is identical to the computer readable copy filed for the other application.

(f) In addition to the paper copy required by paragraph (c) of this section and the computer readable form required by paragraph (e) of this section, a statement that the content of the paper and computer readable copies are the same must be submitted with the computer readable form, e.g., a statement that "the information recorded in computer readable form is identical to the written sequence listing."

(g) If any of the requirements of paragraphs (b) through (f) of this section are not satisfied at the time of filing under 35 U.S.C. 111(a) or at the time of entering the national stage under 35 U.S.C. 371, applicant will be notified and given a period of time within which to comply with such requirements in order to prevent abandonment of the application. Any submission in reply to a requirement under this paragraph must be accompanied by a statement that the submission includes no new matter.

(h) If any of the requirements of paragraphs (b) through (f) of this section are not satisfied at the time of filing an international application under the Patent Cooperation Treaty (PCT), which application is to be searched by the United States International Searching Authority or examined by the United States International Preliminary Examining Authority, applicant will be sent a notice necessitating compliance with the requirements within a prescribed time period. Any submission in reply to a requirement under this paragraph must be accompanied by a statement that the submission does not include matter which goes beyond the disclosure in the international application as filed. If applicant fails to timely provide the required computer readable form, the United States International Searching Authority shall search only to the extent that a meaningful search can be performed without the computer readable form and the United States International Preliminary Examining Authority shall examine only to the extent that a meaningful examination can be performed without the computer readable form.

37 CFR 1.821 incorporates by reference the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25 (1998), including Tables 1 through 6 of Appendix 2. Copies may be obtained from the World Intellectual Property Organization; 34 chemin des Colombettes; 1211 Geneva 20 Switzerland. Copies may be inspected at the Patent Search Room; Crystal Plaza 3, Lobby Level; 2021 South Clark Place; Arlington, VA 22202. Copies may also be inspected at the Office of the Federal Register, 800 North Capitol Street, NW, Suite 700, Washington, DC 20408. These tables are reproduced below.
WIPO Standard ST.25 (1998), Appendix 2, Table 1, provides that the bases of a nucleotide sequence should be represented using the following one-letter code for nucleotide sequence characters:

Symbol	Meaning	Origin of designation

a	a	adenine

g	g	guanine

c	c	cytosine

t	t	thymine

u	u	uracil

r	g or a	purine

y	t/u or c	pyrimidine

m	a or c	amino

k	g or t/u	keto

s	g or c	strong interactions 3H-bonds

w	a or t/u	weak interactions 2H-bonds

b	g or c or t/u	not a

d	a or g or t/u	not c

h	a or c or t/u	not g

v	a or g or c	not t, not u

n	a or g or c or t/u, unknown, or other	any
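
The one-letter code of Table 1 maps naturally onto a lookup table. The following is a minimal sketch, with the names `TABLE_1`, `symbol_covers`, and `invalid_symbols` invented for illustration. It treats symbols as case-sensitive and lower case, as shown in the table, and it does not model the "unknown, or other" meaning of "n" beyond the four listed bases.

```python
# Each Table 1 symbol mapped to the bases it may stand for.
# "t/u" in the table is expanded to both "t" and "u".
TABLE_1 = {
    "a": "a", "g": "g", "c": "c", "t": "t", "u": "u",
    "r": "ga", "y": "tuc", "m": "ac", "k": "gtu",
    "s": "gc", "w": "atu", "b": "gctu", "d": "agtu",
    "h": "actu", "v": "agc", "n": "agctu",
}


def symbol_covers(symbol, base):
    """Return True if the Table 1 symbol may represent the given base."""
    return base in TABLE_1[symbol]


def invalid_symbols(sequence):
    """Return the sorted characters of a sequence that are not in Table 1."""
    return sorted({ch for ch in sequence if ch not in TABLE_1})
```

For example, `symbol_covers("r", "g")` is true because "r" stands for g or a, while `symbol_covers("b", "a")` is false because "b" means "not a".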

WIPO Standard ST.25 (1998), Appendix 2, Table 2, provides that modified bases may be represented as the corresponding unmodified bases in the sequence itself, if the modified base is one of those listed below and the modification is further described in the Feature section of the Sequence Listing. The codes from the list below may be used in the description (i.e., the specification and drawing, or in the Sequence Listing) but these codes may not be used in the sequence itself.

Symbol Meaning

ac4c 4-acetylcytidine

chm5u 5-(carboxyhydroxymethyl)uridine

cm 2'-O-methylcytidine

cmnm5s2u 5-carboxymethylaminomethyl-2-thiouridine

cmnm5u 5-carboxymethylaminomethyluridine

d dihydrouridine

fm 2'-O-methylpseudouridine

gal q beta, D-galactosylqueuosine

gm 2'-O-methylguanosine

i inosine

i6a N6-isopentenyladenosine

m1a 1-methyladenosine

m1f 1-methylpseudouridine

m1g 1-methylguanosine

m1i 1-methylinosine

m22g 2,2-dimethylguanosine

m2a 2-methyladenosine

m2g 2-methylguanosine

m3c 3-methylcytidine

m5c 5-methylcytidine

m6a N6-methyladenosine

m7g 7-methylguanosine

mam5u 5-methylaminomethyluridine

mam5s2u 5-methoxyaminomethyl-2-thiouridine

man q beta, D-mannosylqueuosine

mcm5s2u 5-methoxycarbonylmethyl-2-thiouridine

mcm5u 5-methoxycarbonylmethyluridine

mo5u 5-methoxyuridine

ms2i6a 2-methylthio-N6-isopentenyladenosine

ms2t6a N-((9-beta-D-ribofuranosyl-2-methylthiopurine-6-yl)carbamoyl)threonine

mt6a N-((9-beta-D-ribofuranosylpurine-6-yl)N-methylcarbamoyl)threonine

mv uridine-5-oxyacetic acid-methylester

o5u uridine-5-oxyacetic acid

osyw wybutoxosine

p pseudouridine

q queuosine

s2c 2-thiocytidine

s2t 5-methyl-2-thiouridine

s2u 2-thiouridine

s4u 4-thiouridine

t 5-methyluridine

t6a N-((9-beta-D-ribofuranosylpurine-6-yl)-carbamoyl)threonine

tm 2'-O-methyl-5-methyluridine

um 2'-O-methyluridine

yw wybutosine

x 3-(3-amino-3-carboxy-propyl)uridine, (acp3)u

WIPO Standard ST.25 (1998), Appendix 2, Table 3, provides that the amino acids should be represented using the following three-letter code with the first letter as a capital.

Symbol Meaning

Ala Alanine

Cys Cysteine

Asp Aspartic Acid

Glu Glutamic Acid

Phe Phenylalanine

Gly Glycine

His Histidine

Ile Isoleucine

Lys Lysine

Leu Leucine

Met Methionine

Asn Asparagine

Pro Proline

Gln Glutamine

Arg Arginine

Ser Serine

Thr Threonine

Val Valine

Trp Tryptophan

Tyr Tyrosine

Asx Asp or Asn

Glx Glu or Gln

Xaa unknown or other
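
A check that an amino acid sequence uses only the Table 3 symbols, with the first letter capitalized, might look like the sketch below. The names `TABLE_3` and `invalid_residues` are hypothetical, and the sketch assumes residues are separated by whitespace, which is a simplification of the actual listing layout.

```python
# The 23 three-letter symbols listed in Table 3.
TABLE_3 = frozenset({
    "Ala", "Cys", "Asp", "Glu", "Phe", "Gly", "His", "Ile", "Lys", "Leu",
    "Met", "Asn", "Pro", "Gln", "Arg", "Ser", "Thr", "Val", "Trp", "Tyr",
    "Asx", "Glx", "Xaa",
})


def invalid_residues(sequence):
    """Return whitespace-separated tokens not found in Table 3.

    The comparison is case-sensitive, so "ala" is reported as invalid.
    """
    return [tok for tok in sequence.split() if tok not in TABLE_3]
```

Codes from Table 4 such as "Hyl" would be reported by this check, consistent with the rule that those codes may not be used in the sequence itself.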

WIPO Standard ST.25 (1998), Appendix 2, Table 4, provides that modified and unusual amino acids may be represented as the corresponding unmodified amino acids in the sequence itself if the modified or unusual amino acid is one of those listed below and the modification is further described in the Feature section of the Sequence Listing. The codes from the list below may be used in the description (i.e., the specification and drawings, or in Sequence Listing) but these codes may not be used in the sequence itself.

Symbol Meaning

Aad 2-Aminoadipic acid

bAad 3-Aminoadipic acid

bAla beta-Alanine, beta-Aminopropionic acid

Abu 2-Aminobutyric acid

4Abu 4-Aminobutyric acid, piperidinic acid

Acp 6-Aminocaproic acid

Ahe 2-Aminoheptanoic acid

Aib 2-Aminoisobutyric acid

bAib 3-Aminoisobutyric acid

Apm 2-Aminopimelic acid

Dbu 2,4-Diaminobutyric acid

Des Desmosine

Dpm 2,2'-Diaminopimelic acid

Dpr 2,3-Diaminopropionic acid

EtGly N-Ethylglycine

EtAsn N-Ethylasparagine

Hyl Hydroxylysine

aHyl allo-Hydroxylysine

3Hyp 3-Hydroxyproline

4Hyp 4-Hydroxyproline

Ide Isodesmosine

aIle allo-Isoleucine

MeGly N-Methylglycine, sarcosine

MeIle N-Methylisoleucine

MeLys 6-N-Methyllysine

MeVal N-Methylvaline

Nva Norvaline

Nle Norleucine

Orn Ornithine

WIPO Standard ST.25 (1998), Appendix 2, Table 5, provides for feature keys related to DNA sequences.

Key Description

allele a related individual or strain contains stable, alternative forms of the same gene which differs from the presented sequence at this location (and perhaps others)

attenuator (1) region of DNA at which regulation of termination of transcription occurs, which controls the expression of some bacterial operons; (2) sequence segment located between the promoter and the first structural gene that causes partial termination of transcription

C_region constant region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; includes one or more exons depending on the particular chain

CAAT_signal CAAT box; part of a conserved sequence located about 75 bp up-stream of the start point of eukaryotic transcription units which may be involved in RNA polymerase binding; consensus=GG (C or T) CAATCT

CDS coding sequence; sequence of nucleotides that corresponds with the sequence of amino acids in a protein (location includes stop codon); feature includes amino acid conceptual translation

conflict independent determinations of the "same" sequence differ at this site or region

D-loop displacement loop; a region within mitochondrial DNA in which a short stretch of RNA is paired with one strand of DNA, displacing the original partner DNA strand in this region; also used to describe the displacement of a region of one strand of duplex DNA by a single stranded invader in the reaction catalyzed by RecA protein

D-segment diversity segment of immunoglobulin heavy chain, and T-cell receptor beta chain

enhancer a cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter

exon region of genome that codes for portion of spliced mRNA; may contain 5'UTR, all CDSs, and 3'UTR

GC_signal GC box; a conserved GC-rich region located upstream of the start point of eukaryotic transcription units which may occur in multiple copies or in either orientation; consensus=GGGCGG

gene region of biological interest identified as a gene and for which a name has been assigned

iDNA intervening DNA; DNA which is eliminated through any of several kinds of recombination

intron a segment of DNA that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it

J_segment joining segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains

LTR long terminal repeat, a sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses

mat_peptide mature peptide or protein coding sequence; coding sequence for the mature or final peptide or protein product following post-translational modification; the location does not include the stop codon (unlike the corresponding CDS)

misc_binding site in nucleic acid which covalently or non-covalently binds another moiety that cannot be described by any other Binding key (primer_bind or protein_bind)

misc_difference feature sequence is different from that presented in the entry and cannot be described by any other Difference key (conflict, unsure, old_sequence, mutation, variation, allele, or modified_base)

misc_feature region of biological interest which cannot be described by any other feature key; a new or rare feature

misc_recomb site of any generalized, site-specific or replicative recombination event where there is a breakage and reunion of duplex DNA that cannot be described by other recombination keys (iDNA and virion) or qualifiers of source key (/insertion_seq, /transposon, /proviral)

misc_RNA any transcript or RNA product that cannot be defined by other RNA keys (prim_transcript, precursor_RNA, mRNA, 5'clip, 3'clip, 5'UTR, 3'UTR, exon, CDS, sig_peptide, transit_peptide, mat_peptide, intron, polyA_site, rRNA, tRNA, scRNA, and snRNA)

misc_signal any region containing a signal controlling or altering gene function or expression that cannot be described by other Signal keys (promoter, CAAT_signal, TATA_signal, -35_signal, -10_signal, GC_signal, RBS, polyA_signal, enhancer, attenuator, terminator, and rep_origin)

misc_structure any secondary or tertiary structure or conformation that cannot be described by other Structure keys (stem_loop and D-loop)

modified_base the indicated nucleotide is a modified nucleotide and should be substituted for by the indicated molecule (given in the mod_base qualifier value)

mRNA messenger RNA; includes 5' untranslated region (5'UTR), coding sequences (CDS, exon) and 3' untranslated region (3'UTR)

mutation a related strain has an abrupt, inheritable change in the sequence at this location

N_region extra nucleotides inserted between rearranged immunoglobulin segments

old_sequence the presented sequence revises a previous version of the sequence at this location

polyA_signal recognition region necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA

polyA_site site on an RNA transcript to which will be added adenine residues by post-transcriptional polyadenylation

precursor_RNA any RNA species that is not yet the mature RNA product; may include 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'clip)

prim_transcript primary (initial, unprocessed) transcript; includes 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'clip)

primer_bind non-covalent primer binding site for initiation of replication, transcription, or reverse transcription; includes site(s) for synthetic, for example, PCR primer elements

promoter region on a DNA molecule involved in RNA polymerase binding to initiate transcription

protein_bind non-covalent protein binding site on nucleic acid

RBS ribosome binding site

repeat_region region of genome containing repeating units

repeat_unit single repeat element

rep_origin origin of replication; starting site for duplication of nucleic acid to give two identical copies

rRNA mature ribosomal RNA; the RNA component of the ribonucleoprotein particle (ribosome) which assembles amino acids into proteins

S_region switch region of immunoglobulin heavy chains; involved in the rearrangement of heavy chain DNA leading to the expression of a different immunoglobulin class from the same B-cell

satellite many tandem repeats (identical or related) of a short basic repeating unit; many have a base composition or other property different from the genome average that allows them to be separated from the bulk (main band) genomic DNA

scRNA small cytoplasmic RNA; any one of several small cytoplasmic RNA molecules present in the cytoplasm and (sometimes) nucleus of a eukaryote

sig_peptide signal peptide coding sequence; coding sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane; leader sequence

snRNA small nuclear RNA; any one of many small RNA species confined to the nucleus; several of the snRNAs are involved in splicing or other RNA processing reactions

source identifies the biological source of the specified span of the sequence; this key is mandatory; every entry will have, as a minimum, a single source key spanning the entire sequence; more than one source key per sequence is permissible

stem_loop hairpin; a double-helical region formed by base-pairing between adjacent (inverted) complementary sequences in a single strand of RNA or DNA

STS Sequence Tagged Site; short, single-copy DNA sequence that characterizes a mapping landmark on the genome and can be detected by PCR; a region of the genome can be mapped by determining the order of a series of STSs

TATA_signal TATA box; Goldberg-Hogness box; a conserved AT-rich septamer found about 25 bp before the start point of each eukaryotic RNA polymerase II transcript unit which may be involved in positioning the enzyme for correct initiation; consensus=TATA(A or T)A(A or T)

terminator sequence of DNA located either at the end of the transcript or adjacent to a promoter region that causes RNA polymerase to terminate transcription; may also be site of binding of repressor protein

transit_peptide transit peptide coding sequence; coding sequence for an N-terminal domain of a nuclear-encoded organellar protein; this domain is involved in post-translational import of the protein into the organelle

tRNA mature transfer RNA, a small RNA molecule (75-85 bases long) that mediates the translation of a nucleic acid sequence into an amino acid sequence

unsure author is unsure of exact sequence in this region

V_region variable region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for the variable amino terminal portion; can be made up from V_segments, D_segments, N_regions, and J_segments

V_segment variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for most of the variable region (V_region) and the last few amino acids of the leader peptide

variation a related strain contains stable mutations from the same gene (for example, RFLPs, polymorphisms, etc.) which differ from the presented sequence at this location (and possibly others)

3'clip 3'-most region of a precursor transcript that is clipped off during processing

3'UTR region at the 3' end of a mature transcript (following the stop codon) that is not translated into a protein

5'clip 5'-most region of a precursor transcript that is clipped off during processing

5'UTR region at the 5' end of a mature transcript (preceding the initiation codon) that is not translated into a protein

-10_signal Pribnow box; a conserved region about 10 bp upstream of the start point of bacterial transcription units which may be involved in binding RNA polymerase; consensus=TAtAaT

-35_signal a conserved hexamer about 35 bp upstream of the start point of bacterial transcription units; consensus=TTGACa [ ] or TGTTGACA [ ]

WIPO Standard ST.25 (1998), Appendix 2, Table 6, provides for feature keys related to protein sequences.

Key Description

CONFLICT different papers report differing sequences

VARIANT authors report that sequence variants exist

VARSPLIC description of sequence variants produced by alternative splicing

MUTAGEN site which has been experimentally altered

MOD_RES post-translational modification of a residue

ACETYLATION N-terminal or other

AMIDATION generally at the C-terminal of a mature active peptide

BLOCKED undetermined N- or C-terminal blocking group

FORMYLATION of the N-terminal methionine

GAMMA-CARBOXYGLUTAMIC ACID

HYDROXYLATION of asparagine, aspartic acid, proline or lysine

METHYLATION generally of lysine or arginine

PHOSPHORYLATION of serine, threonine, tyrosine, aspartic acid or histidine

PYRROLIDONE CARBOXYLIC ACID N-terminal glutamate which has formed an internal cyclic lactam

SULFATATION generally of tyrosine

LIPID covalent binding of a lipidic moiety

MYRISTATE myristate group attached through an amide bond to the N-terminal glycine residue of the mature form of a protein or to an internal lysine residue

PALMITATE palmitate group attached through a thioether bond to a cysteine residue or through an ester bond to a serine or threonine residue

FARNESYL farnesyl group attached through a thioether bond to a cysteine residue

GERANYL-GERANYL geranyl-geranyl group attached through a thioether bond to a cysteine residue

GPI-ANCHOR glycosyl-phosphatidylinositol (GPI) group linked to the alpha-carboxyl group of the C-terminal residue of the mature form of a protein

N-ACYL DIGLYCERIDE N-terminal cysteine of the mature form of a prokaryotic lipoprotein with an amide-linked fatty acid and a glyceryl group to which two fatty acids are linked by ester linkages

DISULFID disulfide bond; the 'FROM' and 'TO' endpoints represent the two residues which are linked by an intra-chain disulfide bond; if the 'FROM' and 'TO' endpoints are identical, the disulfide bond is an interchain one and the description field indicates the nature of the cross-link

THIOLEST thiolester bond; the 'FROM' and 'TO' endpoints represent the two residues which are linked by the thiolester bond

THIOETH thioether bond; the 'FROM' and 'TO' endpoints represent the two residues which are linked by the thioether bond

CARBOHYD glycosylation site; the nature of the carbohydrate (if known) is given in the description field

METAL binding site for a metal ion; the description field indicates the nature of the metal

BINDING binding site for any chemical group (co-enzyme, prosthetic group, etc.); the chemical nature of the group is given in the description field

SIGNAL extent of a signal sequence (prepeptide)

TRANSIT extent of a transit peptide (mitochondrial, chloroplastic, or for a microbody)

PROPEP extent of a propeptide

CHAIN extent of a polypeptide chain in the mature protein

PEPTIDE extent of a released active peptide

DOMAIN extent of a domain of interest on the sequence; the nature of that domain is given in the description field

CA_BIND extent of a calcium-binding region

DNA_BIND extent of a DNA-binding region

NP_BIND extent of a nucleotide phosphate binding region; the nature of the nucleotide phosphate is indicated in the description field

TRANSMEM extent of a transmembrane region

ZN_FING extent of a zinc finger region

SIMILAR extent of a similarity with another protein sequence; precise information, relative to that sequence is given in the description field

REPEAT extent of an internal sequence repetition

HELIX secondary structure: Helices, for example, Alpha-helix, 3(10) helix, or Pi-helix

STRAND secondary structure: Beta-strand, for example, Hydrogen bonded beta-strand, or Residue in an isolated beta-bridge

TURN secondary structure: Turns, for example, H-bonded turn (3-turn, 4-turn, or 5-turn)

ACT_SITE amino acid(s) involved in the activity of an enzyme

SITE any other interesting site on the sequence

INIT_MET the sequence is known to start with an initiator methionine

NON_TER the residue at an extremity of the sequence is not the terminal residue; if applied to position 1, this signifies that the first position is not the N-terminus of the complete molecule; if applied to the last position, it signifies that this position is not the C-terminus of the complete molecule; there is no description field for this key

NON_CONS non consecutive residues; indicates that two residues in a sequence are not consecutive and that there are a number of unsequenced residues between them

UNSURE uncertainties in the sequence; used to describe region(s) of a sequence for which the authors are unsure about the sequence assignment

FILING INTERNATIONALLY
The revisions to 37 CFR 1.821 through 1.825 are the result of an effort to harmonize the PTO, PCT, EPO and JPO Sequence Listing requirements to the extent possible. The requirements of WIPO Standard ST.25 are substantially identical to the requirements of 37 CFR 1.821 through 1.825. PatentIn Version 3.1 software, now available (see MPEP § 2430), generates sequence listings that meet all of the requirements of WIPO Standard ST.25 (1998). The requirements of 37 CFR 1.821 through 1.825, however, are less stringent than the requirements of WIPO Standard ST.25 (1998). Thus, applicants who wish to file in countries which adhere to WIPO Standard ST.25 (1998) should consider the following when not using PatentIn Version 3.1:

(A) The WIPO Standard ST.25 (1998) does not permit submissions using a Macintosh computer;

(B) The WIPO Standard ST.25 (1998) does not accept the range of media permitted by 37 CFR 1.821 through 1.825;

(C) The answers in fields <221> and <222> must use selections from Tables 5 and 6 of WIPO Standard ST.25 (1998) to comply with that standard. The terms from these Tables are considered language neutral vocabulary;

(D) Any free text in numeric identifier <223> of a Sequence Listing will not be translated and thus must also appear in the specification of applications filed under WIPO Standard ST.25 (1998) for compliance;

(E) A CRF filed after the filing of an application under the PCT is not considered to be part of the disclosure and will not be published in the pamphlet;

(F) Paragraph 39 of WIPO Standard ST.25 (1998) requires the specific wording "the information recorded on the form is identical to the written sequence listing"; and

(G) WIPO Standard ST.25 (1998), paragraph 24, requires spaces between specified numeric identifiers in the Sequence Listing.

misc_RNA	any transcript or RNA product that cannot be defined by other RNA keys (prim_transcript, precursor_RNA, mRNA, 5'clip, 3'clip, 5'UTR, 3'UTR, exon, CDS, sig_peptide, transit_peptide, mat_peptide, intron, polyA_site, rRNA, tRNA, scRNA, and snRNA)
misc_signal	any region containing a signal controlling or altering gene function or expression that cannot be described by other Signal keys (promoter, CAAT_signal, TATA_signal, -35_signal, -10_signal, GC_signal, RBS, polyA_signal, enhancer, attenuator, terminator, and rep_origin)
misc_structure	any secondary or tertiary structure or conformation that cannot be described by other Structure keys (stem_loop and D-loop)
modified_base	the indicated nucleotide is a modified nucleotide and should be substituted for by the indicated molecule (given in the mod_base qualifier value)
mRNA	messenger RNA; includes 5' untranslated region (5'UTR), coding sequences (CDS, exon) and 3' untranslated region (3'UTR)
mutation	a related strain has an abrupt, inheritable change in the sequence at this location
N_region	extra nucleotides inserted between rearranged immunoglobulin segments
old_sequence	the presented sequence revises a previous version of the sequence at this location
polyA_signal	recognition region necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA
polyA_site	site on an RNA transcript to which will be added adenine residues by post-transcriptional polyadenylation
precursor_RNA	any RNA species that is not yet the mature RNA product; may include 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'clip)
prim_transcript	primary (initial, unprocessed) transcript; includes 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'clip)
primer_bind	non-covalent primer binding site for initiation of replication, transcription, or reverse transcription; includes site(s) for synthetic, for example, PCR primer elements
promoter	region on a DNA molecule involved in RNA polymerase binding to initiate transcription
protein_bind	non-covalent protein binding site on nucleic acid
RBS	ribosome binding site
repeat_region	region of genome containing repeating units
repeat_unit	single repeat element
rep_origin	origin of replication; starting site for duplication of nucleic acid to give two identical copies
rRNA	mature ribosomal RNA; the RNA component of the ribonucleoprotein particle (ribosome) which assembles amino acids into proteins
S_region	switch region of immunoglobulin heavy chains; involved in the rearrangement of heavy chain DNA leading to the expression of a different immunoglobulin class from the same B-cell
satellite	many tandem repeats (identical or related) of a short basic repeating unit; many have a base composition or other property different from the genome average that allows them to be separated from the bulk (main band) genomic DNA
scRNA	small cytoplasmic RNA; any one of several small cytoplasmic RNA molecules present in the cytoplasm and (sometimes) nucleus of a eukaryote
sig_peptide	signal peptide coding sequence; coding sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane; leader sequence
snRNA	small nuclear RNA; any one of many small RNA species confined to the nucleus; several of the snRNAs are involved in splicing or other RNA processing reactions
source	identifies the biological source of the specified span of the sequence; this key is mandatory; every entry will have, as a minimum, a single source key spanning the entire sequence; more than one source key per sequence is permissible
stem_loop	hairpin; a double-helical region formed by base-pairing between adjacent (inverted) complementary sequences in a single strand of RNA or DNA
STS	Sequence Tagged Site; short, single-copy DNA sequence that characterizes a mapping landmark on the genome and can be detected by PCR; a region of the genome can be mapped by determining the order of a series of STSs
TATA_signal	TATA box; Goldberg-Hogness box; a conserved AT-rich septamer found about 25 bp before the start point of each eukaryotic RNA polymerase II transcript unit which may be involved in positioning the enzyme for correct initiation; consensus=TATA(A or T)A(A or T)
terminator	sequence of DNA located either at the end of the transcript or adjacent to a promoter region that causes RNA polymerase to terminate transcription; may also be site of binding of repressor protein
transit_peptide	transit peptide coding sequence; coding sequence for an N-terminal domain of a nuclear-encoded organellar protein; this domain is involved in post-translational import of the protein into the organelle
tRNA	mature transfer RNA, a small RNA molecule (75-85 bases long) that mediates the translation of a nucleic acid sequence into an amino acid sequence
unsure	author is unsure of exact sequence in this region
V_region	variable region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for the variable amino terminal portion; can be made up from V_segments, D_segments, N_regions, and J_segments
V_segment	variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for most of the variable region (V_region) and the last few amino acids of the leader peptide
variation	a related strain contains stable mutations from the same gene (for example, RFLPs, polymorphisms, etc.) which differ from the presented sequence at this location (and possibly others)
3'clip	3'-most region of a precursor transcript that is clipped off during processing
3'UTR	region at the 3' end of a mature transcript (following the stop codon) that is not translated into a protein
5'clip	5'-most region of a precursor transcript that is clipped off during processing
5'UTR	region at the 5' end of a mature transcript (preceding the initiation codon) that is not translated into a protein
-10_signal	Pribnow box; a conserved region about 10 bp upstream of the start point of bacterial transcription units which may be involved in binding RNA polymerase; consensus=TAtAaT
-35_signal	a conserved hexamer about 35 bp upstream of the start point of bacterial transcription units; consensus=TTGACa [ ] or TGTTGACA [ ]
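
Several of the signal keys above state a consensus sequence. As an illustration only, the fully specified consensus strings can be written as regular expressions and searched for in a nucleotide sequence. This is a sketch under stated assumptions: the dictionary and function names are hypothetical, only the unambiguous consensus strings from the table are included, and a match merely flags a candidate location; it does not establish that the feature is present.

```python
import re

# Consensus strings from the feature key table, with "(X or Y)" written
# as a character class. Mixed-case consensus entries (-10_signal,
# -35_signal) are deliberately left out of this sketch.
CONSENSUS = {
    "CAAT_signal": "GG[CT]CAATCT",
    "GC_signal": "GGGCGG",
    "polyA_signal": "AATAAA",
    "TATA_signal": "TATA[AT]A[AT]",
}


def find_consensus(sequence: str, key: str) -> list[int]:
    """Return 1-based start positions of non-overlapping consensus matches."""
    pattern = re.compile(CONSENSUS[key], re.IGNORECASE)
    return [match.start() + 1 for match in pattern.finditer(sequence)]
```

For example, `find_consensus("ccgggcggtatataaagg", "TATA_signal")` returns `[9]`, and the same sequence searched with `"GC_signal"` returns `[3]`.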


Symbol	Meaning	Origin of designation
a	a	adenine
g	g	guanine
c	c	cytosine
t	t	thymine
u	u	uracil
r	g or a	purine
y	t/u or c	pyrimidine
m	a or c	amino
k	g or t/u	keto
s	g or c	strong interactions 3H-bonds
w	a or t/u	weak interactions 2H-bonds
b	g or c or t/u	not a
d	a or g or t/u	not c
h	a or c or t/u	not g
v	a or g or c	not t, not u
n	a or g or c or t/u, unknown, or other	any
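
The symbol table above can be read as a mapping from each symbol to the set of bases it may stand for. The following is a minimal sketch of that reading; the names `AMBIGUITY`, `is_compatible`, and `matches` are hypothetical, and the "unknown, or other" meaning of "n" is not modeled beyond the five listed bases.

```python
# Each symbol maps to the bases it can represent, following the table.
# "t/u" in the table is expanded to both "t" and "u".
AMBIGUITY = {
    "a": "a", "g": "g", "c": "c", "t": "t", "u": "u",
    "r": "ga", "y": "tuc", "m": "ac", "k": "gtu",
    "s": "gc", "w": "atu",
    "b": "gctu", "d": "agtu", "h": "actu", "v": "agc",
    "n": "agctu",
}


def is_compatible(symbol: str, base: str) -> bool:
    """True if the concrete base is one of the bases the symbol allows."""
    return base.lower() in AMBIGUITY[symbol.lower()]


def matches(pattern: str, sequence: str) -> bool:
    """True if every base in the sequence is allowed by the pattern symbol
    at the same position (both strings must have the same length)."""
    return len(pattern) == len(sequence) and all(
        is_compatible(symbol, base) for symbol, base in zip(pattern, sequence)
    )
```

For example, `matches("gry", "gac")` is true because "r" allows "a" and "y" allows "c", while `matches("gry", "gct")` is false because "r" does not allow "c".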

Symbol	Meaning
ac4c	4-acetylcytidine
chm5u	5-(carboxyhydroxymethyl)uridine
cm	2'-O-methylcytidine
cmnm5s2u	5-carboxymethylaminomethyl-2-thiouridine
cmnm5u	5-carboxymethylaminomethyluridine
d	dihydrouridine
fm	2'-O-methylpseudouridine
gal q	beta, D-galactosylqueuosine
gm	2'-O-methylguanosine
i	inosine
i6a	N6-isopentenyladenosine
m1a	1-methyladenosine
m1f	1-methylpseudouridine
m1g	1-methylguanosine
m1i	1-methylinosine
m22g	2,2-dimethylguanosine
m2a	2-methyladenosine
m2g	2-methylguanosine
m3c	3-methylcytidine
m5c	5-methylcytidine
m6a	N6-methyladenosine
m7g	7-methylguanosine
mam5u	5-methylaminomethyluridine
mam5s2u	5-methoxyaminomethyl-2-thiouridine
man q	beta, D-mannosylqueuosine
mcm5s2u	5-methoxycarbonylmethyl-2-thiouridine
mcm5u	5-methoxycarbonylmethyluridine
mo5u	5-methoxyuridine
ms2i6a	2-methylthio-N6-isopentenyladenosine
ms2t6a	N-((9-beta-D-ribofuranosyl-2-methylthiopurine-6-yl)carbamoyl)threonine
mt6a	N-((9-beta-D-ribofuranosylpurine-6-yl)N-methylcarbamoyl)threonine
mv	uridine-5-oxyacetic acid-methylester
o5u	uridine-5-oxyacetic acid
osyw	wybutoxosine
p	pseudouridine
q	queuosine
s2c	2-thiocytidine
s2t	5-methyl-2-thiouridine
s2u	2-thiouridine
s4u	4-thiouridine
t	5-methyluridine
t6a	N-((9-beta-D-ribofuranosylpurine-6-yl)-carbamoyl)threonine
tm	2'-O-methyl-5-methyluridine
um	2'-O-methyluridine
yw	wybutosine
x	3-(3-amino-3-carboxy-propyl)uridine, (acp3)u

Symbol	Meaning
Ala	Alanine
Cys	Cysteine
Asp	Aspartic Acid
Glu	Glutamic Acid
Phe	Phenylalanine
Gly	Glycine
His	Histidine
Ile	Isoleucine
Lys	Lysine
Leu	Leucine
Met	Methionine
Asn	Asparagine
Pro	Proline
Gln	Glutamine
Arg	Arginine
Ser	Serine
Thr	Threonine
Val	Valine
Trp	Tryptophan
Tyr	Tyrosine
Asx	Asp or Asn
Glx	Glu or Gln
Xaa	unknown or other
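
Under 37 CFR 1.821(a), a sequence with fewer than four specifically defined amino acids is excluded, and "specifically defined" means any amino acid other than "Xaa". The sketch below applies that count to a sequence written with the three-letter symbols from the table above. The names are hypothetical, the input is assumed to be whitespace-separated symbols, and the sketch does not check whether the sequence is unbranched.

```python
# Three-letter symbols from the table above.
SYMBOLS = {
    "Ala", "Cys", "Asp", "Glu", "Phe", "Gly", "His", "Ile", "Lys", "Leu",
    "Met", "Asn", "Pro", "Gln", "Arg", "Ser", "Thr", "Val", "Trp", "Tyr",
    "Asx", "Glx", "Xaa",
}


def specifically_defined(residues: list[str]) -> list[str]:
    """Residues other than "Xaa", per the definition in 37 CFR 1.821(a)."""
    return [residue for residue in residues if residue != "Xaa"]


def has_four_specifically_defined(sequence: str) -> bool:
    """True if the sequence contains at least four specifically defined
    amino acids; raises ValueError on a symbol not in the table."""
    residues = sequence.split()
    unknown = [residue for residue in residues if residue not in SYMBOLS]
    if unknown:
        raise ValueError(f"symbols not in the table: {unknown}")
    return len(specifically_defined(residues)) >= 4
```

For example, "Met Xaa Gly Ala Xaa Leu" has four specifically defined residues and passes, while "Xaa Xaa Gly Ala Xaa" has only two and does not.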

Symbol	Meaning
Aad	2-Aminoadipic acid
bAad	3-Aminoadipic acid
bAla	beta-Alanine, beta-Aminopropionic acid
Abu	2-Aminobutyric acid
4Abu	4-Aminobutyric acid, piperidinic acid
Acp	6-Aminocaproic acid
Ahe	2-Aminoheptanoic acid
Aib	2-Aminoisobutyric acid
bAib	3-Aminoisobutyric acid
Apm	2-Aminopimelic acid
Dbu	2,4-Diaminobutyric acid
Des	Desmosine
Dpm	2,2'-Diaminopimelic acid
Dpr	2,3-Diaminopropionic acid
EtGly	N-Ethylglycine
EtAsn	N-Ethylasparagine
Hyl	Hydroxylysine
aHyl	allo-Hydroxylysine
3Hyp	3-Hydroxyproline
4Hyp	4-Hydroxyproline
Ide	Isodesmosine
aIle	allo-Isoleucine
MeGly	N-Methylglycine, sarcosine
MeIle	N-Methylisoleucine
MeLys	6-N-Methyllysine
MeVal	N-Methylvaline
Nva	Norvaline
Nle	Norleucine
Orn	Ornithine

Key	Description
CONFLICT	different papers report differing sequences
VARIANT	authors report that sequence variants exist
VARSPLIC	description of sequence variants produced by alternative splicing
MUTAGEN	site which has been experimentally altered
MOD_RES	post-translational modification of a residue
ACETYLATION	N-terminal or other
AMIDATION	generally at the C-terminal of a mature active peptide
BLOCKED	undetermined N- or C-terminal blocking group
FORMYLATION	of the N-terminal methionine
GAMMA-CARBOXYGLUTAMIC ACID
HYDROXYLATION	of asparagine, aspartic acid, proline or lysine
METHYLATION	generally of lysine or arginine
PHOSPHORYLATION	of serine, threonine, tyrosine, aspartic acid or histidine
PYRROLIDONE CARBOXYLIC ACID	N-terminal glutamate which has formed an internal cyclic lactam
SULFATATION	generally of tyrosine
LIPID	covalent binding of a lipidic moiety
MYRISTATE	myristate group attached through an amide bond to the N-terminal glycine residue of the mature form of a protein or to an internal lysine residue
PALMITATE	palmitate group attached through a thioether bond to a cysteine residue or through an ester bond to a serine or threonine residue
FARNESYL	farnesyl group attached through a thioether bond to a cysteine residue
GERANYL-GERANYL	geranyl-geranyl group attached through a thioether bond to a cysteine residue
GPI-ANCHOR	glycosyl-phosphatidylinositol (GPI) group linked to the alpha-carboxyl group of the C-terminal residue of the mature form of a protein
N-ACYL DIGLYCERIDE	N-terminal cysteine of the mature form of a prokaryotic lipoprotein with an amide-linked fatty acid and a glyceryl group to which two fatty acids are linked by ester linkages
DISULFID	disulfide bond; the 'FROM' and 'TO' endpoints represent the two residues which are linked by an intra-chain disulfide bond; if the 'FROM' and 'TO' endpoints are identical, the disulfide bond is an interchain one and the description field indicates the nature of the cross-link
THIOLEST	thiolester bond; the 'FROM' and 'TO' endpoints represent the two residues which are linked by the thiolester bond
THIOETH	thioether bond; the 'FROM' and 'TO' endpoints represent the two residues which are linked by the thioether bond
CARBOHYD	glycosylation site; the nature of the carbohydrate (if known) is given in the description field
METAL	binding site for a metal ion; the description field indicates the nature of the metal
BINDING	binding site for any chemical group (co-enzyme, prosthetic group, etc.); the chemical nature of the group is given in the description field
SIGNAL	extent of a signal sequence (prepeptide)
TRANSIT	extent of a transit peptide (mitochondrial, chloroplastic, or for a microbody)
PROPEP	extent of a propeptide
CHAIN	extent of a polypeptide chain in the mature protein
PEPTIDE	extent of a released active peptide
DOMAIN	extent of a domain of interest on the sequence; the nature of that domain is given in the description field
CA_BIND	extent of a calcium-binding region
DNA_BIND	extent of a DNA-binding region
NP_BIND	extent of a nucleotide phosphate binding region; the nature of the nucleotide phosphate is indicated in the description field
TRANSMEM	extent of a transmembrane region
ZN_FING	extent of a zinc finger region
SIMILAR	extent of a similarity with another protein sequence; precise information, relative to that sequence is given in the description field
REPEAT	extent of an internal sequence repetition
HELIX	secondary structure: Helices, for example, Alpha-helix, 3(10) helix, or Pi-helix
STRAND	secondary structure: Beta-strand, for example, Hydrogen bonded beta-strand, or Residue in an isolated beta-bridge
TURN	secondary structure: Turns, for example, H-bonded turn (3-turn, 4-turn, or 5-turn)
ACT_SITE	amino acid(s) involved in the activity of an enzyme
SITE	any other interesting site on the sequence
INIT_MET	the sequence is known to start with an initiator methionine
NON_TER	the residue at an extremity of the sequence is not the terminal residue; if applied to position 1, this signifies that the first position is not the N-terminus of the complete molecule; if applied to the last position, it signifies that this position is not the C-terminus of the complete molecule; there is no description field for this key
NON_CONS	non-consecutive residues; indicates that two residues in a sequence are not consecutive and that there are a number of unsequenced residues between them
UNSURE	uncertainties in the sequence; used to describe region(s) of a sequence for which the authors are unsure about the sequence assignment
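
Two of the keys above, DISULFID and NON_TER, change meaning depending on their endpoints. The following sketch shows one way to encode those two rules. The `Feature` class and `interpret` function are hypothetical illustrations, positions are assumed to be 1-based, and all other keys are passed through without interpretation.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    key: str
    start: int  # FROM endpoint, 1-based (assumption of this sketch)
    end: int    # TO endpoint, 1-based
    description: str = ""


def interpret(feature: Feature, sequence_length: int) -> str:
    """Describe a feature, applying the endpoint rules stated in the table."""
    if feature.key == "DISULFID":
        # Identical endpoints signal an interchain bond.
        if feature.start == feature.end:
            return f"interchain disulfide bond at residue {feature.start}"
        return (
            f"intra-chain disulfide bond linking residues "
            f"{feature.start} and {feature.end}"
        )
    if feature.key == "NON_TER":
        if feature.start == 1:
            return "position 1 is not the N-terminus of the complete molecule"
        if feature.start == sequence_length:
            return "the last position is not the C-terminus of the complete molecule"
    return f"{feature.key} at {feature.start}..{feature.end}"
```

For instance, a DISULFID feature with endpoints 12 and 12 is reported as interchain, while endpoints 12 and 48 are reported as an intra-chain bond between those residues.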