Annotation of genomic sequences and regulatory elements

Annotation files and pairwise alignments, displayed as percent identity plots (pips), are provided for the 8 genomic regions selected by the mouse group: MHC, BTK, DFNA5, ELN, KVQLT1, CFTR, SIL, and HOXA. Additional loci that contribute to our knowledge of annotated regulatory regions are also described: human CFTR, mouse Sil, human HOX A, B, C, and D, and the human beta-globin locus.

There are two sources for human sequences.

Originally the pips were drawn for the genomic sequences posted by Dan Brown at http://waldo.wi.mit.edu/~danb/HomologSeqs/

A new set of exon files is now available with the April GoldenPath coordinates for the 8 test loci (MHC, BTK, DFNA5, ELN, KVQLT1, CFTR, SIL, and HOXA). The CD4 locus is included with GoldenPath coordinates as well. These files are called "GoldenPath exon coordinates" and appear in the list of annotation files for each locus below. The sequences corresponding to these annotations can be retrieved from GoldenPath using the coordinates in the "GoldenPath sequence coordinates" file listed for each locus.

8 Test Regions and Selected Regulatory Annotations

A coordinate-based file of regulatory elements is provided for the human beta-globin locus. Detailed information on the regulatory elements in the other loci (mouse Sil, human CFTR, HOX A, B, C, and D) will be posted shortly.


Explanation of annotations

Exons were annotated with the program sim4, which was used to compare RefSeq entries with the genomic sequences. Additional references for annotations came from dbEST and the Sanger Center (for the MHC locus). Within each "exons" file, the direction of a gene is indicated by an arrowhead (< or >). The genomic coordinates and gene name are followed by separate lines specifying the start and end positions of each exon. A "+" character marks the line giving the first and last nucleotides of the translated region; this information is included for genes that have cDNA entries in RefSeq. The coding region, UTRs, and introns are colored differently in the pips. Each color used in the figure is identified in the underlay legend of the pip. For more information, see the PipMaker instruction page.
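The layout of an "exons" file described above can be read with a short script. The following is a minimal sketch in Python, not part of the distributed files. It assumes a whitespace-separated layout in which a gene line reads `> start end name` (or `<` for the reverse strand), an optional line `+ start end` gives the translated region, and each following line gives one exon as `start end`. The names `Gene` and `parse_exons` and the sample coordinates are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Gene:
    strand: str                            # ">" or "<", as written in the file
    start: int                             # genomic start of the gene
    end: int                               # genomic end of the gene
    name: str
    cds: Optional[Tuple[int, int]] = None  # translated region from the "+" line
    exons: List[Tuple[int, int]] = field(default_factory=list)


def parse_exons(text: str) -> List[Gene]:
    """Parse exons-file text under the layout assumed in the lead-in."""
    genes: List[Gene] = []
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] in ("<", ">"):
            # Gene header: arrowhead, coordinates, then the gene name.
            name = " ".join(parts[3:])
            genes.append(Gene(parts[0], int(parts[1]), int(parts[2]), name))
            continue
        if not genes:
            raise ValueError("coordinate line before any gene header: %r" % raw)
        if parts[0] == "+":
            # First and last nucleotides of the translated region.
            genes[-1].cds = (int(parts[1]), int(parts[2]))
        else:
            # Start and end positions of one exon.
            genes[-1].exons.append((int(parts[0]), int(parts[1])))
    return genes


# Made-up coordinates, for illustration only.
sample = """\
> 100 900 GENE_A
+ 150 850
100 300
700 900
< 1200 1500 GENE_B
1200 1500
"""

for gene in parse_exons(sample):
    print(gene.name, gene.strand, gene.cds, gene.exons)
```

Genes without a RefSeq cDNA entry have no "+" line, so their `cds` field stays `None` in this sketch.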

A list of conserved noncoding sequences (CNSs) that do not overlap annotated exons is provided as a "CNS" file. The conserved sites were identified as gap-free alignments of 100 bp or more with at least 70%, 80%, or 90% identity. (In the HOX region there are a few sites with 100% identity for >100 bp.) The CNSs are also color coded in the pip by the percent identity group into which they fall: 70, 80, or 90.
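The CNS criteria above can be illustrated with a small Python sketch. It is not the procedure used to build the CNS files. It assumes that each candidate is a pair of equal-length aligned strings, that a site is assigned to the highest threshold it meets, and that exon overlap is tested on inclusive coordinates. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns with identical bases (case-insensitive)."""
    if len(a) != len(b) or not a:
        raise ValueError("aligned segments must be non-empty and equal in length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def cns_group(a: str, b: str, min_len: int = 100) -> Optional[int]:
    """Return 90, 80, or 70 for a qualifying gap-free segment, else None."""
    if "-" in a or "-" in b:
        return None  # not gap-free
    if len(a) < min_len or len(a) != len(b):
        return None  # too short, or not a valid aligned pair
    pid = percent_identity(a, b)
    for threshold in (90, 80, 70):  # highest threshold met wins (assumption)
        if pid >= threshold:
            return threshold
    return None


def overlaps_exon(start: int, end: int, exons: List[Tuple[int, int]]) -> bool:
    """True if the inclusive interval [start, end] overlaps any exon interval."""
    return any(start <= e_end and end >= e_start for e_start, e_end in exons)


# A 100 bp segment with 15 mismatches is 85% identical.
human = "ACGT" * 25
mouse = human[:15].translate(str.maketrans("ACGT", "CATG")) + human[15:]
print(cns_group(human, mouse))
```

A segment would be reported as a CNS in this sketch only if `cns_group` returns a value and `overlaps_exon` is false for its coordinates.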

Annotations of regulatory elements are based on published reports of functional elements. A link to the relevant journal article appears as a horizontal, colored bar in the pip. The annotation legend describes the type of link used for each color. Individual transcription factor binding sites are too small to be represented in this manner and therefore are annotated with a color underlay (vertical, colored bar) in the pip.


Please direct comments or questions to elnitski@bio.cse.psu.edu

Data provided by Laura Elnitski, Scott Schwartz, Ross Hardison, and Webb Miller

updated 2001-08-07