***************************************************************************
README:	Sea Urchin Genome Annotation Freeze 5.0, February, 2008
***************************************************************************

Part One:	Files included

1. gene_info.txt
2. gene_feature.txt
3. gene_expression.txt
4. gene_duplication.txt
5. Tree.txt
6. Comments.txt
7. expressionComments.txt
8. CDS.fa.txt
9. mRNA.fa.txt
10. Peptide.fa.txt
11. README.txt

Part Two:	File descriptions

1. gene_info.txt
This file contains most of the information from the Gene Info page. Each line describes one annotated gene and is TAB delimited, containing the following fields (in this order):

GENE ID
ANNOTATOR
GLEAN MODEL CHECK
ADDITIONAL EVIDENCE (of the existence of the gene)
FAMILY MEMBER
COMMON NAME
SYNONYMS
BEST GENBANK HIT
ORTHOLOG/HOMOLOG
GROUP COORDINATOR
DIFFICULT ANNOTATION CATEGORIES
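
The field order above can be turned into a small reader. The following is a minimal sketch, not part of the original distribution. It assumes the file has no header row and that fields contain no embedded TAB characters; the snake_case field names and the function name read_gene_info are our own choices for illustration.

```python
import csv

# Field order as listed in this README; the snake_case spellings are
# illustrative and do not appear in the data files.
GENE_INFO_FIELDS = [
    "gene_id", "annotator", "glean_model_check", "additional_evidence",
    "family_member", "common_name", "synonyms", "best_genbank_hit",
    "ortholog_homolog", "group_coordinator", "difficult_annotation_categories",
]

def read_gene_info(lines):
    """Yield one dict per non-empty line of gene_info.txt.

    `lines` is any iterable of text lines, such as an open file.
    Short lines are padded with empty strings; fields beyond the
    eleventh are dropped.
    """
    # QUOTE_NONE keeps quote characters in free-text fields untouched.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        row = row + [""] * (len(GENE_INFO_FIELDS) - len(row))
        yield dict(zip(GENE_INFO_FIELDS, row))
```

Typical use would be `for gene in read_gene_info(open("gene_info.txt")): ...`, after which each field is available by name, for example `gene["common_name"]`.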
 
2. gene_feature.txt
This file contains information from the Gene Feature page. Each line is TAB delimited and contains the following fields (in this order):

SEQUENCE
SOURCE
FEATURE
START
END
SCORE
STRAND
FRAME
PRODUCT
EXON NUMBER
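
A line of gene_feature.txt can be parsed into a typed record as sketched below. This is only an illustration under stated assumptions: that every line has exactly ten fields, and that START and END are integers. The remaining fields are kept as strings because this README does not say how missing values are written. The names GeneFeature and parse_feature_line are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class GeneFeature:
    # One attribute per field, in the order listed in this README.
    sequence: str
    source: str
    feature: str
    start: int
    end: int
    score: str
    strand: str
    frame: str
    product: str
    exon_number: str

def parse_feature_line(line):
    """Parse one TAB-delimited line of gene_feature.txt."""
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) != 10:
        raise ValueError(f"expected 10 fields, got {len(parts)}")
    # Assumption: START and END are plain integers.
    parts[3] = int(parts[3])
    parts[4] = int(parts[4])
    return GeneFeature(*parts)
```

Raising on a wrong field count, rather than padding, makes malformed lines visible early; a more lenient reader may be preferable if the file turns out to contain short lines.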

3. gene_expression.txt
This file contains most of the information from the Gene Expression page. Each line is TAB delimited containing the following fields (in this order):

GENE ID
TIME
DOMAIN

4. gene_duplication.txt
This file contains information from the Gene Duplication page. Each line is TAB delimited containing the following fields (in this order):

GENE ID
DUPLICATION FOUND (GENE ID) 
DUPLICATION CATEGORY

5. Tree.txt
This file contains information from the NJ_Tree Alignment page. Each entry starts with a header line in the format of:
###Tree_Alignment GENE ID ###
followed by the NJ tree itself.

6. Comments.txt
This file contains the Comments field from the Gene Info page. Each entry starts with a header line in the format of:
###Gene_Info_Comments GENE ID ###
followed by the comments.

7. expressionComments.txt
This file contains the COMMENTS field from the Gene Expression page. Each entry starts with a header line in the format of:
###Gene_Expression_Comments GENE ID ###
followed by the comments.
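
Tree.txt, Comments.txt and expressionComments.txt share the same layout: a "###" header line followed by free text. The sketch below splits such a file into entries. It assumes that each header sits on its own line, that the gene ID is everything between the entry type and the closing "###", and that no body line itself matches the header pattern. The name read_entries is hypothetical.

```python
import re

# Matches e.g. "###Gene_Info_Comments <GENE ID> ###"; group 1 is the
# entry type, group 2 the gene ID.
HEADER = re.compile(r"^###(\w+)\s+(.*?)\s*###\s*$")

def read_entries(lines):
    """Yield (entry_type, gene_id, body) for each header-delimited entry."""
    entry_type = gene_id = None
    body = []
    for line in lines:
        match = HEADER.match(line)
        if match:
            # A new header closes the previous entry, if any.
            if entry_type is not None:
                yield entry_type, gene_id, "".join(body).strip()
            entry_type, gene_id = match.group(1), match.group(2)
            body = []
        elif entry_type is not None:
            body.append(line)
    if entry_type is not None:
        yield entry_type, gene_id, "".join(body).strip()
```

The same function serves all three files; the entry type returned with each record (Tree_Alignment, Gene_Info_Comments or Gene_Expression_Comments) tells the caller which kind of text the body holds.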

8. CDS.fa.txt
This file contains CDS sequence information from the Gene Sequences page. Each FASTA-formatted sequence entry starts with a header line in the format of:
>GENE ID CDS Sequence
 
9. mRNA.fa.txt 
This file contains mRNA sequence information from the Gene Sequences page. Each FASTA-formatted sequence entry starts with a header line in the format of:
>GENE ID mRNA Sequence
 
10. Peptide.fa.txt
This file contains peptide sequence information from the Gene Sequences page. Each FASTA-formatted sequence entry starts with a header line in the format of:
>GENE ID Peptide Sequence
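
The three sequence files use the same header convention, so one reader covers them all. This is a minimal sketch that assumes the gene ID is the first whitespace-separated token after ">" and that sequence lines may be wrapped; read_fasta is a hypothetical name.

```python
def read_fasta(lines):
    """Yield (gene_id, sequence) pairs from a FASTA-formatted file."""
    gene_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if gene_id is not None:
                yield gene_id, "".join(chunks)
            fields = line[1:].split()
            # Assumption: the gene ID contains no whitespace.
            gene_id = fields[0] if fields else ""
            chunks = []
        elif line and gene_id is not None:
            chunks.append(line)
    if gene_id is not None:
        yield gene_id, "".join(chunks)
```

The reader performs no check on the sequence characters themselves; it only joins wrapped lines into one string per gene.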

Part Three:	Notes

1. These files were generated directly from the database dump at 4:30PM Mar-27-2006 CST. These files contain information from user input data. They do not include static data from pre-computation or prediction.
2. Minor changes were made to some files to make the format consistent. The information in the user input data fields remained the same after the formatting process.
3. The sequence files (CDS.fa.txt, mRNA.fa.txt and Peptide.fa.txt) were generated directly from user input stored in the database tables, without additional validation of sequence characters. Some entries may contain non-sequence characters, which may cause problems if the files are used as input for sequence search and alignment programs.
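
Because of note 3, it may be worth screening sequences before passing them to other programs. The sketch below reports the characters in a sequence that fall outside an allowed alphabet. The alphabets are our assumptions, not part of this release: A, C, G, T, U and N in either case for CDS and mRNA, and any letter plus "*" for peptides. Widen them if IUPAC ambiguity codes are expected. The function name invalid_characters is hypothetical.

```python
import re

# Assumed alphabets; each pattern matches one character that is NOT allowed.
NUCLEOTIDE = re.compile(r"[^ACGTUNacgtun]")
PEPTIDE = re.compile(r"[^A-Za-z*]")

def invalid_characters(sequence, pattern):
    """Return the sorted list of distinct disallowed characters."""
    return sorted(set(pattern.findall(sequence)))
```

For example, `invalid_characters("ATG-CX 12", NUCLEOTIDE)` returns `[' ', '-', '1', '2', 'X']`, while a clean sequence gives an empty list.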
