SPU_027229	SPU_027229	Review of the data showed that no GLEAN model matched adequately. Only one set of data (subject hit) was available from the Excel spreadsheet. The sequence was arranged in order but had very poor coverage.	none
SPU_017510	SPU_017510	contains MmsB domain	none
SPU_015296	SPU_015296	contains 2 NIPSNAP superfamily motifs	none
SPU_014269	SPU_014269	contains 2 7tm_1 superfamily motifs	none
SPU_017738	SPU_017738	contains COG0429 domain	none
SPU_007127	SPU_007127	contains COG0429 domain	none
SPU_006443	SPU_006443	contains LMWPc domain	none
SPU_019259	SPU_019259	contains ACTIN domain	none
SPU_010800	SPU_010800	contains Actin domain	none
SPU_004923	SPU_004923	contains CaiA domain	none
SPU_016362	SPU_016362	contains PRK12583 domain	none
SPU_016078	SPU_016078	contains 4 TSP_1 superfamily motifs	none
SPU_004963	SPU_004963	contains 2 Alpha_adaptinC2 superfamily motifs	none
SPU_023262	SPU_023262	contains 2 ArfGap superfamily motifs	none
SPU_020019	SPU_020019	contains COG5580 domain	none
SPU_005567	SPU_005567	contains HutI domain	none
SPU_011369	SPU_011369	orthologous to leucyl aminopeptidases in various species of invertebrates	none
SPU_021203	SPU_021203	contains COG0436 domain	none
SPU_021721	SPU_021721	contains PRK06872 domain	none
SPU_010443	SPU_010443	contains 6 ARM superfamily motifs	none
SPU_009698	SPU_009698	contains 2 BTB superfamily motifs	none
SPU_020434	SPU_020434	contains 2 ANK superfamily motifs	none
SPU_023368	SPU_023368	contains 2 ANK superfamily motifs	none
SPU_009189	SPU_009189	contains 2 ANK superfamily motifs	none
SPU_019054	SPU_019054	contains 6 ANK superfamily motifs	none
SPU_007842	SPU_007842	contains 10 ANK superfamily motifs	none
SPU_004324	SPU_004324	contains 2 ANK superfamily motifs	none
SPU_004095	SPU_004095	contains ANK superfamily motif near N-terminus	none
SPU_022268	SPU_022268	contains 3 ANK superfamily motifs	none
SPU_004356	SPU_004356	contains 5 ANK superfamily motifs	none
SPU_019535	SPU_019535	contains 2 ANK superfamily motifs	none
SPU_019139	SPU_019139	contains 4 Annexin superfamily motifs	none
SPU_022630	SPU_022630	contains 2 IBR superfamily motifs	none
SPU_004121	SPU_004121	contains 5 ARM superfamily motifs	none
SPU_010281	SPU_010281	contains DUF634 domain	none
SPU_021810	SPU_021810	contains 2 ACTIN superfamily motifs	none
SPU_017462	SPU_017462	contains TyrB domain	none
SPU_007950	SPU_007950	contains 2 Asp_Arg_Hydrox superfamily motifs	none
SPU_005845	SPU_005845	orthologous to plant, bacterial, and archaeal proteins	none
SPU_004689	SPU_004689	contains XPG superfamily motif at N-terminus	none
SPU_003416	SPU_003416	contains 2 EGF_Lam superfamily motifs	none
SPU_013739	SPU_013739	contains PRK07883 domain	none
SPU_005458	SPU_005458	contains 2 BPI superfamily motifs	none
SPU_014242	SPU_014242	contains Seipin superfamily motif at N-terminus	none
SPU_021602	SPU_021602	also homologous to cellulases in bacteria and other mollusc spp.	none
SPU_009320	SPU_009320	also orthologous to bacterial D-aminoacylase proteins	none
SPU_006840	SPU_006840	contains 3 PHD superfamily motifs	none
SPU_015092	SPU_015092	contains MDN1 domain	none
SPU_019793	SPU_019793	orthologs are found only in Xenopus, Ciona, and fish.	none
SPU_012637	SPU_012637	contains Arp domain and COG4886 domain	none
SPU_004434	SPU_004434	contains Smc domain	none
SPU_007302	SPU_007302	contains 3 Vinculin domains	none
SPU_007613	SPU_007613	contains NOT2_3_5 superfamily motif at C-terminus and Not3 domain at N-terminus	none
SPU_022867	SPU_022867	contains MiaB domain	none
SPU_020615	SPU_020615	contains RAD18 domain	none
SPU_007275	SPU_007275	contains Smc domain	none
SPU_003772	SPU_003772	contains Cupin_2 superfamily motif at C-terminus	none
SPU_023274	SPU_023274	contains SMC domain	none
SPU_005543	SPU_005543	contains Smc domain	none
SPU_019304	SPU_019304	contains Smc domain	none
SPU_007071	SPU_007071	contains SMC domain	none
SPU_008071	SPU_008071	contains DUF572 superfamily motif at N-terminus	none
SPU_005730	SPU_005730	contains RPN6 domain	none
SPU_018438	SPU_018438	contains 2 WD40 superfamily motifs	none
SPU_009619	SPU_009619	also orthologous to numerous bacterial cryptochrome proteins	none
SPU_012898	SPU_012898	contains CBS domain	none
SPU_021060	SPU_021060	contains COG0436 domain	none
SPU_019401	SPU_019401	contains 3 VWC superfamily motifs	none
SPU_006853	SPU_006853	contains 2 nt_trans superfamily motifs, argS domain, and CysS domain	none
SPU_009707	SPU_009707	contains CysS domain	none
SPU_014819	SPU_014819	contains COG1520 domain	none
SPU_017219	SPU_017219	contains D1tE domain	none
SPU_008465	SPU_008465	contains COG1413 domain	none
SPU_007044	SPU_007044	contains 2 FMO-like domains	none
SPU_022175	SPU_022175	contains PRK0115 domain	none
SPU_004188	SPU_004188	contains 2 DEP superfamily motifs	none
SPU_016403	SPU_016403	contains WD40 superfamily motif at C-terminus	none
SPU_006134	SPU_006134	contains 2 CDC27 superfamily motifs	none
SPU_005958	SPU_005958	contains DnaJ domain	none
SPU_007592	SPU_007592	contains DnaJ superfamily motif and CbpA domain at N-terminus	none
SPU_017259	SPU_017259	contains 4 Thioredoxin-like superfamily motifs	none
SPU_005701	SPU_005701	contains ZUO1 domain	none
SPU_016958	SPU_016958	contains CbpA domain	none
SPU_007324	SPU_007324	contains COG5273 domain	none
SPU_010151	SPU_010151	contains 2 C2 superfamily motifs	none
SPU_013604	SPU_013604	contains 2 UBQ superfamily motifs	none
SPU_018012	SPU_018012	contains Dtyr_deacylase superfamily motif at N-terminus	none
SPU_010886	SPU_010886	contains DYN1 domain and Dynein_heavy domain	none
SPU_021217	SPU_021217	contains 3 C2 superfamily motifs. homologous to C. elegans myoferlin.	none
SPU_016190	SPU_016190	contains PRK05431 domain	none
SPU_007213	SPU_007213	contains FRQ1 domain	none
SPU_016291	SPU_016291	contains PRK00055 domain	none
SPU_018807	SPU_018807	contains Cast domain and Smc domain	none
SPU_011358	SPU_011358	contains Elp3 domain	none
SPU_010189	SPU_010189	contains AP_endonuc_2 domain	none
SPU_005544	SPU_005544	orthologous to mammalian proteins bearing homology to E. coli Eral11 protein	none
SPU_010543	SPU_010543	contains 2 ANK superfamily motifs	none
SPU_016439	SPU_016439	contains Gst domain	none
SPU_007869	SPU_007869	contains IF-2B domain	none
SPU_021449	SPU_021449	contains Paf67 domain	none
SPU_022386	SPU_022386	contains 2 PCI superfamily motifs	none
SPU_008561	SPU_008561	contains RRP4 domain	none
SPU_018481	SPU_018481	contains PRK04282 domain	none
SPU_013995	SPU_013995	contains COG2123 domain	none
SPU_014762	SPU_014762	contains RRP4 domain	none
SPU_012202	SPU_012202	contains CRM1 domain	none
SPU_010172	SPU_010172	contains DadA domain	none
SPU_011159	SPU_011159	contains DadA domain	none
SPU_020945	SPU_020945	contains 2 TrkA domains and PRK06116 domain	none
SPU_003971	SPU_003971	protein function unknown	none
SPU_018968	SPU_018968	contains 2 FEZ superfamily motifs	none
SPU_003480	SPU_003480	contains NosD domain	none
SPU_007523	SPU_007523	contains MFS_1 domain	none
SPU_005896	SPU_005896	also orthologous to formin2 (Drosophila)	none
SPU_015811	SPU_015811	contains FKBP_C superfamily motif at C-terminus	none
SPU_007008	SPU_007008	contains 3 Gelsolin superfamily motifs and COG4886 domain	none
SPU_005489	SPU_005489	contains FH2 superfamily motif at C-terminus	none
SPU_020870	SPU_020870	contains RPN7 domain	none
SPU_005418	SPU_005418	contains 2 GST_N superfamily motifs	none
SPU_007979	SPU_007979	contains MDN1 domain	none
SPU_017876	SPU_017876	contains MoCF_BD superfamily motif at N-terminus	none
SPU_017961	SPU_017961	contains ERG12 domain	none
SPU_008907	SPU_008907	contains Prc domain	none
SPU_005241	SPU_005241	contains NagC domain	none
SPU_020762	SPU_020762	contains BetA domain and PhaC domain	none
SPU_005338	SPU_005338	contains 2 nt_trans superfamily motifs	none
SPU_009518	SPU_009518	contains ECM4 domain	none
SPU_012540	SPU_012540	contains Amidinotransf domain	none
SPU_007935	SPU_007935	contains GDPD domain	none
SPU_011336	SPU_011336	contains COG1215 domain	none
SPU_019011	SPU_019011	contains ATP_bind_1 domain	none
SPU_021732	SPU_021732	contains ATP_bind_1 domain	none
SPU_008412	SPU_008412	contains ATP_bind_1 domain	none
SPU_023160	SPU_023160	contains COG1413 domain	none
SPU_016418	SPU_016418	contains 2 RCC1 superfamily motifs and 2 AST1 domains	none
SPU_006582	SPU_006582	contains HECTc domain	none
SPU_022710	SPU_022710	contains COG1112 domain	none
SPU_007096	SPU_007096	contains 2 HMG-box superfamily motifs	none
SPU_022283	SPU_022283	contains MFS_1 domain	none
SPU_003428	SPU_003428	homologous to bacterial proteins	none
SPU_014437	SPU_014437	contains DUF1740 domain	none
SPU_014008	SPU_014008	also orthologous to bacterial flagella/basal body proteins	none
SPU_016446	SPU_016446	contains MFS_1 domain	none
SPU_021737	SPU_021737	contains Ion_trans domain	none
SPU_019493	SPU_019493	contains fabG domain	none
SPU_005529	SPU_005529	contains HECTc superfamily motif at C-terminus	none
SPU_018019	SPU_018019	contains COG1112 domain	none
SPU_005117	SPU_005117	contains SXM1 domain	none
SPU_008930	SPU_008930	orthologous to numerous Drosophila, C. elegans, plant, and bacterial proteins with unknown function	none
SPU_006666	SPU_006666	contains TNG2 domain	none
SPU_004902	SPU_004902	contains Ptr domain	none
SPU_016108	SPU_016108	contains 4 ANK superfamily motifs	none
SPU_013295	SPU_013295	contains SpoVK domain	none
SPU_021302	SPU_021302	contains ileS domain	none
SPU_022372	SPU_022372	contains JmjC superfamily motif at C-terminus	none
SPU_004203	SPU_004203	contains 4 Kelch_1 superfamily motifs	none
SPU_007977	SPU_007977	contains AAA domain	none
SPU_010222	SPU_010222	contains C2 superfamily motif at N-terminus	none
SPU_021937	SPU_021937	contains Smc domain	none
SPU_007532	SPU_007532	contains 4 IG superfamily motifs	none
SPU_019649	SPU_019649	contains ANK superfamily motif at N-terminus	none
SPU_014368	SPU_014368	contains CDC37_N superfamily motif and Smc domain at N-terminus	none
SPU_017024	SPU_017024	contains WSC domain	none
SPU_022784	SPU_022784	contains 2 Beta-lactamase superfamily motifs	none
SPU_015975	SPU_015975	contains COG2433 domain	none
SPU_006748	SPU_006748	contains COG4886 domain	none
SPU_017780	SPU_017780	contains COG4886 domain	none
SPU_016726	SPU_016726	contains leuS domain	none
SPU_010496	SPU_010496	contains LigT domain	none
SPU_009124	SPU_009124	contains LIM superfamily motif at N-terminus and SH3 superfamily motif at C-terminus. also orthologous to nebulette, non-muscle isoform.	none
SPU_022771	SPU_022771	contains 2 LIM superfamily motifs	none
SPU_022482	SPU_022482	contains 12 PLAT superfamily motifs	none
SPU_022354	SPU_022354	contains PH-like superfamily motif at N-terminus	none
SPU_012372	SPU_012372	contains 6 LY superfamily motifs and 4 Ld1_re superfamily motifs	none
SPU_003918	SPU_003918	contains 4 copies of LY superfamily motif. also orthologous to vitellogenin receptors and lipophorin receptors in various species.	none
SPU_011347	SPU_011347	contains DadA domain	none
SPU_020157	SPU_020157	contains SMC_N domain	none
SPU_020317	SPU_020317	contains MFS_1 domain	none
SPU_009003	SPU_009003	contains YTH1 domain	none
SPU_018902	SPU_018902	contains Man-6P_recep domain	none
SPU_021484	SPU_021484	contains RING superfamily motif near N-terminus, and SSM4 domain	none
SPU_012725	SPU_012725	contains 2 RHOD superfamily motifs and SseA domain	none
SPU_010842	SPU_010842	also orthologous to bacterial cystathionine gamma-synthases	none
SPU_019030	SPU_019030	contains MBD superfamily motif at N-terminus	none
SPU_020642	SPU_020642	contains PRK09426 domain	none
SPU_003840	SPU_003840	homologous to bacterial proteins	none
SPU_018039	SPU_018039	contains COG1041 domain	none
SPU_003476	SPU_003476	homologous to bacterial proteins	none
SPU_009743	SPU_009743	contains 2 Tubulin_FtsZ superfamily motifs	none
SPU_011691	SPU_011691	contains SpoU domain	none
SPU_018187	SPU_018187	contains COG3217 domain	none
SPU_003748	SPU_003748	contains COG1112 domain	none
SPU_020695	SPU_020695	contains COG4642 domain	none
SPU_007308	SPU_007308	contains 3 vWF superfamily motifs	none
SPU_018218	SPU_018218	contains 3 RRM superfamily motifs	none
SPU_017256	SPU_017256	contains B41 domain	none
SPU_011495	SPU_011495	contains HemK domain	none
SPU_005469	SPU_005469	contains 2 DAHP_Synth_1 superfamily motifs	none
SPU_015311	SPU_015311	contains cyclophilin superfamily motif at N-terminus	none
SPU_009268	SPU_009268	contains 2 E1_enzyme_family superfamily motifs	none
SPU_010789	SPU_010789	contains Beach domain	none
SPU_010402	SPU_010402	contains csdA domain	none
SPU_004365	SPU_004365	contains 2 NIF3 superfamily motifs	none
SPU_016077	SPU_016077	contains COG4886 domain	none
SPU_013093	SPU_013093	contains NMD3 domain	none
SPU_008638	SPU_008638	contains 3 TPR superfamily motifs	none
SPU_018518	SPU_018518	contains NmrA domain	none
SPU_018234	SPU_018234	contains Sun domain	none
SPU_006180	SPU_006180	contains Sun domain	none
SPU_008013	SPU_008013	contains Sun domain	none
SPU_009226	SPU_009226	contains UMPH-1 domain	none
SPU_018617	SPU_018617	contains NPL4 domain	none
SPU_010857	SPU_010857	contains BBC superfamily motif near N-terminus	none
SPU_010282	SPU_010282	contains Nol1_Nop2_Fmu domain. also orthologous to NOP2 nucleolar protein.	none
SPU_010119	SPU_010119	contains COG1341 domain	none
SPU_009115	SPU_009115	contains 2 Nup84 superfamily motifs	none
SPU_003282	SPU_003282	contains Nsp1_C superfamily motif at N-terminus	none
SPU_021736	SPU_021736	contains 2 Thioredoxin-like superfamily motifs	none
SPU_014324	SPU_014324	contains f1hG domain	none
SPU_023092	SPU_023092	contains NPY domain	none
SPU_022052	SPU_022052	contains RhsA domain. also Sp-Ten3.	none
SPU_018021	SPU_018021	contains UBQ superfamily motif at N-terminus	none
SPU_021209	SPU_021209	contains 2 CH superfamily motifs. also homologous to parvin, alpha.	none
SPU_011154	SPU_011154	contains 5 CAP_ED superfamily motifs	none
SPU_005687	SPU_005687	contains COG5600 domain	none
SPU_017133	SPU_017133	contains 3 EFhand superfamily motifs	none
SPU_021078	SPU_021078	contains PqqL domain	none
SPU_020031	SPU_020031	contains 2 NHL superfamily motifs and COG3391 domain	none
SPU_010976	SPU_010976	contains Mpv17_PMP22 superfamily motif at C-terminus	none
SPU_011242	SPU_011242	contains 2 TPR superfamily motifs	none
SPU_007443	SPU_007443	contains 2 LNS2 superfamily motifs	none
SPU_005415	SPU_005415	contains 3 GAF superfamily motifs	none
SPU_014444	SPU_014444	contains PhnD domain	none
SPU_012693	SPU_012693	contains 2 C2 superfamily motifs	none
SPU_019706	SPU_019706	contains COG1741 domain	none
SPU_003501	SPU_003501	contains 2 PH-like superfamily motifs, one near N-terminus and one near C-terminus	none
SPU_004859	SPU_004859	contains PH-like superfamily motif at N-terminus	none
SPU_014211	SPU_014211	contains 2 Esterase_lipase superfamily motifs	none
SPU_021703	SPU_021703	contains 3 RNA_pol_B_RPB2 superfamily motifs and PRK08565 domain	none
SPU_006372	SPU_006372	contains RPB7 domain	none
SPU_011964	SPU_011964	contains 2 PQ_loop superfamily motifs	none
SPU_006239	SPU_006239	contains RPR domain	none
SPU_016596	SPU_016596	contains Suf domain. also homologous to rRNA processing proteins in fungi.	none
SPU_005402	SPU_005402	contains Pro_racemase domain	none
SPU_018082	SPU_018082	contains PtrB domain	none
SPU_004206	SPU_004206	contains 3 Peptidase_S28 superfamily motifs	none
SPU_003540	SPU_003540	contains Prominin domain	none
SPU_012590	SPU_012590	contains 2 VSP domains	none
SPU_023096	SPU_023096	contains COG2130 domain	none
SPU_004318	SPU_004318	orthologous to numerous proteins	none
SPU_015013	SPU_015013	contains RPN1 domain	none
SPU_018597	SPU_018597	contains RPN7 domain	none
SPU_021307	SPU_021307	contains 4 CBS_pair superfamily motifs and 2 CBS domains	none
SPU_018419	SPU_018419	contains 2 CAP_ED superfamily motifs	none
SPU_014790	SPU_014790	contains COG4886 domain	none
SPU_007665	SPU_007665	contains PPX1 domain at N-terminus	none
SPU_019552	SPU_019552	contains truA domain	none
SPU_018860	SPU_018860	contains 2 WD40 superfamily motifs	none
SPU_015873	SPU_015873	contains PRK09057 domain	none
SPU_010961	SPU_010961	contains 2 PH-like superfamily motifs, 2 zf-RING superfamily motifs and PRK07003 domain	none
SPU_012920	SPU_012920	contains TB2_DP1_HVA superfamily motif at N-terminus	none
SPU_011638	SPU_011638	contains 2 MtN3_slv superfamily motifs. also orthologous to numerous hypothetical proteins in Drosophila and C. elegans.	none
SPU_020264	SPU_020264	contains rfc domain	none
SPU_015041	SPU_015041	contains COG5109 domain	none
SPU_004118	SPU_004118	contains 2 Rgp1 domains	none
SPU_022469	SPU_022469	also homologous to ring finger protein 115	none
SPU_006149	SPU_006149	also orthologous to ring finger protein 111 in some species	none
SPU_009739	SPU_009739	contains RING superfamily motif at C-terminus	none
SPU_022878	SPU_022878	contains RRM superfamily motif at N-terminus	none
SPU_012932	SPU_012932	contains SpoU domain	none
SPU_020498	SPU_020498	contains RTC domain	none
SPU_009772	SPU_009772	contains PRK05431 domain	none
SPU_017156	SPU_017156	contains 2 ANK superfamily motifs	none
SPU_012288	SPU_012288	contains fabG domain	none
SPU_018638	SPU_018638	contains AtoC domain	none
SPU_015271	SPU_015271	contains PRK10929 domain	none
SPU_012704	SPU_012704	contains MFS_1 domain	none
SPU_010939	SPU_010939	contains 3 Mito_carr superfamily motifs	none
SPU_007195	SPU_007195	contains 3 Mito_carr superfamily motifs	none
SPU_003240	SPU_003240	contains 2 sodium phosphate cotransporter superfamily motifs	none
SPU_017567	SPU_017567	contains 2 DUF6 superfamily motifs	none
SPU_016333	SPU_016333	contains MFS_1 domain	none
SPU_016213	SPU_016213	contains MFS_1 domain	none
SPU_018720	SPU_018720	contains PRK08581 domain	none
SPU_003450	SPU_003450	contains 3 SH3 superfamily motifs	none
SPU_005740	SPU_005740	contains VPS10 domain. also orthologous to lipoprotein receptor relative with 11 ligand-binding repeats (Mus musculus).	none
SPU_016849	SPU_016849	contains Vps5 domain	none
SPU_015644	SPU_015644	contains COG5391 domain	none
SPU_020802	SPU_020802	contains B41 domain	none
SPU_004584	SPU_004584	contains 3 MIR superfamily motifs	none
SPU_009761	SPU_009761	contains 2 E1_enzyme_family superfamily motifs	none
SPU_012457	SPU_012457	contains Bromo_TP superfamily motif at N-terminus, and PHD superfamily motif toward C-terminus	none
SPU_010113	SPU_010113	contains Metallo-dependent_hydrolases superfamily motif at C-terminus	none
SPU_015235	SPU_015235	contains CALCOCO1 domain	none
SPU_022609	SPU_022609	contains Taxilin domain	none
SPU_019748	SPU_019748	contains COG5210 domain	none
SPU_018192	SPU_018192	contains 4 TPR superfamily motifs	none
SPU_004064	SPU_004064	contains 3 TRP superfamily motifs	none
SPU_020261	SPU_020261	contains Deme6 domain	none
SPU_004719	SPU_004719	contains 2 TPR superfamily motifs near C-terminus	none
SPU_013871	SPU_013871	contains THAP superfamily motif at N-terminus	none
SPU_018272	SPU_018272	contains 2 Thioredoxin-like superfamily motifs	none
SPU_009534	SPU_009534	contains PRK06116 domain	none
SPU_020617	SPU_020617	contains PRK06078 domain	none
SPU_003806	SPU_003806	contains IF-2B domain	none
SPU_008625	SPU_008625	contains COG2319 domain	none
SPU_009640	SPU_009640	contains SbcC domain	none
SPU_008054	SPU_008054	contains 2 Sec63 superfamily motifs	none
SPU_017676	SPU_017676	contains 4 TPR superfamily motifs	none
SPU_009409	SPU_009409	also orthologous to numerous Drosophila proteins	none
SPU_005636	SPU_005636	contains 2 DUF6 superfamily motifs and RhaT domain	none
SPU_019214	SPU_019214	contains DUF1183 domain	none
SPU_014995	SPU_014995	contains KAP95 domain	none
SPU_020463	SPU_020463	contains PcnB domain	none
SPU_012116	SPU_012116	contains tsf domain	none
SPU_012328	SPU_012328	contains Ion_trans domain	none
SPU_018207	SPU_018207	contains HECTc domain	none
SPU_021701	SPU_021701	contains 3 E1_enzyme_family superfamily motifs	none
SPU_004630	SPU_004630	contains 2 E1_enzyme_family superfamily motifs	none
SPU_003887	SPU_003887	contains 2 copies of glycos_transf_2 superfamily motifs and RICIN superfamily motifs	none
SPU_016095	SPU_016095	contains 2 Glycos_transf_2 superfamily motifs	none
SPU_004079	SPU_004079	contains TRP superfamily motif at N-terminus, and ARM superfamily motif at C-terminus	none
SPU_015226	SPU_015226	contains PRK08125 domain	none
SPU_019032	SPU_019032	contains DUF1162 superfamily motif at N-terminus. contains MRS6 domain and COG2872 domain	none
SPU_019426	SPU_019426	contains AAA domain	none
SPU_011317	SPU_011317	contains COG2319 domain	none
SPU_017013	SPU_017013	contains MAD domain	none
SPU_010131	SPU_010131	contains COG2319 domain and PRK12704 domain	none
SPU_003791	SPU_003791	contains ATS1 domain	none
SPU_003297	SPU_003297	contains WSC superfamily motif near N-terminus	none
SPU_005771	SPU_005771	also orthologous to numerous miscellaneous proteins	none
SPU_011396	SPU_011396	contains COG5219 domain	none
SPU_020126	SPU_020126	contains AIR1 domain	none
SPU_017358	SPU_017358	contains 2 AIR1 domains	none
SPU_014362	SPU_014362	contains COG5273 domain	none
SPU_009595	SPU_009595	contains COG5273 domain	none
SPU_016930	SPU_016930	contains HRD1 domain	none
SPU_005501	SPU_005501	also orthologous to other proteins and numerous hypothetical proteins	none
SPU_010020	SPU_010020	contains hATC superfamily motif near C-terminus	none
SPU_017578	SPU_017578	also homologous to GH3.3; indole-3-acetic acid amido synthetase (A. thaliana)	none
SPU_005580	SPU_005580	contains 2 WD40 superfamily motifs near C-terminus and COG2319 domain	none
SPU_015223	SPU_015223	contains MRS6 domain at N-terminus	none
SPU_018003	SPU_018003	contains Ndh domain	none
SPU_023085	SPU_023085	contains PRK02106 domain	none
SPU_012514	SPU_012514	contains Atrophin-1 domain	none
SPU_019967	SPU_019967	contains 4 FA58C superfamily motifs	none
SPU_020558	SPU_020558	homologous to numerous sea urchin and worm proteins	none
SPU_012851	SPU_012851	contains PRK03983 domain	none
SPU_012655	SPU_012655	contains 14 EGF_CA superfamily motifs in tandem. also similar to Notch protein (Drosophila).	none
SPU_020607	SPU_020607	contains 4 Gelsolin superfamily motifs	none
SPU_012273	SPU_012273	contains 2 7tm_1 superfamily motifs	none
SPU_015727	SPU_015727	contains 7 MAM superfamily motifs	none
SPU_010871	SPU_010871	orthologous to G protein-coupled receptors	none
SPU_011830	SPU_011830	contains 3 Cu-oxidase superfamily motifs. also orthologous to laccase proteins in many insects.	none
SPU_017701	SPU_017701	contains 4 WD40 superfamily motifs and COG2319 domain. also homologous to transmembrane protein 38a in some mammalian spp.	none
SPU_017293	SPU_017293	contains f1hF domain, PR domain, and COG1112 domain	none
SPU_022421	SPU_022421	contains 2 COG1112 domains and PRK12678 domain	none
SPU_017466	SPU_017466	contains COG2016 domain	none
SPU_009716	SPU_009716	contains B41 domain	none
SPU_026309	SPU_026309	contains Pyr_redox_2 domain	none
SPU_028425	SPU_028425	contains Smc domain	none
SPU_028888	SPU_028888	contains 2 WD40 superfamily motifs	none
SPU_026154	SPU_026154	contains Creatinase domain	none
SPU_028183	SPU_028183	contains CbpA domain	none
SPU_026432	SPU_026432	contains 2 7tm_1 superfamily motifs	none
SPU_027289	SPU_027289	contains FRQ1 domain	none
SPU_027916	SPU_027916	contains Uvr domain	none
SPU_027982	SPU_027982	contains COG2123 domain	none
SPU_026635	SPU_026635	contains 2 CH superfamily motifs	none
SPU_027805	SPU_027805	contains RasGAP domain	none
SPU_028921	SPU_028921	contains SNF2_N domain	none
SPU_026993	SPU_026993	contains 3 EFh superfamily motifs	none
SPU_026146	SPU_026146	also orthologous to mammalian phospholipase A2	none
SPU_028147	SPU_028147	orthologous to bacterial proteins	none
SPU_028848	SPU_028848	contains PHD superfamily motif at C-terminus and RecN domain	none
SPU_028373	SPU_028373	contains 7 PDZ superfamily motifs. also orthologous to numerous PDZ domain containing hypothetical proteins.	none
SPU_027893	SPU_027893	contains COG4942 domain	none
SPU_026210	SPU_026210	contains 2 Kelch_1 superfamily motifs and COG3055 domain	none
SPU_028150	SPU_028150	contains PRK12678 domain	none
SPU_027314	SPU_027314	contains Lipin_N superfamily motif and LNS superfamily motif near N-terminus and C-terminus, respectively	none
SPU_026716	SPU_026716	contains Lon domain	none
SPU_027844	SPU_027844	contains MAD domain	none
SPU_028458	SPU_028458	contains Hyaluronidase_2 superfamily motif at N-terminus	none
SPU_026683	SPU_026683	contains 2 MIT superfamily motifs	none
SPU_028214	SPU_028214	contains Smc domain. also orthologous to KIAA0774-like proteins and other hypothetical proteins.	none
SPU_028926	SPU_028926	contains B41 domain	none
SPU_026774	SPU_026774	contains 2 COG5098 domains and PRK09894 domain	none
SPU_028807	SPU_028807	contains 2 ArsB_NhaD_permease superfamily motifs	none
SPU_027903	SPU_027903	contains Smc domain	none
SPU_026124	SPU_026124	contains 2 ANK superfamily motifs and Arp domain	none
SPU_026933	SPU_026933	contains TruA domain	none
SPU_028354	SPU_028354	contains PRK09212 domain, PRK11892 domain, AcoB domain, and Dxs domain	none
SPU_026695	SPU_026695	contains VHS_ENTH_ANTH superfamily motif at N-terminus	none
SPU_028579	SPU_028579	contains PRK04195 domain	none
SPU_028894	SPU_028894	contains 2 UBQ superfamily motifs	none
SPU_026170	SPU_026170	contains 2 SH3 superfamily motifs	none
SPU_026814	SPU_026814	contains RhaT domain	none
SPU_028069	SPU_028069	contains SAM superfamily motif near N-terminus	none
SPU_028110	SPU_028110	contains SANT superfamily motif near C-terminus	none
SPU_026042	SPU_026042	contains TSP_1 superfamily and PLAC superfamily motifs near C-terminus	none
SPU_028461	SPU_028461	contains Smc domain	none
SPU_028217	SPU_028217	contains Icc domain	none
SPU_026467	SPU_026467	contains 2 WD40 superfamily motifs	none
SPU_026244	SPU_026244	contains 2 HYR superfamily motifs	none
SPU_028567	SPU_028567	contains 6 vWA/vWFA superfamily motifs. also orthologous to numerous hypothetical proteins.	none
SPU_028754	SPU_028754	contains 2 AIG2 superfamily motifs	none
SPU_026317	SPU_026317	contains THAP superfamily motif at N-terminus	none
SPU_027433	SPU_027433	also orthologous to various subtypes of nicotinic cholinergic receptors	none
SPU_026649	SPU_026649	S. purpuratus protein family containing EGF domain	none
SPU_014248	SPU_014248	contains 2 SPEC superfamily motifs	none
SPU_005938	SPU_005938	contains 2 TSP_1 superfamily motifs	none
SPU_015342	SPU_015342	contains COG1112 domain	none
SPU_004005	SPU_004005	contains 2 PH-like superfamily motifs	none
SPU_012504	SPU_012504	contains 7 COG5022 domain motifs and AF-4 domain	none
SPU_010024	SPU_010024	contains 8 COG5022 domain motifs	none
SPU_004375	SPU_004375	huge protein with 44,492 amino acids. only 550 amino acids from N-terminus have been determined.	none
SPU_006457	SPU_006457	contains Smc domain	none
SPU_004989	SPU_004989	contains 2 CUB superfamily motifs	none
SPU_014369	SPU_014369	contains MDN1 domain	none
SPU_002057	SPU_002057	contains MDN1 domain	none
SPU_008503	SPU_008503	contains 3 SMC_N domain motifs	none
SPU_016359	SPU_016359	contains Smc domain and SMC_N domain	none
SPU_007254	SPU_007254	contains SMC_N domain and Smc domain	none
SPU_016258	SPU_016258	contains Smc domain and SMC_N domain	none
SPU_010736	SPU_010736	contains TolA domain motif near C-terminus	none
SPU_015350	SPU_015350	contains 3 ATP_gua_Ptrans superfamily C-terminal motifs and 3 ATP_gua_Ptrans superfamily N-terminal motifs and PRK01059 domain and COG3869 domain	none
SPU_010441	SPU_010441	contains 5 CCP superfamily motifs and 3 HYR superfamily motifs	none
SPU_009854	SPU_009854	contains 12 CUB superfamily motifs in tandem over the entire length of the protein	none
SPU_002016	SPU_002016	contains 6 CUB superfamily motifs	none
SPU_010596	SPU_010596	contains 2 MIF4G superfamily motifs	none
SPU_005335	SPU_005335	contains MPH1 domain	none
SPU_009877	SPU_009877	contains 2 DEXDc superfamily motifs	none
SPU_002015	SPU_002015	contains DENN superfamily motif at N-terminus and WD40 superfamily motif at C-terminus	none
SPU_015671	SPU_015671	contains 2 PRK05850 domain motifs	none
SPU_013590	SPU_013590	homologous only to 4 Drosophila proteins	none
SPU_007997	SPU_007997	contains MDN1 domain and DYN1 domain	none
SPU_012416	SPU_012416	contains DHC_N1 domain	none
SPU_002733	SPU_002733	contains DHC_N1 domain	none
SPU_004934	SPU_004934	contains 2 P_loop_NTPase superfamily motifs and DYN1 domain	none
SPU_002110	SPU_002110	contains AST1 domain and DHC_N1 domain and DYN1 domain	none
SPU_003404	SPU_003404	contains DHC_N1 domain	none
SPU_003216	SPU_003216	contains 3 P_loop_NTPase superfamily motifs and DYN1 domain	none
SPU_014702	SPU_014702	contains 2 P_loop_NTPase superfamily motifs and 2 DYN1 domain motifs and Dynein_heavy domain	none
SPU_010423	SPU_010423	contains PRK00409 domain and COG3264 domain	none
SPU_015263	SPU_015263	contains SMC_N domain	none
SPU_001069	SPU_001069	contains SMC_N domain	none
SPU_013828	SPU_013828	contains 2 SH3 superfamily motifs	none
SPU_001796	SPU_001796	contains 2 Calx-beta superfamily repeats at N-terminus	none
SPU_011173	SPU_011173	contains Smc domain and SMC_N domain	none
SPU_015240	SPU_015240	contains COG1413 domain	none
SPU_009932	SPU_009932	contains RecD domain	none
SPU_009776	SPU_009776	contains 7 TSP_1 superfamily motifs	none
SPU_011935	SPU_011935	contains PRK09039 domain	none
SPU_004147	SPU_004147	contains MDN1 domain	none
SPU_004993	SPU_004993	contains COG1413 domain	none
SPU_014663	SPU_014663	contains SMC_N domain and Smc domain	none
SPU_005893	SPU_005893	contains EGF_CA superfamily motif at N-terminus and MdoB domain	none
SPU_014426	SPU_014426	contains SMC_N domain and PRK11281 domain. Sp-specific protein.	none
SPU_014543	SPU_014543	homologous to several mosquito proteins	none
SPU_014981	SPU_014981	Sp-specific	none
SPU_015018	SPU_015018	homologous to mosquito proteins	none
SPU_015209	SPU_015209	contains TraB_pillus domain. Sp-specific protein.	none
SPU_005894	SPU_005894	contains 6 EGF_CA superfamily motifs	none
SPU_015216	SPU_015216	Sp-specific protein or bogus.	none
SPU_015286	SPU_015286	contains 2 IG superfamily motifs. Sp-specific protein with N-terminal region weakly homologous to proteins in other species.	none
SPU_015308	SPU_015308	contains MDN1 domain. Sp-specific protein with N-terminal region weakly homologous to 2 Branchiostoma floridae hypothetical proteins.	none
SPU_005924	SPU_005924	contains ANK superfamily motif at N-terminus. sea urchin-specific protein.	none
SPU_006284	SPU_006284	contains COG1305 domain	none
SPU_007346	SPU_007346	contains RING superfamily motif at C-terminus	none
SPU_008291	SPU_008291	contains CUB superfamily motifs at N-terminus and 7tm_2 superfamily motif near C-terminus	none
SPU_008687	SPU_008687	contains Sec7 superfamily motif at C-terminus	none
SPU_009081	SPU_009081	contains RecD domain	none
SPU_009154	SPU_009154	appears to be S. purpuratus-specific	none
SPU_009259	SPU_009259	appears to be S. purpuratus-specific	none
SPU_009463	SPU_009463	contains Smc (chromosome segregation ATPases) domain	none
SPU_009464	SPU_009464	contains Smc (chromosome segregation ATPases) domain and SMC_N domain and Herpes_BLLF1 domain	none
SPU_009667	SPU_009667	contains Smc domain	none
SPU_009777	SPU_009777	contains TSP_1 (thrombospondin) superfamily motifs near C-terminus	none
SPU_009921	SPU_009921	appears to be S. purpuratus-specific, unique protein	none
SPU_009928	SPU_009928	appears to be S. purpuratus-specific, unique protein	none
SPU_010074	SPU_010074	contains SMC_N domain	none
SPU_010200	SPU_010200	no homolog found	none
SPU_010202	SPU_010202	contains PH-like superfamily motif at N-terminus. S. purpuratus-specific.	none
SPU_010225	SPU_010225	contains Smc domain and 2 SMC_N domain motifs	none
SPU_011508	SPU_011508	contains 6 EGF_CA superfamily motifs. very large protein (ca 6,000 amino acids). appears to be S. purpuratus-specific.	none
SPU_011581	SPU_011581	homologous to conserved putative proteins	none
SPU_012466	SPU_012466	contains RING superfamily motif at C-terminus and Myosin_tail domain and Smc domain	none
SPU_012612	SPU_012612	contains 2 PAT1 domain motifs	none
SPU_012692	SPU_012692	contains PDZ superfamily motif near C-terminus	none
SPU_012879	SPU_012879	S. purpuratus-specific protein	none
SPU_012981	SPU_012981	S. purpuratus-specific protein	none
SPU_013286	SPU_013286	appears to be S. purpuratus-specific	none
SPU_013333	SPU_013333	S. purpuratus-specific protein	none
SPU_013439	SPU_013439	S. purpuratus-specific	none
SPU_013603	SPU_013603	large (> 4,000 amino acids) S. purpuratus-specific protein	none
SPU_013643	SPU_013643	S. purpuratus-specific protein	none
SPU_013982	SPU_013982	contains 3 NDPk superfamily motifs. S. purpuratus-specific or echinoderm-specific.	none
SPU_014072	SPU_014072	Sp-specific	none
SPU_014335	SPU_014335	contains SMC_N domain. appears to be Branchiostoma floridae-specific.	none
SPU_014414	SPU_014414	contains COG4886 domain. appears to be Branchiostoma floridae-specific.	none
SPU_015797	SPU_015797	Sp-specific protein	none
SPU_015820	SPU_015820	contains 2 EGF_CA superfamily motifs	none
SPU_015906	SPU_015906	Sp-specific protein	none
SPU_015914	SPU_015914	Sp-specific protein	none
SPU_015951	SPU_015951	contains PRK03918 domain and PAT1 domain. Sp-specific protein.	none
SPU_016132	SPU_016132	Sp-specific protein	none
SPU_016223	SPU_016223	contains PAT1 domain. Sp-specific protein.	none
SPU_016285	SPU_016285	contains LIM superfamily motif at N-terminus. Sp-specific.	none
SPU_016485	SPU_016485	Sp-specific protein	none
SPU_008282	SPU_008282	contains 2 BTB superfamily motifs and ATS1 domain	none
SPU_015416	SPU_015416	contains INCEN superfamily motif near C-terminus	none
SPU_005776	SPU_005776	contains 2 WD40 superfamily motifs	none
SPU_011304	SPU_011304	contains 4 Kelch-like superfamily motifs	none
SPU_008893	SPU_008893	contains 3 E_set superfamily motifs	none
SPU_016478	SPU_016478	contains 3 Hyd_WA superfamily motifs	none
SPU_009103	SPU_009103	contains 4 P_loop_NTPase superfamily motifs	none
SPU_001139	SPU_001139	contains 2 Glyco_hydro_31 superfamily motifs	none
SPU_008933	SPU_008933	contains CYK3 domain	none
SPU_003926	SPU_003926	contains 5 LamG superfamily motifs and PRK02224 domain	none
SPU_000376	SPU_000376	contains LRR_RI superfamily motif near N-terminus and PDZ superfamily motif at C-terminus	none
SPU_005309	SPU_005309	contains COG4886 domain	none
SPU_002862	SPU_002862	contains COG4886 domain	none
SPU_002072	SPU_002072	contains 3 MFS superfamily motifs and MFS_1 domain	none
SPU_011525	SPU_011525	contains 8 MAM superfamily motifs	none
SPU_001261	SPU_001261	contains MDN1 domain and Smc domain	none
SPU_008413	SPU_008413	contains 2 DNA_pol_phi superfamily motifs	none
SPU_004269	SPU_004269	contains 2 PHR domain motifs	none
SPU_008030	SPU_008030	contains PHR domain	none
SPU_015420	SPU_015420	contains 3 PHD superfamily motifs	none
SPU_013478	SPU_013478	contains 3 WD40 superfamily motifs and COG2319 domain	none
SPU_014279	SPU_014279	contains CH superfamily motif near N-terminus	none
SPU_001232	SPU_001232	contains SMC_N domain	none
SPU_015653	SPU_015653	contains SMC_N domain and mukB domain	none
SPU_002313	SPU_002313	contains Chaperonin-like superfamily motif near N-terminus	none
SPU_016227	SPU_016227	contains 2 Smc domain motifs	none
SPU_004772	SPU_004772	contains MIP-T3 domain and RecD domain	none
SPU_001338	SPU_001338	homologous to numerous Danio rerio genes	none
SPU_000495	SPU_000495	contains PAT1 domain	none
SPU_013890	SPU_013890	contains Smc domain	none
SPU_012581	SPU_012581	contains COG1112 domain	none
SPU_004588	SPU_004588	contains COG2940 domain	none
SPU_002571	SPU_002571	contains HepA domain. also homologous to Drosophila helicase domino.	none
SPU_010617	SPU_010617	contains OATP domain and MFS_1 domain motifs	none
SPU_014441	SPU_014441	contains 5 SPEC superfamily motifs	none
SPU_013237	SPU_013237	contains 2 CH superfamily motifs and 8 SPEC superfamily motifs and 2 SbcC domain motifs and 3 Smc domain motifs. very large protein (11475 amino acids).	none
SPU_005151	SPU_005151	contains 14 SPEC superfamily motifs	none
SPU_008368	SPU_008368	contains 4 SPEC superfamily motifs	none
SPU_002664	SPU_002664	contains 3 TPR superfamily motifs	none
SPU_008214	SPU_008214	contains 2 P_loop_NTPase superfamily motifs and Smc domain	none
SPU_007052	SPU_007052	contains 4 LDLa superfamily motifs	none
SPU_014258	SPU_014258	contains 4 CCP superfamily motifs	none
SPU_011098	SPU_011098	contains 5 CCP superfamily motifs	none
SPU_011309	SPU_011309	contains 4 CCP superfamily motifs and 4 HYR superfamily motifs and 3 GCC2_GCC3 superfamily motifs	none
SPU_012778	SPU_012778	contains Metallo-dependent_hydrolases superfamily motif at C-terminus	none
SPU_013785	SPU_013785	contains MDN1 domain	none
SPU_004998	SPU_004998	contains 3 TPR superfamily motifs	none
SPU_002498	SPU_002498	contains 10 TPR superfamily motifs and NrfG domain	none
SPU_008179	SPU_008179	contains 3 ANK superfamily motifs and Arp domain motifs	none
SPU_008241	SPU_008241	contains 2 Smc domain motifs and PRK02224 domain	none
SPU_010006	SPU_010006	contains 6 IG superfamily motifs	none
SPU_006656	SPU_006656	contains 2 TPR superfamily motifs	none
SPU_006258	SPU_006258	contains HECTc domain	none
SPU_011281	SPU_011281	contains Peptidase_C19 superfamily motif at N-terminus	none
SPU_008938	SPU_008938	contains UBP14 domain	none
SPU_009889	SPU_009889	contains MRS6 domain	none
SPU_010637	SPU_010637	contains WD40 superfamily motif near N-terminus	none
SPU_009683	SPU_009683	contains WD40 superfamily motif at C-terminus	none
SPU_011144	SPU_011144	contains 2 WD40 superfamily motifs and COG2319 domain	none
SPU_004246	SPU_004246	contains 2 WD40 superfamily motifs	none
SPU_010350	SPU_010350	contains 2 WD40 superfamily motifs	none
SPU_015882	SPU_015882	contains SFP domain	none
SPU_004455	SPU_004455	contains 2 ANK superfamily motifs and 2 Smc domain motifs and Arp domain and PRK09039 domain	none
SPU_000582	SPU_000582	contains HECTc domain	none
SPU_004314	SPU_004314	contains 2 Herpes_LMP superfamily motifs	none
SPU_015197	SPU_015197	contains 3 HYR superfamily motifs. Sp-specific protein except for N-terminal 100 amino acids that share some homologies with other metazoans.	none
SPU_009431	SPU_009431	contains DNA_pol_B_2 domain and POLBc domain	none
SPU_009485	SPU_009485	appears to be S. purpuratus-specific	none
SPU_009808	SPU_009808	contains SET superfamily motif at N-terminus and MDN1 domain	none
SPU_013317	SPU_013317	contains Smc domain and SMC_N domain	none
SPU_015950	SPU_015950	specific to S. purpuratus and Branchiostoma floridae	none
SPU_012900	SPU_012900	contains Nucleoporin2 superfamily motif near N-terminus	none
SPU_015412	SPU_015412	contains 3 PDZ superfamily motifs	none
SPU_001249	SPU_001249	contains 15 ANK superfamily motifs and 2 Arp domain motifs	none
SPU_014679	SPU_014679	contains 6 ANK superfamily motifs and Arp domain motifs	none
SPU_014779	SPU_014779	contains 10 ANK superfamily motifs and Arp domain motifs	none
SPU_014866	SPU_014866	contains 7 ANK superfamily motifs and 2 ZU5 superfamily motifs and Arp domain motifs	none
SPU_014883	SPU_014883	contains 5 ANK superfamily motifs and Arp domain motifs	none
SPU_014953	SPU_014953	contains 6 ANK superfamily motifs and Arp domain motifs	none
SPU_015024	SPU_015024	contains 6 ANK superfamily motifs and Arp domain motifs and Herpes_BLLF1 domain	none
SPU_015058	SPU_015058	contains 13 ANK superfamily motifs and Arp domain motifs	none
SPU_015270	SPU_015270	contains 15 ANK superfamily motifs and Arp domain motifs	none
SPU_015431	SPU_015431	contains 20 ANK superfamily motifs and 2 ZU5 superfamily motifs and Arp domain motifs	none
SPU_015436	SPU_015436	contains 22 ANK superfamily motifs and Arp domain motifs	none
SPU_001383	SPU_001383	contains 26 ANK superfamily motifs and 2 Arp domain motifs	none
SPU_015498	SPU_015498	contains 9 ANK superfamily motifs and Arp domain motifs	none
SPU_015875	SPU_015875	contains 8 ANK superfamily motifs and Arp domain motifs	none
SPU_016034	SPU_016034	contains 14 ANK superfamily motifs and Arp domain motifs	none
SPU_016479	SPU_016479	contains 7 ANK superfamily motifs and Arp domain motifs	none
SPU_016616	SPU_016616	contains 10 ANK superfamily motifs and Arp domain motifs	none
SPU_001385	SPU_001385	contains 20 ANK superfamily motifs and Arp domain	none
SPU_001430	SPU_001430	contains 7 ANK superfamily motifs and Arp domain	none
SPU_001432	SPU_001432	contains 6 ANK superfamily motifs and Arp domain	none
SPU_001433	SPU_001433	contains 14 ANK superfamily motifs	none
SPU_001611	SPU_001611	contains 27 ANK superfamily motifs and 2 Arp domain motifs	none
SPU_001685	SPU_001685	contains 3 ANK superfamily motifs	none
SPU_001820	SPU_001820	contains 10 ANK superfamily motifs	none
SPU_001871	SPU_001871	contains 9 ANK superfamily motifs and Arp domain	none
SPU_000327	SPU_000327	contains 3 ANK superfamily motifs	none
SPU_001988	SPU_001988	contains 10 ANK superfamily motifs and Arp domain	none
SPU_002006	SPU_002006	contains 14 ANK superfamily motifs and Arp domain	none
SPU_002247	SPU_002247	contains 8 ANK superfamily motifs and Arp domain	none
SPU_002415	SPU_002415	contains 20 ANK superfamily motifs	none
SPU_002468	SPU_002468	contains 13 ANK superfamily motifs and 2 Arp domain motifs	none
SPU_002778	SPU_002778	contains 4 ANK superfamily motifs and 2 ZU5 superfamily motifs and 2 Arp domain motifs	none
SPU_002848	SPU_002848	contains 12 ANK superfamily motifs and 2 Arp domain motifs	none
SPU_002900	SPU_002900	contains 15 ANK superfamily motifs and 2 Arp domain motifs	none
SPU_002943	SPU_002943	contains 15 ANK superfamily motifs and 2 Arp domain motifs	none
SPU_000476	SPU_000476	contains 8 ANK superfamily motifs and Arp domain	none
SPU_003370	SPU_003370	contains 11 ANK superfamily motifs and 3 Arp domain motifs	none
SPU_003600	SPU_003600	contains 4 ANK superfamily motifs	none
SPU_003809	SPU_003809	contains 19 ANK superfamily motifs and 4 Arp domain motifs	none
SPU_003834	SPU_003834	contains 15 ANK superfamily motifs and 2 Arp domain motifs	none
SPU_003935	SPU_003935	contains 9 ANK superfamily motifs and 2 Arp domain motifs	none
SPU_004426	SPU_004426	contains 12 ANK superfamily motifs and 2 Arp domain motifs	none
SPU_004512	SPU_004512	contains 8 ANK superfamily motifs and 2 Arp domain motifs	none
SPU_005023	SPU_005023	contains 9 ANK superfamily motifs and Arp domain	none
SPU_005024	SPU_005024	contains 17 ANK superfamily motifs and many Arp domain motifs	none
SPU_005027	SPU_005027	contains 9 ANK superfamily motifs and many Arp domain motifs	none
SPU_000499	SPU_000499	contains 12 ANK superfamily motifs and 2 ZU5 superfamily motifs	none
SPU_005066	SPU_005066	contains 16 ANK superfamily motifs and many Arp domain motifs	none
SPU_005094	SPU_005094	contains 14 ANK superfamily motifs and many Arp domain motifs	none
SPU_005235	SPU_005235	contains 3 ANK superfamily motifs	none
SPU_005465	SPU_005465	contains 12 ANK superfamily motifs and 2 Arp domain motifs	none
SPU_005619	SPU_005619	contains 6 ANK superfamily motifs and many Arp domain motifs	none
SPU_005858	SPU_005858	contains 3 ANK superfamily motifs and Arp domain motifs	none
SPU_005928	SPU_005928	contains 6 ANK superfamily motifs	none
SPU_005970	SPU_005970	contains 4 ANK superfamily motifs	none
SPU_005975	SPU_005975	contains 9 ANK superfamily motifs and Arp domain motifs	none
SPU_005997	SPU_005997	contains 11 ANK superfamily motifs and Arp domain motifs	none
SPU_000697	SPU_000697	contains 9 ANK superfamily motifs and 2 ZU5 superfamily motifs	none
SPU_006051	SPU_006051	contains 7 ANK superfamily motifs and Arp domain motifs	none
SPU_006131	SPU_006131	contains 12 ANK superfamily motifs	none
SPU_006327	SPU_006327	contains 12 ANK superfamily motifs and Arp domain motifs	none
SPU_006518	SPU_006518	contains 18 ANK superfamily motifs and Arp domain motifs	none
SPU_006615	SPU_006615	contains 18 ANK superfamily motifs and Arp domain motifs	none
SPU_006832	SPU_006832	contains 13 ANK superfamily motifs and 2 ZU5 superfamily motifs and Arp domain motifs	none
SPU_007123	SPU_007123	contains 4 ANK superfamily motifs and 2 ZU5 superfamily motifs and 2 Death superfamily motifs	none
SPU_007298	SPU_007298	contains 5 ANK superfamily motifs	none
SPU_007332	SPU_007332	contains 11 ANK superfamily motifs	none
SPU_008014	SPU_008014	contains 11 ANK superfamily motifs and Arp domain motifs	none
SPU_001017	SPU_001017	contains 5 ANK superfamily motifs and 2 Arp domain motifs	none
SPU_008437	SPU_008437	contains 10 ANK superfamily motifs and Arp domains	none
SPU_008887	SPU_008887	contains 12 ANK superfamily motifs and Arp domains	none
SPU_009034	SPU_009034	contains 12 ANK superfamily motifs and Arp domains	none
SPU_009792	SPU_009792	contains 9 ANK superfamily motifs and Arp domains	none
SPU_009823	SPU_009823	contains 12 ANK superfamily motifs and Arp domains	none
SPU_009997	SPU_009997	contains 12 ANK superfamily motifs and Arp domains	none
SPU_010676	SPU_010676	contains 14 ANK superfamily motifs and Arp domain motifs	none
SPU_011004	SPU_011004	contains 5 ANK superfamily motifs and Arp domain motifs	none
SPU_011006	SPU_011006	contains 3 ANK superfamily motifs and Arp domain motifs	none
SPU_001018	SPU_001018	contains 10 ANK superfamily motifs and Arp domain	none
SPU_011021	SPU_011021	contains 7 ANK superfamily motifs and Arp domain motifs	none
SPU_011341	SPU_011341	contains 9 ANK superfamily motifs and Arp domain motifs	none
SPU_011359	SPU_011359	contains 17 ANK superfamily motifs and Arp domain motifs	none
SPU_011500	SPU_011500	contains 17 ANK superfamily motifs and Arp domain motifs	none
SPU_011598	SPU_011598	contains 5 ANK superfamily motifs and Arp domain motifs	none
SPU_011599	SPU_011599	contains 22 ANK superfamily motifs and Arp domain motifs	none
SPU_011748	SPU_011748	contains 12 ANK superfamily motifs and Arp domain motifs	none
SPU_011898	SPU_011898	contains 11 ANK superfamily motifs and Arp domain motifs	none
SPU_011950	SPU_011950	contains 5 ANK superfamily motifs and Arp domain motifs	none
SPU_012058	SPU_012058	contains 11 ANK superfamily motifs and Arp domain motifs	none
SPU_001163	SPU_001163	contains 11 ANK superfamily motifs and Arp domain	none
SPU_012059	SPU_012059	contains 8 ANK superfamily motifs and Arp domain motifs	none
SPU_012060	SPU_012060	contains 25 ANK superfamily motifs and Arp domain motifs	none
SPU_012153	SPU_012153	contains 31 ANK superfamily motifs and Arp domain motifs	none
SPU_012267	SPU_012267	contains 13 ANK superfamily motifs and Arp domain motifs	none
SPU_012392	SPU_012392	contains 13 ANK superfamily motifs and Arp domain motifs	none
SPU_012587	SPU_012587	contains 8 ANK superfamily motifs and Arp domain motifs	none
SPU_013050	SPU_013050	contains 7 ANK superfamily motifs and Arp domain motifs	none
SPU_013210	SPU_013210	contains 19 ANK superfamily motifs and Arp domain motifs	none
SPU_013213	SPU_013213	contains 11 ANK superfamily motifs and Arp domain motifs	none
SPU_013221	SPU_013221	contains 8 ANK superfamily motifs and Arp domain motifs	none
SPU_001205	SPU_001205	contains 10 ANK superfamily motifs and Arp domain	none
SPU_013239	SPU_013239	contains 10 ANK superfamily motifs and Arp domain motifs	none
SPU_013418	SPU_013418	contains 7 ANK superfamily motifs and Arp domain motifs	none
SPU_013421	SPU_013421	contains 5 ANK superfamily motifs and Arp domain motifs	none
SPU_013776	SPU_013776	contains 6 ANK superfamily motifs and Arp domain motifs	none
SPU_014018	SPU_014018	contains 15 ANK superfamily motifs and Arp domain motifs	none
SPU_014096	SPU_014096	contains 17 ANK superfamily motifs and Arp domain motifs	none
SPU_014159	SPU_014159	contains 8 ANK superfamily motifs and Arp domain motifs	none
SPU_014484	SPU_014484	contains 23 ANK superfamily motifs and Arp domain motifs	none
SPU_014601	SPU_014601	contains 9 ANK superfamily motifs and Arp domain motifs	none
SPU_014661	SPU_014661	contains 18 ANK superfamily motifs and Arp domain motifs	none
SPU_010011	SPU_010011	contains 5 ANK superfamily motifs	none
SPU_001237	SPU_001237	contains MDN1 domain	none
SPU_001449	SPU_001449	contains 22 EGF-like superfamily motifs in tandem and 10 SRCR superfamily motifs in tandem. also homologous to Drosophila notch.	none
SPU_012322	SPU_012322	contains 4 FA58C superfamily motifs	none
SPU_006881	SPU_006881	contains 4 EGF_CA superfamily motifs and Trypan_PARP domain	none
SPU_011308	SPU_011308	contains 3 GCC2_GCC3 superfamily motifs and 4 EGF-like superfamily motifs and 3 CCP superfamily motifs	none
SPU_000918	SPU_000918	contains 10 EGF-like superfamily motifs and 3 HYR superfamily motifs and 2 LY (low density lipoprotein receptor) superfamily motifs	none
SPU_000635	SPU_000635	the protein consists of 3 domains each of which contains RT-like superfamily motif, RNase superfamily motif, and rve superfamily motif.	none
SPU_001918	SPU_001918	contains 2 7tm_2 superfamily motifs	none
SPU_001921	SPU_001921	contains SET superfamily motif toward N-terminus	none
SPU_008222	SPU_008222	contains 12 MAM superfamily motifs and 5 LDLa superfamily motifs	none
SPU_014415	SPU_014415	contains 14 EGF_CA superfamily motifs. member of very large Sp-specific protein families.	none
SPU_014530	SPU_014530	Sp-specific protein with some homologies to Branchiostoma floridae proteins	none
SPU_015146	SPU_015146	contains MDN1 domain. appears to be a member of Sp-specific protein family.	none
SPU_006619	SPU_006619	contains COG5635 domain and PAT1 domain. belongs to a large group of proteins specific to S. purpuratus.	none
SPU_006858	SPU_006858	contains RT-like superfamily motif at C-terminus. appears to belong to a large S. purpuratus-specific group of proteins.	none
SPU_007677	SPU_007677	contains 5 CCP superfamily motifs and 4 HYR superfamily motifs	none
SPU_005200	SPU_005200	contains 2 MAM superfamily motifs at N-terminus	none
SPU_008888	SPU_008888	contains 9 HYR superfamily motifs. member of very large S. purpuratus-specific protein family.	none
SPU_008967	SPU_008967	contains SMC_N domain. member of a large S. purpuratus-specific protein family.	none
SPU_009536	SPU_009536	member of very large S. purpuratus-specific nuclease/reverse transcriptase-like proteins	none
SPU_009591	SPU_009591	member of very large S. purpuratus-specific nuclease/reverse transcriptase-like proteins	none
SPU_009608	SPU_009608	contains 4 IG superfamily motifs and V-set domain	none
SPU_009794	SPU_009794	member of large glycoprotein fucotransferase-like, S. purpuratus-specific protein family	none
SPU_005388	SPU_005388	contains 2 RING superfamily motifs and PRK03918 domain	none
SPU_010134	SPU_010134	member of very large S. purpuratus-specific nuclease/reverse transcriptase-like proteins	none
SPU_010153	SPU_010153	member of very large S. purpuratus-specific protein family	none
SPU_010684	SPU_010684	contains Herpes_BLLF1 domain	none
SPU_011771	SPU_011771	contains 4 CUB superfamily motifs and 7 EGF_CA superfamily motifs	none
SPU_011782	SPU_011782	contains 5 CCP superfamily motifs and 10 HYR superfamily motifs	none
SPU_013077	SPU_013077	contains 8 EGF_CA superfamily motifs and 3 CCP superfamily motifs	none
SPU_015813	SPU_015813	contains 11 HYR superfamily motifs	none
SPU_015879	SPU_015879	contains 2 MAM superfamily motifs	none
SPU_016322	SPU_016322	contains SET superfamily motif near N-terminus	none
SPU_016424	SPU_016424	contains Smc domain and SMC_N domain. member of Sp-specific protein family.	none
SPU_016448	SPU_016448	contains SET superfamily motif at N-terminus	none
SPU_016473	SPU_016473	specific to Nematostella vectensis, Branchiostoma floridae, and Strongylocentrotus purpuratus	none
SPU_016179	SPU_016179	contains 8 LDLb superfamily motifs and 2 IG superfamily motifs and COG3391 domain	none
SPU_016181	SPU_016181	contains 8 LDLb superfamily motifs and COG3386 domain	none
SPU_003671	SPU_003671	contains 7 LDLa superfamily motifs and 3 CUB superfamily motifs	none
SPU_000737	SPU_000737	contains Macoilin domain	none
SPU_001336	SPU_001336	contains 6 MAM superfamily motifs	none
SPU_014016	SPU_014016	contains 5 MAM superfamily motifs and PRK08404 domain	none
SPU_000725	SPU_000725	contains 6 MAM superfamily motifs in tandem	none
SPU_001753	SPU_001753	contains 8 MAM superfamily motifs	none
SPU_002133	SPU_002133	contains 29 MAM superfamily motifs and 18 LDLa superfamily motifs	none
SPU_014087	SPU_014087	contains 6 MAM superfamily motifs	none
SPU_003589	SPU_003589	contains 5 ANK superfamily motifs. also similar to ankyrin2,3/unc44.	none
SPU_003258	SPU_003258	contains 12 SPEC superfamily motifs	none
SPU_003261	SPU_003261	contains 22 SPEC superfamily motifs	none
SPU_002173	SPU_002173	contains COG1112 domain and Keratin_B2 domain	none
SPU_014734	SPU_014734	contains Gag_spuma domain and COG1112 domain	none
SPU_015792	SPU_015792	contains 20 EGF_CA superfamily motifs	none
SPU_007264	SPU_007264	contains 2 EGF_CA superfamily motifs	none
SPU_008945	SPU_008945	contains 2 EGF_CA superfamily motifs and PRK03427 cell division protein motif	none
SPU_013491	SPU_013491	contains 4 EGF_CA superfamily motifs	none
SPU_015009	SPU_015009	contains 2 CUB superfamily motifs and 10 EGF_CA superfamily motifs	none
SPU_000675	SPU_000675	contains 15 EGF superfamily motifs in tandem and 2 CUB superfamily motifs in tandem	none
SPU_006225	SPU_006225	contains 3 GCC2_GCC3 superfamily motifs	none
SPU_014412	SPU_014412	contains 7 EGF_CA superfamily motifs	none
SPU_002783	SPU_002783	contains 2 MAM superfamily motifs and 7 EGF_CA superfamily motifs	none
SPU_013808	SPU_013808	contains rne domain	none
SPU_002127	SPU_002127	contains PRK12678 domain	none
SPU_003319	SPU_003319	contains 2 motifs of RT-like superfamily, Peptidase_A17 superfamily, and rve superfamily. appears to represent a duplication of the region containing the 3 superfamily motifs.	none
SPU_004278	SPU_004278	contains 2 RVP superfamily motifs and 2 RT-like superfamily motifs that appear to be a duplication of an RVP-RT-like structure	none
SPU_005695	SPU_005695	contains AIR1 domain	none
SPU_007074	SPU_007074	contains 2 SRCR superfamily motifs at N-terminus	none
SPU_010235	SPU_010235	contains PRK12678 domain	none
SPU_011612	SPU_011612	contains 2 motifs of RT-like superfamily and Peptidase_A17 superfamily, respectively	none
SPU_012595	SPU_012595	contains AIR1 domain	none
SPU_013702	SPU_013702	contains 3 HYR superfamily motifs and 2 GCC2_GCC3 superfamily motifs	none
SPU_002021	SPU_002021	contains 2 CCP superfamily motifs	none
SPU_004333	SPU_004333	contains flhF domain and COG1112 domain	none
SPU_013438	SPU_013438	contains COG1112 domain	none
SPU_002564	SPU_002564	contains COG1112 domain	none
SPU_002278	SPU_002278	contains 2 rne domain motifs and PRK07003 domain	none
SPU_009015	SPU_009015	contains 10 LDLa superfamily motifs and 4 CUB superfamily motifs	none
SPU_014758	SPU_014758	contains 3 zf-RING superfamily motifs and 2 PH-like superfamily motifs	none
SPU_000834	SPU_000834	contains 4 CUB superfamily motifs and 2 EGF-like superfamily motifs	none
SPU_006000	SPU_006000	contains 4 EGF_CA superfamily motifs	none
SPU_006001	SPU_006001	contains 6 EGF_CA superfamily motifs	none
SPU_003055	SPU_003055	also homologous to pol polyprotein and various proteins	none
SPU_003056	SPU_003056	also homologous to pol polyprotein and various proteins	none
SPU_003057	SPU_003057	also homologous to pol polyprotein and various proteins	none
SPU_003059	SPU_003059	also homologous to pol polyprotein and various proteins	none
SPU_003060	SPU_003060	also homologous to pol polyprotein and various proteins	none
SPU_003061	SPU_003061	contains 2 rve superfamily motifs and 2 RT-like motifs. appears to be a duplication of rve and RT-like motif protein. also homologous to pol polyprotein and various proteins.	none
SPU_003062	SPU_003062	also homologous to pol polyprotein and various proteins	none
SPU_003063	SPU_003063	also homologous to pol polyprotein and various proteins	none
SPU_016187	SPU_016187	contains COG1112 domain	none
SPU_002163	SPU_002163	contains COG1112 domain and Furin-like_cysteine-rich domain	none
SPU_002705	SPU_002705	contains PRK12678 domain and SSL2 domain and COG1112 domain	none
SPU_002706	SPU_002706	contains COG1112 domain	none
SPU_006468	SPU_006468	contains COG1112 domain	none
SPU_011697	SPU_011697	contains COG1112 domain	none
SPU_021295	SPU_021295	none	 partial, missing C-terminus\n
SPU_024836	SPU_024836	When reviewing the data it was apparent that the data provided by the excel spreadsheet did not come up as 	possible non-coding exon included in glean annotation\n
SPU_002947	SPU_002947	none	The only domain it contains is a portion of a metalloprotease domain.\n
SPU_000580	SPU_000580	none	2 SRCR domains. Probably partial.\n
SPU_001101	SPU_001101	none	SPU_001101 matches the first half of RBM19. SPU_006990 matches the latter half.\n
SPU_014788	SPU_014788	none	Partial duplication of SPU_000838.\n
SPU_011785	SPU_011785	none	The terminal part of this GLEAN is actually the first exon of a GST protein. Modified seqs are below: \nDNA: \nATGGCTTTAGCTCAGGAGGAGCTGACCATGATGAAGGGGAAGATCAACAGCCAGAAAGAGATGGGTAAACAC AAACTGGAAGGAGAACTTACCCAGCTGAAAGAGGTTCCAAGTTACCATTCAGATCTGTGCACGATTCCCAGT GTTGTTGCTGATGTCTGCAGTGACACAAGAGACCTCTCAGGGTTGGACATTGCTAAAAAGCGTGTGGCCCTG ATGGAACTCCTGCAGAAGCAGTCACAGCGTAAGGCACTAGCTGCCCTGGAAAGGCGACAGGCAACCCTTGAG GTGCAAGAAAGAACTGCTCAATTGATTCTGAAACATGAGGAGCAAAAACTTCAGGATTTTGAAACTGATCCA CGTTTGCTTGAGGGAGAGTTCGGTTCAGAGCAGTCACCTGAAAAAGAGCTTGGAGATCGTGACATTGAGAAC ACAGTGGATAGTTCTACAGAAAATGTGAGCACAGAATTGCCAGCATCAGGTCGAGTCAGTCCTGTAGGAAAC AGTGACCCTGAAACAATGCATCACAACAATGTTCCTTCTCCGAAGAAAGAGACTAAGCGCATCCAAAAGCAG GAAAACAAAGTTAAGACTATGGATCAAAAGAAGGACCAAACAAACAGTTCAAAGAAAAATTCCAGAAAAACT GATGCTAGACCTACTAGAAAGGTTGAATTAGGTCCTTCAAAAACTAAACCCAGGGAAGAACAAATAAAGAGG GAAAGTGATGGTCTATCTGCATCACCTTCAAGGTCTCCATCTCCTCCCAAAAGTTCAAGGTCAAGCTCGCCA AGTAAATCCTCGAGGTCCAGCTCACCCACCAAGTCACTAAGATCAGATTCACCAACAAAATCCTCAAGGTCC AACTCACCCACCAAGTCACTAAGATCAGATTCACCAACAAAATCCTCAAGGTCCAACTCACCCACCAAGTCA CTAAGATCAGATTCACCAACAAAATCCTCAAGGTCCAACTCACCCACCAAGTCATTAAGATCAGATTCACCA ACAAAATCTTCAAGGTCTGCATCTCCAACCAAATCTGTAAGGTCTGAATCACCGACCAAGTCATCAAGATCA TCTTCTCCTGCAAGTACCTCCTCGAGGCAGTCCAGGCAAACTTCAGGAAATGTTTTTTCTCGCCTGTATCCC CAGCAAGAAACAAAATTTAATTTTCTCAGGAAGAAGAGTCCTCCAAGATATGGTGAATATGATCCTCACAGA GAGAGAGACAGAGACTCTCCACACTTGATCAGGAGAGAACTGTCTCCTGTAGAGCATGACCCTATCAAGTTA AAGGACAAAAAACGTGAACATGTTCAGGACCATAAGCATGTGAGAACCAGCAGTGTGCCTTCAAGCAATCTT CCAGACAGAACAAAGCAAAATCTTCATTCTAGAAGTAGGAGTGCTAGTCCATGTGTAAATAATCCTGTAAAA CCAGCAACAAGGCCAAAGAAATCTCCAGTATCACAGTCCTCACAAGAAACTGAAAAGGCGGTTAGGAGAACT AAAGTGAAATCAAAAGACACAGCAAGTGCTCCACTTGATAGAACTCAAACTGATCCTGGTAATGAAAAACAA AATAATAAGTGCCAAACCAAAACCAAGCAAGCCCAACGCACATCTCCAAGTGAATCAACCCACTCCAGTCCA ACTAAAAGGAAATTAAATAAACCAGAAAACACTTCAAATGCAAAGACCCCTAAAGGCTCATCTGCCTCTCCT CGGTCAAGACCAATCATTGGGAAAGCAGTTGATAGAAGTCCATCTCCAAGGAAGAAATCAGAACAAACTCCA 
TCTCACCGGAAGACTGTAAAGAGGCCCCTTTCTAGGAGTCGCAACGATAAGGATGGTGAAAACACAAATTCT CCTAGTCCTAGGAGGAAATTTGCAAGAACTCCCGTTGTCAAACCAAAGAACTGGCCGTCCATAGAGAACTTA CCAAAGAGTACTCCCAAATCTGAAGCTTTCTATGTACCTCTCACTAGCGAACAGTTGAGGCTTGCACTTCAG AAGCACGCCAGTGAGCAGGACAGTGGACCACATCTGGAAGGTGCAGATTTGGGGGCTTCAGGATGTGAACAG CACTGGAGTCCCAGTCGTAAGAGGTCTATTGGAAAGCAGCCCCAGCAAACACAAACAGATCCAATTCAGGCC AGCATAGGAGGAGAGATATTTGGTTTGGAAATGGATGGAGTAAACAATTTAGAAAATGAGGAAGATCACTAC AGCTCCTCAGAA \nProtein: \nMALAQEELTMMKGKINSQKEMGKHKLEGELTQLKEVPSYHSDLCTIPSVVADVCSDTRDLSGLDIAKKRVAL MELLQKQSQRKALAALERRQATLEVQERTAQLILKHEEQKLQDFETDPRLLEGEFGSEQSPEKELGDRDIEN TVDSSTENVSTELPASGRVSPVGNSDPETMHHNNVPSPKKETKRIQKQENKVKTMDQKKDQTNSSKKNSRKT DARPTRKVELGPSKTKPREEQIKRESDGLSASPSRSPSPPKSSRSSSPSKSSRSSSPTKSLRSDSPTKSSRS NSPTKSLRSDSPTKSSRSNSPTKSLRSDSPTKSSRSNSPTKSLRSDSPTKSSRSASPTKSVRSESPTKSSRS SSPASTSSRQSRQTSGNVFSRLYPQQETKFNFLRKKSPPRYGEYDPHRERDRDSPHLIRRELSPVEHDPIKL KDKKREHVQDHKHVRTSSVPSSNLPDRTKQNLHSRSRSASPCVNNPVKPATRPKKSPVSQSSQETEKAVRRT KVKSKDTASAPLDRTQTDPGNEKQNNKCQTKTKQAQRTSPSESTHSSPTKRKLNKPENTSNAKTPKGSSASP RSRPIIGKAVDRSPSPRKKSEQTPSHRKTVKRPLSRSRNDKDGENTNSPSPRRKFARTPVVKPKNWPSIENL PKSTPKSEAFYVPLTSEQLRLALQKHASEQDSGPHLEGADLGASGCEQHWSPSRKRSIGKQPQQTQTDPIQA SIGGEIFGLEMDGVNNLENEEDHYSSSE\n
SPU_010593	SPU_010593	none	This gene spans three GLEAN predictions: SPU_010593 + SPU_013086 + SPU_003874 \nThe SPU_013086 and SPU_003874 predictions are overlapping (exons 3 and 4); gene duplication probably due to assembly \n3'UTR of this gene is missing \n \nExon \tStart \tStop \tScaffold \n1\t24676\t24229\t23709 \n1?\t2159\t1712\t12273 \n2\t13236\t13078\t23709 \n3\t95271\t95968\t1679 \n3?\t5804\t5107\t38156 \n4\t96341\t96594\t1679 \n4?\t4742\t4489\t38156 \n5\t106938\t107067\t1679 \n6\t109648\t109828\t1679 \n7\t110254\t110486? 1679\t3'UTR missing \nDatabase version 2005/07/18\n
SPU_005979	SPU_005979	none	This gene spans two GLEAN predictions: SPU_005979 + SPU_003084 \n \nExon \tStart \tStop \tScaffold \n1\t58051\t57408\t1179 \n1?\t54625\t55052?  21642\tincomplete \n2\t19833\t19675\t1179 \n3\t16229\t16386\t1201 \n4\t17471\t17710\t1201 \n5\t18192\t18357\t1201 \n6\t31949\t32079\t1201 \n7\t40928\t41191\t1201 \n8\t41532\t41731\t1201 \n9\t42426\t42601\t1201 \n10\t43112\t44008\t1201 \nDatabase version 2005/07/18\n
SPU_006815	SPU_006815	none	NOTE: Based on sequence, alignment and domain analysis--This looks like the 3' end of a single OPA gene that is composed of GLEANS 09807 (5'end)  and 06815 (3'end), with no gaps between.\n
SPU_009673	SPU_009673	none	NOTE: No signal in tiling array and no EST.  May be pseudogene or adult expressed.\n
SPU_024010	SPU_024010	none	this may be a gene fragment\n
SPU_003825	SPU_003825	none	added 3'UTR based on EST evidence\n
SPU_003898	SPU_003898	For this particular GLEAN model, there was no Cds information available from both Baylor annotations and SpBase. However, after reviewing the data from the excel file, it appears that the sequence is distributed onto 3 different scaffolds. If the three different scaffolds were combined, the sequence would have an orderly continuous arrangement without any repeats or gaps present.\nAdditional gene information from Baylor annotations (comments):\nPROBLEM: in the scaffold200, where this prediction resides, regions containing exons 4 and 5 are duplicated: potential assembly problem!!! this messed up the original prediction quite badly.	PROBLEM: in the scaffold200, where this prediction resides, regions containing exons 4 and 5 are duplicated: potential assembly problem!!! this messed up the original prediction quite badly.\n
SPU_026371	SPU_026371	none	Exons 9-13 of this gene are located on another scaffold (scaffold 73533 with GLEAN_09526 gene model). \n
SPU_021816	SPU_021816	none	Added 3' exon (7107-7167) and modified next exon (8251-8350) to agree with known cDNA.\n
SPU_020677	SPU_020677	none	the first exons are wrongly predicted. starting with the third exon the prediction is correct. \nuse the accession number NP_999702.1 for the full length cDNA\n
SPU_009173	SPU_009173	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR. \nThis is a member of sea urchin-specific Tlr Group I(orphan). \n
SPU_009435	SPU_009435	none	#\nIntronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(13 to 23), LRR-CT, TM and TIR.  \n
SPU_027685	SPU_027685	After reviewing the data and performing a BLAST search it appears that the sequence is distributed onto 2 different scaffolds. When comparing the BLAST results with the excel data it appears that there are different base pair results in the BLAST sequence than was provided in the excel data. The excel data begins at 39 and ends at 443, while the BLAST sequence begins at 1 and ends at 1329. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with scores less than 5.	This gene shows significant similarity to vertebrate SH2-B family members. However, pairwise alignments also suggest there is additional sequence missing from this model. A better alignment to vertebrate SH2-B proteins can be reconstructed if two Fgenesh++ models located in two separate scaffolds are joined. The sequence continuity of these scaffolds is supported by genomic sequence alignments (scaffold83718-to-scaffold53583-to-scaffold1921). A putative N-ter region of this gene is still missing.\n
SPU_001049	SPU_001049	For this particular GLEAN model there was no Cds information available from either Baylor annotations or SpBase. There was however mRNA information available from SpBase. There was Est support from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5.	This is part of a gene that is contained on multiple scaffolds.  See SPU_010876 for full annotation.\n
SPU_027162	SPU_027162	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(15 to 24), LRR-CT, TM and TIR.  \n
SPU_027164	SPU_027164	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(12 to 21), LRR-CT, TM and TIR.  \n
SPU_023261	SPU_023261	none	There is a missing Amino Acid 61-159 \nThat is probably lost in sequence of  \nNNNNNNN\n
SPU_003514	SPU_003514	After reviewing the data and performing a BLAST search it appears that the sequence is distributed onto 2 different scaffolds. The first scaffold covers the first 400 bases or so, and then the second scaffold continues the rest of the sequence. If the two scaffolds were combined the sequence would have an orderly continuous arrangement (after discarding the other repeated scaffolds.) There was Est information available from GBrowse assembly V0.5 and the transcriptome score intensity appeared to be weak with values ranging from 1.5-5.5. 	The CDS is much longer than RhoA orthologs.  It is possible that the first exon(s) are not real CDS and that the real start site begins at bp 439 (based on sequence homology)\n
SPU_007342	SPU_007342	none	The protein coded by this sequence is identical to part of the sequence coded by an adjacent Glean model (SPU_007343), and might represent either a haplotype or assembly-originated duplication.\n
SPU_012805	SPU_012805	none	This SPU is part of the same gene as SPU_019224 (SpFrk), by fusing the two SPUs and aligning with HsFrk.  The two SPU sequences have been cloned out of egg cDNA (see mRNA sequence).\n
SPU_002633	SPU_002633	none	See also the paper, for the latest tree on Hox affinities:  \n \nCameron, R.A., Rowen, L., Nesbitt, R., Bloom, S., Rast, J.P., Berney, K., Arenas-Mena, C., Martinez, P., Lucas, S., Richardson, P.M., Davidson, E.H., Peterson, K. J. and Hood, L. (2005) Unusual gene order and organization of the sea urchin HOX cluster. J. Exp. Zoology. In press.\n
SPU_009762	SPU_009762	After reviewing the data and performing a BLAST search, it appears that there is no good fit for this particular GLEAN model. There is poor sequence coverage and numerous gaps within the sequence. There was some Est support available from GBrowse V0.5. This is an un-annotated gene so no additional information (comments) was available from Baylor annotations. 	Protein structure suggests that Strabismus proteins may be members of the Ltap protein family (Kibar, 2001).\n
SPU_024336	SPU_024336	For this particular GLEAN model there is no CDS information available in either the SpBase search engine or on the Baylor annotations page. mRNA information was unavailable as well.\nAdditional information: Joins with model_SPU_024337.	Joins with SPU_024337.\n
SPU_024337	SPU_024337	none	Joins with other scaffolds.  Also note:  \n- exon 7 start is not defined because of unknown sequence (NNN...) \n- exon 14 is missing 1 bp after 29361 \n- exon 15 is missing 12 bp after 29180\n
SPU_021062	SPU_021062	none	Encodes the final exons of the gene on SPU_024337.\n
SPU_026549	SPU_026549	none	#\nAlignment with best blast sequence suggests that the model is correct.\n
SPU_011364	SPU_011364	none	#\nIn addition to many copies of this gene on the glean3 list, there are several scaffolds that contain excellent matches:  Scaffolds 25161 and 85005.\n
SPU_001399	SPU_001399	none	#\nOne of a set of 4 whole or parts of CPA-like genes\n
SPU_027819	SPU_027819	none	Possible gene duplication  \nSPU_016081\n
SPU_016081	SPU_016081	none	Possible gene duplication \nSPU_027819\n
SPU_023660	SPU_023660	none	Partial CDS.  Note there are repetitive elements in the best blast hit.\n
SPU_001638	SPU_001638	none	#\nSubgroup A thrombospondin with TSP type 1 repeats\n
SPU_019059	SPU_019059	none	partial CDS; 3 N'terminal exons are questionable as the sequences are not conserved with other metalloprotease1 genes. \n>SPU_019059|Scaffold64363|806|935| DNA_SRC: Scaffold64363 START: 806 STOP: 935 STRAND: +  \nATGCTTGGAAAGAAAGTGGAAGGATCCGGACTTGAAGATATCCTTTTGGAAGCTGGTCTGATGTCTTCTG \nGGTCTATAAAAGATGTGTCAACAACAGTGCGACAGGAGTCTGCATTGTCACAAGACAATG \n>SPU_019059|Scaffold64363|2797|2824| DNA_SRC: Scaffold64363 START: 2797 STOP: 2824 STRAND: +  \nTGCTGAGTGGAGGAGAAGCTATGCTGTT \n>SPU_019059|Scaffold64363|7517|7616| DNA_SRC: Scaffold64363 START: 7517 STOP: 7616 STRAND: +  \nTGCTGAGTGGAGGAGAAGCTATGCTGTTGTGAGTAAAGCCCAGGAGCGAGCAAAACAATACCAACCAGGA \nGACAGGCTCCATGGCTTCTCGGTGGAGAAA \n
SPU_028077	SPU_028077	none	partial CDS at end of scaffold\n
SPU_023734	SPU_023734	none	See SPU_017211. \n
SPU_001047	SPU_001047	none	partial CDS on short scaffold, 21370\n
SPU_007452	SPU_007452	From the excel data and the BLAST results, it is evident that this is the best match for this particular GLEAN model. The sequence appears to have an orderly and continuous arrangement without any gaps or internal repeats present. There was some Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10. 	The first exon of Spec1 contains 5'UTR and one amino acid: Met. SPU_007452 does not show the first exon.\n
SPU_025346	SPU_025346	none	Inspection of the tiling array suggests that glean may have missed the following exons: LVKSIGLYTYGLLLLLSSIQLLTAVRSMVKTIAHGDLQTVPFHMTKSRVIVQSRGHMEAIVLTKSWRKAVNICVINRICSVNPSKRDILITYT\n
SPU_018211	SPU_018211	none	Unknown sequence (NNN...) in the intron of the current model could make this gene model incomplete. The nucleotides in the intron except NNN... are highly similar to other Sp-Tlr genes, so it was modified to a coding region.\n
SPU_028736	SPU_028736	none	There is an excellent, although short, match on Scaffold6547 that is not on the glean3 list. \n \nAlignment with best blast hit data suggest that this model may lack N-terminal sequence.\n
SPU_012194	SPU_012194	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 3 different scaffolds. When examining the excel data and comparing it to the BLAST results, it appears that all three individual scaffolds have an orderly arrangement within their confines. If all three scaffolds were to be combined, the entire sequence would have an orderly and continuous arrangement without any gaps or repeats present. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong containing a wide distribution of values. 	Alignment with best blast hit sequence suggests that model is lacking C-terminal half and some N-terminal sequence.\n
SPU_003264	SPU_003264	none	haplotype=SPU_008197\n
SPU_028937	SPU_028937	From the excel data and the BLAST results, it appears that the sequence is distributed onto 2 different scaffolds. When reviewing the excel data, it is apparent that there are several internal repeats present resulting in several sequence overlaps. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak overall with most of the values being less than 10.	Alignment with best blast sequence suggests that this model may be complete. \n \nThis is one of 4 closely related NALAADase genes on Scaffold496.\n
SPU_025076	SPU_025076	none	Partial Toll-like receptor. This gene model is located at the end of a small scaffold.\n
SPU_009002	SPU_009002	none	Exons 8-28 are present on this scaffold772 and SPU_009002 prediction. Exons 1-7 are present on scaffold772 and SPU_020718 prediction\n
SPU_023985	SPU_023985	After reviewing the excel data and the BLAST results, it is evident that this is best fit for this particular GLEAN model. When reviewing the excel data, it was apparent that the sequence had an orderly arrangement without any gaps or repeats present. There was no Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with about half the values being greater than 10.	FOUR COMMENTS:  \n \n!) Alignment with best blast sequence suggests that this model may be a complete NAALAD2 gene, possibly lacking short N- and C-terminal sequences. \n \n2) It is linked to a reverse transcriptase elements which may not be part of the gene.  This element is on the following predicted exon: \n>SPU_023985|Scaffold113776|19389|20291| DNA_SRC: Scaffold113776 START: 19389 STOP: 20291 STRAND: +  \nTTTCCTTTCCGACCGTCATCAGGTGGTTCGACATCAAGGCGTGACATCAAGCCCAAAGAGCCTCGCATGC \nGGAGTACCTCAGGGTACAAAGTTGGGACCAATCCTATTCCTTGCCCTTGTTAACGATGCTGCCTTAACGT \nCAACATACCGATGGAAGTATGTTGACGACTTAAGTTTGGTGGAAGTCTTGCCTAAAACCCAGCAAAGTTC \nCTTACAGGAGTACGTTGATGAGCTCGGTGAATGGTGCGCCATTAATGACGTGACGCCAAAGCCCGAAAAA \nTGTAAGGCCATGCAAGTGTCTTTCTTGAAGAATCCTCTTCCTCATTTGGACATCACCATCGCAGATGTTC \nATCTTGAACGTGTTGATTCCTTGACTCTCCTTGGTGTCGCGATCCAATCAGACCTGAAATGGGATAATCA \nGGTCCAACAGATGATCTCACGGGCCGCTCGGAGACTGTACATTCTGAGTGTTCTGAAGAAATCTGGAGTC \nAACGCGAATGATCTAGTAACCATCTACAAAGCGTATATCCGTCCCCTGATGGAATTTGGTGTCCCTGTCT \nGGGGCTCCGGCATTACTAATACGCAGAGTGATAAAATCGAACGAATCCAAAGACGTGCGCTACGTTTCAT \nTGTGTATCCAGCTGACCTCTCCTACACACAACGGCTCACTCGTTTCAACTTGCCTATGTTGTGTGAACGC \nAGGAATGATCTCCTTCTACGCTTTGGACGTGGTCTCCTCAAGTCTGAACGGCATCGTGACATGCTACCTG \nCTACTCGTCAATGTGTCTCTCACCGCAGTTCAACACTGAGAAGTGCTCATCTACTAGACCTACAGCGTTG \nTAAAACCCAACGATATAGGAACTCTGCAATCCCGTTTTTAACACGAATGCTCAATTCTTCCAA \n \n3) This model is adjacent to a closely related gene model, SPU_023984, which lacks N-terminal half sequence. 
\n \n4) There are many excellent matches to this model in scaffolds that are not on the glean 3 list.  In addition there are 3 copies in the glean3 list.\n
SPU_026984	SPU_026984	From the BLAST results and the excel data, it is evident that the 2 sequences are distributed onto 2 different scaffolds. There is an overlap between the two scaffolds that spans from 807-907. Both scaffolds contain an orderly arrangement without any gaps or internal repeats present. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10. 	Alignment with best blast sequence suggests that this model may be missing N-terminal CDS.\n
SPU_014154	SPU_014154	none	Alignment with best blast sequence suggests that the model is missing N-terminal CDS.\n
SPU_000375	SPU_000375	none	Partial Toll-like receptor. The previous contig (separated by 1056 bp of NNN) contains a part of a common TLR structure. \nThis is a member of sea urchin-specific Tlr Group ID.  \n
SPU_008229	SPU_008229	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 96% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor.\n
SPU_011541	SPU_011541	none	Unknown sequence (NNN...) in the intron of the current model could make this gene model incomplete. 2nd exon was eliminated based on BLASTN search. \n
SPU_002587	SPU_002587	none	S.purpuratus elongation factor 1B gamma cDNA cloned (AJ973179) \n
SPU_020124	SPU_020124	After reviewing the data and performing a BLAST search, it appears that there is no sufficient match for this particular GLEAN model. There is a large gap present from 319-844 that is indicated by both the BLAST results and the excel data. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with the values ranging from about 5-9	Sp-Elf has two splice variants differing in the 5' region: \nSp-Elf A       SPU_020124 \nSp-Elf B       SPU_020123 \n
SPU_012071	SPU_012071	none	This gene was annotated based on a manual inspection of multiple protein alignments. \n \nThe best Blast hit for this gene corresponds to a nematode MIF-like protein, and slightly worse hits correspond to vertebrate dopachrome tautomerase genes, which are closely related to MIFs. For this reason, we have arbitrarily named this gene Sp-Mif-like1.\n
SPU_016610	SPU_016610	none	This prediction is missing 40 aa.  SPU_021931 is also likely ortholog of AADC.  \n
SPU_001178	SPU_001178	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. The sequence doesn't begin until the 12th base pair and continues until it reaches 989. There is an overlap between the 2 scaffolds at 914-989, but from there the rest of the sequence is completed. Excluding this overlap, if the 2 scaffolds were combined the sequence would have an orderly and continuous arrangement without any gaps or repeats present. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with values being 5 or greater.	Alignment with best blast sequences shows that all but a short N-terminal region of the model is conserved; an internal exon consisting of a string of serines may be missing.\n
SPU_012016	SPU_012016	none	Alignment with best blast sequence shows that only part of the gene is present on this scaffold, 1962.\n
Sp-ECE1-like	SPU_030028	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomon, fgenesh) that are not in the Glean3 list.\n
Sp-NAALAD2 metalloprotease-like	SPU_030041	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomon, fgenesh) that are not in the Glean3 list.\n
SPU_023989	SPU_023989	After reviewing the excel data and performing a BLAST search, it appears that the sequence is distributed onto 4 different scaffolds. There are numerous gaps within the sequence and poor overall coverage. There was Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5	Exons 3-7 are from this scaffold533 and SPU_023989 prediction except for exon 5 which was only predicted by the Fgenesh++ prediction. Exon 2 is from scaffold65249 with no tracks predicting it. Exon 1 is incomplete and present on scaffold137005 with no tracks predicting it. Exons 8-19 are from  scaffold57107 and SPU_020620 prediction. Refer to SPU_020620 for the complete gene features of REJ2.\n
SPU_010579	SPU_010579	From the BLAST results as well as the excel data, it was evident that this is the best fit for this particular GLEAN model. When reviewing the excel data there were no gaps or repeats present; however the end of the scaffold was truncated at 2430 when the entire sequence ended at 2695. Some of this missing sequence information is distributed onto:	Exons 1-12 and the beginning of 13 are present on this scaffold601 and SPU_010579 prediction. In the middle of exon 13 on scaffold601 there is a huge gap of N's, where the remaining exons should be present. The last 2 exons (named 14 and 15) are present on Scaffold17211. SPU_018502 predicts exon 14 but not 15. Fgenesh++ predicts exon 15.\n
SPU_002328	SPU_002328	After reviewing the data, it appears that there is a somewhat orderly arrangement within the sequence however, there is very poor coverage. The coverage for this particular sequence is only 5K/25K. There also appears to be numerous gaps within the sequence denoted by N. There was also no Est information available and the transcriptome intensity scores appeared to be widely distributed and strong with values ranging from about 5-180.\nAdditional information from Baylor annotations (gene comments): last exon has large stretch of NNNNs, will examine later for correction.	last exon has large stretch of NNNNs, will examine later for correction\n
SPU_010926	SPU_010926	none	Difficulty to confirm expression. \nHighly similar to SPU_013313.  \n
SPU_000960	SPU_000960	From the BLAST results and the excel data, it is evident that this is the best match for this particular GLEAN model. There is a large gap present from 423-514 resulting in poor overall sequence coverage. There was Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10. 	SPU_000960 corresponds to C-ter domain of Sp-EF1B delta \nSPU_000960 contains 7 predicted exons. However only six correspond to the known sequences for S. granularis EF1B delta (Y14235 and AJ973181). \nAmong the 6 remaining exons, one exon (Scaffold1943|187113|187206|) is strictly identical to exon3 of SPU_000959 (scaffold1943|189643|189735|)  \n \nWe therefore propose to construct the CDS for Sp-EF1B delta from 2 exons of SPU_000959 plus the common exon between SPU_000959 and SPU_000960 plus 5 exons of SPU_000960 as follows : \nexon1 SPU_000959 scaffold1943|194310|194423|Strand(-);   \nexon2 SPU_000959 scaffold1943|191630|191707|Strand(-);  \nexon3 SPU_000959 scaffold1943|189643|189735|Strand(-)  \n idem to SPU_000960 scaffold1943|187113|187206|Strand(-); \nexon4 SPU_000960: Scaffold1943|186520|186614|Strand(-); \nexon5 SPU_000960: Scaffold1943|185409|185492|Strand(-); \nexon6 SPU_000960: Scaffold1943|183100|183319|Strand(-); \nexon7 SPU_000960: Scaffold1943|181425|181540|Strand(-); \nexon8 SPU_000960: Scaffold1943|180516|180554|Strand(-); \n \n
SPU_012856	SPU_012856	none	SPU_012856 has a tandem duplication of the N-terminal half of the C2A domain.\n
SPU_009766	SPU_009766	none	See SPU_013107 (scaffold 457). \n
SPU_009210	SPU_009210	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto 3 different scaffolds. There is a small gap that occurs between the second and third scaffold that ranges from 553-581. Other than this small gap between the second and third scaffold, the overall sequence has an orderly arrangement without any repeats present. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be unusual in that the majority of the values were very large <80. However, there were several small values as well. 	This Model contains a partial sequence relative to SPU_020140\n
SPU_022916	SPU_022916	none	More than 90 % identity with Fz5/8 from P. lividus. AC number AM084899   \n
SPU_026723	SPU_026723	After reviewing the data from the excel file and performing a BLAST search, it appears that there is no sufficient GLEAN model that maps well onto V2.1. When reviewing the excel file there appears to be only a small amount of order within the sequence (some gaps are present) and sequence coverage appears to be low as well. There was Est. information available on GBrowse assembly V0.5 and the transcriptome information appeared to be widely dispersed and somewhat strong. This is an un-annotated gene as well so no additional comments were available. 	See GLEAN#_08008 for annotation\n
SPU_007948	SPU_007948	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. There is a large gap on v2.1_scaffold27007 that spans from 819-1088. This portion of the sequence is distributed onto v2.1_scaffold79512. There was no Est support available from GBrowse assembly V0.5 and most of the transcriptome intensity scores appeared to be weak, except for two outlier values at approximately 45. 	Allele of SPU_000343\n
SPU_021272	SPU_021272	none	Merge with SPU_021273 and SPU_021274.\n
SPU_001533	SPU_001533	After reviewing the data and performing a BLAST search, it appears that the data is distributed onto 2 different scaffolds. The second scaffold has a lower bit score and higher e-value compared to the other scaffold results, however, this scaffold contains more overall sequence coverage. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values ranging from 5-45.	One of two fibrillin genes.  Other parts of this same gene are in SPU_001533, 20166, 21495\n
SPU_011455	SPU_011455	The BLAST results and the excel data indicate that the sequence is distributed onto 2 different scaffolds. Both scaffolds are unique in that they both cover the entire scaffolds continuously from their beginning to end. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with most of the values being less than 5. 	The nucleotide sequence of the first exon and the following intron have 100% identity to SPU_011454. This gene model may be duplicated recently or produced by wrong prediction. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_013876	SPU_013876	According to the BLAST results in comparison with the excel data, it is evident that this is the best fit for this particular GLEAN model. This scaffold is unique in that it completely covers the entire sequence (from 1-2382) without any breaks within the base pairing. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with the majority of values being <5	Unknown sequence (NNN...) in the first intron of the current model could make this gene model incomplete. The first exon shows typical Toll-like receptor structures (signal peptide, LRRNT, LRR, LRRCT, TIR(partial)).\n
SPU_006628	SPU_006628	none	Partial CDS based on alignment with best blast data suggests that this model is missing both N- and C- exons.  The sequence inferred for the protease active site is significantly altered from other SpAN-like proteins or the closely related tolloid proteins.\n
SPU_015771	SPU_015771	From the BLAST results and the excel data, it is evident that the sequence is distributed onto two different scaffolds for this particular GLEAN model. When reviewing the excel data, it was apparent that there is a small gap within the second scaffold that ranges from 2684-2737. Excluding this gap within the second scaffold, if both scaffolds were to be combined, the overall sequence would have good coverage without any internal repeats present. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 10.	Contains Domain of Unknown Function (DOF)- highly conserved in vertebrates. \nAlignment to vertebrate dentin is due to serine rich repeat.\n
SPU_009718	SPU_009718	none	CDSs of this gene align exactly with CDSs of SPU_028124 on scaffold 75874. Sequence between CDS very similar also.\n
SPU_011869	SPU_011869	none	Tiling data suggests multiple exons not found by the glean3 and NCBI models, but the tiling data is too messy for me to alter the GLEAN3 model  \n
SPU_001821	SPU_001821	none	one match of 4\n
SPU_013095	SPU_013095	After reviewing the three different subjects for query SPU_013095 it was determined that this was the best match since there were no repeats and the sequence had sufficient coverage. A search using GBrowse V0.5 revealed that there is no EST support for this particular GLEAN model. There also appeared to be weak intensity scores after reviewing the transcriptome results. This was an un-annotated gene so no additional comments are available.	See SPU_011220. \n
SPU_016850	SPU_016850	For this particular GLEAN model there was no Cds information available from either Baylor annotations or SpBase. However, mRNA information was available from SpBase. When examining the excel data it appears that there are several gaps and internal repeats present within each of the different subject hits. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with most of the values ranging below 10. 	No ESTs \nAppears to have the wrong N-terminus\n
SPU_007357	SPU_007357	For this particular GLEAN model no CDS were found on either the Baylor annotations page or in the SpBase search engine. There was also no gene features information available on either site.\nAdditional information:\nNo ESTs;\nExtra exons and the wrong N-terminus, missing exon in NBF;\nDeleted dubious C-terminus	No ESTs \nExtra exons and the wrong N-terminus, missing exon in NBF \nDeleted dubious C-terminus\n
SPU_019709	SPU_019709	none	See SPU_001774. \n
SPU_004062	SPU_004062	none	incorrect model: incomplete \ninvalid exons and missing exons \nsome 3' exons in SPU_005456 \n
SPU_002238	SPU_002238	none	1 EST with the central portion of the gene. \nModel is missing an exon in the NBD and perhaps the last exon. \n
SPU_019821	SPU_019821	none	Only found the C-terminal end of the protein; there is no evidence of the N-terminal end.\n
SPU_024454	SPU_024454	none	More of sequence on scaffolds 129415 and 21113\n
SPU_017036	SPU_017036	none	incomplete\n
SPU_028783	SPU_028783	none	See SPU_002787.  \n
SPU_028784	SPU_028784	none	See SPU_013765, _24978.  \n
SPU_025999	SPU_025999	none	See SPU_025315. \n
SPU_028726	SPU_028726	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When comparing the excel data and the BLAST results, it appears that if the 2 scaffolds were combined, the sequence would have an orderly continuous arrangement without any repeats or gaps present.  There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than about 10. 	First 21 exons encode plexin.  The other 19 exons encode sortilin-1.  \n
SPU_000729	SPU_000729	For this particular GLEAN model there was no Cds information available from either Baylor annotations or SpBase. However, there was mRNA information available from SpBase. When examining the excel data, it appears that subject gb|DS006912| would be the best match for this GLEAN model based on the orderly arrangement of the sequence (no gaps or repeats present.) There was some Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be some what strong with most of the values being 5 or greater.\nAdditional gene information from Baylor annotation (comments): Lacking N-terminus.  See SPU_028726, _27443, _08472.	Lacking N-terminus.  See SPU_028726, _27443, _08472.  \n
SPU_009600	SPU_009600	none	See SPU_027526. \n
SPU_002431	SPU_002431	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When examining the excel data, it is evident that there are several small gaps within the sequence.  However, the sequence has an overall continuous and orderly arrangement without any repeats present between the 2 different scaffolds. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be 	Possibly only a partial sequence.  Appears to only contain C-terminus.\n
SPU_020803	SPU_020803	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When examining the excel data, it appears that there is one overlap within the sequence of >v2.1_scaffold22297. The overlap occurs at 228-275 and 228-377. The sequence continues to be orderly from 228-377. This is an un-annotated gene so no additional gene information was available from Baylor gene information (comments). There was also no Est information available from the GBrowse assembly V0.5 and the transcriptome intensity scores appear to be very strong with values averaging about a 15. 	SMART Confidently predicted domains, repeats, motifs and features: \n \nIG            begin: 134   end: 215 e-value 5.37e-04  \ntransmembrane begin: 359   end: 381 e-value -  \nTyrKc         begin: 451   end: 721 e-value 3.46e-132 \nBelongs to class II of TK receptors is defined by  \n[DN]- [LIV]-x  (3)-Y-Y-R (Prosite PDOC00212) consensus activation loop.  \n
SPU_008443	SPU_008443	none	Probably only a partial sequence.  It is missing ~200 amino acids off of the C-terminus and 400 off of the N-terminus.\n
SPU_014430	SPU_014430	none	SPU_020121 has the first part of the gene. SPU_014430 has the rest of the gene.\n
SPU_007759	SPU_007759	none	probably incomplete sequence at c terminus\n
SPU_000288	SPU_000288	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto two different scaffolds. When reviewing the excel data, it appears that the second scaffold doesn't begin until the 27th base pair. Other than this small gap in the sequence both scaffolds appear to have an orderly arrangement overall, without any other gaps or internal repeats present. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with several values greater than 10. 	 partial, missing C-terminus\n
SPU_005990	SPU_005990	none	Matches c-type lectin domain (smart00034.10)\n
SPU_000025	SPU_000025	none	 partial, missing N- and C-terminus\n
SPU_000835	SPU_000835	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When examining the excel data, it is apparent that there are several internal repeats within both scaffolds. On v2.1_scaffold60368 there are several gaps present as well that range from 1136-1196 and 1331-1447. There was some Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong. 	 extra C-terminus\n
SPU_002338	SPU_002338	none	 partial, missing N-terminus\n
SPU_003632	SPU_003632	none	 missing N-terminus residues\n
SPU_004795	SPU_004795	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When examining the excel data, it is apparent that there are several repeats and gaps present within both scaffolds. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak, with most of the values being less than 5.	 extra stretches in middle\n
SPU_004854	SPU_004854	none	 missing stretch in middle\n
SPU_004877	SPU_004877	none	 partial, missing N-terminus\n
SPU_006217	SPU_006217	After reviewing the data it appears that the sequence is distributed onto 2 different scaffolds.  If the 2 scaffolds were combined the sequence would have an orderly continuous arrangement without any gaps or repeats. There was Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with values ranging from about 4-70. This is an un-annotated gene so no additional information was available from Baylor annotations (gene comments).	 missing C-terminus residues\n
SPU_006973	SPU_006973	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. If the two scaffolds were combined, the sequence would have an orderly continuous arrangement without any repeats or gaps present. This is an un-annotated gene so no additional gene information (comments) could be found from the Baylor webpage. There was Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being <10.	 partial, missing N-terminus, stretches in middle\n
SPU_006585	SPU_006585	none	 extra stretch in middle\n
SPU_007583	SPU_007583	none	 partial, extra N-terminus, missing C-terminus\n
SPU_007672	SPU_007672	none	 partial, missing N-terminus\n
SPU_007687	SPU_007687	none	 missing N-terminus residues\n
SPU_007756	SPU_007756	none	 missing N-terminus\n
SPU_008403	SPU_008403	After reviewing the data and performing a BLAST search it appears that the sequence is distributed onto 2 different scaffolds. On v2.1_scaffold22310 the sequence doesn't begin until 107 and there is a small overlap between the 2 scaffolds from 1168-1230. However, if the two scaffolds were combined the sequence would have an overall orderly arrangement. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values ranging from 5-10.	 missing N-terminus\n
SPU_008406	SPU_008406	From the BLAST results as well as the Excel data, it is evident that this is the best fit for this particular GLEAN model. When reviewing the Excel data, it was apparent that there were several internal repeats as well as gaps present. The scaffold is also truncated at 1225, whereas the entire sequence actually terminates at 1590. There was EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with several values greater than 10.	 missing C-terminus\n
SPU_008879	SPU_008879	From the BLAST results as well as the Excel data, it is evident that the sequence is distributed onto 2 different scaffolds. Neither scaffold had any repeats present; however, on v2.1_scaffold40027 there were two gaps, from 65-96 and from 393-654. v2.1_scaffold81019 filled in the missing sequence information from 392-655. There was EST information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong. 	 extra stretch in middle\n
SPU_010311	SPU_010311	After reviewing the data and performing a BLAST search, it appears that there is no sufficient match for this particular GLEAN model. The BLAST results indicate that the best match is v2.1_scaffold13759 based on e-value and bit score. There was EST information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with several scores that were weak (<5) and several that were strong (>10).	 missing stretch in middle\n
SPU_009131	SPU_009131	none	 missing N-terminus residues\n
SPU_010557	SPU_010557	none	 partial, missing N-terminus\n
SPU_010618	SPU_010618	none	 missing N-terminus\n
SPU_010627	SPU_010627	none	 extra stretches in middle\n
SPU_010742	SPU_010742	none	 missing stretch in middle\n
SPU_010804	SPU_010804	none	 missing N-terminus residues\n
SPU_012659	SPU_012659	After reviewing the data and performing a BLAST search it appears that this is the best and only results for this particular GLEAN model. When examining the excel data, it was apparent that there was a gap from 293-346 and the sequence has poor coverage overall. This is an un-annotated gene so no additional information was available from Baylor annotations. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appear to be strong with the majority of values being <10	 missing stretch in middle\n
SPU_012695	SPU_012695	none	 missing N-terminus\n
SPU_015988	SPU_015988	none	 missing C-terminus\n
SPU_016076	SPU_016076	none	 missing C-terminus; extra N-terminus residues\n
SPU_017992	SPU_017992	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. If the 2 scaffolds were combined the sequence would have an orderly continuous arrangement with no internal repeats or gaps. The transcriptome intensity scores were difficult to determine due to a very large outlier of about 125 that distorted the scale of the graph. However, the rest of the values appeared to be weak (less than 5.) There was also no Est information available from GBrowse assembly V0.5. This is an un-annotated gene so no additional gene information was available from Baylor annotations ( gene comments).	 missing N-terminus\n
SPU_018109	SPU_018109	After reviewing the data and performing	 missing N-terminus\n
SPU_022129	SPU_022129	After reviewing the data and performing a BLAST search it appears that this is the best fit for this particular GLEAN model based on the sequence coverage and orderly arrangement.  There was Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores were widely dispersed with several high values (40-70) and several moderate values (9-15). This is an un-annotated gene so no additional information was available from Baylor annotations assembly V0.5.	 missing some N-terminus residues\n
SPU_018758	SPU_018758	none	 missing C-terminus\n
SPU_022378	SPU_022378	none	 missing C-terminus\n
SPU_022397	SPU_022397	none	 missing C-terminus\n
SPU_020211	SPU_020211	none	 missing some C-terminus residues\n
SPU_020302	SPU_020302	none	 missing C-terminus\n
SPU_024267	SPU_024267	When reviewing the Excel data, subject gb|DS015797| was unique in that it appeared to have an orderly continuous arrangement without any gaps or internal repeats present until the sequence reached about 1000. However, after performing a BLAST search, additional sequence information came up that was not present in the Excel data and the sequence appeared to be distributed onto 2 different scaffolds. The transcriptome intensity scores appeared to be somewhat weak (less than 5) with the exception of 2 outliers of about 17 and 55. There was no EST information available from GBrowse assembly V0.5. This is an un-annotated gene so no additional gene information (comments) was available from Baylor annotations. 	 missing C-terminus, extra stretch in middle\n
SPU_025702	SPU_025702	none	 missing N-terminus\n
SPU_004937	SPU_004937	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. There are several repeats within each scaffold that are apparent from both the BLAST results and the Excel data. The sequence does, however, have an orderly arrangement. There was no EST information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being above 5 and several that were above 10.	 extra N-terminus\n
SPU_000068	SPU_000068	none	 partial, missing N-terminus\n
SPU_017350	SPU_017350	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed on to 2 different scaffolds. When reviewing the excel data, it is apparent that there are several internal repeats present as well as gaps. There was Est support from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be widely distributed, with values ranging from about 2-36. 	 extra C-terminus\n
SPU_009336	SPU_009336	none	 extra C-terminus\n
SPU_009416	SPU_009416	none	 missing C-terminus\n
SPU_003859	SPU_003859	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When examining the excel data, it is evident that the 2 scaffolds have an orderly arrangement about them. However, the sequence is truncated at 950 resulting in poor overall coverage. There was no Est. information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak overall (excluding one outlier at 77) with most of the values being less than 10.  	 partial, missing C-terminus\n
SPU_021455	SPU_021455	none	 missing N-terminus\n
SPU_021878	SPU_021878	none	 extra stretch in middle\n
SPU_026271	SPU_026271	After reviewing the data and doing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. The first scaffold doesn't start until the 9th base pair and continues until 466. If the two scaffolds were combined there would be a small overlap between the two sequences from 465-556. Other than the overlap, if the two scaffolds were combined, the sequence would have an orderly, continuous arrangement. There was Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 10. 	 extra N-terminus\n
SPU_004907	SPU_004907	none	 partial, missing C-terminus\n
SPU_028042	SPU_028042	none	 missing C-terminus half\n
SPU_020156	SPU_020156	none	 partial, missing N-terminus\n
SPU_020751	SPU_020751	none	 partial, missing N- and C-terminus\n
SPU_021188	SPU_021188	none	 partial, missing N-terminus\n
SPU_021571	SPU_021571	none	 partial, missing N-terminus\n
SPU_011187	SPU_011187	From the BLAST search and the excel data it was evident that the sequence was distributed onto 2 different scaffolds. It appears that both scaffolds have an orderly and continuous arrangement without any gaps or repeats. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values being less than 10. 	 partial, missing stretch in middle, missing C-terminus\n
SPU_011499	SPU_011499	After reviewing the data it appears that there is no sufficient GLEAN  model that fits SPU_011499. The sequence appears distributed onto 2 different scaffolds. On >v2.1_scaffold48174 there is a gap from However, the sequence coverage doesn't start until about 1500 but from that point on there is an orderly arrangement of the sequence with no gaps. The total coverage of the sequence is a little low (3066/4155) but there appears to be no repeats and the BLAST search indicated that there was a low e-value of zero coupled with a high bit score. \nAdditional comment found: extra at N-terminus; extra stretch in middle.	 extra at N-terminus; extra stretch in middle\n
SPU_010686	SPU_010686	none	 extra N- and C-terminus\n
SPU_010778	SPU_010778	none	 extra N-terminus\n
SPU_013463	SPU_013463	From the BLAST results as well as the Excel data, it is evident that the sequence is distributed onto 2 different scaffolds. The first scaffold doesn't begin until the 9th base pair; however, it continues in an orderly arrangement until the scaffold is truncated at 234, where the rest of the sequence is carried on the second scaffold. There are no internal repeats present on either scaffold. There was some EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong. 	 partial, missing N-terminus and center stretch\n
SPU_014251	SPU_014251	After reviewing the data and performing a BLAST search it appears that the sequence is distributed onto 2 different scaffolds. When examining the excel data and comparing that to the BLAST results it is apparent that the sequence has an orderly continuous arrangement. If the two scaffolds were combined there would be good sequence coverage without any internal repeats or gaps present. There was some Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be widely distributed with some values either at or below 5 and some that were above 5. 	 partial, missing stretch in middle\n
SPU_013010	SPU_013010	none	 missing stretches in between\n
SPU_013037	SPU_013037	none	 missing some N-terminus residues\n
SPU_013046	SPU_013046	none	 partial, missing N-terminus\n
SPU_013105	SPU_013105	none	 partial, missing N- and C-terminus\n
SPU_015793	SPU_015793	After reviewing the data and performing a BLAST search it appears that the data is distributed onto 2 different scaffolds. According to the excel data and the BLAST search, if the 2 scaffolds were combined the sequence would have an orderly, continuous arrangement. There was no Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be very weak excluding a very large outlier at about 100.	 partial, missing central stretch\n
SPU_016569	SPU_016569	After reviewing the subjects for SPU_016569, it appears that there may be an assembly error in the sequence. The sequence appears to be ordered on the first and second scaffolds with internal repeats. There seems to be only one internal repeat that is repeated in the genome. There was no EST information available. It was unclear whether the transcriptome scores were weak or strong since they ranged from 3-15. This is an un-annotated gene so no additional comments were available on the Baylor page.	 extra N-terminus\n
SPU_016627	SPU_016627	From the Excel data and the BLAST results, it is evident that this is the best fit for this particular GLEAN model. When examining the Excel data, it was clear that the scaffold contained an orderly arrangement without any internal repeats present; however, the scaffold was truncated at 1199, whereas the entire sequence terminates at 1659. There was no EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values being less than 10. 	 partial, missing C-terminus\n
SPU_016652	SPU_016652	From the BLAST results and the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. Both scaffolds appear to have an orderly arrangement without any gaps or repeats present. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong.	 extra N-terminus\n
SPU_017836	SPU_017836	From the BLAST results as well as the Excel data, it is evident that the sequence is distributed onto 2 different scaffolds. When examining the first scaffold, it was apparent that there were 2 internal repeats (duplicates) present that occurred within 668-800 and 885-1016. There was also a sequence overlap within the second scaffold, from 1208-1272, that was not a part of the sequence. There was EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being about 5 (excluding one outlier value of 67).	 partial, missing C-terminus\n
SPU_018471	SPU_018471	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto 3 different scaffolds. Within the first scaffold (v2.1_scaffold71909) there is a gap that spans from 294-1164 and the missing portion of this sequence is filled in by the second scaffold (v2.1_scaffold68450). There is an overlap between these 2 scaffolds within the beginning portion of the sequence as well. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with several values that were greater than 5.	 partial, missing N- and C-terminus\n
SPU_017553	SPU_017553	none	 partial, missing N- and C-terminus\n
SPU_017563	SPU_017563	none	 partial, missing N- and C-terminus\n
SPU_019074	SPU_019074	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 3 different scaffolds. When examining the excel data, it is apparent that there are numerous gaps within the scaffolds as well as internal repeats. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5.	 partial, missing N- and C-terminus\n
SPU_018872	SPU_018872	none	 partial, missing N- and C-terminus\n
SPU_018882	SPU_018882	none	 partial, missing N-terminus\n
SPU_020118	SPU_020118	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. If the two scaffolds were combined, the sequence would have an orderly continuous arrangement, without any gaps or repeats present. There was no Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be widely distributed with values ranging from about 2-40. 	 missing N-terminus\n
SPU_021997	SPU_021997	After reviewing the data and performing a BLAST search it appears that for this particular GLEAN model there is no good match. There are numerous repeats, poor sequence coverage, and gaps within the sequence. There was no Est support available from GBrowse assembly V0.5.	 partial, missing N-terminus\n
SPU_022549	SPU_022549	From the BLAST results as well as the Excel data, it is evident that the sequence is distributed onto 2 different scaffolds. There are several gaps present within v2.1_scaffold81745; however, this missing sequence information is present within v2.1_scaffold18034. There was EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong. 	 extra N-terminus\n
SPU_024298	SPU_024298	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. The first portion of the sequence (v2.1_scaffold161) doesn't begin until the 5th base pair and continues until 537. The rest of the sequence is completed on v2.1_scaffold40446 from 531. However, at 531 within this scaffold there is an internal repeat present. Excluding this repeat, if the two scaffolds were combined, the sequence would have an orderly continuous arrangement. There was Est information available from GBrowse V0.5 and the transcriptome intensity scores appear to be strong with most of the values ranging from 5-10	 missing N-terminus\n
SPU_022926	SPU_022926	none	 missing N-terminus\n
SPU_023011	SPU_023011	none	 missing N-terminus\n
SPU_026351	SPU_026351	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. The beginning of the sequence (v2.1_scaffold3418) covers the first 925 base pairs and overlaps with the rest of the sequence (v2.1_scaffold63737) at 481. The beginning of the sequence (v2.1_scaffold3418) has a slightly higher bit score than some of the other scaffold results from BLAST. When reviewing the excel data it is apparent that there is an internal repeat on the second scaffold (v2.1_scaffold63737). This repeat is also visible from the BLAST results. There was no Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appear to be weak overall with most of the values being >5	 extra N-terminus\n
SPU_027419	SPU_027419	After reviewing the data and performing BLAST searches, it was determined that no orderly GLEAN model fit sufficiently. There were matches on four different scaffolds. Scaffold v2.1_scaffold50623 was unique in that it matched the first one hundred base pairs. However, it is difficult to distinguish between the remaining scaffolds because they have similar numbers of matching base pairs and similar bit scores. The GBrowse assembly V0.5 revealed that there is EST information available and the transcriptome intensity scores appear to be very strong as well. Additional information found on the Baylor page under comments:	 missing stretch in middle, extra C-terminus\n
SPU_027758	SPU_027758	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When examining the Excel data it appears that if these 2 scaffolds were combined, the entire sequence would have an orderly and continuous arrangement without any gaps or repeats present. There was some EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with most of the values being less than 5.	 partial, missing C-terminus\n
SPU_028763	SPU_028763	From the BLAST results as well as the Excel data, it is evident that the sequence is distributed onto two different scaffolds. When examining the Excel data it is apparent that the two scaffolds had an orderly arrangement without any gaps or repeats present. If the two scaffolds were to be combined, the sequence would have good coverage overall. There was no EST information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5.	 extra N-terminus, missing C-terminus\n
SPU_027814	SPU_027814	none	 missing N-terminus\n
SPU_028096	SPU_028096	none	 extra N-terminus\n
SPU_014753	SPU_014753	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. There are several gaps and numerous repeats present within the sequence. The excel data indicates that the sequence begins from 1-113 on subject gb|DS010782| however; this information is not reflected from the BLAST results. The absence of the scaffold from the BLAST results was probably due to very low bit scores and high e-value results. See below:	 missing N-terminus\n
SPU_006894	SPU_006894	none	 missing N-terminus and stretch in middle\n
SPU_009793	SPU_009793	none	 partial, missing C-terminus\n
SPU_016830	SPU_016830	none	 missing N-terminus\n
SPU_018973	SPU_018973	none	 missing N-terminus\n
SPU_021020	SPU_021020	none	 missing N-terminus\n
SPU_024989	SPU_024989	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 3 different scaffolds. When reviewing the excel data it is evident that there are no repeats or gaps within each of the subject queries. This is reflected within the BLAST results; however, if the three scaffolds were to be combined there would be a sequence overlap between the second scaffold (v2.1_scaffold61057) and the third scaffold (v2.1_scaffold16316). There was some Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with half the scores being about 5 and the other half being more than 10. This is an un-annotated gene so no additional gene information (comments) was available from Baylor annotations.	 extra N-terminus\n
SPU_025126	SPU_025126	After examining the five different subjects from this query, it was determined that this was the best match. It was initially thought that this subject was not the best match due to several repeats in the sequence. However, when the BLAST search was performed the results indicated that this was the better match since it covered more of the sequence. 	 extra N-ter,  missing C-terminus and stretch in middle\n
SPU_026673	SPU_026673	none	 extra N-terminus and stretch in middle\n
SPU_013202	SPU_013202	none	Different parts of this gene are found in different scaffolds in a non-linear organization.\n
SPU_021651	SPU_021651	none	Different parts of this gene are found in different scaffolds in a non-linear organization.\n
SPU_005522	SPU_005522	none	Different parts of this gene are found in different scaffolds in a non-linear organization. \n
SPU_007087	SPU_007087	none	Different parts of this gene are found in different scaffolds in a non-linear organization.\n
SPU_011042	SPU_011042	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. The BLAST results do not completely coincide with the excel data. There are differences between the base pairing information when comparing the 2 sets of information. However, both sets of data have an orderly arrangement without any repeats present. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak. 	#\nThis model was annotated and modified based on a manual inspection of multiple protein sequence alignments. \n \nVertebrate and insect SARM proteins contain only 2 SAM domains, whereas the original version of this glean model coded for 3 SAM domains. When the nucleotide sequence of this model was inspected in detail, we noticed that there were some identical exon sequences that likely resulted from assembly problems. We have modified this model accordingly (following the NCBI model), and this modified glean model now presents a domain structure like that found in mammalian and insect SARM.\n
SPU_019879	SPU_019879	none	Sequence xp_786867 has been  predicted by automated computational analysis.  \nThe next best match is AB051576.1 Shiwa,M., Murayama,T. and Ogawa,Y. Molecular cloning and characterization of ryanodine receptor from unfertilized sea urchin eggs \n  JOURNAL   Am. J. Physiol. Regul. Integr. Comp. Physiol. 282 (3), R727-R737 (2002) \n \n            This record is derived from an annotated genomic sequence \n            (NW_791670) using gene prediction method: GNOMON, supported by EST \n            evidence. \n
SPU_007011	SPU_007011	After reviewing the data and performing a BLAST search, it appears that the gene sequence is on 3 different scaffolds. The best matches were on >v2.1_scaffold4300 and >v2.1_scaffold2310. However, there were 2 internal repeats on >v2.1_scaffold2310 and a gap within >v2.1_scaffold4300. There was no EST information available on GBrowse V0.5 and the transcriptome scores appear to be somewhat strong with scores averaging about a 10.\nAdditional information was found on the Baylor page under comments:\nInspection of the tiling array suggests that glean may have missed the following exons\nDINCVPLDLSIKRTNPQETSENEQEVGEEPLVEEPRMGEEPREEEPLVEELSMGEEPKGEESMQGGLLMREEPSEGELEGEEEGFEEQPGEYDSLDEELWVGGKPIKEEPLDEEQEREENIGLWEGALVEGPEGEEESLEEEPLVEEPEGEKEPLEEEEPEGEEEPEGEEPEGEESPEDSAAWIPV	Inspection of the tiling array suggests that glean may have missed the following exons: DINCVPLDLSIKRTNPQETSENEQEVGEEPLVEEPRMGEEPREEEPLVEELSMGEEPKGEESMQGGLLMREEPSEGELEGEEEGFEEQPGEYDSLDEELWVGGKPIKEEPLDEEQEREENIGLWEGALVEGPEGEEESLEEEPLVEEPEGEKEPLEEEEPEGEEEPEGEEPEGEESPEDSAAWIPV\n
SPU_023706	SPU_023706	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nWhile the best Blast hit for this model is to Transmembrane protease, serine 4 (Membrane-type serine protease 2)(MT-SP2), a careful inspection of its size and domain composition reveals that it more generally resembles members of the granzyme family and vertebrate marapsin. We therefore propose to name this and related genes Sp-Gra[nzyme]mar[apsin]-like. \n \nThe location of this model in the scaffold (away from ends or gaps), and the fact that other gene prediction protocols generated identical models strongly suggest that this is not an incomplete model.\n
SPU_008377	SPU_008377	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When reviewing the excel data it was evident that there were several gaps present within both scaffolds; however both scaffolds did contain an orderly arrangement about them. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being <10.	Inspection of the tiling array suggests that glean may have missed the following exons: SVSGRFGSKSCIDRVWGRLFRCIKSLLRPKASYFRFSFNSAKSISIWKCACNSCWSIKDTRLLDIFSFCWLMLVVLRKNSLWKGLSQGVYSPFSSSIKSSCCFC\n
SPU_008563	SPU_008563	none	Inspection of the tiling array suggests that glean may have missed the following exons: SVCNKKYRSSSDLKIHLRTHSGEKPFLCSFCGERCTDIGSLANHTIRFHREWTQQCPQCSKKSVSKSHLKTHMMVHTGEKPYQCPECSKRSATKSSLNKHMLTHNGEKRYECFECKKTYANKGDLYTHKGLIRSITLRNMSKLTPVKGLIDVL,IDIKDAVKVSTKLNGDAISSLRMQSGKQYHQCSVCGRKCPSKSDLARHLRTHTGEKPYPCPECDKRFSDKSSIPQHMLIHSGEKPYECSECSSRFNCKSNLRQHMKQHSTTKFHQCPKCDKK\n
SPU_010924	SPU_010924	none	Inspection of the tiling array suggests that glean may have missed the following exons: VLPYKCKICDKGYCYKKGLSAHMRTHTAKRSHKCTVCNERFLNIKRHMKIHSGLIQCSICNQGFSNHGNLTQHRKIHRKQNR,YQCMYCDVRFSRVDTLSRHIRSHTGEKPYECSFCNKKTFSQTAHLTRHIKIHTGERPFECSICSKMFAERSHLTDHQKIHTGEKPYLCSVCEKRFG,ATNLLHLMAVALSSPSWHHELHLPWSHYPKVLPLHSTLRSGNSLHPVVMHESPCHRLPYYSHRLRVLDLIVVNSSLWHHLPLCNHLLCLHS,IHPVPVGNRFIHPVPADNGLIQPGPECNGFIHPVLAGNGFTQPVLAGNRFFYPVHVGNGFKYAREYFLAFVVRVVILATSMITLFSICT\n
SPU_011270	SPU_011270	none	Inspection of the tiling array suggests that glean may have missed the following exons: PLIPFRSSSPDRHLPVFPLAVFPWAIASALSSFPFPSCVGLRRGRLVGHDLVGCIYAIVNRCVGLTRQQNLIGRCHLMRLCVLRGDFLASPFRRRLQVLAHVRVLEPLDRAPEVTCLLPAVDKGHE\n
SPU_012913	SPU_012913	none	Inspection of the tiling array suggests that glean may have missed the following exons: QEIIKMSTSGDDTQFNCRLCNFVGGSKTEIAEHFLTEHIEQYVSLSKATPSKASKNETETVKETKEKEEEQKEVESDNEDVISEEQAKEEEKKSSKTLDEILKRKLEKDAGPGFGRGMRRRKKATPIKYSMDDDEEEDEWLPRKEPEPSKTIYVRQPSILNKGRGRQRKKGKRGRPPMVGFKKSKPKSAPPPPPKQPPCKTKIDIRKRSDEKNLPIPYYVFNRILDDRVESWFKNYKEKRDAIEMIRCPNDRCANVMSVDEMAVHEKCHVPNLDGFRCCECGYISLHWAKMRVHYRSDHNSKLNAATCDFEGCEKVFPCIGTKLLQSHAIKAHFKPQLLARLKSPDFDPSEYDKFMQAPENEADRQSIARKRGNKRTADVEDEEEEEEEEEEEENDTTSKPNPKKPRRKKFKRYKVHVCTICFARFKDEGEMLSHKDAHYKDNSKDIIYCTECTEYNASEEEPMRNHLATVHKRMLHLHRCDECNNFSTNHYHDLKKHLVTHTGAKNYMCELCGRRTTTPFNLRIHYRRIHASEDEKKHHCMSCDYKCADKGILKVRANE,RALTLIHQSMTSSCKPLKMRLTGSQLQENVVISALQMLKMKKKKKRKKKRKRMILLVSQIQKSLVGKSLSATRSMYALSALPDSRMKGRCYLTRMHITRTIPRILSTVQSAQNTMPLKKNPCVIT,MVNVEPAHDVQPHHVKLSADNQIIRHHPMIEYSSVPPVSVAQQIIAAGNDMAQVISEQQYHQIQQQQQQQHQQQQQQQQQQQQQQQQQQQQQHQQQQQQQQQQQQQANHGHPPQSHTPLAPPLVLHSQQPKPPLHPVPLEIHAVPIHPTTHSNIPTPVVEQIVNTDHAVHHLVAAMMPRW,STAVCHRCLSPSRSSQQAMTWPKSYQSNSTIRSNNNSSSNTSSSSNNSSNNNNSNSSNNNSSINSNNSSSSSSNSKPTMGIRHSPTHL,HGPSHIRATVPSDPTTTAAATPAAAATTAATTTTATAATTTAASTATTAAAAAATASQPWASATVPHTSSTSSCTTQPAAQTTLAPGASGDPRCTHPSYHPQ,IQVDASNGMNQQEIIEYTVPNTQQIITSSGDITEVITSDHHYHASHHISDHHQLQQQHHQQQQQQQQHQQQHGGDSSIQQAAMHAGIPTTSAAEQVPTAIVEQIVRATPHSEDQNVVHNLVASMIPHDAELVMSMMQIQHVPQVQHVQQVQQIHHVHQPNVSQ,TNRRSSNTQFQILSRSSPAVATLPRLSLAITTTTPVTTSATTISCSNSIINSSNNNSNINSNMAATAAYSRQLCTQVYPQPALPSKSRLR,MHSGATNKIVSEHIMSKHAKVRPYRCNVCGWTAAYNGNMWKHVENHQKILGDQMPEFPVSVLSNIDDLNMPTPLRAPSGKKRGQGKDPTSPSSPTSRPKAKRSRPKYAEPPSGVILANRSVVQEVPVTVTVTHVEQEQPQAQAPPPPQAHTLPL,LFSNFIGATNKIISEHVMCKHANVRPYHCNICGWSTAYSGNMWKHVDTHQKELGDKMPEFPVNVVSTENHSVPTPLRAPSGKKRGMNKASNFKLKLAKPGKTRRQQQAQKTQQMEQTATISILDDNVQTIQVQAGGNLPEGVLMQVSE,YLVTCHLIHTLYFIIPQEHIKHKHGLLLNKDSYGRPKPTPTYACTVCDYVGRKPKSLEYHSRIHKENRQFKCHLCPYASKTKNNLVLHVRTHEGLQPNKCPHCDFKG,QPTCIDCLISFYPKVPIHMGPNGQMMVTKGLSEEESIGSSALSRLAAAVASAQEVHIIQGNEGLEGGGQHQEHRIIATQVCLAFCPFF\n
SPU_014643	SPU_014643	none	Inspection of the tiling array suggests that glean may have missed the following exons: NIWDNSHIELSCTPDNATYVLLYSHQTLYDPTTSKFRKSESSDIKSSTISEVIILKCTDWDLSIFKLCVDIWDHLLTMNWSVSNFHKYE\n
SPU_016490	SPU_016490	After reviewing the data for SPU_016490 it appears that there is an overlap of the sequence on two scaffolds. There is EST information available on GBrowse V0.5 and the transcriptome intensity scores appear to be strong as well, ranging from about 10-140.\nAdditional comments from the Baylor page under gene information (comments):\nInspection of the tiling array suggests that glean may have missed the following exons: PTSVRVCGMKKLTQTWKQQKVSKLRDSSFTTASSKTQLVPSQDGGKISQGSTGSHGATSAAPTPSASGGSRKRLGQHIPRQAVPVSVPKPVHDGAEGEKREKENEEDLAKKRKVGGVIGDQSKRRSPRLQKGRDPSR	Inspection of the tiling array suggests that glean may have missed the following exons: PTSVRVCGMKKLTQTWKQQKVSKLRDSSFTTASSKTQLVPSQDGGKISQGSTGSHGATSAAPTPSASGGSRKRLGQHIPRQAVPVSVPKPVHDGAEGEKREKENEEDLAKKRKVGGVIGDQSKRRSPRLQKGRDPSR\n
SPU_015071	SPU_015071	none	Inspection of the tiling array suggests that glean may have missed the following exons: STTGFPSWSIPVPLERSGWTKAMQSCFFFIMRRGDYGPMVLIPMLFLENKKQIIFDSPRLEGETGKVMRSLWPWTNHVVTI,STTGFPSWSALVLMERRGWRSFSVQVIHCHRVGEAMINNWFPFMEHPCTIGEIRMDKSNAIMFFFYHAEGRLWTNGTNTNALLRKQKTNHL,LVFTEPTPPPPTPTPPPKSPTPPPKEPTPPPPKPKPKRKAIVKKTKAVPPPPPPKTPTPQPPTPKPPTPKIPTPQPPTPTPTPPKDPTPPPPSPPPKVTLGKFMCTCFVLFKIQIYI,LFLQNQHHHHQHRLLHQSHLPHPQRSLHLPLPSLNPSVRQSSRRPKQSHHPLHQRPPPLNLPPPSHQPPKYQLLSHPPLPQHPLKTPPLPLLHRHRKSHLVSSCVHVLFFLKFKSIL,LYDDRCVHDEHHKYRQDEFQKDGYRLKHSPEEGVREKCEDTSVVLEMGKEARRELKQESDSPWESCYELHASGVRLTPEADGEENRTKTLKCH\n
SPU_015137	SPU_015137	none	Inspection of the tiling array suggests that glean may have missed the following exons: CLGSPLSNFTMMLKSRMCLGYLQSWKEKKSELAFNQLFFQSSIFNKEHLLIQLSLTLLPLVSILAKWLLRHCRLINFSNP,LNYNFQQNILLTWVKHFTTSPLPPPPLPLSHTLSSAPPLPSPPPQCPPQHSNAGATPPPPPPSPPLPLSHTLSSAPPLPSPPQCPPQHSNAGATPLPPPSP,NILQLHLFLHRLFLYPILSPRLLLFHLLLLSVLLSTPMQVPLLLLLLPRHPFLYPILSPRLLLFPLLLSVLLSTPMQVPLLFLLPR,FSTKHIANMGKTFYNFTSSSTASSSIPYSLLGSSSSISSSSVSSSALQCRCHSSSSSSLATPSSIPYSLLGSSSSLSSSVSSSALQCRCHSSSSSLA\n
SPU_016518	SPU_016518	none	Inspection of the tiling array suggests that glean may have missed the following exons: EDSELSYLENFSPVSHQIVLKVIRSHKIKSCSLDPLPASVFSRCIDCLLPAITDIINDSLKAGVCPEPLKTALIVPTLKKSSLDPENLKNYRPISNLSFISKVIERVICTQLMTYLASNSLLASRQSAYRENHSVETVLLRVQNDLLLSLDSGNEALLVLLDLTSAFDTVDHQLLSRLEKCYGISGTAAKWFESYLSGRKQQVIIDGITSDPALLRWGVPQRSVIGPLLFICFTTLIQDIIHSHGFTSMMYADDKQLYITVKPSVINHITQKLDICLQEIRLWMQHNFLFLTIVKWQFFTCLLNLENRMSCYQYLLIIRQFIVPRQFAISELFWTTICLCAATLILCAAKHHLLLEE,DLLIVREDDESVEEVKVIHSMSSDHAAIAFTLCESDIVQSFASQTSLDEMDINQKVDMYNKTLFSFLDRHAPEMRQNVQLRPHAPWYNITLKELKQNLRAKERIWRKSKHRSPVMKDELQNSTKEYFSTLKRFRREHHRQTISSASTQELYKEIDNMTIEKAKAVLPTHKSKSDMVIFFILHSSKTKYAD,CNQPHNTETGYMSSGNPFVDATQLSLLNDSKMAILHLSSKFRKSNELLPISVNNTPVHCSKTVRNLGVILDNHLSLRSHINTVCRKASFALRRIGKVRRFFNKASTEILIHSFVSSLLDNCNSLLIGICDKDVNKLQRIQNSAARLVSLQKKCQHITPILKDLHWLPVKFRIQFKIVLLTFKSLNDLSPEYQSDLVLQYVPSKS,KQTEISDFIITENLDVLAITEAWLTGDSRDSTTIADIQNTLQDFKFLKLPRKGKRGGGICVILRNLFDCKARPYSFVTFECLEVTIRSTHKDTVSLFAIYRPPVCPRSVPVSHFFTEFS,TALCSKEFSLGQWCYSSYSLLLLLCCYVEAMVMVFTLTPVHVLTTVSKSESSSLLPHRKIYGSPIPYYSNSTSTFQLQLLDCGDVNPNPGPHTERSTTHSSTPIYHGVNSYHTPNRKYDIPFLKSLNLLSRNDPQV\n
SPU_018619	SPU_018619	none	Inspection of the tiling array suggests that glean may have missed the following exons: MTKYFLFISLFMACFMVTTCHDLRVCICTPIDLLSPSCPHICNSQLIVKIICTNNCQSLVNQSPYHFVLPEMLQLQAAEPH\n
SPU_019652	SPU_019652	From the BLAST results and the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. When reviewing the excel data, it is apparent that there is one sequence overlap that occurs within v2.1_scaffold83941 that ranges from 1725-1779 that isn't apart of the sequence. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values ranging greater than 10.	Inspection of the tiling array suggests that glean may have missed the following exons: GFSFSWTARTCNCSECLAVNPFPHIPHSCGRIPVCRSACFFMSHCLANLLLQNEHGNLSSTTLCTLPMCAFTSSFLPKPFPHTSHLNGFRPVCSPWWFFRAFSDAKERGHSVHL,SGFLPSISLLEVSPLSSCTTSMFCTTLLGLLTLFNFIRLWVVLFSGLTSSWIVSSLCGALSWFLSCSFSCSLIPCFHLNCKGHMLQV,VLLSIFLFNSLEDFDDSLPSRVCLWVLWLVEGGMWLVEGGLWLVEDGLWLVEDGLWLVEDGIWLVEDGIWLVEDGIWLVEDGIWLVEGGISLEFLLSCSVR,FFPPEALPAYVALKWFQACVFPMVVLQGILGCEGTWTFCTLVEFICNVMFLLVNFITTAIHKLLWTIQALISPPAFVDLSLMGLHTANRGVSLRAIFTLESKRWILEIIRLLAKHILVGSIPIIFMYNIHVLYHSSWFVDFVQLYTFVGRPVLRVDEFLDREFSLWRPLLVPLMLLLVFSHPMFPSELQRAHATSVVLVRLPVNFLFMAIQGFG,EVCLRTLLALVDHSLMASHVSHKLMLTGLCFSADWAIVFSCECVKHHVCLKPALCGERLTTSHANKIFPRLGGRLHIRFHFMSSTFNILVQFPGRL,TRRKPKLKRNKPSSESIFFNTGQNSKGSCSHDPTQFDANACRAALNGACAQLAINQNKQTSVDFMGFPIQVVEMVRFFYNVNGPKDQLACSKTHLLLLL\n
SPU_021843	SPU_021843	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. The first portion of the sequence (v2.1_scaffold28760) had an orderly continuous arrangement; however the rest of the sequence (v2.1_scaffold28760) had several internal repeats. There was some Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5	Inspection of the tiling array suggests that glean may have missed the following exons: PFFSLHHFFMNFSALLLLFTSACSLGFCPLYTSSLIHLLSSCSCFFLNFNLPVDSPRLTNYFGACRKTKQITTDVHKKRSKFLSAKEIPSNFFFYLILLTGQER,TFLPCFCFSRVLAPSGSVLSIRLPLFTCYLLALVFSLILIFLLTLQDSPTTLGPAEKQNKLQPMSIKNVQSSFLPRKYPPIFFFI,LKKKKYSWLQHYYPKVTNEQAIVACSYSQKIGARNHMYSMIFSKACAAMRMYQAVESVDVLCGEHISKLFQKEVLCWPVQSYAHVDTIHVMQEAVCRLRTC\n
SPU_024877	SPU_024877	none	Inspection of the tiling array suggests that glean may have missed the following exons: DNHGSPTVFPHRTFILSEALCGFDFFIFVSPVSFISNLPNLRTSSEKVISVNPKARSSLANLDIIIVLFLQAGSSFFSLILTHSHLLELVL\n
SPU_003207	SPU_003207	According to the BLAST results, this is the best match for this particular GLEAN model. When examining the excel data it appears that there are several internal repeats present as well as sequence overlaps. There is also a large gap present that spans from 1183-1925. There was no Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values being less than 10. 	Inspection of the tiling array suggests that glean may have missed the following exons: ERFHRNQALKKHMMRHAGNEPYPCSECNVRCLSKPGLVRHMATHSGTKDHQCCKCGKMFARPHDLRKHEQSHEEEPETYLCFICGQTFDHKKNYHAHIGTHTRRQHGGRPTCKAKDSSETSHLLNTSDGRIS,DVYPSLVLCGIWLHIVVQKTTSVANVGKCLLDHMISASMNNHMRKNLRLICVLYVARRSIIRKITMLILVRIREDSMVDVQHAKQKTLPKPLIC\n
SPU_000578	SPU_000578	none	Inspection of the tiling array suggests that glean may have missed the following exons: THTREKLYECSHCQKSFSHKGNLTQHLLTHTGEKPYECCSCKKGFSQKSTLNCHILTNGRKAIRVFTTKTHIAEKPYVHIVVNGFLKKVLLNIYVRKVFLTNAISHPTPTNTHRKKAL\n
Sp-Gg3	SPU_030085	none	glean inferred from est data\n
SPU_017141	SPU_017141	none	This gene is present on three GLEAN predictions. SPU_017141 contains the first ~890 AA and SPU_012930 and SPU_021952 have the rest. SPU_012930 is the largest piece with ~1400 aa.\n
SPU_016711	SPU_016711	none	Duplicate prediction. SPU_007675 is complete gene.\n
SPU_027895	SPU_027895	After reviewing the data and performing a BLAST search, it appears that there is no sufficient fit for this particular GLEAN model. The sequence is distributed onto 2 different scaffolds that contain several gaps as well as overlaps. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong overall with most of the values being <10	SPU_027895 contains the first part of the gene. SPU_026806 codes for the tail end. SPU_013513 is a duplicate prediction for SPU_026806.\n
SPU_004897	SPU_004897	none	Likely missing 3' exon not present on contig.\n
SPU_016748	SPU_016748	From the BLAST results and the excel data, it is evident that this is the best result for this particular GLEAN model. When examining the excel data, it was apparent that the sequence did not begin until the 164th base pair and that there was also a small gap within the sequence that spanned from 280-407. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5. 	One of 2. Overlaps with GLEAN3-17385, and matches exactly in the overlap. This sequence is longer in the C term, and appears to lack the start codon. Only the c-terminal 2/3 of the protein matches with cdc2L1, and the string of Es appears to not be correct.\n
SPU_014846	SPU_014846	From the BLAST search, it appears that this is the best fit for this particular GLEAN model. This scaffold appears to have the best overall sequence coverage when compared to the rest of the results. However, it does not have the highest bit score or lowest e-value results. When reviewing the excel data, it is evident that not all of the BLAST values coincide with the excel values resulting in base pair differences between the two sets of data. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak overall with the majority of values being <5	1 of 2, the other is SPU_018780. These 2 are overlapping and nearly identical where they overlap, although each has gaps with respect to the other.\n
SPU_024030	SPU_024030	none	SPU_010849 is likely a duplicate prediction for SPU_024030\n
SPU_011815	SPU_011815	none	not the greatest homology. Possibly not correct.\n
SPU_024002	SPU_024002	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When examining the excel data and comparing it to the BLAST results it is evident that the sequence doesn't begin until the 27th base pair (on v2.1_scaffold23028) and continues until approximately the 590th base pair. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being <10	See SPU_016689\n
SPU_009091	SPU_009091	From the BLAST results and the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. Both scaffolds appear to have an orderly arrangement without any internal repeats or gaps present. If the two scaffolds were combined, the sequence would have a continuous arrangement with good overall coverage. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with most of the values being less than 10.	Scaffold_79280 missing 5' (leader) and 3' end (remainder of serine protease domain), probably because of incomplete sequence data.\n
SPU_028187	SPU_028187	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. These two scaffolds both had the highest bit score and the lowest e-value results compared to the others. If these two scaffolds were combined, the sequence would have a continuous orderly arrangement. There was some Est information available on GBrowse assembly V0.5 and the transcriptome intensity scores were not very strong with values ranging from about 1.5-8. This is an un-annotated gene so no additional gene information (comments) was available.	Scaffold_80160 missing 5' start (leader sequence), one exon of vWF domain and 3' end (remainder of serine protease domain), probably because of incomplete sequence data.\n
SPU_001826	SPU_001826	none	SPU_001826 predictions may be incomplete. SPU_009266 matches partially completely with SPU_001826.\n
SPU_018535	SPU_018535	none	SPU_012748 appears to be a duplicate prediction for SPU_018535.\n
SPU_025813	SPU_025813	none	Incomplete prediction.\n
SPU_015789	SPU_015789	From the excel data as well as the BLAST results, it is evident that this is the best fit for this particular GLEAN model. When examining the excel data, it was apparent that there were no internal repeats or gaps present within the scaffold resulting in good overall coverage. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5.	#\nThe first and second exons and the intron between them were accepted to this modified gene model. Their nucleotides have 88% identity to another Sp-Tlr gene (SPU_015066). The 3' end of the model is located at the end of a contig. \n
SPU_020045	SPU_020045	After reviewing the data and performing a BLAST search, it appears that there is no sufficient fit for this particular GLEAN model. There are numerous repeats and sequence gaps within the scaffolds. There was no Est support available and the transcriptome intensity scores appeared to be very weak.	#\nThere is 1414bp of unknown sequence (NNN) in this gene model, which could make it incomplete.  The nucleotides, excluding the unknown sequence, have 85% identity to another Sp-Tlr gene (15303). So it could be a member of the Toll-like receptor family. \n
SPU_014625	SPU_014625	After reviewing the data it appears that there is no sufficient match for this particular GLEAN model. The data appears to be dispersed between 3 scaffolds.  There is EST information available from GBrowse V0.5 and the transcriptome intensity scores appear to be somewhat strong (averaging about 15).\nAdditional information found on Baylor under gene information (comments): Does not clade with human Ppm1g in phylogenetic analysis.	#\nDoes not clade with human Ppm1g in phylogenetic analysis.\n
SPU_015511	SPU_015511	none	#\nThis gene model may represent a pseudogene or contain a sequence error. 450bp of 3'UTR was accepted to a coding region that encodes a TIR domain.  \n
SPU_005149	SPU_005149	After reviewing the data and performing a BLAST search, it appears that there is no sufficient match for model_SPU_005149. The sequence was distributed onto 4 different scaffolds. The BLAST search indicated that the best results were in fact a short stretch of the sequence that only covered 231/996. This portion of the sequence did not have very good coverage but did however have the lowest e-value and bit score when compared to the other results. If the 4 scaffolds were combined, the sequence would have a continuous, orderly arrangement with no repeats present. There was no Est information available from GBrowse assembly V0.5 and transcriptome intensity scores appeared to be strong (5-42) and had a bell shape curve distribution.  This is an un-annotated gene so no additional comments were available from Baylor annotations (comments). 	Incomplete gene model: expected N-terminal parts are absent\n
SPU_016411	SPU_016411	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. When reviewing the excel data it appears that both sequences have an orderly arrangement without any gaps or repeats present. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with half of the values being less than 5 and the other half greater than 5.	Blasts to PTPRM, but phylogenetic analysis showed that it does not clade with the PTPR K/M/T/U group.  Renamed PTPRorph1. Partial sequence. \n
SPU_000971	SPU_000971	After reviewing the data and performing a BLAST search it appears that the sequence is distributed onto 4 different scaffolds. However, the data appears to be continuous if scaffold >v2.1_scaffold29491 was combined with scaffold >v2.1_scaffold79426. There was no Est information available from GBrowse V0.5 and the transcriptome intensity scores appeared to be strong with values ranging from 5-33. There was no annotated information available through the SpBase website regarding this gene. 	Missing C-terminus.  See SPU_022506.  \n
SPU_025257	SPU_025257	After reviewing the data and performing a BLAST search, it appears that this is the best result based on the overall coverage, bit score, e-value, and orderly arrangement of the sequence. However, when examining the excel data and the BLAST results, the entire sequence appears to be repeated within the same scaffold results.	This gene model doesn't have a TIR domain. The nucleotides encoding SP, NT, LRR(15-23), CT have 88% identity to another Sp-Tlr gene (21420). This model is located at the end of a contig, which could make it incomplete.  \n
SPU_025613	SPU_025613	From the BLAST results as well as the excel data, it is evident that this is the best fit for this particular GLEAN model. When reviewing the excel data in comparison with the BLAST results it was clear that this scaffold contained an orderly arrangement without any gaps present. However, there were two base pair duplicates within the scaffold (from 1-321 and from 322-1746) that may be a part of the sequence. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with most of the values being less than 6.	#\nThis gene model doesn't have a TIR domain.  The nucleotides encoding SP, NT, LRR(9-19), CT have 94% identity to another Sp-Tlr gene (07850).  The model is located at the end of a contig, which could make it incomplete. \n
SPU_023193	SPU_023193	none	This gene model doesn't have a TIR domain. But the nucleotides encoding SP, NT, LRR(10-21), CT have 87% identity to another Sp-Tlr gene (28576). There seems to be an assembly error in the contig of this model, which may make it incomplete.   \n
SPU_027445	SPU_027445	none	#\nThis gene model doesn't have a TIR domain. But the nucleotides encoding SP, NT, LRR(15-23), CT, TM have 88% identity to another Sp-Tlr gene (23035). This gene model is located at the end of a contig. \n
SPU_006683	SPU_006683	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When examining the excel data and comparing it to the BLAST results, it is evident that both scaffolds have an orderly arrangement about them. If the 2 scaffolds were combined, the sequence would have an orderly continuous arrangement without any gaps or internal repeats present. There was no Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appear to be weak with all of the values being >10.	Model_must_be_split_in_2. Transcriptome data indicates that Glean may have falsely predicted the following exons: 3,14.\n
SPU_022686	SPU_022686	none	Similar to R-PTP-mu.  Partial sequence. See also SPU_006528, SPU_016411, SPU_018743,  and SPU_026582. \n
SPU_016669	SPU_016669	none	Similar to Dual specificity protein phosphatase 3. Partial sequence.\n
SPU_028174	SPU_028174	none	no domains were detected\n
SPU_018743	SPU_018743	none	Similar to R-PTP-mu. Partial sequence.  See also SPU_006528, SPU_016411, SPU_026582, and SPU_022686.\n
SPU_018392	SPU_018392	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. If the two scaffolds were to be combined, the overall sequence would have an orderly arrangement without any gaps or repeats present.  There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat weak with most of the values either being 5 or less. 	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: LFSAGKLDKSGSPRHIFDDGIAAKRPRITLPPAPRLRSLPYNTPPVAPLRSEAHRREVAPQTQPSFHPRHGQAVTSPNDIEDQRQVLVSDHAQRPARSHLVQSHHILQRNHLQRQQQHHHHHLLPQQHSLVSLLREPVVTTSPAFERLGIGPRAVTGNEAGSASGMPQTRASPVCDSCTDGAGCWKEMTGIGCKLETKELWDRFHELGTEMII\n
SPU_013689	SPU_013689	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 1,2.\n
SPU_027446	SPU_027446	none	Matches_SPU_027446.\n
SPU_000129	SPU_000129	none	Matches_SPU_000129. Transcriptome data indicates that Glean may have falsely predicted the following exons: 4,9,12.\n
SPU_000749	SPU_000749	After reviewing the data and performing a BLAST search it appears that this is the best match for this particular GLEAN model. When comparing the excel data to the BLAST results it is apparent that the sequences do not correspond. The excel data indicates that the sequence is distributed onto 2 different scaffolds that have a continuous and orderly arrangement until about 528. However, there are several internal repeats present within this scaffold. The BLAST results display the same results, but the sequence terminates at 342 instead of 528. There was no Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with most of the values being less than 10.	ligand binding domain is found in SPU_011061\n
SPU_017375	SPU_017375	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 1,2,3,6,7.The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: EELTVEDGKDSHVPPLLIYLNDEYSESVLDMLLLAHLKIEDKIGIYHVLPLHCWTHQFHYHWTGFYLHHSTAISEWEVQLLRYYSLGQGSVVFCLLSLPSRHELDEYYRHHLLY,PPIPYQDGIATKIGAKPTFKSLFLKDPILALKCFFGPAVPASYRLQGPHVWSGARDTIMNVWQNTVSGTKFRDTPIANGPEGYPIALKLIFLVCIVAGLYLAMM\n
SPU_027623	SPU_027623	none	Matches_SPU_027623.\n
SPU_000424	SPU_000424	none	Matches_SPU_000424.\n
SPU_001739	SPU_001739	After reviewing the data and performing a BLAST search it appears that the sequence is distributed onto 2 different scaffolds. The first scaffold (v2.1_scaffold72707) is unique in that the first 952 base pairs are covered continuously. The second scaffold (v2.1_scaffold64493) however has one small gap from 1313-1252 that is apparent within the BLAST results and the excel data. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be widely distributed across the graph with most of the scores averaging about 5. 	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: RYKHYAENYFTITAKQENVKRKSCNSIRLTVVQCIVIYNYFCKNQRGFLTPLEQEATKDSRKMRHLYNVPCHSIPSLFVAFNSLYFLFPQKPPNSPLLIESQNY,NPICCENLLFFRSHHPIPCGLARYNLKYGRRNKNLISSTGSRNRKREKRNFVLQRQGGVLERNNRLGDAVTWILERMRVKREFIHLSTA\n
SPU_012122	SPU_012122	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: CNTFNVFIGILQPINELQGEGDKNRSLTWQKGFCVACDLSVCYDIQISDIHSIFAAVFCRQLLAVMLEHGLILYSRYVLQGTFLLSSPCHKERVVEFLSFSKVNFAAESMLVL\n
SPU_009602	SPU_009602	After reviewing the data and performing a BLAST search, it appears that this is the best fit for this particular GLEAN model. When examining the excel data, it is evident that the sequence doesn't begin until 754, however, the rest of the sequence contains an orderly and continuous arrangement until the end. There was no Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be distributed into 2 distinct clusters that had overall weak scores. 	SPU_009602 may be a duplicate prediction for SPU_028213.\n
SPU_003711	SPU_003711	From the BLAST results as well as the excel data, it was determined that for this particular GLEAN model the sequence is distributed onto 2 different scaffolds. When examining the excel data, it is evident that both scaffolds contain an orderly arrangement without any gaps or repeats present. There is however a sequence overlap between the two scaffolds that occurs between 704-949. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with most of the values being less than 5. 	Missing N-terminus.  See SPU_006561, _01698, _00076.  \n
SPU_006742	SPU_006742	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto 4 different scaffolds. After reviewing the excel data in comparison with the BLAST results it is clear that there are numerous gaps and internal repeats present within the 4 different scaffolds, resulting in poor overall coverage. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values being less than 5.5.	Hh signaling pathway regulator\n
SPU_008268	SPU_008268	After reviewing the data and performing a BLAST search it appears that no good GLEAN model fits sufficiently. The sequence is distributed onto 3 different scaffolds. There is some Est information available on GBrowse V0.5 and the transcriptome intensity scores appear to be somewhat strong with numerous values ranging from 3-15. Additional information from Baylor gene information (comments): Has EGF and BNR repeats but no N-terminal reeler domain - probably a fragment. SPU_023409 is very similar in structure	Has EGF and BNR repeats but no N-terminal reeler domain - probably a fragment. \n \nSPU_023409 is very similar in structure\n
SPU_028463	SPU_028463	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. There are several sequence overlaps and internal repeats present within the first scaffold that ranges from 770-857. However, the overall sequence does contain an orderly arrangement.  There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5. 	The C-terminal portion of this GLEAN is clearly MAPKAPK5, whereas the N-terminal portion appears to be CG12134-PA (BLAST XP_788954.1). The sequence below has been modified to correspond only to MAPKAPK5. The MAPKAPK part is partially duplicated by SPU_021161. \n \nMWSLGVIIYIMLCGYPPFYPDTPSRQLSKDMRHKIMAGQYEFPTEEWSLISDEAKDVVKRLLRVDPTERLTIEELCSHPWLRENSAPNTELHSPAIMLDKNMLDDAKQIHSEQLTAMRIPDKKVMLKPVAKANNPIVRKRILTRGQSIDNKIGEEQPPKKQNRENSEGVTCLRNIIAHCIVPPKDANGEDALCELMKRACQYNRDCPSLDKALNNLSWNGEQFCDKVDRSELALLLKDIVDQKERHEKC\n
SPU_016527	SPU_016527	none	novel architecture - TSP1 plus LamG x2 - no homologs known\n
SPU_024020	SPU_024020	From the BLAST results as well as the excel data, it appears that the sequence is distributed onto two different scaffolds. When examining the excel data, it was apparent that there were several base pair duplicates and sequence overlaps within both scaffolds. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values ranging less than 5	novel architecture - C-terminal FBG domain - might be related to a role in immune defense. Has pfam:Nacht domain - NTP-binding?? \n \nBlast match to tenascin is probably misleading\n
SPU_020611	SPU_020611	none	This gene appears to be missing an exon encoding SLLHLITQYLNPRTLSKDFQGK (aas 213-234).  \nThis is an overlapping identical duplicate of SPU_025452.\n
SPU_025452	SPU_025452	none	Also BLASTs strongly to XP_791076.1. Appears to be an overlapping identical duplicate of SPU_020611\n
SPU_017070	SPU_017070	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 3 different scaffolds. The first stretch of the sequence is on v2.1_scaffold14511 until 1080 is reached. The sequence is continued on v2.1_scaffold77083 until 1299 is reached. And the sequence is completed on v2.1_scaffold74943. When comparing the BLAST results with the data from excel, it is apparent that if the three scaffolds were to be combined the sequence would have a continuous and orderly arrangement. There is Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appear to be strong with most of the values being greater than 10.	Comparison to best blast hit show that the following exons are not conserved with other M12A class proteins: \n>SPU_017070|Scaffold1519|80742|80817| DNA_SRC: Scaffold1519 START: 80742 STOP: 80817 STRAND: +  \nATTTCTGCAGCCTGTCGCGAGAGATATTGGTTGCGGTTGCAATCTCGACGAGACTCTTCCAACATATGAA \nGAGCAT \n>SPU_017070|Scaffold1519|82021|82153| DNA_SRC: Scaffold1519 START: 82021 STOP: 82153 STRAND: +  \nGATGACGCCGAATTTTGAAAATCAACTCTGTATGTCTGCCCATCGATTGGTACCGAGTCGTCATGTAGAA \nAATTGATACACGATCTGTTTGATATGTACTGCATAGTTTCCTCAATTACAGTTCTGAATGATT \n>SPU_017070|Scaffold1519|83467|83734| DNA_SRC: Scaffold1519 START: 83467 STOP: 83734 STRAND: +  \nCTAACTCGCTATCGATCTCATACGGTAGTGTTGCGTCGGGCCATGTCGCTCCAGTTTCCACATTCCTCTT \nGGTCCTACTCCCGTTGCCATTGTGACCATCCTCTTCCATGAACTTCTTCTGCTCTTCAGTAAGGCGGATA \nTCTCCCAGGATGACGTCACCTGGATTCAGATTGTCCATTGGTTTGCTATGCTGTTCGGACTCAGCTTCCG \nCATTGTGAGGGCGCGCCAACACCGTATCGTCAACGTCTTTCTTGAATGGTGGCAGAGA \n>SPU_017070|Scaffold1519|85911|86121| DNA_SRC: Scaffold1519 START: 85911 STOP: 86121 STRAND: +  \nTGATTTTCCCTTGTCATCGACGAAATCATGGTCGTGATCACTGTTGAGAGGAAGACGTTCGTCATCGACT \nGTCGTCGTGAATACGGCGACAGCTAGGCAGAGTAACAGTACAGAGCTCAGGCAAATCCTTTTCATCATTT \nTTTCGTTCCGGCGTTTCGGCAAAGCGACGAGATTCTCCAAACCAACGGAGTGACAGTAGTAAGCAGCAGT \nC \n
SPU_016807	SPU_016807	none	Multiple EGFCa repeats and three LamG domains interspersed before a TM segment \nLooks rather like Crumbs in overall organization but larger. \n \nSPU_020365 is similar\n
SPU_004586	SPU_004586	From the excel data and the BLAST results, it is evident that the sequence is distributed onto 2 different scaffolds. When reviewing the excel data, it was discovered that there was one base duplicate (399-672) that may be a part of the sequence. There was also a sequence overlap between the 2 scaffolds that spanned from 19-90 within the second scaffold that was not a part of the orderly arrangement of the rest of the sequence. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values being less than 5.5. 	short with TM/LDLA/TY domains\n
SPU_027371	SPU_027371	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When reviewing the excel data, there appears to be several sequence overlaps present on v2.1_scaffold44545 resulting in no orderly arrangement within the last portion of the sequence. These sequence overlaps and repeats were also apparent from the BLAST results.  There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores were difficult to determine due to the scores being in close proximity to each other and the scale of the graph. This is an un-annotated gene so no additional gene information (comments) was available from Baylor annotations.	LNB-7TM receptor - lots of calx-B repeats \n \nHOMOLOGOUS WITH "VERY LARGE G PROTEIN-COUPLED RECEPTOR 1, VLGR1/MASS1/GPR98 - mutated in Usher syndrome 2C \n \nPREVIOUSLY CHORDATE RESTRICTED\n
SPU_022164	SPU_022164	none	an exon may be missing\n
SPU_026145	SPU_026145	After reviewing the data and performing a BLAST search, it appears that there is no good GLEAN model fit for SPU_026145. The sequence is distributed onto two different scaffolds. There are numerous repeats within both scaffolds as well as gaps and sequence overlaps. When compared to the other scaffolds the BLAST results indicated that >v2.1_scaffold34698 had a relatively low bit score and a high e-value. There was Est. information available from GBrowse assembly V0.5 and the transcriptome score intensity appeared to be somewhat weak with most of the values being below 5. This is an un-annotated gene so no additional gene information (comments) was available from Baylor annotations. 	these other Glean3 sequences also have high similarity to endonuclease-reverse transcriptase: SPU_014262, SPU_024197, 02879\n
SPU_028480	SPU_028480	none	First half completely predicted. Last half of the gene missing. SPU_017903 is a partial duplicate prediction.\n
SPU_028711	SPU_028711	none	Possibly missing an exon in the middle.\n
SPU_012383	SPU_012383	After reviewing the data it appears that there is no good GLEAN model that fits sufficiently to SPU_012383 due to the number of gaps and internal repeats present. For the first part of the sequence (from approximately 1-2000 base pairs) it appears that there's very poor sequence coverage which was indicated by the BLAST results. There were very low bit scores and e-value results for the first part of the sequence. There was no Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appear to be somewhat strong with values ranging from about 3-26 with a bell shaped (curve) distribution of scores. This is an un-annotated gene so no additional comments were available.	KR-FA58Cx3-CLECT-EGFCa-CUB X3-LDLa x6-LRR 7TM_1 \nNo GPS but looks a bit like a member of the LNB-7TM family of adhesion domain GPCRs or like a glycoprotein hormone receptor - 7TM_1 favors latter \nNo known LDLa, KR or FA58C members of LNB7TM GPCR family \nNovel architecture\n
SPU_019367	SPU_019367	none	SPU_022254 is a partial sequence of this entry\n
SPU_006268	SPU_006268	none	only domain it contains is an ADAMs spacer\n
SPU_014069	SPU_014069	none	Similar to SpRag1L (SPU_027600), a sea urchin Rag1-like gene.  One of the more complete matches.  Probably a pseudogene.  Matches Rag1 core region, but c-terminal matching (SpRag1L 789-879) is attached to N-terminal. Region of match is SpRag1L: 380-874, ~39% AA identity. Similar to SPU_009909.  \n
SPU_019658	SPU_019658	After reviewing the data and performing a BLAST search, it appears that there is no sufficient fit for this particular GLEAN model. The sequence is distributed onto 3 different scaffolds. Within the first scaffold, there is an internal repeat (one duplicate) present from 1-82. Besides this repeat, the sequence has an orderly and continuous arrangement. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be	Similar to Receptor-type tyrosine-protein phosphatase delta precursor (Protein-tyrosine phosphatase delta) (R-PTP-delta), Partial sequence\n
SPU_015315	SPU_015315	none	may be truncated at c terminus\n
SPU_008886	SPU_008886	none	Similar to UBXD2.\n
SPU_014864	SPU_014864	According to the BLAST results in comparison with the excel data, it appears that this is the best match for this particular GLEAN model. When reviewing the excel data, it was apparent that there were several internal repeats present that occurred between 2358-2689. The scaffold was also truncated at 3325.  There was some Est support available from GBrowse V0.5 and the transcriptome intensity scores appeared to be weak with most of the values being less than 5 (excluding an outlier value of 43.)	#\nthis hsp has a speract domain on the carboxy terminus. assigned to hsp70(3) family along with other hsps containing hsp70 domain and some gamete domain. needs additional verification.\n
SPU_016500	SPU_016500	none	blastp shows 50% alignment to actin domain not seen in other urchin hsp70s\n
SPU_024947	SPU_024947	none	3' partial  \nSPU_024946 belongs to 5' end of SpWntA\n
SPU_024669	SPU_024669	none	3' partial \nSPU_023065 is an identical duplicated fragment \nSPU_023463 belongs to 5' end of SpWnt4  \n \nReference: \nFerkowicz,M.J., Stander,M.C. and Raff,R.A. \nPhylogenetic relationships and developmental expression of three sea urchin Wnt genes \nMol. Biol. Evol. 15 (7), 809-819 (1998)\n
SPU_023065	SPU_023065	none	3' partial \nSPU_024669 is an identical duplicated fragment \nSPU_023463 belongs to 5' end of SpWnt4  \n \nReference: \nFerkowicz,M.J., Stander,M.C. and Raff,R.A. \nPhylogenetic relationships and developmental expression of three sea urchin Wnt genes \nMol. Biol. Evol. 15 (7), 809-819 (1998)\n
SPU_005686	SPU_005686	none	Incomplete KH-domain in the Nter end of SPU_005686 prediction. \n3 additional exons found in scaffold20583. \nOnly 2 KH domains are present on the scaffold, PCBPs family usually contains 3. \n
SPU_023298	SPU_023298	none	Partial sequence\n
SPU_000337	SPU_000337	none	SigPep-SRCR(4).  Possibly incomplete.  \n
SPU_028422	SPU_028422	For this particular GLEAN model there was no Cds information available from Baylor annotations or SpBase. However, there was mRNA information from Spbase. When reviewing the excel data it appears that the sequence is distributed onto 2 different scaffolds based on the sequence starting on subject gb|DS012727| and continuing on gb|DS003972|. \nAdditional gene information from Baylor annotations (comments):\nThere wasn't a good checkbox for the problem here.  This looks to be an assembly error, where 2 contigs were inappropriately joined, cramming together two unrelated proteins into one model.  The first several exons are a copy of the more properly assembled SPU_0XXXXX.  Then there is a short repeated region (unmerged alleles), and finally the gene in question.	There wasn't a good checkbox for the problem here.  This looks to be an assembly error, where 2 contigs were inappropriately joined, cramming together two unrelated proteins into one model.  The first several exons are a copy of the more properly assembled SPU_0XXXXX.  Then there is a short repeated region (unmerged alleles), and finally the gene in question.\n
SPU_000492	SPU_000492	none	SRCR(4) + 1 partial.  Gene probably partial.\n
SPU_007587	SPU_007587	none	not full length\n
SPU_024316	SPU_024316	none	Not complete\n
SPU_025548	SPU_025548	none	C terminus is missing\n
SPU_002306	SPU_002306	none	#\nEGF/CCP/EGF/ZP\n
SPU_000323	SPU_000323	none	 fragment\n
SPU_000359	SPU_000359	none	 fragment\n
SPU_001895	SPU_001895	After reviewing the data and performing a BLAST search, it appears that this is the best result for this particular GLEAN model. When comparing the BLAST results to the excel data, it was evident that the sequence had poor overall coverage as well as a low bit score and a high e-value. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values being less than 5. 	 fragment\n
SPU_001910	SPU_001910	none	 fragment\n
SPU_001973	SPU_001973	none	 fragment\n
SPU_004141	SPU_004141	none	 fragment\n
SPU_004142	SPU_004142	none	 fragment \nidentical to SPU_000281\n
SPU_009625	SPU_009625	none	 small fragment\n
SPU_009697	SPU_009697	none	 fragment\n
SPU_014512	SPU_014512	none	 fragment\n
SPU_014648	SPU_014648	none	 fragment\n
SPU_020014	SPU_020014	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. If the two scaffolds were combined, the sequence would have an orderly and continuous arrangement without any repeats or gaps present. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak overall with all of the values being <10	 fragment\n
SPU_022989	SPU_022989	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto two different scaffolds. When examining the excel data, it is apparent that both scaffolds contain an orderly arrangement without any internal repeats or gaps present. If the two scaffolds were to be combined, the overall sequence would have good coverage. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values ranging >7.	 fragment\n
SPU_023849	SPU_023849	none	 fragment\n
SPU_013476	SPU_013476	When a BLAST search was done on this particular GLEAN model, an error message was received indicating that the server encountered an internal error or misconfiguration and was unable to complete the request. When examining the excel data, it was evident that there were numerous repeats throughout the sequence which was probably the reason for the error message received.  The transcriptome intensity scores had an unusual distribution with three clusters of information dispersed along the graph. The scores did in fact appear to be strong, with most of the values being greater than 10. There was also some Est support available from GBrowse assembly V0.5. 	5 CADH repeats-no TM-also N-terminal ANK repeats-possible fragment/possible concatenation\n
SPU_004266	SPU_004266	none	Annotation entered by Bob Obar (robar@scientist.com).\n
SPU_005500	SPU_005500	none	C1q-related\n
SPU_006578	SPU_006578	none	C1q-related\n
SPU_009020	SPU_009020	none	C1q-related\n
SPU_000433	SPU_000433	none	Single CADH domain-nothing else-obviously a cadherin fragment\n
SPU_021086	SPU_021086	After reviewing the data from the excel file and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. The bit score and e-value for v2.1_scaffold28739 were better than v2.1_scaffold20830 and the percentage of identities (base pair matching) was better as well. This resulted in v2.1_scaffold28739 coming up as the better match according to BLAST even though v2.1_scaffold20830 has more sequence coverage. If the two scaffolds were combined, it appears that they would have an orderly continuous arrangement. There was no Est information available on GBrowse V0.5 and the transcriptome intensity scores appear to be somewhat strong with most of the values being greater than 5. 	2 CADH domains and TM-short cytoplasmic domain-probable partial cadherin fragment of vertebrate type\n
SPU_013481	SPU_013481	From the BLAST results as well as the excel data, it was evident that the sequence is distributed onto two different scaffolds. When examining the excel data, it was apparent that both scaffolds contained an orderly and continuous arrangement without any gaps or repeats present. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being <5.	has LamNT-looks incomplete-could be a laminin or a netrin\n
SPU_011016	SPU_011016	none	olfactomedin-related collagen\n
SPU_001769	SPU_001769	none	has LamNT-Looks incomplete-could be a laminin or a netrin\n
SPU_027729	SPU_027729	none	SEA and EGF\n
SPU_015305	SPU_015305	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. There is one sequence overlap within the first scaffold from 3-91, but if this overlap is discarded the overall sequence has an orderly arrangement. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10.	Unknown architecture-possible gene prediction issue\n
SPU_006054	SPU_006054	none	cf SPU_006053\n
SPU_024289	SPU_024289	none	cf SPU_006053\n
SPU_001609	SPU_001609	none	has FN3-GPS-7TM\n
SPU_021832	SPU_021832	none	has CLECT-FN3-GPS-7TM\n
SPU_028401	SPU_028401	none	Ig5/FN3/TM - may be a fragment\n
SPU_027687	SPU_027687	none	EGF/FN3-3/TM - see adjacent gene- missing kinase\n
SPU_021019	SPU_021019	none	two FN3 - could be ECM or receptor\n
SPU_000454	SPU_000454	none	Ig/FN3-4 - weak match with Nr-CAM\n
SPU_025433	SPU_025433	none	Ig8/FN3\n
SPU_000736	SPU_000736	After reviewing the data and performing a BLAST search it appears that this is the best match for this particular GLEAN model based on orderly arrangement, no repeats, and sequence coverage. There was Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values greater than 5 and several greater than 10.	all FN3 - could be ECM or receptor\n
SPU_000925	SPU_000925	From the BLAST results as well as the excel data, it is evident that this is the best result for this particular GLEAN model. When examining the excel data, there appear to be several gaps present within the sequence. These gaps however can be filled in with base pairing information from:\n                                                          Score    E\nSequences producing significant alignments:              (bits) Value\nv2.1_scaffold4976                                         634   e-179\nHowever, v2.1_scaffold70594 has the best overall sequence coverage compared to the rest of the scaffold results provided from BLAST. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be separated into 2 distinct clusters that had overall strong scores (<10.)	all FN3 - could be ECM or receptor\n
SPU_017938	SPU_017938	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When reviewing the excel data several internal repeats were present that were also apparent from the BLAST results (on v2.1_scaffold54801). There was no Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores were very widely dispersed, with most of the values being less than 5 with exception to a few outliers. 	all FN3 - could be ECM or receptor\n
SPU_022988	SPU_022988	From the BLAST results and the excel data, it appears that the sequence is distributed onto 3 different scaffolds. 	The encoded protein has a thioester site and histidine in the N terminal direction that may function in target choice.  The sequence has a cleavage site between the alpha and beta chains, however, the beta chain is very short and the alpha chain is long.  \n \nGLEAN3-22988 overlaps with GLEAN3-19601.  See alignment with this annotation.  It is not clear why there is an overlap, but it may be an assembly problem. \n \nThe sequence overlaps with SPU_019601\n
SPU_013800	SPU_013800	none	all FN3 - could be ECM or receptor\n
SPU_010093	SPU_010093	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. The first part of the sequence has an overlap that is apparent from both the BLAST result and the excel data. Other than this overlap within the first scaffold, if v2.1_scaffold32363 and v2.1_scaffold21524 were combined, the sequence would have an orderly continuous arrangement. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with most of the values being less than 5.	this piece does not match well with any of the perlecan gene segments - so it is not from there.\n
SPU_012073	SPU_012073	none	Partial sequence.\n
SPU_000041	SPU_000041	For this particular GLEAN model, an error message occurred indicating that the server encountered an internal error or misconfiguration and was unable to complete the request. When examining the excel data, it was evident that the sequence had numerous repeats and sequence overlaps present. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat weak with scores ranging from 5-15. 	enormous protein with three VWD/TIL, three VWC, lots of segments of low complexity and a CT at C-terminus \n \ngeneral domain composition looks mucin-like\n
SPU_017361	SPU_017361	The BLAST results displayed a blank page	enormous protein with lots of segments of low complexity (including GLTT repeats) as well as a cluster of VWD/TIL/EGFs/CCPs/EGFs near C-terminus\n
SPU_003676	SPU_003676	none	two Fas1 domains - member of a family of genes with similar structures and sequences \n \nNote that SPU_003678 has same structure\n
SPU_024352	SPU_024352	After reviewing data and performing a BLAST search it appears that this is the best fit for this GLEAN model. The sequence doesn't begin until 327 and there is one internal repeat (one duplicate) of 2182. The sequence does however have an orderly arrangement.  There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity score appeared to be weak with most of the values ranging from 2-8	NIDO, AMOP, VWD - LOOKS LIKE A MUCIN\n
SPU_006004	SPU_006004	From the BLAST results and the excel data, it is evident that the sequence is distributed onto three different scaffolds. After examining the excel data, it was apparent that the first scaffold didn't begin until the 33rd base pair, however, there were no other gaps present. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with half of the values ranging >5 and the other half <5. 	has 7 TSP1 repeats, a TM domain and its long putative cytoplasmic domain contains a dual function kinase domain (could it be aPI kinase?) \n \nAnyway, it's a novel architecture - if the gene prediction is correct.\n
SPU_020247	SPU_020247	none	NIDO, VWD AND EGF_CA TM - very similar structure to mucin4d of chickens\n
SPU_019437	SPU_019437	After reviewing the data and performing a BLAST search it appears that the sequence is distributed onto 3 different scaffolds.  The first part of the sequence is distributed onto >v2.1_scaffold81583, while the end of the sequence is distributed onto >v2.1_scaffold54361. If the 3 scaffolds were combined, there would be several sequence overlaps and repeats present. There was Est information available from GBrowse assembly V0.5 and the transcriptome information appeared to be somewhat strong with most of the values greater than 5. 	large protein with multiple TSP1, FA58C, gal-lectin and CLECT domains intermingled \n \nnovel architecture - similar to SPU_000691 and SPU_005426\n
SPU_025731	SPU_025731	none	looks like a pretty good model of an LRR/Ig membrane receptor - has a set of 6 LRR repeats (no NT) and CT domains followed by Ig and TM - there are quite a few receptors of this type in humans\n
SPU_013654	SPU_013654	From the BLAST results and the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. When reviewing the excel data, it is apparent that there are several internal repeats as well as sequence overlaps present in both scaffolds. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being <5	The protein encoded by this gene has an N-terminal alpha 2 macroglobulin domain.  However, it does not have a thioester site.  There is an alpha/beta chain cleavage site and perhaps another for cleavage between alpha and gamma chains, but the site is in the wrong place.  The sequence is too short to be either a complement protein or alpha 2 macroglobulin.\n
SPU_004723	SPU_004723	none	It's possible exon 4 is larger than indicated here.\n
SPU_000743	SPU_000743	none	One of 59 models with only one clectin motif and no others\n
SPU_017810	SPU_017810	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When examining the excel data, it was apparent that the two subject hits for this GLEAN model had numerous repeats present throughout both scaffolds. There was no Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10. 	One of 59 models with only one clectin motif and no others\n
SPU_019575	SPU_019575	After reviewing the data in the Excel file and performing a BLAST search it appears that the sequence is dispersed onto two different scaffolds. The first alignment score appears to be continuous but when the BLAST results data is examined there is no data for the first 156 bases on the first scaffold. The data for the first 156 bases appears to be on the second scaffold (>v2.1_scaffold34122). There was no Est information available on GBrowse assembly V0.5 and the transcriptome intensity scores appear to be weak as well (>5). This is an un-annotated gene so no additional information was available in Baylor gene information (comments).	One of 59 models with only one clectin motif and no others\n
SPU_015079	SPU_015079	none	One of 59 models with only one clectin motif and no others\n
SPU_015211	SPU_015211	none	One of 59 models with only one clectin motif and no others\n
SPU_017295	SPU_017295	none	One of 59 models with only one clectin motif and no others\n
SPU_000786	SPU_000786	none	It could be a partial sequence!\n
SPU_014528	SPU_014528	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 2.\n
SPU_010068	SPU_010068	none	one DSRM domain\n
SPU_002350	SPU_002350	From the results of the BLAST search as well as the excel data, it appears that the sequence is distributed onto 2 different scaffolds.  Throughout the sequence, there appears to be several small gaps as well as sequence overlaps. There is Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be widely distributed with half of the values being weak and the other half being strong.	SRCR(13). Probably partial. (DMBT1)\n
SPU_027956	SPU_027956	After reviewing the data and performing a BLAST search, it appears that this is the best result for this particular GLEAN model. When examining the excel data, it appears that there are several internal repeats present throughout the scaffold. There is also a small gap within the sequence that ranges from 2647-2860. There was no Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values ranging below 10. 	six Ig and Ig-like repeats - probably a fragment of some adhesion protein or receptor\n
SPU_015581	SPU_015581	For this particular GLEAN model there was an error message that was received when a BLAST search was done. The message indicated that the server encountered an internal error or misconfiguration and was unable to complete the request. This may have occurred due to numerous internal repeats present within the sequence. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat weak (besides 2 outliers at about 30). 	eight Ig and Igc2 repeats - probably a fragment of some adhesion protein or receptor\n
SPU_003778	SPU_003778	none	SigPep-SRCR(6). Probably incomplete. (DMBT1)\n
SPU_003930	SPU_003930	none	SRCR(10). Probably partial. (DMBT1)\n
SPU_012352	SPU_012352	none	nine Ig and Igc2 repeats - probably a fragment of some adhesion protein or receptor\n
SPU_021904	SPU_021904	From the BLAST results as well as the excel data, it is evident that this is the best fit for this particular GLEAN model. When reviewing the excel data, it was apparent that there were several internal repeats as well as gaps present. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak overall, with most of the values being >5 (excluding one outlier present at approximately 47.)	seven Ig and Igc2 repeats - probably a fragment of some adhesion protein or receptor\n
SPU_024181	SPU_024181	After reviewing the data and performing a BLAST search, it appears that there is no good GLEAN model that fits SPU_024181 sufficiently. When reviewing the excel data it appears that the sequence is distributed onto 3 different scaffolds. However, within these scaffolds there are several gaps and internal repeats present. In particular within subject gb|DS015741| there is a large gap that spans from 1113 to 1489. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat weak with most of the values being 5 and below. 	Likely C-terminal truncation due to end of contig\n
SPU_008653	SPU_008653	none	seven Ig and Igc2 repeats - probably a fragment of some adhesion protein or receptor \n \nlong low-complexity sequence surrounding Ig domains is suspicious \n
SPU_023932	SPU_023932	none	Two overlapping glean predictions match DAP5 on scaffold 22799 (SPU_023932) and scaffold 105341 (SPU_028861).  \nsee SPU_028861 for gene model\n
SPU_001726	SPU_001726	When comparing the excel data with the BLAST results, it appears that the two sets of data do not coincide with each other. From the excel data it appears that the sequence begins on subject gb|DS008761| (until 1673 is reached) and continues on subject gb|DS011195| until the end of the sequence. The BLAST results display an entirely different sequence of base pairs that are not in the excel file. The BLAST results also do not have the amount of sequence coverage when compared to the data in the excel file. The sequence in the excel file ends at 2586 bases pairs while the BLAST results sequence ends at only 1016 base pairs. 	Partial sequence \nchimera: MelB N-ter + Amt3 C-ter \n \nMelB N-ter is given by: \n>SPU_001726|Scaffold620|154587|154719| DNA_SRC: Scaffold620 START: 154587 STOP: 154719 STRAND: -  \n>SPU_001726|Scaffold620|155996|156169| DNA_SRC: Scaffold620 START: 155996 STOP: 156169 STRAND: -  \n>SPU_001726|Scaffold620|157562|157827| DNA_SRC: Scaffold620 START: 157562 STOP: 157827 STRAND: -  \n>SPU_001726|Scaffold620|161305|162403| DNA_SRC: Scaffold620 START: 161305 STOP: 162403 STRAND: -\n
SPU_026576	SPU_026576	none	The sequence coded by this GLEAN is entirely contained in SPU_015285; refer to this one for further annotation \nIn SPU_026576, exon 1 and part of exon 2 are missing.\n
SPU_000503	SPU_000503	none	contains rab domain\n
SPU_019174	SPU_019174	After reviewing the data and performing a BLAST search, it appears that the data is distributed onto 2 different scaffolds. If the two scaffolds were combined the sequence would have a continuous, orderly arrangement without any gaps or repeats present. There was no Est. information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5.	Overlap with SPU_021121.  This model appears to encode the N-terminus of SpCul-3.\n
SPU_006055	SPU_006055	none	SigPep-SRCR(4)-TM.\n
SPU_009145	SPU_009145	After reviewing the data it appears that there is no sufficient GLEAN model that fits model_SPU_009145. The sequence appears to be distributed onto 2 different scaffolds. The BLAST results also displayed a large repeat within 1-165 region of the sequence that coincided with the discarded repeated portion of the excel data. There was Est information from GBrowse assembly V0.5 and the transcriptome intensity scores are widely distributed with values ranging from about 2-95.  This is an un-annotated gene so no additional comments were available from Baylor gene information. 	SRCR(2). Probably incomplete.\n
SPU_008598	SPU_008598	none	SRCR(2). Probably incomplete. \n
SPU_007285	SPU_007285	After reviewing the data, it appears that the sequence is on two different scaffolds. Scaffold >v2.1_scaffold73334 had the lowest e-value and highest bit score results, however, the continuation of the sequence on scaffold >v2.1_scaffold22829 was about 16 down on the BLAST results list (due to higher e-value and lower bit score). The transcriptome intensity scores were low (about a 5) and there was no EST information available. This was an un-annotated gene and no additional gene information (comments) was available on Baylor.  	contains endo/exonuclease domains and phosphatase domains and ras/rho domain\n
SPU_021469	SPU_021469	none	 extra N-terminus half\n
SPU_009220	SPU_009220	From the BLAST results and the excel data, it is evident that this sequence is distributed onto 2 different scaffolds for this particular GLEAN model. When reviewing the excel data, it appears that both scaffolds have an orderly arrangement, however, there are several internal repeats and sequence overlaps present in both scaffolds. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with all of the values being greater than 5. 	SigPep-SRCR-WSC-CUB. Possibly incomplete.\n
SPU_019158	SPU_019158	After reviewing the data and performing a BLAST search it appears that the sequence is distributed onto 2 different scaffolds. The beginning of the sequence is covered by v2.1_scaffold60900 while the remaining portion is covered by v2.1_scaffold45932. If the two scaffolds were combined, the sequence would have an orderly, continuous arrangement. There was Est support available from GBrowse V0.5 and the transcriptome intensity scores appeared to be somewhat weak with all of the values being less than 10. 	NO GPS but  7TM-1  \nfive LDLa in exodomain - there are no reported LDLa-LNB7TM GPCRs but St purp has several of this type \n \nNo GPS - looks most like glycoprotein hormone receptors \n
SPU_011222	SPU_011222	For this particular GLEAN model, when a BLAST search was done an error message came up indicating that there was a misconfiguration. When examining the excel data, it appears that there are similar sequences between the different subject queries and if the data could have been mapped onto V2.1 it seems like there may be some difficulty distinguishing between the scaffolds.  There also appear to be overlaps within the sequences resulting in no clear orderly arrangement. 	SigPep-SRCR(13)-TM.\n
SPU_010523	SPU_010523	none	SRCR(2). Possibly incomplete.\n
SPU_003206	SPU_003206	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 3 different scaffolds. However, there are several gaps within each scaffold and there are sequence overlaps between the scaffolds as well. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5. 	LRRtyp x3 - 7TM \n \nNo GPS - looks most like glycoprotein hormone receptors \n
SPU_013958	SPU_013958	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto two different scaffolds. The first scaffold contains a large gap that spans from about 370-1050. This missing information is distributed onto the second scaffold. There are also several internal repeats present within both scaffolds. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values being less than 6.5.	SRCR(2)-TM. Possibly incomplete.\n
SPU_012230	SPU_012230	none	SigPep-SRCR(4)-EGF\n
SPU_021254	SPU_021254	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When examining the first portion of the sequence it is evident that this scaffold has an orderly arrangement without any gaps or internal repeats present. The rest of the sequence is continued on the second scaffold which does contain several internal repeats present but no gaps. There was no Est. support available from GBrowse assembly V0.5. The transcriptome intensity scores were difficult to determine due to the scale of the graph.	2 EGFCa-GPS-7TM_2 \nmatches to overall pattern of LNB-7TM-GPCRs \n
SPU_006781	SPU_006781	none	See also SPU_007933.\n
SPU_013572	SPU_013572	none	multiple EGFCa - Ig -GPS-  single TM segment \nApart from absence of complete 7tm_2 domain matches to overall pattern of LNB-7TM-GPCRs \nMay be missing C-terminus\n
SPU_013579	SPU_013579	none	The amino acid sequence of this gene model is nearly identical to that of SPU_007778, although they differ at their C-termini.\n
SPU_014844	SPU_014844	none	SigPep-SRCR(3). Possibly partial.\n
SPU_014993	SPU_014993	After reviewing the Excel data and performing a BLAST search it appears that the gene sequence is on two different scaffolds. There appear to be several short internal repeats within the second scaffold; however, the repeats are in fact continuous with the rest of the sequence. Besides the repeats, it appears that the overall sequence between the two scaffolds is orderly and has good coverage. The BLAST search indicated that both scaffolds had low e-values and high bit scores as well. There was also no Est information available on the GBrowse assembly V0.5 and the transcriptome intensity scores appear to be weak (most are >5) as well. This is an un-annotated gene so no additional information was available from Baylor under gene information (comments)	RVT_1(probable prediction error?)-SRCR(4)-TM. Possibly incomplete. See SPU_014992, 14994. \n
SPU_005260	SPU_005260	none	Gene fragment\n
SPU_025176	SPU_025176	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. When reviewing the excel data, it was apparent that both scaffolds had an orderly and continuous arrangement without any internal repeats or gaps present. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being <5	Significant overlap with SPU_011072, which encodes the C-terminal portion of ATM.  The last 7 exons of this model are from a completely different gene (an oxidoreductase), and were artifactually fused to this gene as a result of incomplete sequence on the scaffold between them.\n
SPU_022501	SPU_022501	none	Partial sequence.\n
SPU_019984	SPU_019984	none	partial sequence. Also see SPU_000776.\n
SPU_018404	SPU_018404	none	looks identical/very similar to SPU_016657\n
SPU_006272	SPU_006272	none	incomplete sequence\n
SPU_016605	SPU_016605	none	Identical to 11157 (11157 is missing aa 1-36)\n
SPU_011157	SPU_011157	none	IDENTICAL TO 16605 (EXCEPT 11157 is missing aa 1-36 of 16605)\n
SPU_001094	SPU_001094	none	PARTIAL, MISSING N-TERMINUS\n
SPU_012205	SPU_012205	none	Partial gene, missing the N-terminal half.\n
SPU_019883	SPU_019883	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 1, subfamily B, polypeptide 1 [Danio rerio]" (NP_001013285.2) is 33.33% over 423 BLAST alignment positions. 264 of 693 Muscle alignment positions masked (38.000 %; 429 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_015120	SPU_015120	For this particular GLEAN model there was no Cds information available from either Baylor annotations or SpBase. There was however mRNA information available from SpBase. There were only 2 subject hits for this query. When reviewing the excel data, it appears that there is a gap within one of the subjects (gb|DS010423|) and if the two scaffolds were to be combined, there would be an overlap between the 2 sequences (at 262-432). There was no Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with most of the values being less than 5.\nAdditional gene information from Baylor annotation (comments):\nPartial sequence.     BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC406641 [Danio rerio]" (NP_998497.1) is 39.04% over 146 BLAST alignment positions. 3167 of 3510 Muscle alignment positions masked (90.200 %; 343 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing N-terminus	Partial sequence.     BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC406641 [Danio rerio]" (NP_998497.1) is 39.04% over 146 BLAST alignment positions. 3167 of 3510 Muscle alignment positions masked (90.200 %; 343 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing N-terminus\n
SPU_014896	SPU_014896	none	No signal in tiling array or EST.  May be pseudogene or adult only expression.\n
SPU_026507	SPU_026507	none	five EGF - GPS - one TM segment \nApart from absence of a complete tm_2 domain looks like a member of the LNB-7TM-GPCR subfamily\n
SPU_024890	SPU_024890	none	histone H2aZ of mammals  \n
SPU_014432	SPU_014432	none	There are two macro histone H2a isoforms in the urchin genome.  The other one is SPU_015435\n
SPU_015435	SPU_015435	none	there is a second isoform of macro H2a in the urchin genome: SPU_014432\n
SPU_012547	SPU_012547	none	contains SAM (sterile alpha motif) repeats; SPU_026761 is a partial sequence of this one (different scaffolds)\n
SPU_027921	SPU_027921	none	The S. cerevisiae DMC1 gene is essential for meiotic recombination. Its encoded protein is structurally and evolutionally related to the products of the yeast RAD51 and E. coli RecA genes. The ovary is one of the high-expression sites for this gene.\n
SPU_026761	SPU_026761	none	partial sequence of SPU_012547 (different scaffolds)\n
SPU_013045	SPU_013045	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When comparing the results of the BLAST search to that of the excel data, it is evident that if these 2 scaffolds were combined the sequence would have an orderly and continuous arrangement. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being <5	allele: SPU_020544\n
SPU_019262	SPU_019262	none	SRCR(2). Probably incomplete. See SPU_019263.\n
SPU_022151	SPU_022151	For this particular GLEAN model there was no Cds information available from either Baylor annotations or SpBase. However, there was mRNA sequence information available from Spbase. When reviewing the excel data, there appears to be numerous repeats and sequence overlaps within the different subjects for this GLEAN model resulting in an un-orderly arrangement.	SRCR(2). Probably incomplete. See SPU_022145, 22146, 22147, 22148, 22149, 22150.\n
SPU_011961	SPU_011961	After reviewing the data from the excel file and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. The data from the excel file doesn't completely coincide with the data from the BLAST search. The excel data has a sequence overlap between the 2 scaffolds that isn't apparent from the BLAST search. There was Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat weak with most of the values being less than 5.	Identical to 22991 except for 5'end (note 11961 is missing aa 73-109 of 22991)\n
SPU_021789	SPU_021789	none	SRCR(7)-TM. Possibly incomplete.\n
SPU_024480	SPU_024480	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto two different scaffolds. When examining the excel data, it was apparent that there were no large gaps present. However, there were several internal present within both scaffolds. Overall, both scaffolds had an orderly arrangement and if the two were combined, the sequence would have good coverage. There was Est support available from GBRowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with the majority of the values being greater than 5. 	PREDICTED: similar to castor homolog 1, zinc finger \n
SPU_006006	SPU_006006	none	IDENTICAL TO 11467 (except missing aa 1-69 of 11467)\n
SPU_011061	SPU_011061	From the BLAST results, there is also differing base pair information compared to the excel data. There was no Est. information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 10.	This is the ligand binding domain of gcnf1, and belongs with SPU_000749 (see that model for data).\n
SPU_005073	SPU_005073	none	This appears to be the N terminus; the C terminus may be on SPU_010910\n
SPU_010777	SPU_010777	none	Fas Associated death domain.  \n
SPU_008365	SPU_008365	From the BLAST results as well as the excel data, it appears that the sequence is distributed onto two different scaffolds. When examining the excel data, it was apparent that there were several base pair duplicates and sequence overlaps within both scaffolds. However there was Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being <5.	The mouse Ogg1 gene is involved in the repair of 8-hydroxyguanine in DNA damage. \n
SPU_015545	SPU_015545	none	unclear exactly which RTK this is.\n
SPU_008719	SPU_008719	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 3 different scaffolds. When examining the 3 subject hits for this particular GLEAN model, it appears that if all three were to be combined, the entire sequence would have an orderly arrangement. However, the overall sequence does not have very good coverage. The end of the third scaffold is truncated at 2628 resulting in a gap that spans until the end of the sequence (2699.) There was Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with the majority of values being greater than 5.	Alternative splicing; DNA damage; DNA repair; DNA replication; DNA synthesis; DNA-binding; DNA-directed DNA polymerase; Magnesium; Metal-binding; Mutator protein; Nuclear protein by similarity.\n
SPU_004540	SPU_004540	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. When reviewing the excel data, it was apparent that there are several internal repeats and gaps within the 2 scaffolds resulting in a disorderly arrangement about the 2 scaffolds. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with several values being greater than 10. 	May have a couple of extra exons predicted.\n
SPU_022678	SPU_022678	none	There may be a misassembly - the zinc finger part of this protein is missing.\n
SPU_006294	SPU_006294	none	Mitochondrial DNA polymerase, DNA polymerase gamma by similarity. \n
SPU_016655	SPU_016655	After reviewing the data and doing a BLAST search it appears that there isn't a GLEAN model that fits sufficiently. Scaffold >v2.1_scaffold60391 was the best fit according to the BLAST search; however, the sequence coverage is only 609/2268. The entire sequence is distributed onto 2 different scaffolds. There was no Est information available on GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with values ranging from about 5-30.\nAdditional information from Baylor gene information (comments): \ne val for AAI10990=1e-111.\ne val for NP_112494=5e-81; kinesin family member 18A [Homo sapiens].\nAnnotation by RA Obar, RL Morris, LE Shorey, SA Tower, KM Judkins	e val for AAI10990=1e-111. \ne val for NP_112494=5e-81; kinesin family member 18A [Homo sapiens].   \nAnnotation by RA Obar, RL Morris, LE Shorey, SA Tower, KM Judkins\n
SPU_003717	SPU_003717	none	E value for NP_659464 = 5e-72 KIF6 [Homo sapiens] \nAnnotation by RA Obar, RL Morris, AL Silverio, BJ Chick, and B Rossetti. \n \n
SPU_024588	SPU_024588	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When reviewing the excel data, it was evident that the sequence did not begin until the 9th base pair and continued until 391 was reached. However, there is an internal overlap within this scaffold (v2.1_scaffold824) between 245-391 and 313-388. The rest of the sequence was continued on v2.1_scaffold68414. Besides the repeat within the first scaffold, if the 2 scaffolds were to be combined, there would be an orderly continuous arrangement for this GLEAN model. There was no Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be	e val = 1e-54 for NP_112494. \nlikely a fragment, based on its short length. \nAnnotated by RA Obar and RL Morris, KM Judkins\n
SPU_021946	SPU_021946	none	Responsible for preventing misincorporation of 8-oxo-dGTP into DNA thus preventing A:T to C:G transversions (by similarity).\n
SPU_005433	SPU_005433	none	SPU_005434 is a partial duplicate prediction for SPU_005433. Looking at the Genboree data, it may represent an error in the assembly process. \n
SPU_017434	SPU_017434	none	Partial sequence containing PTP catalytic domain.\n
SPU_005434	SPU_005434	none	SPU_005434 is a partial duplicate prediction for SPU_005433. Looking at the Genboree data, it may represent an error in the assembly process.\n
SPU_017234	SPU_017234	none	Partial sequence. Contains most of the PTP catalytic domain.\n
SPU_014642	SPU_014642	none	Partial sequence containing the PTP catalytic domain.\n
SPU_000838	SPU_000838	none	Partial duplication in SPU_014788\n
SPU_002329	SPU_002329	After reviewing the information it appears that the sequence is dispersed onto 3 different scaffolds. The first scaffold is continuous for the first 128 bases and the second scaffold is unique in that it is continuous from 127-1129. If the three scaffolds were combined the sequence would appear to have an orderly arrangement without any repeats or gaps within the sequence. There was EST information available from GBrowse assembly V0.5 and the transcriptome information wasn't very strong ranging from 4-16. This is an un-annotated gene and no additional information was found on Baylor gene information (comments)	In addition to Ercc6 homology the GLEAN3 model has an additional N-terminal region with sequence match to similar to Galactosylceramide sulfotransferase (GalCer  \nsulfotransferase) (Cerebroside sulfotransferase) (3-phosphoadenylylsulfate:galactosylceramide  \n3-sulfotransferase) (3-phosphoadenosine-5phosphosulfate:GalCer  \nsulfotransferase)  \n
SPU_006632	SPU_006632	none	The expected zinc fingers were not predicted in the GLEAN3 model.\n
SPU_010672	SPU_010672	none	The N-terminal region (amino acids 1-162) of the GLEAN model shows a sequence match to heat shock protein.  The presence of pfam H2TH domain and C-terminal sequence similarity are to Neh-2 annotated SPU_006632. \n
SPU_001388	SPU_001388	none	alpha thalassaemia mental retardation X-linked protein\n
SPU_009913	SPU_009913	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. The beginning of the sequence (v2.1_scaffold47466) contains an internal repeat at 2190-2302; however, that may be a part of the gene sequence since it's only one repeat of one base pair. The rest of the sequence is completed on v2.1_scaffold10419, but this scaffold contains a low bit score and a high e-value when compared to v2.1_scaffold47466 (the best GLEAN model fit). There was no Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat weak with most of the values being less than 5. 	#\nThere is notable sequence match to SPU_007783.\n
SPU_022339	SPU_022339	none	SigPep-SRCR(3)-TM.\n
SPU_024440	SPU_024440	For this particular GLEAN model there was no Cds information available from either Baylor annotations or SpBase. However, there was mRNA information available from SpBase. When reviewing the excel data it appears that there are numerous gaps and repeats present. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values being <5	SigPep-SRCR(4)-TM.\n
SPU_026849	SPU_026849	For this particular GLEAN model there was no Cds information available from either Baylor annotations or SpBase. There was however mRNA information from SpBase. When reviewing the excel data it appears that the sequence might be distributed onto only one scaffold; however, there were numerous internal repeats present within the scaffold. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak overall, with most of the values being less than 5. 	SRCR(14)-TM. Probably incomplete. See SPU_026848.\n
SPU_028233	SPU_028233	none	SigPep-SRCR(3). Probably incomplete.\n
SPU_017888	SPU_017888	none	partial\n
SPU_026743	SPU_026743	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. The first scaffold consists of an orderly arrangement and continuous arrangement until 565. The rest of the sequence is continued on v2.1_scaffold22981. However, this scaffold contains numerous gaps and repeats present that occur throughout the scaffold. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with most of the values being less than 5. 	Very similar to C-terminal sequences of SPU_009497, SPU_011339, SPU_021561, SPU_022941, and SPU_026645.  Also has significant sequence similarity to parts of SPU_019865, SPU_010466, and SPU_002718....\n
SPU_000882	SPU_000882	none	Very similar to SPU_001683.\n
SPU_017595	SPU_017595	none	Portion of the early histone gene repeat\n
SPU_018058	SPU_018058	none	Portion of the early histone gene repeat\n
SPU_002503	SPU_002503	For this particular GLEAN model there was no Cds information available from Baylor annotations or SpBase. There was however mRNA information available from SpBase. When examining the excel data it was evident that there were numerous repeats and sequence overlaps throughout the data. An error message was received when doing a transcriptome intensity search. However, there was some Est information available from GBrowse assembly V0.5	Portion of the early histone gene repeat\n
SPU_002577	SPU_002577	none	Portion of the early histone gene repeat\n
SPU_000093	SPU_000093	For this particular GLEAN model there was no orderly arrangement of the sequence and there were numerous subject hits. When a search was performed using the SpBase search engine there were no gene features or CDS for this model. However, the mRNA sequence was available from SpBase.	Portion of the early histone gene repeat\n
SPU_021066	SPU_021066	For this particular GLEAN model there was no Cds information from either Baylor annotations or SpBase. There was however mRNA information available from SpBase. When examining the excel data it appeared that the sequence may have been distributed onto two different scaffolds. The first portion of the sequence appeared to be on subject gb|DS008717| and the rest of the sequence was on gb|DS008632|. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with several values being greater than 5.	Portion of the early histone gene repeat\n
SPU_016038	SPU_016038	none	Portion of the early histone gene repeat\n
SPU_016047	SPU_016047	none	Portion of the early histone gene repeat\n
SPU_002504	SPU_002504	For this particular GLEAN model there was no CDS information available in both the SpBase search engine and in Baylor annotations. There was also no gene information provided in either search engine as well.\nAdditional information found: Portion of the early histone gene repeat.	Portion of the early histone gene repeat\n
SPU_024131	SPU_024131	none	Portion of the early histone gene repeat\n
SPU_025530	SPU_025530	none	Portion of the early histone gene repeat\n
SPU_025880	SPU_025880	none	Portion of the early histone gene repeat\n
SPU_019983	SPU_019983	none	>SPU_019983|Scaffold499|161923|162036|  \n>SPU_019983|Scaffold499|162415|162741|  \ncontain sequences conserved in DAN proteins.\n
SPU_003281	SPU_003281	none	Partial sequence. \n
SPU_018795	SPU_018795	none	Possible missing exon in the middle region; length reduced relative to the Query sequence used.\n
SPU_016042	SPU_016042	none	Possible missing N-terminal coding exon relative to Query used.\n
SPU_008287	SPU_008287	none	Blasts as myotubularin-related protein 2 isoform 2.\n
SPU_017796	SPU_017796	The BLAST results displayed a blank page. The excel data displayed incredibly long repeats.	multiple ankyrin repeats in the encoded protein\n
SPU_012067	SPU_012067	none	Assemble fragments to obtain Query coverage \nSPU_011485 N-terminal coverage \nSPU_027393 also N-terminal from alternate region or query \nSPU_012067 middle region of query covered \nSPU_018381 C-terminus \n
SPU_028676	SPU_028676	After reviewing the data and performing a BLAST search it appears that the sequence is distributed onto 3 different scaffolds. If the 3 scaffolds were combined, the sequence would have an overall orderly and continuous arrangement without any gaps or repeats present. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being <5	1 missing Kelch repeat (only 5 of 6 expected relative to query) could indicate incomplete model near c-terminus.\n
SPU_012221	SPU_012221	none	There are 6 to 8 copies of this protein in sea urchin genome based on blast search.\n
SPU_023520	SPU_023520	From the BLAST results and the excel data, it is evident that the entire sequence is distributed onto 3 different scaffolds. When examining the excel data it appears that if the three scaffolds were to be combined, the overall sequence would have an orderly and continuous arrangement. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values being less than 6. 	SPU_023520 is a partial duplicate prediction of SPU_023519.\n
SPU_012779	SPU_012779	none	SPU_009634 is a duplicate prediction. May be missing first exon.\n
SPU_007870	SPU_007870	After reviewing the data and performing BLAST searches, it was determined that no orderly GLEAN model fit sufficiently. The best results were on >v2.1_scaffold11280, however, only 1026 base pairs matched out of a total of 6753 indicating that the sequence did not map very well. There was EST information available on GBrowse V0.5 and the transcriptome intensity scores appeared to be strong. \nAdditional information from Baylor page (comments regarding gene information):\nThis GLEAN MAY represent the RANBP2 ortholog in Urchin. RANBP2 in humans encodes a very large RAN-binding protein that immunolocalizes to the nuclear pore complex. 	This GLEAN MAY represent the RANBP2 ortholog in Urchin. RANBP2 in humans encodes a very large RAN-binding protein that immunolocalizes to the nuclear pore complex.\n
SPU_024074	SPU_024074	After reviewing the data, it appears that there is no sufficient match for this particular GLEAN model.  There appears to be no orderly arrangement of the sequence and poor coverage. The sequence appears to be distributed onto at least 3 different scaffolds. There was no Est information available and the transcriptome score intensities appear to be widely distributed (values from 4-36). This is an un-annotated gene so no additional information was available.	2 different genes.  One portion is similar to Solute carrier family 23, member 1. The other portion is similar to PTPRT (Receptor type protein tyrosine phosphatase T).  In phylogenetic analysis, the PTPRT portion does not clade with the PTPR K/M/T/U group.  It formed a unique clade with SPU_027290...both were renamed PTPRW.\n
SPU_019209	SPU_019209	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. If the 2 scaffolds were to be combined, the sequence would have an orderly and continuous arrangement without any repeats or gaps present. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values in the 5-10 range	SPU_019209 is a significant partial duplicate prediction for SPU_006216.\n
SPU_003537	SPU_003537	none	SPU_003537 has first part of USP52 and SPU_020437 has the rest.\n
SPU_008448	SPU_008448	none	This GLEAN3 prediction is likely to be incorrect. \n
SPU_028580	SPU_028580	none	SPU_028452 is a duplicate prediction for SPU_028580.\n
SPU_028452	SPU_028452	none	SPU_028452 is a duplicate prediction for SPU_028580.\n
SPU_027408	SPU_027408	none	SPU_027408 has the first part and SPU_016447 has the rest of the EXOSC10 gene. In addition, SPU_027408 and SPU_016447 share a significant partially identical overlap.\n
SPU_025914	SPU_025914	After reviewing the data and performing a BLAST search it appears that the sequence is distributed onto 2 different scaffolds. These two scaffolds had the highest bit score and the lowest e-values compared to the rest of the BLAST results. When reviewing the excel data and comparing it to the BLAST results, it is apparent that if these two scaffolds were to be combined the sequence would have an orderly continuous arrangement, without any gaps or internal repeats present. There was no Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being <10	Domains: DEATH, NACHT, LRRs.\n
SPU_024975	SPU_024975	none	Domains: DEATH, NACHT, LRRs.\n
SPU_018168	SPU_018168	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nThis and all other Sp-SARM-related genes cluster "tightly" with Sp-Sarm (SPU_011042) and other Sarm genes in MSA trees built based on the sequence of their TIR domains. In addition, most of Sp-Sarm-related genes have similar domain compositions to that of Sarm genes (TIR, Armadillo/b-catenin and SAM domains), and those that do not are often located at the end of their respective scaffold and might be incomplete models. It should be noted, however, that the subfamily of Sarm-related genes shows a different domain organization than that of Sarms (including Sp-Sarm).\n
SPU_013299	SPU_013299	none	This seems a duplication of SPU_007952. Please see GLEAN_07952 for details. \n \nNote that the adjacent SPU_013298 is also most likely a duplication of SPU_007951.\n
SPU_014752	SPU_014752	none	This model was annotated based on reciprocal blasting and similarity of domain structure/organization with vertebrate Nck. \n \nIts embryonic expression is partly supported by signal from the tiling array hybridization data.\n
SPU_028520	SPU_028520	none	 fragment\n
SPU_028870	SPU_028870	none	 fragment\n
SPU_017129	SPU_017129	From the BLAST results as well as the excel data, it is evident that this is the best fit for this particular GLEAN model. When reviewing the excel data in comparison with the BLAST results it was clear that this scaffold contained an orderly arrangement without any large gaps present. However, the scaffold did not begin until the 36th base pair but this was the only missing sequence information. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values being less than 5. 	#\nDomains: NACHT, LRRs. This gene model is at the end of a scaffold. Could be incomplete.\n
SPU_009631	SPU_009631	none	 fragment\n
SPU_009694	SPU_009694	none	 tiny fragment\n
SPU_028868	SPU_028868	none	Partial sequence.\n
SPU_013786	SPU_013786	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 3 different scaffolds. However, there are numerous gaps and repeats within the scaffolds and if the three scaffolds were combined the overall sequence coverage would be poor. There was some Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be 	There are extra exon(s) at the end of the GLEAN.\n
SPU_008600	SPU_008600	After reviewing the data and performing a BLAST search, it appears that this is the best fit for this particular GLEAN model. The BLAST results indicated that this was the second best result based on bit score and e-value. However, this scaffold has more coverage; furthermore, its bit score and e-value were very close to those of the first result. There are several gaps present within this sequence, specifically from 913-987, 1083-1123 and 1416-1496. This is an un-annotated gene so no additional gene information was available from Baylor annotations (comments). There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being <10	SPU_008600 is a duplicate partial prediction for SPU_013786.\n
SPU_006848	SPU_006848	For this particular GLEAN model there was no Cds information available from either SpBase or Baylor annotations. However, there was in fact mRNA information available from SpBase. When reviewing the excel data, it appears that the sequence could be distributed onto 2 different scaffolds. Specifically onto subject gb|DS015099| and AAGJ02106196. Subject gb|DS015099 has two different base pair duplicates that occur within the sequence. Besides the internal repeats present, both scaffolds appear to have an orderly arrangement. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with all the values being greater than 10.	view GLEAN_12983 for complete CDS\n
SPU_016552	SPU_016552	none	SPU_016552 has one part and SPU_028397 appears to have the last part of DDX42.\n
SPU_021726	SPU_021726	none	contains TGFbeta_propeptide domain\n
SPU_022653	SPU_022653	none	partial sequence, contains TGFbeta_propeptide domain\n
SPU_026612	SPU_026612	none	Similar to other catalytic subunits of telomerase\n
SPU_003296	SPU_003296	For this particular GLEAN model there was no Cds information available from either Baylor annotations or SpBase. However, there was mRNA information available from SpBase. When reviewing the excel data, it appears that the sequence is distributed onto 2 different scaffolds (subjects gb|DS006852| and gb|DS007174|). It is unclear if the sequence has good coverage without the BLAST results, but if these 2 scaffolds were combined the sequence would have an orderly and continuous arrangement without any gaps or repeats present. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values ranging from 5-10.\nAdditional gene information from Baylor annotations (comments):\nThis is highly correlated with two highly conserved regions:\n1) cytidine_deaminase-like region (2e-14);\n2) SNF7 family (9e-22).\nBut an alternative splicing gives different regions with less highly correlated scores; thus, this one is considered more likely.	This is highly correlated with two highly conserved regions:  \n1) cytidine_deaminase-like region (2e-14) \n2) SNF7 family (9e-22) \nBut an alternative splicing gives different regions with less highly correlated scores; thus, this one is considered more likely\n
SPU_028359	SPU_028359	For this particular GLEAN model there was no Cds information available from Baylor annotations or from SpBase. There was, however, mRNA information available from SpBase. When examining the excel data, it was apparent that there were numerous repeats present resulting in an un-orderly sequence arrangement. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being <5	Highly correlated with highly conserved region identified as nucleoside_deaminase. The Accession number is from a zebra fish protein that had not been identified.\n
SPU_026280	SPU_026280	After reviewing the data and performing a BLAST search it appears that this is the best fit for this particular GLEAN model. When comparing the excel data with the BLAST results, the sequence data does not coincide. The excel data displayed a large sequence of repeats and the coverage was only until about 658, while the BLAST results indicated that the sequence ended at about 2120.  There was Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores were very low (below 5). 	This is KRP95 of kinesin-2 cloned from Sp by Cole et al, 1993. \ne val = 3e-37 for NP_004789, KIF3B [Homo sapiens] \nThis is amino terminal portion of protein in 11 exons.  Two other scaffolds required to complete the gene.  KRP95 continues on scaffold91496 (one exon) then scaffold58510 (eight exons). \nAnnotated by RA Obar, RL Morris, B Rossetti, AM Musante, and EJ Jin.\n
SPU_021526	SPU_021526	From the results of the BLAST search in comparison with the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. Both scaffolds contain several gaps; however, from the excel data, it appears that wherever one scaffold is missing information the other scaffold fills it in. There was some Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5. 	Exon 2 is missing from Scaffold1158 due to a string of N's but is present on Scaffold181669.\n
SPU_007007	SPU_007007	none	An incomplete MCM8 sequence is also found in SPU_007296\n
SPU_007296	SPU_007296	none	Partial sequences. See annotation of SPU_007007 for the MCM8 full-length coding sequence\n
SPU_023032	SPU_023032	none	The scaffold assembly should be revised. \nAnother GLEAN encodes the CDC45 sequence: SPU_024816, in that case also exons are missing or artefactually assembled. \n
SPU_008547	SPU_008547	From the BLAST results and the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. The first portion of the sequence (v2.1_scaffold37344) has an orderly and continuous arrangement until base pair 2838 is reached. The second scaffold (v2.1_scaffold49974) is unique in that it continues the rest of the sequence until the end, covering the last 1390 base pairs continuously (from 2837-4227). There was no Est. support from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values being less than 6.5.	#\nDomains: DEATH, NACHT, LRRs.\n
SPU_002372	SPU_002372	After reviewing the data and performing a BLAST search, it appears that there is no sufficient fit for this particular GLEAN model. There are numerous repeats and gaps within the sequence. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with half of the values being less than 10 and the other half greater than 10.	This gene model is likely a fusion of two models. The Fgenesh model has DEATH domain and NACHT.\n
SPU_013504	SPU_013504	none	Domains: DEATH, NACHT, LRRs.\n
SPU_017341	SPU_017341	none	Domains: DEATH, NACHT, LRRs.\n
SPU_028457	SPU_028457	none	This is the end of ITSN2 gene.  The rest of the gene (5'end) is in GLEAN 03961.\n
SPU_026119	SPU_026119	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix D.3 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees.\n
SPU_027405	SPU_027405	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix D.4 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees. \n \nNote that this model is likely incomplete. Based on the position of this model close to an end of a scaffold, and the structure of other closely related models and Sp-MACPF genes in general, it is likely that there is missing N-ter sequence (including a signal peptide). \n \nThe embryonic expression and structure of this model are partly supported by the tiling array data.\n
SPU_002923	SPU_002923	none	Very similar to SPU_002921; looks like local duplication.\n
SPU_021313	SPU_021313	none	The lim domains are in SPU_004021, located in scaffold5\n
SPU_022817	SPU_022817	none	duplicate of SPU_025302.  See that model for data.\n
SPU_005135	SPU_005135	From the BLAST results and the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. The sequence doesn't begin until the 76th base pair on scaffold v2.1_scaffold75030. When reviewing the BLAST results, it is apparent that there are numerous internal repeats within both scaffolds and a gap within the second scaffold that ranges from 625-696. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with most of the values being less than 5 (excluding one outlier at 41.)	single SEA domain - characteristic of mucins\n
SPU_020148	SPU_020148	none	sequence identical to SPU_008768\n
SPU_001134	SPU_001134	none	Exact same sequence as SPU_005105\n
SPU_027384	SPU_027384	none	small part of SPU_000964\n
SPU_013465	SPU_013465	none	Domains: DEATH, NACHT, LRRs.\n
SPU_005026	SPU_005026	For this particular GLEAN model an error message was received when doing a BLAST search. When reviewing the excel data, it was evident that the sequence was very long with numerous repeats present.	Domains: DEATH, DED, NACHT, LRRs.\n
SPU_009161	SPU_009161	For this particular GLEAN model no CDS was found in either the SpBase search engine or the Baylor annotations search engine. There were also no gene features available on either site. However, the mRNA sequence was available from SpBase.\nAdditional information found: This gene model is combined with SPU_009160. Please refer to this model for details.	This gene model is combined with SPU_009160. Please refer to this model for details.\n
SPU_028595	SPU_028595	none	Domains: Signal peptide, DEATH, NACHT, LRRs. \n
SPU_028433	SPU_028433	none	Domains: Signal peptide, DEATH, NACHT, LRRs.\n
SPU_004456	SPU_004456	none	single armadillo repeat - clusters with importins \n
SPU_022441	SPU_022441	none	Domains: DEATH, NACHT, LRRs.\n
SPU_012713	SPU_012713	none	Domains: DEATH, NACHT, LRRs.\n
SPU_026304	SPU_026304	none	Domains: NACHT, DEATH, LRRs. \nThe presence of a DEATH domain C-terminal to the NACHT domain is unexpected since this domain structure is not observed in the vertebrate NLRs, where PYD and CARD domains are N-terminal to the NACHT domain. \n \nThe Fgenesh model (S.P_Scaffold1348.seq.N000001:  2,207-2,237) located 5' of this Glean model codes for a DEATH domain that likely belongs to this gene.\n
SPU_010039	SPU_010039	For this particular GLEAN model it appears that the sequence is distributed onto 2 different scaffolds. The first portion of the sequence appears to have several small gaps throughout, each spanning only about 10-15 base pairs. The second scaffold has an orderly and continuous distribution without any gaps or repeats present. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with most of the values being less than 5.	Domains: NACHT, LRRs, CLECT, CLECT. \nThis gene model is likely a fusion of two models, as the Fgenesh model only has the first 4 exons and doesn't code for the CLECT domains, which are not normally found in this type of protein. Also, the intron separating the NLR domains from the CLECT domains is very large (~150kb) which is very unusual. \nThis gene model is at the beginning of a scaffold and could be incomplete, missing the DEATH domain(s).\n
SPU_005419	SPU_005419	From the BLAST results it is evident that the sequence is distributed onto two different scaffolds. When examining the excel data, it is apparent that there are no internal repeats or gaps present within either of the two scaffolds and if they were to be combined the sequence would have an orderly continuous arrangement. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with all of the values being greater than 5 and several values that were greater than 10. 	Kinase domain + some 3' UTR partially cloned out of egg cDNA \nThe model is probably missing the 5' most sequence based on comparisons to A.miniata SFK3 \n \n Glean for CDS and Peptide sequences Accepted, however further verification should be done for sequence that is not within the cloned mRNA sequence.\n
SPU_010116	SPU_010116	none	Could contain a repetitive non-coding sequence in the second predicted exon, without this sequence the predicted ORF is very similar in length and homology to human Dystroglycan\n
SPU_007092	SPU_007092	none	This is the S.purpuratus version of gi|1817526|dbj|BAA09934.1| intermediate chain 1 [Anthocidaris crassispina].\n
SPU_019893	SPU_019893	none	Structure: CARD-DEATH. \nThe CARD domain is a poor hit in a SMART analysis, but does appear in the 2nd table.\n
SPU_006410	SPU_006410	none	Structure: CARD-DEATH-DEATH. \nThe CARD domain is a poor hit in a SMART analysis but does appear in the 2nd table.\n
SPU_019506	SPU_019506	none	This GLEAN represents the sea urchin Dynein Intermediate Chain 2, as defined by the Anthocidaris crassispina cDNA (gi|2494214|sp|Q16959|DYI2_ANTCR Dynein intermediate chain 2, ciliary).  SPU_019506 originally represented only the first 477 amino acid residues, while SPU_005973 represented the last 437 residues.  These were merged into SPU_019506.\n
SPU_006354	SPU_006354	none	The prototype of this gene product is from the sea urchin Anthocidaris crassispina (gi|2760163|dbj|BAA24185.1| outer arm dynein light chain 1 [Anthocidaris crassispina]).\n
SPU_024517	SPU_024517	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. On the first portion of the sequence (v2.1_scaffold85334) the sequence is continuous until 480 and doesn't continue until 717. This region is filled in by v2.1_scaffold6747. If the two scaffolds were combined, the sequence would have a continuous orderly arrangement. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5.	putative homolog of xrcc4, 3'end / C-terminus is missing in gene model\n
SPU_028717	SPU_028717	none	SPU_028716 is the front part of SPU_028717.  Reported as separate genes but should be the same gene.  I've attached the SPU_028716 sequences in front of the SPU_028717 sequences.  \n
SPU_018291	SPU_018291	From the BLAST results and the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. Both scaffolds contain an orderly arrangement without any gaps or repeats present. There was Est support available from GBrowse assembly and the transcriptome intensity scores appeared to be weak with most of the values being less than 5. 	This glean contains a RhoGEF domain but also an F-Box domain????  Either a novel GEF or the F-box is not part of this gene.\n
SPU_019025	SPU_019025	none	Appears to be an additional exon of SPU_019024\n
SPU_010897	SPU_010897	none	This is a duplication of the SH3 domain of GLEAN 19856\n
SPU_028171	SPU_028171	none	previously cloned gene\n
SPU_019134	SPU_019134	From the BLAST results as well as the excel data, it appears that the sequence is distributed onto 2 different scaffolds. When reviewing the excel data it is evident that both scaffolds do not have any internal repeats or gaps present and if both scaffolds were to be combined, the entire sequence would have an orderly and continuous arrangement. There was no Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5. 	 extra 400 amino acids on N-terminus\n
SPU_010536	SPU_010536	none	This gene model clusters with the RIG-I family of helicases in phylogenetic tree based on a multiple sequence alignment of the DEXDc domains.  \n
SPU_015823	SPU_015823	none	 partial, tiny fragment\n
SPU_001360	SPU_001360	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. When reviewing the excel data, it is apparent that there are several internal repeats present specifically in scaffold v2.1_scaffold80884. Besides the internal repeats, both scaffolds contain an orderly arrangement without any gaps present. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10.	May represent a duplication or paralog since SPU_014392 also matches a significant part of the published katanin B sequence.\n
SPU_026236	SPU_026236	none	This prediction does not include the N-terminal sequences of the protein which are present in SPU_026235.  26235 and 26236 should be combined.\n
SPU_007435	SPU_007435	none	The automated GLEAN prediction contained an exon near the 5'-end of this gene that threw the alignment of the sequence out of line with all other alpha tubulins, so it was deleted.\n
SPU_021670	SPU_021670	none	This GLEAN has disappeared from the curated GLEAN set in GenBoree (043006).\n
SPU_020749	SPU_020749	none	Annotated by BLAST only. \nBest human hit is NP_699160\n
SPU_028158	SPU_028158	none	Annotation by BLAST only \npartial gene missing N terminus and exon in the middle\n
SPU_013551	SPU_013551	none	Glean model had extra C terminal exon \nMay be missing small real C terminal\n
SPU_012976	SPU_012976	none	possibly missing 1-2 exons in middle and 1 exon at N terminus \ntrees with Ciona intestinalis predicted NATs\n
SPU_014392	SPU_014392	none	Partial sequence duplication or homolog to SPU_001360.\n
SPU_004070	SPU_004070	none	fragment\n
SPU_009093	SPU_009093	For this particular GLEAN model, there was no Cds information available from either Baylor annotations or SpBase. There was however mRNA information available from SpBase. When reviewing the excel data, it is apparent that there are several repeats within the scaffolds as well as gaps. There was no Est support from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with all of the values being greater than 10. \nAdditional gene information from Baylor annotations (comments): very similar to 0014185.	very similar to 0014185\n
SPU_006011	SPU_006011	none	potential novel kinase family member\n
SPU_015483	SPU_015483	none	polymeric globin gene similar to shrimp polymeric globin\n
SPU_004962	SPU_004962	For this particular GLEAN model there was no Cds information available from SpBase or Baylor annotations. There was however mRNA information available from SpBase. When examining the excel data, it appears that the sequence may be distributed onto three different scaffolds. The first part of the sequence appears to be on subject gb|DS001520|, the second part on AAGJ02167917, and the remaining part on gb|DS012062|. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong (greater than 5) and distributed into two distinct clusters of values. 	protein sequence modified as per HMM model; this is consistent with tiling data\n
SPU_014812	SPU_014812	none	Gene sequence modified as per HMM model\n
SPU_025188	SPU_025188	none	sequence modified as per HMM model\n
HIPKa	SPU_030163	none	HMM prediction \n
SPU_015281	SPU_015281	none	protein sequence modified as per HMM model\n
SPU_003043	SPU_003043	none	 fragment\n
SPU_010028	SPU_010028	none	 fragment\n
SPU_010065	SPU_010065	none	 fragment\n
SPU_010066	SPU_010066	none	 fragment\n
SPU_016785	SPU_016785	none	Homology is not very strong (7e-11) \nBest GenBank hit is from Danio rerio  \nname is temporary, waiting for a better one\n
SPU_009993	SPU_009993	none	SPU_009993 is a duplicate prediction for SPU_000127.\n
SPU_009955	SPU_009955	none	SPU_021779 is a partial duplicate prediction for SPU_009955.\n
Sp-Il17-p1	SPU_030189	none	#\nThis gene model was annotated based on FgeneshAB and ++. It is partial and may link to Sp-Il17-12.\n
SPU_005886	SPU_005886	none	Missing first exon?\n
SPU_005571	SPU_005571	none	Incomplete gene model.\n
SPU_020648	SPU_020648	none	Note that SPU_020648 probably has the C-terminal exon of this gene.\n
SPU_006605	SPU_006605	none	SPU_006605 has first half and SPU_008394 has the rest.\n
SPU_010936	SPU_010936	none	SPU_005825 is a partial duplicate prediction.\n
SPU_018977	SPU_018977	From the BLAST results and the excel data it appears that the sequence is distributed onto 2 different scaffolds. The beginning portion of the sequence is distributed onto v2.1_scaffold64444 and part of the remaining sequence is distributed onto v2.1_scaffold23041. The sequence ends at 579 but the second scaffold is truncated at 455. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with most of the scores being less than 5. 	SPU_018977 has part I and SPU_021649 appears to have the rest of the gene.\n
SPU_006672	SPU_006672	none	Missing ~150 AA at end.\n
SPU_018163	SPU_018163	none	SPU_018163 and SPU_016841 are partially duplicate predictions.\n
SPU_003593	SPU_003593	After examining the data and performing a BLAST search, it appears that the sequence is distributed onto 2 scaffolds. The sequence doesn't begin until the 6th base pair, but from there, if the two scaffolds were combined the sequence would have an orderly continuous arrangement, without any repeats or gaps present. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be	SPU_003593 has the first half of FBXL10 gene. SPU_0025153 MAY encode the other half. SPU_002864 is a partial duplicate prediction for SPU_003593.\n
SPU_004667	SPU_004667	none	SPU_004667 is a partial duplicate prediction for SPU_024920.\n
SPU_000487	SPU_000487	none	SPU_000487 has the first half of the gene. SPU_027232 MAY have the rest.\n
SPU_009085	SPU_009085	none	Likely isoform of DHPS.\n
SPU_017547	SPU_017547	none	Incomplete gene model.\n
SPU_013813	SPU_013813	none	SPU_013812 encodes Part I of XRN1 and SPU_013813 probably encodes the rest (though may be partially incorrect).\n
SPU_021007	SPU_021007	none	SPU_021007 may be missing a few AA at the end. SPU_024819 is a partial duplicate prediction for SPU_021007. SPU_000416 may be as well.\n
SPU_024819	SPU_024819	none	SPU_021007 may be missing a few AA at the end. SPU_024819 is a partial duplicate prediction for SPU_021007. SPU_000416 may be as well.\n
SPU_004023	SPU_004023	none	SPU_004023 is a partial duplicate prediction for SPU_011694.\n
SPU_010748	SPU_010748	After reviewing the data and performing a BLAST search it appears that the sequence is dispersed onto 2 different scaffolds. The sequence appears to have an orderly arrangement if the two scaffolds were combined. The BLAST results displayed an additional stretch of sequence from 1-29 that was not available in the Excel spreadsheet. The additional 1-29 bases were continuous with the data provided in the excel spreadsheet. Some EST information was available and the transcriptome intensity scores were strong, ranging from 30-60. This is an un-annotated gene so no additional comments were available.	SPU_006645 and SPU_010748 are partial duplicate predictions for SPU_004785.\n
SPU_013160	SPU_013160	After reviewing the data and performing a BLAST search it appears that the sequence is distributed onto 2 different scaffolds. There was no Est information available on GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be widely dispersed with several values ranging from about 5-50. Additional information found from Baylor gene information: This GLEAN contains N-terminal region of the full-length Mdn1 gene, corresponding to amino acids 1 to ~1532 of human midasin. The mid region of the Sp-MDN1 gene is in SPU_009702 and the C-terminal region is SPU_022614. The MDN1 gene contains a conserved MIDAS domain COG5271.2 and a hexameric AAA ATPase domain distantly related to that of dynein	This GLEAN contains N-terminal region of the full-length Mdn1 gene, corresponding to amino acids 1 to ~1532 of human midasin. The mid region of the Sp-MDN1 gene is in SPU_009702 and the C-terminal region is SPU_022614. The MDN1 gene contains a conserved MIDAS domain COG5271.2 and a hexameric AAA ATPase domain distantly related to that of dynein\n
SPU_008188	SPU_008188	For this particular GLEAN model, there was no Cds information from both Baylor annotations and SpBase. There was however mRNA information available from SpBase. When examining the excel data, it appears that the sequence is distributed onto 2 different scaffolds. Subject AAGJ02141483 and AAGJ02094195 appear to have the most orderly arrangement however, without the BLAST results, it is difficult to determine whether or not these are the best fits for this GLEAN model. On subject AAGJ02094195, the sequence doesn't begin until about 400, resulting in a large gap within the sequence. There was no Est. information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values being less than 10.	This Glean forms part of the annotated full length Sp-DNAH14 gene (SPU_030233)\n
SPU_004548	SPU_004548	none	Missing an exon in middle.\n
SPU_015720	SPU_015720	none	Likely has an extra exon. \nSPU_002618 and SPU_022235 are partial duplicate predictions for SPU_015720.\n
SPU_015087	SPU_015087	none	SPU_015087 has the second half of TSR1 gene. SPU_019664 likely codes for the first half.\n
SPU_027482	SPU_027482	none	Longer than required prediction.\n
SPU_011380	SPU_011380	none	Inspection of the tiling array suggests that glean may have missed the following exons: FLDEGQVLLRLLLFVLDNPLIKVLIKQTFELLIYVCLGLYSVRMNCMLPGLILSCSIFYPAERAVRLFLTSTVSWPLEYYFLLTDQHAYLHLRSVAGCLEGMAISRGLLLLVCHLLTLVFALEHILQEGLCCSVHENQ\n
SPU_014683	SPU_014683	none	Inspection of the tiling array suggests that glean may have missed the following exons: SPVISDAQKFQCSHCEGLFSSAKLILRHIRCEHSDGEPCEMMPALAWKRKGKKKGREKSVAIKFKFNHPINHPIVRKRKNSEEEEECDFRCGTCVKSFPSLGRLKEHELFHEMMHGDKPYECSECNQRYTAQSSLNRHEREVHGFLDDYKPRSRPKRLKAHVPKKPLHCRYCGQGYKSRGALANHERRIHGSRHPIREPDLPNDEPKDMLGRPSDYYQRPFKCRFCPKRYVSWTTVEQHEKEVHTREGTFKCSHCPKVCASESRLKEHLVVHKYMHMHRCTLCPRSFASESALNNHQGEHTGLKPFKCEICSRGFRTRKLTLKHKQRMHQERPKRYICSICNKGFAEKCNLKVHERRHKGIRQFVCLECGKGFTARFSLTAHMQAMHIKERPFACEICGKSFALNHHYNHHMAKHRLDGDDSIPQ,RRMYRKSHFTVVTVAKGTNHAVHSRTTRGESMALGTRFGNRTYRTMSPRICLVDPLITTSDPSSADFVQRDTFPGQRLNNTRRRSTREKALSSAVIVPRFAPVRAV,SVKEIQTIKQREQCSSSSHQASASSSSSDTSNPTPNTSKDESQLLAALNLKKTKSIQDLPQNLLFRATPEGKVDGVVAKERIEKGVEFGPYAGTLLDEEQGWTRDTTWEVRRAVFHKTVF,FPLDSAHGVSNAGIIHQARQQLPVHLRGHGVKSSTLSANYAPPITTHEPIRERNDLPITTHESVSSIIQPLTTPESGAKSNVPRPQGTVCNFCLVGFC,TRCPGVRDRTLVVTQRSVNKLFVARNCHSFTLKKPRQFILWQFTAPYLNKPLLSLSLSLSPSLYATSLNLENELSSASTDSNLTLYH\n
SPU_014687	SPU_014687	none	Inspection of the tiling array suggests that glean may have missed the following exons: QCRFPILKPLYHEIQIAVWMPQGAVQGMVSEADPLRTQADAVRLCGMVGPWGRVQHVAGVVAVVVAKVVDVVGFPVDLFQCSRNQVDNYLPLLIYYEGIKEN,VLNVAVHTLVKFEILESCMNSNILPVYLAVEHFQFTSWFNDLSASEVDIAVILICFHVLFRFTPCASQMRHPVFTSVNVEENPTWITDL\n
SPU_017850	SPU_017850	none	Inspection of the tiling array suggests that glean may have missed the following exons: ESPVNPIELWKKGPIVARSVVKTDEDVLPVRVFNPTREQRNIKAGTAIASLSAVEEVGTEMCNQTMPTETSASRNMPPDKQGNKDVDARRTTTHFFQCDSCTKSYEKKDSLQRHKREVHILKYRCDQCDYRTGRKTEMERHQGAHEVAVVPVRKHSLTPVSSPRCKPTSTPITGTSEKADASWRHVRMDWRSPPRDDPRLDRPRGRSPRRYRSSRRKSKSPQRDLTKSSCQKTPESPQHDGSEAS,WCQKWDPVTMAQGRNRQNRPTPGEMMKHLERSMTEVDQMMEQLRASMSQVPVADPEPGRQEAFPPTLSGQTASGPTADSQMPAAQRVRFSSTPREQSRRLPSAGTGVHITPPLFSPP\n
SPU_022024	SPU_022024	none	Inspection of the tiling array suggests that glean may have missed the following exons: YFSTMTIRHPLIHFAIQTYKYSSFKRQYWKTNGGEVCREDQRQRRQDNEGWPKRKRFISTSGSGVKRKLEGACQKEVYIHICQ,LCSQLSWRVKHLKHISVWTLYEGASIKDSDRRLRGAVDTHYIITGPEQTIRQSASKLLVSSAFKNELSNFILKEWGKEHYWNIYSGRTRFASYGGG\n
SPU_015843	SPU_015843	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto two different scaffolds. It is also apparent that there is an overlap between the two scaffolds that occur from 1353-2346. When examining the excel data, it is clear that there are no internal repeats or gaps present within the two scaffolds. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10. 	SPU_002037 is a partial duplicate prediction for SPU_015843.\n
SPU_007083	SPU_007083	none	Motor domain\n
SPU_003746	SPU_003746	none	motor domain \n
SPU_025767	SPU_025767	After reviewing the data and performing a BLAST search it appears that the sequence is distributed onto 2 different scaffolds. When examining the excel data it appears that the both scaffolds have no gaps or internal repeats present. However if the data from both scaffolds were to be combined, there would be a sequence overlap between the 2 scaffolds from 1210-1951. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5.	e val for Q02224=e-3e-38; CENPE_HUMAN [Homo sapiens].   \ne val for AAI10990=2e-30; KIF19 protein [Homo sapiens]. \nKinesin-7 family member.   \nSee also SPU_017809 and SPU_023126 which also hit Q02224. \nCENPE_HUMAN data obtained from UniProtKB/Swiss-Prot entry Q02224.    \nAnnotation by RA Obar, RL Morris, SA Tower, and AP Rawson.  \n
SPU_016553	SPU_016553	none	SPU_016553 and SPU_018172 are overlapping but incomplete models for UTP14A.\n
SPU_018172	SPU_018172	none	SPU_016553 and SPU_018172 are overlapping but incomplete models for UTP14A.\n
SPU_017897	SPU_017897	none	SPU_017897 is a partial duplicate prediction for SPU_001512.\n
SPU_027769	SPU_027769	From the BLAST results as well as the excel data, it appears that this is the best fit for this particular GLEAN model. When examining the excel data, it is clear that the scaffold contains an orderly arrangement without any gaps or repeats present. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5.	SPU_027769 is a partial duplicate prediction for SPU_002606.\n
SPU_013647	SPU_013647	From the BLAST results as well as the excel data it was evident that the sequence is distributed onto 2 different scaffolds. The sequence didn't begin until the 17th base pair and had an orderly arrangement until the end of the scaffold. However, there were 2 internal repeats present within the scaffold on 569-666 and 665-825. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5. 	SPU_013648 is a duplicate prediction for SPU_013647.\n
SPU_003197	SPU_003197	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. The first portion of the sequence (v2.1_scaffold29830) has a continuous orderly arrangement without any repeats or gaps present. It also has the lowest e-value and highest bit score results from the BLAST search. The rest of the sequence is distributed onto v2.1_scaffold50215, which has an orderly continuous arrangement as well but had a lower bit score and higher e-value when compared to the rest of the results. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appear to be somewhat strong with most of the scores being greater than 10.	Incorrect gene model. Likely a hybrid prediction.\n
SPU_013648	SPU_013648	none	SPU_013648 is a duplicate prediction for SPU_013647.\n
SPU_027976	SPU_027976	none	Likely missing first exon.\n
SPU_013577	SPU_013577	none	Extra ~40AA (exon).\n
SPU_011201	SPU_011201	After reviewing the data it appears that there is no sufficient GLEAN model that fits model_SPU_011201. The excel data had numerous repeats within the 1-228 region and this may be the reason there is no good fit onto V2.1.  The sequence is distributed onto 2 different scaffolds. If the 2 scaffolds were combined the sequence would be continuous with no gaps or repeats. There was Est. information available on GBrowse assembly V0.5 and the transcriptome intensity scores appear to be unusually distributed with the majority of scores on the weak side and one large outlier (225). This is an un-annotated gene so no additional comments were available. 	PREDICTED: hypothetical protein [Strongylocentrotus purpuratus],PREDICTED: similar to golgi-specific brefeldin A-resistance guanine nucleotide exchange factor 1 [Strongylocentrotus purpuratus] \n
SPU_000994	SPU_000994	Given the bit scores, the E-value, and the low sequence coverage, it appears that there is no good fit for this model. There was also no Est information available from GBrowse assembly V0.5 and it was somewhat unclear whether the transcriptome intensity scores were strong or moderate due to several values being <10 and one large value of about 40. 	PREDICTED: similar to alpha macroglobulin [Strongylocentrotus purpuratus].\n
SPU_008950	SPU_008950	none	PREDICTED: similar to opsin [Strongylocentrotus purpuratus]\n
SPU_000031	SPU_000031	none	SP, lec, PAN? kringle, lec\n
SPU_002718	SPU_002718	From the BLAST results as well as the excel data, it appears that the sequence is distributed onto two different scaffolds. When examining the excel data, it is apparent that there are several base pair duplicates and sequence overlaps within both scaffolds. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak (<5) excluding one outlier value at about 37. 	lec, CCP(Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)), TM\n
SPU_006649	SPU_006649	none	SP,  CCP, CCP, lec \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_015726	SPU_015726	After reviewing the data in the excel file and performing a BLAST search on model_SPU_015726, it appears that the best matches found were on two different scaffolds. Scaffold >v2.1_scaffold42451 has no visible repeats and covers about 500 base pairs of 753. Scaffold v2.1_scaffold73364 only covered 209 base pairs. model_SPU_015726 is an un-annotated model and it was unclear what the complete scaffold coverage was. There was no EST information available from the GBrowse V0.5 assembly and the transcriptome intensity scores were weak (less than 10) as well. 	lec\n
SPU_012837	SPU_012837	none	TM, lec\n
SPU_018940	SPU_018940	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. v2.1_scaffold65146 has a low bit score as well as a high e-value when comparing it to the other scaffold results. When examining the excel data as well as the BLAST results, it is apparent that there are numerous repeats as well as gaps within the scaffolds resulting in poor coverage. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appear to be weak overall, with most of the scores being less than 5 (excluding 2 outliers at about 35 and 60). 	Clec X4, TM\n
SPU_016771	SPU_016771	none	#\nlec \n \n
SPU_028846	SPU_028846	none	lec, CCP (Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)), TM, cyt\n
SPU_014450	SPU_014450	none	CCP, lec, CCP \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_014951	SPU_014951	After reviewing the data and performing a BLAST search it appears that the sequence is distributed onto 2 different scaffolds. However, the 2nd scaffold had a high e-value and a low bit score when compared to the other BLAST results (>v2.1_scaffold76004 was about the 31st alignment down the list). There was no Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak and widely dispersed with all of the values less than 5 except for an outlier at about 20. This is an un-annotated gene so no additional gene information was available from Baylor annotations (comments). 	an SH2 domain pulled up from the Anger group mRNA expression database associated with SFK1...unclear if it is really expressed.  First ~400aa of sequence has homology to "solute carrier family 15", followed by a full SH2 domain and then a partial TyK domain.  Probably not a SFK. \n
SPU_014285	SPU_014285	none	This Glean was picked up from one dimensional gel electrophoresis and mass spectrometry.  It appears to be missing ~300 nt at the 5' end and ~200 nt at the 3' end compared to the best human hit.\n
SPU_028871	SPU_028871	none	This gene is most likely an allele of SPU_007497\n
SPU_010275	SPU_010275	After reviewing the data and performing a BLAST search it appears that the sequence is distributed onto 2 different scaffolds. However, the sequence from the BLAST results does not coincide with the sequence from the excel data. The BLAST results indicate that the sequence ends at about 4500, but the excel data is truncated at 1147. 	This is the N-terminal region of PLCg.  The full annotation can be found on scaffold 53431 and SPU_027462.\nFragment, extra stretch in middle.
GLENA3_03774	SPU_030161	none	small C-type lectin.
NGFFFamide-precursor	SPU_030074	(by Maurice Elphick) This gene encodes a putative neuropeptide precursor. It was identified by BLAST analysis of the sea urchin genomic sequence data using the sea cucumber neuropeptide NGIWYamide as the query. The precursor appears to encode two copies of a NGIWYamide-like peptide, which has the sequence NGFFFamide.\nThe gene was originally predicted by gnomon to comprise two exons, with the first exon encoding a putative N-terminal signal peptide (supported by SignalP3.0 analysis of the protein sequence) and the second exon encoding two copies of NGFFFamide in tandem, separated and bounded by dibasic cleavage sites.\nSubsequent analysis of EST data from a radial nerve library (EC439145; GI: 109403168 and EC438106; GI: 109402129) has revealed that this gene comprises 4 exons and encodes a 266 residue protein comprising:\n1. 26 amino acid residue signal peptide\n2.  a 114 amino acid residue sequence\n3. an 18 residue sequence (KRNGFFFGKRNGFFFGKR), comprising two copies of the putative NGFFFamide neuropeptide separated and flanked by potential dibasic cleavage sites (KR). \n4. a 108 amino acid residue sequence that shares a high level of sequence identity with neurophysins, proteins that form the C-terminal region of precursors for the neuropeptide hormones vasopressin, oxytocin and vasotocin.	This gene encodes a putative neuropeptide precursor. It was identified by BLAST analysis of the sea urchin genomic sequence data using the sea cucumber neuropeptide NGIWYamide as the query. The precursor appears to encode two copies of a NGIWYamide-like peptide, which has the sequence NGFFFamide. \nThe gene comprises two exons, as predicted by gnomon; this is supported by the tiling data, which shows signals corresponding exactly with the two predicted protein coding exons. \nThe first exon encodes a putative N-terminal signal peptide, supported by SignalP3.0 analysis of the protein sequence. \nThe second exon encodes two copies of NGFFFamide in tandem, separated and bounded by dibasic cleavage sites. \n
SPU_004779	SPU_004779	Strongylocentrotus purpuratus-specific protein	none
SPU_004915	SPU_004915	Strongylocentrotus purpuratus-specific protein	none
SPU_004933	SPU_004933	Strongylocentrotus purpuratus-specific protein	none
SPU_013938	SPU_013938	many similar protease genes in Strongylocentrotus purpuratus	none
SPU_000625	SPU_000625	hypothetical protein. similar to SPU_000560.	none
SPU_001273	SPU_001273	also homologous to SC5AC, sodium-coupled monocarboxylate transporter 1, or sodium solute transporter Vito-a, electrogenic sodium monocarboxylate cotransporter	none
SPU_003973	SPU_003973	contains 2 Ion_trans_2 superfamily motifs and a FRQ1 domain	none
SPU_004607	SPU_004607	strongly homologous to bacterial as well as mammalian agmatinases	none
SPU_004740	SPU_004740	homologous to numerous transport proteins in Drosophila, and in bacteria	none
SPU_005125	SPU_005125	homologous to numerous unnamed, hypothetical proteins in bacteria	none
SPU_005554	SPU_005554	contains S-methyl_trans domain	none
SPU_005630	SPU_005630	also homologous to numerous known or putative proteins	none
SPU_006382	SPU_006382	contains 2 SDF superfamily motifs	none
SPU_007000	SPU_007000	contains 2 SH2 superfamily motifs	none
SPU_012361	SPU_012361	contains BTB superfamily domain near N-terminus	none
SPU_012980	SPU_012980	contains 2 ANK superfamily motifs	none
SPU_013999	SPU_013999	contains 4 ANK superfamily motifs	none
SPU_015716	SPU_015716	contains MFS_1 domain	none
SPU_018782	SPU_018782	contains 2 PBPb superfamily motifs	none
SPU_019256	SPU_019256	contains 2 Bromodomain superfamily motifs	none
SPU_019590	SPU_019590	contains ABC_transp_aux domain	none
SPU_020875	SPU_020875	contains 4 C2 superfamily motifs	none
SPU_020900	SPU_020900	contains RhaT domain	none
SPU_022457	SPU_022457	contains Prominin domain	none
SPU_022860	SPU_022860	contains 5 ANK superfamily motifs	none
SPU_028684	SPU_028684	contains LPD_N domain	none
SPU_028629	SPU_028629	contains 3 AdoHcyase superfamily motifs and DltE domain	none
SPU_028353	SPU_028353	contains Smc domain and Qor domain	none
SPU_027791	SPU_027791	also homologous to bacterial glutamate formiminotransferase	none
SPU_025787	SPU_025787	contains 3 ARM superfamily motifs	none
SPU_025713	SPU_025713	also homologous to miscellaneous proteins	none
SPU_025654	SPU_025654	contains ARM superfamily motif near N-terminus	none
SPU_025623	SPU_025623	contains 2-Hacid domain	none
SPU_025571	SPU_025571	contains 6 Kelch_1 superfamily motifs	none
SPU_025453	SPU_025453	also homologous to mammalian G protein-coupled receptors 85 and 173	none
SPU_025435	SPU_025435	contains RPN2 domain	none
SPU_025393	SPU_025393	contains SSL2 domain	none
SPU_025306	SPU_025306	contains 2 CXC superfamily motifs	none
SPU_025234	SPU_025234	contains 2 RRM superfamily motifs and HEC1 domain	none
SPU_025141	SPU_025141	contains 2 PA superfamily motifs	none
SPU_025110	SPU_025110	contains DUF1740 domain	none
SPU_025107	SPU_025107	contains Smc domain	none
SPU_025090	SPU_025090	contains Rph domain	none
SPU_025089	SPU_025089	contains 2 Kelch_1 superfamily motifs and Ehrlichia_rpt domain and COG3055 domain	none
SPU_024997	SPU_024997	contains Herpes_LMP2 domain	none
SPU_024893	SPU_024893	contains Adaptin_N domain	none
SPU_024875	SPU_024875	contains 2 IBR superfamily motifs	none
SPU_024814	SPU_024814	contains Sec63 superfamily motif at N-terminus	none
SPU_024802	SPU_024802	contains 2 MTH_psq superfamily motifs	none
SPU_024741	SPU_024741	contains 4 CCP superfamily motifs	none
SPU_024667	SPU_024667	contains DnaJ superfamily motif at N-terminus	none
SPU_024620	SPU_024620	contains 6 HS1_rep superfamily motifs, and SH3 superfamily motif at C-terminus	none
SPU_024524	SPU_024524	contains Smc domain	none
SPU_024490	SPU_024490	contains 4 MAM superfamily motifs and 2 LDLa superfamily motifs	none
SPU_024401	SPU_024401	contains 2 ANK superfamily motifs near C-terminus	none
SPU_024399	SPU_024399	contains hATC superfamily motif at C-terminus	none
SPU_024376	SPU_024376	also homologous to mammalian nucleolar protein 4 (NOL4) (Mus musculus)	none
SPU_024172	SPU_024172	contains 5 ANK superfamily motifs and Arp domain	none
SPU_024142	SPU_024142	contains 3 ANK superfamily motifs	none
SPU_024130	SPU_024130	contains 2 UPF0016 superfamily motifs	none
SPU_024059	SPU_024059	contains Smc domain	none
SPU_024049	SPU_024049	contains 2 Ion_trans_2 superfamily motifs	none
SPU_023992	SPU_023992	contains 5 RCC1 superfamily motifs and ATS1 domain	none
SPU_023958	SPU_023958	contains COG5594 domain	none
SPU_023947	SPU_023947	contains 2 TPR superfamily motifs	none
SPU_023881	SPU_023881	contains SRP40_C superfamily motif at C-terminus	none
SPU_023766	SPU_023766	contains HCR domain	none
SPU_023765	SPU_023765	contains COG5273 domain	none
SPU_023740	SPU_023740	contains THAP superfamily motif at N-terminus. homologous to Drosophila transposases.	none
SPU_023733	SPU_023733	contains SbcC domain	none
SPU_023659	SPU_023659	contains COG5252 domain	none
SPU_023640	SPU_023640	contains 2 JmjC superfamily motifs	none
SPU_023632	SPU_023632	contains MFS_1 domain	none
SPU_023598	SPU_023598	novel, conserved protein	none
SPU_023584	SPU_023584	contains Vitellogenin_N domain	none
SPU_023571	SPU_023571	contains 3 Thioredoxin-like superfamily motifs	none
SPU_023466	SPU_023466	contains FRQ1 domain	none
SPU_023460	SPU_023460	contains AMP-binding domain	none
SPU_000581	SPU_000581	contains AST1 domain and HECTc domain	none
SPU_001263	SPU_001263	contains COG5543 domain	none
SPU_001695	SPU_001695	contains SMC_N domain	none
SPU_002007	SPU_002007	contains Crp domain and CAP_ED superfamily motif near the N-terminus	none
SPU_003417	SPU_003417	contains COG3391 domain	none
SPU_005526	SPU_005526	contains Pecanex_C superfamily motif at C-terminus	none
SPU_005792	SPU_005792	contains 2 EGF_CA superfamily motifs	none
SPU_006665	SPU_006665	contains LacZ domain	none
SPU_006902	SPU_006902	contains 5 ANK superfamily motifs and Arp domain motifs	none
SPU_007115	SPU_007115	contains 2 ANK superfamily motifs	none
SPU_007747	SPU_007747	contains 4 Periplasmic_Binding_Protein_T1 superfamily motifs and 3 ANF_receptor domain motifs	none
SPU_008107	SPU_008107	contains UCH domain	none
SPU_008442	SPU_008442	contains COG5635 domain. member of very large S. purpuratus-specific protein family	none
SPU_010729	SPU_010729	contains 2 P_loop_NTPase superfamily motifs and 2 DYN1 domain motifs	none
SPU_013250	SPU_013250	contains 2 zf-RING superfamily motifs	none
SPU_013313	SPU_013313	contains PRK8566 domain	none
SPU_013984	SPU_013984	contains Metallo-dependent_Hydrolase superfamily motif at C-terminus. S. purpuratus-specific protein.	none
SPU_014416	SPU_014416	contains 3 BRCT superfamily motifs	none
SPU_014617	SPU_014617	contains 5 LDL_r superfamily motifs and COG3391 domain	none
SPU_015386	SPU_015386	contains Tex domain	none
SPU_016624	SPU_016624	contains 6 ANK superfamily motifs	none
SPU_016630	SPU_016630	contains 5 ANK superfamily motifs	none
SPU_016642	SPU_016642	contains DNA_pol_B domain	none
SPU_016664	SPU_016664	contains SMC_N domain	none
SPU_016735	SPU_016735	contains Tra5 domain	none
SPU_016910	SPU_016910	contains 3 ANK superfamily motifs and Arp domains	none
SPU_016927	SPU_016927	contains 14 ANK superfamily motifs and Arp domains	none
SPU_016928	SPU_016928	contains 11 ANK superfamily motifs and Arp domains	none
SPU_016982	SPU_016982	contains AIR1 domain	none
SPU_016986	SPU_016986	contains 10 ANK superfamily motifs and Arp domains	none
SPU_017043	SPU_017043	contains HOOK domain	none
SPU_017113	SPU_017113	contains 4 ANK superfamily motifs and Arp domains	none
SPU_017114	SPU_017114	contains 6 ANK superfamily motifs and Arp domains and Ion_trans domain	none
SPU_017126	SPU_017126	contains 2 zf- superfamily motifs	none
SPU_017173	SPU_017173	contains Dynein_heavy domain	none
SPU_017265	SPU_017265	contains 7 ANK superfamily motifs and 2 ZU5 superfamily motifs and Arp domain motifs	none
SPU_017283	SPU_017283	contains 9 TRP superfamily motifs	none
SPU_017303	SPU_017303	contains 8 SPEC superfamily motifs	none
SPU_017312	SPU_017312	contains 9 ANK superfamily motifs and Arp domain motifs	none
SPU_017369	SPU_017369	contains Smc domain	none
SPU_017390	SPU_017390	contains AIR1 domain	none
SPU_017413	SPU_017413	contains 2 tandem repeats of DM15 superfamily motifs and LHP1 domain	none
SPU_017532	SPU_017532	contains 2 SPEC superfamily motifs and Smc domain and UCH domain and UBP5 domain and ULP1 domain	none
SPU_017580	SPU_017580	contains 9 ANK superfamily motifs and Arp domain motifs	none
SPU_017588	SPU_017588	Sp-specific protein	none
SPU_017626	SPU_017626	contains RING superfamily motif at C-terminus	none
SPU_017648	SPU_017648	contains 2 DNA_Pol_B_2 domain motifs	none
SPU_017678	SPU_017678	contains 4 EGF_CA superfamily motifs and 2 vWA superfamily motifs	none
SPU_017713	SPU_017713	contains 12 ANK superfamily motifs and Arp domain motifs	none
SPU_017822	SPU_017822	contains 4 ANK superfamily motifs and Arp domain motifs	none
SPU_017823	SPU_017823	contains 10 ANK superfamily motifs and Arp domain motifs	none
SPU_017825	SPU_017825	contains 5 ANK superfamily motifs and Arp domain motifs	none
SPU_017841	SPU_017841	contains 2 Sfi1 superfamily motifs	none
SPU_017875	SPU_017875	contains 2 WD40 superfamily motifs	none
SPU_017879	SPU_017879	contains 11 ANK superfamily motifs and Arp domain motifs	none
SPU_017928	SPU_017928	contains 22 ANK superfamily motifs and Arp domain motifs	none
SPU_017944	SPU_017944	contains 2 Vps53_N superfamily motifs	none
SPU_017946	SPU_017946	contains MRS6 domain	none
SPU_017974	SPU_017974	contains 2 ANK superfamily motifs	none
SPU_018066	SPU_018066	contains AIR1 domain	none
SPU_018133	SPU_018133	contains SMC_N domain	none
SPU_018136	SPU_018136	contains 11 EGF_CA superfamily motifs	none
SPU_018138	SPU_018138	contains SMC_N domain. Sp-specific protein family.	none
SPU_018142	SPU_018142	contains Ion_trans domain and COG3883 domain	none
SPU_018221	SPU_018221	contains 2 KH-I superfamily motifs	none
SPU_018265	SPU_018265	contains SMC_N domain	none
SPU_018390	SPU_018390	contains VPS9 superfamily motif at C-terminus. contains MDN1 domain.	none
SPU_018428	SPU_018428	Sp-specific protein	none
SPU_018491	SPU_018491	contains DYN1 domain and Dynein_heavy domain	none
SPU_018537	SPU_018537	contains Ndh domain	none
SPU_018587	SPU_018587	contains 2 UBQ domain motifs	none
SPU_018602	SPU_018602	contains AIR1 domain	none
SPU_018650	SPU_018650	contains 10 ANK superfamily motifs and Arp domain motifs	none
SPU_018651	SPU_018651	contains 8 ANK superfamily motifs and Arp domain motifs and 2 ZU5 superfamily motifs	none
SPU_018666	SPU_018666	contains TRF4 domain and PAT1 domain	none
SPU_018668	SPU_018668	contains COG1112 domain and Keratin_B2 domain	none
SPU_018671	SPU_018671	contains 7 EGF_CA superfamily motifs and 2 Tryp_SP superfamily motifs and 4 CUB superfamily motifs and 4 LDLa superfamily motifs	none
SPU_018709	SPU_018709	contains 11 FA58C superfamily motifs and 9 EGF_CA superfamily motifs	none
SPU_018713	SPU_018713	contains 10 EGF_CA superfamily motifs and 3 KR superfamily motifs	none
SPU_018721	SPU_018721	contains 14 ANK superfamily motifs and Arp domain motifs	none
SPU_018745	SPU_018745	contains MDN1 domain	none
SPU_018754	SPU_018754	contains 9 ANK superfamily motifs and Arp domain motifs	none
SPU_018796	SPU_018796	contains Ric8 domain	none
SPU_018809	SPU_018809	contains 2 WD40 superfamily motifs and COG2319 domain	none
SPU_018821	SPU_018821	contains 2 EGF_CA superfamily motifs	none
SPU_018837	SPU_018837	contains 9 ANK superfamily motifs and Arp domain motifs	none
SPU_018842	SPU_018842	contains 2 Smc domain motifs and SMC_N domain. Sp-specific family.	none
SPU_018950	SPU_018950	contains 7 ANK superfamily motifs and Arp domain motifs	none
SPU_018958	SPU_018958	contains CH superfamily motifs at C-terminus	none
SPU_019083	SPU_019083	Sp-specific protein	none
SPU_019119	SPU_019119	contains COG4485 domain	none
SPU_019128	SPU_019128	contains 2 LamG superfamily motifs at N-terminus	none
SPU_019130	SPU_019130	contains VSP domain	none
SPU_019149	SPU_019149	contains 12 ANK superfamily motifs and Arp domain motifs	none
SPU_019156	SPU_019156	Sp-specific unique protein	none
SPU_019191	SPU_019191	Sp-specific protein	none
SPU_019222	SPU_019222	Sp-specific protein	none
SPU_019226	SPU_019226	contains PRK08581 domain. Sp-specific protein.	none
SPU_019310	SPU_019310	contains 2 Smc domain motifs and 2 SMC_N domain motifs. Sp-specific protein.	none
SPU_019336	SPU_019336	contains 3 NDPk superfamily motifs. Sp-specific protein.	none
SPU_019347	SPU_019347	contains COG5635 domain. large protein family specific to S. purpuratus and Branchiostoma floridae.	none
SPU_019371	SPU_019371	contains Hamartin domain and Smc domain	none
SPU_019380	SPU_019380	contains Smc domain	none
SPU_019461	SPU_019461	contains HECTc superfamily motif and HECTc domain at C-terminus	none
SPU_019556	SPU_019556	contains 3 P_loop_NTPase superfamily motifs	none
SPU_019564	SPU_019564	contains RING superfamily motif near N-terminus	none
SPU_019724	SPU_019724	contains Smc domain and SMC_N domain	none
SPU_019729	SPU_019729	contains 2 TPR superfamily motifs	none
SPU_019745	SPU_019745	contains C2 superfamily motif near C-terminus	none
SPU_019765	SPU_019765	contains COG4913 domain. Sp-specific protein family.	none
SPU_019820	SPU_019820	contains 10 Gal-3-0_sulfotr superfamily motifs	none
SPU_019854	SPU_019854	contains 4 HYR superfamily motifs and 7 EGF_CA superfamily motifs	none
SPU_019865	SPU_019865	contains 8 FA58C superfamily motifs and 18 EGF_CA superfamily motifs	none
SPU_019884	SPU_019884	contains DYN1 domain and Dynein_heavy domain	none
SPU_019936	SPU_019936	contains 8 ANK superfamily motifs and Arp domain motifs	none
SPU_019953	SPU_019953	contains 2 SMC_N domain motifs	none
SPU_019991	SPU_019991	contains 16 ANK superfamily motifs and Arp domain motifs	none
SPU_020044	SPU_020044	contains 4 ANK superfamily motifs and Arp domain motifs	none
SPU_020052	SPU_020052	contains 3 UBQ superfamily motifs and SMC_N domain and PRK12704 domain	none
SPU_020076	SPU_020076	contains 6 EGF_CA superfamily motifs	none
SPU_020083	SPU_020083	contains 3 PH-like superfamily motifs and 2 zf-RING superfamily motifs	none
SPU_020097	SPU_020097	contains MDN1 domain	none
SPU_020132	SPU_020132	contains COG1112 domain and Keratin_B2 domain	none
SPU_020167	SPU_020167	contains MDN1 domain	none
SPU_020183	SPU_020183	contains 2 ANK superfamily motifs and Arp domain motifs	none
SPU_020238	SPU_020238	contains Smc domain and SMC_N domain	none
SPU_020288	SPU_020288	contains KAP95 domain	none
SPU_020307	SPU_020307	contains 4 ANK superfamily motifs and Arp domain motifs	none
SPU_020315	SPU_020315	contains 3 CUB superfamily motifs and 5 LDLa superfamily motifs	none
SPU_020320	SPU_020320	contains 7 ANK superfamily motifs and Arp domain motifs	none
SPU_020394	SPU_020394	contains 4 ANK superfamily motifs and Arp domain motifs	none
SPU_020410	SPU_020410	contains PAT1 domain	none
SPU_020436	SPU_020436	contains COG5635 domain	none
SPU_020442	SPU_020442	contains Bromodomain superfamily motif at C-terminus	none
SPU_020455	SPU_020455	contains 22 ANK superfamily motifs and Arp domain motifs	none
SPU_020459	SPU_020459	contains GOG10033 domain	none
SPU_020526	SPU_020526	contains SMC_N domain and Smc domain	none
SPU_020561	SPU_020561	contains COG5635 domain	none
SPU_020605	SPU_020605	contains 2 Herpes_teg_N superfamily motifs and 4 PRK13596 domain motifs	none
SPU_020616	SPU_020616	contains 2 SAM superfamily motifs	none
SPU_020625	SPU_020625	contains 2 P_loop_NTPase superfamily motifs	none
SPU_020770	SPU_020770	contains XPG superfamily motif at N-terminus	none
SPU_020809	SPU_020809	contains 10 ANK superfamily motifs and Arp domain motifs	none
SPU_020820	SPU_020820	contains 14 ANK superfamily motifs and Arp domain motifs	none
SPU_020824	SPU_020824	contains 6 CUB superfamily motifs	none
SPU_020828	SPU_020828	contains COG1112 domain and Keratin_B2 domain	none
SPU_020909	SPU_020909	contains 3 IPT superfamily motifs	none
SPU_020944	SPU_020944	contains 7 CCP superfamily motifs and 2 HYR superfamily motifs	none
SPU_020971	SPU_020971	contains 24 HYR superfamily motifs	none
SPU_020976	SPU_020976	contains 10 ANK superfamily motifs and Arp domain motifs	none
SPU_020980	SPU_020980	contains 2 IBR superfamily motifs	none
SPU_021076	SPU_021076	contains 9 ANK superfamily motifs and Arp domain motifs	none
SPU_021079	SPU_021079	contains 8 ANK superfamily motifs and Arp domain motifs	none
SPU_021089	SPU_021089	contains 4 HYR superfamily motifs and RecD domain	none
SPU_021152	SPU_021152	contains 17 ANK superfamily motifs and Arp domain motifs	none
SPU_021222	SPU_021222	contains Smc domain	none
SPU_021266	SPU_021266	contains MFS_1 domain	none
SPU_021284	SPU_021284	contains 11 ANK superfamily motifs and Arp domain motifs	none
SPU_021318	SPU_021318	contains 11 ANK superfamily motifs and Arp domain motifs	none
SPU_021411	SPU_021411	contains HEC1 domain	none
SPU_021419	SPU_021419	contains 2 PHD superfamily motifs and MDN1 domain	none
SPU_021448	SPU_021448	contains MRS6 domain	none
SPU_021547	SPU_021547	contains 4 MBT superfamily motifs	none
SPU_021600	SPU_021600	contains 4 EGF_CA superfamily motifs	none
SPU_021622	SPU_021622	contains 10 ANK superfamily motifs and Arp domain motifs	none
SPU_021665	SPU_021665	Sp-specific protein	none
SPU_021681	SPU_021681	contains 9 ANK superfamily motifs and Arp domain motifs	none
SPU_021728	SPU_021728	contains 3 Peptidase_C19 superfamily motifs	none
SPU_021785	SPU_021785	contains 2 Esterase_lipase superfamily motifs	none
SPU_021808	SPU_021808	contains 2 ANK superfamily motifs	none
SPU_021822	SPU_021822	contains COG1112 domain	none
SPU_021954	SPU_021954	contains 6 ANK superfamily motifs and Arp domain motifs	none
SPU_021959	SPU_021959	contains 7 PDZ superfamily motifs	none
SPU_022013	SPU_022013	contains 2 Smc domain motifs	none
SPU_022017	SPU_022017	Sp-specific protein	none
SPU_022058	SPU_022058	contains Fib_alpha domain and SMC_N domain	none
SPU_022087	SPU_022087	contains 13 ANK superfamily motifs and Arp domain motifs	none
SPU_022315	SPU_022315	contains 8 EGF_CA superfamily motifs	none
SPU_022341	SPU_022341	contains 9 ANK superfamily motifs and Arp domain motifs	none
SPU_022368	SPU_022368	contains MDN1 domain	none
SPU_022388	SPU_022388	contains 17 ANK superfamily motifs and Arp domain motifs and 2 ZU5 superfamily motifs	none
SPU_022430	SPU_022430	contains DNA_pol_B2 domain	none
SPU_022443	SPU_022443	contains 9 MAM superfamily motifs and 3 LDLa superfamily motifs	none
SPU_022500	SPU_022500	contains 4 PDZ superfamily motifs	none
SPU_022514	SPU_022514	contains Pneumo_att_G domain	none
SPU_022544	SPU_022544	contains 6 SRCR superfamily motifs	none
SPU_022553	SPU_022553	contains 6 ANK superfamily motifs and Arp domain motifs and 2 ZU5 superfamily motifs	none
SPU_022559	SPU_022559	contains 2 BBOX superfamily motifs and GRASP55_65 domain	none
SPU_022582	SPU_022582	contains 6 EGF_CA superfamily motifs and 3 HYR superfamily motifs	none
SPU_022618	SPU_022618	contains 17 ANK superfamily motifs and Arp domain motifs	none
SPU_022642	SPU_022642	contains 3 ANK superfamily motifs	none
SPU_022690	SPU_022690	contains flhF domain	none
SPU_022808	SPU_022808	contains SbcC domain	none
SPU_022825	SPU_022825	contains 5 CUB superfamily motifs	none
SPU_022905	SPU_022905	Strongylocentrotus purpuratus-specific protein	none
SPU_023060	SPU_023060	contains 22 MAM superfamily motifs and 10 LDLa superfamily motifs	none
SPU_023072	SPU_023072	contains 10 ANK superfamily motifs and Arp domain motifs	none
SPU_023087	SPU_023087	contains 11 ANK superfamily motifs and Arp domain motifs	none
SPU_023093	SPU_023093	contains MDN1 domain	none
SPU_023142	SPU_023142	contains 4 SPEC superfamily motifs and 8 IG superfamily motifs	none
SPU_023267	SPU_023267	contains 2 PHD superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_023282	SPU_023282	contains 3 CUB superfamily motifs	none
SPU_023404	SPU_023404	contains 10 ANK superfamily motifs and Arp domain motifs	none
SPU_023454	SPU_023454	contains 2 DHC_N1 domain motifs	none
SPU_023477	SPU_023477	contains 7 Ldl_re superfamily motifs	none
SPU_023545	SPU_023545	contains 5 MAM superfamily motifs	none
SPU_023570	SPU_023570	contains HRD1 domain	none
SPU_023745	SPU_023745	contains 2 EGF_CA superfamily motifs	none
SPU_023746	SPU_023746	contains COG1112 domain and SSL2 domain	none
SPU_023749	SPU_023749	contains 12 ANK superfamily motifs and Arp domain motifs	none
SPU_023796	SPU_023796	contains 9 MAM superfamily motifs	none
SPU_023921	SPU_023921	contains 10 ANK superfamily motifs and Arp domain motifs	none
SPU_023929	SPU_023929	Strongylocentrotus purpuratus-specific protein	none
SPU_023956	SPU_023956	Strongylocentrotus purpuratus-specific protein	none
SPU_023996	SPU_023996	contains 22 ANK superfamily motifs and Arp domain motifs	none
SPU_024125	SPU_024125	contains 2 rve superfamily motifs	none
SPU_024175	SPU_024175	Strongylocentrotus purpuratus-specific protein	none
SPU_024176	SPU_024176	contains Actin superfamily motif at C-terminus	none
SPU_024223	SPU_024223	contains 3 RRM superfamily motifs	none
SPU_024242	SPU_024242	contains 34 ANK superfamily motifs and Arp domain motifs	none
SPU_024281	SPU_024281	contains PRK02362 domain	none
SPU_024466	SPU_024466	Strongylocentrotus purpuratus-specific protein	none
SPU_024521	SPU_024521	contains SMC_N domain	none
SPU_024558	SPU_024558	contains recD domain	none
SPU_024609	SPU_024609	contains 9 HYR superfamily motifs and recD domain	none
SPU_024647	SPU_024647	contains Smc domain	none
SPU_024663	SPU_024663	contains 16 ANK superfamily motifs and Arp domain motifs	none
SPU_024771	SPU_024771	contains COG5222 domain	none
SPU_024834	SPU_024834	contains 7 MAM superfamily motifs and 3 LDLa superfamily motifs	none
SPU_024856	SPU_024856	contains 5 SPEC superfamily motifs	none
SPU_024950	SPU_024950	Strongylocentrotus purpuratus-specific protein	none
SPU_024958	SPU_024958	Strongly homologous to mammalian hypothetical proteins.	none
SPU_025027	SPU_025027	contains COG4783 domain	none
SPU_025030	SPU_025030	contains 3 PDZ superfamily motifs	none
SPU_025064	SPU_025064	contains 12 ANK superfamily motifs and Arp domain motifs	none
SPU_025117	SPU_025117	Strongylocentrotus purpuratus-specific protein	none
SPU_025197	SPU_025197	Strongylocentrotus purpuratus-specific protein	none
SPU_025220	SPU_025220	contains SbcC domain and SMC_N domain	none
SPU_025255	SPU_025255	Strongylocentrotus purpuratus-specific protein	none
SPU_025274	SPU_025274	contains 12 ANK superfamily motifs and Arp domain motifs	none
SPU_025388	SPU_025388	contains 14 ANK superfamily motifs and Arp domain motifs	none
SPU_025456	SPU_025456	contains 2 7tm_2 superfamily motifs	none
SPU_025573	SPU_025573	contains PAT1 domain	none
SPU_025603	SPU_025603	contains 19 ANK superfamily motifs and Arp domain motifs	none
SPU_025624	SPU_025624	contains 2 Bromo_TFIID superfamily motifs	none
SPU_025630	SPU_025630	contains COG3899 domain	none
SPU_025667	SPU_025667	contains 22 ANK superfamily motifs and Arp domain motifs	none
SPU_025769	SPU_025769	contains 24 ANK superfamily motifs and Arp domain motifs	none
SPU_025876	SPU_025876	contains PRK10416 domain	none
SPU_025987	SPU_025987	contains COG1413 domain	none
SPU_026006	SPU_026006	contains 13 HYR superfamily motifs	none
SPU_026018	SPU_026018	contains infB domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_026069	SPU_026069	contains 10 ANK superfamily motifs and Arp domain motifs	none
SPU_026111	SPU_026111	contains MDN1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_026121	SPU_026121	contains 14 ANK superfamily motifs and Arp domain motifs	none
SPU_026131	SPU_026131	contains 3 UDPGT domain motifs	none
SPU_026151	SPU_026151	contains COG1112 domain	none
SPU_026152	SPU_026152	contains COG1112 domain and Furin-1 domain	none
SPU_026191	SPU_026191	contains 4 LDLa superfamily motifs and 2 CUB superfamily motifs and 2 Tryp_SPc superfamily motifs	none
SPU_026293	SPU_026293	contains 18 ANK superfamily motifs and Arp domain motifs	none
SPU_026342	SPU_026342	contains DYN1 domain	none
SPU_026412	SPU_026412	contains PRK08691 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_026417	SPU_026417	contains 5 EGF_CA superfamily motifs and 3 CUB superfamily motifs	none
SPU_026454	SPU_026454	contains 6 MAM superfamily motifs	none
SPU_026517	SPU_026517	Strongylocentrotus purpuratus-specific protein	none
SPU_026526	SPU_026526	Strongylocentrotus purpuratus-specific protein	none
SPU_026543	SPU_026543	contains 2 CCP superfamily motifs and 5 HYR superfamily motifs and 11 EGF_CA superfamily motifs	none
SPU_026567	SPU_026567	Strongylocentrotus purpuratus-specific protein	none
SPU_026569	SPU_026569	contains 2 CUB superfamily motifs and 8 EGF_CA superfamily motifs	none
SPU_026607	SPU_026607	contains 18 MAM superfamily motifs	none
SPU_026623	SPU_026623	contains 2 KAZAL_FS superfamily motifs at N-terminus and 1 KAZAL_FS superfamily motif at C-terminus. 5383 amino acids. Strongylocentrotus purpuratus-specific protein.	none
SPU_026733	SPU_026733	contains CDC6 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_026773	SPU_026773	contains 2 PBP1_GPCR_family_C-like superfamily motifs and 5 Periplasmic_Binding_Protein superfamily motifs and ANF_receptor domain	none
SPU_026785	SPU_026785	Strongylocentrotus purpuratus-specific protein	none
SPU_026836	SPU_026836	Strongylocentrotus purpuratus-specific protein	none
SPU_026939	SPU_026939	contains RecD domain	none
SPU_026976	SPU_026976	contains 12 ANK superfamily motifs and Arp domain motifs	none
SPU_027025	SPU_027025	contains COG1112 domain	none
SPU_027095	SPU_027095	contains 6 SRCR superfamily motifs and COG4886 domain	none
SPU_027148	SPU_027148	Strongylocentrotus purpuratus-specific protein	none
SPU_027302	SPU_027302	Strongylocentrotus purpuratus-specific protein	none
SPU_027310	SPU_027310	contains 3 MRS6 domain motifs	none
SPU_027329	SPU_027329	contains 2 TST_Repeat superfamily motifs and 2 RING superfamily motifs and SseA domain and UBP5 domain	none
SPU_027333	SPU_027333	contains 13 ANK superfamily motifs and Arp domain motifs	none
SPU_027338	SPU_027338	Strongylocentrotus purpuratus-specific protein	none
SPU_027403	SPU_027403	Strongylocentrotus purpuratus-specific protein	none
SPU_027465	SPU_027465	contains 8 ANK superfamily motifs and Arp domain motifs	none
SPU_027509	SPU_027509	Strongylocentrotus purpuratus-specific protein	none
SPU_027615	SPU_027615	Strongylocentrotus purpuratus-specific protein	none
SPU_027653	SPU_027653	Strongylocentrotus purpuratus-specific protein	none
SPU_027668	SPU_027668	contains 21 ANK superfamily motifs and Arp domain motifs	none
SPU_027696	SPU_027696	contains COG4942 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_027762	SPU_027762	Strongylocentrotus purpuratus-specific protein	none
SPU_027772	SPU_027772	contains 13 ANK superfamily motifs and Arp domain motifs	none
SPU_027790	SPU_027790	contains MDN1 domain	none
SPU_027829	SPU_027829	Strongylocentrotus purpuratus-specific protein	none
SPU_027861	SPU_027861	contains 6 MAM superfamily motifs	none
SPU_027862	SPU_027862	contains 6 MAM superfamily motifs	none
SPU_027898	SPU_027898	Strongylocentrotus purpuratus-specific protein	none
SPU_027939	SPU_027939	contains 2 Peptidase_C54 superfamily motifs	none
SPU_027959	SPU_027959	Strongylocentrotus purpuratus-specific protein	none
SPU_028009	SPU_028009	contains PH-like superfamily motif at C-terminus	none
SPU_028014	SPU_028014	contains CYK3 domain	none
SPU_028029	SPU_028029	contains MDN1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_028064	SPU_028064	Strongylocentrotus purpuratus-specific protein	none
SPU_028076	SPU_028076	contains RecD domain	none
SPU_028196	SPU_028196	contains SbcC domain and SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_028251	SPU_028251	contains 3 C2 superfamily motifs	none
SPU_028256	SPU_028256	Strongylocentrotus purpuratus-specific protein	none
SPU_028291	SPU_028291	contains 11 ANK superfamily motifs and Arp domain motifs	none
SPU_028292	SPU_028292	contains 10 ANK superfamily motifs and Arp domain motifs	none
SPU_028293	SPU_028293	contains 8 ANK superfamily motifs and Arp domain motifs	none
SPU_028303	SPU_028303	contains RecD domain	none
SPU_028333	SPU_028333	contains SNF2_N domain	none
SPU_028340	SPU_028340	Strongylocentrotus purpuratus-specific protein	none
SPU_028361	SPU_028361	contains 16 ANK superfamily motifs and Arp domain motifs	none
SPU_028393	SPU_028393	contains 14 EGF_CA superfamily motifs	none
SPU_028415	SPU_028415	contains 15 ANK superfamily motifs and Arp domain motifs	none
SPU_028439	SPU_028439	Strongylocentrotus purpuratus-specific protein	none
SPU_028481	SPU_028481	Strongylocentrotus purpuratus-specific protein	none
SPU_028512	SPU_028512	contains 4 ANK superfamily motifs and Arp domain motifs	none
SPU_028533	SPU_028533	contains 6 ANK superfamily motifs and Arp domain motifs	none
SPU_028535	SPU_028535	contains 7 LDL_recept_b superfamily motifs and 2 HYR superfamily motifs and 3 CCP superfamily motifs	none
SPU_028571	SPU_028571	Strongylocentrotus purpuratus-specific protein	none
SPU_028600	SPU_028600	Strongylocentrotus purpuratus-specific protein	none
SPU_028713	SPU_028713	contains Mitofilin domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_028745	SPU_028745	contains 20 ANK superfamily motifs and Arp domain motifs	none
SPU_028779	SPU_028779	contains COG1413 domain	none
SPU_028786	SPU_028786	contains DYN1 domain and 3 Dynein_heavy domains	none
SPU_028788	SPU_028788	contains COG4886 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_028890	SPU_028890	contains 9 ANK superfamily motifs and Arp domain motifs	none
SPU_028916	SPU_028916	Strongylocentrotus purpuratus-specific protein	none
SPU_000003	SPU_000003	Strongylocentrotus purpuratus-specific protein	none
SPU_000004	SPU_000004	Strongylocentrotus purpuratus-specific protein	none
SPU_000079	SPU_000079	Strongylocentrotus purpuratus-specific protein	none
SPU_000081	SPU_000081	Strongylocentrotus purpuratus-specific protein	none
SPU_000087	SPU_000087	Strongylocentrotus purpuratus-specific protein	none
SPU_000100	SPU_000100	Strongylocentrotus purpuratus-specific protein	none
SPU_000133	SPU_000133	Strongylocentrotus purpuratus-specific protein	none
SPU_000185	SPU_000185	Strongylocentrotus purpuratus-specific protein	none
SPU_000320	SPU_000320	Strongylocentrotus purpuratus-specific protein	none
SPU_000325	SPU_000325	contains COG4886 domain	none
SPU_000335	SPU_000335	Strongylocentrotus purpuratus-specific protein	none
SPU_000339	SPU_000339	Strongylocentrotus purpuratus-specific protein	none
SPU_000358	SPU_000358	Strongylocentrotus purpuratus-specific protein	none
SPU_000361	SPU_000361	Strongylocentrotus purpuratus-specific protein	none
SPU_000367	SPU_000367	Strongylocentrotus purpuratus-specific protein	none
SPU_000368	SPU_000368	Strongylocentrotus purpuratus-specific protein	none
SPU_000369	SPU_000369	Strongylocentrotus purpuratus-specific protein	none
SPU_000370	SPU_000370	Strongylocentrotus purpuratus-specific protein	none
SPU_000381	SPU_000381	Strongylocentrotus purpuratus-specific protein	none
SPU_000382	SPU_000382	contains Glyco_hydro_47 superfamily motif at C-terminus	none
SPU_000401	SPU_000401	contains COG5222 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_000410	SPU_000410	contains 6 EGF_CA superfamily motifs	none
SPU_000413	SPU_000413	Strongylocentrotus purpuratus-specific protein	none
SPU_000419	SPU_000419	contains 2 EGF_CA superfamily motifs	none
SPU_000527	SPU_000527	contains 2 CUB superfamily motifs	none
SPU_000555	SPU_000555	contains NADB_Rossmann superfamily motif in the C-terminal half. probable assembly chimera.	none
SPU_000564	SPU_000564	contains 2 ANK superfamily motifs	none
SPU_000585	SPU_000585	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_000586	SPU_000586	contains LdhA domain	none
SPU_000620	SPU_000620	contains 2 LDLa superfamily motifs	none
SPU_000624	SPU_000624	contains PAT1 domain	none
SPU_000626	SPU_000626	Strongylocentrotus purpuratus-specific protein	none
SPU_000628	SPU_000628	contains Arp domain	none
SPU_000640	SPU_000640	contains Smc domain and Reo_sigma1 domain	none
SPU_000653	SPU_000653	contains 2 FA58C superfamily motifs	none
SPU_000658	SPU_000658	Strongylocentrotus purpuratus-specific protein	none
SPU_000660	SPU_000660	contains GatA domain	none
SPU_000662	SPU_000662	Strongylocentrotus purpuratus-specific protein	none
SPU_000671	SPU_000671	contains RING superfamily motif and COG5540 domain at C-terminus. Strongylocentrotus purpuratus-specific protein.	none
SPU_000682	SPU_000682	contains Sulfotransfer_1 domain	none
SPU_000683	SPU_000683	Strongylocentrotus purpuratus-specific protein	none
SPU_000684	SPU_000684	contains COG0429 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_000695	SPU_000695	contains 2 WW superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_000698	SPU_000698	Strongylocentrotus purpuratus-specific protein	none
SPU_000699	SPU_000699	Strongylocentrotus purpuratus-specific protein	none
SPU_000700	SPU_000700	Strongylocentrotus purpuratus-specific protein	none
SPU_000701	SPU_000701	Strongylocentrotus purpuratus-specific protein	none
SPU_000703	SPU_000703	Strongylocentrotus purpuratus-specific protein	none
SPU_000704	SPU_000704	contains COG5222 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_000705	SPU_000705	contains RAD18 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_000711	SPU_000711	Strongylocentrotus purpuratus-specific protein	none
SPU_000712	SPU_000712	Strongylocentrotus purpuratus-specific protein	none
SPU_000713	SPU_000713	Strongylocentrotus purpuratus-specific protein	none
SPU_000718	SPU_000718	contains Sulfotransfer_1 domain	none
SPU_000728	SPU_000728	Strongylocentrotus purpuratus-specific protein	none
SPU_000804	SPU_000804	contains CALCOCO1 domain	none
SPU_000808	SPU_000808	Strongylocentrotus purpuratus-specific protein	none
SPU_000880	SPU_000880	Strongylocentrotus purpuratus-specific protein	none
SPU_000895	SPU_000895	Strongylocentrotus purpuratus-specific protein	none
SPU_000902	SPU_000902	Strongylocentrotus purpuratus-specific protein	none
SPU_000903	SPU_000903	Strongylocentrotus purpuratus-specific protein	none
SPU_000916	SPU_000916	Strongylocentrotus purpuratus-specific protein	none
SPU_000917	SPU_000917	contains 3 LDL_recept_b superfamily motifs	none
SPU_000928	SPU_000928	contains 2 TM2 superfamily motifs	none
SPU_000929	SPU_000929	contains 3 TM2 superfamily motifs	none
SPU_000930	SPU_000930	contains 2 TM2 superfamily motifs	none
SPU_000942	SPU_000942	contains 2 LuxE superfamily motifs	none
SPU_000956	SPU_000956	Strongylocentrotus purpuratus-specific protein	none
SPU_000979	SPU_000979	contains Sulfotransfer_1 domain	none
SPU_000983	SPU_000983	contains 2 TSP_1 superfamily motifs	none
SPU_000989	SPU_000989	contains 2 MAM superfamily motifs	none
SPU_001007	SPU_001007	Strongylocentrotus purpuratus-specific protein	none
SPU_001008	SPU_001008	Strongylocentrotus purpuratus-specific protein	none
SPU_001032	SPU_001032	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_001046	SPU_001046	Strongylocentrotus purpuratus-specific protein	none
SPU_001078	SPU_001078	contains MRP-S27 domain	none
SPU_001080	SPU_001080	contains mutL domain	none
SPU_001082	SPU_001082	Strongylocentrotus purpuratus-specific protein	none
SPU_001102	SPU_001102	contains SMC_N domain	none
SPU_001108	SPU_001108	contains COG4886 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_001115	SPU_001115	Strongylocentrotus purpuratus-specific protein	none
SPU_001122	SPU_001122	Strongylocentrotus purpuratus-specific protein	none
SPU_001133	SPU_001133	contains 2 ANK superfamily motifs and Arp domain	none
SPU_001145	SPU_001145	contains metH domain	none
SPU_001173	SPU_001173	Strongylocentrotus purpuratus-specific protein	none
SPU_001196	SPU_001196	Strongylocentrotus purpuratus-specific protein	none
SPU_001197	SPU_001197	Strongylocentrotus purpuratus-specific protein	none
SPU_001212	SPU_001212	contains Na_H_Exchanger domain	none
SPU_001219	SPU_001219	Strongylocentrotus purpuratus-specific protein	none
SPU_001220	SPU_001220	Strongylocentrotus purpuratus-specific protein	none
SPU_001224	SPU_001224	Strongylocentrotus purpuratus-specific protein	none
SPU_001246	SPU_001246	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_001248	SPU_001248	Strongylocentrotus purpuratus-specific protein	none
SPU_001250	SPU_001250	contains 2 Ras-like_GTPase superfamily motifs	none
SPU_001265	SPU_001265	contains 2 PHD superfamily motifs and SFP1 domain	none
SPU_001295	SPU_001295	contains FRQ1 domain	none
SPU_001297	SPU_001297	Strongylocentrotus purpuratus-specific protein	none
SPU_001316	SPU_001316	contains Sulfotransfer_1 domain	none
SPU_001331	SPU_001331	Strongylocentrotus purpuratus-specific protein	none
SPU_001337	SPU_001337	contains 2 RF-1 superfamily motifs and prfA domain	none
SPU_001345	SPU_001345	contains 10 EGF_CA superfamily motifs	none
SPU_001381	SPU_001381	contains 2 MFS superfamily motifs	none
SPU_001386	SPU_001386	Strongylocentrotus purpuratus-specific protein	none
SPU_001391	SPU_001391	contains COG3391 domain	none
SPU_001412	SPU_001412	contains 2 DUF605 domain motifs	none
SPU_001413	SPU_001413	Strongylocentrotus purpuratus-specific protein	none
SPU_001418	SPU_001418	Strongylocentrotus purpuratus-specific protein	none
SPU_001426	SPU_001426	Strongylocentrotus purpuratus-specific protein	none
SPU_001438	SPU_001438	Strongylocentrotus purpuratus-specific protein	none
SPU_001448	SPU_001448	contains DNA_pol_B_2 domain	none
SPU_001456	SPU_001456	only N-terminal 80 amino acids have homologies to other proteins. probable assembly chimera.	none
SPU_001462	SPU_001462	contains B41 domain	none
SPU_001464	SPU_001464	Strongylocentrotus purpuratus-specific protein	none
SPU_001468	SPU_001468	Strongylocentrotus purpuratus-specific protein	none
SPU_001476	SPU_001476	Strongylocentrotus purpuratus-specific protein	none
SPU_001479	SPU_001479	contains 2 NPR2 superfamily motifs	none
SPU_001485	SPU_001485	Strongylocentrotus purpuratus-specific protein	none
SPU_001490	SPU_001490	Strongylocentrotus purpuratus-specific protein	none
SPU_001499	SPU_001499	contains Sulfotransfer_1 domain	none
SPU_001535	SPU_001535	contains 2 ANK superfamily motifs and Arp domain	none
SPU_001567	SPU_001567	Strongylocentrotus purpuratus-specific protein	none
SPU_001581	SPU_001581	Strongylocentrotus purpuratus-specific protein	none
SPU_001602	SPU_001602	contains 2 V-set domain motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_001627	SPU_001627	contains PRK07800 domain	none
SPU_001641	SPU_001641	contains COG4886 domain	none
SPU_001645	SPU_001645	contains Smc domain	none
SPU_001654	SPU_001654	Strongylocentrotus purpuratus-specific protein	none
SPU_001655	SPU_001655	contains Sulfotransfer_1 domain	none
SPU_001666	SPU_001666	contains 2 PBPb superfamily motifs	none
SPU_001672	SPU_001672	contains COG4886 domain	none
SPU_001673	SPU_001673	homologous to bacterial and plant proteins	none
SPU_001691	SPU_001691	Strongylocentrotus purpuratus-specific protein	none
SPU_001699	SPU_001699	contains DYN1 domain	none
SPU_001719	SPU_001719	contains ComEA domain and Tex domain	none
SPU_001720	SPU_001720	Strongylocentrotus purpuratus-specific protein	none
SPU_001737	SPU_001737	contains 3 HYR superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_001747	SPU_001747	Strongylocentrotus purpuratus-specific protein	none
SPU_001751	SPU_001751	contains COG4870 domain	none
SPU_001754	SPU_001754	contains 4 FA58C superfamily motifs	none
SPU_001755	SPU_001755	contains PAT1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_001783	SPU_001783	Strongylocentrotus purpuratus-specific protein	none
SPU_001819	SPU_001819	Strongylocentrotus purpuratus-specific protein	none
SPU_001839	SPU_001839	Strongylocentrotus purpuratus-specific protein	none
SPU_001841	SPU_001841	Strongylocentrotus purpuratus-specific protein	none
SPU_001842	SPU_001842	Strongylocentrotus purpuratus-specific protein	none
SPU_001843	SPU_001843	Strongylocentrotus purpuratus-specific protein	none
SPU_001844	SPU_001844	Strongylocentrotus purpuratus-specific protein	none
SPU_001845	SPU_001845	Strongylocentrotus purpuratus-specific protein	none
SPU_001858	SPU_001858	contains 3 ANK superfamily motifs and Arp domain	none
SPU_001876	SPU_001876	Strongylocentrotus purpuratus-specific protein	none
SPU_001913	SPU_001913	Strongylocentrotus purpuratus-specific protein	none
SPU_001917	SPU_001917	Strongylocentrotus purpuratus-specific protein	none
SPU_001920	SPU_001920	Strongylocentrotus purpuratus-specific protein	none
SPU_001940	SPU_001940	contains Fanconi_C domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_001941	SPU_001941	Strongylocentrotus purpuratus-specific protein	none
SPU_001943	SPU_001943	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_001944	SPU_001944	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_001953	SPU_001953	Strongylocentrotus purpuratus-specific protein. probable assembly chimera.	none
SPU_001963	SPU_001963	contains Pkinase domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_001981	SPU_001981	contains Smc domain	none
SPU_001985	SPU_001985	Strongylocentrotus purpuratus-specific protein	none
SPU_001989	SPU_001989	Strongylocentrotus purpuratus-specific protein	none
SPU_001997	SPU_001997	contains 6 HYR superfamily motifs	none
SPU_001999	SPU_001999	Strongylocentrotus purpuratus-specific protein	none
SPU_002000	SPU_002000	Strongylocentrotus purpuratus-specific protein	none
SPU_002034	SPU_002034	contains Borrelia_P83 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_002064	SPU_002064	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_002077	SPU_002077	contains 2 MIB_HERC2 superfamily motifs	none
SPU_002098	SPU_002098	Strongylocentrotus purpuratus-specific protein	none
SPU_002099	SPU_002099	Strongylocentrotus purpuratus-specific protein	none
SPU_002104	SPU_002104	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_002143	SPU_002143	Strongylocentrotus purpuratus-specific protein	none
SPU_002166	SPU_002166	Strongylocentrotus purpuratus-specific protein	none
SPU_002216	SPU_002216	contains Smc domain	none
SPU_002237	SPU_002237	Strongylocentrotus purpuratus-specific protein. probable assembly chimera.	none
SPU_002239	SPU_002239	Strongylocentrotus purpuratus-specific protein	none
SPU_002240	SPU_002240	Strongylocentrotus purpuratus-specific protein	none
SPU_002243	SPU_002243	contains 2 EGF_CA superfamily motifs	none
SPU_002245	SPU_002245	Strongylocentrotus purpuratus-specific protein	none
SPU_002253	SPU_002253	contains ACTIN domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_002255	SPU_002255	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_002261	SPU_002261	Strongylocentrotus purpuratus-specific protein	none
SPU_002268	SPU_002268	contains 3 ANK superfamily motifs and Arp domain	none
SPU_002275	SPU_002275	contains 4 ANK superfamily motifs and Arp domain	none
SPU_002276	SPU_002276	contains 3 ANK superfamily motifs and Arp domain	none
SPU_002277	SPU_002277	contains 4 ANK superfamily motifs and Arp domain	none
SPU_002283	SPU_002283	Strongylocentrotus purpuratus-specific protein	none
SPU_002289	SPU_002289	contains Mrp domain	none
SPU_002296	SPU_002296	homologous to numerous putative Branchiostoma floridae proteins. Strongylocentrotus purpuratus-specific protein.	none
SPU_002332	SPU_002332	contains PRK00142 domain	none
SPU_002344	SPU_002344	contains COG2319 domain	none
SPU_002348	SPU_002348	contains Sulfotransfer_1 domain	none
SPU_002358	SPU_002358	Strongylocentrotus purpuratus-specific protein	none
SPU_002400	SPU_002400	Strongylocentrotus purpuratus-specific protein	none
SPU_002395	SPU_002395	Strongylocentrotus purpuratus-specific protein	none
SPU_002403	SPU_002403	Strongylocentrotus purpuratus-specific protein	none
SPU_002410	SPU_002410	contains 2 FA58C superfamily motifs	none
SPU_002424	SPU_002424	contains Tektin domain	none
SPU_002427	SPU_002427	probable assembly chimera	none
SPU_002440	SPU_002440	contains COG3391 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_002452	SPU_002452	contains 3 CCP superfamily motifs	none
SPU_002454	SPU_002454	contains 2 CCP superfamily motifs	none
SPU_002495	SPU_002495	Strongylocentrotus purpuratus-specific protein	none
SPU_002540	SPU_002540	contains 4 IG superfamily motifs	none
SPU_002551	SPU_002551	contains 4 EGF_CA superfamily motifs	none
SPU_002573	SPU_002573	Strongylocentrotus purpuratus-specific protein	none
SPU_002597	SPU_002597	Strongylocentrotus purpuratus-specific protein	none
SPU_002620	SPU_002620	contains 2 TPR superfamily motifs	none
SPU_002629	SPU_002629	contains MesJ domain	none
SPU_002640	SPU_002640	contains COG2433 domain	none
SPU_002642	SPU_002642	contains ArgE domain	none
SPU_002652	SPU_002652	Strongylocentrotus purpuratus-specific protein	none
SPU_002675	SPU_002675	contains 3 MAM superfamily motifs	none
SPU_002684	SPU_002684	contains 2 MFS superfamily motifs	none
SPU_002689	SPU_002689	contains 3 zf-C2HC superfamily motifs and Myosin_tail_1 domain	none
SPU_002716	SPU_002716	probable assembly chimera. Strongylocentrotus purpuratus-specific protein.	none
SPU_002726	SPU_002726	Strongylocentrotus purpuratus-specific protein	none
SPU_002729	SPU_002729	contains RecD domain	none
SPU_002746	SPU_002746	contains 2 EGF_CA superfamily motifs	none
SPU_002751	SPU_002751	contains 2 CCP superfamily motifs	none
SPU_002762	SPU_002762	contains 2 ANK superfamily motifs	none
SPU_002799	SPU_002799	Strongylocentrotus purpuratus-specific protein	none
SPU_002814	SPU_002814	Strongylocentrotus purpuratus-specific protein	none
SPU_002826	SPU_002826	contains 2 IG superfamily motifs	none
SPU_002834	SPU_002834	contains 2 FA58C superfamily motifs	none
SPU_002906	SPU_002906	Strongylocentrotus purpuratus-specific protein	none
SPU_002913	SPU_002913	Strongylocentrotus purpuratus-specific protein	none
SPU_002916	SPU_002916	Strongylocentrotus purpuratus-specific protein	none
SPU_002920	SPU_002920	contains Smc domain	none
SPU_002934	SPU_002934	Strongylocentrotus purpuratus-specific protein	none
SPU_002935	SPU_002935	contains COG4886 domain	none
SPU_002956	SPU_002956	Strongylocentrotus purpuratus-specific protein	none
SPU_002976	SPU_002976	Strongylocentrotus purpuratus-specific protein	none
SPU_002999	SPU_002999	contains PAT1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_003024	SPU_003024	Strongylocentrotus purpuratus-specific protein	none
SPU_003053	SPU_003053	contains RimI domain	none
SPU_003066	SPU_003066	Strongylocentrotus purpuratus-specific protein	none
SPU_003074	SPU_003074	contains COG4886 domain	none
SPU_003079	SPU_003079	Strongylocentrotus purpuratus-specific protein	none
SPU_003096	SPU_003096	contains Herpes_BLLF1 domain	none
SPU_003118	SPU_003118	contains 2 HYR superfamily motifs	none
SPU_003155	SPU_003155	contains PAT1 domain	none
SPU_003178	SPU_003178	contains 3 ANK superfamily motifs and Arp domain	none
SPU_003180	SPU_003180	contains 2 class_II_aaRS-like_core superfamily motifs	none
SPU_003187	SPU_003187	probable assembly chimera	none
SPU_003188	SPU_003188	contains 3 ANK superfamily motifs and Arp domain	none
SPU_003205	SPU_003205	contains PilF domain	none
SPU_003208	SPU_003208	probable assembly chimera. Strongylocentrotus purpuratus-specific protein.	none
SPU_003238	SPU_003238	Strongylocentrotus purpuratus-specific protein	none
SPU_003288	SPU_003288	Strongylocentrotus purpuratus-specific protein	none
SPU_003301	SPU_003301	Strongylocentrotus purpuratus-specific protein	none
SPU_003337	SPU_003337	contains 2 DEXDc superfamily motifs	none
SPU_003342	SPU_003342	Strongylocentrotus purpuratus-specific protein	none
SPU_003352	SPU_003352	probable assembly chimera. Strongylocentrotus purpuratus-specific protein.	none
SPU_003354	SPU_003354	Strongylocentrotus purpuratus-specific protein	none
SPU_003359	SPU_003359	contains 2 IG superfamily motifs	none
SPU_003385	SPU_003385	contains 2 ANK superfamily motifs and Arp domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_003386	SPU_003386	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_003427	SPU_003427	contains 3 IG superfamily motifs	none
SPU_003484	SPU_003484	Strongylocentrotus purpuratus-specific protein	none
SPU_003503	SPU_003503	contains PRK05771 domain	none
SPU_003519	SPU_003519	contains SSM4 domain	none
SPU_003523	SPU_003523	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_003562	SPU_003562	contains 2 FNR-like superfamily motifs	none
SPU_003572	SPU_003572	contains Arp domain	none
SPU_003577	SPU_003577	contains 2 FA58C superfamily motifs	none
SPU_003629	SPU_003629	probable assembly chimera	none
SPU_003648	SPU_003648	contains 3 ANK superfamily motifs and Arp domain	none
SPU_003650	SPU_003650	contains 2 DnaQ-like_exo superfamily motifs and PolB domain	none
SPU_003652	SPU_003652	contains DUF1394 domain	none
SPU_003662	SPU_003662	probable assembly chimera	none
SPU_003682	SPU_003682	Strongylocentrotus purpuratus-specific protein	none
SPU_003695	SPU_003695	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_003734	SPU_003734	contains PRK05771 domain and UhpB domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_003737	SPU_003737	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_003738	SPU_003738	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_003739	SPU_003739	contains MDN1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_003745	SPU_003745	contains Sulfotransfer_1 domain	none
SPU_003751	SPU_003751	contains 7 Ldl_recept_b superfamily motifs	none
SPU_003756	SPU_003756	Strongylocentrotus purpuratus-specific protein	none
SPU_003779	SPU_003779	contains COG4886 domain	none
SPU_003801	SPU_003801	contains SbcC domain	none
SPU_003813	SPU_003813	contains 3 ANK superfamily motifs and Arp domain	none
SPU_003838	SPU_003838	contains ATS1 domain	none
SPU_003858	SPU_003858	contains Sulfotransfer_1 domain	none
SPU_003871	SPU_003871	contains 2 PHD superfamily motifs	none
SPU_003879	SPU_003879	contains AcuC domain	none
SPU_003885	SPU_003885	Strongylocentrotus purpuratus-specific protein	none
SPU_003888	SPU_003888	contains Smc domain	none
SPU_003889	SPU_003889	contains Sulfotransfer_1 domain	none
SPU_003948	SPU_003948	Strongylocentrotus purpuratus-specific protein	none
SPU_003950	SPU_003950	contains 2 ANK superfamily motifs	none
SPU_003952	SPU_003952	contains SMC_N domain and SGL domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_003960	SPU_003960	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_004012	SPU_004012	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_004052	SPU_004052	probable assembly chimera. Strongylocentrotus purpuratus-specific protein.	none
SPU_004054	SPU_004054	contains DnaJ domain. Strongylocentrotus purpuratus-specific protein. probable assembly chimera.	none
SPU_004063	SPU_004063	contains ATS1 domain	none
SPU_004081	SPU_004081	contains RecQ domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_004082	SPU_004082	Strongylocentrotus purpuratus-specific protein	none
SPU_004090	SPU_004090	Strongylocentrotus purpuratus-specific protein	none
SPU_004099	SPU_004099	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_004108	SPU_004108	contains Uvr domain	none
SPU_004126	SPU_004126	contains 3 RRM superfamily motifs	none
SPU_004175	SPU_004175	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_004186	SPU_004186	contains PnbA domain	none
SPU_004187	SPU_004187	contains AMP-binding domain	none
SPU_004201	SPU_004201	Strongylocentrotus purpuratus-specific protein	none
SPU_004231	SPU_004231	contains 2 LamG superfamily motifs	none
SPU_004302	SPU_004302	Strongylocentrotus purpuratus-specific protein	none
SPU_004307	SPU_004307	Strongylocentrotus purpuratus-specific protein	none
SPU_004331	SPU_004331	Strongylocentrotus purpuratus-specific protein	none
SPU_004351	SPU_004351	contains AF-4 domain	none
SPU_004355	SPU_004355	Strongylocentrotus purpuratus-specific protein	none
SPU_004369	SPU_004369	Strongylocentrotus purpuratus-specific protein	none
SPU_004370	SPU_004370	Strongylocentrotus purpuratus-specific protein	none
SPU_004374	SPU_004374	Strongylocentrotus purpuratus-specific protein	none
SPU_004387	SPU_004387	Strongylocentrotus purpuratus-specific protein	none
SPU_004400	SPU_004400	contains 2 KR superfamily motifs	none
SPU_004410	SPU_004410	Strongylocentrotus purpuratus-specific protein	none
SPU_004422	SPU_004422	contains 3 PDZ superfamily motifs	none
SPU_004439	SPU_004439	Strongylocentrotus purpuratus-specific protein	none
SPU_004465	SPU_004465	Strongylocentrotus purpuratus-specific protein	none
SPU_004472	SPU_004472	contains PAT1 domain. probable assembly chimera.	none
SPU_004480	SPU_004480	contains UDPGT domain	none
SPU_004481	SPU_004481	contains UDPGT domain	none
SPU_004488	SPU_004488	contains 2 LDLa superfamily motifs	none
SPU_004491	SPU_004491	contains 3 MAM superfamily motifs	none
SPU_004497	SPU_004497	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_004509	SPU_004509	contains Filament domain	none
SPU_004514	SPU_004514	Strongylocentrotus purpuratus-specific protein	none
SPU_004516	SPU_004516	contains COG4886 domain	none
SPU_004530	SPU_004530	contains 3 EGF_CA superfamily motifs	none
SPU_004542	SPU_004542	contains 2 EGF_CA superfamily motifs	none
SPU_004560	SPU_004560	probable assembly chimera	none
SPU_004565	SPU_004565	contains 2 Cupin superfamily motifs	none
SPU_004583	SPU_004583	Strongylocentrotus purpuratus-specific protein	none
SPU_004618	SPU_004618	contains 3 KR superfamily motifs	none
SPU_004635	SPU_004635	probable assembly chimera	none
SPU_004636	SPU_004636	contains 2 MAM superfamily motifs	none
SPU_004682	SPU_004682	contains Adaptin_N domain	none
SPU_004686	SPU_004686	contains 2 FA58C superfamily motifs	none
SPU_004703	SPU_004703	contains DLH domain	none
SPU_004706	SPU_004706	contains 2 WSC superfamily motifs	none
SPU_004707	SPU_004707	contains 2 WSC superfamily motifs	none
SPU_004712	SPU_004712	Strongylocentrotus purpuratus-specific protein	none
SPU_004757	SPU_004757	contains PRK05431 domain	none
SPU_004761	SPU_004761	contains PnbA domain	none
SPU_004775	SPU_004775	Strongylocentrotus purpuratus-specific protein	none
SPU_004780	SPU_004780	contains PRK06116 domain. probable assembly chimera.	none
SPU_004829	SPU_004829	Strongylocentrotus purpuratus-specific protein	none
SPU_004843	SPU_004843	probable assembly chimera	none
SPU_004851	SPU_004851	Strongylocentrotus purpuratus-specific protein	none
SPU_004904	SPU_004904	homologous to only 1 hypothetical Branchiostoma floridae protein. Strongylocentrotus purpuratus-specific protein.	none
SPU_004906	SPU_004906	homologous to only 1 hypothetical Branchiostoma floridae protein. Strongylocentrotus purpuratus-specific protein.	none
SPU_004931	SPU_004931	contains 2 IG superfamily motifs	none
SPU_004942	SPU_004942	probable assembly chimera	none
SPU_004979	SPU_004979	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_005012	SPU_005012	contains 2 CCP superfamily motifs	none
SPU_005014	SPU_005014	Strongylocentrotus purpuratus-specific protein	none
SPU_005016	SPU_005016	Strongylocentrotus purpuratus-specific protein	none
SPU_005025	SPU_005025	contains MFS_1 domain	none
SPU_005043	SPU_005043	contains V-set domain	none
SPU_005046	SPU_005046	contains SMC_N domain	none
SPU_005064	SPU_005064	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_005069	SPU_005069	contains 2 IG superfamily motifs	none
SPU_005093	SPU_005093	contains Sulfotransfer_1 domain	none
SPU_005098	SPU_005098	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_005101	SPU_005101	contains MFS domain	none
SPU_005109	SPU_005109	Strongylocentrotus purpuratus-specific protein	none
SPU_005121	SPU_005121	Strongylocentrotus purpuratus-specific protein	none
SPU_005130	SPU_005130	contains 2 HYR superfamily motifs. probable assembly chimera.	none
SPU_005159	SPU_005159	contains Sulfotransfer_1 domain	none
SPU_005163	SPU_005163	contains Ion_trans domain	none
SPU_005165	SPU_005165	Strongylocentrotus purpuratus-specific protein	none
SPU_005168	SPU_005168	contains PnbA domain	none
SPU_005179	SPU_005179	contains COG3415 domain	none
SPU_005191	SPU_005191	contains 2 ARM superfamily motifs	none
SPU_005199	SPU_005199	homologous only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_005206	SPU_005206	contains 2 EGF_CA superfamily motifs	none
SPU_005218	SPU_005218	contains YidC domain	none
SPU_005263	SPU_005263	contains ACTIN domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_005283	SPU_005283	contains S_TKc domain	none
SPU_005289	SPU_005289	poor sequence data: ~25% of amino acids are X	none
SPU_005294	SPU_005294	Strongylocentrotus purpuratus-specific protein	none
SPU_005297	SPU_005297	contains PRK00409 domain	none
SPU_005299	SPU_005299	contains 3 ANK superfamily motifs and Arp domain	none
SPU_005310	SPU_005310	homologous to bacterial putative proteins	none
SPU_005330	SPU_005330	contains OATP domain	none
SPU_005373	SPU_005373	contains Sulfotransfer_1 domain	none
SPU_005421	SPU_005421	contains 3 ANK superfamily motifs and Arp domain	none
SPU_005448	SPU_005448	contains ECM4 domain	none
SPU_005474	SPU_005474	Strongylocentrotus purpuratus-specific protein	none
SPU_005477	SPU_005477	contains Integrin_alpha2 domain	none
SPU_005496	SPU_005496	Strongylocentrotus purpuratus-specific protein. probable assembly chimera.	none
SPU_005546	SPU_005546	Strongylocentrotus purpuratus-specific protein	none
SPU_005548	SPU_005548	contains COG4886 domain	none
SPU_005617	SPU_005617	contains 2 MFS superfamily motifs	none
SPU_005631	SPU_005631	contains RhaT domain	none
SPU_005632	SPU_005632	contains 2 DUF6 superfamily motifs	none
SPU_005633	SPU_005633	contains RhaT domain	none
SPU_005634	SPU_005634	contains RhaT domain	none
SPU_005635	SPU_005635	contains RhaT domain	none
SPU_005647	SPU_005647	Strongylocentrotus purpuratus-specific protein	none
SPU_005648	SPU_005648	contains S-methyl_trans domain	none
SPU_005660	SPU_005660	probable assembly chimera	none
SPU_005664	SPU_005664	contains 2 DM13 superfamily motifs	none
SPU_005682	SPU_005682	contains 3 HYR superfamily motifs	none
SPU_005685	SPU_005685	contains 6 rve superfamily motifs	none
SPU_005711	SPU_005711	contains PRK05431 domain	none
SPU_005724	SPU_005724	contains 5 CUB superfamily motifs	none
SPU_005764	SPU_005764	contains RAD18 domain	none
SPU_005770	SPU_005770	contains PRK07003 domain	none
SPU_005809	SPU_005809	contains 2 MAM superfamily motifs	none
SPU_005812	SPU_005812	Strongylocentrotus purpuratus-specific protein	none
SPU_005824	SPU_005824	contains SMC_N domain	none
SPU_005827	SPU_005827	contains PRK04778 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_005879	SPU_005879	Strongylocentrotus purpuratus-specific protein	none
SPU_005898	SPU_005898	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_005902	SPU_005902	contains V-set domain	none
SPU_005918	SPU_005918	contains 2 MAM superfamily motifs	none
SPU_005921	SPU_005921	Strongylocentrotus purpuratus-specific protein	none
SPU_005727	SPU_005727	contains 2 MAM superfamily motifs	none
SPU_005939	SPU_005939	Strongylocentrotus purpuratus-specific protein	none
SPU_005940	SPU_005940	Strongylocentrotus purpuratus-specific protein	none
SPU_005953	SPU_005953	contains 2 Periplasmic_Binding_Protein_type_1 superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_005959	SPU_005959	Strongylocentrotus purpuratus-specific protein	none
SPU_006007	SPU_006007	contains COG0790 domain	none
SPU_006009	SPU_006009	contains MDN1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_006023	SPU_006023	contains 3 IG superfamily motifs	none
SPU_006032	SPU_006032	probable assembly chimera	none
SPU_006036	SPU_006036	probable assembly chimera	none
SPU_006039	SPU_006039	contains COG4486 domain	none
SPU_006050	SPU_006050	contains COG3415 domain	none
SPU_006080	SPU_006080	Strongylocentrotus purpuratus-specific protein	none
SPU_006094	SPU_006094	contains 4 ANK superfamily motifs and Arp domain	none
SPU_006095	SPU_006095	contains 3 ANK superfamily motifs and Arp domain	none
SPU_006104	SPU_006104	Strongylocentrotus purpuratus-specific protein	none
SPU_006108	SPU_006108	Strongylocentrotus purpuratus-specific protein	none
SPU_006113	SPU_006113	contains MDN1 domain	none
SPU_006127	SPU_006127	contains GCC2_GCC3 superfamily motifs	none
SPU_006133	SPU_006133	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_006187	SPU_006187	contains Sulfotransfer_1 domain	none
SPU_006193	SPU_006193	Strongylocentrotus purpuratus-specific protein	none
SPU_006194	SPU_006194	contains Transposase domain	none
SPU_006227	SPU_006227	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_006231	SPU_006231	contains 3 ANK superfamily motifs and Arp domain	none
SPU_006233	SPU_006233	contains 2 Gelsolin superfamily motifs	none
SPU_006240	SPU_006240	contains PEX10 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_006246	SPU_006246	Strongylocentrotus purpuratus-specific protein	none
SPU_006247	SPU_006247	Strongylocentrotus purpuratus-specific protein	none
SPU_006255	SPU_006255	probable assembly chimera	none
SPU_006282	SPU_006282	Strongylocentrotus purpuratus-specific protein	none
SPU_006288	SPU_006288	Strongylocentrotus purpuratus-specific protein	none
SPU_006289	SPU_006289	Strongylocentrotus purpuratus-specific protein	none
SPU_006292	SPU_006292	contains SMC_N domain	none
SPU_006304	SPU_006304	probable assembly chimera. Strongylocentrotus purpuratus-specific protein.	none
SPU_006307	SPU_006307	contains AslA domain	none
SPU_006314	SPU_006314	Strongylocentrotus purpuratus-specific protein	none
SPU_006336	SPU_006336	Strongylocentrotus purpuratus-specific protein	none
SPU_006376	SPU_006376	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_006407	SPU_006407	contains Smc domain	none
SPU_006408	SPU_006408	Strongylocentrotus purpuratus-specific protein	none
SPU_006415	SPU_006415	probable assembly chimera	none
SPU_006416	SPU_006416	contains 5 EGF_CA superfamily motifs	none
SPU_006422	SPU_006422	contains 2 MFS superfamily motifs	none
SPU_006436	SPU_006436	contains SEC21 domain	none
SPU_006500	SPU_006500	Strongylocentrotus purpuratus-specific protein	none
SPU_006501	SPU_006501	Strongylocentrotus purpuratus-specific protein	none
SPU_006512	SPU_006512	Strongylocentrotus purpuratus-specific protein	none
SPU_006515	SPU_006515	poor sequence data: >80% of the amino acids are X	none
SPU_006537	SPU_006537	contains 2 MFS superfamily motifs	none
SPU_006613	SPU_006613	contains 2 HYR superfamily motifs	none
SPU_006616	SPU_006616	contains 3 IG superfamily motifs	none
SPU_006618	SPU_006618	contains PAT1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_006667	SPU_006667	contains 2 Neutralized superfamily motifs	none
SPU_006681	SPU_006681	Strongylocentrotus purpuratus-specific protein	none
SPU_006694	SPU_006694	Strongylocentrotus purpuratus-specific protein	none
SPU_006697	SPU_006697	Strongylocentrotus purpuratus-specific protein	none
SPU_006712	SPU_006712	probable assembly chimera	none
SPU_006717	SPU_006717	Strongylocentrotus purpuratus-specific protein	none
SPU_006760	SPU_006760	Strongylocentrotus purpuratus-specific protein	none
SPU_006774	SPU_006774	contains COG3391 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_006780	SPU_006780	poor sequence data: ~40% of amino acids are X	none
SPU_006856	SPU_006856	contains 3 EGF_CA superfamily motifs	none
SPU_006867	SPU_006867	Strongylocentrotus purpuratus-specific protein	none
SPU_006885	SPU_006885	probable assembly chimera	none
SPU_006898	SPU_006898	probable assembly chimera	none
SPU_006905	SPU_006905	contains SMC_N domain	none
SPU_006934	SPU_006934	contains 3 CCP superfamily motifs	none
SPU_006957	SPU_006957	contains 3 ANK superfamily motifs and Arp domain	none
SPU_006978	SPU_006978	contains 2 S1-like superfamily motifs	none
SPU_006999	SPU_006999	Strongylocentrotus purpuratus-specific protein	none
SPU_007010	SPU_007010	Strongylocentrotus purpuratus-specific protein	none
SPU_007016	SPU_007016	contains 2 IG superfamily motifs	none
SPU_007025	SPU_007025	Strongylocentrotus purpuratus-specific protein	none
SPU_007039	SPU_007039	contains 2 FA58C superfamily motifs	none
SPU_007054	SPU_007054	contains ArgK domain	none
SPU_007076	SPU_007076	contains 3 CCP superfamily motifs	none
SPU_007089	SPU_007089	Strongylocentrotus purpuratus-specific protein	none
SPU_007124	SPU_007124	contains 4 ANK superfamily motifs and Arp domain	none
SPU_007173	SPU_007173	contains DUF2146 domain	none
SPU_007177	SPU_007177	Strongylocentrotus purpuratus-specific protein	none
SPU_007179	SPU_007179	contains Ins134_P3_kin domain	none
SPU_007198	SPU_007198	Strongylocentrotus purpuratus-specific protein	none
SPU_007212	SPU_007212	Strongylocentrotus purpuratus-specific protein	none
SPU_007250	SPU_007250	contains FrhG domain	none
SPU_007256	SPU_007256	contains MDN1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_007266	SPU_007266	Strongylocentrotus purpuratus-specific protein	none
SPU_007269	SPU_007269	Strongylocentrotus purpuratus-specific protein	none
SPU_007303	SPU_007303	contains SpoVK domain	none
SPU_007348	SPU_007348	contains COG5635 domain	none
SPU_007367	SPU_007367	Strongylocentrotus purpuratus-specific protein	none
SPU_007378	SPU_007378	Strongylocentrotus purpuratus-specific protein	none
SPU_007388	SPU_007388	Strongylocentrotus purpuratus-specific protein	none
SPU_007417	SPU_007417	contains PRK11664 domain	none
SPU_007451	SPU_007451	Strongylocentrotus purpuratus-specific protein	none
SPU_007464	SPU_007464	contains 3 TPR superfamily motifs	none
SPU_007496	SPU_007496	probable assembly chimera	none
SPU_007504	SPU_007504	contains 2 ANK superfamily motifs	none
SPU_007534	SPU_007534	contains 2 EGF_CA superfamily motifs	none
SPU_007550	SPU_007550	contains 2 EFh superfamily motifs	none
SPU_007561	SPU_007561	contains COG5635 domain	none
SPU_007570	SPU_007570	contains 2 MFS superfamily motifs	none
SPU_007588	SPU_007588	contains Tweety domain	none
SPU_007608	SPU_007608	Strongylocentrotus purpuratus-specific protein	none
SPU_007627	SPU_007627	Strongylocentrotus purpuratus-specific protein	none
SPU_007635	SPU_007635	contains GalT domain	none
SPU_007643	SPU_007643	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_007648	SPU_007648	contains 2 CUB superfamily motifs	none
SPU_007680	SPU_007680	contains Smc domain	none
SPU_007681	SPU_007681	contains Smc domain	none
SPU_007689	SPU_007689	contains UDPGT domain	none
SPU_007752	SPU_007752	contains 2 ANK superfamily motifs and Arp domain	none
SPU_007754	SPU_007754	contains Smc domain	none
SPU_007786	SPU_007786	Strongylocentrotus purpuratus-specific protein	none
SPU_007817	SPU_007817	contains COG5222 domain	none
SPU_007824	SPU_007824	contains 3 IG superfamily motifs	none
SPU_007826	SPU_007826	Strongylocentrotus purpuratus-specific protein	none
SPU_007871	SPU_007871	contains 3 ANK superfamily motifs and Arp domain	none
SPU_007872	SPU_007872	contains Sulfotransfer_1 domain	none
SPU_007874	SPU_007874	Strongylocentrotus purpuratus-specific protein	none
SPU_007879	SPU_007879	Strongylocentrotus purpuratus-specific protein	none
SPU_007886	SPU_007886	contains MDN1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_007912	SPU_007912	contains COG3391 domain	none
SPU_007926	SPU_007926	contains Sulfotransfer_1 domain	none
SPU_007947	SPU_007947	contains V-set domain	none
SPU_007956	SPU_007956	contains TruA domain	none
SPU_007959	SPU_007959	Strongylocentrotus purpuratus-specific protein	none
SPU_007963	SPU_007963	contains Ion_trans domain	none
SPU_007998	SPU_007998	Strongylocentrotus purpuratus-specific protein	none
SPU_007999	SPU_007999	probable assembly chimera. Strongylocentrotus purpuratus-specific protein.	none
SPU_008002	SPU_008002	contains ATS1 domain	none
SPU_008010	SPU_008010	contains VacB domain	none
SPU_008023	SPU_008023	Strongylocentrotus purpuratus-specific protein	none
SPU_008026	SPU_008026	contains 2 LIM superfamily motifs	none
SPU_008028	SPU_008028	contains 3 EGF_CA superfamily motifs. probable assembly chimera.	none
SPU_008059	SPU_008059	contains 3 BTB superfamily motifs	none
SPU_008060	SPU_008060	contains 2 BTB superfamily motifs	none
SPU_008061	SPU_008061	contains 2 BTB superfamily motifs	none
SPU_008100	SPU_008100	contains 3 CLECT superfamily motifs	none
SPU_008129	SPU_008129	contains 2 TUDOR superfamily motifs	none
SPU_008145	SPU_008145	contains 2 EGF_CA superfamily motifs	none
SPU_008156	SPU_008156	contains 2 FA58C superfamily motifs	none
SPU_008165	SPU_008165	contains 2 EGF_CA superfamily motifs	none
SPU_008234	SPU_008234	contains 6 HMA superfamily motifs	none
SPU_008257	SPU_008257	contains 3 ANK superfamily motifs and Arp domain	none
SPU_008269	SPU_008269	Strongylocentrotus purpuratus-specific protein	none
SPU_008321	SPU_008321	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_008323	SPU_008323	Strongylocentrotus purpuratus-specific protein	none
SPU_008330	SPU_008330	Strongylocentrotus purpuratus-specific protein	none
SPU_008335	SPU_008335	contains Torsin domain	none
SPU_008343	SPU_008343	contains MFS_1 domain	none
SPU_008345	SPU_008345	contains 2 SAM superfamily motifs	none
SPU_008383	SPU_008383	contains 2 EGF_CA superfamily motifs	none
SPU_008392	SPU_008392	contains Myosin_tail_1 domain	none
SPU_008414	SPU_008414	contains 2 MFS superfamily motifs	none
SPU_008430	SPU_008430	contains 2 FA58C superfamily motifs	none
SPU_008473	SPU_008473	contains PnbA domain	none
SPU_008474	SPU_008474	contains COG4886 domain	none
SPU_008484	SPU_008484	probable assembly chimera	none
SPU_008491	SPU_008491	probable assembly chimera	none
SPU_008508	SPU_008508	contains AIR1 domain	none
SPU_008518	SPU_008518	contains SMC_N domain	none
SPU_008523	SPU_008523	Strongylocentrotus purpuratus-specific protein	none
SPU_008524	SPU_008524	Strongylocentrotus purpuratus-specific protein	none
SPU_008546	SPU_008546	contains Nop14 domain	none
SPU_008569	SPU_008569	contains DYN1 domain	none
SPU_008579	SPU_008579	Strongylocentrotus purpuratus-specific protein	none
SPU_008607	SPU_008607	contains 4 ANK superfamily motifs and Arp domain	none
SPU_008613	SPU_008613	contains 2 EGF_CA superfamily motifs	none
SPU_008630	SPU_008630	contains COG1204 domain	none
SPU_008636	SPU_008636	contains COG4700 domain	none
SPU_008637	SPU_008637	contains 3 LRR_RI superfamily motifs	none
SPU_008639	SPU_008639	contains PRK08315 domain	none
SPU_008671	SPU_008671	probable assembly chimera	none
SPU_008726	SPU_008726	Strongylocentrotus purpuratus-specific protein	none
SPU_008784	SPU_008784	contains SMC_N domain	none
SPU_008788	SPU_008788	contains Sulfotransfer_1 domain	none
SPU_008796	SPU_008796	contains 3 ANK superfamily motifs and Arp domain	none
SPU_008805	SPU_008805	contains MFS_1 domain	none
SPU_008809	SPU_008809	Strongylocentrotus purpuratus-specific protein	none
SPU_008810	SPU_008810	Strongylocentrotus purpuratus-specific protein	none
SPU_008831	SPU_008831	contains 2 BTB superfamily motifs	none
SPU_008837	SPU_008837	probable assembly chimera. Strongylocentrotus purpuratus-specific protein.	none
SPU_008873	SPU_008873	contains PRK06116 domain	none
SPU_008929	SPU_008929	Strongylocentrotus purpuratus-specific protein	none
SPU_008961	SPU_008961	contains SMC_N domain	none
SPU_008972	SPU_008972	contains 2 SPEC superfamily motifs	none
SPU_009001	SPU_009001	contains DadA domain	none
SPU_009033	SPU_009033	contains 2 FA58C superfamily motifs	none
SPU_009058	SPU_009058	Strongylocentrotus purpuratus-specific protein	none
SPU_009077	SPU_009077	contains 2 FA58C superfamily motifs	none
SPU_009094	SPU_009094	contains MFS_1 domain	none
SPU_009110	SPU_009110	contains 2 VWC superfamily motifs	none
SPU_009120	SPU_009120	Strongylocentrotus purpuratus-specific protein	none
SPU_009137	SPU_009137	Strongylocentrotus purpuratus-specific protein	none
SPU_009166	SPU_009166	contains COG3889 domain	none
SPU_009187	SPU_009187	contains SMC_N domain	none
SPU_009207	SPU_009207	Strongylocentrotus purpuratus-specific protein	none
SPU_009211	SPU_009211	Strongylocentrotus purpuratus-specific protein	none
SPU_009244	SPU_009244	contains 8 EGF_CA superfamily motifs	none
SPU_009247	SPU_009247	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_009275	SPU_009275	Strongylocentrotus purpuratus-specific protein	none
SPU_009285	SPU_009285	Strongylocentrotus purpuratus-specific protein	none
SPU_009287	SPU_009287	contains PAT1 domain and Smc domain	none
SPU_009291	SPU_009291	Strongylocentrotus purpuratus-specific protein	none
SPU_009304	SPU_009304	probable assembly chimera	none
SPU_009306	SPU_009306	Strongylocentrotus purpuratus-specific protein	none
SPU_009344	SPU_009344	contains 2 DEXDc superfamily motifs	none
SPU_009363	SPU_009363	Strongylocentrotus purpuratus-specific protein	none
SPU_009364	SPU_009364	contains AmpC domain	none
SPU_009373	SPU_009373	contains COG4886 domain	none
SPU_009379	SPU_009379	contains COG5222 domain	none
SPU_009391	SPU_009391	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_009402	SPU_009402	Strongylocentrotus purpuratus-specific protein	none
SPU_009403	SPU_009403	Strongylocentrotus purpuratus-specific protein	none
SPU_009445	SPU_009445	contains DadA domain	none
SPU_009449	SPU_009449	contains DadA domain	none
SPU_009466	SPU_009466	contains 2 HYR superfamily motifs	none
SPU_009467	SPU_009467	homologous only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_009551	SPU_009551	probable assembly chimera	none
SPU_009563	SPU_009563	contains 4 SRCR superfamily motifs	none
SPU_009570	SPU_009570	contains 3 IG superfamily motifs	none
SPU_009585	SPU_009585	Strongylocentrotus purpuratus-specific protein	none
SPU_009603	SPU_009603	Strongylocentrotus purpuratus-specific protein	none
SPU_009641	SPU_009641	Strongylocentrotus purpuratus-specific protein	none
SPU_009671	SPU_009671	probable assembly chimera	none
SPU_009672	SPU_009672	contains GltB domain	none
SPU_009744	SPU_009744	contains 2 C2 superfamily motifs	none
SPU_009784	SPU_009784	contains COG5222 domain	none
SPU_009787	SPU_009787	contains SMC_N domain	none
SPU_009811	SPU_009811	contains Dynein_heavy domain	none
SPU_009827	SPU_009827	contains 5307 domain	none
SPU_009875	SPU_009875	contains 4 KAZAL_FS superfamily motifs	none
SPU_009920	SPU_009920	Strongylocentrotus purpuratus-specific protein	none
SPU_009926	SPU_009926	Strongylocentrotus purpuratus-specific protein	none
SPU_009935	SPU_009935	contains AST1 domain	none
SPU_009938	SPU_009938	contains 2 FA58C superfamily motifs	none
SPU_009971	SPU_009971	contains COG4942 domain	none
SPU_009972	SPU_009972	contains COG1579 domain	none
SPU_009973	SPU_009973	contains COG1579 domain	none
SPU_009974	SPU_009974	contains SMC_N domain	none
SPU_010015	SPU_010015	Strongylocentrotus purpuratus-specific protein	none
SPU_010027	SPU_010027	Strongylocentrotus purpuratus-specific protein	none
SPU_010034	SPU_010034	contains 2 FA58C superfamily motifs	none
SPU_010047	SPU_010047	Strongylocentrotus purpuratus-specific protein	none
SPU_010061	SPU_010061	Strongylocentrotus purpuratus-specific protein	none
SPU_010088	SPU_010088	contains 2 MFS superfamily motifs	none
SPU_010092	SPU_010092	contains SMC_N domain	none
SPU_010142	SPU_010142	contains 2 MFS superfamily motifs	none
SPU_010169	SPU_010169	probable assembly chimera. Strongylocentrotus purpuratus-specific protein.	none
SPU_010194	SPU_010194	contains COG2433 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_010195	SPU_010195	Strongylocentrotus purpuratus-specific protein	none
SPU_010218	SPU_010218	Strongylocentrotus purpuratus-specific protein	none
SPU_010257	SPU_010257	contains 4 LDLa superfamily motifs and 2 CUB superfamily motifs	none
SPU_010272	SPU_010272	contains PRK09039 domain	none
SPU_010292	SPU_010292	contains COG5540 domain	none
SPU_010355	SPU_010355	Strongylocentrotus purpuratus-specific protein	none
SPU_010356	SPU_010356	Strongylocentrotus purpuratus-specific protein	none
SPU_010357	SPU_010357	Strongylocentrotus purpuratus-specific protein	none
SPU_010388	SPU_010388	Strongylocentrotus purpuratus-specific protein	none
SPU_010400	SPU_010400	contains UvrA domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_010401	SPU_010401	Strongylocentrotus purpuratus-specific protein	none
SPU_010414	SPU_010414	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_010434	SPU_010434	poor sequence data: ~70% of amino acids are X	none
SPU_010465	SPU_010465	contains 3 ANK superfamily motifs and Arp domain	none
SPU_010492	SPU_010492	contains SMC_N domain	none
SPU_010532	SPU_010532	contains 6 EGF_CA superfamily motifs	none
SPU_010566	SPU_010566	contains 2 PBPb superfamily motifs	none
SPU_010568	SPU_010568	contains 2 PBPb superfamily motifs	none
SPU_010586	SPU_010586	contains PRK00409 domain	none
SPU_010588	SPU_010588	contains SMC_N domain	none
SPU_010589	SPU_010589	matches only to itself	none
SPU_010594	SPU_010594	matches only to itself	none
SPU_010599	SPU_010599	Strongylocentrotus purpuratus-specific protein	none
SPU_010616	SPU_010616	contains FAA1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_010654	SPU_010654	contains RAD18 domain	none
SPU_010665	SPU_010665	contains PAT1 domain	none
SPU_010682	SPU_010682	probable assembly chimera	none
SPU_010709	SPU_010709	contains SMC_N domain	none
SPU_010781	SPU_010781	contains RfaG domain	none
SPU_010791	SPU_010791	contains PnbA domain	none
SPU_010817	SPU_010817	contains 5 ANK superfamily motifs. homologous to numerous ankyrin repeat proteins in Trichomonas vaginalis.	none
SPU_010818	SPU_010818	contains COG4886 domain	none
SPU_010837	SPU_010837	Strongylocentrotus purpuratus-specific protein	none
SPU_010895	SPU_010895	contains 3 EGF_CA superfamily motifs	none
SPU_010903	SPU_010903	Strongylocentrotus purpuratus-specific protein	none
SPU_010907	SPU_010907	contains 2 MFS superfamily motifs	none
SPU_010932	SPU_010932	Strongylocentrotus purpuratus-specific protein	none
SPU_010983	SPU_010983	contains MPH1 domain	none
SPU_010985	SPU_010985	contains 2 Kelch_1 superfamily motifs	none
SPU_010999	SPU_010999	Strongylocentrotus purpuratus-specific protein	none
SPU_011040	SPU_011040	Strongylocentrotus purpuratus-specific protein	none
SPU_011056	SPU_011056	Strongylocentrotus purpuratus-specific protein	none
SPU_011122	SPU_011122	contains DAO domain	none
SPU_011123	SPU_011123	Strongylocentrotus purpuratus-specific protein	none
SPU_011137	SPU_011137	contains 2 MFS superfamily motifs	none
SPU_011138	SPU_011138	probable assembly chimera	none
SPU_011165	SPU_011165	contains 2 BTB superfamily motifs	none
SPU_011169	SPU_011169	homologous to numerous putative Branchiostoma floridae proteins. Strongylocentrotus purpuratus-specific protein.	none
SPU_011170	SPU_011170	homologous to numerous putative Branchiostoma floridae proteins. Strongylocentrotus purpuratus-specific protein.	none
SPU_011178	SPU_011178	contains STT3 domain	none
SPU_011185	SPU_011185	contains Sulfotransfer_1 domain	none
SPU_011205	SPU_011205	contains 4 PDZ superfamily motifs	none
SPU_011207	SPU_011207	contains 3 ANK superfamily motifs and Arp domain	none
SPU_011236	SPU_011236	contains PnbA domain	none
SPU_011247	SPU_011247	Strongylocentrotus purpuratus-specific protein	none
SPU_011255	SPU_011255	contains Sulfotransfer_1 domain	none
SPU_011263	SPU_011263	contains 3 ANK superfamily motifs and Arp domain	none
SPU_011274	SPU_011274	contains 4 ANK superfamily motifs and Arp domain	none
SPU_011276	SPU_011276	Strongylocentrotus purpuratus-specific protein	none
SPU_011278	SPU_011278	contains Mnd1 domain	none
SPU_011322	SPU_011322	contains MDN1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_011324	SPU_011324	contains Smc domain. homologous only to a dozen putative S. purpuratus proteins. Strongylocentrotus purpuratus-specific protein.	none
SPU_011371	SPU_011371	contains infB domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_011377	SPU_011377	Strongylocentrotus purpuratus-specific protein	none
SPU_011387	SPU_011387	Strongylocentrotus purpuratus-specific protein	none
SPU_011392	SPU_011392	contains 2 PBPb superfamily motifs	none
SPU_011401	SPU_011401	Strongylocentrotus purpuratus-specific protein	none
SPU_011450	SPU_011450	contains 2 ANK superfamily motifs and Arp domain	none
SPU_011465	SPU_011465	contains Smc domain	none
SPU_011482	SPU_011482	Strongylocentrotus purpuratus-specific protein	none
SPU_011480	SPU_011480	Strongylocentrotus purpuratus-specific protein	none
SPU_011553	SPU_011553	Strongylocentrotus purpuratus-specific protein	none
SPU_011568	SPU_011568	contains Borrelia_P83 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_011577	SPU_011577	contains Adaptin_N domain	none
SPU_011578	SPU_011578	homologous only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_011586	SPU_011586	contains csdA domain	none
SPU_011630	SPU_011630	contains 2 RING superfamily motifs	none
SPU_011643	SPU_011643	contains RecQ domain	none
SPU_011728	SPU_011728	Strongylocentrotus purpuratus-specific protein	none
SPU_011763	SPU_011763	contains Sulfotransfer_1 domain	none
SPU_011795	SPU_011795	contains 2 ANK superfamily motifs and Arp domain	none
SPU_011802	SPU_011802	contains 4 FA58C superfamily motifs	none
SPU_011809	SPU_011809	probable assembly chimera	none
SPU_011810	SPU_011810	probable assembly chimera	none
SPU_011817	SPU_011817	contains 3 ANK superfamily motifs and Arp domain	none
SPU_011818	SPU_011818	contains 2 UBQ superfamily motifs	none
SPU_011820	SPU_011820	contains 2 CCP superfamily motifs	none
SPU_011835	SPU_011835	contains 6 Kelch_1 superfamily motifs	none
SPU_011850	SPU_011850	Strongylocentrotus purpuratus-specific protein	none
SPU_011852	SPU_011852	Strongylocentrotus purpuratus-specific protein	none
SPU_011854	SPU_011854	Strongylocentrotus purpuratus-specific protein	none
SPU_011858	SPU_011858	contains 2 EGF_CA superfamily motifs	none
SPU_011883	SPU_011883	contains 4 CUB superfamily motifs	none
SPU_011888	SPU_011888	contains 2 MAM superfamily motifs	none
SPU_011892	SPU_011892	contains 3 SCP superfamily motifs	none
SPU_011929	SPU_011929	Strongylocentrotus purpuratus-specific protein	none
SPU_011963	SPU_011963	contains SMC_N domain	none
SPU_011974	SPU_011974	Strongylocentrotus purpuratus-specific protein	none
SPU_011992	SPU_011992	contains 2 ANK superfamily motifs and Arp domain	none
SPU_012011	SPU_012011	contains 2 FA58C superfamily motifs	none
SPU_012019	SPU_012019	Strongylocentrotus purpuratus-specific protein	none
SPU_012038	SPU_012038	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_012062	SPU_012062	contains 3 ANK superfamily motifs and Arp domain	none
SPU_012064	SPU_012064	contains 3 EGF_CA superfamily motifs	none
SPU_012117	SPU_012117	Strongylocentrotus purpuratus-specific protein	none
SPU_012201	SPU_012201	contains COG3217 domain	none
SPU_012233	SPU_012233	contains 4 EGF_CA superfamily motifs	none
SPU_012249	SPU_012249	Strongylocentrotus purpuratus-specific protein	none
SPU_012263	SPU_012263	Strongylocentrotus purpuratus-specific protein	none
SPU_012266	SPU_012266	Strongylocentrotus purpuratus-specific protein	none
SPU_012280	SPU_012280	contains TRF4 domain	none
SPU_012305	SPU_012305	contains 3 IG superfamily motifs	none
SPU_012311	SPU_012311	Strongylocentrotus purpuratus-specific protein	none
SPU_012327	SPU_012327	contains Ion_trans domain	none
SPU_012336	SPU_012336	contains PRK11824 domain	none
SPU_012342	SPU_012342	contains Sulfotransfer_1 domain	none
SPU_012343	SPU_012343	contains Sulfotransfer_1 domain	none
SPU_012356	SPU_012356	contains BRO1 domain	none
SPU_012357	SPU_012357	contains COG5273 domain	none
SPU_012379	SPU_012379	contains 4 PLAT superfamily motifs	none
SPU_012391	SPU_012391	Strongylocentrotus purpuratus-specific protein	none
SPU_012418	SPU_012418	contains DYN1 domain and Dynein_heavy domain	none
SPU_012419	SPU_012419	contains DHC_N1 domain	none
SPU_012432	SPU_012432	contains 4 Ldl_recept_b superfamily motifs	none
SPU_012444	SPU_012444	contains RecQ domain	none
SPU_012468	SPU_012468	contains Sulfotransfer_1 domain	none
SPU_012481	SPU_012481	contains Spc7 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_012484	SPU_012484	contains Laminin_EGF domain	none
SPU_012486	SPU_012486	Strongylocentrotus purpuratus-specific protein	none
SPU_012502	SPU_012502	Strongylocentrotus purpuratus-specific protein	none
SPU_012601	SPU_012601	Strongylocentrotus purpuratus-specific protein	none
SPU_012603	SPU_012603	Strongylocentrotus purpuratus-specific protein	none
SPU_012613	SPU_012613	Strongylocentrotus purpuratus-specific protein	none
SPU_012641	SPU_012641	Strongylocentrotus purpuratus-specific protein	none
SPU_012660	SPU_012660	Strongylocentrotus purpuratus-specific protein	none
SPU_012666	SPU_012666	contains COG4886 domain	none
SPU_012691	SPU_012691	contains Smc domain	none
SPU_012697	SPU_012697	contains Smc domain	none
SPU_012726	SPU_012726	contains 2 RHOD superfamily motifs and SseA domain	none
SPU_012727	SPU_012727	contains 2 RHOD superfamily motifs and SseA domain	none
SPU_012752	SPU_012752	contains 2 LDLa superfamily motifs	none
SPU_012781	SPU_012781	Strongylocentrotus purpuratus-specific protein	none
SPU_012795	SPU_012795	contains NptA domain	none
SPU_012812	SPU_012812	contains DHC_N1 domain	none
SPU_012854	SPU_012854	contains PRK12704 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_012857	SPU_012857	contains Mon1 domain	none
SPU_012880	SPU_012880	contains Smc domain and COG3391 domain	none
SPU_012994	SPU_012994	contains Elp3 domain	none
SPU_012996	SPU_012996	contains 2 PH-like superfamily motifs	none
SPU_013043	SPU_013043	probable assembly chimera	none
SPU_013094	SPU_013094	probable assembly chimera	none
SPU_013110	SPU_013110	Strongylocentrotus purpuratus-specific protein	none
SPU_013114	SPU_013114	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_013120	SPU_013120	Strongylocentrotus purpuratus-specific protein	none
SPU_013137	SPU_013137	contains COG4886 domain	none
SPU_013143	SPU_013143	Strongylocentrotus purpuratus-specific protein	none
SPU_013150	SPU_013150	contains Sulfotransfer_1 domain	none
SPU_013175	SPU_013175	Strongylocentrotus purpuratus-specific protein	none
SPU_013196	SPU_013196	contains COG5222 domain	none
SPU_013211	SPU_013211	contains 3 ANK superfamily motifs and Arp domain	none
SPU_013229	SPU_013229	contains SMC_N domain	none
SPU_013293	SPU_013293	contains 2 CCP superfamily motifs and 2 HYR superfamily motifs	none
SPU_013300	SPU_013300	contains Smc domain	none
SPU_013327	SPU_013327	contains 4 HYR superfamily motifs	none
SPU_013367	SPU_013367	contains COG5222 domain	none
SPU_013411	SPU_013411	contains SMC_N domain	none
SPU_013417	SPU_013417	contains 2 FA58C superfamily motifs	none
SPU_013437	SPU_013437	contains flhF domain	none
SPU_013483	SPU_013483	homologous only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_013488	SPU_013488	contains Myosin_tail_1 domain	none
SPU_013500	SPU_013500	contains Vps16_N domain	none
SPU_013516	SPU_013516	contains RAD18 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_013545	SPU_013545	contains 3 Ldl_recept_b superfamily motifs	none
SPU_013554	SPU_013554	Strongylocentrotus purpuratus-specific protein	none
SPU_013555	SPU_013555	Strongylocentrotus purpuratus-specific protein	none
SPU_013556	SPU_013556	Strongylocentrotus purpuratus-specific protein	none
SPU_013558	SPU_013558	poor sequence data: ~40% of amino acids are X. Strongylocentrotus purpuratus-specific protein.	none
SPU_013559	SPU_013559	contains FAT domain	none
SPU_013588	SPU_013588	Strongylocentrotus purpuratus-specific protein	none
SPU_013594	SPU_013594	Strongylocentrotus purpuratus-specific protein	none
SPU_013600	SPU_013600	Strongylocentrotus purpuratus-specific protein	none
SPU_013652	SPU_013652	Strongylocentrotus purpuratus-specific protein	none
SPU_013665	SPU_013665	contains COG5635 domain	none
SPU_013684	SPU_013684	contains OpuAC domain	none
SPU_013695	SPU_013695	Strongylocentrotus purpuratus-specific protein	none
SPU_013723	SPU_013723	contains 2 CUB superfamily motifs	none
SPU_013753	SPU_013753	probable assembly chimera	none
SPU_013758	SPU_013758	contains 3 CCP superfamily motifs	none
SPU_013777	SPU_013777	Strongylocentrotus purpuratus-specific protein	none
SPU_013778	SPU_013778	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_013780	SPU_013780	Strongylocentrotus purpuratus-specific protein	none
SPU_013783	SPU_013783	Strongylocentrotus purpuratus-specific protein	none
SPU_013790	SPU_013790	Strongylocentrotus purpuratus-specific protein	none
SPU_013791	SPU_013791	Strongylocentrotus purpuratus-specific protein	none
SPU_013801	SPU_013801	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_013865	SPU_013865	Strongylocentrotus purpuratus-specific protein	none
SPU_013870	SPU_013870	probable assembly chimera	none
SPU_013877	SPU_013877	probable assembly chimera	none
SPU_013881	SPU_013881	Strongylocentrotus purpuratus-specific protein	none
SPU_013899	SPU_013899	contains PRK13042 domain	none
SPU_013956	SPU_013956	contains 5 Filamin superfamily motifs	none
SPU_013968	SPU_013968	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_013970	SPU_013970	Strongylocentrotus purpuratus-specific protein	none
SPU_013977	SPU_013977	Strongylocentrotus purpuratus-specific protein	none
SPU_013981	SPU_013981	contains COG5635 domain	none
SPU_013983	SPU_013983	contains 3 ANK superfamily motifs	none
SPU_013986	SPU_013986	contains Sulfotransfer_1 domain	none
SPU_014015	SPU_014015	contains OmpH domain	none
SPU_014022	SPU_014022	contains 3 ANK superfamily motifs	none
SPU_014030	SPU_014030	contains 2 WSC superfamily motifs	none
SPU_014050	SPU_014050	Strongylocentrotus purpuratus-specific protein	none
SPU_014070	SPU_014070	Strongylocentrotus purpuratus-specific protein	none
SPU_014124	SPU_014124	homologous to numerous putative proteins in Branchiostoma floridae and Nematostella vectensis	none
SPU_014147	SPU_014147	Strongylocentrotus purpuratus-specific protein	none
SPU_014149	SPU_014149	probable assembly chimera	none
SPU_014150	SPU_014150	contains 2 WSC superfamily motifs	none
SPU_014155	SPU_014155	Strongylocentrotus purpuratus-specific protein	none
SPU_014183	SPU_014183	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_014192	SPU_014192	contains 2 FA58C superfamily motifs	none
SPU_014225	SPU_014225	Strongylocentrotus purpuratus-specific protein	none
SPU_014302	SPU_014302	contains Sulfotransfer_1 domain	none
SPU_014304	SPU_014304	contains Sulfotransfer_1 domain	none
SPU_014327	SPU_014327	Strongylocentrotus purpuratus-specific protein	none
SPU_014340	SPU_014340	contains 2 CCP superfamily motifs	none
SPU_014349	SPU_014349	Strongylocentrotus purpuratus-specific protein	none
SPU_014378	SPU_014378	homologous to bacterial and plant proteins	none
SPU_014379	SPU_014379	Strongylocentrotus purpuratus-specific protein	none
SPU_014388	SPU_014388	contains 2 BTB superfamily motifs	none
SPU_014422	SPU_014422	contains 9 EGF_CA superfamily motifs	none
SPU_014443	SPU_014443	poor sequence data: ~40% of amino acids are X	none
SPU_014449	SPU_014449	contains Aes domain	none
SPU_014453	SPU_014453	probable assembly chimera	none
SPU_014459	SPU_014459	contains ATS1 domain	none
SPU_014464	SPU_014464	contains 2 TSP_1 superfamily motifs	none
SPU_014465	SPU_014465	contains 2 TSP_1 superfamily motifs	none
SPU_014499	SPU_014499	Strongylocentrotus purpuratus-specific protein	none
SPU_014500	SPU_014500	Strongylocentrotus purpuratus-specific protein	none
SPU_014524	SPU_014524	Strongylocentrotus purpuratus-specific protein	none
SPU_014556	SPU_014556	Strongylocentrotus purpuratus-specific protein	none
SPU_014581	SPU_014581	Strongylocentrotus purpuratus-specific protein	none
SPU_014592	SPU_014592	contains O-FucT domain	none
SPU_014616	SPU_014616	Strongylocentrotus purpuratus-specific protein	none
SPU_014635	SPU_014635	contains EURL domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_014640	SPU_014640	contains 2 NHL superfamily motifs	none
SPU_014651	SPU_014651	contains MDN1 domain	none
SPU_014682	SPU_014682	contains flgG domain	none
SPU_014699	SPU_014699	contains 4 ANK superfamily motifs	none
SPU_014794	SPU_014794	contains PRK12496 domain and COG1439 domain	none
SPU_014820	SPU_014820	Strongylocentrotus purpuratus-specific protein	none
SPU_014838	SPU_014838	Strongylocentrotus purpuratus-specific protein	none
SPU_014847	SPU_014847	contains 2 Ldl_recept_b superfamily motifs	none
SPU_014904	SPU_014904	contains 2 CCP superfamily motifs	none
SPU_014907	SPU_014907	homologous to bacterial and plant proteins	none
SPU_014912	SPU_014912	contains PnbA domain	none
SPU_014920	SPU_014920	contains 2 Mito_carr superfamily motifs and PTZ00169 domain	none
SPU_014964	SPU_014964	contains PAT1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_014973	SPU_014973	probable assembly chimera	none
SPU_014983	SPU_014983	contains 6 LDLa superfamily motifs	none
SPU_015000	SPU_015000	contains S-methyl_trans domain	none
SPU_015001	SPU_015001	contains 3 IG superfamily motifs and V-set domain	none
SPU_015045	SPU_015045	contains RAD18 domain	none
SPU_015057	SPU_015057	probable assembly chimera	none
SPU_015070	SPU_015070	contains EGF_Lam domain	none
SPU_015130	SPU_015130	Strongylocentrotus purpuratus-specific protein	none
SPU_015131	SPU_015131	Strongylocentrotus purpuratus-specific protein	none
SPU_015138	SPU_015138	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_015141	SPU_015141	contains MPH1 domain	none
SPU_015148	SPU_015148	Strongylocentrotus purpuratus-specific protein	none
SPU_015151	SPU_015151	contains 2 homeodomain superfamily motifs	none
SPU_015189	SPU_015189	contains 3 IG superfamily motifs	none
SPU_015204	SPU_015204	contains PRK05648 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_015220	SPU_015220	contains 2 BTB superfamily motifs	none
SPU_015225	SPU_015225	contains COG3264 domain	none
SPU_015239	SPU_015239	contains COG3391 domain	none
SPU_015268	SPU_015268	contains 2 MFS superfamily motifs	none
SPU_015275	SPU_015275	contains FRQ1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_015306	SPU_015306	contains DUF803 domain	none
SPU_015307	SPU_015307	contains CytochromB561_N domain	none
SPU_015366	SPU_015366	Strongylocentrotus purpuratus-specific protein	none
SPU_015367	SPU_015367	Strongylocentrotus purpuratus-specific protein	none
SPU_015393	SPU_015393	Strongylocentrotus purpuratus-specific protein	none
SPU_015400	SPU_015400	contains 3 ANK superfamily motifs and Arp domain	none
SPU_015410	SPU_015410	contains 2 CUB superfamily motifs	none
SPU_015413	SPU_015413	contains 2 EGF_CA superfamily motifs	none
SPU_015427	SPU_015427	contains 4 ANK superfamily motifs and Arp domain	none
SPU_015433	SPU_015433	contains TraB_pillus domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_015457	SPU_015457	Strongylocentrotus purpuratus-specific protein	none
SPU_015458	SPU_015458	contains 2A0113 domain	none
SPU_015493	SPU_015493	contains 2 EGF_A superfamily motifs	none
SPU_015496	SPU_015496	homologous to numerous putative proteins in Branchiostoma floridae	none
SPU_015501	SPU_015501	contains 5 ANK superfamily motifs and Arp domain	none
SPU_015523	SPU_015523	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_015537	SPU_015537	probable assembly chimera	none
SPU_015542	SPU_015542	contains RAB domain	none
SPU_015552	SPU_015552	probable assembly chimera	none
SPU_015556	SPU_015556	contains Adaptin_N domain	none
SPU_015585	SPU_015585	contains 4 IG superfamily motifs and V-set domain	none
SPU_015596	SPU_015596	contains 5 KAZAL_FS superfamily motifs	none
SPU_015597	SPU_015597	Strongylocentrotus purpuratus-specific protein	none
SPU_015636	SPU_015636	Strongylocentrotus purpuratus-specific protein	none
SPU_015649	SPU_015649	Strongylocentrotus purpuratus-specific protein	none
SPU_015656	SPU_015656	contains RhaT domain	none
SPU_015684	SPU_015684	probable assembly chimera	none
SPU_015693	SPU_015693	probable assembly chimera	none
SPU_015711	SPU_015711	contains 2A0113 domain	none
SPU_015764	SPU_015764	contains 2 ANK superfamily motifs and Arp domain	none
SPU_015773	SPU_015773	contains ArgS domain	none
SPU_015784	SPU_015784	probable assembly chimera	none
SPU_015785	SPU_015785	contains SrmB domain	none
SPU_015787	SPU_015787	Strongylocentrotus purpuratus-specific protein	none
SPU_015796	SPU_015796	contains RAD18 domain. homologous to numerous putative Branchiostoma floridae proteins.	none
SPU_015864	SPU_015864	probable assembly chimera	none
SPU_015874	SPU_015874	contains RVT_1 domain	none
SPU_015897	SPU_015897	contains 2 IG superfamily motifs and V-set domain	none
SPU_015903	SPU_015903	contains 3 IG superfamily motifs	none
SPU_015910	SPU_015910	probable assembly chimera	none
SPU_024827	SPU_024827	contains 2 Epimerase domains	none
SPU_015912	SPU_015912	contains Epimerase domain	none
SPU_015933	SPU_015933	probable assembly chimera	none
SPU_015944	SPU_015944	probable assembly chimera	none
SPU_015985	SPU_015985	contains LcrH_SycD domain	none
SPU_016000	SPU_016000	contains AIR1 domain	none
SPU_016002	SPU_016002	contains ABC_tran domain	none
SPU_016022	SPU_016022	Strongylocentrotus purpuratus-specific protein. probable assembly chimera.	none
SPU_016040	SPU_016040	contains 4 PDZ superfamily motifs	none
SPU_016051	SPU_016051	contains COG4886 domain	none
SPU_016064	SPU_016064	contains SMC_prok_B domain	none
SPU_016071	SPU_016071	contains COG5635 domain. homologous to numerous putative Branchiostoma floridae proteins.	none
SPU_016106	SPU_016106	contains 3 ANK superfamily motifs	none
SPU_016112	SPU_016112	contains UDPGT domain	none
SPU_016127	SPU_016127	contains LIC domain	none
SPU_016155	SPU_016155	contains sulP domain	none
SPU_016160	SPU_016160	contains 4 IG superfamily motifs	none
SPU_016180	SPU_016180	contains 2 Ldl_recept_b superfamily motifs	none
SPU_016184	SPU_016184	contains MFS_1 domain	none
SPU_016185	SPU_016185	contains 2 MFS superfamily motifs and 2A0119 domain	none
SPU_016192	SPU_016192	contains 2A0113 domain	none
SPU_016197	SPU_016197	contains 2A0113 domain	none
SPU_016202	SPU_016202	Strongylocentrotus purpuratus-specific protein. probable assembly chimera.	none
SPU_016235	SPU_016235	contains EzrA domain	none
SPU_016242	SPU_016242	contains 2 PSI superfamily motifs	none
SPU_016245	SPU_016245	contains Dcp domain	none
SPU_016246	SPU_016246	Strongylocentrotus purpuratus-specific protein	none
SPU_016248	SPU_016248	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_016290	SPU_016290	Strongylocentrotus purpuratus-specific protein	none
SPU_016336	SPU_016336	contains PtrB domain	none
SPU_016348	SPU_016348	contains 2 BTB superfamily motifs	none
SPU_016353	SPU_016353	Strongylocentrotus purpuratus-specific protein	none
SPU_016355	SPU_016355	contains COG5141 domain	none
SPU_016405	SPU_016405	contains 2 CUB superfamily motifs	none
SPU_016416	SPU_016416	Strongylocentrotus purpuratus-specific protein	none
SPU_016440	SPU_016440	contains 2 EGF_CA superfamily motifs	none
SPU_016454	SPU_016454	contains 2 ANK superfamily motifs	none
SPU_016474	SPU_016474	contains SMC_prok_A domain	none
SPU_016516	SPU_016516	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_016535	SPU_016535	contains 2 FA58C superfamily motifs	none
SPU_016541	SPU_016541	Strongylocentrotus purpuratus-specific protein	none
SPU_016543	SPU_016543	contains 2A0119 domain	none
SPU_016557	SPU_016557	probable assembly chimera	none
SPU_016566	SPU_016566	contains 2A0113 domain	none
SPU_016690	SPU_016690	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_016692	SPU_016692	Strongylocentrotus purpuratus-specific protein	none
SPU_016704	SPU_016704	Strongylocentrotus purpuratus-specific protein	none
SPU_016710	SPU_016710	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_016719	SPU_016719	Strongylocentrotus purpuratus-specific protein	none
SPU_016721	SPU_016721	contains 2 MFS superfamily motifs	none
SPU_016724	SPU_016724	contains 2A0113 domain	none
SPU_016737	SPU_016737	Strongylocentrotus purpuratus-specific protein	none
SPU_016750	SPU_016750	contains 3 ANK superfamily motifs and Arp domain	none
SPU_016751	SPU_016751	Strongylocentrotus purpuratus-specific protein	none
SPU_016756	SPU_016756	contains PTZ00322 domain	none
SPU_016765	SPU_016765	Strongylocentrotus purpuratus-specific protein	none
SPU_016776	SPU_016776	Strongylocentrotus purpuratus-specific protein	none
SPU_016778	SPU_016778	contains 3 ANK superfamily motifs	none
SPU_016781	SPU_016781	contains S_TKc domain	none
SPU_016782	SPU_016782	contains PcnB domain	none
SPU_016797	SPU_016797	contains TIGR00376 domain	none
SPU_016855	SPU_016855	Strongylocentrotus purpuratus-specific protein	none
SPU_016871	SPU_016871	contains SMC_prok_A domain	none
SPU_016877	SPU_016877	contains 2 WW superfamily motifs	none
SPU_016889	SPU_016889	contains 3 IG superfamily motifs	none
SPU_016891	SPU_016891	Strongylocentrotus purpuratus-specific protein	none
SPU_016893	SPU_016893	contains DUF607 domain	none
SPU_016911	SPU_016911	contains 2A0113 domain	none
SPU_016915	SPU_016915	Strongylocentrotus purpuratus-specific protein	none
SPU_016923	SPU_016923	Strongylocentrotus purpuratus-specific protein. matches only to itself. probable assembly chimera.	none
SPU_016931	SPU_016931	contains COG5540 domain	none
SPU_016960	SPU_016960	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_016966	SPU_016966	Strongylocentrotus purpuratus-specific protein	none
SPU_016979	SPU_016979	Strongylocentrotus purpuratus-specific protein	none
SPU_017001	SPU_017001	contains PRK00409 domain	none
SPU_017012	SPU_017012	Strongylocentrotus purpuratus-specific protein	none
SPU_017023	SPU_017023	contains 2 IG superfamily motifs	none
SPU_017027	SPU_017027	contains Gag_spuma domain	none
SPU_017057	SPU_017057	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_017060	SPU_017060	Strongylocentrotus purpuratus-specific protein	none
SPU_017080	SPU_017080	contains SMC_prok_A domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_017100	SPU_017100	contains PEP_TPR_lipo domain	none
SPU_017101	SPU_017101	contains 4 ANK superfamily motifs and Arp domain	none
SPU_017105	SPU_017105	contains 2 Na_Pi_cotrans superfamily motifs and 2a58 domain	none
SPU_017151	SPU_017151	contains 2A1904 domain	none
SPU_017182	SPU_017182	Strongylocentrotus purpuratus-specific protein. probable assembly chimera.	none
SPU_017190	SPU_017190	contains SMC_prok_B domain	none
SPU_017191	SPU_017191	probable assembly chimera	none
SPU_017199	SPU_017199	homologous to numerous putative zebrafish proteins	none
SPU_017203	SPU_017203	Strongylocentrotus purpuratus-specific protein	none
SPU_017248	SPU_017248	contains PRK05771 domain	none
SPU_017270	SPU_017270	contains SMC_prok_B domain	none
SPU_017290	SPU_017290	contains Sulfotransfer_1 domain	none
SPU_017301	SPU_017301	contains COG4886 domain	none
SPU_017317	SPU_017317	contains 2 MFS superfamily motifs	none
SPU_017324	SPU_017324	contains 2 CCP superfamily motifs	none
SPU_017337	SPU_017337	contains 2A0113 domain	none
SPU_017362	SPU_017362	contains 2 CUB superfamily motifs	none
SPU_017397	SPU_017397	contains 2 RF-1 superfamily motifs	none
SPU_017409	SPU_017409	contains 2 MFS superfamily motifs	none
SPU_017454	SPU_017454	contains PRK10263 domain	none
SPU_017458	SPU_017458	contains NmrA domain	none
SPU_017459	SPU_017459	contains ERG8 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_017464	SPU_017464	contains NmrA domain	none
SPU_017486	SPU_017486	probable assembly chimera	none
SPU_017489	SPU_017489	probable assembly chimera	none
SPU_017494	SPU_017494	contains SMC_prok_B domain	none
SPU_017516	SPU_017516	contains ABC_tran domain	none
SPU_017552	SPU_017552	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_017570	SPU_017570	contains PRK05035 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_017587	SPU_017587	Strongylocentrotus purpuratus-specific protein	none
SPU_017598	SPU_017598	contains UDPGT domain	none
SPU_017625	SPU_017625	contains 2A1904 domain	none
SPU_017631	SPU_017631	Strongylocentrotus purpuratus-specific protein	none
SPU_017691	SPU_017691	contains Fucokinase domain	none
SPU_017697	SPU_017697	probable assembly chimera	none
SPU_017699	SPU_017699	contains SMC_N domain	none
SPU_017743	SPU_017743	probable assembly chimera	none
SPU_017744	SPU_017744	contains 2A0113 domain	none
SPU_017770	SPU_017770	contains AcAcCoA_reduct domain. poor sequence data: ~25% of amino acids are X.	none
SPU_017824	SPU_017824	contains 3 ANK superfamily motifs and Arp domain	none
SPU_017835	SPU_017835	poor sequence data: ~45% of amino acids are X.	none
SPU_017909	SPU_017909	contains NagC domain	none
SPU_017941	SPU_017941	Strongylocentrotus purpuratus-specific protein	none
SPU_017967	SPU_017967	contains MFS_1 domain	none
SPU_017999	SPU_017999	contains PRK02287 domain	none
SPU_018008	SPU_018008	contains COG5152 domain	none
SPU_018032	SPU_018032	Strongylocentrotus purpuratus-specific protein	none
SPU_018033	SPU_018033	Strongylocentrotus purpuratus-specific protein	none
SPU_018037	SPU_018037	Strongylocentrotus purpuratus-specific protein	none
SPU_018054	SPU_018054	contains 2 Calx-beta superfamily motifs	none
SPU_018086	SPU_018086	contains IF-2B domain	none
SPU_018104	SPU_018104	contains 3 ANK superfamily motifs and Arp domain	none
SPU_018105	SPU_018105	contains 4 ANK superfamily motifs and Arp domain	none
SPU_018108	SPU_018108	probable assembly chimera	none
SPU_018122	SPU_018122	probable assembly chimera	none
SPU_018124	SPU_018124	probable assembly chimera	none
SPU_018145	SPU_018145	contains PRK03992 domain	none
SPU_018160	SPU_018160	contains 6 CCP superfamily motifs	none
SPU_018165	SPU_018165	Strongylocentrotus purpuratus-specific protein	none
SPU_018169	SPU_018169	contains PRK05431 domain. probable assembly chimera.	none
SPU_018176	SPU_018176	contains SMC_prok_B domain	none
SPU_018178	SPU_018178	Strongylocentrotus purpuratus-specific protein	none
SPU_018186	SPU_018186	contains COG3217 domain	none
SPU_018224	SPU_018224	contains DadA domain	none
SPU_018233	SPU_018233	contains 2 TPR superfamily motifs	none
SPU_018235	SPU_018235	contains 2 CUB superfamily motifs	none
SPU_018241	SPU_018241	contains 2 MFS superfamily motifs	none
SPU_018260	SPU_018260	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_018263	SPU_018263	Strongylocentrotus purpuratus-specific protein	none
SPU_018284	SPU_018284	Strongylocentrotus purpuratus-specific protein	none
SPU_018308	SPU_018308	contains mTERF domain	none
SPU_018327	SPU_018327	contains RhaT domain	none
SPU_018337	SPU_018337	contains SMC_prok_B domain	none
SPU_018350	SPU_018350	contains COG5028 domain	none
SPU_018365	SPU_018365	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_018383	SPU_018383	contains LdhA domain	none
SPU_018386	SPU_018386	contains mutS2 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_018389	SPU_018389	Strongylocentrotus purpuratus-specific protein	none
SPU_018402	SPU_018402	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_018405	SPU_018405	Strongylocentrotus purpuratus-specific protein	none
SPU_018422	SPU_018422	probable assembly chimera	none
SPU_018459	SPU_018459	contains rne domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_018463	SPU_018463	contains COG4886 domain	none
SPU_018486	SPU_018486	contains Sulfotransfer_1 domain	none
SPU_018477	SPU_018477	contains 4 EGF_CA superfamily motifs	none
SPU_018487	SPU_018487	contains Sulfotransfer_1 domain	none
SPU_018494	SPU_018494	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_018561	SPU_018561	Strongylocentrotus purpuratus-specific protein	none
SPU_018562	SPU_018562	Strongylocentrotus purpuratus-specific protein	none
SPU_018593	SPU_018593	contains sbcB domain	none
SPU_018622	SPU_018622	probable assembly chimera	none
SPU_018628	SPU_018628	Strongylocentrotus purpuratus-specific protein	none
SPU_018646	SPU_018646	contains PABP-1234 domain	none
SPU_018698	SPU_018698	contains 3 EGF_CA superfamily motifs	none
SPU_018706	SPU_018706	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_018707	SPU_018707	Strongylocentrotus purpuratus-specific protein. probable assembly chimera.	none
SPU_018714	SPU_018714	Strongylocentrotus purpuratus-specific protein	none
SPU_018734	SPU_018734	Strongylocentrotus purpuratus-specific protein. probable assembly chimera.	none
SPU_018756	SPU_018756	Strongylocentrotus purpuratus-specific protein	none
SPU_018762	SPU_018762	contains 2 MFS superfamily motifs and 2A0115 domain	none
SPU_018785	SPU_018785	contains RhaT domain	none
SPU_018788	SPU_018788	contains COG1565 domain	none
SPU_018798	SPU_018798	contains 2 EGF_CA superfamily motifs	none
SPU_018808	SPU_018808	contains SpoIIIAA domain	none
SPU_018827	SPU_018827	contains 2 LamG superfamily motifs	none
SPU_006534	SPU_006534	contains 7 HYR superfamily motifs. no good homologs are found in other species. member of highly S. purpuratus-specific protein group.	none
SPU_018863	SPU_018863	contains 4 HYR superfamily motifs	none
SPU_018874	SPU_018874	contains Sulfotransfer_1 domain	none
SPU_018876	SPU_018876	contains Sulfotransfer_1 domain	none
SPU_018919	SPU_018919	contains 2A1904 domain	none
SPU_018937	SPU_018937	Strongylocentrotus purpuratus-specific protein	none
SPU_018970	SPU_018970	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_018986	SPU_018986	contains PRK11281 domain	none
SPU_018989	SPU_018989	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_019000	SPU_019000	contains DYN1 domain	none
SPU_019013	SPU_019013	Strongylocentrotus purpuratus-specific protein	none
SPU_019047	SPU_019047	contains soxA_mon domain	none
SPU_019048	SPU_019048	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_019057	SPU_019057	contains NmrA domain	none
SPU_019069	SPU_019069	poor sequence data: ~65% of amino acids are X.	none
SPU_019080	SPU_019080	contains Ins145_P3_rec domain and MIR domain	none
SPU_019110	SPU_019110	probable assembly chimera	none
SPU_019121	SPU_019121	contains 4 EGF_CA superfamily motifs	none
SPU_019167	SPU_019167	contains Glyco_hydro_59 domain	none
SPU_019168	SPU_019168	contains CLH domain	none
SPU_019183	SPU_019183	contains 2A1904 domain	none
SPU_019193	SPU_019193	contains 2 MAM superfamily motifs	none
SPU_019217	SPU_019217	Strongylocentrotus purpuratus-specific protein	none
SPU_019236	SPU_019236	matches only to S. purpuratus proteins. Strongylocentrotus purpuratus-specific protein.	none
SPU_019254	SPU_019254	contains 5 S1-like superfamily motifs	none
SPU_019270	SPU_019270	contains trp domain	none
SPU_019279	SPU_019279	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_019289	SPU_019289	contains SMC_prok_B domain	none
SPU_019319	SPU_019319	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_019320	SPU_019320	contains SrmB domain	none
SPU_019335	SPU_019335	contains 2 MAM superfamily motifs	none
SPU_019354	SPU_019354	contains 5 LDLa superfamily motifs and 2 CUB superfamily motifs	none
SPU_019377	SPU_019377	contains PRK05771 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_019395	SPU_019395	Strongylocentrotus purpuratus-specific protein	none
SPU_019410	SPU_019410	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_019411	SPU_019411	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_019448	SPU_019448	contains Pyr_redox_2 domain. poor protein sequence data: ~40% of amino acids at C-terminus are X.	none
SPU_019452	SPU_019452	contains 2 IG superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_019455	SPU_019455	probable assembly chimera	none
SPU_019463	SPU_019463	contains MDN1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_019513	SPU_019513	contains 2A1904 domain	none
SPU_019531	SPU_019531	contains SMC_N domain	none
SPU_019573	SPU_019573	Strongylocentrotus purpuratus-specific protein	none
SPU_019600	SPU_019600	contains 2 A2M_N domain motifs	none
SPU_019608	SPU_019608	contains Man-6-P_recept domain	none
SPU_019635	SPU_019635	Strongylocentrotus purpuratus-specific protein	none
SPU_019662	SPU_019662	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_019670	SPU_019670	contains 6 EGF_CA superfamily motifs	none
SPU_019687	SPU_019687	contains COG4886 domain and PCC domain	none
SPU_019691	SPU_019691	contains COG2268 domain	none
SPU_019697	SPU_019697	contains 2 HYR superfamily motifs	none
SPU_019741	SPU_019741	contains 2 IG superfamily motifs	none
SPU_019749	SPU_019749	contains 3 Kelch_1 superfamily motifs	none
SPU_019750	SPU_019750	contains c_cpa1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_019753	SPU_019753	Strongylocentrotus purpuratus-specific protein	none
SPU_019760	SPU_019760	contains COG4886 domain	none
SPU_019776	SPU_019776	contains CUS1 domain	none
SPU_019787	SPU_019787	contains Ion_trans domain	none
SPU_019796	SPU_019796	Strongylocentrotus purpuratus-specific protein	none
SPU_019797	SPU_019797	Strongylocentrotus purpuratus-specific protein	none
SPU_019800	SPU_019800	Strongylocentrotus purpuratus-specific protein	none
SPU_019806	SPU_019806	Strongylocentrotus purpuratus-specific protein	none
SPU_019812	SPU_019812	contains Dynein_heavy domain	none
SPU_019841	SPU_019841	Strongylocentrotus purpuratus-specific protein	none
SPU_019858	SPU_019858	contains PAT1 domain	none
SPU_019877	SPU_019877	probable assembly chimera	none
SPU_019901	SPU_019901	contains SMC_prok_A domain	none
SPU_019924	SPU_019924	contains 3 ANK domain motifs. homologous to numerous putative Trichomonas vaginalis proteins.	none
SPU_019927	SPU_019927	contains MIP-T3 domain	none
SPU_019929	SPU_019929	contains 3 Thioredoxin-like superfamily motifs	none
SPU_019955	SPU_019955	contains 2 EFh superfamily motifs	none
SPU_019964	SPU_019964	contains 2A0113 domain	none
SPU_019965	SPU_019965	contains 2A0113 domain	none
SPU_019966	SPU_019966	contains 2A0113 domain	none
SPU_020043	SPU_020043	contains 2 IG superfamily motifs	none
SPU_020093	SPU_020093	Strongylocentrotus purpuratus-specific protein	none
SPU_020094	SPU_020094	contains Sulfotransfer_1 domain	none
SPU_020101	SPU_020101	Strongylocentrotus purpuratus-specific protein	none
SPU_020106	SPU_020106	Strongylocentrotus purpuratus-specific protein	none
SPU_020109	SPU_020109	poor sequence data: ~40% of amino acids are X	none
SPU_020114	SPU_020114	Strongylocentrotus purpuratus-specific protein	none
SPU_020137	SPU_020137	Strongylocentrotus purpuratus-specific protein	none
SPU_020178	SPU_020178	contains 4 ANK superfamily motifs, Arp domains, PTZ00322 domains, and trp domains	none
SPU_020187	SPU_020187	Strongylocentrotus purpuratus-specific protein	none
SPU_020193	SPU_020193	Strongylocentrotus purpuratus-specific protein	none
SPU_020203	SPU_020203	contains 6 EGF_CA superfamily motifs	none
SPU_020208	SPU_020208	Strongylocentrotus purpuratus-specific protein	none
SPU_020222	SPU_020222	contains BET4 domain	none
SPU_020230	SPU_020230	contains TIGR00376 domain. probable assembly chimera.	none
SPU_020241	SPU_020241	Strongylocentrotus purpuratus-specific protein	none
SPU_020298	SPU_020298	Strongylocentrotus purpuratus-specific protein	none
SPU_020301	SPU_020301	contains PRK07768 domain	none
SPU_020303	SPU_020303	contains Sugar_transport domain	none
SPU_020319	SPU_020319	contains 2 MFS superfamily motifs	none
SPU_020323	SPU_020323	contains 2 MFS superfamily motifs and 2A0114 domain	none
SPU_020353	SPU_020353	contains PRK03918 domain	none
SPU_020354	SPU_020354	contains 2 TSP1 superfamily motifs	none
SPU_020355	SPU_020355	Strongylocentrotus purpuratus-specific protein	none
SPU_020367	SPU_020367	probable assembly chimera	none
SPU_020389	SPU_020389	contains Dynein_heavy domain	none
SPU_020420	SPU_020420	contains 3 CUB superfamily motifs	none
SPU_020500	SPU_020500	contains HSP70 domain. probable assembly chimera.	none
SPU_020515	SPU_020515	contains Sulfotransfer_1 domain	none
SPU_020516	SPU_020516	contains LdhA domain	none
SPU_020520	SPU_020520	contains 2A0113 domain	none
SPU_020538	SPU_020538	contains 4 CCP superfamily motifs	none
SPU_020548	SPU_020548	contains COG5236 domain. probable assembly chimera.	none
SPU_020559	SPU_020559	contains RVT_1 domain	none
SPU_020562	SPU_020562	contains NarG domain	none
SPU_020579	SPU_020579	contains CynX domain	none
SPU_020582	SPU_020582	Strongylocentrotus purpuratus-specific protein	none
SPU_020613	SPU_020613	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_020630	SPU_020630	contains 3 ANK superfamily motifs	none
SPU_020655	SPU_020655	Strongylocentrotus purpuratus-specific protein	none
SPU_020681	SPU_020681	contains 2 IG superfamily motifs	none
SPU_020705	SPU_020705	contains 2 ANK superfamily motifs and Arp domains	none
SPU_020711	SPU_020711	contains hCaCC domain. probable assembly chimera.	none
SPU_020713	SPU_020713	contains SMC_prok_B domain	none
SPU_020715	SPU_020715	contains SMC_prok_B domain	none
SPU_020725	SPU_020725	contains Sulfotransfer_1 domain	none
SPU_020791	SPU_020791	contains COG4886 domain	none
SPU_020796	SPU_020796	probable assembly chimera	none
SPU_020811	SPU_020811	contains 3 ANK superfamily motifs and Arp domains	none
SPU_020823	SPU_020823	contains 4 ANK superfamily motifs and Arp domains	none
SPU_020826	SPU_020826	contains Dynein_heavy domain	none
SPU_020830	SPU_020830	probable assembly chimera	none
SPU_020857	SPU_020857	contains 3 ANK superfamily motifs and Arp domains	none
SPU_020861	SPU_020861	contains Sulfotransfer_1 domain	none
SPU_020890	SPU_020890	contains RAD18 domain	none
SPU_020891	SPU_020891	contains Smc domain	none
SPU_020977	SPU_020977	Strongylocentrotus purpuratus-specific protein	none
SPU_021017	SPU_021017	probable assembly chimera	none
SPU_021031	SPU_021031	contains 2 IG superfamily motifs	none
SPU_021033	SPU_021033	contains PnbA domain	none
SPU_021048	SPU_021048	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_021049	SPU_021049	Strongylocentrotus purpuratus-specific protein	none
SPU_021064	SPU_021064	contains RhaT domain	none
SPU_021091	SPU_021091	contains 2 HYR superfamily motifs	none
SPU_021100	SPU_021100	contains 2 ANK superfamily motifs and PTZ00322 domain	none
SPU_021108	SPU_021108	Strongylocentrotus purpuratus-specific protein	none
SPU_021150	SPU_021150	probable assembly chimera	none
SPU_021189	SPU_021189	contains WecE domain	none
SPU_021323	SPU_021323	contains PRK03427 domain and PRK08853 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_021334	SPU_021334	poor protein sequence data: ~70% of the amino acids are X	none
SPU_021339	SPU_021339	Strongylocentrotus purpuratus-specific protein	none
SPU_021346	SPU_021346	contains 8 EGF_CA superfamily motifs	none
SPU_021361	SPU_021361	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_021365	SPU_021365	contains infB domain	none
SPU_021376	SPU_021376	contains RimI domain	none
SPU_021386	SPU_021386	Strongylocentrotus purpuratus-specific protein	none
SPU_021396	SPU_021396	Strongylocentrotus purpuratus-specific protein	none
SPU_021400	SPU_021400	contains RVT_1 domain	none
SPU_021401	SPU_021401	contains 6 Kelch_1 superfamily motifs	none
SPU_021434	SPU_021434	contains COG5222 domain	none
SPU_021451	SPU_021451	Strongylocentrotus purpuratus-specific protein	none
SPU_021482	SPU_021482	contains YidC domain	none
SPU_021485	SPU_021485	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_021489	SPU_021489	Strongylocentrotus purpuratus-specific protein. probable assembly chimera.	none
SPU_021511	SPU_021511	contains 3 EFh superfamily motifs and 2 PTZ00184 domain motifs	none
SPU_021513	SPU_021513	contains 3 EGF_CA superfamily motifs	none
SPU_021518	SPU_021518	Strongylocentrotus purpuratus-specific protein. probable assembly chimera.	none
SPU_021542	SPU_021542	probable assembly chimera	none
SPU_021551	SPU_021551	Strongylocentrotus purpuratus-specific protein	none
SPU_021570	SPU_021570	contains 2 Kelch_1 superfamily motifs and COG3055 domain	none
SPU_021574	SPU_021574	Strongylocentrotus purpuratus-specific protein	none
SPU_021575	SPU_021575	Strongylocentrotus purpuratus-specific protein	none
SPU_021576	SPU_021576	Strongylocentrotus purpuratus-specific protein	none
SPU_021578	SPU_021578	Strongylocentrotus purpuratus-specific protein	none
SPU_021584	SPU_021584	probable assembly chimera	none
SPU_021607	SPU_021607	contains UbiG domain	none
SPU_021611	SPU_021611	poor sequence data: ~60% of amino acids are X	none
SPU_021660	SPU_021660	contains PepP domain	none
SPU_021684	SPU_021684	contains PRK12678 domain	none
SPU_021687	SPU_021687	contains 2 P_loop_NTPase superfamily motifs and cas3_core domain	none
SPU_021712	SPU_021712	probable assembly chimera	none
SPU_021722	SPU_021722	Strongylocentrotus purpuratus-specific protein	none
SPU_021746	SPU_021746	contains 2 IG superfamily motifs	none
SPU_021748	SPU_021748	contains 4 IG superfamily motifs	none
SPU_021750	SPU_021750	contains 2 FNR-like superfamily motifs	none
SPU_021763	SPU_021763	Strongylocentrotus purpuratus-specific protein	none
SPU_021768	SPU_021768	Strongylocentrotus purpuratus-specific protein	none
SPU_021855	SPU_021855	Strongylocentrotus purpuratus-specific protein. probable assembly chimera.	none
SPU_021872	SPU_021872	Strongylocentrotus purpuratus-specific protein	none
SPU_021880	SPU_021880	Strongylocentrotus purpuratus-specific protein	none
SPU_021905	SPU_021905	contains 2 IG superfamily motifs	none
SPU_021912	SPU_021912	contains matE domain	none
SPU_021929	SPU_021929	Strongylocentrotus purpuratus-specific protein	none
SPU_021938	SPU_021938	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_021939	SPU_021939	contains Sulfotransfer_1 domain	none
SPU_021960	SPU_021960	contains DHC_N1 domain	none
SPU_022025	SPU_022025	Strongylocentrotus purpuratus-specific protein	none
SPU_022027	SPU_022027	Strongylocentrotus purpuratus-specific protein	none
SPU_022044	SPU_022044	contains 2 FA58C superfamily motifs	none
SPU_022077	SPU_022077	Strongylocentrotus purpuratus-specific protein	none
SPU_022139	SPU_022139	Strongylocentrotus purpuratus-specific protein	none
SPU_022196	SPU_022196	Strongylocentrotus purpuratus-specific protein	none
SPU_022198	SPU_022198	Strongylocentrotus purpuratus-specific protein	none
SPU_022205	SPU_022205	Strongylocentrotus purpuratus-specific protein. probable assembly chimera.	none
SPU_022229	SPU_022229	poor sequence data: ~65% of amino acids are X	none
SPU_022241	SPU_022241	contains 2 PLDc superfamily motifs	none
SPU_022303	SPU_022303	contains Sulfotransfer_1 domain	none
SPU_022312	SPU_022312	contains Sulfotransfer_1 domain	none
SPU_022313	SPU_022313	contains Sulfotransfer_1 domain	none
SPU_022329	SPU_022329	contains Smc domain	none
SPU_022331	SPU_022331	contains RAD18 domain, PRK09603 domain, and SGL domain	none
SPU_022358	SPU_022358	contains COG5026 domain	none
SPU_017090	SPU_017090	also homologous to a wide variety of FAD-dependent oxidoreductases in bacteria and fungi	none
SPU_022379	SPU_022379	contains 2 FA58C superfamily motifs	none
SPU_022392	SPU_022392	contains DUF1898 domain	none
SPU_022418	SPU_022418	contains 2 EGF_CA superfamily motifs	none
SPU_022454	SPU_022454	contains 3 CUB superfamily motifs	none
SPU_022466	SPU_022466	Strongylocentrotus purpuratus-specific protein	none
SPU_022508	SPU_022508	contains Sulfotransfer_1 domain	none
SPU_022517	SPU_022517	contains 3 ANK superfamily motifs	none
SPU_022520	SPU_022520	contains 2 BTB superfamily motifs	none
SPU_022521	SPU_022521	contains 2 BTB superfamily motifs	none
SPU_022540	SPU_022540	probable assembly chimera	none
SPU_022545	SPU_022545	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_022550	SPU_022550	Strongylocentrotus purpuratus-specific protein	none
SPU_022591	SPU_022591	probable assembly chimera	none
SPU_022600	SPU_022600	contains MetG superfamily motif. Strongylocentrotus purpuratus-specific protein.	none
SPU_022610	SPU_022610	contains Pkinase domain	none
SPU_022631	SPU_022631	Strongylocentrotus purpuratus-specific protein	none
SPU_022645	SPU_022645	Strongylocentrotus purpuratus-specific protein	none
SPU_022656	SPU_022656	contains DYN1 domain	none
SPU_022738	SPU_022738	Strongylocentrotus purpuratus-specific protein	none
SPU_022739	SPU_022739	contains 3 CCP superfamily motifs	none
SPU_022747	SPU_022747	Strongylocentrotus purpuratus-specific protein	none
SPU_022758	SPU_022758	contains SGL domain	none
SPU_022782	SPU_022782	contains recD_rel domain	none
SPU_022789	SPU_022789	contains COG4886 domain	none
SPU_022803	SPU_022803	Strongylocentrotus purpuratus-specific protein	none
SPU_022812	SPU_022812	contains SmtA domain	none
SPU_022856	SPU_022856	contains 2 SANT superfamily motifs	none
SPU_022862	SPU_022862	Strongylocentrotus purpuratus-specific protein	none
SPU_022881	SPU_022881	contains 2 LDLa superfamily motifs	none
SPU_022886	SPU_022886	contains 2 HYR superfamily motifs	none
SPU_022927	SPU_022927	contains SMC_N domain and SbcC domain	none
SPU_022930	SPU_022930	contains 2 DUF1900 superfamily motifs	none
SPU_022931	SPU_022931	contains 2 ANK superfamily motifs	none
SPU_022945	SPU_022945	Strongylocentrotus purpuratus-specific protein	none
SPU_022987	SPU_022987	contains tpt domain	none
SPU_022990	SPU_022990	contains tpt domain	none
SPU_022999	SPU_022999	contains 2 BTB superfamily motifs	none
SPU_023003	SPU_023003	contains HcaE domain	none
SPU_023006	SPU_023006	Strongylocentrotus purpuratus-specific protein	none
SPU_023010	SPU_023010	contains pol2 domain	none
SPU_023051	SPU_023051	contains SMC_prok_A domain	none
SPU_023089	SPU_023089	contains 2 MFS superfamily motifs	none
SPU_023110	SPU_023110	contains COG7 domain	none
SPU_023159	SPU_023159	contains TadD domain	none
SPU_023167	SPU_023167	contains SMC_prok_A domain	none
SPU_023171	SPU_023171	contains SMC_prok_B domain	none
SPU_023199	SPU_023199	contains 2 Fascin superfamily motifs	none
SPU_023243	SPU_023243	contains 2 IG superfamily motifs	none
SPU_023244	SPU_023244	Strongylocentrotus purpuratus-specific protein	none
SPU_023256	SPU_023256	contains SMC_N domain	none
SPU_023272	SPU_023272	contains 2 MFS superfamily motifs	none
SPU_023273	SPU_023273	contains 2 MFS superfamily motifs	none
SPU_023284	SPU_023284	poor sequence data: ~40% of amino acids are X. Strongylocentrotus purpuratus-specific protein.	none
SPU_023292	SPU_023292	contains RAD18 domain	none
SPU_023306	SPU_023306	contains PRK10263 domain	none
SPU_023326	SPU_023326	probable assembly chimera	none
SPU_023346	SPU_023346	contains 2 E2F_TDP superfamily motifs	none
SPU_023405	SPU_023405	Strongylocentrotus purpuratus-specific protein	none
SPU_023449	SPU_023449	contains 3 ANK superfamily motifs and Arp domains	none
SPU_023465	SPU_023465	probable assembly chimera	none
SPU_023467	SPU_023467	contains NmrA domain	none
SPU_023470	SPU_023470	contains SMC_prok_B domain	none
SPU_023489	SPU_023489	Strongylocentrotus purpuratus-specific protein	none
SPU_023493	SPU_023493	contains Ion_trans domain	none
SPU_023513	SPU_023513	contains 2A0119 domain	none
SPU_023524	SPU_023524	Strongylocentrotus purpuratus-specific protein	none
SPU_023535	SPU_023535	probable assembly chimera	none
SPU_023554	SPU_023554	contains 2 SPEC superfamily motifs	none
SPU_023562	SPU_023562	Strongylocentrotus purpuratus-specific protein	none
SPU_023566	SPU_023566	contains Nckap1 domain	none
SPU_023588	SPU_023588	contains UDPGT domain	none
SPU_023650	SPU_023650	probable assembly chimera	none
SPU_023657	SPU_023657	Strongylocentrotus purpuratus-specific protein	none
SPU_023681	SPU_023681	contains FMO-like domain	none
SPU_023683	SPU_023683	contains 2 Ion_trans_2 superfamily motifs	none
SPU_023687	SPU_023687	contains HUL4 domain	none
SPU_023713	SPU_023713	contains 4 ANK superfamily motifs and Arp domains	none
SPU_023719	SPU_023719	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_023721	SPU_023721	contains COG3415 domain	none
SPU_023784	SPU_023784	contains SMC_prok_B domain	none
SPU_023817	SPU_023817	contains SMC_N domain	none
SPU_023821	SPU_023821	contains COG3391 domain	none
SPU_023825	SPU_023825	contains 2A0113 domain	none
SPU_023848	SPU_023848	Strongylocentrotus purpuratus-specific protein	none
SPU_023852	SPU_023852	poor sequence data: ~55% of amino acids are X	none
SPU_023860	SPU_023860	contains RAD18 domain	none
SPU_023879	SPU_023879	contains 4 EGF_CA superfamily motifs	none
SPU_023902	SPU_023902	contains 2 CUB superfamily motifs	none
SPU_023946	SPU_023946	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_023953	SPU_023953	contains 3 Ldl_recept_b superfamily motifs and COG3391 domain	none
SPU_023959	SPU_023959	contains SMC_prok_B domain	none
SPU_023960	SPU_023960	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_023971	SPU_023971	Strongylocentrotus purpuratus-specific protein. probable assembly chimera.	none
SPU_023987	SPU_023987	Strongylocentrotus purpuratus-specific protein	none
SPU_024005	SPU_024005	contains SMC_prok_B domain	none
SPU_024015	SPU_024015	Strongylocentrotus purpuratus-specific protein	none
SPU_024021	SPU_024021	contains 2 FReD superfamily motifs	none
SPU_024034	SPU_024034	contains FRQ1 domain	none
SPU_024037	SPU_024037	contains PRK05771 domain	none
SPU_024043	SPU_024043	contains Sulfotransfer_1 domain	none
SPU_024051	SPU_024051	contains 2 BTB superfamily motifs	none
SPU_024057	SPU_024057	Strongylocentrotus purpuratus-specific protein	none
SPU_024085	SPU_024085	contains 2 Ion_trans_2 superfamily motifs	none
SPU_024118	SPU_024118	contains Laminin_EGF domain	none
SPU_024145	SPU_024145	contains type_I_hly domain	none
SPU_024155	SPU_024155	Strongylocentrotus purpuratus-specific protein	none
SPU_024188	SPU_024188	contains 4 Annexin superfamily motifs	none
SPU_024198	SPU_024198	contains SMC_prok_B domain	none
SPU_024202	SPU_024202	Strongylocentrotus purpuratus-specific protein	none
SPU_024213	SPU_024213	Strongylocentrotus purpuratus-specific protein	none
SPU_024265	SPU_024265	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_024266	SPU_024266	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_024271	SPU_024271	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_024272	SPU_024272	contains 3 ANK superfamily motifs and Arp domains	none
SPU_024314	SPU_024314	contains 2 PNP_UDP_1 superfamily motifs	none
SPU_024318	SPU_024318	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_024341	SPU_024341	contains PTZ00322 domain	none
SPU_024365	SPU_024365	contains 3 ANK superfamily motifs and Arp domains	none
SPU_024370	SPU_024370	Strongylocentrotus purpuratus-specific protein	none
SPU_024380	SPU_024380	Strongylocentrotus purpuratus-specific protein	none
SPU_024388	SPU_024388	contains SMC_prok_A domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_024389	SPU_024389	contains SMC_prok_A domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_024391	SPU_024391	contains tpt domain	none
SPU_024415	SPU_024415	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_024426	SPU_024426	Strongylocentrotus purpuratus-specific protein	none
SPU_024433	SPU_024433	Strongylocentrotus purpuratus-specific protein	none
SPU_024444	SPU_024444	contains 2 MAM superfamily motifs	none
SPU_024465	SPU_024465	contains SMC_prok_B domain and COG2423 domain	none
SPU_024475	SPU_024475	contains 2 Arylesterase superfamily motifs and COG3386 domain	none
SPU_024488	SPU_024488	contains Lon domain	none
SPU_024505	SPU_024505	contains Sulfotransfer_1 domain	none
SPU_024511	SPU_024511	Strongylocentrotus purpuratus-specific protein	none
SPU_024512	SPU_024512	Strongylocentrotus purpuratus-specific protein	none
SPU_024542	SPU_024542	contains 2 IG superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_024554	SPU_024554	contains pro_imino_pep_2 domain	none
SPU_024596	SPU_024596	contains Dynein_heavy domain	none
SPU_024600	SPU_024600	contains 4 Filamin superfamily motifs	none
SPU_024613	SPU_024613	contains 2 IG superfamily motifs	none
SPU_024660	SPU_024660	contains 2 MAM superfamily motifs	none
SPU_024661	SPU_024661	probable assembly chimera	none
SPU_024692	SPU_024692	contains SMC_prok_A domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_024713	SPU_024713	Strongylocentrotus purpuratus-specific protein	none
SPU_024724	SPU_024724	contains Pkinase domain	none
SPU_024738	SPU_024738	contains SMC_prok_A domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_024742	SPU_024742	contains COG5412 domain	none
SPU_024758	SPU_024758	contains 2 C8 superfamily motifs and 2 TIL superfamily motifs	none
SPU_024767	SPU_024767	Strongylocentrotus purpuratus-specific protein	none
SPU_024791	SPU_024791	contains 2 P_loop_NTPase superfamily motifs	none
SPU_024804	SPU_024804	contains 4 S1-like superfamily motifs	none
SPU_024822	SPU_024822	Strongylocentrotus purpuratus-specific protein	none
SPU_024825	SPU_024825	contains hCaCC domain	none
SPU_024876	SPU_024876	contains mutS2 domain	none
SPU_024904	SPU_024904	Strongylocentrotus purpuratus-specific protein	none
SPU_024911	SPU_024911	contains RPN2 domain and AIR1 domain	none
SPU_024935	SPU_024935	Strongylocentrotus purpuratus-specific protein	none
SPU_024954	SPU_024954	contains SMC_prok_B domain	none
SPU_024959	SPU_024959	contains rad50 domain	none
SPU_024961	SPU_024961	contains 2 ANK superfamily motifs	none
SPU_024990	SPU_024990	contains DnaJ_bact domain	none
SPU_024995	SPU_024995	contains 3 IG superfamily motifs	none
SPU_025007	SPU_025007	Strongylocentrotus purpuratus-specific protein	none
SPU_025008	SPU_025008	Strongylocentrotus purpuratus-specific protein	none
SPU_025022	SPU_025022	contains COG4886 domain	none
SPU_025026	SPU_025026	Strongylocentrotus purpuratus-specific protein	none
SPU_025069	SPU_025069	Strongylocentrotus purpuratus-specific protein	none
SPU_025083	SPU_025083	Strongylocentrotus purpuratus-specific protein	none
SPU_025085	SPU_025085	contains 2A0113 domain	none
SPU_025108	SPU_025108	contains PAT1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_025116	SPU_025116	contains 3 CLECT superfamily motifs	none
SPU_025134	SPU_025134	Strongylocentrotus purpuratus-specific protein	none
SPU_025177	SPU_025177	Strongylocentrotus purpuratus-specific protein	none
SPU_025180	SPU_025180	contains 2 CLECT superfamily motifs	none
SPU_025207	SPU_025207	contains PRK02224 domain	none
SPU_025261	SPU_025261	Strongylocentrotus purpuratus-specific protein	none
SPU_025277	SPU_025277	contains 3 HYR superfamily motifs	none
SPU_025282	SPU_025282	contains 3 EGF_CA superfamily motifs	none
SPU_025287	SPU_025287	probable assembly chimera	none
SPU_025295	SPU_025295	contains 2 HYR superfamily motifs	none
SPU_025339	SPU_025339	Strongylocentrotus purpuratus-specific protein	none
SPU_025349	SPU_025349	contains O-FucT domain	none
SPU_025361	SPU_025361	contains 2 MAM superfamily motifs	none
SPU_025379	SPU_025379	Strongylocentrotus purpuratus-specific protein	none
SPU_025407	SPU_025407	contains DPPIV_N domain	none
SPU_025448	SPU_025448	contains 2 TSP_1 superfamily motifs	none
SPU_025454	SPU_025454	contains 2 TSP_1 superfamily motifs	none
SPU_025473	SPU_025473	Strongylocentrotus purpuratus-specific protein	none
SPU_025478	SPU_025478	contains SSM4 domain. probable assembly chimera.	none
SPU_025479	SPU_025479	contains COG4886 domain	none
SPU_025491	SPU_025491	Strongylocentrotus purpuratus-specific protein	none
SPU_025519	SPU_025519	contains Sulfotransfer_1 domain	none
SPU_025542	SPU_025542	contains 3 EGF_CA superfamily motifs	none
SPU_025556	SPU_025556	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_025563	SPU_025563	contains 3 MAM superfamily motifs	none
SPU_025582	SPU_025582	contains DUF2450 domain	none
SPU_025583	SPU_025583	contains PRK12678 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_025605	SPU_025605	contains 2 CCP superfamily motifs	none
SPU_025631	SPU_025631	contains Smc domain	none
SPU_025647	SPU_025647	contains SH3BP5 domain	none
SPU_025659	SPU_025659	contains Fanconi_C domain	none
SPU_025662	SPU_025662	contains 2A0119 domain	none
SPU_025663	SPU_025663	contains 2 FA58C superfamily motifs	none
SPU_025677	SPU_025677	contains 2 IG superfamily motifs	none
SPU_025706	SPU_025706	contains APH domain	none
SPU_025721	SPU_025721	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_025748	SPU_025748	contains SpoVK domain	none
SPU_025780	SPU_025780	Strongylocentrotus purpuratus-specific protein	none
SPU_025795	SPU_025795	Strongylocentrotus purpuratus-specific protein	none
SPU_025799	SPU_025799	contains Deme6 domain	none
SPU_025802	SPU_025802	Strongylocentrotus purpuratus-specific protein	none
SPU_025835	SPU_025835	poor sequence data: ~75% of amino acids are X	none
SPU_025867	SPU_025867	contains 2A0113 domain	none
SPU_025906	SPU_025906	contains COG4886 domain	none
SPU_025910	SPU_025910	contains GTPBP1 domain	none
SPU_025928	SPU_025928	contains COG4886 domain	none
SPU_025936	SPU_025936	contains SMC_prok_A domain	none
SPU_025962	SPU_025962	contains 3 Annexin superfamily motifs	none
SPU_025971	SPU_025971	contains rad18 domain	none
SPU_025977	SPU_025977	probable assembly chimera	none
SPU_025981	SPU_025981	contains PRK05761 domain	none
SPU_025984	SPU_025984	contains 3 CCP superfamily motifs	none
SPU_025992	SPU_025992	contains 2a57 domain	none
SPU_026089	SPU_026089	contains 2 IPT superfamily motifs	none
SPU_026093	SPU_026093	Strongylocentrotus purpuratus-specific protein	none
SPU_026112	SPU_026112	Strongylocentrotus purpuratus-specific protein	none
SPU_026115	SPU_026115	Strongylocentrotus purpuratus-specific protein	none
SPU_026117	SPU_026117	contains 4 LDLa superfamily motifs	none
SPU_026134	SPU_026134	contains SMC_prok_A domain and SMC_prok_B domain	none
SPU_026149	SPU_026149	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_026163	SPU_026163	contains 2 IG superfamily motifs	none
SPU_026190	SPU_026190	contains 2 NIPSNAP superfamily motifs	none
SPU_026204	SPU_026204	Strongylocentrotus purpuratus-specific protein	none
SPU_026232	SPU_026232	contains SMC_N domain	none
SPU_026233	SPU_026233	contains 2 MFS superfamily motifs and MFS_1 domain	none
SPU_026242	SPU_026242	Strongylocentrotus purpuratus-specific protein	none
SPU_026282	SPU_026282	contains SMC_N domain	none
SPU_026283	SPU_026283	contains SMC_N domain	none
SPU_026290	SPU_026290	contains 2 WSC superfamily motifs	none
SPU_026302	SPU_026302	contains 4 EGF_Lam superfamily motifs and 4 Laminin_EGF domains	none
SPU_026303	SPU_026303	contains 4 EGF_Lam superfamily motifs and Laminin_EGF domain	none
SPU_026305	SPU_026305	contains SMC_prok_A domain	none
SPU_026343	SPU_026343	contains Dynein_heavy domain	none
SPU_026365	SPU_026365	contains FRQ1 domain	none
SPU_026402	SPU_026402	contains PRK08566 domain	none
SPU_026405	SPU_026405	contains SMC_prok_B domain	none
SPU_026409	SPU_026409	contains COG4886 domain	none
SPU_026410	SPU_026410	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_026421	SPU_026421	contains SMC_prok_A domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_026435	SPU_026435	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_026443	SPU_026443	contains amidohydrolase domain and AbgB domain	none
SPU_026451	SPU_026451	Strongylocentrotus purpuratus-specific protein	none
SPU_026478	SPU_026478	contains PAT1 domain	none
SPU_026494	SPU_026494	contains carnitine_bodg domain	none
SPU_026512	SPU_026512	contains SMC_prok_B domain	none
SPU_026544	SPU_026544	contains Ftcd domain	none
SPU_026546	SPU_026546	contains 2 ANK superfamily motifs and Arp domains	none
SPU_026548	SPU_026548	contains Myosin_tail_1 domain and PRK13729 domain	none
SPU_026551	SPU_026551	contains 2 Ldl_recept_b superfamily motifs and COG3391 domain	none
SPU_026553	SPU_026553	contains 2 LRR_RI superfamily motifs	none
SPU_026565	SPU_026565	Strongylocentrotus purpuratus-specific protein	none
SPU_026571	SPU_026571	contains 2 TPR superfamily motifs and PRK11788 domain	none
SPU_026575	SPU_026575	contains 2 EGF_CA superfamily motifs	none
SPU_026584	SPU_026584	Strongylocentrotus purpuratus-specific protein	none
SPU_026585	SPU_026585	Strongylocentrotus purpuratus-specific protein	none
SPU_026589	SPU_026589	Strongylocentrotus purpuratus-specific protein	none
SPU_026602	SPU_026602	contains 2 PHD superfamily motifs	none
SPU_026603	SPU_026603	Strongylocentrotus purpuratus-specific protein	none
SPU_026650	SPU_026650	contains 3 Gelsolin superfamily motifs	none
SPU_026653	SPU_026653	contains 2 IG superfamily motifs and Marek_A domain	none
SPU_026665	SPU_026665	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_026686	SPU_026686	contains 2 FA58C superfamily motifs	none
SPU_026697	SPU_026697	contains RAD18 domain	none
SPU_026703	SPU_026703	contains 2 IG superfamily motifs and V-set domain	none
SPU_026713	SPU_026713	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_026750	SPU_026750	contains 4 IG superfamily motifs	none
SPU_026753	SPU_026753	contains Sulfotransfer_1 domain	none
SPU_026765	SPU_026765	contains MviM domain	none
SPU_026843	SPU_026843	contains 2 FA58C superfamily motifs	none
SPU_026850	SPU_026850	contains 2 MFS superfamily motifs	none
SPU_026851	SPU_026851	contains Smc domain	none
SPU_026869	SPU_026869	contains MRS6 domain	none
SPU_026870	SPU_026870	contains COG5222 domain	none
SPU_026876	SPU_026876	contains LIC domain	none
SPU_026878	SPU_026878	contains 2 Kelch_1 superfamily motifs and COG3055 domain	none
SPU_026891	SPU_026891	Strongylocentrotus purpuratus-specific protein	none
SPU_026908	SPU_026908	Strongylocentrotus purpuratus-specific protein	none
SPU_026915	SPU_026915	contains 2 TSP_1 superfamily motifs	none
SPU_026951	SPU_026951	Strongylocentrotus purpuratus-specific protein	none
SPU_026969	SPU_026969	contains 2 ANK superfamily motifs	none
SPU_026978	SPU_026978	contains CLH domain	none
SPU_026980	SPU_026980	contains 2 Kelch_1 superfamily motifs and COG3055 domain	none
SPU_027003	SPU_027003	contains COG1413 domain	none
SPU_027011	SPU_027011	Strongylocentrotus purpuratus-specific protein	none
SPU_027021	SPU_027021	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_027024	SPU_027024	contains DNA_pol_B_2 domain and rne domain	none
SPU_027042	SPU_027042	contains 2A0601 domain	none
SPU_027064	SPU_027064	contains 2 PDZ superfamily motifs	none
SPU_027099	SPU_027099	Strongylocentrotus purpuratus-specific protein	none
SPU_027134	SPU_027134	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_027135	SPU_027135	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_027139	SPU_027139	Strongylocentrotus purpuratus-specific protein	none
SPU_027142	SPU_027142	contains 3 HYR superfamily motifs	none
SPU_027154	SPU_027154	contains 2A0113 domain	none
SPU_027161	SPU_027161	contains 2 Hyd_WA superfamily motifs	none
SPU_027169	SPU_027169	contains 2 IG superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_027170	SPU_027170	Strongylocentrotus purpuratus-specific protein	none
SPU_027174	SPU_027174	contains SMC_prok_B domain	none
SPU_027187	SPU_027187	contains infB domain	none
SPU_027200	SPU_027200	contains CRM1 domain	none
SPU_027211	SPU_027211	contains EH domain	none
SPU_027228	SPU_027228	Strongylocentrotus purpuratus-specific protein	none
SPU_027271	SPU_027271	probable assembly chimera	none
SPU_027283	SPU_027283	contains Qor domain	none
SPU_027296	SPU_027296	contains PRK02106 domain	none
SPU_027298	SPU_027298	contains PRK02106 domain	none
SPU_027303	SPU_027303	contains COG3603 domain	none
SPU_027347	SPU_027347	probable assembly chimera	none
SPU_027366	SPU_027366	contains 2 MFS superfamily motifs	none
SPU_027373	SPU_027373	contains COG5222 domain and SMC_prok_A domain	none
SPU_027385	SPU_027385	contains 3 IPT superfamily motifs	none
SPU_027392	SPU_027392	contains DUF803 domain	none
SPU_027396	SPU_027396	contains 2 FA58C superfamily motifs	none
SPU_027397	SPU_027397	contains 2A0113 domain	none
SPU_027416	SPU_027416	Strongylocentrotus purpuratus-specific protein	none
SPU_027417	SPU_027417	contains 2 IG superfamily motifs	none
SPU_027429	SPU_027429	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_027434	SPU_027434	contains LIC domain	none
SPU_027444	SPU_027444	contains HAD-SF-IA-v3 domain	none
SPU_027453	SPU_027453	contains COG4955 domain	none
SPU_027459	SPU_027459	contains TelA domain	none
SPU_027460	SPU_027460	probable assembly chimera	none
SPU_027470	SPU_027470	contains MPH1 domain	none
SPU_027471	SPU_027471	Strongylocentrotus purpuratus-specific protein	none
SPU_027500	SPU_027500	contains 2A0119 domain	none
SPU_027504	SPU_027504	Strongylocentrotus purpuratus-specific protein	none
SPU_027521	SPU_027521	Strongylocentrotus purpuratus-specific protein	none
SPU_027554	SPU_027554	contains PABP-1234 domain	none
SPU_027567	SPU_027567	contains COG4581 domain	none
SPU_027577	SPU_027577	Strongylocentrotus purpuratus-specific protein	none
SPU_027585	SPU_027585	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_027588	SPU_027588	contains 2 EGF_CA superfamily motifs	none
SPU_027595	SPU_027595	probable assembly chimera	none
SPU_027656	SPU_027656	contains Nup88 domain	none
SPU_027692	SPU_027692	contains COG2129 domain	none
SPU_027723	SPU_027723	contains CSE1 domain	none
SPU_027809	SPU_027809	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_027835	SPU_027835	Strongylocentrotus purpuratus-specific protein. probable assembly chimera.	none
SPU_027863	SPU_027863	contains 2 MAM superfamily motifs	none
SPU_027881	SPU_027881	probable assembly chimera	none
SPU_027917	SPU_027917	contains 4 IG superfamily motifs	none
SPU_027945	SPU_027945	contains CaiA domain	none
SPU_027948	SPU_027948	contains 3 IPT superfamily motifs	none
SPU_027958	SPU_027958	Strongylocentrotus purpuratus-specific protein	none
SPU_027996	SPU_027996	Strongylocentrotus purpuratus-specific protein	none
SPU_028008	SPU_028008	contains SMC_prok_A domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_028012	SPU_028012	contains 3 ANK superfamily motifs	none
SPU_028015	SPU_028015	Strongylocentrotus purpuratus-specific protein	none
SPU_028044	SPU_028044	contains FIMAC domain and PRK08853 domain	none
SPU_028052	SPU_028052	contains 3 SPEC superfamily motifs	none
SPU_028074	SPU_028074	probable assembly chimera	none
SPU_028085	SPU_028085	Strongylocentrotus purpuratus-specific protein	none
SPU_028116	SPU_028116	contains RVT_1 domain	none
SPU_028126	SPU_028126	contains 3 ANK superfamily motifs	none
SPU_028127	SPU_028127	Strongylocentrotus purpuratus-specific protein	none
SPU_028136	SPU_028136	Strongylocentrotus purpuratus-specific protein	none
SPU_028138	SPU_028138	Strongylocentrotus purpuratus-specific protein	none
SPU_028143	SPU_028143	contains 4 HYR superfamily motifs	none
SPU_028172	SPU_028172	Strongylocentrotus purpuratus-specific protein	none
SPU_028186	SPU_028186	Strongylocentrotus purpuratus-specific protein	none
SPU_028193	SPU_028193	probable assembly chimera	none
SPU_028212	SPU_028212	contains 2 MAM superfamily motifs	none
SPU_028219	SPU_028219	contains DUF1253 domain	none
SPU_028243	SPU_028243	contains SGL domain	none
SPU_028248	SPU_028248	Strongylocentrotus purpuratus-specific protein	none
SPU_028264	SPU_028264	Strongylocentrotus purpuratus-specific protein	none
SPU_028269	SPU_028269	Strongylocentrotus purpuratus-specific protein	none
SPU_028281	SPU_028281	Strongylocentrotus purpuratus-specific protein	none
SPU_028284	SPU_028284	contains 2 P_loop_NTPase superfamily motifs	none
SPU_028289	SPU_028289	contains 2 MFS superfamily motifs and 2A0119 domain	none
SPU_028314	SPU_028314	contains UvrA domain	none
SPU_028317	SPU_028317	Strongylocentrotus purpuratus-specific protein	none
SPU_028321	SPU_028321	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_028322	SPU_028322	contains 2 EFh superfamily motifs and FRQ1 domain	none
SPU_028360	SPU_028360	contains CAL1 domain	none
SPU_028364	SPU_028364	contains Dynein_heavy domain	none
SPU_028365	SPU_028365	contains Dynein_heavy domain	none
SPU_028369	SPU_028369	contains 2A0113 domain and CynX domain	none
SPU_028409	SPU_028409	contains COG2940 domain	none
SPU_028414	SPU_028414	Strongylocentrotus purpuratus-specific protein	none
SPU_028418	SPU_028418	contains 2 MAM superfamily motifs	none
SPU_028419	SPU_028419	contains Sulfotransfer_1 domain	none
SPU_028426	SPU_028426	contains SMC_prok_B domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_028427	SPU_028427	contains 2 ANK superfamily motifs and UbiH domain	none
SPU_028441	SPU_028441	contains soxA_mon domain	none
SPU_028442	SPU_028442	contains DadA domain	none
SPU_028443	SPU_028443	contains DadA domain	none
SPU_028450	SPU_028450	contains 2 IG superfamily motifs	none
SPU_028472	SPU_028472	contains Sulfotransfer_1 domain	none
SPU_028484	SPU_028484	Strongylocentrotus purpuratus-specific protein	none
SPU_028500	SPU_028500	contains Filament domain	none
SPU_028502	SPU_028502	contains 2 Filamin superfamily motifs	none
SPU_028519	SPU_028519	contains 5 HYR superfamily motifs	none
SPU_028659	SPU_028659	contains RPN6 domain	none
SPU_028663	SPU_028663	contains COG5273 domain	none
SPU_028672	SPU_028672	contains degP_htrA domain	none
SPU_028705	SPU_028705	contains 2 HYR superfamily motifs	none
SPU_028715	SPU_028715	contains Macoilin domain	none
SPU_028743	SPU_028743	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_028747	SPU_028747	contains COG3696 domain	none
SPU_028761	SPU_028761	contains Aes domain	none
SPU_028851	SPU_028851	contains PlsC domain	none
SPU_028863	SPU_028863	probable assembly chimera	none
SPU_028901	SPU_028901	contains TRF4 domain	none
SPU_028911	SPU_028911	contains MRS6 domain	none
SPU_028928	SPU_028928	contains hCaCC domain	none
SPU_000094	SPU_000094	Strongylocentrotus purpuratus-specific protein	none
SPU_000124	SPU_000124	Strongylocentrotus purpuratus-specific protein	none
SPU_000143	SPU_000143	Strongylocentrotus purpuratus-specific protein	none
SPU_000245	SPU_000245	Strongylocentrotus purpuratus-specific protein	none
SPU_000333	SPU_000333	contains 2 zf-BED superfamily motifs near N-terminus and hATC superfamily motif at C-terminus. Ac-like transposable element homolog.	none
SPU_000345	SPU_000345	contains SAM superfamily motif at C-terminus. contains PRK12323 domain.	none
SPU_000347	SPU_000347	Strongylocentrotus purpuratus-specific protein	none
SPU_000353	SPU_000353	Strongylocentrotus purpuratus-specific protein	none
SPU_000396	SPU_000396	Strongylocentrotus purpuratus-specific protein	none
SPU_000423	SPU_000423	Strongylocentrotus purpuratus-specific protein	none
SPU_000477	SPU_000477	contains Smc domain	none
SPU_000529	SPU_000529	contains 2 CUB superfamily motifs	none
SPU_000551	SPU_000551	Strongylocentrotus purpuratus-specific protein	none
SPU_000559	SPU_000559	contains 2 COG5028 domains	none
SPU_000570	SPU_000570	Strongylocentrotus purpuratus-specific protein	none
SPU_000573	SPU_000573	Strongylocentrotus purpuratus-specific protein	none
SPU_000599	SPU_000599	contains KA1 superfamily motif at C-terminus. Strongylocentrotus purpuratus-specific protein.	none
SPU_000610	SPU_000610	Strongylocentrotus purpuratus-specific protein	none
SPU_000621	SPU_000621	contains 5 LDLa superfamily motifs and 2 CUB superfamily motifs	none
SPU_000752	SPU_000752	Strongylocentrotus purpuratus-specific protein	none
SPU_000815	SPU_000815	contains 2 ANK superfamily motifs and Arp domain	none
SPU_000868	SPU_000868	Strongylocentrotus purpuratus-specific protein	none
SPU_000920	SPU_000920	Strongylocentrotus purpuratus-specific protein	none
SPU_000993	SPU_000993	Strongylocentrotus purpuratus-specific protein	none
SPU_001001	SPU_001001	Strongylocentrotus purpuratus-specific protein	none
SPU_001019	SPU_001019	contains 3 ANK superfamily motifs and Arp domains	none
SPU_001025	SPU_001025	Strongylocentrotus purpuratus-specific protein	none
SPU_001057	SPU_001057	Strongylocentrotus purpuratus-specific protein	none
SPU_001121	SPU_001121	contains SMC_N domain	none
SPU_001157	SPU_001157	contains PAT1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_001189	SPU_001189	Strongylocentrotus purpuratus-specific protein	none
SPU_001206	SPU_001206	contains 2 Death superfamily motifs	none
SPU_001257	SPU_001257	Strongylocentrotus purpuratus-specific protein	none
SPU_001332	SPU_001332	contains Adaptin_N domain	none
SPU_001335	SPU_001335	contains Smc domain	none
SPU_001346	SPU_001346	contains 2 FA58C superfamily motifs and 11 EGF_CA superfamily motifs	none
SPU_001347	SPU_001347	Strongylocentrotus purpuratus-specific protein	none
SPU_001348	SPU_001348	Strongylocentrotus purpuratus-specific protein	none
SPU_001358	SPU_001358	Strongylocentrotus purpuratus-specific protein	none
SPU_001409	SPU_001409	contains 4 SRCR superfamily motifs	none
SPU_001427	SPU_001427	contains 4 SPEC superfamily motifs	none
SPU_001434	SPU_001434	contains 2 COG4886 domains	none
SPU_001445	SPU_001445	SPU_001497	none
SPU_001516	SPU_001516	contains 5 MAM superfamily motifs	none
SPU_001528	SPU_001528	contains COG2319 domain	none
SPU_001562	SPU_001562	contains 2 MFS superfamily motifs	none
SPU_001591	SPU_001591	contains COG5635 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_001606	SPU_001606	Strongylocentrotus purpuratus-specific protein	none
SPU_001706	SPU_001706	contains 4 ANK superfamily motifs and Arp domains	none
SPU_001762	SPU_001762	contains COG4886 domain	none
SPU_001791	SPU_001791	contains 4 ANK superfamily motifs and Arp domains	none
SPU_001816	SPU_001816	contains COG5141 domain	none
SPU_001857	SPU_001857	contains 6 ANK superfamily motifs and Arp domains	none
SPU_001906	SPU_001906	contains 2 MFS superfamily motifs	none
SPU_001922	SPU_001922	contains 5 ANK superfamily motifs and Arp domains	none
SPU_001946	SPU_001946	contains 6 ANK superfamily motifs and Arp domains	none
SPU_001947	SPU_001947	contains 4 ANK superfamily motifs and Arp domains	none
SPU_001952	SPU_001952	contains 2 7tm_1 superfamily motifs	none
SPU_002010	SPU_002010	contains 2 HYR superfamily motifs	none
SPU_002011	SPU_002011	contains 2 DCX superfamily motifs	none
SPU_002071	SPU_002071	contains MCM2 domain and MCM domain	none
SPU_002080	SPU_002080	contains 4 LDLa_re superfamily motifs and 5 EGF_CA superfamily motifs	none
SPU_002086	SPU_002086	contains 2 LRR_RI superfamily motifs	none
SPU_002096	SPU_002096	Strongylocentrotus purpuratus-specific protein	none
SPU_002111	SPU_002111	contains 6 ANK superfamily motifs and Arp domains	none
SPU_002121	SPU_002121	Strongylocentrotus purpuratus-specific protein	none
SPU_002125	SPU_002125	contains PRK12678 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_002145	SPU_002145	Strongylocentrotus purpuratus-specific protein	none
SPU_002159	SPU_002159	Strongylocentrotus purpuratus-specific protein	none
SPU_002180	SPU_002180	Strongylocentrotus purpuratus-specific protein	none
SPU_002187	SPU_002187	Strongylocentrotus purpuratus-specific protein	none
SPU_002198	SPU_002198	Strongylocentrotus purpuratus-specific protein	none
SPU_002212	SPU_002212	Strongylocentrotus purpuratus-specific protein	none
SPU_002217	SPU_002217	Strongylocentrotus purpuratus-specific protein	none
SPU_002315	SPU_002315	Strongylocentrotus purpuratus-specific protein	none
SPU_002326	SPU_002326	contains SAM superfamily motif at C-terminus and PRK12323 domain near N-terminus	none
SPU_002330	SPU_002330	Strongylocentrotus purpuratus-specific protein	none
SPU_002336	SPU_002336	Strongylocentrotus purpuratus-specific protein	none
SPU_002340	SPU_002340	contains AST1 domain	none
SPU_002390	SPU_002390	contains 2 BBOX superfamily motifs	none
SPU_002397	SPU_002397	Strongylocentrotus purpuratus-specific protein	none
SPU_002426	SPU_002426	contains CYK3 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_002444	SPU_002444	contains 2 C2 superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_002475	SPU_002475	contains 2 MFS superfamily motifs	none
SPU_002494	SPU_002494	Strongylocentrotus purpuratus-specific protein	none
SPU_002497	SPU_002497	contains FixC domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_002513	SPU_002513	contains 2 Exo_endo_phos superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_002539	SPU_002539	Strongylocentrotus purpuratus-specific protein	none
SPU_002545	SPU_002545	contains 2 SH3 superfamily motifs	none
SPU_002561	SPU_002561	contains 2 Sulfatase superfamily motifs	none
SPU_002574	SPU_002574	Strongylocentrotus purpuratus-specific protein	none
SPU_002579	SPU_002579	contains 2 APG9 superfamily motifs	none
SPU_002635	SPU_002635	contains COG3055 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_002651	SPU_002651	contains MDN1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_002654	SPU_002654	Strongylocentrotus purpuratus-specific protein	none
SPU_002665	SPU_002665	Strongylocentrotus purpuratus-specific protein	none
SPU_002666	SPU_002666	Strongylocentrotus purpuratus-specific protein	none
SPU_002671	SPU_002671	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_002679	SPU_002679	Strongylocentrotus purpuratus-specific protein	none
SPU_002700	SPU_002700	Strongylocentrotus purpuratus-specific protein	none
SPU_002701	SPU_002701	Strongylocentrotus purpuratus-specific protein	none
SPU_002728	SPU_002728	contains RecD domain	none
SPU_002745	SPU_002745	Strongylocentrotus purpuratus-specific protein	none
SPU_002748	SPU_002748	Strongylocentrotus purpuratus-specific protein	none
SPU_002754	SPU_002754	Strongylocentrotus purpuratus-specific protein	none
SPU_002755	SPU_002755	Strongylocentrotus purpuratus-specific protein	none
SPU_002842	SPU_002842	contains TRF4 domain	none
SPU_002852	SPU_002852	low quality sequence info	none
SPU_002865	SPU_002865	contains 2 WD40 superfamily motifs	none
SPU_002878	SPU_002878	contains Smc domain	none
SPU_002917	SPU_002917	Strongylocentrotus purpuratus-specific protein	none
SPU_002941	SPU_002941	contains IMD superfamily motif at N-terminus	none
SPU_002946	SPU_002946	Strongylocentrotus purpuratus-specific protein	none
SPU_002957	SPU_002957	Strongylocentrotus purpuratus-specific protein	none
SPU_002966	SPU_002966	contains 3 Mito_carr superfamily motifs in tandem at C-terminus	none
SPU_003023	SPU_003023	contains 8 ANK superfamily motifs and Arp domains	none
SPU_003040	SPU_003040	contains 2 CUB superfamily motifs and 6 EGF_CA superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_003089	SPU_003089	contains 2 Neur_chan_LBD superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_003100	SPU_003100	contains MRS6 domain	none
SPU_003120	SPU_003120	contains CDC55 domain	none
SPU_003144	SPU_003144	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_003146	SPU_003146	Strongylocentrotus purpuratus-specific protein	none
SPU_003163	SPU_003163	leucine-rich repeat kinase 2-like	none
SPU_003185	SPU_003185	Strongylocentrotus purpuratus-specific protein	none
SPU_003218	SPU_003218	endonuclease reverse transcriptase-like	none
SPU_003233	SPU_003233	contains SH3 superfamily motif at N-terminus and COG5531 domain	none
SPU_003243	SPU_003243	contains MDN1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_003245	SPU_003245	contains PRK07768 domain and PRK12476 domain and AMP-binding domain	none
SPU_003278	SPU_003278	Strongylocentrotus purpuratus-specific protein	none
SPU_003300	SPU_003300	contains 2 FA58C superfamily motifs	none
SPU_003328	SPU_003328	contains 2 NADB_Rossmann superfamily motifs	none
SPU_003350	SPU_003350	contains 2 CUB superfamily motifs and 2 LDLa superfamily motifs	none
SPU_003364	SPU_003364	Strongylocentrotus purpuratus-specific protein	none
SPU_003365	SPU_003365	contains 2 7tm_1 superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_003434	SPU_003434	Strongylocentrotus purpuratus-specific protein	none
SPU_003440	SPU_003440	Strongylocentrotus purpuratus-specific protein	none
SPU_003448	SPU_003448	Strongylocentrotus purpuratus-specific protein	none
SPU_003452	SPU_003452	contains flgC domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_003461	SPU_003461	KIAA1753 protein-like	none
SPU_003494	SPU_003494	Strongylocentrotus purpuratus-specific protein	none
SPU_003497	SPU_003497	contains zf-MYND superfamily motif near C-terminus	none
SPU_003499	SPU_003499	Strongylocentrotus purpuratus-specific protein	none
SPU_003522	SPU_003522	Strongylocentrotus purpuratus-specific protein	none
SPU_003533	SPU_003533	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_003538	SPU_003538	contains 8 ANK superfamily motifs and Arp domains	none
SPU_003568	SPU_003568	Strongylocentrotus purpuratus-specific protein	none
SPU_003602	SPU_003602	contains COG1480 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_003605	SPU_003605	contains 3 CCP superfamily motifs	none
SPU_003630	SPU_003630	contains PRK13342 domain	none
SPU_003637	SPU_003637	contains 3 CUB superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_003644	SPU_003644	contains SMC_N domain and TPR_MLP1_2 domain	none
SPU_003657	SPU_003657	contains Smc domain	none
SPU_003674	SPU_003674	Strongylocentrotus purpuratus-specific protein	none
SPU_003675	SPU_003675	Strongylocentrotus purpuratus-specific protein	none
SPU_003755	SPU_003755	Strongylocentrotus purpuratus-specific protein	none
SPU_003690	SPU_003690	Strongylocentrotus purpuratus-specific protein	none
SPU_003697	SPU_003697	Strongylocentrotus purpuratus-specific protein	none
SPU_003707	SPU_003707	Strongylocentrotus purpuratus-specific protein	none
SPU_003735	SPU_003735	Strongylocentrotus purpuratus-specific protein	none
SPU_003752	SPU_003752	contains 2 LDL_recept_b superfamily motifs and 2 EGF_CA superfamily motifs	none
SPU_003767	SPU_003767	contains Smc domain	none
SPU_003769	SPU_003769	contains 4 LDLa superfamily motifs	none
SPU_003776	SPU_003776	Strongylocentrotus purpuratus-specific protein	none
SPU_003788	SPU_003788	Strongylocentrotus purpuratus-specific protein	none
SPU_003807	SPU_003807	Strongylocentrotus purpuratus-specific protein	none
SPU_003816	SPU_003816	contains 4 MAM superfamily motifs	none
SPU_003836	SPU_003836	Strongylocentrotus purpuratus-specific protein	none
SPU_003872	SPU_003872	Strongylocentrotus purpuratus-specific protein	none
SPU_003890	SPU_003890	Strongylocentrotus purpuratus-specific protein	none
SPU_003897	SPU_003897	contains MAP65_ASE1 domain	none
SPU_003946	SPU_003946	Strongylocentrotus purpuratus-specific protein	none
SPU_003949	SPU_003949	Strongylocentrotus purpuratus-specific protein	none
SPU_004027	SPU_004027	contains 2 EGF_CA superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_004073	SPU_004073	Strongylocentrotus purpuratus-specific protein	none
SPU_004130	SPU_004130	contains Smc domain and SMC_N domain	none
SPU_004164	SPU_004164	contains HSP70 domain	none
SPU_004215	SPU_004215	Strongylocentrotus purpuratus-specific protein	none
SPU_004219	SPU_004219	contains rne domain	none
SPU_004229	SPU_004229	contains DPPIV_N domain	none
SPU_004240	SPU_004240	Strongylocentrotus purpuratus-specific protein	none
SPU_004250	SPU_004250	Strongylocentrotus purpuratus-specific protein	none
SPU_004277	SPU_004277	contains 2 RNA_Pol_B_RPB2 superfamily motifs	none
SPU_004299	SPU_004299	Strongylocentrotus purpuratus-specific protein	none
SPU_004306	SPU_004306	contains 2 EGF_CA superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_004348	SPU_004348	Strongylocentrotus purpuratus-specific protein	none
SPU_004372	SPU_004372	contains 2 EGF_CA superfamily motifs and 2 GCC2_GCC3 superfamily motifs	none
SPU_004378	SPU_004378	Strongylocentrotus purpuratus-specific protein	none
SPU_004385	SPU_004385	Strongylocentrotus purpuratus-specific protein	none
SPU_004386	SPU_004386	Strongylocentrotus purpuratus-specific protein	none
SPU_004440	SPU_004440	Strongylocentrotus purpuratus-specific protein	none
SPU_004447	SPU_004447	contains 3 LDL_recept_b superfamily motifs and 2 EGF_CA superfamily motifs and COG3391 domain	none
SPU_004482	SPU_004482	contains UDPGT domain	none
SPU_004507	SPU_004507	Strongylocentrotus purpuratus-specific protein	none
SPU_004521	SPU_004521	contains SMC_N domain	none
SPU_004589	SPU_004589	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_004624	SPU_004624	Strongylocentrotus purpuratus-specific protein	none
SPU_004645	SPU_004645	contains 2 Es2 superfamily motifs	none
SPU_004646	SPU_004646	Strongylocentrotus purpuratus-specific protein	none
SPU_004647	SPU_004647	contains DHC_N1 domain	none
SPU_004672	SPU_004672	contains 2 WD40 superfamily motifs and COG2319 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_004674	SPU_004674	Strongylocentrotus purpuratus-specific protein	none
SPU_004678	SPU_004678	Strongylocentrotus purpuratus-specific protein	none
SPU_004716	SPU_004716	contains 4 S1-like (Cold Shock Protein motif) superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_004744	SPU_004744	Strongylocentrotus purpuratus-specific protein	none
SPU_004748	SPU_004748	Strongylocentrotus purpuratus-specific protein	none
SPU_004750	SPU_004750	contains AMP-binding domain	none
SPU_004799	SPU_004799	contains Pkinase domain	none
SPU_004822	SPU_004822	Strongylocentrotus purpuratus-specific protein	none
SPU_004853	SPU_004853	Strongylocentrotus purpuratus-specific protein	none
SPU_004874	SPU_004874	Strongylocentrotus purpuratus-specific protein	none
SPU_004881	SPU_004881	contains Na_H_Exchanger domain	none
SPU_004959	SPU_004959	Strongylocentrotus purpuratus-specific protein	none
SPU_004987	SPU_004987	contains PRK00409 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_004994	SPU_004994	Strongylocentrotus purpuratus-specific protein	none
SPU_005041	SPU_005041	Strongylocentrotus purpuratus-specific protein	none
SPU_005081	SPU_005081	Strongylocentrotus purpuratus-specific protein	none
SPU_005086	SPU_005086	Strongylocentrotus purpuratus-specific protein	none
SPU_005114	SPU_005114	contains THAP superfamily motif at N-terminus	none
SPU_005120	SPU_005120	Strongylocentrotus purpuratus-specific protein	none
SPU_005134	SPU_005134	contains Prominin domain	none
SPU_005208	SPU_005208	Strongylocentrotus purpuratus-specific protein	none
SPU_005217	SPU_005217	Strongylocentrotus purpuratus-specific protein	none
SPU_005247	SPU_005247	contains 2 IG superfamily motifs	none
SPU_005251	SPU_005251	contains COG1233 domain	none
SPU_005287	SPU_005287	contains Caldesmon domain	none
SPU_005308	SPU_005308	contains SMC_N domain	none
SPU_005348	SPU_005348	contains SPO22 domain	none
SPU_005363	SPU_005363	contains 3 ANK superfamily motifs	none
SPU_005365	SPU_005365	Strongylocentrotus purpuratus-specific protein	none
SPU_005372	SPU_005372	Strongylocentrotus purpuratus-specific protein	none
SPU_005374	SPU_005374	contains 6 ANK superfamily motifs and Arp domains	none
SPU_005405	SPU_005405	Strongylocentrotus purpuratus-specific protein	none
SPU_005427	SPU_005427	contains 2 CUB superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_005432	SPU_005432	Strongylocentrotus purpuratus-specific protein	none
SPU_005445	SPU_005445	Strongylocentrotus purpuratus-specific protein	none
SPU_005454	SPU_005454	Strongylocentrotus purpuratus-specific protein	none
SPU_005470	SPU_005470	Strongylocentrotus purpuratus-specific protein	none
SPU_005488	SPU_005488	Strongylocentrotus purpuratus-specific protein	none
SPU_005502	SPU_005502	Strongylocentrotus purpuratus-specific protein	none
SPU_005530	SPU_005530	Strongylocentrotus purpuratus-specific protein	none
SPU_005562	SPU_005562	contains PRK00409 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_005563	SPU_005563	Strongylocentrotus purpuratus-specific protein	none
SPU_005564	SPU_005564	contains MukB domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_005568	SPU_005568	Strongylocentrotus purpuratus-specific protein	none
SPU_005601	SPU_005601	Strongylocentrotus purpuratus-specific protein	none
SPU_005603	SPU_005603	Strongylocentrotus purpuratus-specific protein	none
SPU_005604	SPU_005604	Strongylocentrotus purpuratus-specific protein	none
SPU_005612	SPU_005612	Strongylocentrotus purpuratus-specific protein	none
SPU_005629	SPU_005629	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_005649	SPU_005649	Strongylocentrotus purpuratus-specific protein	none
SPU_005680	SPU_005680	contains 2 TIR superfamily motifs	none
SPU_005716	SPU_005716	Strongylocentrotus purpuratus-specific protein	none
SPU_005719	SPU_005719	Strongylocentrotus purpuratus-specific protein	none
SPU_005755	SPU_005755	contains SSL2 domain	none
SPU_005769	SPU_005769	contains 2 PRK05648 domain motifs	none
SPU_005775	SPU_005775	contains 2 PBPb superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_005783	SPU_005783	contains 2 Sulfotransfer_1 domain motifs	none
SPU_005840	SPU_005840	contains CDC6 domain	none
SPU_005847	SPU_005847	contains 5 LDLa superfamily motifs	none
SPU_005869	SPU_005869	contains COG4372 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_005872	SPU_005872	Strongylocentrotus purpuratus-specific protein	none
SPU_005911	SPU_005911	Strongylocentrotus purpuratus-specific protein	none
SPU_005917	SPU_005917	contains 4 MAM superfamily motifs	none
SPU_005941	SPU_005941	contains 4 CCP superfamily motifs	none
SPU_005972	SPU_005972	contains 2 ANK superfamily motifs and Arp domains	none
SPU_005988	SPU_005988	Strongylocentrotus purpuratus-specific protein	none
SPU_005994	SPU_005994	Strongylocentrotus purpuratus-specific protein	none
SPU_006071	SPU_006071	contains 3 NHL superfamily motifs and PRK07764 domain and COG3391 domain	none
SPU_006166	SPU_006166	contains HAP1_N domain	none
SPU_006195	SPU_006195	contains 3 SPEC superfamily motifs	none
SPU_006243	SPU_006243	Strongylocentrotus purpuratus-specific protein	none
SPU_006259	SPU_006259	Strongylocentrotus purpuratus-specific protein	none
SPU_006280	SPU_006280	Strongylocentrotus purpuratus-specific protein	none
SPU_006298	SPU_006298	contains CysJ domain	none
SPU_006331	SPU_006331	contains 4 HYR superfamily motifs	none
SPU_006335	SPU_006335	contains 3 MAM superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_006357	SPU_006357	Strongylocentrotus purpuratus-specific protein	none
SPU_006373	SPU_006373	Strongylocentrotus purpuratus-specific protein	none
SPU_006397	SPU_006397	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_006418	SPU_006418	contains PRK03918 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_006419	SPU_006419	contains DUF2450 domain	none
SPU_006425	SPU_006425	Strongylocentrotus purpuratus-specific protein	none
SPU_006460	SPU_006460	contains 2 Vps52 superfamily motifs	none
SPU_006464	SPU_006464	contains 5 SRCR superfamily motifs	none
SPU_006465	SPU_006465	contains Prominin domain	none
SPU_006491	SPU_006491	Strongylocentrotus purpuratus-specific protein	none
SPU_006498	SPU_006498	contains 3 TSP_1 superfamily motifs	none
SPU_006505	SPU_006505	Strongylocentrotus purpuratus-specific protein	none
SPU_006586	SPU_006586	contains RAD18 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_006590	SPU_006590	Strongylocentrotus purpuratus-specific protein	none
SPU_006592	SPU_006592	Strongylocentrotus purpuratus-specific protein	none
SPU_006657	SPU_006657	contains 3 PDZ superfamily motifs	none
SPU_006673	SPU_006673	contains PRK03918 domain	none
SPU_006674	SPU_006674	Strongylocentrotus purpuratus-specific protein	none
SPU_006696	SPU_006696	Strongylocentrotus purpuratus-specific protein	none
SPU_006716	SPU_006716	contains COG2319 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_006726	SPU_006726	contains Vinculin domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_006728	SPU_006728	Strongylocentrotus purpuratus-specific protein	none
SPU_006736	SPU_006736	Strongylocentrotus purpuratus-specific protein	none
SPU_006805	SPU_006805	Strongylocentrotus purpuratus-specific protein	none
SPU_006806	SPU_006806	Strongylocentrotus purpuratus-specific protein	none
SPU_006812	SPU_006812	contains 3 IG superfamily motifs	none
SPU_006841	SPU_006841	Strongylocentrotus purpuratus-specific protein	none
SPU_006862	SPU_006862	Strongylocentrotus purpuratus-specific protein	none
SPU_006863	SPU_006863	Strongylocentrotus purpuratus-specific protein	none
SPU_006919	SPU_006919	contains 2 7tm_1 superfamily motifs	none
SPU_006925	SPU_006925	Strongylocentrotus purpuratus-specific protein	none
SPU_006942	SPU_006942	Strongylocentrotus purpuratus-specific protein	none
SPU_006949	SPU_006949	contains C2 superfamily motif at N-terminus	none
SPU_006963	SPU_006963	Strongylocentrotus purpuratus-specific protein	none
SPU_007003	SPU_007003	Strongylocentrotus purpuratus-specific protein	none
SPU_007018	SPU_007018	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_007028	SPU_007028	Strongylocentrotus purpuratus-specific protein	none
SPU_007130	SPU_007130	Strongylocentrotus purpuratus-specific protein	none
SPU_007158	SPU_007158	Strongylocentrotus purpuratus-specific protein	none
SPU_007178	SPU_007178	Strongylocentrotus purpuratus-specific protein	none
SPU_007182	SPU_007182	contains 4 ANK superfamily motifs and Arp domains	none
SPU_007238	SPU_007238	contains Myosin_tail_1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_007286	SPU_007286	Strongylocentrotus purpuratus-specific protein	none
SPU_007289	SPU_007289	contains 2 HYR superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_007299	SPU_007299	contains 3 ANK superfamily motifs and Arp domains	none
SPU_007354	SPU_007354	Strongylocentrotus purpuratus-specific protein	none
SPU_007385	SPU_007385	Strongylocentrotus purpuratus-specific protein	none
SPU_007394	SPU_007394	Strongylocentrotus purpuratus-specific protein	none
SPU_007408	SPU_007408	Strongylocentrotus purpuratus-specific protein	none
SPU_007428	SPU_007428	contains MFS_1 domain	none
SPU_007434	SPU_007434	contains 2 SecD_SecF superfamily motifs	none
SPU_007458	SPU_007458	Strongylocentrotus purpuratus-specific protein	none
SPU_007461	SPU_007461	Strongylocentrotus purpuratus-specific protein	none
SPU_007488	SPU_007488	contains 3 ANK superfamily motifs and Arp domains	none
SPU_007521	SPU_007521	contains 5 ARM superfamily motifs	none
SPU_007531	SPU_007531	Strongylocentrotus purpuratus-specific protein	none
SPU_007541	SPU_007541	Strongylocentrotus purpuratus-specific protein	none
SPU_007562	SPU_007562	Strongylocentrotus purpuratus-specific protein	none
SPU_007597	SPU_007597	contains LisH superfamily motif at N-terminus. Strongylocentrotus purpuratus-specific protein.	none
SPU_007686	SPU_007686	contains COG1033 domain	none
SPU_007688	SPU_007688	contains 2 LRR_RI superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_007703	SPU_007703	Strongylocentrotus purpuratus-specific protein	none
SPU_007708	SPU_007708	contains PRK10263 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_007717	SPU_007717	Strongylocentrotus purpuratus-specific protein	none
SPU_007800	SPU_007800	contains 5 TSP_1 superfamily motifs	none
SPU_007813	SPU_007813	contains SMC_N domain	none
SPU_007838	SPU_007838	contains CbiK domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_007851	SPU_007851	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_007875	SPU_007875	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_007888	SPU_007888	Strongylocentrotus purpuratus-specific protein	none
SPU_007889	SPU_007889	Strongylocentrotus purpuratus-specific protein	none
SPU_007901	SPU_007901	contains 2 RING superfamily motifs and PEX10 domain and PRK00409 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_007923	SPU_007923	Strongylocentrotus purpuratus-specific protein	none
SPU_007958	SPU_007958	Strongylocentrotus purpuratus-specific protein	none
SPU_007969	SPU_007969	Strongylocentrotus purpuratus-specific protein	none
SPU_008098	SPU_008098	contains COG4886 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_008118	SPU_008118	contains 2 SMC_N domain motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_008141	SPU_008141	Strongylocentrotus purpuratus-specific protein	none
SPU_008143	SPU_008143	contains Adaptin_N domain at N-terminus	none
SPU_008153	SPU_008153	contains RAD18 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_008161	SPU_008161	Strongylocentrotus purpuratus-specific protein	none
SPU_008171	SPU_008171	contains 4 ANK superfamily motifs and Arp domains	none
SPU_008200	SPU_008200	Strongylocentrotus purpuratus-specific protein	none
SPU_008225	SPU_008225	contains Pkinase domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_008236	SPU_008236	Strongylocentrotus purpuratus-specific protein	none
SPU_008242	SPU_008242	contains 2 Smc domain motifs	none
SPU_008260	SPU_008260	Strongylocentrotus purpuratus-specific protein	none
SPU_008324	SPU_008324	contains 2 DNA_Pol_B2 domain motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_008334	SPU_008334	Strongylocentrotus purpuratus-specific protein	none
SPU_008340	SPU_008340	Strongylocentrotus purpuratus-specific protein	none
SPU_008350	SPU_008350	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_008354	SPU_008354	contains 2 IG superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_008381	SPU_008381	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_008402	SPU_008402	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_008405	SPU_008405	contains 2 Kelch_1 superfamily motifs	none
SPU_008422	SPU_008422	contains 2 RCC1 superfamily motifs and ATS1 domain	none
SPU_008462	SPU_008462	Strongylocentrotus purpuratus-specific protein	none
SPU_008475	SPU_008475	Strongylocentrotus purpuratus-specific protein	none
SPU_008495	SPU_008495	Strongylocentrotus purpuratus-specific protein	none
SPU_008510	SPU_008510	Strongylocentrotus purpuratus-specific protein	none
SPU_008529	SPU_008529	Strongylocentrotus purpuratus-specific protein	none
SPU_008531	SPU_008531	contains COG1204 domain	none
SPU_008537	SPU_008537	contains DPPIV_N domain and COG1073 domain	none
SPU_008592	SPU_008592	contains 4 S1-like superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_008594	SPU_008594	Strongylocentrotus purpuratus-specific protein	none
SPU_008596	SPU_008596	Strongylocentrotus purpuratus-specific protein	none
SPU_008627	SPU_008627	contains 4 ANK superfamily motifs and Arp domains	none
SPU_008628	SPU_008628	contains 8 ANK superfamily motifs and Arp domains	none
SPU_008657	SPU_008657	contains 2 WD40 superfamily motifs	none
SPU_008675	SPU_008675	Strongylocentrotus purpuratus-specific protein	none
SPU_008702	SPU_008702	contains DUF2152 domain	none
SPU_008720	SPU_008720	Strongylocentrotus purpuratus-specific protein	none
SPU_008760	SPU_008760	Strongylocentrotus purpuratus-specific protein	none
SPU_008772	SPU_008772	contains 5 IG superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_008811	SPU_008811	Strongylocentrotus purpuratus-specific protein	none
SPU_008924	SPU_008924	Strongylocentrotus purpuratus-specific protein	none
SPU_008935	SPU_008935	contains 7 ANK superfamily motifs and Arp domains	none
SPU_008991	SPU_008991	contains 5 ANK superfamily motifs and Arp domains	none
SPU_008995	SPU_008995	Strongylocentrotus purpuratus-specific protein	none
SPU_009024	SPU_009024	contains 5 IG superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_009035	SPU_009035	Strongylocentrotus purpuratus-specific protein	none
SPU_009060	SPU_009060	contains 3 ANK superfamily motifs and Arp domains	none
SPU_009070	SPU_009070	contains RecD domain	none
SPU_009072	SPU_009072	contains 2 WD40 superfamily motifs	none
SPU_009092	SPU_009092	Strongylocentrotus purpuratus-specific protein	none
SPU_009121	SPU_009121	contains AIR1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_009122	SPU_009122	contains 3 CCP superfamily motifs and 2 EGF_CA superfamily motifs	none
SPU_009133	SPU_009133	contains WcaA domain	none
SPU_009138	SPU_009138	Strongylocentrotus purpuratus-specific protein	none
SPU_009168	SPU_009168	contains ChaA domain	none
SPU_009243	SPU_009243	contains 4 EGF_CA superfamily motifs	none
SPU_009256	SPU_009256	contains COG4943 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_009265	SPU_009265	contains 3 Neuralized superfamily motifs	none
SPU_009271	SPU_009271	contains 2 COG1444 domain motifs	none
SPU_009280	SPU_009280	contains 2 RSI superfamily motifs	none
SPU_009284	SPU_009284	Strongylocentrotus purpuratus-specific protein	none
SPU_009314	SPU_009314	Strongylocentrotus purpuratus-specific protein	none
SPU_009321	SPU_009321	contains 4 FA58C superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_009323	SPU_009323	Strongylocentrotus purpuratus-specific protein	none
SPU_009324	SPU_009324	Strongylocentrotus purpuratus-specific protein	none
SPU_009337	SPU_009337	Strongylocentrotus purpuratus-specific protein	none
SPU_009410	SPU_009410	contains COG5089 domain	none
SPU_009411	SPU_009411	Strongylocentrotus purpuratus-specific protein	none
SPU_009439	SPU_009439	contains 3 DUF1126 superfamily motifs	none
SPU_009461	SPU_009461	contains 2 Ribophorin_II domain motifs	none
SPU_009519	SPU_009519	Strongylocentrotus purpuratus-specific protein	none
SPU_009525	SPU_009525	contains LIM superfamily motif at C-terminus. Strongylocentrotus purpuratus-specific protein.	none
SPU_009530	SPU_009530	contains 2 Galactosyl_T superfamily motifs	none
SPU_009531	SPU_009531	contains Vset domain and UDPGT domain	none
SPU_009543	SPU_009543	Strongylocentrotus purpuratus-specific protein	none
SPU_009550	SPU_009550	Strongylocentrotus purpuratus-specific protein	none
SPU_009554	SPU_009554	contains ATS1 domain	none
SPU_009577	SPU_009577	Strongylocentrotus purpuratus-specific protein	none
SPU_009582	SPU_009582	Strongylocentrotus purpuratus-specific protein	none
SPU_009599	SPU_009599	contains 2 ANK superfamily motifs and Ion_trans domain	none
SPU_009630	SPU_009630	Strongylocentrotus purpuratus-specific protein	none
SPU_009649	SPU_009649	Strongylocentrotus purpuratus-specific protein	none
SPU_009699	SPU_009699	Strongylocentrotus purpuratus-specific protein	none
SPU_009705	SPU_009705	Strongylocentrotus purpuratus-specific protein	none
SPU_009721	SPU_009721	contains IMD superfamily motif at N-terminus	none
SPU_009737	SPU_009737	contains COG5099 domain	none
SPU_009774	SPU_009774	contains Smc domain	none
SPU_009786	SPU_009786	Strongylocentrotus purpuratus-specific protein	none
SPU_009791	SPU_009791	contains 5 ANK superfamily motifs and Arp domains	none
SPU_009803	SPU_009803	Strongylocentrotus purpuratus-specific protein	none
SPU_009813	SPU_009813	Strongylocentrotus purpuratus-specific protein	none
SPU_009814	SPU_009814	Strongylocentrotus purpuratus-specific protein	none
SPU_009873	SPU_009873	contains 2 WSC superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_009891	SPU_009891	contains 2 NOP5NT superfamily motifs and SIK1 domain	none
SPU_009896	SPU_009896	contains 3 DUF1126 superfamily motifs and FRQ1 domain	none
SPU_009897	SPU_009897	Strongylocentrotus purpuratus-specific protein	none
SPU_009934	SPU_009934	Strongylocentrotus purpuratus-specific protein	none
SPU_009942	SPU_009942	contains 2 DEXDc superfamily motifs and MPH1 domain	none
SPU_009947	SPU_009947	Strongylocentrotus purpuratus-specific protein	none
SPU_009948	SPU_009948	Strongylocentrotus purpuratus-specific protein	none
SPU_009959	SPU_009959	contains 2 Esterase_lipase superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_009960	SPU_009960	Strongylocentrotus purpuratus-specific protein	none
SPU_009967	SPU_009967	contains RING superfamily motif at C-terminus and 2 Smc domain motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_009978	SPU_009978	Strongylocentrotus purpuratus-specific protein	none
SPU_009986	SPU_009986	Strongylocentrotus purpuratus-specific protein	none
SPU_009999	SPU_009999	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_010002	SPU_010002	contains 5 IG superfamily motifs	none
SPU_010010	SPU_010010	contains 2 Ion_trans domain motifs	none
SPU_010016	SPU_010016	contains 3 ANK superfamily motifs and Arp domains	none
SPU_010018	SPU_010018	Strongylocentrotus purpuratus-specific protein	none
SPU_010031	SPU_010031	contains 2 Ion_trans domain motifs	none
SPU_010044	SPU_010044	contains PAT1 domain at C-terminus. Strongylocentrotus purpuratus-specific protein.	none
SPU_010049	SPU_010049	contains 3 RT-like superfamily motifs	none
SPU_010058	SPU_010058	contains COG5038 domain	none
SPU_010059	SPU_010059	Strongylocentrotus purpuratus-specific protein	none
SPU_010077	SPU_010077	Strongylocentrotus purpuratus-specific protein	none
SPU_010079	SPU_010079	Strongylocentrotus purpuratus-specific protein	none
SPU_010086	SPU_010086	Strongylocentrotus purpuratus-specific protein	none
SPU_010111	SPU_010111	Strongylocentrotus purpuratus-specific protein	none
SPU_010112	SPU_010112	Strongylocentrotus purpuratus-specific protein	none
SPU_010162	SPU_010162	Strongylocentrotus purpuratus-specific protein	none
SPU_010170	SPU_010170	Strongylocentrotus purpuratus-specific protein	none
SPU_010171	SPU_010171	1	none
SPU_010176	SPU_010176	contains 5 MAM superfamily motifs	none
SPU_010177	SPU_010177	Strongylocentrotus purpuratus-specific protein	none
SPU_010186	SPU_010186	contains 4 TPR superfamily motifs	none
SPU_010196	SPU_010196	Strongylocentrotus purpuratus-specific protein	none
SPU_010233	SPU_010233	contains MIP-T3 domain	none
SPU_010236	SPU_010236	Strongylocentrotus purpuratus-specific protein	none
SPU_010269	SPU_010269	Strongylocentrotus purpuratus-specific protein	none
SPU_010294	SPU_010294	contains 2 P_loop_NTPase superfamily motifs	none
SPU_010315	SPU_010315	contains COG5219 domain	none
SPU_010322	SPU_010322	contains 8 ANK superfamily motifs and Arp domains	none
SPU_010326	SPU_010326	Strongylocentrotus purpuratus-specific protein	none
SPU_010439	SPU_010439	contains DUF2211 domain	none
SPU_010442	SPU_010442	contains 2 HYR superfamily motifs and 2 GCC2_GCC3 superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_010478	SPU_010478	Strongylocentrotus purpuratus-specific protein	none
SPU_010481	SPU_010481	contains 3 ANK superfamily motifs and Arp domains	none
SPU_010485	SPU_010485	contains SMC_N domain and Borrelia_P83 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_010486	SPU_010486	contains COG5222 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_010515	SPU_010515	contains PRK10920 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_010516	SPU_010516	Strongylocentrotus purpuratus-specific protein	none
SPU_010518	SPU_010518	contains 2 Glyco_transf_10 superfamily motifs	none
SPU_010614	SPU_010614	contains 3 ANK superfamily motifs and Arp domains	none
SPU_010661	SPU_010661	contains 2 ANK superfamily motifs	none
SPU_010681	SPU_010681	Strongylocentrotus purpuratus-specific protein	none
SPU_010718	SPU_010718	contains 6 EGF_CA superfamily motifs	none
SPU_010753	SPU_010753	contains 5 ANK superfamily motifs and Arp domains	none
SPU_010766	SPU_010766	contains COG5222 domain	none
SPU_010792	SPU_010792	Strongylocentrotus purpuratus-specific protein	none
SPU_010807	SPU_010807	Strongylocentrotus purpuratus-specific protein	none
SPU_010812	SPU_010812	contains PRK07003 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_010822	SPU_010822	contains 3 ANK superfamily motifs and Ion_trans domain	none
SPU_010828	SPU_010828	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_010841	SPU_010841	Strongylocentrotus purpuratus-specific protein	none
SPU_010859	SPU_010859	Strongylocentrotus purpuratus-specific protein	none
SPU_010877	SPU_010877	contains SMC_N domain	none
SPU_010885	SPU_010885	contains DYN1 domain	none
SPU_010905	SPU_010905	Strongylocentrotus purpuratus-specific protein	none
SPU_010938	SPU_010938	Strongylocentrotus purpuratus-specific protein	none
SPU_010947	SPU_010947	contains 5 MAM superfamily motifs	none
SPU_010975	SPU_010975	contains SMC_N domain	none
SPU_010986	SPU_010986	contains 2 Kelch_1 superfamily motifs	none
SPU_011008	SPU_011008	contains 2 ANK superfamily motifs and Arp domain	none
SPU_011013	SPU_011013	contains 4 TSP_1 superfamily motifs and 3 FA58C superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_011025	SPU_011025	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_011026	SPU_011026	contains 2 LDLa superfamily motifs and 3 MAM superfamily motifs	none
SPU_011047	SPU_011047	Strongylocentrotus purpuratus-specific protein	none
SPU_011068	SPU_011068	contains Smc domain	none
SPU_011131	SPU_011131	Strongylocentrotus purpuratus-specific protein	none
SPU_011150	SPU_011150	Strongylocentrotus purpuratus-specific protein	none
SPU_011153	SPU_011153	Strongylocentrotus purpuratus-specific protein	none
SPU_011164	SPU_011164	contains 6 BTB superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_011181	SPU_011181	Strongylocentrotus purpuratus-specific protein	none
SPU_011203	SPU_011203	Strongylocentrotus purpuratus-specific protein	none
SPU_011208	SPU_011208	contains 7 ANK superfamily motifs and Arp domains	none
SPU_011219	SPU_011219	contains COG3055 domain	none
SPU_011237	SPU_011237	contains PnbA domain	none
SPU_011267	SPU_011267	Strongylocentrotus purpuratus-specific protein	none
SPU_011310	SPU_011310	contains COG5098 domain	none
SPU_011386	SPU_011386	Strongylocentrotus purpuratus-specific protein	none
SPU_011422	SPU_011422	Strongylocentrotus purpuratus-specific protein	none
SPU_011423	SPU_011423	Strongylocentrotus purpuratus-specific protein	none
SPU_011433	SPU_011433	Strongylocentrotus purpuratus-specific protein	none
SPU_011463	SPU_011463	Strongylocentrotus purpuratus-specific protein	none
SPU_011468	SPU_011468	contains 3 Sel1 superfamily motifs and COG0790 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_011492	SPU_011492	contains 7 CUB superfamily motifs	none
SPU_011549	SPU_011549	contains 2 CUB superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_011559	SPU_011559	Strongylocentrotus purpuratus-specific protein	none
SPU_011565	SPU_011565	contains 5 HYR superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_011585	SPU_011585	contains COG0520 domain	none
SPU_011606	SPU_011606	contains 2 AdoMet_MTase superfamily motifs	none
SPU_011610	SPU_011610	Strongylocentrotus purpuratus-specific protein	none
SPU_011647	SPU_011647	contains 2 DEXDc superfamily motifs and MPH1 domain	none
SPU_011661	SPU_011661	Strongylocentrotus purpuratus-specific protein	none
SPU_011678	SPU_011678	Strongylocentrotus purpuratus-specific protein	none
SPU_011689	SPU_011689	Strongylocentrotus purpuratus-specific protein	none
SPU_011710	SPU_011710	contains TRF4 domain	none
SPU_011717	SPU_011717	no match except self-matching. Strongylocentrotus purpuratus-specific protein.	none
SPU_011726	SPU_011726	Strongylocentrotus purpuratus-specific protein	none
SPU_011738	SPU_011738	Strongylocentrotus purpuratus-specific protein	none
SPU_011742	SPU_011742	contains Pkinase domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_011767	SPU_011767	Strongylocentrotus purpuratus-specific protein	none
SPU_011770	SPU_011770	contains 3 Lipocalin superfamily motifs	none
SPU_011769	SPU_011769	contains 2 UBQ superfamily motifs and 2 ARM superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_011777	SPU_011777	Strongylocentrotus purpuratus-specific protein	none
SPU_011780	SPU_011780	contains 3 S1-like superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_011813	SPU_011813	Strongylocentrotus purpuratus-specific protein	none
SPU_011848	SPU_011848	contains ZU01 domain	none
SPU_011903	SPU_011903	contains Smc domain	none
SPU_011905	SPU_011905	Strongylocentrotus purpuratus-specific protein	none
SPU_011938	SPU_011938	Strongylocentrotus purpuratus-specific protein	none
SPU_011969	SPU_011969	contains 2 Peptidase_C19 superfamily motifs	none
SPU_011979	SPU_011979	Strongylocentrotus purpuratus-specific protein	none
SPU_012074	SPU_012074	contains COG1483 domain	none
SPU_012075	SPU_012075	contains DAP2 domain	none
SPU_012101	SPU_012101	contains Marek_A domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_012114	SPU_012114	contains 2 PP2Cc superfamily motifs	none
SPU_012134	SPU_012134	Strongylocentrotus purpuratus-specific protein	none
SPU_012147	SPU_012147	contains 2 ANK superfamily motifs	none
SPU_012164	SPU_012164	contains PAT1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_012167	SPU_012167	Strongylocentrotus purpuratus-specific protein	none
SPU_012181	SPU_012181	Strongylocentrotus purpuratus-specific protein	none
SPU_012188	SPU_012188	contains SMC_N domain and Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_012197	SPU_012197	contains XerD domain	none
SPU_012214	SPU_012214	matches only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_012224	SPU_012224	contains 5 ANK superfamily motifs and Arp domains	none
SPU_012225	SPU_012225	Strongylocentrotus purpuratus-specific protein	none
SPU_012254	SPU_012254	contains 3 SH3 superfamily motifs	none
SPU_012268	SPU_012268	Strongylocentrotus purpuratus-specific protein	none
SPU_012269	SPU_012269	Strongylocentrotus purpuratus-specific protein	none
SPU_012285	SPU_012285	Strongylocentrotus purpuratus-specific protein	none
SPU_012326	SPU_012326	Strongylocentrotus purpuratus-specific protein	none
SPU_012334	SPU_012334	Strongylocentrotus purpuratus-specific protein	none
SPU_012351	SPU_012351	contains 4 IG superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_012401	SPU_012401	contains 3 IG superfamily motifs and Transposase_22 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_012402	SPU_012402	Strongylocentrotus purpuratus-specific protein	none
SPU_012413	SPU_012413	contains A2M_N domain and COG2373 domain	none
SPU_012458	SPU_012458	Strongylocentrotus purpuratus-specific protein	none
SPU_012499	SPU_012499	contains 2 7tm_1 superfamily motifs	none
SPU_012569	SPU_012569	contains 6 ANK superfamily motifs and Arp domains	none
SPU_012602	SPU_012602	Strongylocentrotus purpuratus-specific protein	none
SPU_012635	SPU_012635	Strongylocentrotus purpuratus-specific protein	none
SPU_012654	SPU_012654	contains 7 EGF_CA superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_012656	SPU_012656	contains 2 CUB superfamily motifs and 5 EGF_CA superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_012670	SPU_012670	Strongylocentrotus purpuratus-specific protein	none
SPU_012728	SPU_012728	contains 2 WD40 superfamily motifs	none
SPU_012730	SPU_012730	Strongylocentrotus purpuratus-specific protein	none
SPU_012736	SPU_012736	contains 2 CCGC superfamily motifs	none
SPU_012783	SPU_012783	contains 6 CUB superfamily motifs	none
SPU_012788	SPU_012788	contains Smc domain	none
SPU_012808	SPU_012808	contains FabG domain	none
SPU_012815	SPU_012815	Strongylocentrotus purpuratus-specific protein	none
SPU_012822	SPU_012822	Strongylocentrotus purpuratus-specific protein	none
SPU_012833	SPU_012833	Strongylocentrotus purpuratus-specific protein	none
SPU_012870	SPU_012870	contains SMC_N domain	none
SPU_012903	SPU_012903	contains 6 HYR superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_012938	SPU_012938	Strongylocentrotus purpuratus-specific protein	none
SPU_012954	SPU_012954	contains 2 WSC superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_012965	SPU_012965	contains 2 Peptidase_C48 superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_012973	SPU_012973	Strongylocentrotus purpuratus-specific protein	none
SPU_013009	SPU_013009	contains 2 BTB superfamily motifs and 3 Kelch_1 superfamily motifs and COG3055 domain	none
SPU_013040	SPU_013040	Strongylocentrotus purpuratus-specific protein	none
SPU_013078	SPU_013078	Strongylocentrotus purpuratus-specific protein	none
SPU_013085	SPU_013085	Strongylocentrotus purpuratus-specific protein	none
SPU_013115	SPU_013115	Strongylocentrotus purpuratus-specific protein	none
SPU_013124	SPU_013124	contains COG0790 domain	none
SPU_013146	SPU_013146	Strongylocentrotus purpuratus-specific protein	none
SPU_013152	SPU_013152	Strongylocentrotus purpuratus-specific protein	none
SPU_013190	SPU_013190	Strongylocentrotus purpuratus-specific protein	none
SPU_013193	SPU_013193	Strongylocentrotus purpuratus-specific protein	none
SPU_013208	SPU_013208	contains Herpes_BLLF1 domain	none
SPU_013233	SPU_013233	contains PRK00409 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_013235	SPU_013235	contains MDN1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_013274	SPU_013274	Strongylocentrotus purpuratus-specific protein	none
SPU_013304	SPU_013304	contains 5 ANK superfamily motifs and Arp domains	none
SPU_013341	SPU_013341	Strongylocentrotus purpuratus-specific protein	none
SPU_013370	SPU_013370	Strongylocentrotus purpuratus-specific protein	none
SPU_013379	SPU_013379	Strongylocentrotus purpuratus-specific protein	none
SPU_013382	SPU_013382	Strongylocentrotus purpuratus-specific protein	none
SPU_013402	SPU_013402	contains 2 DNA_pol_B_2 domain motifs	none
SPU_013419	SPU_013419	contains 4 ANK superfamily motifs and Arp domains	none
SPU_013420	SPU_013420	contains 4 ANK superfamily motifs and Arp domains	none
SPU_013453	SPU_013453	Strongylocentrotus purpuratus-specific protein	none
SPU_013461	SPU_013461	Strongylocentrotus purpuratus-specific protein	none
SPU_013472	SPU_013472	Strongylocentrotus purpuratus-specific protein	none
SPU_013487	SPU_013487	Strongylocentrotus purpuratus-specific protein	none
SPU_013537	SPU_013537	contains COG4799 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_013541	SPU_013541	contains 4 MAM superfamily motifs	none
SPU_013553	SPU_013553	Strongylocentrotus purpuratus-specific protein	none
SPU_013602	SPU_013602	contains DUF812 domain	none
SPU_013605	SPU_013605	contains PRK1280 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_013606	SPU_013606	Strongylocentrotus purpuratus-specific protein.	none
SPU_013616	SPU_013616	Strongylocentrotus purpuratus-specific protein	none
SPU_013631	SPU_013631	contains 3 EGF_CA superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_013640	SPU_013640	Strongylocentrotus purpuratus-specific protein	none
SPU_013666	SPU_013666	Strongylocentrotus purpuratus-specific protein	none
SPU_013672	SPU_013672	contains 2 ANK superfamily motifs	none
SPU_013700	SPU_013700	contains OCD_Mu_Crystal domain	none
SPU_013784	SPU_013784	Strongylocentrotus purpuratus-specific protein	none
SPU_013817	SPU_013817	contains Herpes_BLLF1 domain	none
SPU_013826	SPU_013826	Strongylocentrotus purpuratus-specific protein	none
SPU_013827	SPU_013827	Strongylocentrotus purpuratus-specific protein	none
SPU_013830	SPU_013830	Strongylocentrotus purpuratus-specific protein	none
SPU_013851	SPU_013851	contains NPY1 domain	none
SPU_013911	SPU_013911	contains 4 CUB superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_014010	SPU_014010	contains Sulfotransfer_1 domain	none
SPU_014049	SPU_014049	contains 2 WSC superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_014052	SPU_014052	contains 2 VWC superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_014054	SPU_014054	Strongylocentrotus purpuratus-specific protein	none
SPU_014074	SPU_014074	contains 7 LDLa superfamily motifs	none
SPU_014101	SPU_014101	Strongylocentrotus purpuratus-specific protein	none
SPU_014116	SPU_014116	contains PRK02224 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_014117	SPU_014117	Strongylocentrotus purpuratus-specific protein	none
SPU_014123	SPU_014123	Strongylocentrotus purpuratus-specific protein	none
SPU_014171	SPU_014171	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_014194	SPU_014194	contains MDN1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_014199	SPU_014199	contains UvrD domain	none
SPU_014200	SPU_014200	contains B-block_TFIIIC domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_014224	SPU_014224	Strongylocentrotus purpuratus-specific protein	none
SPU_014235	SPU_014235	Strongylocentrotus purpuratus-specific protein	none
SPU_014273	SPU_014273	Strongylocentrotus purpuratus-specific protein	none
SPU_014284	SPU_014284	Strongylocentrotus purpuratus-specific protein	none
SPU_014292	SPU_014292	contains Arp domain	none
SPU_014300	SPU_014300	Strongylocentrotus purpuratus-specific protein	none
SPU_014321	SPU_014321	Strongylocentrotus purpuratus-specific protein	none
SPU_014353	SPU_014353	Strongylocentrotus purpuratus-specific protein	none
SPU_014354	SPU_014354	contains Smc domain	none
SPU_014374	SPU_014374	Strongylocentrotus purpuratus-specific protein	none
SPU_014389	SPU_014389	Strongylocentrotus purpuratus-specific protein	none
SPU_014427	SPU_014427	contains 2 FNR-like superfamily motifs	none
SPU_014428	SPU_014428	contains PRK05912 domain	none
SPU_014440	SPU_014440	contains Methyltransf_10 domain	none
SPU_014495	SPU_014495	contains COG5635 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_014508	SPU_014508	Strongylocentrotus purpuratus-specific protein	none
SPU_014540	SPU_014540	contains 4 NHL superfamily motifs and HRD1 domain and COG3386 domain	none
SPU_014545	SPU_014545	Strongylocentrotus purpuratus-specific protein	none
SPU_014547	SPU_014547	contains MRS6 domain	none
SPU_014589	SPU_014589	contains 3 IPT superfamily motifs	none
SPU_014600	SPU_014600	contains 5 ANK superfamily motifs and Arp domain	none
SPU_014612	SPU_014612	Strongylocentrotus purpuratus-specific protein	none
SPU_014666	SPU_014666	Strongylocentrotus purpuratus-specific protein	none
SPU_014681	SPU_014681	Strongylocentrotus purpuratus-specific protein	none
SPU_014688	SPU_014688	contains 4 LRR_RI superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_014703	SPU_014703	Strongylocentrotus purpuratus-specific protein	none
SPU_014728	SPU_014728	contains Vinculin domain	none
SPU_014739	SPU_014739	Strongylocentrotus purpuratus-specific protein	none
SPU_014781	SPU_014781	contains 5 EGF_CA superfamily motifs	none
SPU_014811	SPU_014811	contains Coatomer_WDAD domain	none
SPU_014823	SPU_014823	contains 2 CCP superfamily motifs	none
SPU_014827	SPU_014827	contains 2 CUB superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_014837	SPU_014837	contains PAT1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_014868	SPU_014868	Strongylocentrotus purpuratus-specific protein	none
SPU_014925	SPU_014925	contains 2 Kelch_1 superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_014941	SPU_014941	Strongylocentrotus purpuratus-specific protein	none
SPU_014960	SPU_014960	Strongylocentrotus purpuratus-specific protein	none
SPU_014972	SPU_014972	contains 5 PLAT superfamily motifs	none
SPU_014987	SPU_014987	contains SGL domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_015014	SPU_015014	Strongylocentrotus purpuratus-specific protein	none
SPU_015081	SPU_015081	Strongylocentrotus purpuratus-specific protein	none
SPU_015104	SPU_015104	contains RAB domain	none
SPU_015150	SPU_015150	Strongylocentrotus purpuratus-specific protein	none
SPU_015170	SPU_015170	Strongylocentrotus purpuratus-specific protein	none
SPU_015195	SPU_015195	contains SMC_N domain	none
SPU_015196	SPU_015196	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_015202	SPU_015202	contains PRK00409 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_015208	SPU_015208	Strongylocentrotus purpuratus-specific protein	none
SPU_015244	SPU_015244	Strongylocentrotus purpuratus-specific protein	none
SPU_015272	SPU_015272	contains 4 EFh superfamily motifs and FRQ1 domain	none
SPU_015273	SPU_015273	contains 4 IG superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_015280	SPU_015280	Strongylocentrotus purpuratus-specific protein	none
SPU_015295	SPU_015295	Strongylocentrotus purpuratus-specific protein	none
SPU_015298	SPU_015298	Strongylocentrotus purpuratus-specific protein	none
SPU_015324	SPU_015324	Strongylocentrotus purpuratus-specific protein	none
SPU_015326	SPU_015326	Strongylocentrotus purpuratus-specific protein	none
SPU_015331	SPU_015331	contains COG5191 domain	none
SPU_015392	SPU_015392	Strongylocentrotus purpuratus-specific protein	none
SPU_015438	SPU_015438	Strongylocentrotus purpuratus-specific protein	none
SPU_015439	SPU_015439	Strongylocentrotus purpuratus-specific protein	none
SPU_015444	SPU_015444	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_015455	SPU_015455	Strongylocentrotus purpuratus-specific protein	none
SPU_015505	SPU_015505	contains 2 Smc domain motifs	none
SPU_015569	SPU_015569	Strongylocentrotus purpuratus-specific protein	none
SPU_015570	SPU_015570	Strongylocentrotus purpuratus-specific protein	none
SPU_015574	SPU_015574	contains PRK12704 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_015583	SPU_015583	contains 2 IG superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_015584	SPU_015584	contains 5 IG superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_015589	SPU_015589	Strongylocentrotus purpuratus-specific protein	none
SPU_015628	SPU_015628	contains COG3386 domain. homologous to numerous Branchiostoma floridae proteins.	none
SPU_015635	SPU_015635	Strongylocentrotus purpuratus-specific protein	none
SPU_015664	SPU_015664	Strongylocentrotus purpuratus-specific protein	none
SPU_015669	SPU_015669	Strongylocentrotus purpuratus-specific protein	none
SPU_015707	SPU_015707	contains PRK02106 domain	none
SPU_015713	SPU_015713	Strongylocentrotus purpuratus-specific protein	none
SPU_015724	SPU_015724	contains Adaptin_N domain	none
SPU_015755	SPU_015755	contains 2 EGF_CA superfamily motifs	none
SPU_015759	SPU_015759	contains 3 VWD superfamily motifs	none
SPU_015788	SPU_015788	contains COG1511 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_015794	SPU_015794	Strongylocentrotus purpuratus-specific protein	none
SPU_015812	SPU_015812	contains 5 HYR superfamily motifs and 2 EGF_CA superfamily motifs	none
SPU_015885	SPU_015885	contains HRD1 domain	none
SPU_015904	SPU_015904	Strongylocentrotus purpuratus-specific protein	none
SPU_015934	SPU_015934	Strongylocentrotus purpuratus-specific protein	none
SPU_015954	SPU_015954	Strongylocentrotus purpuratus-specific protein	none
SPU_015967	SPU_015967	contains 2 MFS superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_015971	SPU_015971	homologous to numerous Hydra proteins. Strongylocentrotus purpuratus-specific protein.	none
SPU_015992	SPU_015992	contains 2 PKc-like superfamily motifs and 2 IG superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_016037	SPU_016037	Strongylocentrotus purpuratus-specific protein	none
SPU_016041	SPU_016041	contains 7 EGF_CA superfamily motifs and 2 CCP superfamily motifs	none
SPU_016044	SPU_016044	contains PRK03992 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_016054	SPU_016054	Strongylocentrotus purpuratus-specific protein	none
SPU_016098	SPU_016098	Strongylocentrotus purpuratus-specific protein	none
SPU_016099	SPU_016099	Strongylocentrotus purpuratus-specific protein	none
SPU_016126	SPU_016126	contains 2 MFS superfamily motifs	none
SPU_016159	SPU_016159	Strongylocentrotus purpuratus-specific protein	none
SPU_016165	SPU_016165	contains ATS1 domain	none
SPU_016170	SPU_016170	contains 5 MAM superfamily motifs	none
SPU_016186	SPU_016186	contains COG1231 domain	none
SPU_016199	SPU_016199	Strongylocentrotus purpuratus-specific protein	none
SPU_016210	SPU_016210	contains 5 MAM superfamily motifs	none
SPU_016221	SPU_016221	contains Smc domain	none
SPU_016262	SPU_016262	Strongylocentrotus purpuratus-specific protein	none
SPU_016293	SPU_016293	contains MDN1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_016332	SPU_016332	Strongylocentrotus purpuratus-specific protein	none
SPU_016340	SPU_016340	Strongylocentrotus purpuratus-specific protein	none
SPU_016357	SPU_016357	Strongylocentrotus purpuratus-specific protein	none
SPU_016358	SPU_016358	contains Smc domain	none
SPU_016412	SPU_016412	Strongylocentrotus purpuratus-specific protein	none
SPU_016425	SPU_016425	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_016460	SPU_016460	contains 7 ANK superfamily motifs and Arp domain	none
SPU_016461	SPU_016461	contains 6 ANK superfamily motifs and Arp domain	none
SPU_016466	SPU_016466	Strongylocentrotus purpuratus-specific protein	none
SPU_016484	SPU_016484	Strongylocentrotus purpuratus-specific protein	none
SPU_016491	SPU_016491	contains PRK02362 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_016497	SPU_016497	contains Herpes_BLLF1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_016509	SPU_016509	Strongylocentrotus purpuratus-specific protein	none
SPU_016544	SPU_016544	Strongylocentrotus purpuratus-specific protein	none
SPU_016567	SPU_016567	contains Myosin_tail_1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_016568	SPU_016568	contains Myosin_tail_1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_016608	SPU_016608	contains PRK10416 domain	none
SPU_016628	SPU_016628	contains ATS1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_016662	SPU_016662	Strongylocentrotus purpuratus-specific protein	none
SPU_016668	SPU_016668	contains ND5 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_016676	SPU_016676	Strongylocentrotus purpuratus-specific protein	none
SPU_016683	SPU_016683	Strongylocentrotus purpuratus-specific protein	none
SPU_016714	SPU_016714	Strongylocentrotus purpuratus-specific protein	none
SPU_016757	SPU_016757	contains 7 Kelch_1 superfamily motifs	none
SPU_016759	SPU_016759	Strongylocentrotus purpuratus-specific protein	none
SPU_016775	SPU_016775	Strongylocentrotus purpuratus-specific protein	none
SPU_016783	SPU_016783	contains COG5635 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_016801	SPU_016801	contains SMC_N domain and Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_017820	SPU_017820	Strongylocentrotus purpuratus-specific protein	none
SPU_016842	SPU_016842	contains 2 PDZ superfamily motifs	none
SPU_016859	SPU_016859	Strongylocentrotus purpuratus-specific protein. homologous to numerous putative Branchiostoma floridae proteins.	none
SPU_016874	SPU_016874	Strongylocentrotus purpuratus-specific protein	none
SPU_016888	SPU_016888	contains 3 IG superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_016904	SPU_016904	Strongylocentrotus purpuratus-specific protein	none
SPU_016908	SPU_016908	Strongylocentrotus purpuratus-specific protein	none
SPU_016919	SPU_016919	contains 4 ANK superfamily motifs and Arp domain	none
SPU_016943	SPU_016943	Strongylocentrotus purpuratus-specific protein	none
SPU_016963	SPU_016963	contains PRK00055 domain	none
SPU_016995	SPU_016995	Strongylocentrotus purpuratus-specific protein	none
SPU_016998	SPU_016998	Strongylocentrotus purpuratus-specific protein	none
SPU_017003	SPU_017003	Strongylocentrotus purpuratus-specific protein. homologous only to the protein itself.	none
SPU_017011	SPU_017011	Strongylocentrotus purpuratus-specific protein	none
SPU_017018	SPU_017018	Strongylocentrotus purpuratus-specific protein	none
SPU_017044	SPU_017044	Strongylocentrotus purpuratus-specific protein	none
SPU_017066	SPU_017066	Strongylocentrotus purpuratus-specific protein	none
SPU_017074	SPU_017074	contains 4 EGF_CA superfamily motifs and 3 GCC2_GCC3 superfamily motifs	none
SPU_017108	SPU_017108	Strongylocentrotus purpuratus-specific protein	none
SPU_017140	SPU_017140	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_017145	SPU_017145	Strongylocentrotus purpuratus-specific protein	none
SPU_017152	SPU_017152	Strongylocentrotus purpuratus-specific protein	none
SPU_017183	SPU_017183	contains SNF2_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_017192	SPU_017192	Strongylocentrotus purpuratus-specific protein	none
SPU_017214	SPU_017214	contains 2 TPR superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_017223	SPU_017223	Strongylocentrotus purpuratus-specific protein	none
SPU_017260	SPU_017260	Strongylocentrotus purpuratus-specific protein. no match except to itself.	none
SPU_017262	SPU_017262	Strongylocentrotus purpuratus-specific protein	none
SPU_017272	SPU_017272	Strongylocentrotus purpuratus-specific protein	none
SPU_017282	SPU_017282	Strongylocentrotus purpuratus-specific protein	none
SPU_017294	SPU_017294	Strongylocentrotus purpuratus-specific protein	none
SPU_017309	SPU_017309	Strongylocentrotus purpuratus-specific protein	none
SPU_017331	SPU_017331	Strongylocentrotus purpuratus-specific protein	none
SPU_017332	SPU_017332	Strongylocentrotus purpuratus-specific protein	none
SPU_017339	SPU_017339	Strongylocentrotus purpuratus-specific protein	none
SPU_017353	SPU_017353	Strongylocentrotus purpuratus-specific protein	none
SPU_017357	SPU_017357	Strongylocentrotus purpuratus-specific protein	none
SPU_017359	SPU_017359	Strongylocentrotus purpuratus-specific protein	none
SPU_017368	SPU_017368	contains ATS1 domain	none
SPU_017391	SPU_017391	contains AIR1 domain	none
SPU_017392	SPU_017392	contains AIR1 domain	none
SPU_017393	SPU_017393	contains AIR1 domain	none
SPU_017394	SPU_017394	contains AIR1 domain	none
SPU_017400	SPU_017400	contains 2 ANK superfamily motifs and Arp domain	none
SPU_017406	SPU_017406	contains 2 Cu-oxidase superfamily motifs. also homologous to laccase proteins in many insects.	none
SPU_017430	SPU_017430	Strongylocentrotus purpuratus-specific protein	none
SPU_017480	SPU_017480	contains 3 IG superfamily motifs and Vset domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_017499	SPU_017499	contains 3 SNF superfamily motifs	none
SPU_017514	SPU_017514	contains 3 PDZ superfamily motifs and PAT1 domain	none
SPU_017539	SPU_017539	contains 4 HTH_psq superfamily motifs	none
SPU_017550	SPU_017550	Strongylocentrotus purpuratus-specific protein	none
SPU_017566	SPU_017566	contains 2 ANK superfamily motifs and Arp domain	none
SPU_017586	SPU_017586	contains 6 KAZAL_FS superfamily motifs	none
SPU_017591	SPU_017591	Strongylocentrotus purpuratus-specific protein	none
SPU_017653	SPU_017653	contains 2 Gal-3-0_sulfotr superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_017654	SPU_017654	Strongylocentrotus purpuratus-specific protein	none
SPU_017670	SPU_017670	Strongylocentrotus purpuratus-specific protein	none
SPU_017671	SPU_017671	contains COG2103 domain	none
SPU_017706	SPU_017706	Strongylocentrotus purpuratus-specific protein	none
SPU_017709	SPU_017709	contains Transposase_22 domain	none
SPU_017728	SPU_017728	contains 4 WD40 superfamily motifs	none
SPU_017746	SPU_017746	Strongylocentrotus purpuratus-specific protein	none
SPU_017754	SPU_017754	Strongylocentrotus purpuratus-specific protein	none
SPU_017785	SPU_017785	Strongylocentrotus purpuratus-specific protein	none
SPU_017786	SPU_017786	contains 2 MBT superfamily motifs	none
SPU_017792	SPU_017792	contains Adaptin_N domain	none
SPU_017855	SPU_017855	Strongylocentrotus purpuratus-specific protein	none
SPU_017858	SPU_017858	Strongylocentrotus purpuratus-specific protein	none
SPU_017873	SPU_017873	contains 2 VWC superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_017883	SPU_017883	Strongylocentrotus purpuratus-specific protein	none
SPU_017894	SPU_017894	contains Spc7 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_017913	SPU_017913	Strongylocentrotus purpuratus-specific protein	none
SPU_017924	SPU_017924	contains 2 BBOX superfamily motifs and SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_017970	SPU_017970	Strongylocentrotus purpuratus-specific protein	none
SPU_018024	SPU_018024	contains PRK09297 domain	none
SPU_018042	SPU_018042	Strongylocentrotus purpuratus-specific protein	none
SPU_018049	SPU_018049	contains 3 ANK superfamily motifs	none
SPU_018077	SPU_018077	contains Smc domain	none
SPU_018150	SPU_018150	Strongylocentrotus purpuratus-specific protein	none
SPU_018208	SPU_018208	contains 6 NF-X1_zinc_finger superfamily motifs	none
SPU_018223	SPU_018223	contains 5 CUB superfamily motifs	none
SPU_018255	SPU_018255	Strongylocentrotus purpuratus-specific protein	none
SPU_018287	SPU_018287	contains PRK06241 domain. homologous to bacterial proteins.	none
SPU_018296	SPU_018296	contains 4 SRCR superfamily motifs	none
SPU_018300	SPU_018300	contains Smc domain	none
SPU_018395	SPU_018395	possible chimera due to erroneous sequence assembly?	none
SPU_018517	SPU_018517	Strongylocentrotus purpuratus-specific protein	none
SPU_018539	SPU_018539	Strongylocentrotus purpuratus-specific protein	none
SPU_018594	SPU_018594	Strongylocentrotus purpuratus-specific protein	none
SPU_018616	SPU_018616	contains 2 Thioredoxin-like superfamily motifs	none
SPU_018696	SPU_018696	contains 2 MFS superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_018712	SPU_018712	contains SXM1 domain	none
SPU_018789	SPU_018789	contains COG4372 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_018790	SPU_018790	contains 4 COG5638 domain motifs	none
SPU_018839	SPU_018839	contains PRK00409 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_018942	SPU_018942	contains 4 ANK superfamily motifs and Arp domain	none
SPU_019034	SPU_019034	Strongylocentrotus purpuratus-specific protein	none
SPU_019063	SPU_019063	contains AST1 domain	none
SPU_019098	SPU_019098	Strongylocentrotus purpuratus-specific protein	none
SPU_019125	SPU_019125	contains 2 TPR superfamily motifs	none
SPU_019148	SPU_019148	contains FRQ1 domain	none
SPU_019173	SPU_019173	Strongylocentrotus purpuratus-specific protein	none
SPU_019194	SPU_019194	Strongylocentrotus purpuratus-specific protein	none
SPU_019211	SPU_019211	poor amino acid sequence (~50% of amino acids are X)	none
SPU_019225	SPU_019225	contains COG1331 domain	none
SPU_019237	SPU_019237	contains 4 SRCR superfamily motifs	none
SPU_019244	SPU_019244	Strongylocentrotus purpuratus-specific protein	none
SPU_019250	SPU_019250	Strongylocentrotus purpuratus-specific protein	none
SPU_019284	SPU_019284	contains RAD18 domain	none
SPU_019311	SPU_019311	contains 3 CCP superfamily motifs and 2 GCC2_GCC3 superfamily motifs	none
SPU_019296	SPU_019296	contains Golgin_A5 domain and SMC_N domain	none
SPU_019313	SPU_019313	probable assembly chimera	none
SPU_019315	SPU_019315	Strongylocentrotus purpuratus-specific protein	none
SPU_019343	SPU_019343	contains infB domain	none
SPU_019439	SPU_019439	contains Macoilin domain	none
SPU_019511	SPU_019511	contains 4 C2 superfamily motifs	none
SPU_019538	SPU_019538	Strongylocentrotus purpuratus-specific protein	none
SPU_019565	SPU_019565	Strongylocentrotus purpuratus-specific protein	none
SPU_019566	SPU_019566	contains COG3386 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_019578	SPU_019578	poor amino acid sequence (~40% of amino acids are X)	none
SPU_019611	SPU_019611	probable assembly chimera	none
SPU_019621	SPU_019621	contains 7 ANK superfamily motifs and Arp domain	none
SPU_019665	SPU_019665	contains 4 IG superfamily motifs	none
SPU_019716	SPU_019716	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_019731	SPU_019731	contains 5 Kelch_1 superfamily motifs	none
SPU_019752	SPU_019752	Strongylocentrotus purpuratus-specific protein	none
SPU_019789	SPU_019789	Strongylocentrotus purpuratus-specific protein	none
SPU_019801	SPU_019801	Strongylocentrotus purpuratus-specific protein	none
SPU_019807	SPU_019807	contains Smc domain and SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_019831	SPU_019831	Strongylocentrotus purpuratus-specific protein	none
SPU_019851	SPU_019851	contains DNA_Pol_B_2 domain	none
SPU_019860	SPU_019860	contains 3 FA58C superfamily motifs	none
SPU_019891	SPU_019891	contains 4 FA58C superfamily motifs	none
SPU_019892	SPU_019892	contains 6 ANK superfamily motifs and Arp domain	none
SPU_019937	SPU_019937	Strongylocentrotus purpuratus-specific protein	none
SPU_019938	SPU_019938	Strongylocentrotus purpuratus-specific protein	none
SPU_019941	SPU_019941	Strongylocentrotus purpuratus-specific protein	none
SPU_019963	SPU_019963	contains 4 SH3 superfamily motifs	none
SPU_019978	SPU_019978	Strongylocentrotus purpuratus-specific protein	none
SPU_019981	SPU_019981	contains 2 Ldl_recept_b superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_019993	SPU_019993	contains PRK03918 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_020047	SPU_020047	contains 2 HYR superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_020068	SPU_020068	contains COG1340 domain	none
SPU_020111	SPU_020111	contains MgtA domain	none
SPU_020122	SPU_020122	contains 8 HYR superfamily motifs	none
SPU_020133	SPU_020133	Strongylocentrotus purpuratus-specific protein	none
SPU_020134	SPU_020134	Strongylocentrotus purpuratus-specific protein	none
SPU_020145	SPU_020145	contains 3 Kelch_1 superfamily motifs	none
SPU_020152	SPU_020152	homologous to numerous Branchiostoma floridae proteins	none
SPU_020198	SPU_020198	contains 2 RGS superfamily motifs	none
SPU_020245	SPU_020245	contains 2 HYR superfamily motifs	none
SPU_020328	SPU_020328	Strongylocentrotus purpuratus-specific protein	none
SPU_020421	SPU_020421	contains 2 LRR_RI superfamily motifs	none
SPU_020422	SPU_020422	contains 3 Neuralized superfamily motifs	none
SPU_020470	SPU_020470	contains 3 IG superfamily motifs	none
SPU_020539	SPU_020539	contains V-set domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_020546	SPU_020546	contains COG3391 domain	none
SPU_020571	SPU_020571	Strongylocentrotus purpuratus-specific protein	none
SPU_020572	SPU_020572	contains PRK00409 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_020589	SPU_020589	contains COG5222 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_020602	SPU_020602	Strongylocentrotus purpuratus-specific protein	none
SPU_020608	SPU_020608	Strongylocentrotus purpuratus-specific protein	none
SPU_020618	SPU_020618	contains V-set domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_020660	SPU_020660	contains 10 ANK superfamily motifs and Arp domain	none
SPU_020661	SPU_020661	contains COG5635 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_020684	SPU_020684	Strongylocentrotus purpuratus-specific protein	none
SPU_020702	SPU_020702	contains 2 SAM superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_020716	SPU_020716	contains 2 HYR superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_020723	SPU_020723	contains 5 ANK superfamily motifs and Arp domain	none
SPU_020727	SPU_020727	homologous to numerous Hydra magnipapillata proteins	none
SPU_020765	SPU_020765	contains COG1112 domain	none
SPU_020768	SPU_020768	Strongylocentrotus purpuratus-specific protein	none
SPU_020779	SPU_020779	Strongylocentrotus purpuratus-specific protein	none
SPU_020786	SPU_020786	Strongylocentrotus purpuratus-specific protein	none
SPU_020787	SPU_020787	Strongylocentrotus purpuratus-specific protein	none
SPU_020794	SPU_020794	Strongylocentrotus purpuratus-specific protein	none
SPU_020805	SPU_020805	contains SGL domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_020810	SPU_020810	contains 4 ANK superfamily motifs and Arp domain	none
SPU_020818	SPU_020818	contains 5 HYR superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_020843	SPU_020843	contains 2 ANK superfamily motifs	none
SPU_020866	SPU_020866	Strongylocentrotus purpuratus-specific protein	none
SPU_020894	SPU_020894	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_020896	SPU_020896	contains MDN1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_020921	SPU_020921	Strongylocentrotus purpuratus-specific protein	none
SPU_020958	SPU_020958	Strongylocentrotus purpuratus-specific protein	none
SPU_020964	SPU_020964	contains MviM domain	none
SPU_020967	SPU_020967	Strongylocentrotus purpuratus-specific protein	none
SPU_021053	SPU_021053	contains 2 Galactosyl_T superfamily motifs	none
SPU_021107	SPU_021107	contains Smc domain and SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_021122	SPU_021122	Strongylocentrotus purpuratus-specific protein	none
SPU_021138	SPU_021138	Strongylocentrotus purpuratus-specific protein	none
SPU_021139	SPU_021139	Strongylocentrotus purpuratus-specific protein	none
SPU_021163	SPU_021163	contains 2 SPEC superfamily motifs	none
SPU_021227	SPU_021227	contains DHC_N1 domain	none
SPU_021247	SPU_021247	Strongylocentrotus purpuratus-specific protein	none
SPU_021258	SPU_021258	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_021310	SPU_021310	Strongylocentrotus purpuratus-specific protein	none
SPU_021314	SPU_021314	contains 2 MFS superfamily motifs	none
SPU_021338	SPU_021338	Strongylocentrotus purpuratus-specific protein	none
SPU_021389	SPU_021389	contains 7 ANK superfamily motifs and Arp domain	none
SPU_021413	SPU_021413	homologous only to 1 putative Strongylocentrotus purpuratus protein. Strongylocentrotus purpuratus-specific protein.	none
SPU_021429	SPU_021429	contains 3 DUF1126 superfamily motifs	none
SPU_021446	SPU_021446	Strongylocentrotus purpuratus-specific protein	none
SPU_021541	SPU_021541	contains 2 CCP superfamily motifs and COG3889 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_021626	SPU_021626	Strongylocentrotus purpuratus-specific protein	none
SPU_021644	SPU_021644	contains 5 ANK superfamily motifs and Arp domain	none
SPU_021659	SPU_021659	contains V-set domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_021710	SPU_021710	contains PAT1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_021791	SPU_021791	contains 2 7tm_1 superfamily motifs	none
SPU_021831	SPU_021831	contains 2 MFS superfamily motifs	none
SPU_021842	SPU_021842	Strongylocentrotus purpuratus-specific protein	none
SPU_021845	SPU_021845	contains CH superfamily motif at C-terminus	none
SPU_021850	SPU_021850	contains CH superfamily motif at C-terminus and Smc domain	none
SPU_021856	SPU_021856	contains 2 Serinc superfamily motifs	none
SPU_021935	SPU_021935	contains PRK07003 domain and AcuC domain	none
SPU_021953	SPU_021953	Strongylocentrotus purpuratus-specific protein	none
SPU_021971	SPU_021971	contains SMC_N domain and PRK11281 domain	none
SPU_021980	SPU_021980	Strongylocentrotus purpuratus-specific protein	none
SPU_021999	SPU_021999	Strongylocentrotus purpuratus-specific protein	none
SPU_022008	SPU_022008	contains DHC_N1 domain	none
SPU_022018	SPU_022018	Strongylocentrotus purpuratus-specific protein	none
SPU_022069	SPU_022069	contains COG5141 domain	none
SPU_022096	SPU_022096	Strongylocentrotus purpuratus-specific protein	none
SPU_022106	SPU_022106	contains COG2319 domain	none
SPU_022107	SPU_022107	contains Smc domain	none
SPU_022119	SPU_022119	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_022137	SPU_022137	contains 2 CUB superfamily motifs	none
SPU_022138	SPU_022138	contains 4 CUB superfamily motifs	none
SPU_022143	SPU_022143	contains 3 IG superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_022165	SPU_022165	Strongylocentrotus purpuratus-specific protein	none
SPU_022176	SPU_022176	contains 7 ANK superfamily motifs and Arp domain	none
SPU_022182	SPU_022182	contains 2 EFh superfamily motifs	none
SPU_022219	SPU_022219	contains 2 7tm_1 superfamily motifs	none
SPU_022249	SPU_022249	contains 5 Kelch_1 superfamily motifs	none
SPU_022261	SPU_022261	Strongylocentrotus purpuratus-specific protein	none
SPU_022293	SPU_022293	contains 9 ANK superfamily motifs and Arp domain	none
SPU_022359	SPU_022359	contains 2 RNA_bind superfamily motifs	none
SPU_022366	SPU_022366	contains 5 CUB superfamily motifs	none
SPU_022408	SPU_022408	contains PnbA domain	none
SPU_022416	SPU_022416	Strongylocentrotus purpuratus-specific protein	none
SPU_022417	SPU_022417	homologous to numerous putative Hydra magnipapillata proteins	none
SPU_022420	SPU_022420	contains COG1112 domain	none
SPU_022442	SPU_022442	Strongylocentrotus purpuratus-specific protein	none
SPU_022446	SPU_022446	contains COG1293 domain	none
SPU_022471	SPU_022471	homologous to numerous putative Branchiostoma floridae proteins	none
SPU_022494	SPU_022494	contains 2 IG superfamily motifs	none
SPU_022539	SPU_022539	contains 2 Noc2 superfamily motifs	none
SPU_022551	SPU_022551	Strongylocentrotus purpuratus-specific protein	none
SPU_022599	SPU_022599	Strongylocentrotus purpuratus-specific protein	none
SPU_022671	SPU_022671	Strongylocentrotus purpuratus-specific protein	none
SPU_022741	SPU_022741	contains 4 CCP superfamily motifs	none
SPU_022745	SPU_022745	contains 2 SMC_N domain motifs	none
SPU_022804	SPU_022804	contains PRK00409 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_022823	SPU_022823	Strongylocentrotus purpuratus-specific protein	none
SPU_022828	SPU_022828	contains PRK06851 domain	none
SPU_022835	SPU_022835	contains COG5236 domain	none
SPU_022852	SPU_022852	contains 2 EFh superfamily motifs	none
SPU_022863	SPU_022863	contains COG5222 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_022885	SPU_022885	Strongylocentrotus purpuratus-specific protein	none
SPU_022913	SPU_022913	homologous only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_022933	SPU_022933	contains V-set domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_022948	SPU_022948	contains Deme6 domain	none
SPU_022964	SPU_022964	contains PRK07003 domain	none
SPU_022981	SPU_022981	contains 3 CCP superfamily motifs and 3 EGF_CA superfamily motifs	none
SPU_023024	SPU_023024	contains 2 PHD superfamily motifs	none
SPU_023043	SPU_023043	homologous only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_023045	SPU_023045	homologous only to 2 putative Strongylocentrotus purpuratus proteins, probably including itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_023063	SPU_023063	Strongylocentrotus purpuratus-specific protein	none
SPU_023077	SPU_023077	Strongylocentrotus purpuratus-specific protein	none
SPU_023103	SPU_023103	contains COG5222 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_023105	SPU_023105	Strongylocentrotus purpuratus-specific protein	none
SPU_023116	SPU_023116	low quality protein sequence: more than 50% of amino acids are uncertain.	none
SPU_023135	SPU_023135	contains SGL domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_023143	SPU_023143	contains 2 WD40 superfamily motifs	none
SPU_023157	SPU_023157	Strongylocentrotus purpuratus-specific protein	none
SPU_023184	SPU_023184	contains 3 FA58C superfamily motifs	none
SPU_023190	SPU_023190	contains Pyr_redox_2 domain at C-terminus	none
SPU_023208	SPU_023208	contains 3 Lipocalin superfamily motifs	none
SPU_023212	SPU_023212	Strongylocentrotus purpuratus-specific protein	none
SPU_023230	SPU_023230	Strongylocentrotus purpuratus-specific protein	none
SPU_023301	SPU_023301	Strongylocentrotus purpuratus-specific protein	none
SPU_023356	SPU_023356	Strongylocentrotus purpuratus-specific protein	none
SPU_023365	SPU_023365	contains 3 TPR superfamily motifs and NrfG domain	none
SPU_023379	SPU_023379	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_023382	SPU_023382	homologous only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_023392	SPU_023392	Strongylocentrotus purpuratus-specific protein	none
SPU_023410	SPU_023410	contains 3 CCP superfamily motifs	none
SPU_023474	SPU_023474	Strongylocentrotus purpuratus-specific protein	none
SPU_023478	SPU_023478	Strongylocentrotus purpuratus-specific protein	none
SPU_023536	SPU_023536	contains 2 IPT superfamily motifs	none
SPU_023543	SPU_023543	contains COG4946 domain	none
SPU_023553	SPU_023553	contains 4 CUB superfamily motifs	none
SPU_023558	SPU_023558	low quality protein sequence: 70% of amino acids are X. Strongylocentrotus purpuratus-specific protein.	none
SPU_023572	SPU_023572	contains 2 Hist_deacetyl superfamily motifs	none
SPU_023586	SPU_023586	contains 7 ANK superfamily motifs and Arp domain	none
SPU_023594	SPU_023594	Strongylocentrotus purpuratus-specific protein	none
SPU_023606	SPU_023606	Strongylocentrotus purpuratus-specific protein	none
SPU_023679	SPU_023679	contains Dynein_heavy domain	none
SPU_023690	SPU_023690	Strongylocentrotus purpuratus-specific protein	none
SPU_023729	SPU_023729	contains Smc domain	none
SPU_023786	SPU_023786	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_023910	SPU_023910	Strongylocentrotus purpuratus-specific protein	none
SPU_023918	SPU_023918	contains 2 Sulfotransfer_1 domain motifs	none
SPU_023939	SPU_023939	contains ROM1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_024022	SPU_024022	contains 2 IG superfamily motifs and Ion_trans domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_024024	SPU_024024	contains 2 CUB superfamily motifs	none
SPU_024050	SPU_024050	contains COG4886 domain	none
SPU_024000	SPU_024000	contains OmpH domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_024054	SPU_024054	Strongylocentrotus purpuratus-specific protein	none
SPU_024056	SPU_024056	contains SMC_N domain	none
SPU_024077	SPU_024077	low quality protein sequence: >70% of amino acids are X	none
SPU_024086	SPU_024086	contains Smc domain	none
SPU_024089	SPU_024089	contains COG5222 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_024108	SPU_024108	Strongylocentrotus purpuratus-specific protein	none
SPU_024109	SPU_024109	Strongylocentrotus purpuratus-specific protein	none
SPU_024116	SPU_024116	Strongylocentrotus purpuratus-specific protein	none
SPU_024124	SPU_024124	Strongylocentrotus purpuratus-specific protein	none
SPU_024147	SPU_024147	Strongylocentrotus purpuratus-specific protein	none
SPU_024151	SPU_024151	contains 4 LDLa superfamily motifs	none
SPU_024156	SPU_024156	contains COG4886 domain	none
SPU_024171	SPU_024171	contains 5 ANK superfamily motifs	none
SPU_024211	SPU_024211	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_024275	SPU_024275	Strongylocentrotus purpuratus-specific protein	none
SPU_024276	SPU_024276	Strongylocentrotus purpuratus-specific protein	none
SPU_024287	SPU_024287	contains CH superfamily motif at C-terminus and Smc domain	none
SPU_024292	SPU_024292	contains Ion_trans domain	none
SPU_024293	SPU_024293	Strongylocentrotus purpuratus-specific protein	none
SPU_024303	SPU_024303	contains TEL1 domain	none
SPU_024359	SPU_024359	contains AST1 domain	none
SPU_024363	SPU_024363	Strongylocentrotus purpuratus-specific protein	none
SPU_024369	SPU_024369	Strongylocentrotus purpuratus-specific protein	none
SPU_024402	SPU_024402	contains HSP70 domain	none
SPU_024436	SPU_024436	contains 3 ANK superfamily motifs and Arp domain and Ion_trans domain	none
SPU_024437	SPU_024437	contains 2 ANK superfamily motifs and Arp domain and Ion_trans domain	none
SPU_024533	SPU_024533	Strongylocentrotus purpuratus-specific protein	none
SPU_024544	SPU_024544	contains 3 ANK superfamily motifs	none
SPU_024564	SPU_024564	contains A2M_N domain	none
SPU_024591	SPU_024591	Strongylocentrotus purpuratus-specific protein	none
SPU_024599	SPU_024599	contains 3 CH superfamily motifs and 4 Filamin superfamily motifs	none
SPU_024624	SPU_024624	Strongylocentrotus purpuratus-specific protein	none
SPU_024630	SPU_024630	contains 2 BBOX superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_024633	SPU_024633	contains PAT1 domain	none
SPU_024682	SPU_024682	contains Rad18 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_024683	SPU_024683	Strongylocentrotus purpuratus-specific protein	none
SPU_024684	SPU_024684	Strongylocentrotus purpuratus-specific protein	none
SPU_024689	SPU_024689	Strongylocentrotus purpuratus-specific protein	none
SPU_024722	SPU_024722	contains COG5114 domain	none
SPU_024723	SPU_024723	contains 2 EGF_CA superfamily motifs	none
SPU_024759	SPU_024759	Strongylocentrotus purpuratus-specific protein	none
SPU_024760	SPU_024760	contains AIR1 domain	none
SPU_024776	SPU_024776	contains PDZ superfamily motif at C-terminus. Strongylocentrotus purpuratus-specific protein.	none
SPU_024833	SPU_024833	Strongylocentrotus purpuratus-specific protein	none
SPU_024870	SPU_024870	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_024874	SPU_024874	contains SMC_N domain	none
SPU_024886	SPU_024886	Strongylocentrotus purpuratus-specific protein	none
SPU_024897	SPU_024897	Strongylocentrotus purpuratus-specific protein	none
SPU_025020	SPU_025020	homologous to numerous putative proteins in Branchiostoma floridae and Nematostella vectensis	none
SPU_025096	SPU_025096	contains 3 EGF_CA superfamily motifs	none
SPU_025124	SPU_025124	contains PRK12323 domain and COG2931 domain	none
SPU_025130	SPU_025130	Strongylocentrotus purpuratus-specific protein	none
SPU_025133	SPU_025133	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_025191	SPU_025191	contains PldB domain	none
SPU_025208	SPU_025208	contains Smc domain	none
SPU_025216	SPU_025216	contains 3 CUB superfamily motifs	none
SPU_025235	SPU_025235	contains 2 vWFA superfamily motifs	none
SPU_025247	SPU_025247	contains DUF2353 domain	none
SPU_025310	SPU_025310	contains 4 FA58C superfamily motifs	none
SPU_025330	SPU_025330	Strongylocentrotus purpuratus-specific protein	none
SPU_025331	SPU_025331	contains 2 KR superfamily motifs and 3 FA58C superfamily motifs	none
SPU_025343	SPU_025343	contains B41 domain	none
SPU_025356	SPU_025356	contains SMC_N domain	none
SPU_025392	SPU_025392	contains PRK08279 domain	none
SPU_025403	SPU_025403	contains COG3386 domain	none
SPU_025415	SPU_025415	Strongylocentrotus purpuratus-specific protein	none
SPU_025416	SPU_025416	contains 2 DEXDc superfamily motifs and SrmB domain	none
SPU_025418	SPU_025418	contains Smc domain and SMC_N domain	none
SPU_025426	SPU_025426	contains MDN1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_025430	SPU_025430	contains 4 IG superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_025431	SPU_025431	contains 4 IG superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_025532	SPU_025532	contains 3 ANK superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_025534	SPU_025534	Strongylocentrotus purpuratus-specific protein	none
SPU_025538	SPU_025538	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_025539	SPU_025539	Strongylocentrotus purpuratus-specific protein	none
SPU_025541	SPU_025541	contains SbcC domain	none
SPU_025587	SPU_025587	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_025628	SPU_025628	contains 6 NF-X1-zinc-finger superfamily motifs and COG5219 domain	none
SPU_025757	SPU_025757	Strongylocentrotus purpuratus-specific protein	none
SPU_025805	SPU_025805	Strongylocentrotus purpuratus-specific protein	none
SPU_025836	SPU_025836	contains 2 CIMR superfamily motifs	none
SPU_025839	SPU_025839	contains COG1340 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_025842	SPU_025842	contains 3 HYR superfamily motifs	none
SPU_025883	SPU_025883	contains 2 RPA2_OBF_family superfamily motifs	none
SPU_025901	SPU_025901	Strongylocentrotus purpuratus-specific protein	none
SPU_025933	SPU_025933	Strongylocentrotus purpuratus-specific protein	none
SPU_025941	SPU_025941	contains Ribosomal_L7Ae superfamily motif at C-terminus	none
SPU_025950	SPU_025950	contains 4 ARM superfamily motifs	none
SPU_025965	SPU_025965	contains COG4783 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_025970	SPU_025970	Strongylocentrotus purpuratus-specific protein	none
SPU_025978	SPU_025978	Strongylocentrotus purpuratus-specific protein	none
SPU_026000	SPU_026000	contains V-set domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_026024	SPU_026024	Strongylocentrotus purpuratus-specific protein	none
SPU_026033	SPU_026033	homologous only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_026044	SPU_026044	contains 6 ANK superfamily motifs and Arp domain	none
SPU_026085	SPU_026085	homologous to numerous putative Hydra magnipapillata proteins. Strongylocentrotus purpuratus-specific protein.	none
SPU_026120	SPU_026120	Strongylocentrotus purpuratus-specific protein	none
SPU_026150	SPU_026150	Strongylocentrotus purpuratus-specific protein	none
SPU_026192	SPU_026192	contains csdA domain	none
SPU_026194	SPU_026194	Strongylocentrotus purpuratus-specific protein	none
SPU_026213	SPU_026213	Strongylocentrotus purpuratus-specific protein	none
SPU_026323	SPU_026323	contains 2 Mem_trans superfamily motifs	none
SPU_026341	SPU_026341	Strongylocentrotus purpuratus-specific protein	none
SPU_026344	SPU_026344	contains 2 Filamin superfamily motifs	none
SPU_026383	SPU_026383	contains COG1293 domain	none
SPU_026354	SPU_026354	Strongylocentrotus purpuratus-specific protein	none
SPU_026384	SPU_026384	Strongylocentrotus purpuratus-specific protein	none
SPU_026426	SPU_026426	Strongylocentrotus purpuratus-specific protein	none
SPU_026429	SPU_026429	Strongylocentrotus purpuratus-specific protein	none
SPU_026449	SPU_026449	contains 2 RING superfamily motifs and PRK03918 domain. poor sequence data: ~30% of amino acids are X. Strongylocentrotus purpuratus-specific protein.	none
SPU_026464	SPU_026464	contains 2 LRR_RI superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_026489	SPU_026489	contains Ion_trans domain	none
SPU_026522	SPU_026522	contains Smc domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_026557	SPU_026557	contains PRK13578 domain	none
SPU_026562	SPU_026562	contains 2 Drf_FH3 superfamily motifs	none
SPU_026564	SPU_026564	Strongylocentrotus purpuratus-specific protein	none
SPU_026588	SPU_026588	contains 5 HYR superfamily motifs	none
SPU_026609	SPU_026609	Strongylocentrotus purpuratus-specific protein	none
SPU_026655	SPU_026655	contains 8 ANK superfamily motifs and Arp domain	none
SPU_026662	SPU_026662	contains 3 CUB superfamily motifs and 3 EGF_CA superfamily motifs	none
SPU_026674	SPU_026674	contains 9 ANK superfamily motifs and Arp domain	none
SPU_026732	SPU_026732	Strongylocentrotus purpuratus-specific protein	none
SPU_026738	SPU_026738	Strongylocentrotus purpuratus-specific protein	none
SPU_026739	SPU_026739	contains 2 FA58C superfamily motifs	none
SPU_026741	SPU_026741	contains 4 MAM superfamily motifs	none
SPU_026752	SPU_026752	contains 3 ANK superfamily motifs and Arp domain	none
SPU_026794	SPU_026794	Strongylocentrotus purpuratus-specific protein	none
SPU_026826	SPU_026826	contains Smc domain	none
SPU_026865	SPU_026865	Strongylocentrotus purpuratus-specific protein	none
SPU_026875	SPU_026875	contains 3 NHL superfamily motifs and PRK07764 domain and COG3391 domain	none
SPU_026887	SPU_026887	contains 2 Death superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_026906	SPU_026906	contains 2 RRM superfamily motifs	none
SPU_026917	SPU_026917	contains 3 HYR superfamily motifs and 2 CUB superfamily motifs	none
SPU_026934	SPU_026934	Strongylocentrotus purpuratus-specific protein	none
SPU_026942	SPU_026942	contains 2 HYR superfamily motifs and 2 GCC2_GCC3 superfamily motifs	none
SPU_026950	SPU_026950	contains 4 Kelch_1 superfamily motifs	none
SPU_026965	SPU_026965	Strongylocentrotus purpuratus-specific protein	none
SPU_026970	SPU_026970	contains Nop14 domain	none
SPU_026998	SPU_026998	contains 4 CCP superfamily motifs	none
SPU_027002	SPU_027002	contains HSP70 domain. poor sequence data: more than 60% of amino acids including C-terminal half are X.	none
SPU_027005	SPU_027005	contains Smc domain	none
SPU_027008	SPU_027008	contains SMC_N domain	none
SPU_027029	SPU_027029	contains Torsin domain	none
SPU_027046	SPU_027046	contains 3 IG superfamily motifs	none
SPU_027058	SPU_027058	contains 2 ZU5 superfamily motifs	none
SPU_027102	SPU_027102	contains DNA_pol_B_2 domain	none
SPU_027122	SPU_027122	homologous only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_027279	SPU_027279	contains 3 DM14 superfamily motifs	none
SPU_027295	SPU_027295	contains PRK02106 domain	none
SPU_027297	SPU_027297	contains PRK02106 domain	none
SPU_027307	SPU_027307	homologous only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_027352	SPU_027352	contains SMC_N domain	none
SPU_027354	SPU_027354	Strongylocentrotus purpuratus-specific protein	none
SPU_027391	SPU_027391	Strongylocentrotus purpuratus-specific protein	none
SPU_027395	SPU_027395	Strongylocentrotus purpuratus-specific protein	none
SPU_027424	SPU_027424	contains 2 Galactosyl_T superfamily motifs	none
SPU_027437	SPU_027437	contains 2 Galactosyl_T superfamily motifs	none
SPU_027463	SPU_027463	contains MFS_1 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_027473	SPU_027473	Strongylocentrotus purpuratus-specific protein	none
SPU_027515	SPU_027515	Strongylocentrotus purpuratus-specific protein	none
SPU_027558	SPU_027558	Strongylocentrotus purpuratus-specific protein	none
SPU_027575	SPU_027575	contains 3 COG1357 domain motifs	none
SPU_027627	SPU_027627	contains 3 Thioredoxin-like superfamily motifs	none
SPU_027638	SPU_027638	contains 2 HYR superfamily motifs	none
SPU_027646	SPU_027646	contains MAM superfamily motif at N-terminus. Strongylocentrotus purpuratus-specific protein.	none
SPU_027651	SPU_027651	contains COG1219 domain	none
SPU_027661	SPU_027661	contains Macoilin domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_027667	SPU_027667	contains 2 IG superfamily motifs	none
SPU_027742	SPU_027742	contains 2 WSC superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_027746	SPU_027746	contains 5 ANK superfamily motifs and Arp domain	none
SPU_027773	SPU_027773	contains 2 EGF_CA superfamily motifs	none
SPU_027787	SPU_027787	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_027869	SPU_027869	homologous only to itself. Strongylocentrotus purpuratus-specific protein.	none
SPU_027871	SPU_027871	contains 3 IPT superfamily motifs	none
SPU_027877	SPU_027877	contains 2 SSF superfamily motifs	none
SPU_027907	SPU_027907	contains PRK12704 domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_027955	SPU_027955	contains 3 IG superfamily motifs. Strongylocentrotus purpuratus-specific protein.	none
SPU_027961	SPU_027961	poor sequence data: ~50% of amino acids are X. Strongylocentrotus purpuratus-specific protein.	none
SPU_027991	SPU_027991	Strongylocentrotus purpuratus-specific protein	none
SPU_028048	SPU_028048	contains 3 CUB superfamily motifs	none
SPU_028057	SPU_028057	contains Dynein_heavy domain. probable assembly chimera (only N-terminal half matches).	none
SPU_028080	SPU_028080	contains 2 CCP superfamily motifs	none
SPU_028082	SPU_028082	contains Smc domain	none
SPU_028091	SPU_028091	contains 5 FA58C superfamily motifs	none
SPU_028099	SPU_028099	Strongylocentrotus purpuratus-specific protein	none
SPU_028107	SPU_028107	contains 9 ANK superfamily motifs and Arp domain	none
SPU_028176	SPU_028176	contains 2 Nramp superfamily motifs	none
SPU_028227	SPU_028227	contains COG1322 domain	none
SPU_028259	SPU_028259	Strongylocentrotus purpuratus-specific protein	none
SPU_028287	SPU_028287	contains Gelsolin superfamily motif at C-terminus	none
SPU_028295	SPU_028295	Strongylocentrotus purpuratus-specific protein	none
SPU_028306	SPU_028306	contains Chelatase_Class_II superfamily motif at N-terminus	none
SPU_028341	SPU_028341	Strongylocentrotus purpuratus-specific protein	none
SPU_028400	SPU_028400	contains 4 HYR superfamily motifs	none
SPU_028402	SPU_028402	contains 4 CCP superfamily motifs	none
SPU_028438	SPU_028438	contains 3 zf-CXXC superfamily motifs	none
SPU_028453	SPU_028453	contains AIR1 domain	none
SPU_028474	SPU_028474	Strongylocentrotus purpuratus-specific protein	none
SPU_028482	SPU_028482	contains FAT domain	none
SPU_028577	SPU_028577	Strongylocentrotus purpuratus-specific protein	none
SPU_028581	SPU_028581	contains 2 LDLa superfamily motifs	none
SPU_028689	SPU_028689	contains 2 IG superfamily motifs	none
SPU_028706	SPU_028706	contains eIF2A domain	none
SPU_028727	SPU_028727	Strongylocentrotus purpuratus-specific protein	none
SPU_028734	SPU_028734	Strongylocentrotus purpuratus-specific protein	none
SPU_028738	SPU_028738	Strongylocentrotus purpuratus-specific protein	none
SPU_028790	SPU_028790	contains 2 MFS superfamily motifs	none
SPU_028895	SPU_028895	contains 4 PLAT superfamily motifs	none
SPU_000998	SPU_000998	Strongylocentrotus purpuratus-specific protein	none
SPU_002441	SPU_002441	Strongylocentrotus purpuratus-specific protein	none
SPU_003396	SPU_003396	Strongylocentrotus purpuratus-specific protein	none
SPU_004698	SPU_004698	Strongylocentrotus purpuratus-specific protein	none
SPU_005574	SPU_005574	Strongylocentrotus purpuratus-specific protein	none
SPU_005935	SPU_005935	Strongylocentrotus purpuratus-specific protein	none
SPU_006183	SPU_006183	Strongylocentrotus purpuratus-specific protein	none
SPU_006474	SPU_006474	Strongylocentrotus purpuratus-specific protein	none
SPU_007024	SPU_007024	Strongylocentrotus purpuratus-specific protein	none
SPU_007654	SPU_007654	Strongylocentrotus purpuratus-specific protein	none
SPU_007691	SPU_007691	Strongylocentrotus purpuratus-specific protein	none
SPU_009475	SPU_009475	Strongylocentrotus purpuratus-specific protein	none
SPU_012056	SPU_012056	Strongylocentrotus purpuratus-specific protein	none
SPU_012297	SPU_012297	Strongylocentrotus purpuratus-specific protein	none
SPU_013339	SPU_013339	Strongylocentrotus purpuratus-specific protein	none
SPU_014410	SPU_014410	Strongylocentrotus purpuratus-specific protein	none
SPU_016026	SPU_016026	Strongylocentrotus purpuratus-specific protein	none
SPU_016036	SPU_016036	Strongylocentrotus purpuratus-specific protein	none
SPU_016997	SPU_016997	Strongylocentrotus purpuratus-specific protein	none
SPU_017658	SPU_017658	Strongylocentrotus purpuratus-specific protein	none
SPU_018406	SPU_018406	Strongylocentrotus purpuratus-specific protein	none
SPU_018407	SPU_018407	Strongylocentrotus purpuratus-specific protein	none
SPU_019271	SPU_019271	Strongylocentrotus purpuratus-specific protein	none
SPU_020393	SPU_020393	Strongylocentrotus purpuratus-specific protein	none
SPU_020873	SPU_020873	Strongylocentrotus purpuratus-specific protein	none
SPU_022032	SPU_022032	Strongylocentrotus purpuratus-specific protein	none
SPU_022613	SPU_022613	Strongylocentrotus purpuratus-specific protein	none
SPU_022617	SPU_022617	Strongylocentrotus purpuratus-specific protein	none
SPU_022979	SPU_022979	Strongylocentrotus purpuratus-specific protein	none
SPU_024106	SPU_024106	Strongylocentrotus purpuratus-specific protein	none
SPU_025395	SPU_025395	Strongylocentrotus purpuratus-specific protein	none
SPU_026173	SPU_026173	Strongylocentrotus purpuratus-specific protein	none
SPU_027358	SPU_027358	Strongylocentrotus purpuratus-specific protein	none
SPU_028179	SPU_028179	Strongylocentrotus purpuratus-specific protein	none
SPU_000643	SPU_000643	Strongylocentrotus purpuratus-specific protein	none
SPU_006344	SPU_006344	Strongylocentrotus purpuratus-specific protein	none
SPU_007042	SPU_007042	Strongylocentrotus purpuratus-specific protein	none
SPU_011696	SPU_011696	Strongylocentrotus purpuratus-specific protein	none
SPU_012191	SPU_012191	Strongylocentrotus purpuratus-specific protein	none
SPU_012929	SPU_012929	Strongylocentrotus purpuratus-specific protein	none
SPU_014090	SPU_014090	Strongylocentrotus purpuratus-specific protein	none
SPU_014561	SPU_014561	Strongylocentrotus purpuratus-specific protein	none
SPU_015616	SPU_015616	Strongylocentrotus purpuratus-specific protein	none
SPU_015925	SPU_015925	Strongylocentrotus purpuratus-specific protein	none
SPU_017284	SPU_017284	Strongylocentrotus purpuratus-specific protein	none
SPU_020150	SPU_020150	Strongylocentrotus purpuratus-specific protein	none
SPU_028878	SPU_028878	Strongylocentrotus purpuratus-specific protein	none
SPU_028879	SPU_028879	Strongylocentrotus purpuratus-specific protein	none
SPU_028047	SPU_028047	Strongylocentrotus purpuratus-specific protein	none
SPU_025293	SPU_025293	Strongylocentrotus purpuratus-specific protein	none
SPU_024291	SPU_024291	Strongylocentrotus purpuratus-specific protein	none
SPU_021717	SPU_021717	Strongylocentrotus purpuratus-specific protein	none
SPU_021405	SPU_021405	Strongylocentrotus purpuratus-specific protein	none
SPU_021556	SPU_021556	Strongylocentrotus purpuratus-specific protein	none
SPU_020173	SPU_020173	Strongylocentrotus purpuratus-specific protein	none
SPU_020586	SPU_020586	Strongylocentrotus purpuratus-specific protein	none
SPU_018957	SPU_018957	Strongylocentrotus purpuratus-specific protein	none
SPU_019725	SPU_019725	Strongylocentrotus purpuratus-specific protein	none
SPU_018276	SPU_018276	Strongylocentrotus purpuratus-specific protein	none
SPU_018652	SPU_018652	Strongylocentrotus purpuratus-specific protein	none
SPU_018736	SPU_018736	Strongylocentrotus purpuratus-specific protein	none
SPU_018781	SPU_018781	Strongylocentrotus purpuratus-specific protein	none
SPU_017535	SPU_017535	Strongylocentrotus purpuratus-specific protein	none
SPU_018053	SPU_018053	Strongylocentrotus purpuratus-specific protein	none
SPU_018116	SPU_018116	Strongylocentrotus purpuratus-specific protein	none
SPU_016795	SPU_016795	Strongylocentrotus purpuratus-specific protein	none
SPU_017204	SPU_017204	Strongylocentrotus purpuratus-specific protein	none
SPU_017205	SPU_017205	Strongylocentrotus purpuratus-specific protein	none
SPU_017233	SPU_017233	Strongylocentrotus purpuratus-specific protein	none
SPU_017246	SPU_017246	Strongylocentrotus purpuratus-specific protein	none
SPU_017360	SPU_017360	Strongylocentrotus purpuratus-specific protein	none
SPU_016472	SPU_016472	Strongylocentrotus purpuratus-specific protein	none
SPU_016503	SPU_016503	Strongylocentrotus purpuratus-specific protein	none
SPU_015345	SPU_015345	Strongylocentrotus purpuratus-specific protein	none
SPU_015408	SPU_015408	Strongylocentrotus purpuratus-specific protein	none
SPU_015696	SPU_015696	Strongylocentrotus purpuratus-specific protein	none
SPU_015721	SPU_015721	Strongylocentrotus purpuratus-specific protein	none
SPU_015884	SPU_015884	Strongylocentrotus purpuratus-specific protein	none
SPU_013246	SPU_013246	Strongylocentrotus purpuratus-specific protein	none
SPU_013347	SPU_013347	Strongylocentrotus purpuratus-specific protein	none
SPU_012528	SPU_012528	Strongylocentrotus purpuratus-specific protein	none
SPU_012538	SPU_012538	Strongylocentrotus purpuratus-specific protein	none
SPU_012582	SPU_012582	Strongylocentrotus purpuratus-specific protein	none
SPU_013083	SPU_013083	Strongylocentrotus purpuratus-specific protein	none
SPU_010110	SPU_010110	Strongylocentrotus purpuratus-specific protein	none
SPU_010473	SPU_010473	Strongylocentrotus purpuratus-specific protein	none
SPU_011323	SPU_011323	Strongylocentrotus purpuratus-specific protein	none
SPU_009983	SPU_009983	Strongylocentrotus purpuratus-specific protein	none
SPU_008559	SPU_008559	Strongylocentrotus purpuratus-specific protein	none
SPU_009006	SPU_009006	Strongylocentrotus purpuratus-specific protein	none
SPU_007462	SPU_007462	Strongylocentrotus purpuratus-specific protein	none
SPU_007543	SPU_007543	Strongylocentrotus purpuratus-specific protein	none
SPU_006977	SPU_006977	Strongylocentrotus purpuratus-specific protein	none
SPU_005644	SPU_005644	Strongylocentrotus purpuratus-specific protein	none
SPU_005844	SPU_005844	Strongylocentrotus purpuratus-specific protein	none
SPU_005919	SPU_005919	Strongylocentrotus purpuratus-specific protein	none
SPU_005968	SPU_005968	Strongylocentrotus purpuratus-specific protein	none
SPU_006207	SPU_006207	Strongylocentrotus purpuratus-specific protein	none
SPU_006274	SPU_006274	Strongylocentrotus purpuratus-specific protein	none
SPU_005166	SPU_005166	Strongylocentrotus purpuratus-specific protein	none
SPU_005211	SPU_005211	Strongylocentrotus purpuratus-specific protein	none
SPU_005444	SPU_005444	Strongylocentrotus purpuratus-specific protein	none
SPU_004691	SPU_004691	Strongylocentrotus purpuratus-specific protein	none
SPU_004701	SPU_004701	Strongylocentrotus purpuratus-specific protein	none
SPU_004953	SPU_004953	Strongylocentrotus purpuratus-specific protein	none
SPU_003399	SPU_003399	Strongylocentrotus purpuratus-specific protein	none
SPU_003449	SPU_003449	Strongylocentrotus purpuratus-specific protein	none
SPU_003554	SPU_003554	Strongylocentrotus purpuratus-specific protein	none
SPU_003677	SPU_003677	Strongylocentrotus purpuratus-specific protein	none
SPU_003680	SPU_003680	Strongylocentrotus purpuratus-specific protein	none
SPU_003720	SPU_003720	Strongylocentrotus purpuratus-specific protein	none
SPU_003732	SPU_003732	Strongylocentrotus purpuratus-specific protein	none
SPU_003758	SPU_003758	Strongylocentrotus purpuratus-specific protein	none
SPU_003761	SPU_003761	Strongylocentrotus purpuratus-specific protein	none
SPU_003893	SPU_003893	Strongylocentrotus purpuratus-specific protein	none
SPU_003907	SPU_003907	Strongylocentrotus purpuratus-specific protein	none
SPU_004211	SPU_004211	Strongylocentrotus purpuratus-specific protein	none
SPU_003250	SPU_003250	Strongylocentrotus purpuratus-specific protein	none
SPU_002236	SPU_002236	Strongylocentrotus purpuratus-specific protein	none
SPU_002327	SPU_002327	Strongylocentrotus purpuratus-specific protein	none
SPU_002413	SPU_002413	Strongylocentrotus purpuratus-specific protein	none
SPU_002623	SPU_002623	Strongylocentrotus purpuratus-specific protein	none
SPU_002685	SPU_002685	Strongylocentrotus purpuratus-specific protein	none
SPU_002735	SPU_002735	Strongylocentrotus purpuratus-specific protein	none
SPU_001824	SPU_001824	Strongylocentrotus purpuratus-specific protein	none
SPU_001853	SPU_001853	Strongylocentrotus purpuratus-specific protein	none
SPU_002052	SPU_002052	Strongylocentrotus purpuratus-specific protein	none
SPU_001193	SPU_001193	Strongylocentrotus purpuratus-specific protein	none
SPU_001254	SPU_001254	Strongylocentrotus purpuratus-specific protein	none
SPU_001376	SPU_001376	Strongylocentrotus purpuratus-specific protein	none
SPU_001395	SPU_001395	Strongylocentrotus purpuratus-specific protein	none
SPU_001593	SPU_001593	Strongylocentrotus purpuratus-specific protein	none
SPU_001594	SPU_001594	Strongylocentrotus purpuratus-specific protein	none
SPU_001680	SPU_001680	Strongylocentrotus purpuratus-specific protein	none
SPU_000803	SPU_000803	Strongylocentrotus purpuratus-specific protein	none
SPU_000807	SPU_000807	Strongylocentrotus purpuratus-specific protein	none
SPU_000909	SPU_000909	Strongylocentrotus purpuratus-specific protein	none
SPU_025920	SPU_025920	Strongylocentrotus purpuratus-specific protein	none
SPU_026802	SPU_026802	Strongylocentrotus purpuratus-specific protein	none
SPU_027529	SPU_027529	Strongylocentrotus purpuratus-specific protein	none
SPU_015470.1	SPU_015470	contains Ion_trans domain	none
SPU_020691.1	SPU_020691	contains SMC_N domain. Strongylocentrotus purpuratus-specific protein.	none
SPU_013015	SPU_013015	dbj|BAD74048.1|  transcription factor Brachyury [Hemicentrotus pulcherrimus] is the closest match to Sp-Bra	This is one haplotype; the other is SPU_020451\n
SPU_010203	SPU_010203	none	Also found in scaffolds 113729, 14987, and 2038\n
SPU_020330	SPU_020330	none	Only the last exon of SPU_020330 belongs to SpGrm, and a big part of the predicted ORF is DNA mismatch repair protein Mlh3, which has a duplicated seq in genome\n
SPU_002737	SPU_002737	none	See SPU_001774, _09709. \n
SPU_027023	SPU_027023	none	CDS sequences are part of rendezvin.  See SPU_019369 for which exons they encode.\n
SPU_027987	SPU_027987	none	In addition to duplicated copy, there is a third copy not in the Glean3 list on scaffold 134677\n
SPU_015027	SPU_015027	none	This prediction covers partial CDS that is missing C-terminal sequences, which are probably on another scaffold.  On the gene duplications page, I have listed another copy also lacking C-terminal sequences.  This second copy is on a scaffold, 52540, which contains multiple ACE-like genes, some complete and some partial.  Finally, there is another copy in the genome that is not on the GLEAN3 list on  Scaffold 123895.\n
SPU_023329	SPU_023329	none	Adjacent to this model is a very closely related gene, SPU_023330.\n
SPU_023330	SPU_023330	none	This gene model is adjacent to SPU_023329, which encodes a very similar protein.\n
SPU_018409	SPU_018409	none	240 bp intron was accepted as coding region by comparison to the corresponding FgenesAB prediction.\n
SPU_025719	SPU_025719	none	507 bp and 549bp introns and 426bp of the 5'UTR were accepted as coding region by comparison to the corresponding FgenesAB and ++ prediction. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_028940	SPU_028940	none	Partial CDS.  The model may lack the N-terminal half of the sequence. \n \nThis model is one of a cluster of 4 closely related NAALAD2 genes on scaffold 496.\n
SPU_015185	SPU_015185	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 93% identity to another Sp-Tlr gene, so it could be a member of the Toll-like receptor family. \n
SPU_011837	SPU_011837	none	RNA editing function, see also SPU_011875, for identical gene (duplication? or redundant scaffold?) \n
SPU_022057	SPU_022057	none	This prediction lacks 5'-terminus.  5'-terminus sequence is in SPU_010703 (scaffold 1972). \n
Sp-ADAM12-like	SPU_030025	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomon, fgenesh) that are not in the Glean3 list.\n
Sp-STE24p-like	SPU_030050	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomon, fgenesh) that are not in the Glean3 list.\n
SPU_010805	SPU_010805	none	duplicate SPU_017694\n
SPU_000154	SPU_000154	none	Lacking N-terminus.  See also SPU_006123. \n
SPU_003744	SPU_003744	none	Likely assembly error. Identical to SPU_003743 from AA 180 onwards. Recommend deletion?\n
SPU_016163	SPU_016163	none	This model is possibly incomplete.\n
SPU_017197	SPU_017197	none	SPU_015064 has the first part of the gene. SPU_017197 should have the latter half. SPU_015899 is likely a haplotype of SPU_017197.\n
SPU_019208	SPU_019208	none	expressed histone gene\n
SPU_000009	SPU_000009	none	 partial, missing C-terminus\n
SPU_000447	SPU_000447	none	 missing C-terminus\n
SPU_007150	SPU_007150	none	 partial, missing C-terminus\n
SPU_009099	SPU_009099	none	 missing stretch in middle, missing C-terminus\n
SPU_014007	SPU_014007	none	 partial, missing N-terminus\n
SPU_020808	SPU_020808	none	 missing C-terminus\n
SPU_020881	SPU_020881	none	 missing N-terminus\n
SPU_001956	SPU_001956	none	 extra C-terminus\n
SPU_014214	SPU_014214	none	 partial, missing N- and C-terminus\n
SPU_010918	SPU_010918	none	 missing central regions\n
SPU_014874	SPU_014874	none	 partial, missing N-terminus\n
SPU_024639	SPU_024639	none	 partial, missing C-terminus\n
SPU_020002	SPU_020002	none	 missing C-terminus\n
SPU_020176	SPU_020176	none	 missing N-terminus\n
SPU_017950	SPU_017950	none	Short protein, only 99 aa. \nOriginally called H2bk, changed to pseudo-gene on 28 July 2006.\n
SPU_001225	SPU_001225	none	Homologue to GGDEF and Neurotrophin. Am working to find out what it is exactly. \nSimilar to genescan predicted peptide: Supertig12270|GENSCAN_predicted_peptide_1|302_aa \nCHIMERIC PROTEIN:  \nSupertig12270|GENSCAN_predicted_peptide_1|302_aa CONTAINS A PIECE OF SEQUENCE WHICH IS IN Supertig153688|GENSCAN_predicted_peptide_1|85_aa and \n
SPU_008358	SPU_008358	none	Inspection of the tiling array suggests that glean may have missed the following exons: GFGYQREEVLFQIWDKRGGEKVAFLLDFVREAPKVEVQSLQKEGTWGNYKDSSQPPPTSPLPETPSLAKDTQTPPTSPLPETPSLAKDNQTPPTSPLPETQSLAKDTQTPPTSPLPETPFLAKDTQTPPTSPLPETPSLAKDTQTPPTSPLPETQSLAKDTQTPPTSPLPETQSLAKDTQTPPTSPLPETPSLAKDTQTPPTSPLPETLSPAKDTQTPPTSPLLEIPPHHDDDGISINTNPVDGWYEYTNDTGTQTSDPEDHIGKLMDRCRISIKQNPDSKRTEACSETQSYFHTLKSRES,QNRGKMVQRHGNQTEDGERKCLEDRVWNGWKEQVVYSPFTSVASSDCENFELTELKRQLQELFERQPTKLVMTPERKIFDGKAAEIEDLVISVKRSFSRYGIREEGKRSPFSWILSEKLQR,RFNLSRRKEPGGTTKTLPNLHQLHLSQKLRPWLKTPKHHQLPLSQKLRPWLKTTKHHQLPLSQKPSPLLKTPKHHQLPLSQKPRSWLKTLKHHQLPLSQKPRPWLKTPKHHQLPLSQKLSPWLKTPKHHQLPLSQKLSPWLKTPKHHQLPLSQKPHPWLKTPKHHQLPLSQKPCPRLKTPKHHQLPLS,FMNIVYHHRTMANKRKAWSLQGIIAVMQEYLSCHDVYKRMKRNGTTKGFHQTVADKLSYQENATENLLKSLGREYRDILQ,WGSDVLRTRSITGKGSNMEKGKFDTIMSQQKNSSFVKYLAVAIWGSDVLRTRSITGKGSNVDKGKFDTIMSQQKNSSFVKYLALAIWGSDVLRTRSIT,XXXXXXXGYLGFGCSENALDHRKKGKSICEPLELGQKRRHCVVMNLPKFPSSSEPALHLSDHEAVSIVVGLAIFSGTAGSKVHLCFLRKGV\n
SPU_012772	SPU_012772	none	Inspection of the tiling array suggests that glean may have missed the following exons: YWVHARAQWLEFDVTSHHLATTGLVYIMKLVCVFVSAKTSLLNGKYTSEDKVHHKTQPVLMPIRNAALLDMLQVTKRLQLGRITKKVSIISSFVRLK\n
SPU_014793	SPU_014793	none	Inspection of the tiling array suggests that glean may have missed the following exons: RRVDNCGPVCGAALSILIIMRIHDFISSLCGSLSLKLSSPFLMRNYLSALTLQSSFDDMAEAFTIFLFCHIYWYKFPYIVDCKNVPT\n
SPU_017348	SPU_017348	none	Inspection of the tiling array suggests that glean may have missed the following exons: CNRNHSLDGSFPLALSLALSPSSPSPAPPPLLPLPSTSLSLFISLIPPPYLSFPSPCLFLSSSLSPVLPPSPLSLSLSLSPTYLSRTS\n
SPU_019651	SPU_019651	none	Inspection of the tiling array suggests that glean may have missed the following exons: QTFVNTEDIGRYPSLRGPPAHDTSSGCTDGTSCRNGDTCKVADDQSKRSAFQAYQREMPKLLLLAYLKLSPFRLHPLPYQRL,RSFVRESFVAEGAEVAQLSSLMHPAFMSTYPFADTESLPTDLTLERFEACMFPLVIGEGAIRRKGSWAERAAVETVLLVTLVMLKEVAFFHEDTRAQVTLE,EHQPMVFACIVNAAILVLLKNCYKSQLIMAQLIITPAHSSFILHDEPLFISDISCHQLYIFTIPCLNVHLSLYCSPWHVSHNCDCLISVDSISL\n
SPU_019653	SPU_019653	none	Inspection of the tiling array suggests that glean may have missed the following exons: YFNLNKSHTSISSSLCLFTGFTLLLLLWAAVQIFCPYTVRSGAIASGLMFCPFDVLSLTIASVLVFCVPMVLFVTVAIPLFLTLSLFDP,EAGGNFVRPGSLDSVSGLLLDSMILPLALPSSFPRCFVISSMIPEDKMRTVSPLLLSGLVCPKPFDAPELRILEESCGRPSLVLAVL,PKMTVTSTPHPLPPVHLPAVHTVYQQGSPPQRSLQRSSGLHHDEMRKTCTRIPPVMVEHPSLLLLHPPSPSRHHPTPDSYHPHQRPHRSPLALPPPHNHYYNLHLSLTHLYLTLDCHHHHHHRRHCHPPPPLPSPPCLFFPSPLPPPHHTTLHPPPIHLHPPAVMRHNGTSPQHLSGNNNNKI\n
SPU_022473	SPU_022473	none	Inspection of the tiling array suggests that glean may have missed the following exons: PVLEGISPVSQDSSQVLPSMENSSQSERELDESINDPTNSPPTHDEDSTGKQGFDCTKCKKRFSVESDLGSHVMMCCGNLSTQCPVCKKIFASKSYIGKHMRLHTGEKPFQCGECGMRFTRKHHLVHHQRTHTGEKPFKCTE\n
SPU_026418	SPU_026418	none	Inspection of the tiling array suggests that glean may have missed the following exons: TRNELKNPRHLPRPRPRTFMYKSLVDIDRYRDDVTPNRTRRASLYTIEPSMVFKEIKDSQQIGYTRVNRVSENVLQRGSIVNEQVSEWLMWITTRNAFDIIMPHL\n
SPU_010248	SPU_010248	none	Incomplete Protein kinase domain\n
SPU_015668	SPU_015668	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nThis model Blasts best to Cathepsin L, and it strongly co-clusters with Cathepsin L and several other glean models in a NJ multiple protein alignment tree. We have therefore decided to name this model "Cathepsin 11" for consistency, while noting its high similarity to Cathepsin L by making "CathepsinL-like6" one of its synonyms. \n \nNB: The structure of this model is supported by the fact that other gene prediction protocols generated almost identical models and by the genome-wide tiling array hybridization data.\n
SPU_008227	SPU_008227	none	This gene model may represent a pseudogene or contain a sequence error. 1200 bp of 3'UTR was accepted as coding region encoding a TIR domain.  \n
SPU_007337	SPU_007337	none	#\nIt's missing 5' and 3' end of the gene.  It overlaps with SPU_018583.\n
SPU_005376	SPU_005376	none	Position on contig and alignments with best hit suggest the gene is missing 3' exons.\n
SPU_026916	SPU_026916	none	thanks to Charlie W.\n
SPU_022078	SPU_022078	none	EGF-LAMG-LAMG-EGF-LAMG \n \nthese two domains (or other EGF variants) occur in intermingled fashion in quite a few known proteins - neurexins, perlecans, agrin, crumbs, CASPR, some cadherins \n \nurchins appear also to have novel EGF/LAMG proteins \n
SPU_009471	SPU_009471	none	 fragment\n
SPU_014306	SPU_014306	none	 small fragment\n
SPU_018083	SPU_018083	none	 partial, missing C-terminus half\n
SPU_017872	SPU_017872	none	possible relative of col1a1\n
SPU_017657	SPU_017657	none	SEA and EGFs\n
SPU_021262	SPU_021262	none	FN3-10- one LH\n
SPU_000876	SPU_000876	none	SRCR(2).  Probably partial. \n
SPU_002041	SPU_002041	none	SigPep-SRCR(3)-TM. (DMBT1)\n
SPU_011039	SPU_011039	none	Strongylocentrotus purpuratus calcium-binding protein (Endo16), mRNA. \n
SPU_006045	SPU_006045	none	SRCR(2). Probably incomplete.\n
SPU_008642	SPU_008642	none	SigPep-SRCR(8). Possibly incomplete.\n
SPU_015011	SPU_015011	none	 fragment\n
SPU_000745	SPU_000745	none	contains only part of the reprolysin domain\n
SPU_000216	SPU_000216	none	5 EGF repeats, 13 HYR\n
SPU_009224	SPU_009224	none	Could be haplotype pair of SPU_006396\n
SPU_014509	SPU_014509	none	Related to At1g79380/T8K14_20\n
SPU_010140	SPU_010140	none	very weak signal.  no est.  may be pseudogene\n
SPU_020754	SPU_020754	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 40.14% over 436 BLAST alignment positions. 183 of 666 Muscle alignment positions masked (27.400 %; 483 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_013447	SPU_013447	none	PSI/IPT x3/TM\n
SPU_020597	SPU_020597	none	SRCR(8)-Sushi(3)-HYR-Sushi_HYR(2)-Sushi(2). Possibly incomplete.\n
SPU_012085	SPU_012085	none	IDENTICAL TO 11970\n
SPU_002220	SPU_002220	none	PREDICTED: Strongylocentrotus purpuratus similar to survival of motor neuron 1, telomeric isoform b (LOC575759)\n
SPU_016039	SPU_016039	none	Groups with the caspase 8 subfamily by neighbor joining of multiple sequence alignment.  Sister of caspase8-like/caspase8-like-2a/b.\n
Sp-IL17r-like	SPU_030141	none	This gene was annotated based on a manual inspection of multiple protein alignments, reciprocal BLASTing and domain structure analyses. \n \nGiven the position of this gene at the end of a scaffold, and based on alignments to other IL17 receptor genes, it is highly likely that there is N-ter sequence missing in this model.\n
SPU_025671	SPU_025671	none	First exons more difficult to detect on expression microarray.  Comparison with known TAF1A proteins (mouse, rat, and human) shows little similarity towards the carboxyl end, suggesting possible misprediction of early exons (2 and 3).\n
SPU_020923	SPU_020923	none	Added 5'UTRs to the gene model based on the est data\n
SPU_004533	SPU_004533	none	Partial sequence similar to DUSP23.\n
SPU_027503	SPU_027503	none	SigPep-SRCR(3). Probably incomplete.\n
SPU_023126	SPU_023126	none	e val for NP_001804 = 2e-101, and for Q02224=e-139; CENPE_HUMAN [Homo sapiens].   \nKinesin-7 family member.   \nSee also SPU_017809 which also hits Q02224. \nCENPE_HUMAN data obtained from UniProtKB/Swiss-Prot entry Q02224.    \nAnnotation by RA Obar, RL Morris, SA Tower, KM Judkins\n
SPU_012137	SPU_012137	none	SPU_012137 coverage is limited to residues 441 (M) to 606 of 614 residue sequence used as a Query.\n
SPU_001781	SPU_001781	none	Domains: DEATH, NACHT, LRRs.\n
SPU_008382	SPU_008382	none	Domains: DEATH, NACHT, LRRs.\n
SPU_009488	SPU_009488	none	Domains: DEATH, NACHT, LRRs.\n
SPU_027639	SPU_027639	none	This model was fused to a modified version of SPU_027640. Please see SPU_027640 for details.\n
SPU_005799	SPU_005799	none	 fragment\n
SPU_014112	SPU_014112	none	Domains: DEATH, NACHT, LRRs.\n
SPU_005343	SPU_005343	none	An incomplete ORC6 sequence, missing exon 3 is also present in GLEAN_09374 (see comments on this Glean) \n
SPU_011491	SPU_011491	none	Partial MCM2 sequence likely due to inappropriate contig fusion. View GLEAN_06096 for full length sequence.\n
SPU_025077	SPU_025077	none	Genscan model may be more accurate. \nDomains: DEATH, NACHT, LRRs \n
SPU_002921	SPU_002921	none	Very similar to SPU_002923; looks like local duplication.\n
SPU_013952	SPU_013952	none	#\nDomains: DEATH, NACHT, LRRs.\n
SPU_002272	SPU_002272	none	Domains: NACHT, LRRs.  \nThis gene model could be incomplete, missing the DEATH domain(s).\n
SPU_011088	SPU_011088	none	Domains: NACHT, LRRs. \nThe Genscan model has additional exons that code for a DEATH domain. This gene model is on a short scaffold and could be incomplete, missing LRRs.\n
SPU_026320	SPU_026320	none	Domains: DEATH, NACHT, LRRs.\n
SPU_012239	SPU_012239	none	homolog of Cernunnos (protein) and Xlf (gene)\n
SPU_020692	SPU_020692	none	This is the 5' end of the gene.  The rest is located in SPU_007086\n
SPU_016113	SPU_016113	none	 missing some N-terminus residues\n
SPU_025888	SPU_025888	none	The gene model C-term is likely incorrect.\n
SPU_024199	SPU_024199	none	#\nMay have an extra exon at the end.\n
SPU_021755	SPU_021755	none	Duplicate prediction for SPU_000408\n
SPU_012849	SPU_012849	none	SPU_012849 is a partial duplicate prediction for SPU_009023.\n
SPU_003696	SPU_003696	none	Missing ~150 AA at beginning.\n
SPU_026065	SPU_026065	none	SPU_026065 is a partial duplicate prediction for SPU_010328.\n
SPU_012417	SPU_012417	none	This Glean is part of the annotated full length Sp_DNAH2 gene (SPU_030224)\n
SPU_014179	SPU_014179	none	Likely missing an exon.\n
SPU_014882	SPU_014882	none	Incorrect gene model. Extra exon in middle?\n
SPU_017682	SPU_017682	none	Inspection of the tiling array suggests that glean may have missed the following exons: LSCLVFFKLKDDPPDMPDMGFPLSSHDSADQSPLDTALSVSAMLVETGSDHNSDSDFMTGDGVIPGDMVSGFRESTINLQDLE,PPPLPPPPPKTLRRMPSHRLQLHPSPAHPSSRLSPCSLPAPGDYPYPLPWPRPSSPSSHRIHPPTPPHPPSSSPHLHWYHRHRR,DGCHPTGYSCTRLQRIPPLVFLRAHCQLRGTIHTPFLGHARPLHPPTEFILQPLRTLHHPPLIFTGIIVIVGEGVATSICSPYHHHHGHALDHAQGVPGEQEWLGKDRLSGGNCHTSAASRCSGALEEDR,INLTTTTTTTTTKDSETDAIPPVTAAPVSSASLLSSFSVLTASSGGLSIPPSLATPVLSILPPNSSSNPSAPSIILPSSSLVSSSSSVKESLPVSAPPTTTTTVTPSTTPKVYLENRSGSARTDYRAVTATHPLPLDALERLKKTGNY\n
SPU_001634	SPU_001634	none	motor domain\n
SPU_010009	SPU_010009	none	Incorrect gene model. May be a mix of two separate genes.\n
SPU_018683	SPU_018683	none	gal_lec, gal_lec\n
SPU_014701	SPU_014701	none	Dynein heavy chain, N-terminal region 1. Dynein heavy chains interact with other heavy chains to form dimers, and with intermediate chain-light chain complexes to form a basal cargo binding unit. The region featured in this family includes the sequences implicated in mediating these interactions. It is thought to be flexible and not to adopt a rigid conformation.\n
SPU_003084	SPU_003084	none	This gene spans two GLEAN predictions: SPU_005979 + SPU_003084 \n \nExon \tStart \tStop \tScaffold \n1\t58051\t57408\t1179 \n1?\t54625\t55052?  21642\tincomplete \n2\t19833\t19675\t1179 \n3\t16229\t16386\t1201 \n4\t17471\t17710\t1201 \n5\t18192\t18357\t1201 \n6\t31949\t32079\t1201 \n7\t40928\t41191\t1201 \n8\t41532\t41731\t1201 \n9\t42426\t42601\t1201 \n10\t43112\t44008\t1201 \nDatabase version 2005/07/18\n
SPU_023976	SPU_023976	none	Looks to be an allele of SPU_004037.\n
SPU_028275	SPU_028275	none	This gene spans two GLEAN predictions: SPU_002473 + SPU_028275 \nThe SPU_002473 prediction appears to include two different genes, a helicase/ zinc finger protein and the first two Sp-osteonectin exons \n \nExon \tStart \tStop \tScaffold \n1\t27772\t27363\t56445 \n2       17533   17392   56445 \n3       19999   20241   2764 \n4       21405   21615   2764 \n5       23776   23924   2764 \n6       26489   26684   2764 \nDatabase version 2005/07/18\n
SPU_001893	SPU_001893	none	This model was identified in part on the basis of PFAM Lectin_C domains.  It has alternating Lectin_c and Fn3 domains.\n
SPU_009885	SPU_009885	none	This gene was identified partially on the basis of the PFAM domains included in the model\n
SPU_014594	SPU_014594	none	5' and 3' UTRs added based on EST evidence \n \nThis and paralog SPU_022537 are urchin-specific homologs of metazoan 14-3-3 proteins. It is likely that the ancestral metazoan had 2 14-3-3 proteins. One epsilon ortholog (SPU_003825) and one other. This other has undergone differential expansion in vertebrates, nematodes, insects and echinoderms.\n
SPU_022537	SPU_022537	none	5' and 3' UTRs added based on EST evidence \n \nThis and paralog SPU_014594 are urchin-specific homologs of metazoan 14-3-3 proteins. It is likely that the ancestral metazoan had 2 14-3-3 proteins. One epsilon ortholog (SPU_003825) and one other. This other has undergone differential expansion in vertebrates, nematodes, insects and echinoderms.\n
SPU_006917	SPU_006917	none	Note that the 3' end of this gene (encoding the C-terminus of the protein) is contained on Scaffold118064, and that there is some overlap between the sequence at the 5' end of that scaffold and the 3' end of Scaffold874, which contains the 5' end of the gene (encoding the N-terminus).\n
SPU_007852	SPU_007852	none	The existence of this second Runx gene was predicted based on  Southern genomic blot analysis (Coffman et al., Dev. Biol. 174 (1), 43-54 (1996)).  No evidence for expression in the embryo.\n
SPU_023469	SPU_023469	none	Partial gene model. Single exon in the scaffold matches part of SPU_001638. Possible assembly problem\n
SPU_010876	SPU_010876	none	Based on comparison with the cloned orthologue from Lytechinus pictus (LpPKC1, acc. no. U02967), this Glean3 prediction encodes the C-terminal half of the protein (beginning with exon 8), corresponding to nucleotides 1108-2177 of the LpPKC1 cDNA (note that the full length CDS entered here is that of the Sp homologue, which was cloned by RT-PCR in our lab using primers based on manual assembly of the gene using EST sequences, genomic trace sequences, and LpPKC1 as a scaffold; this sequence has not yet been deposited at NCBI).  There appears to be missing sequence corresponding to at least one exon, between exon 9 and the C-terminal two exons; some of this falls on a very short scaffold (Scaffold131377). The sequence encoding the N-terminal half (exons 1-7) is contained on Scaffold161 (in gene models SPU_001048 and SPU_001049), corresponding to nucleotides 1-1107 of LpPKC1.\n
SPU_013522	SPU_013522	none	Amino acids 1-125 are missing and are most likely on another scaffold; amino acids 312-382 are probably lost in a sequence of  \nNNNNNNN\n
SPU_007485	SPU_007485	none	3' utr extended information added from the spline est data\n
SPU_013414	SPU_013414	none	the C-terminus of this known gene has its own Glean3 number and prediction (SPU_014515); combined predictions were annotated in this entry\n
SPU_014515	SPU_014515	none	this glean model is the C-terminus of another glean-defined model (glean3_13414); all information will be entered with that glean annotation\n
SPU_025428	SPU_025428	none	EST data from cleavage-blastula\n
SPU_014131	SPU_014131	none	this is clearly an ortholog of LvNotch along its entire length.  Therefore the expression patterns which are known for that species are shown in the embryo expression series\n
SPU_026277	SPU_026277	none	Reference: \nFerkowicz,M.J., Stander,M.C. and Raff,R.A. \nPhylogenetic relationships and developmental expression of three sea urchin Wnt genes \nMol. Biol. Evol. 15 (7), 809-819 (1998)\n
SPU_011756	SPU_011756	none	Reference: \nFerkowicz,M.J., Stander,M.C. and Raff,R.A. \nPhylogenetic relationships and developmental expression of three sea urchin Wnt genes \nMol. Biol. Evol. 15 (7), 809-819 (1998)\n
SPU_024792	SPU_024792	none	potential duplication of the scaffold region, SPU_024793 prediction has exactly 100% identical exons\n
SPU_024793	SPU_024793	none	the prediction is incomplete; another prediction on the same scaffold is longer and contains the regions within this glean as if duplicated\n
SPU_010040	SPU_010040	none	the Glean only described the C-terminal part of the gene; \nhuman ortholog Accession Number is NP_892023\n
SPU_020371	SPU_020371	none	Reference: \nWikramanayake,A.H., Peterson,R., Chen,J., Huang,L., Bince,J.M., McClay,D.R. and Klein,W.H. \nNuclear beta-catenin-dependent Wnt8 signaling in vegetal cells of the early sea urchin embryo regulates gastrulation and differentiation of endoderm and mesodermal cell lineages \nGenesis 39 (3), 194-205 (2004)\n
SPU_023952	SPU_023952	none	The 3' end of the Guisti et al. mRNA sequence is not part of any known Abl protein.  It is probably UTR, but it has specific hits on two Scaffolds, # 41 and 1211. \n
SPU_010021	SPU_010021	none	Histidine Decarboxylase\n
SPU_003345	SPU_003345	none	See putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137536970-15530-214731484586.BLASTQ1\n
SPU_019261	SPU_019261	none	This is a partial sequence.  It appears to be the N-terminal region of Fmi2.  Another glean sequence, Glean_09215, contains this region plus the C-terminal sequence either of this gene or a duplicate form of this gene.  \n
SPU_009526	SPU_009526	none	Exon 1-8 of this gene are located on another scaffold (scaffold 72753 with SPU_026371 gene model). \nPlease refer to SPU_026371 for complete modified gene model.\n
SPU_012374	SPU_012374	none	Missing 5' end, no start site \n
SPU_018391	SPU_018391	none	5' partial  \nGLEAN_18392 belongs to 3' end of SpTbx20\n
SPU_020345	SPU_020345	none	5' partial \nGLEAN_20346 belongs to 3' end of SpTbx6/16 \n
SPU_024946	SPU_024946	none	5' partial  \nSPU_024947 belongs to 3' end of SpWntA\n
SPU_015341	SPU_015341	none	POSSIBLE DUPLICATES SPU_012295, 03996.\n
SPU_012294	SPU_012294	none	contains part of a peptidase M16 inactive domain.  looks like part of an insulin degrading enzyme\n
SPU_023463	SPU_023463	none	5' partial \nSPU_023065 & 24669 (identical duplicated seq) belong to 3' end of SpWnt4  \n \nReference: \nFerkowicz,M.J., Stander,M.C. and Raff,R.A. \nPhylogenetic relationships and developmental expression of three sea urchin Wnt genes \nMol. Biol. Evol. 15 (7), 809-819 (1998)\n
SPU_026099	SPU_026099	none	Identical to cDNA cloned in our lab\n
SPU_017534	SPU_017534	none	This gene is on two scaffolds (441 and 84105). On scaffold 441, two GLEAN models are predicted for this gene (GLEAN_17533 for exons 1-8 and GLEAN_17534 for exons 9-16). On scaffold 84105, there is one GLEAN model (SPU_017719 for exons 17-22) predicted for this gene. \nPlease refer to GLEAN_17533 for refined gene features.\n
SPU_017533	SPU_017533	none	This gene is on two scaffolds (441 and 84105). On scaffold 441, two GLEAN models are predicted for this gene (GLEAN_17533 for exons 1-8 and GLEAN_17534 for exons 9-16). On scaffold 84105, there is one GLEAN model (SPU_017719 for exons 17-22) predicted for this gene.\n
SPU_019715	SPU_019715	none	Amino acids 46-369 of Sp-Mapk match part of the SPU_019715 prediction. Amino acids 1-45 of Sp-Mapk match supertig69362_2 on scaffold 69362\n
SPU_017719	SPU_017719	none	This gene is on two scaffolds (441 and 84105). On scaffold 441, two GLEAN models are predicted for this gene (GLEAN_17533 for exons 1-8 and GLEAN_17534 for exons 9-16). On scaffold 84105, there is one GLEAN model (SPU_017719 for exons 17-22) predicted for this gene. \nPlease refer to GLEAN_17533 for refined gene features.\n
SPU_017790	SPU_017790	none	Highly homologous to a cDNA cloned in Sphaerechinus granularis (AJ841701)\n
SPU_019369	SPU_019369	none	Near-complete annotation for Rendezvin.  Also includes sequences from Scaffolds 102292 and 92747 (SPU_027023).\n
SPU_000668	SPU_000668	none	the predicted protein sequence bears 2 AA substitutions (107H->N, 203L->V) compared to S.Purp_Univin previously published (AAA57553). \nThis might reflect polymorphism. \n
SPU_024859	SPU_024859	none	#\npartial prediction corresponding to the N-terminal end of eIF4G. In this scaffold, exon 21 stops at position 39027 (instead of 39057), and exon 22 (not predicted) spans nt 39579 to the end of the scaffold. SPU_019064 corresponds to the C-terminal counterpart; there is a stretch of Ns between exons 1 and 2 of that prediction. These exons correspond respectively to exons 23 and 25; the missing exon (#24) is in scaffold155969 (=SPU_012286). Exon 17 is duplicated (scaffold137842, SPU_006085).  \nThe modified gene model is entered in the gene features and is highly homologous to a Sphaerechinus granularis cDNA. \n
SPU_020678	SPU_020678	none	I do not have any experimental evidence, but based on alignments with human and mouse Nanos1 orthologs I feel this GLEAN prediction may not be correct. I predict that the gene is only one exon and has the following sequence: \n \nATGGAGACATCTTCTTGGGATCTTTTCATGGGGAAAGGGTTGAACCTCAGTGAGATCATTTCTTCGACAAGCTGGAAAACTCCTCCAACCATGGCCATGCCACAACATTCACCAGCGATGTGGCCATCATCTCCGTGCCCATCGCCGCCTATGTCTCCATGGCCAGCTTTATCTCCCCCTATGTCTCCATGGCCAGCTCTATCTCCCTCAAGCACCGTACCACCATCAGCTTCACCACCACCATCAGCATCATCATCGCCGCATGAAGATGAGTTGATATTTCGATCCAGCTTTACCGACACCCTATCTGTCTCTTATGAGAAGAAGCGATACCTCAACACTTACTGCGTGTTCTGTAAGAACAACAAAGAAACTCTTTGCTTCTACAGCTCTCATGTCCTGAAGGATGATTTGGGGAACGTTCAATGTCCTGTTCTTAGGGCTTACAAGTGTCCTATTTGTGGGGCGAAGGGTGATAATGCGCACACCGTCAAGTATTGTCCTCAAAATTCCAGTTCATCAAAAGCCGAGAAGCTGACCAAATCATCAGGTTGCTGGTCGGATTACCCATCACCCCCGGGATTTTTTTAA\n
SPU_026559	SPU_026559	none	I don't have any experimental evidence, but this gene may not be annotated correctly. There is a large insertion in the middle of the protein that is not found in the human, mouse, fly, or worm orthologs.\n
SPU_027826	SPU_027826	none	Tiling microarray data predicts additional expressed tag between exons 3 and 4, but it's not present in the est for this same gene. This gene is homologous to the isoform 5 of mammalian Gbeta, but since urchin has only 1 other isoform (homologous to Gbeta1-4, and named "a"), this one is "b".\n
SPU_016157	SPU_016157	none	This gene spans two GLEAN predictions: SPU_018353 + SPU_016157 \nThe SPU_016157 prediction is contained within SPU_003874 (exons 3 and 4); gene duplication probably due to assembly and/or haplotype \n \nExon \tStart \tStop \tScaffold \n1\t48765\t48989\t773 \n2       51585   51683   773 \n3       61327   61444   773 \n3?      41672   41789   85877 \n4       62157   62392   773 \n4?      42315   42550   85877 \n5       63630   63768   773 \n6       64653   64771   773 \n7       65224   65337   773 \n8       65903   66014   773 \n9       66514   66719   773 \n10      67327   67588   773 \n11      68104   68246   773 \n12      68855   68980   773 \n13\t71135   71325   773\t3'UTR missing \nDatabase version 2005/07/18\n
SPU_015335	SPU_015335	none	Deleted 5' exon (16,057-16,235), not present in known cDNAs.\n
SPU_003874	SPU_003874	none	This gene spans three GLEAN predictions: SPU_010593 + SPU_013086 + SPU_003874 \nThe SPU_013086 and SPU_003874 predictions are overlapping (exons 3 and 4); gene duplication probably due to assembly \n3'UTR of this gene is missing \n \nExon \tStart \tStop \tScaffold \n1\t24676\t24229\t23709 \n1?\t2159\t1712\t12273 \n2\t13236\t13078\t23709 \n3\t95271\t95968\t1679 \n3?\t5804\t5107\t38156 \n4\t96341\t96594\t1679 \n4?\t4742\t4489\t38156 \n5\t106938\t107067\t1679 \n6\t109648\t109828\t1679 \n7\t110254\t110486? 1679\t3'UTR missing \nDatabase version 2005/07/18\n
SPU_013086	SPU_013086	none	This gene spans three GLEAN predictions: SPU_010593 + SPU_013086 + SPU_003874 \nThe SPU_013086 and SPU_003874 predictions are overlapping (exons 3 and 4); gene duplication probably due to assembly \n3'UTR of this gene is missing \n \nExon \tStart \tStop \tScaffold \n1\t24676\t24229\t23709 \n1?\t2159\t1712\t12273 \n2\t13236\t13078\t23709 \n3\t95271\t95968\t1679 \n3?\t5804\t5107\t38156 \n4\t96341\t96594\t1679 \n4?\t4742\t4489\t38156 \n5\t106938\t107067\t1679 \n6\t109648\t109828\t1679 \n7\t110254\t110486? 1679\t3'UTR missing \nDatabase version 2005/07/18\n
SPU_018505	SPU_018505	none	This is just a C-terminus of expected Gbeta cDNA. Since it looks like mammalian betas 1-4, it is designated A. Number 3 refers to the fact that it is a C-terminal piece of 3 pieces found. 3'UTR might extend until nt 8,000 by chip data.\n
SPU_000199	SPU_000199	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.   \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_008526	SPU_008526	none	Middle part of the urchin GbetaA (hence A2). Further sequence located on scaffold59878, and "predicted" as SPU_018505. Two N-terminal exons exactly match sequences from another scaffold27704, which contains more N-terminal regions of what I think is the same gene. Prediction is modified to match est data.\n
SPU_017106	SPU_017106	none	Part of the cDNA sequence (AY130972 ) could not be found anywhere in the genome. Therefore, accept gene model prediction for exon 2. \n \nExon 1-3 of this gene are on Scaffold 87957 (SPU_017106). Exon 3-7 are on Scaffold 107218 (SPU_025601). The two scaffolds have exon 3 and flanking sequences overlap and can be aligned quite well (e=0). Therefore, the two scaffold might be assembled as one scaffold. The modified gene model is composed of Exon 1-3 from Scaffold 87957 and exon 4-7 from scaffold 107218. 3' UTR is extended based on the EST data. \n \nExon 5-7 are also on Scaffold 55268 (SPU_005718). This might be another allele of this gene.\n
SPU_008194	SPU_008194	none	daz homolog\n
SPU_018353	SPU_018353	none	This gene spans two GLEAN predictions: SPU_018353 + SPU_016157 \nThe SPU_016157 prediction is contained within SPU_003874 (exons 3 and 4); gene duplication probably due to assembly and/or haplotype \n \nExon \tStart \tStop \tScaffold \n1\t48765\t48989\t773 \n2       51585   51683   773 \n3       61327   61444   773 \n3?      41672   41789   85877 \n4       62157   62392   773 \n4?      42315   42550   85877 \n5       63630   63768   773 \n6       64653   64771   773 \n7       65224   65337   773 \n8       65903   66014   773 \n9       66514   66719   773 \n10      67327   67588   773 \n11      68104   68246   773 \n12      68855   68980   773 \n13\t71135   71325   773\t3'UTR missing \nDatabase version 2005/07/18\n
SPU_014573	SPU_014573	none	Highest homology to CYP3 family genes. Phylogenetically falls at base of CYP3 family, within CYP clan 3 (MrBayes v3.0.1 WAG+gamma+I model). Percent ID (on masked basis) to CYP3A4 (human) of 43.8% (aa level).\n
SPU_012253	SPU_012253	none	5' and 3' UTR are extended based on EST and expression data.\n
SPU_000436	SPU_000436	none	Highest homology to CYP3 family genes. Phylogenetically falls at base of CYP3 family, within CYP clan 3 (MrBayes v3.0.1 WAG+gamma+I model). Percent ID (on masked basis) to CYP3A4 (human) of 40.8% (aa level).\n
SPU_021555	SPU_021555	none	This gene encodes a precursor for seven putative SALMFamide neuropeptides (see Elphick & Thorndyke, 2005, J. Exp. Biol. 208, 4273-4282.) \nGLEAN predicts 3 exons. However, the second exon encodes a signal peptide (see above paper), which is always located at the N-terminus of neuropeptide precursors. Moreover, the tiling data do not show up a signal for exon 1 of the GLEAN prediction. Therefore, I think it is likely that the GLEAN prediction is wrong and should be changed so that the predicted CDS is derived only from the 2nd and 3rd exons of the GLEAN prediction. The tiling data also show signals for sequences located 5' and 3' to the two CDS exons; these may correspond to UTRs but this needs to be confirmed by EST/cDNA sequencing.\n
SPU_015450	SPU_015450	none	orb-like, similar to cytoplasmic polyadenylation binding protein 1\n
SPU_007822	SPU_007822	none	most of the N-terminal part of the glean prediction seems inaccurate.\n
SPU_000615	SPU_000615	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(15 to 23), LRR-CT, TM and TIR.  \n
SPU_000911	SPU_000911	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.  \nThis is a member of sea urchin-specific Tlr Group IA. \n
SPU_002442	SPU_002442	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR(22), LRR-CT, TM and TIR. This is a member of sea urchin-specific Tlr Group I(orphan). \n
SPU_002538	SPU_002538	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.  \nThis is a member of sea urchin-specific Tlr Group IB. \n
SPU_004139	SPU_004139	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.   \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_004150	SPU_004150	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.  \nThis is a member of sea urchin-specific Tlr Group IA. \n
SPU_004311	SPU_004311	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.  \nThis is a member of sea urchin-specific Tlr Group IE. \n
SPU_004360	SPU_004360	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(18 to 23), LRR-CT, TM and TIR.  \n
SPU_005088	SPU_005088	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(18 to 24), LRR-CT, TM and TIR.  \n
SPU_027278	SPU_027278	none	Amino Acid number 41 predicted from cloned cDNA is an alanine. Glean3 model predicts a T. \n \nSPU_012529 predicts the same ORF\n
SPU_005950	SPU_005950	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(11 to 22), LRR-CT, TM and TIR. \nThis is a member of sea urchin-specific Tlr Group IA.  \n
SPU_006164	SPU_006164	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(23), LRR-CT, TM and TIR.  \nThis is a member of sea urchin-specific Tlr Group IB.  \n
SPU_006458	SPU_006458	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(18 to 24), LRR-CT, TM and TIR.  \n
SPU_007790	SPU_007790	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(10 to 22), LRR-CT, TM and TIR.  \n
SPU_008278	SPU_008278	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(13 to 22), LRR-CT, TM and TIR.  \n
SPU_008396	SPU_008396	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(10 to 21), LRR-CT, TM and TIR.  \n
SPU_008456	SPU_008456	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(16 to 24), LRR-CT, TM and TIR.  \n
SPU_004599	SPU_004599	none	In the absence of ESTs or cDNA sequences from purpuratus, the 5' and 3' UTR regions have not been described here. \n
SPU_008962	SPU_008962	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.  \nThis is a member of sea urchin-specific Tlr Group IB.  \n
SPU_008963	SPU_008963	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.   \nThis is a member of sea urchin-specific Tlr Group IB. \n
SPU_009037	SPU_009037	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.  \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_010575	SPU_010575	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(11 to 22), LRR-CT, TM and TIR.  \n
SPU_010695	SPU_010695	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.  \nThis is a member of sea urchin-specific Tlr Group IA. \n
SPU_011537	SPU_011537	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_006988	SPU_006988	none	This gene was annotated based on a manual revision of multiple protein sequence alignments. \nThe predicted N-terminal SH2 domains align best with vertebrate Syk genes, whereas the predicted C-terminal tyrosine kinase domain aligns best with various vertebrate ZAP-70 tyrosine kinases. \n \nThere are some extra exons present in the NCBI prediction (XM_793943.1); however, SPU_006988 shows a better pairwise alignment to murine Syk. There are still a few gaps in that alignment, which suggests there might be excess sequence in this glean model. \n
SPU_012257	SPU_012257	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(14 to 24), LRR-CT, TM and TIR.  \n
SPU_013470	SPU_013470	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(17 to 24), LRR-CT, TM and TIR.  \n
SPU_013824	SPU_013824	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(16 to 23), LRR-CT, TM and TIR.  \n
SPU_014041	SPU_014041	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR. \nThis is a member of sea urchin-specific Tlr Group IB.   \n
SPU_014073	SPU_014073	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(15 to 22), LRR-CT, TM and TIR.  \n
SPU_014266	SPU_014266	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR. \nThis is a member of sea urchin-specific Tlr Group IE. \n
SPU_015066	SPU_015066	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.   \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_005171	SPU_005171	none	See also the latest tree of Hox affinities: \n \nCameron, R.A., Rowen, L., Nesbitt, R., Bloom, S., Rast, J.P., Berney, K., Arenas-Mena, C., Martinez, P., Lucas, S., Richardson, P.M., Davidson, E.H., Peterson, K. J. and Hood, L. (2005) Unusual gene order and organization of the sea urchin HOX cluster. J. Exp. Zoology. In press.\n
SPU_002634	SPU_002634	none	This GLEAN prediction only corresponds to the N-terminus of SpHox7. The homeodomain and C-terminus of this protein are predicted as SPU_005170. \nIn fact SPU_005170 contains the 2nd exon of the gene plus a mispredicted miniexon.\n
SPU_015303	SPU_015303	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(8 to 21), LRR-CT, TM and TIR.   \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_016457	SPU_016457	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(10 to 22), LRR-CT, TM and TIR.  \n
SPU_019309	SPU_019309	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.  \nThis is a member of sea urchin-specific Tlr Group IA. \n
SPU_021420	SPU_021420	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(15 to 23), LRR-CT, TM and TIR.  \n
SPU_022911	SPU_022911	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(18 to 24), LRR-CT, TM and TIR.  \n
SPU_000388	SPU_000388	none	See also the latest tree of Hox affinities: \n \nCameron, R.A., Rowen, L., Nesbitt, R., Bloom, S., Rast, J.P., Berney, K., Arenas-Mena, C., Martinez, P., Lucas, S., Richardson, P.M., Davidson, E.H., Peterson, K. J. and Hood, L. (2005) Unusual gene order and organization of the sea urchin HOX cluster. J. Exp. Zoology. In press.\n
SPU_023035	SPU_023035	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(14 to 23), LRR-CT, TM and TIR.  \n
SPU_023321	SPU_023321	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR. \nThis is a member of sea urchin-specific Tlr Group I(orphan). \n \n
SPU_027600	SPU_027600	none	cDNA sequence from another individual has been submitted to GenBank (DQ082723).  The sequence differs from the FgenesH prediction in a simple sequence repeat region.    \n
SPU_024062	SPU_024062	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(11 to 22), LRR-CT, TM and TIR.  \n
SPU_024204	SPU_024204	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.   \nThis is a member of sea urchin-specific Tlr Group IC.\n
SPU_024205	SPU_024205	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR. \nThis is a member of sea urchin-specific Tlr Group IC.  \n
SPU_000669	SPU_000669	none	extra exon on 5'end... \nunclear duplication GLEAN_21497\n
SPU_024208	SPU_024208	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.   \nThis is a member of sea urchin-specific Tlr Group IC.\n
SPU_024385	SPU_024385	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR. \nThis is a member of sea urchin-specific Tlr Group IE. \n
SPU_024429	SPU_024429	none	#\nIntronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR. \nThis is a member of sea urchin-specific Tlr Group IE. \n
SPU_024731	SPU_024731	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(18 to 23), LRR-CT, TM and TIR.  \n
SPU_024733	SPU_024733	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(13 to 23), LRR-CT, TM and TIR.  \n
SPU_024868	SPU_024868	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(24), LRR-CT, TM and TIR.  \nThis is a member of sea urchin-specific Tlr Group IB.  \n
SPU_026200	SPU_026200	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(16 to 24), LRR-CT, TM and TIR.  \n
SPU_028639	SPU_028639	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(11 to 21), LRR-CT, TM and TIR.  \n
SPU_028893	SPU_028893	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR. \nThis is a member of sea urchin-specific Tlr Group ID.  \n
SPU_025646	SPU_025646	none	Partial sequence.  DIX domain missing and some deletions (in comparison to the Lytechinus variegatus form).  There appears to be a mistake in sequencing.  Individuals who have sequenced this gene in S. purpuratus did not note these deletions.\n
SPU_013536	SPU_013536	none	Initial Glean3 model is missing exons 681-782 and 2024-2126 and has an incorrect, low complexity exon 2154-2787 as exon 1.  The 5' end of this cDNA is in SPU_009788(Scaffold 1841)\n
SPU_013810	SPU_013810	none	This gene was annotated based on a manual revision of multiple protein sequence alignments. \n \nNote: there are slight differences between the Glean and other predictions. None of them shows a significantly improved alignment to Pptn11.\n
SPU_025612	SPU_025612	none	Encodes the C-terminus of SPU_006917 (see annotation to that gene).\n
SPU_001048	SPU_001048	none	This is a gene that falls on two different scaffolds.  See SPU_010876 for full annotation.\n
SPU_009788	SPU_009788	none	This is the 5' end of the Sp-Alpha P subunit the completed annotation is on SPU_013536 \n
SPU_021303	SPU_021303	none	#\nActual Exon number 6 was missing in GLEAN3 prediction.\n
SPU_003911	SPU_003911	none	This gene was fused to an adjacent glean model (SPU_003912) to obtain a full sequence that best aligns with vertebrate IL1AP (the last exon in the original version of this model was removed from the modified model). The DNA and protein sequences corresponding to the modified model are provided.\n
SPU_000428	SPU_000428	none	56 nucleotides encoding a predicted signal peptide were added at the 5'end of the GLEAN3 model by comparison to the corresponding FgeneshAB prediction.  \n \nIntronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(19 to 23), LRR-CT, TM and TIR.  \n
SPU_001877	SPU_001877	none	1381 nucleotides encoding a predicted signal peptide, LRRNT, LRRs were added at the 5'end of the GLEAN3 model by comparison to the corresponding FgeneshAB prediction.  \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_003912	SPU_003912	none	A modified version of this model was fused to its adjacent glean model (SPU_003911) based on manually revised sequence alignments to vertebrate IL1AP sequences. \n \nSee SPU_003911 for more details, features and modified sequence.\n
SPU_021516	SPU_021516	none	Deleted 5' exon (6960-6973) not represented in known cDNAs. SPU_021516 is missing N-terminal half of amino acid coding sequence reported in AF061750, AF036902.\n
SPU_010245	SPU_010245	none	This annotation is based on a manual revision of multiple protein sequence alignments. \n \nNote that the protein encoded by this gene is shorter than that of its vertebrate homologs, which indicates there might be some N-terminal sequence missing from this model. \nSimilar predictions do not add a significant amount of sequence, nor do they improve the protein alignment to vertebrate SOCS7.\n
SPU_011298	SPU_011298	none	In a neighbor-joining tree based on multiple sequence alignments with vertebrate and fruit fly SOCS-related sequences, this gene does not co-group with any distinct vertebrate homolog. Because its sister group includes hSOCS6 and hSOCS7, and because the most significant Blast hit is to hSOCS6, we named this gene Sp-SOCS6-like; but it should be noted that this name only reflects its closest similarity to vertebrate SOCS6, and should not be taken to reflect true orthology. \n \nThere are slightly different predictions for this gene, but none of them significantly improved its protein alignment to vertebrate SOCS genes.\n
SPU_002792	SPU_002792	none	Even though the best Blast hit for this gene is to hSOCS2, we named SPU_002792 Sp-Socs2/3 because it seems similarly related to both as indicated by neighbor joining trees made from various SOCS-related sequences (it typically co-distributes with a sister group that includes both SOCS2 and SOCS3 and no other vertebrate SOCS genes). \n \nThe embryonic expression of this gene is supported by tiling array data.\n
SPU_020879	SPU_020879	none	Glean_20879 sequence incomplete in 5', completed with Glean_12710\n
SPU_002224	SPU_002224	none	75 nucleotides encoding a predicted signal peptide were added at the 5'end of the GLEAN3 model by comparison to the corresponding FgeneshAB prediction.  \n \nIntronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(14 to 24), LRR-CT, TM and TIR.  \n
SPU_026496	SPU_026496	none	Even though the best Blast hit for this gene is to vertebrate SOCS5, we named SPU_002792 Sp-Socs4/5 because it seems similarly related to both as indicated by neighbor joining trees made from various SOCS-related sequences (it typically co-distributes with a sister group that includes both SOCS2 and SOCS3 and no other vertebrate SOCS genes). \n \nBetter Blast hits to this gene were found, but they all correspond to predicted sequences in various genomes. The accession number provided is to murine SOCS5. \n \nNote that the protein encoded for this gene has a longer N-ter region than that encoded by its vertebrate counterparts. However, the embryonic tiling array data correlate well with the entire predicted coding exon, which argues against possible annotation mistakes.\n
SPU_019743	SPU_019743	none	located in an intron of predicted SPU_019742 (a cysteine protease)\n
SPU_012710	SPU_012710	none	5'region of Glean_20879\n
SPU_003578	SPU_003578	none	54 nucleotides encoding a predicted signal peptide were added at the 5'end of the GLEAN3 model by comparison to the corresponding FgeneshAB prediction.  \n \nIntronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(12 to 23), LRR-CT, TM and TIR.  \n
SPU_003579	SPU_003579	none	#\n54 nucleotides encoding a predicted signal peptide and 138 nucleotides encoding a part of TIR domain were added at the 5'end of the GLEAN3 model by comparison to the corresponding FgeneshAB prediction and BLASTN.  \nThis gene model may be a pseudogene. \n
SPU_014243	SPU_014243	none	Predicted amino acid sequence from exon 1 (N-terminus) is not present in mammalian homologs.\n
SPU_011539	SPU_011539	none	96 nucleotides encoding a predicted signal peptide were added at the 5'end of the GLEAN3 model by comparison to the corresponding FgeneshAB prediction.  \n \nIntronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.  \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_022112	SPU_022112	none	Sp-CSK spans two scaffolds 40333 and begins on the very small 127652.  Somehow the N-terminal half of the predicted sequence has a 7-transmembrane domain that DOES NOT BELONG!!!!  The original scaffolds, above, contain the correct sequence predicted.\n
SPU_015533	SPU_015533	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR. There are 50 unknown nucleotides (NNN) in the TIR domain. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_007343	SPU_007343	none	The alignment between this protein and murine MyD88 is very strong, except for the N-terminus of GLEAN_07343, which is notably longer than that of vertebrate MyD88 genes. There are several internal Methionines in this glean model, which could account for this seemingly "extra" N-terminus sequence. \n \nA partial duplication of this model was found in SPU_007342.\n
SPU_018394	SPU_018394	none	1) alignment with best blast hits suggests glean model contains complete coding sequence. \n \n2) There is an excellent, but short, match on scaffold76149_1, 1230 to 1344, which is not on the glean3 list.\n
SPU_007004	SPU_007004	none	Actually two GLEAN3 predictions (SPU_007004 & SPU_017646) give the same "best GenBank hit" (BMP11/GDF11_DanioRerio)  \nbut the divergence between the 2 sequences seems too high to be due to duplication. \nSPU_017646 was then annotated as "Sp-BMP11b"\n
SPU_017647	SPU_017647	none	Same best GenBank hit as SPU_007004 (GDF11/BMP11_DanioRerio). \nbut most likely not a duplication. \nArbitrarily called Sp-BMP11b \nto differentiate from Sp-BMP11 (SPU_007004) \n \n
SPU_017352	SPU_017352	none	#\nSee also the latest tree of Hox affinities: \n \nCameron, R.A., Rowen, L., Nesbitt, R., Bloom, S., Rast, J.P., Berney, K., Arenas-Mena, C., Martinez, P., Lucas, S., Richardson, P.M., Davidson, E.H., Peterson, K. J. and Hood, L. (2005) Unusual gene order and organization of the sea urchin HOX cluster. J. Exp. Zoology. In press.\n
SPU_017469	SPU_017469	none	A gap in this contig contains at least one exon for this gene. \n
SPU_002874	SPU_002874	none	Amino acids 1-97 of Sp-Ets1-2 match with part of SPU_004409 prediction on Scaffold1036. Amino acids  \n464-555 of Sp-Ets1-2 match with part of SPU_027053 prediction on Scaffold28371.\n
SPU_005170	SPU_005170	none	-This GLEAN prediction covers only the C-terminal part (second exon) of the known protein. The N-terminal part (first exon) is in SPU_002634 \n-One GLEAN? predicted miniexon has been deleted. It doesn't appear in the known cDNA sequence. \n \n-See also the latest tree of Hox affinities: \n \nCameron, R.A., Rowen, L., Nesbitt, R., Bloom, S., Rast, J.P., Berney, K., Arenas-Mena, C., Martinez, P., Lucas, S., Richardson, P.M., Davidson, E.H., Peterson, K. J. and Hood, L. (2005) Unusual gene order and organization of the sea urchin HOX cluster. J. Exp. Zoology. In press.\n
SPU_002680	SPU_002680	none	ESTs cover nearly the whole gene. 5' and 3' UTRs not annotated.\n
SPU_015378	SPU_015378	none	See SPU_015377\n
SPU_015379	SPU_015379	none	See SPU_015377\n
SPU_000547	SPU_000547	none	See SPU_015377\n
SPU_015377	SPU_015377	none	SPU_015377, 15378, 15379, and 106060 combine to form one gene.    15377, part of 15378, and 15379 are contiguous and all blast as alpha integrins and SPU_000547 is an overlapping sequence on a small scaffold that includes the last 4 exons of the gene.  The start of translation and first approximately 50 aa not included.\n
SPU_009335	SPU_009335	none	Supported by 3 ESTs\n
SPU_013815	SPU_013815	none	1st 3 exons are in SPU_013814.  Junction between them is not correct.\n
SPU_013814	SPU_013814	none	This is the first 3 exons of the gene.  The rest is on SPU_013815.  That sequence was annotated to have the correct sequence, except that an exon joining the two GLEANs seems to be missing.\n
SPU_011307	SPU_011307	none	Removed two incorrectly added exons.  One EST covers about 1 kb of the gene.\n
SPU_012694	SPU_012694	none	Also could be considered an ortholog of ABCD2.  The 3' end may be wrong.  The NCBI prediction gives a different C-terminus, but there are no ESTs to verify the correct one.\n
SPU_007530	SPU_007530	none	We have cloned Chk1 from S. purpuratus eggs and are in the process of modifying the model with the updated sequence. There appears to be another exon not identified in the model.\n
SPU_014051	SPU_014051	none	Only exons 4 and 5 are present on this scaffold1351. There is a huge gap of Ns (>100kb) where exons 1-3 could be located. However, exons 1-3 are present on scaffold21 SPU_027234 but exons 4-5 are missing. \n
SPU_024525	SPU_024525	none	cloned from egg cDNA \nGlean sequence accepted, however further validation needed for sequence not matching the cloned mRNA sequence.\n
SPU_005957	SPU_005957	none	SPU_005957 prediction contains 2 out of the 3 exons for Sp-4EBP. Exon 1 is located on Scaffold4246 (no GLEAN prediction). Coordinates 52018 to 54824 in SPU_005957 prediction do not match 4EBP.  \nNew gene model as follows: \nscaffold 4246 strand +  start 54530 stop 54673 \nscaffold 111348 strand -  start 51046 stop 51210 \nscaffold 111348 strand -  start 49781 stop 49815 \nThe new gene model matches also a Sphaerechinus granularis partial cDNA (accession # AM161045).  \n
SPU_000826	SPU_000826	none	#\nSPU_000826 and SPU_000827 are very similar to previously cloned sea urchin SM30 genes.  A previously isolated genomic clone (Akasaka et al 1994, JBC 269: 20592-20598) indicates that SPU_000826 is most similar to the SM30-alpha gene. This is because of SPU_000826's intron size and its distance to the next downstream SM30 gene. Glean_00825, glean_00826, glean_00827, and glean_00828 encode SM30-like proteins and they are tandemly arranged on Scaffold25604. \n \nMatches c-type lectin domain (cd00037).\n
SPU_000508	SPU_000508	none	Full gene of the GbetaA is annotated here. Further C-terminal sequences found in SPU_008526 and SPU_018505 are/will be added in. 2 (last) exons contained in this Glean match exactly with 2 (first) exons contained in SPU_008526, so there might be a problem with contigs assembly. \nExon 5 might be alternatively spliced: it's present in some but not all ests.\n
SPU_000827	SPU_000827	none	SPU_000826 and SPU_000827 are very similar to cloned sea urchin SM30 genes.  A previously isolated genomic clone (Akasaka et al 1994, JBC 269: 20592-20598) indicates that SPU_000827 is most similar to the partially cloned SM30-beta gene. SPU_000827's intron size and its distance to SPU_000826 support this conclusion. Glean_00825, glean_00826, glean_00827, and glean_00828 encode SM30-like proteins and they are tandemly arranged on Scaffold25604. \n \nMatches c-type lectin domain (cd00037).\n
SPU_000828	SPU_000828	none	SPU_000828 is similar to previously cloned S. purpuratus SM30-alpha but to a lesser extent than glean3-00825, glean3-00826, and SPU_000827.  Transcriptome data indicates that this gene is expressed. SPU_000825, SPU_000826, SPU_000827, and SPU_000828 encode SM30-like proteins that are tandemly arranged on Scaffold25604. \n \nMatches c-type lectin domain (cd00037). \n \nHad to alter gene sequence.  Now has two exons instead of four.\n
SPU_018204	SPU_018204	none	See SPU_023216. \n
SPU_018500	SPU_018500	none	Pulling up same GLEAN3 for dynactin isoform 2 (p50).\n
SPU_026748	SPU_026748	none	partial sequence \nThis incomplete model has been incorporated in Glean_3 24044 by Mariano Loza Coll (Toronto).\n
SPU_019711	SPU_019711	none	See SPU_019710. \n
SPU_004021	SPU_004021	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nNote that the best Blast hit to this model corresponds to a vertebrate Lim-Hox2 gene. However, there is no detectable homeodomain in SPU_004021. An overlapping, larger Fgenesh++ model was inspected for a homeodomain prediction, albeit unsuccessfully. Note, however, that this model is located on a scaffold with several gaps between contigs; therefore, a full LIM-Hox gene may exist in this region and it simply was not picked up by the predictions. \n \nUntil better evidence becomes available, we have decided to name this gene Sp-Lim-containing1.\n
SPU_013569	SPU_013569	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nThe best Blast hit for this model is vertebrate LMO2, which contains two LIM domains. This model, however, contains a single LIM domain, and is located in a scaffold that contains various gaps between contigs, for which it could as well represent an incomplete model. There are Fgenesh and Genscan models that, though slightly longer, are otherwise identical to SPU_013569. The Genscan model codes for a protein that is almost the same length as LMO2, but that does not include additional LIM domains. For this reason, and until additional evidence becomes available, we have decided to name this gene Sp-Lmo2t (for "truncated").\n
SPU_002631	SPU_002631	none	See also the latest tree of Hox affinities: \n \nCameron, R.A., Rowen, L., Nesbitt, R., Bloom, S., Rast, J.P., Berney, K., Arenas-Mena, C., Martinez, P., Lucas, S., Richardson, P.M., Davidson, E.H., Peterson, K. J. and Hood, L. (2005) Unusual gene order and organization of the sea urchin HOX cluster. J. Exp. Zoology. In press.\n
SPU_002632	SPU_002632	none	See also the latest tree of Hox affinities: \n \nCameron, R.A., Rowen, L., Nesbitt, R., Bloom, S., Rast, J.P., Berney, K., Arenas-Mena, C., Martinez, P., Lucas, S., Richardson, P.M., Davidson, E.H., Peterson, K. J. and Hood, L. (2005) Unusual gene order and organization of the sea urchin HOX cluster. J. Exp. Zoology. In press.\n
SPU_005169	SPU_005169	none	-THE GLEAN3 PREDICTION COVERS TWO FUSED GENES:SpHox5 and an Acetylcholinesterase. It is a known fact that these genes are very close (or fused?) in the sea urchin genome. \nI have seen the exon 2 of Hox genes (indicated in the gene model). The other exons are, most probably (given BLAST values), fragments of a Cholinesterase gene. \n \n-See also the latest tree of Hox affinities: \n \nCameron, R.A., Rowen, L., Nesbitt, R., Bloom, S., Rast, J.P., Berney, K., Arenas-Mena, C., Martinez, P., Lucas, S., Richardson, P.M., Davidson, E.H., Peterson, K. J. and Hood, L. (2005) Unusual gene order and organization of the sea urchin HOX cluster. J. Exp. Zoology. In press.\n
SPU_021309	SPU_021309	none	See also the latest tree of Hox affinities: \n \nCameron, R.A., Rowen, L., Nesbitt, R., Bloom, S., Rast, J.P., Berney, K., Arenas-Mena, C., Martinez, P., Lucas, S., Richardson, P.M., Davidson, E.H., Peterson, K. J. and Hood, L. (2005) Unusual gene order and organization of the sea urchin HOX cluster. J. Exp. Zoology. In press.\n
SPU_027568	SPU_027568	none	See also the latest tree of Hox affinities: \n \nCameron, R.A., Rowen, L., Nesbitt, R., Bloom, S., Rast, J.P., Berney, K., Arenas-Mena, C., Martinez, P., Lucas, S., Richardson, P.M., Davidson, E.H., Peterson, K. J. and Hood, L. (2005) Unusual gene order and organization of the sea urchin HOX cluster. J. Exp. Zoology. In press.\n
SPU_019586	SPU_019586	none	This model was annotated based on a manual inspection of multiple protein sequence alignments.\n
SPU_024044	SPU_024044	none	This model was annotated and modified based on full length cDNA sequences from Courtney Smith and Peggy Stevens.  The gene was first isolated from a coelomocyte library (Smith et al. 1996. J. Immunol. 156:593).  QPCR analysis indicates embryonic expression in gastrula.  In situ indicates expression in SMC. \n \nThe original version of SPU_024044 was incomplete (C-terminus missing). The rest of its sequence was found on a partly duplicated model (SPU_026748) and annotated by Christian Gache (Villefranche-sur-Mer, France). Both models have now been fused in a modified version of SPU_024044 that is supported by cDNA sequence.\n
SPU_009178	SPU_009178	none	The same gene is found in three different GLEAN3 models: 2 are haplotypes and 1 contains another part of the gene. \n
SPU_028698	SPU_028698	none	The same gene is found in three different GLEAN3 models: 2 are haplotypes and 1 contains another part of the gene. \n
SPU_018813	SPU_018813	none	Aligns with S. purpuratus SM37 (Lee et al. 1999, Develop. Growth Differ. 41: 303-312; PubMed: 10400392). It is on the same scaffold as SM50, which is consistent with Lee et al.'s finding that SM37 and SM50 are linked. \n \nMatches c-type lectin domain (cd00037). \n \n
SPU_012979	SPU_012979	none	See SPU_002875. \n
SPU_026899	SPU_026899	none	This GLEAN model has 74% identity at the nucleotide level with Mus musculus DYRK2 mRNA (positions 614 to 1687 of 2165).\n
SPU_000870	SPU_000870	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains frame shift.\n
SPU_024900	SPU_024900	none	See SPU_023252. \n
SPU_022703	SPU_022703	none	See SPU_008370.  N- and C-termini are shorter than those of other organisms.  \n
SPU_027462	SPU_027462	none	This is the full sequence of PLCg.  It is missing sequence for approx. 40 amino acids in the region of 176-217. \n \nThe other scaffolds are 464 (Glean 10275) and 801 (glean 06056); however, these have been added to this annotation and contain only incomplete models.\n
SPU_008370	SPU_008370	none	See SPU_022703.  N- and C-termini of the prediction are shorter than those of other organisms.  \n
SPU_020904	SPU_020904	none	We pulled this partial cDNA from an Sp egg library.  It aligns with the first half of this Glean prediction.  SPU_005654 is also a good BLAST hit.  \n
SPU_007964	SPU_007964	none	This gene was annotated based on a curated analysis of alignments to known vertebrate genes. \n \nThe protein inhibitor of activated STAT (PIAS) family of proteins has been proposed to regulate the activity of many transcription factors, including STATs, and recent genetic studies support an in vivo function for PIAS proteins in the regulation of innate immune responses. \n \nThis gene model was modified at the time of annotation. One of its exons was removed from its original version, since its expression was not supported by the embryonic tiling array data, which otherwise strongly support the expression of every other exon in this model. \n \nOnce the exon was removed, the alignment of this model to its vertebrate counterpart was significantly improved. \n \nAlso note that unaccounted exons for this model might exist, also based on the tiling array data.\n
SPU_005339	SPU_005339	none	The intron of this GLEAN3 model was modified to a coding region by comparison to the corresponding FgeneshAB prediction.  \n \nIntronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(16 to 23), LRR-CT, TM and TIR.    \n
SPU_027144	SPU_027144	none	one of two, duplication. The other is _02836. SPU_022717 is overlapping and mostly non-identical. This and _02836 are the internal sequences, while _06197 is the N-terminal sequence \n
SPU_002836	SPU_002836	none	#\none of two, duplicate. Other is _27144. SPU_022717 is a mostly non-identical overlapping duplicate. This and _27144 are internal sequence, while the N terminal part of this gene is in _06197\n
SPU_011536	SPU_011536	none	Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(11 to 22), LRR-CT and TIR. Unknown sequence (NNN...) in the intron of the current model could make this gene model incomplete.\n
SPU_012078	SPU_012078	none	This annotation was done with alignments using A. pectinifera mRNA and peptide sequence.  There are a total of 10 scaffolds and 43 exons (predicted).  Of note, in SPU_024561 (Scaffold101715) there are two predicted exons which are out of sequential order on the scaffold, but do not overlap in their nt. sequence of the protein.  In SPU_027674 (Scaffold102204) there are two predicted exons which are in sequential order, but have the same nt. sequence.  One was omitted.  Changes have been made to other Glean3 predictions in this annotation, but not in the original glean3 predictions.  Other glean3 predictions of this protein were accepted, usually with little or no change.  They also all refer to this Glean3 prediction in the comments.  SPU_012078 aligns with ApIP3R from AAs 316-903.\n
SPU_010424	SPU_010424	none	Alternate transcripts, beta-1 and beta-3: \nbeta-1 NCBI accession #: NM_001032368 \nbeta-3 NCBI accession #: NM_001032369 \n \nThere are 4 basepairs from mRNA missed between exon 1 and exon 2.\n
SPU_022820	SPU_022820	none	A small fragment (168921 - 169128, 208bp) is highly identical to Sp-Soxb2 (320520 - 320730) on Scaffold467, SPU_025113.\n
SPU_028103	SPU_028103	none	Could be a continuation of the Sp-proteoliaisin (SPU_0PLN), based on high sequence identity to the CDS across the repetitive low-density lipoprotein repeat.  NOT CERTAIN, however, given the absence of the complete proteoliaisin cDNA.\n
SPU_000985	SPU_000985	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure. \n
SPU_025113	SPU_025113	none	A small fragment (320520 - 320730, 211bp) is highly identical to Sp-Soxb1 (168921 - 169128) on Scaffold732, SPU_022820.\n
SPU_001970	SPU_001970	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(6 to 10), LRR-CT, TM and TIR.\n
SPU_001971	SPU_001971	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(7 to 10), LRR-CT, TM and TIR.\n
SPU_003684	SPU_003684	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group I(orphan).\n
SPU_004791	SPU_004791	none	605 bp intron was modified to a coding region by comparison to the corresponding FgenesAB prediction.\n
SPU_026687	SPU_026687	none	>Spec2a protein sequence: \nMAVQLLFTEEEKALFKSSFKSEDTDGDGKITSEELRAAFKSIEIDLTQEKIDEMMGMVDK \nDGSKDMDFSEFLMRKAEQWRGREVQLTKAFVDLDKDHNGSLSPQELRTAMSACTDPPMTE \nKEIDAIIEKADCNGDGKICLEEFMKLIHSS \n \n>CDS of Spec2a gene \natggctgtcc aattattatt taccgaagag gaaaaagctt tattcaaaag ctccttcaaa        60 \ntcagaagaca cggatggcga tggcaaaatc acttctgaag agttgagagc agcgtttaaa       120 \ntcaattgaaa tagacttgac tcaggaaaag attgacgaaa tgatgggaat ggttgataaa       180 \ngatggtagca aagatatgga cttttctgag tttttgatga ggaaggcaga acagtggcgc       240 \nggaagagaag tacaattaac taaagctttc gtcgacttgg acaaggatca caacggatcc       300 \nctcagtcctc aagagctgcg tacagcgatg tcagcatgca ccgatccacc gatgacggag       360 \naaggaaatcg atgcaatcat cgagaaagcc gactgcaatg gggacggtaa aatctgcctt       420 \ngaagaattca tgaaattgat tcactcgtct taa       453 \n
SPU_021354	SPU_021354	none	SPU_021354 should be linked to SPU_021353\n
SPU_021351	SPU_021351	none	This prediction is one of many ACE genes on scaffold 52540. \n \nAlignment with best blast sequence and transcriptome signals suggest that there may be a missing exon 3' of SPU_021351|Scaffold52540|76503|76746| and 2 exons may be incorrect (see below). \nSPU_021351|Scaffold52540|77756|77824|  \n>SPU_021351|Scaffold52540|77756|77824| DNA_SRC: Scaffold52540 START: 77756 STOP: 77824 STRAND: +  \nGATCTTCCTCGCTTCGTCTTTCATCTCTAAGCCATAGTTTCCCGACAGCATGGACACATTAGACAAAAA \nSPU_021351|Scaffold52540|83924|84128 \n>SPU_021351|Scaffold52540|83924|84128| DNA_SRC: Scaffold52540 START: 83924 STOP: 84128 STRAND: +  \nCGTCCAATGCATATTGGACACAGTTCGTGCACAACGGACAGAGTTCCGCATTGCTATCCAAGCATGCCAT \nATGTCCTAGAACTCCCTTCGTCTTTTCGACAAACTGATCAAGGTTGTCGAGCTTGAGCTGAAGATAACAC \nGTCACGTTGGTGATCAGGAGTACTGTAGCGAGAACCCCGTATCCGGTCCATCGGAGGGACGCCAT \n
SPU_021357	SPU_021357	none	#\nThis prediction is one of many ACE genes or parts of genes on scaffold52540.\n
SPU_026751	SPU_026751	none	Two comments: \n \n1)Exon 3 of this prediction encodes reverse transcriptase domain.  Transcriptome data suggests that this exon is represented in embryo RNA, but cross-reaction cannot be excluded. \n>SPU_026751|Scaffold1575|15835|16444| DNA_SRC: Scaffold1575 START: 15835 STOP: 16444 STRAND: +  \nAATGAATTCCGTCTAGGAAGATCTACAGTAGCGCGAATTCTAACTTTGCGGAGACTGGTGGAAGGTACTA \nAAGCAAAGCATCTGACAGCAGTACTTACGTTCGTGGGTTTTAAGAAGGCCTTCGATTCAATCAATAGGAA \nGAAGATGTTAGAGATCTTAAGAGCCTACGGAATACCATACACAATAGTCACAGCAGTAGGGTTGCTGGAC \nAAAGTTACTACAGCTCAAGTGCGTTCACCAAATGGAGAGACTGACTACTTTACCATCTTAGCAGGAGTGC \nTCCAAGGCAATACTTTAGCACCATACCTATTCATCGTAGCATTGAATTATGCTCTAAGAATGGCTACTGA \nATGGTTCGAGGATCTGGGCTTTACCCTAGAGGAAAGAGAAAGTAGCAGATATTCTGCTGTAATGATCACA \nGATACTGACTTTGCTGATGATATTGCACTAATTTCAGACAATGTGGAAAAGGCACAGAAGCTCCTAAAAC \nAACTAAAGTCTGCAGCAAGTCAAATCGGTCTACAAATAAACAGTACTAAGACAGAATTCAAGATGTACAA \nCCTTCAGCCTATATTTCACACATATCGTCATTTGCCTGACGTAGGAATGC \n \nIn addition to duplicated copies on the glean3 list, there are many not on the glean3 list with e values>100.  Scaffolds 55516, 93134, 115888, 20, 302, 807, 102465, 95798, 1755, 11431, 52209, 431, 87717, 1081, 53542, 138714, 51779, 86354, 119129, 1241, 1111, 28699, 58694, 18304.  The ACE family cannot be described from the current assembly many fragments are scattered both on the glean3 list and outside of it. \n
SPU_028021	SPU_028021	none	Alignment with best blast sequence suggests that the model may lack N-terminal sequence.\n
SPU_012611	SPU_012611	none	Prediction covers partial CDS as inferred from alignments with best blast hit.\n
SPU_027362	SPU_027362	none	TWO COMMENTS: \n \n1)Last  three exons are unlikely to be part of this gene and should be deleted from the model. \n \nExon 8 is COG5048, COG5048, FOG: Zn-finger;  not blasting to thiamet oligopeptidease \n \n>SPU_027362|Scaffold50623|88407|89245| DNA_SRC: Scaffold50623 START: 88407 STOP: 89245 STRAND: +  \nCTGTATGGATGTGTTTATGTCTTGTGTGATGAGTTTCTTGATTAAATGTCTTACCACATTGATCACATAC \nATAGGGCTTCTCACCTGTATGGGTGTGTTTATGTCTTGTGTGATGAGTTTCTTGATTAAATGTCTTACCG \nCATTGATCACATACATAGGGCTTCTCACCTGTATGGGTGTGTTTATGTCTTGTGTGATGAGTTTCTTGAT \nTAAATGTCTTACCACATTGATCACATACATAGGGCTTCTCACCTGTATGGATGTGTTTATGTCTTGTGTG \nATGAGTTTCTTGATGAATGTCTTACCACATTGATCACATACATAGGGCTTCTCACCTGTATGGATGTGTT \nTATGTCTTGTGTGATGAGTTTCTTGATTAAATGTCTTACCACATTGATCACATACATAGGGCTTCTCACC \nTGTATGGATGTGTTTATGTCTTGTGTGATGAGTTTCTTGATTAAATGTCTTACCACATTGATCACATACA \nTAGGGCTTCTCACCTGTATGGATGTGTTTATGTCTTGTGTGATGAGTTTCTTGATTAAATGTCTTACCAC \nATTGATCACATACATAAGGCTTCTCACCTGTATGGATGCGCTTATGTCTTGTGTGATGAGTTTCTTGATT \nAAATGTCTTTCCACATTGATCACATACATAAGGCTTCTCACCTGTATGGATGCGCTTATGTCTTGTGTGA \nTGAGTTTCTTGATTAAATGTCTTACCACATTGATCACATACATAGGGCTTCTCACCTGTATGGATGTGTT \nTATGTCTTGTGTGATGAGTTTCTTGATTAAATGTCTTACCACATTGATCACATACATAGGGCTTCTCAC \n \nExons 9 and 10 do not blast to thiamet oligopeptidease.  
This region contains a SIR2 domain:  COG0846, SIR2, NAD-dependent protein deacetylases, SIR2 family \n \n>SPU_027362|Scaffold50623|92553|92742| DNA_SRC: Scaffold50623 START: 92553 STOP: 92742 STRAND: +  \nCACTCTCCACATCTGGCAATAGGCATGCATGGATATAGGAATGGATAGCGTGTTCACATCTCCACCGACC \nCCCATGGTCGATGAGAGGGCCTACTCTCCTGTATCGACGGCGGGTGATGAGGTCATCCACAAATTGCCGA \nATGATGAAATAAAGACGGCTAGCCATGGGTTCGTCTTCACCCACACGCCG \n>SPU_027362|Scaffold50623|94169|94282| DNA_SRC: Scaffold50623 START: 94169 STOP: 94282 STRAND: +  \nCTGGTAGATGTCCAGTGACCCCTGGATGACCTTCTCGAGGGGGAAGTACTCCTTGAGCTTGTTGTGGTCC \nACAGAGTAGCGCTTCTCTTCGGACATGTTCATGAAGTAGCGCAT \n\t\t \n2) In addition to multiple copies on the glean3 list, there are excellent matches on scaffolds 71773 and 89507. \n \n
SPU_002117	SPU_002117	none	Exon sequences below contain chromo and ChSh domains characteristic of some nuclear proteins, and are unlikely to be part of this gene and should be removed from the gene model. \n>SPU_002117|Scaffold310|75820|75949| DNA_SRC: Scaffold310 START: 75820 STOP: 75949 STRAND: +  \nTTATCTTTTGTCCTCCTCGTCGTCTTCACTGTGCCACGTCAAACGCTCTTCGTAGAACTGGATGACAATC \nTGAGGGCACCTGTGGTTGGCTTCCTTGGCTCGCACAAGGTCTGCTTCGTTGTTGTTTTTC \n>SPU_002117|Scaffold310|77625|77755| DNA_SRC: Scaffold310 START: 77625 STOP: 77755 STRAND: +  \nCACTTCATGAGGAATAGGAGTTCATTGTTGGATTCTGTGGCACCAATGATTCTCTCTGGATCCAGCCCTC \nTGTCAAATCCTCTGTACTTTTTCTCATCTTGTTTCGTCTGCTTGCCATCTTTGGAAGGCCC \n>SPU_002117|Scaffold310|78216|78405| DNA_SRC: Scaffold310 START: 78216 STOP: 78405 STRAND: +  \nAGATGAGTCTTTCCTCTTCTCGGTTGCAGCAGCTTCTTCTTTCCTTCTTTTCTTTGCTGCTACATCTCCA \nGCAGCAACATTTTGAGCTGATTTTCTCTTTAAGGCCTCCTTTTCTCGGATCTTCTTTTCATACGCCTCAA \nTTAAATCAGGGCACTCTAGATTGTCCTGGGGTTCCCATGTCGATTCATCA \n>SPU_002117|Scaffold310|80007|80191| DNA_SRC: Scaffold310 START: 80007 STOP: 80191 STRAND: +  \nTCTCCATAGCCCTTCCACTTGAGGAGGTATTCTACTCTTCCTTTGTGTATCCTCTTATCGACAACCTTCT \nCCACTTGGTAGACCTCTTCTTCTTCCTCCTCACTTTCTCCCTCGGTTTTCTTTTCTTCATTTTCTTCACC \nTTCTGGTTCAGGCCCATCTTCAGGTTTCCTCTGCTTTTTGCCCAT \n
SPU_010808	SPU_010808	none	Some exons may be not belong in this gene:  The first three at one end of the gene blast to nothing in the nr database; the second set of two are in the middle of the gene and also do not blast to anything. \n \n>SPU_010808|Scaffold1433|165949|166138| DNA_SRC: Scaffold1433 START: 165949 STOP: 166138 STRAND: +  \nATGACAATGGATACAAATAAGAGGAACACCATGATAAATCTCAAATTAATTCTGACTGTGATGATCATCA \nTTTTTCTACAATGTTGGGAAGCTACTTCTCTGTCGTCTGCTCCAGCTCCTAGCCGTTGCATATTTGATGA \nAGTTCAAAAGCATCAAAACGTAGAAAGAACACTTATAAAATACCATCCAG \n>SPU_010808|Scaffold1433|166548|166721| DNA_SRC: Scaffold1433 START: 166548 STOP: 166721 STRAND: +  \nGTGATGTAAGCGCAAAATCAAAGAGGTCAGTAGAAGAAGAAGCAAATGCCTACCAGCCAATCAGAGTGAA \nGACGTTTGTCCAGAATGAGGAGCATCTGATGGACTCCGTGCAGGTTGAAAAACTAGAGACCATCATGGCT \nGGTGCAACATCTGTTGTTCAAAAACTTCTGTCAG \n>SPU_010808|Scaffold1433|167584|167668| DNA_SRC: Scaffold1433 START: 167584 STOP: 167668 STRAND: + \n \n>SPU_010808|Scaffold1433|170097|170175| DNA_SRC: Scaffold1433 START: 170097 STOP: 170175 STRAND: +  \nCTTGCCCTGCATGAAGCGTTTCATGTTCTTGGATTTTCTACAAGTCTTTTTGACCAGTTTCAAGATTGTA \nGTGTATGTG \n>SPU_010808|Scaffold1433|170631|170782| DNA_SRC: Scaffold1433 START: 170631 STOP: 170782 STRAND: +  \nAAGATGGACTCGAGTGCGAGACAAGAGAGGATGTTGTGAGAGTGGATGCCGGTGGGCAGTCTAGACTCCA \nCACCCCAGCAGTCGTGGCTGCATCTCAGATTCATTTTGGCTGCACTGAAGAAGAAGAAATGGGTGTTCCT \nCTGGAAAATCTG \n
SPU_026072	SPU_026072	none	Exons 1 and 2 cannot be confirmed because they do not blast to endothelin converting enzyme. \n \n>SPU_026072|Scaffold692|19600|19738| DNA_SRC: Scaffold692 START: 19600 STOP: 19738 STRAND: +  \nATGACGAGTAGTCAGGCTAAACTCGCCGTCGATGAGGGTGTCGTTGTCAGACGAAAAGCCCCCAAGGTCA \nTTACCAGGAATCTGGTCGTCATCGTTGTCGTCTTGGCACTCCTCACCGTGTCACTTATAGTAGCTACCG \n>SPU_026072|Scaffold692|21211|21327| DNA_SRC: Scaffold692 START: 21211 STOP: 21327 STRAND: +  \nTTGTAATCGCGTCAGACCGGGATAACCTTTCTTCAAGATTACGATCATATACCGGCCACCAAACCTCACC \nATGCCCTGAACCGAAGCAATGTCTCACGCCCTCTTGTGTTAAAGCAG \n
SPU_004765	SPU_004765	none	Partial cds inferred from alignment with best blast hit.\n
SPU_011071	SPU_011071	none	partial cds inferred from alignment with best blast hit.\n
SPU_002141	SPU_002141	none	Partial cds inferred from alignments with best blast hit.\n
SPU_008959	SPU_008959	none	Alignment with best blast sequence suggests that the model may lack N- and C-terminal sequences.\n
SPU_015178	SPU_015178	none	There appear to be several CPA2 genes tandemly repeated; the other SPU_015179 is partial CDS while this appears to be complete, as inferred with alignments to best blast hits.\n
SPU_015179	SPU_015179	none	There are 2 CPA2-like genes on this scaffold.  This is partial CDS while the other, SPU_015178 appears to be complete, as inferred by alignments to best blast hits.\n
SPU_001397	SPU_001397	none	TWO COMMENTS: \n \n1) This prediction covers partial cds, as inferred from best blast hit alignments;  2N-TERMINAL exons may not belong to this protein, since they do not blast to CPA genes \n>SPU_001397|Scaffold22766|21673|21839| DNA_SRC: Scaffold22766 START: 21673 STOP: 21839 STRAND: +  \nTTATTTTGCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTCCTTCTCCT \nTCTTCTTCTTCTTCTTCTTCTTCGTCTTCTTCGTCTTCATTGTCTTCTTTGTCTTCTTCTTCTTCTTCTT \nCTTCTGTTTCTTCTTCTTCTTCTTCAT \n>SPU_001397|Scaffold22766|21978|22053| DNA_SRC: Scaffold22766 START: 21978 STOP: 22053 STRAND: +  \nAGATAACGTTGGCGATAGAGCCAGTCTTGTACGATCTATTACCCATGGCGGATACGGCACTGTCTGATAC \nAGATTT \n \nSimilarly, 2 C-terminal exons may belong. \n>SPU_001397|Scaffold22766|28996|29077| DNA_SRC: Scaffold22766 START: 28996 STOP: 29077 STRAND: +  \nATAGTTTCCCAGTACGCTCTTGAGAACGACTAGCTTCTCCAGCTGCAGGCGATCTTTAGGGGTGACACGG \nAATACCTTGTAC \n>SPU_001397|Scaffold22766|29479|29540| DNA_SRC: Scaffold22766 START: 29479 STOP: 29540 STRAND: +  \nCCATCGTAATTGACGGCTAGCGCAGTAGCTAAGAGAGCAGTGAATACCAGAAAACGCATCA \n \n2) There are 4 CPA-like genes or parts of genes on this scaffold. \n
SPU_001400	SPU_001400	none	TWO COMMENTS: \n \n1) exon below should be added upstream of present exon1 since it blasts to the N-terminal part of its best blast hit. \n \n>Supertig22766_5|Scaffold22766|32108|32236| DNA_SRC: Scaffold22766 START: 32108 STOP: 32236 STRAND: +  \nATGCTGATAATAATTATTCAATCTCTTTCAGCTCGACTTCTGGAGGGAGGCGACGCCGTCTTCGATCGGT \nCGTCCCGTCGACATCATGGTACCATCGAGCCTTCGGAACAACGTCCACGACATACTGAC \n \n2) This is one of 4 whole or parts of CPA-like genes on scaffold 22766.  This model covers partial CDS, as inferred from alignments with best blast hit.\n
SPU_014935	SPU_014935	none	This is one of two CPA-like genes on scaffold 27970.\n
SPU_012020	SPU_012020	none	This is one of two CPA-like genes on scaffold 48388.  The other is SPU_012021.  There are several clusters of CPA genes that may link to form a large cluster.\n
SPU_012021	SPU_012021	none	This is one of two CPA-like genes on scaffold 48388.  The other is SPU_012020.\n
SPU_004157	SPU_004157	none	BLAST data suggest that the following exons should be deleted from this model. \n \n>SPU_004157|Scaffold48533|10538|10720| DNA_SRC: Scaffold48533 START: 10538 STOP: 10720 STRAND: +  \nCTCGTCAGCCATGCTGGTCTCTAGCTCCCTCAGCCATTGCCTGTGCTCATCCGTACCAGGGGTCACACGA \nAGCACCTGGTATCTGTAAATTAATAAGAGAGACATATAGAGGGGTATGGAGTGATGGGGTATAGAGGGAA \nGAGATGGGAGATTGANGGCACGTCTTCGTGCAATCAAGATTGC \n>SPU_004157|Scaffold48533|15593|15676| DNA_SRC: Scaffold48533 START: 15593 STOP: 15676 STRAND: +  \nCTTGAGTTTGCGTCTAGCGTTGAACTTCTTCAAGCAATCAACGGTCTCCTGTCTGTGCATTGCTGAAGCA \nTAGCGATCTCGGTT \n>SPU_004157|Scaffold48533|21092|21201| DNA_SRC: Scaffold48533 START: 21092 STOP: 21201 STRAND: +  \nCTGGATCCATGGATGTTTTAGAGCCTGGCAGGCAGAGATACGCTTTCCTGGGTTGACTGTCAGCATGCTA \nTCTATCAAGTTCTTTGCTTCAGGTGTCACTGTGTCCCATT \n>SPU_004157|Scaffold48533|22651|22855| DNA_SRC: Scaffold48533 START: 22651 STOP: 22855 STRAND: +  \nCTGAATTAGACTCCAATCCACCATCTTGAACAATCTCCTCAAAATCCATTGAATGGCTGGCATTGAATAA \nTCCAGTCTTCGCTTTGACGATGGCATACTGCCCTTTGAAGGCAGACATATTGCCACAGGTAGCTTGACTT \nTGGATGTATCGTAGGGTCTTCCCACTGCGAGTCATGCACTTCTTTCTCCCGACGCTTGCTGCCAT \n
SPU_022102	SPU_022102	none	This is one of two CPA-like genes on scaffold114652.  There are clusters of CPA-like genes on several scaffolds, raising the possibility that large clusters of these genes exist.\n
SPU_009007	SPU_009007	none	partial CDS, C-terminal only, because this is a short scaffold.\n
SPU_027451	SPU_027451	none	TWO COMMENTS: \n \n1) partial cds; note that there are repeated elements in the cds of the best blast hit. \n \n2) In addition to duplicated copy on glean3 list, there is also an excellent match on scaffold 73285.\n
SPU_004957	SPU_004957	none	This gene model may represent a pseudogene or contain a sequence error. 5' upstream sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_005830	SPU_005830	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons (frame shift).\n
SPU_006218	SPU_006218	none	This gene model may represent a pseudogene or contain a sequence error. 5' upstream sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure.\n
SPU_025601	SPU_025601	none	Part of the cDNA sequence (AY130972) could not be found anywhere in the genome. Therefore, the gene model prediction for exon 2 was accepted. \n \nExons 1-3 of this gene are on Scaffold 87957 (SPU_017106). Exons 3-7 are on Scaffold 107218 (SPU_025601). The two scaffolds overlap at exon 3 and its flanking sequences, which align quite well (e=0). Therefore, the two scaffolds might be assembled as one scaffold. The modified gene model is composed of exons 1-3 from Scaffold 87957 and exons 4-7 from Scaffold 107218. \n \nExons 5-7 are also on Scaffold 55268 (SPU_005718). This might be another allele of this gene. \n \nPlease refer to SPU_017106 for the modified gene model.\n
SPU_005718	SPU_005718	none	Part of the cDNA sequence (AY130972) could not be found anywhere in the genome. Therefore, the gene model prediction for exon 2 was accepted. \n \nExons 1-3 of this gene are on Scaffold 87957 (SPU_017106). Exons 3-7 are on Scaffold 107218 (SPU_025601). The two scaffolds overlap at exon 3 and its flanking sequences, which align quite well (e=0). Therefore, the two scaffolds might be assembled as one scaffold. The modified gene model is composed of exons 1-3 from Scaffold 87957 and exons 4-7 from Scaffold 107218. \n \nExons 5-7 are also on Scaffold 55268 (SPU_005718). This might be another allele of this gene. \n \nPlease refer to SPU_017106 for the modified gene model.\n
SPU_017575	SPU_017575	none	partial cds; Note that there are repetitive elements in the best blast hit.\n
SPU_007682	SPU_007682	none	There is an excellent match to part of this predicted gene on scaffold 34914_1, but it is not on the glean3 list.\n
SPU_021609	SPU_021609	none	partial CDS, inferred from alignments with best blast hits.  N-terminal sequence is not included on this scaffold, 52285.\n
SPU_003363	SPU_003363	none	Partial CDS, as inferred from best blast hit alignment.  One predicted and fairly long exon, listed below, is probably not included in this protein because it does not blast to carboxypeptidase E. \n>SPU_003363|Scaffold112125|14529|14828| DNA_SRC: Scaffold112125 START: 14529 STOP: 14828 STRAND: +  \nTCAACCTTTACTTATTGTGTTGTTAAATAAATTCTGTCGGGTTACAATTCTCGGAGGGGGTTGGGCGTCG \nGTGAGTGTGATGAGAATAATGGTTGTGAAGATGATTTTGACAACTCTCAGATTTGATTTTATGATGACGA \nAGACACCGATGTTAACGATGGTGGTGTTGATGAAGCTGATGCTGCTGCTGCTGATGATGATGATAGTGAT \nGATGATTAACGCTGCACTTGCTATGTTGTGGCGTTTGTCGAGGTCAAGGATAATGATTATGATCATAAAA \nTCAATCATTATAGCAATGAA\n
SPU_026013	SPU_026013	none	partial CDS, as inferred from alignment with best blast hit.  Appears to be missing N-terminal sequences.\n
SPU_016331	SPU_016331	none	partial CDS, as inferred by alignments with best blast hit. Model appears to be missing N-terminal sequences.\n
SPU_020494	SPU_020494	none	This model, SPU_020494, and the neighboring model, GLEAN_320393, appear to be parts of the same gene, as inferred from alignment with best blast hit. \n
SPU_007105	SPU_007105	none	This gene model may represent a pseudogene or contain a sequence error. A part of intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure. \nThe new assembly does not show any frame shifts or stop codons. \nThis is a member of sea urchin-specific Tlr Group IB. \n
SPU_009755	SPU_009755	none	Amino terminal half of the protein is missing due to end of contig. \n \nThis region is slightly more closely related to mammalian SubgroupA TSPs than subgroupB based on blast.\n
SPU_007430	SPU_007430	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group IC.\n
SPU_009230	SPU_009230	none	partial CDS;  \nTWO COMMENTS: \n1) Three exons in the model may not belong because they blast to a cub domain not normally found in this class of peptidase.  >SPU_009230|Scaffold376|224783|225029| DNA_SRC: Scaffold376 START: 224783 STOP: 225029 STRAND: +  \nGAGTGGTCCATCACAGCACCCCTTCAGAGGAGGATCCTCGTCACTTTTAATGACTTAAAGTTGGAATCTC \nCTTTTGATTTTATTGTCATTCGGGACATGTACCGTGATAAAAAGCCCTACACAGGCGAGAACATGCTACT \nCCATCCATTTCTGACGTTAGGACGTACTCTTGATATTGAGTTCTCCTCATCTAGAAGTGGCAGAAGGAGA \nGGATTCAATATTTCAGTCTCATGTAGCGAACTTTCAA \n>SPU_009230|Scaffold376|226152|226229| DNA_SRC: Scaffold376 START: 226152 STOP: 226229 STRAND: +  \nATACATATCGTCTGATGGATTGCGCAGCAGAGCATTCAATGTGCTTGTCTGAGAAGGGCGTCAAAATAAA \nATGTCATG \n>SPU_009230|Scaffold376|227528|227685| DNA_SRC: Scaffold376 START: 227528 STOP: 227685 STRAND: +  \nCTTCTGTGGGTTTCTGTGAGGAACTAAACGACACGATTGATGGATCTTGGGATCCTAACATCACATGGTT \nTGGTTCTATCGTTCATCGTACATGTATGGATGGATACAGTCTAAAAGGCAATGGAACCCTGCAATGTGTG \nCCGGGGTATCACCATTGA \n2)Six N-terminal exons are questionable because there is no conservation with this class of peptidase based on alignments to best blast hits. 
\n>SPU_009230|Scaffold376|53883|53961| DNA_SRC: Scaffold376 START: 53883 STOP: 53961 STRAND: +  \nATGCATGTTGATCGTTGTACAACGGTGATAACCGGTGCAACGCACTGTCCTTGGTTCAGTGCCTTTCCCA \nTTGATACCT \n>SPU_009230|Scaffold376|59076|59173| DNA_SRC: Scaffold376 START: 59076 STOP: 59173 STRAND: +  \nGTGACGGTAAAGACTCTGGAATTTTACTCATTGATGAAAGGACAAAAGCAGTAATGACTGACCAACCAAG \nACATGCACAAGAAGCTTTCAAGGAACAG \n>SPU_009230|Scaffold376|60698|60806| DNA_SRC: Scaffold376 START: 60698 STOP: 60806 STRAND: +  \nATTGCCAAAGTTCGTGAGTTGGTTCCTACCCGGAGTAGAGATGATATTGCACTGGTTCTTCAATGCCATG \nAGGGAAATGTGGATAAAGCAGTACAGTCATTCATAGACG \n>SPU_009230|Scaffold376|66479|66525| DNA_SRC: Scaffold376 START: 66479 STOP: 66525 STRAND: +  \nATGGAGCCAAAACTGTTTTGAATGAGTGGCAGTCGCATGGCAAGAAG \n>SPU_009230|Scaffold376|69825|69978| DNA_SRC: Scaffold376 START: 69825 STOP: 69978 STRAND: +  \nTCTGCAAATAAGAGAAACAAGAAAAAGAAACGAGGCCCTGATGCACCAGATGAGAAATCAAATGGTGGTG \nATGCTGCTGTAGCTAGTAAAACAGGTAAATATAACGCACTAGAGCAATTCCATGGAAAGTTGCCTAAGAC \nGGGCAACATGCAAG \n>SPU_009230|Scaffold376|103311|103387| DNA_SRC: Scaffold376 START: 103311 STOP: 103387 STRAND: +  \nGTGAAGTAAATGGGTACCTGGTAGGAATTTATTCCTTGAAACGCAGCGCGCGTAACAGCTGCACTGCTAA \nAGCCAGG \n \n
SPU_007850	SPU_007850	none	#\nThis gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_007986	SPU_007986	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group IE. \n
SPU_008267	SPU_008267	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(18 to 24), LRR-CT, TM and TIR. \n
SPU_009129	SPU_009129	none	Unknown sequence (NNN...) in the intron of the current model could make this gene model incomplete. Modified gene model includes the long unknown sequence.\n
SPU_009829	SPU_009829	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group I(orphan).\n
SPU_015920	SPU_015920	none	5' end of the gene is missing because scaffold data is incomplete.  This gene is on the same scaffold, adjacent to Sp-AlphaD\n
SPU_010940	SPU_010940	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_011030	SPU_011030	none	This is longer than most homologs/orthologs.  May be duplication of SPU_011029 with actual starting codon at bp 105. \n
SPU_010619	SPU_010619	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_011540	SPU_011540	none	Intron of this gene model was modified to a coding region by comparison to the corresponding FgenesAB, ++ and Genscan prediction. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_011949	SPU_011949	none	The intron of this gene model included a long unknown sequence. So the model was modified as an intronless gene by comparison to the corresponding FgenesAB prediction. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_013144	SPU_013144	none	partial CDS; Contains C-terminal half sequences, based on best blast alignment.  Some sequences of predicted exons are not conserved and are therefore questionable.  They are: \n>SPU_013144|Scaffold9356|6814|6907| DNA_SRC: Scaffold9356 START: 6814 STOP: 6907 STRAND: +  \nTCATTCCTTGAGTATATTCCATCCCTGACTGGTCATGGTGGTGGTTGGTCCTAGTAGAGTAGTGGCATGG \nGTTCTCTTACCATCCATCAAGTAC \n>SPU_013144|Scaffold9356|12446|12517| DNA_SRC: Scaffold9356 START: 12446 STOP: 12517 STRAND: +  \nCTGTAAGAGTAAAACTTGAAAGCGCCCTCGGTGCCCATGGATGCTCCACCTCCATATGCTCCTCCCTTCT \nCC \n>SPU_013144|Scaffold9356|13299|13352| DNA_SRC: Scaffold9356 START: 13299 STOP: 13352 STRAND: +  \nCTGATTTCCCGATGCAGGTATTTGGCAGACATCAAACGGGCAAGGACCCTCAAC \n>SPU_013144|Scaffold9356|14762|14873| DNA_SRC: Scaffold9356 START: 14762 STOP: 14873 STRAND: +  \nCTGTGTTAAATGTAGACTGTCTCCTAAGGGTGAGCCTGGTAGATTATCTAGAAACCTTGTCAGCTGATTG \nGCTGCTTGATCCACGCCTTCTGGGCTGGAGTTGACTGCACAT \n>SPU_013144|Scaffold9356|15689|15789| DNA_SRC: Scaffold9356 START: 15689 STOP: 15789 STRAND: +  \nCTCATGTTGGTCTTGTTTAGTACCAAGGAGGCAATAGTCTGAAGGTGGGCCAAGACTGGGTCCAAGTTCT \nCCTTCTCGGCAAGTCCTTTTAGGAATGACAC \n
SPU_025459	SPU_025459	none	#\npartial CDS, as inferred from alignment with best blast hit.  The following exon sequence is not conserved and may not be part of this gene. \n>SPU_025459|Scaffold2341|6491|6568| DNA_SRC: Scaffold2341 START: 6491 STOP: 6568 STRAND: +  \nACTGCCATGCCTGGTATGAAGCGGGACTGCGGTGGCGCAGCAGCGATTTTGGGTGCATTCTATGCAGCCG \nTTAAAGAA \n
SPU_013676	SPU_013676	none	Intron was modified to a coding region by comparison to the corresponding FgenesAB, ++ and Genscan prediction.\n
SPU_025520	SPU_025520	none	This model contains two predicted exons that may not belong to this gene, because they are not conserved, while flanking exons are strongly conserved.  The first of these was also included in a haplotype, SPU_025459. \n>SPU_025520|Scaffold32958|7581|7658| DNA_SRC: Scaffold32958 START: 7581 STOP: 7658 STRAND: +  \nACTGCCATGCCTGGTATGAAGCGGGACTGCGGTGGCGCAGCAGCGATTTTGGGTGCATTCTATGCAGCCG \nTTAAAGAA \n>SPU_025520|Scaffold32958|12680|12777| DNA_SRC: Scaffold32958 START: 12680 STOP: 12777 STRAND: +  \nGCGATGGCAAACTATCACTTCCTCGTCGATCAGAATGTATACGCTATCTTTCCTCAGCTTCGCTTCGGAA \nAGATAGCTACATCCAGATCGCCTCGGAA \n
SPU_013751	SPU_013751	none	#\nThis gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure. \nThis is a member of sea urchin specific Tlr Group IA.\n
SPU_003749	SPU_003749	none	This model may encode two similar adjacent genes, but it is difficult to annotate because there are some exact amino acid duplications in different parts, raising the possibility of an assembly error. \n
SPU_009159	SPU_009159	none	Both ends of the gene run are at scaffold boundaries and the CDS is incomplete.  I have added 2 exons in the Fgenes H prediction that blast to the gaps in alignments of the GLEAN3 prediction.  One EST\n
SPU_012617	SPU_012617	none	Might be missing exons (from best hit alignment, position on contig).\n
SPU_020050	SPU_020050	none	Phylogenetic analysis using the PTPc domains shows that this is an orthologue of the human PTPRs D, F, and S.  The missing extracellular portion of this gene may correspond to SPU_023737.  This sequence has not been added to the SPU_020050 sequence, since cloning has not yet verified this relationship.\n
SPU_027234	SPU_027234	none	Only exons 1-3 are present on this scaffold21. Exons 4-5 are missing. Exons 4-5 are present on scaffold1351 SPU_014051 but exons 1-3 are missing.\n
SPU_015371	SPU_015371	none	The predicted exons 2, 3, 4, 14 may not belong to this gene,  \nas inferred by alignment to best blast hit. \n \nExon2 \n>SPU_015371|Scaffold2|816312|816482| DNA_SRC: Scaffold2 START: 816312 STOP: 816482 STRAND: +  \nGTTGAAGCAAAGATTCGAATCACCGAGTTTGATTCTGAATCACGTCGGACTGCTCAGACCTACTACCATA \nCCTACCCTTCTCATCAGATCTCATACGATGTCAGACGTGAACTCGAAGCTATAGCCTCGACATCGGGTTC \nGCCCACCTACGTGGACGAAGTCACGCAAGAG \n \nExon3 \n \n>SPU_015371|Scaffold2|818170|818272| DNA_SRC: Scaffold2 START: 818170 STOP: 818272 STRAND: +  \nCTAGAGGATGTGAAGTCTCGAATGGAGCAGAGATATCACACGGCTAAGGTATGCAGGAAGAAAGGTAGAC \nGAGCACGGAAAGAATGTCTGCGTCTAGATCCAG \n \nExon4 \n \n>SPU_015371|Scaffold2|821627|821929| DNA_SRC: Scaffold2 START: 821627 STOP: 821929 STRAND: +  \nGACTAGAAGAAGACTGTGCATTGGTATTCAAGAAACGAAGATGCGTAATCATGACAATGCTACGAGATGA \nGAGATTATATGTTTACTGCATGCAGGCTGTGATGAAAGCTAGGAAAGTGGATCAAGGGTCCCTTTATTTG \nTATGTTATGATCAATTATTTGAAGGAAGGTAGTGATGACAAGATTGGTGATGATCATAATGCCATGGACT \nTTGACCGGACATCATGGTCAAGGACCATGGTTAATCTCACGAAATGGAAAGTAGCAATGAAGCGAAACGA \nTGAGGGAGGATGTACTTACGAGG \n \nExon14 \n \n>SPU_015371|Scaffold2|831817|831927| DNA_SRC: Scaffold2 START: 831817 STOP: 831927 STRAND: +  \nTGATATGCGTGAGGCAAACACCATTGGTGCCGATAAGTACTTCCATGCCCGGGGCAACTTCGACGCTGCT \nCAGCGAGGATCAGGAGGTAGATTCGCTGCCGAGGTTATCAG \n
SPU_016055	SPU_016055	none	Added 3prime UTR based on EST evidence. \nN-terminus is missing due to end of contig.\n
SPU_023767	SPU_023767	none	Exons 6-23 are accepted on the + strand from the SPU_023767 predictions. Exons 2-5 are present on the - strand between exons 7 and 8. Exon 1 is from scaffold59902 but there was no GLEAN3 prediction for it, but it was predicted by FirstEF.\n
SPU_007431	SPU_007431	none	TWO COMMENTS: \n \n1)partial CDS; missing N-terminal sequence as judged by alignment with best blast hit \n2)In addition to two glean3 copies, there is an excellent match on scaffold 123301, 194-227 that is not on the glean3 list.\n
SPU_002912	SPU_002912	none	partial CDS; scaffold 52017 is short\n
SPU_027157	SPU_027157	none	partial CDS; scaffold 66222 is short.\n
SPU_015029	SPU_015029	none	Intronless Toll-like receptor with predicted LRR-NT, LRR(22), LRR-CT, TM and TIR. \nThis is a member of sea urchin specific Tlr Group IA.\n
SPU_015132	SPU_015132	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure.\n
SPU_028898	SPU_028898	none	This gene was annotated based on a manual inspection of multiple sequence alignments. \n \nNote that slightly different models were created by other predictions for this gene. The Glean model provides the best alignment with vertebrate TRAF6, and it was therefore accepted in its original version. \n \nAlso note there is a gap in the alignment that is introduced by extra sequence in the glean model. There is no a priori computational evidence to suggest this extra sequence is not real.\n
SPU_016468	SPU_016468	none	Unknown sequence (NNN...) in the intron of the current model could make this gene model incomplete. However, the intron sequence, apart from the NNN run, matches coding sequence of other Sp-Tlr genes.\n
SPU_016501	SPU_016501	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure.\n
SPU_026495	SPU_026495	none	This gene was annotated based on manual inspection of multiple sequence alignments. \n \nA slightly different model was created by the NCBI prediction; however, the pairwise alignment with vertebrate TRAF3 is significantly better for the glean model. \n
SPU_017180	SPU_017180	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure.\n
SPU_017529	SPU_017529	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT and TIR. \nThis is a member of sea urchin-specific Tlr Group I(orphan). \n
SPU_018055	SPU_018055	none	#\n54 nucleotides encoding a predicted signal peptide were added at the 5'end of the GLEAN3 model by comparison to the corresponding FgeneshAB and ++ prediction. \n
SPU_008332	SPU_008332	none	This gene was annotated based on a manual inspection of multiple sequence alignments. \n \nEven though the available information is sufficient to confidently assign this gene as Sp-Traf4, there is also strong computational evidence to suggest that there is missing sequence towards the N-terminus of this gene, which is supported by the fact that this model is located at one end of its respective scaffold. \n \nA slightly different model was created based on a Fgenesh++ prediction; however, the glean/NCBI model shows a better alignment to vertebrate TRAF4.\n
SPU_003462	SPU_003462	none	This gene was annotated based on a manual inspection of multiple sequence alignments. \n \nIn a multiple sequence alignment that included vertebrate and Drosophila TRAF family sequences, this gene failed to cluster with any specific family member. Thus, we have decided to name this gene with an arbitrary classifier ("A"). \n \nSlightly different models were created by other predictions; however the exonic structure of the glean model is very strongly supported by the tiling array data. The array data also suggest there might be additional sequence absent from the current glean model, which coincides with gaps in the alignment with vertebrate TRAFs. However, at present we have no evidence to determine whether the model should indeed be accordingly modified. \n
SPU_023069	SPU_023069	none	This gene was annotated based on a manual inspection of multiple sequence alignments to family members in other animal groups. \n \nThis sequence did not cluster with any specific family member in a multiple alignment tree, and was therefore named with an arbitrary identifier ("B"). \n \nMost exons in the present model are supported by the genome-wide tiling array data. Based on the same array data, there seem to be some exons that may have been erroneously left out from the model. We have no evidence to independently support this possibility, and we have therefore accepted the glean model in its present form. \n \nAlso note that the first exon in the glean model codes for some very low complexity amino acid sequence. Exon 1 in the corresponding NCBI model does not include such sequence. However, the transcription of this sequence is supported by the tiling array data. Thus, it is highly likely that the most upstream sequence in exon 1 of the glean model is in fact 5' UTR, and that the true CDS corresponds to that of the NCBI model's exon 1. At present we have no additional evidence to support this possibility.\n
SPU_018100	SPU_018100	none	The first exon was eliminated and 117 nucleotides encoding a predicted signal peptide were added at the 5'end of the second exon of the GLEAN3 model by comparison to the corresponding FgeneshAB prediction. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_018212	SPU_018212	none	Unknown sequence (NNN...) in the intron of this gene model could make this gene model incomplete. The nucleotides in the first intron, except NNN..., are highly similar to other Sp-Tlr genes, so the intron was modified to a coding region and the third exon was eliminated.\n
SPU_011877	SPU_011877	none	The 5' end of the gene is incomplete; there are probably 1 or 2 exons (140 amino acids) missing.  The 6 ESTs are up to 3.5 kb upstream and only UTR sequence seems to be available.  There is embryonic expression data for 5' sequences.\n
SPU_020620	SPU_020620	none	Exons 8-19 are from this scaffold57107 and SPU_020620 prediction. Exons 3-7 are from scaffold533 and SPU_023989 predictions except for exon 5 which was only predicted by the Fgenesh++ prediction. Exon 2 is from scaffold65249 with no tracks predicting it. Exon 1 is incomplete and present on scaffold137005 with no tracks predicting it.\n
SPU_025471	SPU_025471	none	This gene was annotated with A.pectinifera mRNA and peptide.  The complete annotation of this gene is on SPU_012078.  This glean aligns up with ApIP3R AAs 1247-1632.\n
SPU_024561	SPU_024561	none	The mRNA and peptide sequence of A. pectinifera were used to check the Glean prediction.  The complete annotation of this gene is on SPU_012078.  Of note, two exons predicted are in sequential order, but code for the exact same nts. of the protein.  Also, not all the predicted exons were used in the full annotation under SPU_012078.  This glean aligns up with ApIP3R AAs 116-403.\n
SPU_007449	SPU_007449	none	This predicted gene SPU_007449 matches exons 4 to 6 of the Spec2c gene. The other predicted gene, SPU_014607, matches exons 1 to 3 of Spec2c. In addition, there are 10 amino acids that do not match any predicted gene. \n
SPU_014607	SPU_014607	none	This predicted gene matches exons 1 to 3 of Spec2c. Another predicted gene, SPU_007449, matches exons 4 to 6 of Spec2c. For more information on Spec2c, see SPU_007449.\n
SPU_003175	SPU_003175	none	First exon and last exon of Spec2d gene are not predicted in \nSPU_003175 \n
SPU_006513	SPU_006513	none	Duplicated piece matching nucleotide numbers 11549-12671 of this scaffold (#56396) on scaffold #33010 (nucleotide numbers 1-1123, SPU_023676)\n
SPU_007655	SPU_007655	none	Additional evidence for the existence of the gene has been obtained in Sphaerechinus granularis \nSPU_021839 encodes a partial 3'-terminal sequence of the mRNA \nPosition of the mRNA 5'end has been deduced from Sp ESTs (i.e. CX678933.1) \nTentative assignment of the 3'UTR end position by computational methods and comparison with S. granularis mRNA\n
SPU_003332	SPU_003332	none	SPU_003332 and the neighboring prediction SPU_003333 are partial CDS and may result from assembly problems.  The first two exons of 03332 are exact repeats and this sequence appears again in 03333.\n
SPU_021022	SPU_021022	none	Alignment data suggests this model contains complete CDS.\n
SPU_000075	SPU_000075	none	Alignment with best blast hit suggests that the model may be missing N-terminal sequences\n
SPU_015452	SPU_015452	none	Alignment with best blast hit suggests that model may be missing N-terminal sequence.  All exons except for the first one (see below) of the model blast to methionyl aminopeptidase.  This first exon may or may not be correct. \n>SPU_015452|Scaffold1614|92736|92855| DNA_SRC: Scaffold1614 START: 92736 STOP: 92855 STRAND: +  \nATGTCTTTCAACAGCTACAGAAAACCCAGACCAGAACAGCTTTCAATATCTTCCAGAAATGGGGCAAAAA \nAGCAGAGCCAACAACACCCCAGTAACTTCTCAATTGTTCAGGCAGGAAAG\n
SPU_014474	SPU_014474	none	Partial CDS, because it is on a short scaffold. Alignment with best blast sequences suggests that each of the 4 exons of this model is similar to sequences in the same protein, methionyl aminopeptidase.\n
SPU_006794	SPU_006794	none	Alignment with best blast hit suggests that several internal exons may be missing from the model.\n
SPU_019205	SPU_019205	none	Alignment to best blast hit suggests that there may be two missing internal exons in this model\n
SPU_000619	SPU_000619	none	partial CDS, missing C-terminal sequences, based on alignment with best blast hit sequence\n
SPU_014274	SPU_014274	none	Partial CDS based on alignment with best blast hit sequence\n
SPU_008606	SPU_008606	none	Alignment with best blast hit sequence suggests that model is complete.\n
SPU_011258	SPU_011258	none	Partial CDS\n
SPU_019102	SPU_019102	none	TWO COMMENTS: \n \n1) Alignment with best blast hit sequence suggests that this model may lack N-terminal sequence. \n2) This model is nearly identical to the adjacent model SPU_019101 on Scaffold 112071\n
SPU_010630	SPU_010630	none	Alignment with best blast sequence suggests that this model is complete.\n
SPU_002994	SPU_002994	none	Alignment with best blast hit sequence suggests that C-terminal exon(s) may be missing from the model.\n
SPU_018410	SPU_018410	none	Intronless Toll-like receptor with LRR-NT, LRR(11 to 22), LRR-CT, TM and TIR. \n
SPU_023048	SPU_023048	none	Partial CDS as suggested by alignment to best blast hit sequence\n
SPU_018519	SPU_018519	none	This gene model was modified as intronless Toll-like receptor by comparison to the corresponding FgenesAB and Genscan prediction. \nThis is a member of sea urchin specific Tlr Group IA.\n
SPU_002598	SPU_002598	none	Alignment with best blast hit sequence suggests that both N- and C-terminal exons are missing from the model\n
SPU_018534	SPU_018534	none	Unknown sequence (NNN...) in the 5'region of the current model could make this gene model incomplete.\n
SPU_018838	SPU_018838	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. The first intron is similar to a coding region of another Sp-Tlr gene, but the second is not.\n
SPU_018928	SPU_018928	none	471 bp intron was accepted as coding region by comparison to the corresponding FgenesAB and ++ prediction.\n
SPU_019042	SPU_019042	none	Unknown sequence (NNN...) in the intron of the current model could make this gene model incomplete. The intron was accepted as a coding region.  \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_019834	SPU_019834	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure.\n
SPU_020996	SPU_020996	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR(5 to 10), LRR-CT, TM and TIR.\n
SPU_020997	SPU_020997	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR(5 to 10), LRR-CT, TM and TIR.\n
SPU_021162	SPU_021162	none	Unknown sequence (NNN...) in the intron of the current model could make this gene model incomplete. This gene is at the end of scaffold.\n
SPU_021225	SPU_021225	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure.\n
SPU_022302	SPU_022302	none	5' end of the gene (200 aa) is missing from the scaffold end.  There is an allele or a paralogue predicted by SPU_002273 Scaffold4501.  \n
SPU_002273	SPU_002273	none	This appears to be 575-840 of SpAlpha-J.  Either an allele or a duplication\n
SPU_026617	SPU_026617	none	Alignment with best blast hit sequence suggests the gene model may lack N-terminal sequence, but otherwise appears to be complete. \n \nThere are excellent matches to sequences in this model on scaffold64641_1, 766-1098 and on scaffold 32233, 2047-2418.  Neither of these is on the glean3 list.\n
SPU_004913	SPU_004913	none	Alignment with best blast hit sequence suggests that N-terminal sequence may be missing, but otherwise model appears to be complete.\n
SPU_021008	SPU_021008	none	This appears to be a good prediction for the half of an alphaV,5-like subunit\n
SPU_021395	SPU_021395	none	Partial Toll-like receptor. This gene model is located at the end of the scaffold.\n
SPU_021415	SPU_021415	none	546 bp intron and 63bp 5'UTR were accepted as coding region by comparison to the corresponding FgenesAB and ++ prediction.\n
SPU_021787	SPU_021787	none	Unknown sequence (NNN...) in the 5' upstream of the current model could make this gene model incomplete. 331bp of 5'UTR next to NNN... was accepted as a coding region. The modified gene model has a stop codon, but reflects best gene structure.\n
SPU_015735	SPU_015735	none	Likely only partial gene.\n
SPU_021936	SPU_021936	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has a stop codon, but reflects best gene structure. \nThis is a member of sea urchin specific Tlr Group IA.\n
SPU_023090	SPU_023090	none	The 5' end of this protein seems to be from some other gene, but the mid-3' end corresponds to Prickle.  \n
SPU_025302	SPU_025302	none	1) SPU_022817 shows a perfect match to the N-terminal region of Sp-Alx1. The scaffold ends within a large intron in the Sp-Alx1 gene. \n2) The best Genbank hit (XP_785238) is to SPU_022816, a closely related gene. Note that SPU_022816 and SPU_022817 are on the same scaffold (Scaffold 260).\n
SPU_022451	SPU_022451	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group IC.\n
SPU_012766	SPU_012766	none	This Glean was checked with alignment to A. pectinifera mRNA and peptide.  The full annotation of this gene is in SPU_012078.  This Glean aligns up with ApIP3R on AAs 1-93.\n
SPU_002129	SPU_002129	none	Scaffold 1876 hit the first 352 basepairs (1-352) of Sp-Not mRNA (NM_214562 from NCBI), which might be exon 1. So modified gene feature starts from exon 2. \nFrame number is decided by assuming exon 2 is frame 0. \n
SPU_023033	SPU_023033	none	#\n429 bp intron was accepted as coding region by comparison to the corresponding FgenesAB, ++ and Genscan prediction.\n
SPU_010698	SPU_010698	none	Structure similar to Protein Tyrosine Kinase 7 isoform c precursor.\n
SPU_022716	SPU_022716	none	Partial CDS, lacking N-terminal half, as inferred from alignment with best blast hit sequence.\n
SPU_022909	SPU_022909	none	468 bp intron was accepted as coding region by comparison to the corresponding FgenesAB, ++ and Genscan prediction.\n
SPU_025078	SPU_025078	none	Alignment with best blast hit sequence suggests that model is lacking C-terminal 2/3 of gene\n
SPU_024526	SPU_024526	none	See SPU_026779.  Partial duplication.\n
SPU_019976	SPU_019976	none	Alignment with best blast hit sequence suggests that model lacks N-terminal half of the gene.\n
SPU_002093	SPU_002093	none	Alignment with best blast hit sequence suggests that the model lacks both N- and C-terminal sequences. \n \nThere is an excellent, although short, match on Scaffold 1524_1, 455-727, that is not on the glean3 list.\n
SPU_023544	SPU_023544	none	This gene model may represent a pseudogene or contain a sequence error. Intron and 3'UTR sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_024032	SPU_024032	none	Alignments and tiling data suggest that two additional exons are possible at 3'end- position 10,010-10,144 and 13,805-13,986. \n
SPU_006768	SPU_006768	none	Repair polymerase. Conducts "gap-filling" DNA synthesis in a stepwise distributive fashion rather than in a processive fashion as for other DNA polymerases. Has a 5'-deoxyribose-5-phosphate lyase (dRP lyase) activity (By similarity).\n
SPU_024815	SPU_024815	none	429 bp intron was accepted as coding region by comparison to the corresponding FgenesAB, ++ and Genscan prediction.\n
SPU_025136	SPU_025136	none	Unknown sequence (NNN...) in the intron and the small scaffold could make this gene model incomplete. The third exon was accepted as the coding region of a partial Toll-like receptor. \nThis is a member of sea urchin-specific Tlr Group ID.\n
SPU_025263	SPU_025263	none	Partial Toll-like receptor. This gene model is located at the end of a small scaffold. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_025312	SPU_025312	none	Unknown sequence (NNN...) in the intron of the current model could make this gene model incomplete. Intron sequence except NNN is highly similar to other Sp-Tlr genes, so it was accepted as a coding region.  \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_028938	SPU_028938	none	Alignment to best blast sequence suggests that this model may be complete, possibly lacking sequences at the N-terminal end. \n \nThis is one of 4 very similar NAALADase genes on scaffold 496.\n
SPU_028939	SPU_028939	none	Alignment with best blast sequence suggests that this model lacks C-terminal sequences. \n \nThis is one of 4 closely related NAALADase genes on Scaffold 496.\n
SPU_026274	SPU_026274	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure.\n
SPU_009215	SPU_009215	none	This sequence is a composite of the original Glean_09215 plus the Glean_19261 sequences.  Although these sequences are on separate scaffolds, they appear to be complementary.  Alternatively, it is possible that they are from two different forms of Fmi.\n
SPU_015929	SPU_015929	none	Exons 8-24 are present on this scaffold60165 and modified SPU_015929 prediction.  Modified SPU_015930 and SPU_015931 have been merged into the SPU_015929 prediction. Exons 3-7 are from scaffold41327 and SPU_023590 prediction.  Exons 1-2 are missing.\n
SPU_026275	SPU_026275	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons (frame shift).\n
SPU_027163	SPU_027163	none	This gene model may represent a pseudogene or contain a sequence error. Intron and 5'UTR sequences match coding sequences of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure.\n
SPU_016406	SPU_016406	none	CDS of positive strand of this gene align exactly with CDS of the minus strand of SPU_016409, same scaffold. Sequence between CDS as well as the repeat structure in the two regions is very similar as well. SPU_016409 could be a haplotype that was put on the wrong scaffold. The contig that it is on is attached to the very end of the scaffold right next to this gene. \n
SPU_018375	SPU_018375	none	complete gene model : SPU_018375 + SPU_023408\n
SPU_011225	SPU_011225	none	Alignment with best blast sequence suggests that the model lacks the N-terminal half of the gene.\n
SPU_023984	SPU_023984	none	Alignment with best blast sequence suggests that the model lacks the N-terminal half of the CDS.\n
SPU_004980	SPU_004980	none	Alignment with best blast sequence suggests that this model may be missing N-terminal CDS.\n
SPU_014945	SPU_014945	none	Alignment with best blast sequence suggests that this model may be complete.\n
SPU_027698	SPU_027698	none	Partial TLR. The locus of the gene model is at the end of the scaffold.\n
SPU_027721	SPU_027721	none	Unknown sequence (NNN...) in the intron of the current model could make this gene model incomplete. The intron sequence except NNN... is highly similar to other Sp-Tlr genes, so it was accepted as a coding region. \nThis is a member of sea urchin-specific Tlr Group I(orphan). \n
SPU_027798	SPU_027798	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has a stop codon, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_027815	SPU_027815	none	This gene model may represent a pseudogene or contain a sequence error. 5'UTR sequence doesn't match coding sequence of other Sp-Tlr genes. \nThis is a member of sea urchin-specific Tlr Group IE.\n
SPU_028404	SPU_028404	none	Intron sequence was accepted as a coding region, which could make the TIR domain complete. \nThis is a member of sea urchin-specific Tlr Group I(orphan). \n
SPU_021404	SPU_021404	none	#\nAlignment with best blast hit suggests that this model is missing C-terminal CDS.  The best blast hit is huge, >2200 aa.\n
SPU_006660	SPU_006660	none	Alignment with best blast sequence suggests model is missing large N- and C-terminal  CDSs.\n
SPU_017421	SPU_017421	none	There is a family of carbonic anhydrase genes in the sea urchin. These have not  been carefully compared to potential vertebrate orthologs. \n
SPU_000851	SPU_000851	none	Alignment with best blast sequence suggests that this model lacks the N-terminal half of a huge protein, >2200 amino acids\n
SPU_004135	SPU_004135	none	A family of carbonic anhydrase-like proteins exists in sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined.\n
SPU_002570	SPU_002570	none	Alignment with best blast sequence indicates that this model includes only a small central portion of the CDS.\n
SPU_000871	SPU_000871	none	Unknown sequence (NNN...) in the intron of the current model could make this gene model incomplete. There is a duplicated 600 bp sequence in this gene model. \nThis is a member of sea urchin-specific Tlr Group IIA.\n
SPU_007418	SPU_007418	none	Partial Toll-like receptor. The locus of this gene model is at the end of a scaffold.\n
SPU_007859	SPU_007859	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(14), LRR-CT(2), TM and TIR. \n
SPU_004136	SPU_004136	none	Annotated gene shows coordinates and sequences of the "long form" of SpP19. There is a shorter, alternatively spliced form (see AF519413).\n
SPU_011823	SPU_011823	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(11 to 18), LRR-CT(2) and TIR. \n
SPU_014191	SPU_014191	none	This gene model was modified based on Fgenesh++ prediction. Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(24), LRR-CT, TM and TIR. \nThis is a member of sea urchin-specific Tlr Group IB. \n
SPU_020259	SPU_020259	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure. This is a member of sea urchin-specific Tlr Group IB.  \n
SPU_024404	SPU_024404	none	Unknown sequence (NNN) in the intron could make the gene model incomplete. Modified model was accepted as an intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(12 to 22), LRR-CT, TM and TIR.  \n
SPU_024960	SPU_024960	none	Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT and TIR. \nThis gene model may represent a pseudogene or contain a sequence error, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group ID. \n
SPU_027222	SPU_027222	none	#\nThe gene model was accepted as an intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(11 to 20), LRR-CT, TM and TIR.\n
SPU_002174	SPU_002174	none	Alignment with best blast sequence suggests that this model is missing C-terminal CDS.  The best blast hit is >2200 amino acids and the model is 1492.\n
SPU_026537	SPU_026537	none	Alignment with best blast sequence suggests model may be missing a small C-terminal segment. \n \nThis model is adjacent to a short segment of a closely related gene, SPU_026538, on Scaffold107330.\n
SPU_026538	SPU_026538	none	Alignment with best blast sequence suggests that the model contains only a small N-terminal segment of the CDS. \n \nThis model is adjacent to a very similar gene, SPU_026537, on Scaffold107330.\n
SPU_023590	SPU_023590	none	Refer to SPU_015929 for complete REJ4 gene features\n
SPU_019049	SPU_019049	none	Alignment with best blast sequence suggests that this model contains only  a short segment, which is repeated in human multifunctional protein CAD.\n
SPU_013209	SPU_013209	none	Alignment with best blast sequence suggests this model could be complete if the predicted N-terminal exon which is not conserved is correct.\n
SPU_021494	SPU_021494	none	Alignment with best blast sequence shows that this model contains only a short conserved segment. \n \nThere is a match to sequences on Scaffold72931_1, 262-413, that is not on the GLEAN3 list.\n
SPU_009429	SPU_009429	none	Alignment with best blast sequence suggests that this model may be missing a short N-terminal segment and a longer C-terminal segment.\n
SPU_025325	SPU_025325	none	Alignments with best blast hits suggest that the gene model may be correct.\n
SPU_028671	SPU_028671	none	Alignment with best blast sequence suggests that this model contains only a short conserved sequence.  The following exons are considered unlikely to be part of this model. \n>SPU_028671|Scaffold36273|17615|17788| DNA_SRC: Scaffold36273 START: 17615 STOP: 17788 STRAND: +  \nTTAGATACCTGGAACCTGGGCAAGTAAGTTGACGTAAGAGCAAGTAATCTTGATCTCACAAGTAATCTTG \nATCTCACAAGTAATCTTGATCTCACAAGTAACCTTGATCTCACAAGTAATCTTGATCTCACAAGTAATCT \nTGATCTCACAAGTAATCTTGATCTCACAAGTAAC \n>SPU_028671|Scaffold36273|17825|18165| DNA_SRC: Scaffold36273 START: 17825 STOP: 18165 STRAND: +  \nCTTGATCTCACAAGTAACCTTGATCTCACTAGTAATCTTGATCTCACAAGTAATCTTGATCTCACAAGTA \nACCTTGATCTCACAAGTAACCTTGATCTCACAAGTAACCTTGATCTCACAAGTAATCTTGATCTCACAAG \nTAATCTTGATCTCACAAGTAATCTTGATCTCACAAGTAATCTTGATCTCACAAGTAATCTTGATCTTACA \nACTAACCTTGATCTCACAAGTAACCTTGATCTCACAAGTAATCTTGATCTCACAAGTAACCTTGATCTCA \nCAAGTAATCTTGATCTCACAAGTAATCTTGATCTCACAAGTAACCCTTACTGTTATCCTTC \n
SPU_001458	SPU_001458	none	This gene model may be a short Toll-like receptor.  5'UTR sequence doesn't match coding sequence of other Sp-Tlr genes.\n
SPU_001650	SPU_001650	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 95% identity to another Sp-Tlr gene, so it could be a member of the Toll-like receptor family. \nThis is a member of sea urchin-specific Tlr Group IA. \n
SPU_001862	SPU_001862	none	#\nPartial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 95% identity to another Sp-Tlr gene, so it could be a member of the Toll-like receptor family. \n \n
SPU_001993	SPU_001993	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 99% identity to another Sp-Tlr gene, so it could be a member of the Toll-like receptor family. \n \n
SPU_019349	SPU_019349	none	#\nThis gene was annotated based on a manual inspection of protein alignments. \n \nTwo other adjacent models (SPU_019350 and SPU_019351) code for very similar sequences. It is yet to be determined to what extent this reflects true gene duplications or assembly problems. \n \nSeveral Sp-IL17 genes were identified computationally. Each of them shows a best Blast-p hit to specific vertebrate IL17 family members. However, multiple sequence alignments demonstrated that almost all the Sp-IL17 sequences segregate from their vertebrate counterparts as a separate cluster. For consistency purposes, therefore, we decided to arbitrarily assign numerical identifiers to all the Sp-Il17 genes, and thus avoid the implication that they all have specific vertebrate orthologs (which are classified by the letters A-F).\n
SPU_002803	SPU_002803	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 94% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor. \nThis is a member of sea urchin-specific Tlr Group IC. \n
SPU_003419	SPU_003419	none	Partial Toll-like receptor. This gene model is located at the end of a short contig. The nucleotide sequence has 96% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor. \n \n
SPU_003846	SPU_003846	none	#\nPartial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 95% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor. \n \n
SPU_004655	SPU_004655	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 95% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor. This is a member of sea urchin-specific Tlr Group I(orphan). \n \n
SPU_005871	SPU_005871	none	#\nThis gene was annotated based on a manual inspection of protein alignments. \n \nThere might be some missing N-terminus sequence in this model, as suggested by an incomplete alignment to homologous vertebrate sequences, and by signal from the tiling array upstream of the first annotated exon that do not correspond with any other models. Such potential missing sequence was searched computationally, but with no success.\n
SPU_013950	SPU_013950	none	This gene was annotated based on a manual inspection of protein alignments. \n \nWe named this gene "Sp-Il1R-rs1" because no other significant Blast hit could be found for it, and because of its overall domain composition, which closely resembles that of a typical IL-1 receptor. In fact, among the Blast hits obtained for this sequence, the one that spans most of the query corresponded to a Gallus gallus predicted IL1RAcP.\n
SPU_005148	SPU_005148	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 97% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor.\n
SPU_007991	SPU_007991	none	Unknown sequence (NNN...) in the intron of the current model could make this gene model incomplete. A part of the intron sequence is highly similar to other Sp-Tlr genes, so it was accepted as a coding region and the second exon was eliminated. \n
SPU_009343	SPU_009343	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 96% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor.\n
SPU_009459	SPU_009459	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 96% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor. \nThis is a member of sea urchin-specific Tlr Group ID.\n
SPU_009933	SPU_009933	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 94% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor. \nThis is a member of sea urchin-specific Tlr Group IB. \n
SPU_002467	SPU_002467	none	Same gene as SPU_002025. All annotation information is documented there. \n
SPU_009952	SPU_009952	none	 The nucleotides of the coding and 3'UTR sequence have 99% identity to those of SPU_010695. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_010320	SPU_010320	none	This gene model may represent a pseudogene or contain a sequence error. 3'UTR sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure.\n
SPU_010680	SPU_010680	none	This gene model may represent a pseudogene or contain a sequence error. 5'UTR sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene model.\n
SPU_010693	SPU_010693	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 96% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor. \nThis is a member of sea urchin-specific Tlr Group I(orphan).\n
SPU_011277	SPU_011277	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 93% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_005983	SPU_005983	none	This gene was annotated based on a manual inspection of protein alignments. \n \nAnother glean model (SPU_012845) codes for a very similar sequence (92% identity). It is yet to be determined to which extent this corresponds to a true gene duplication or a problem with the assembly. \n \nSeveral Sp-IL17 genes were identified computationally. Each of them shows a best Blast-p hit to specific vertebrate IL17 family members. However, multiple sequence alignments demonstrated that almost all the Sp-IL17 sequences segregate from their vertebrate counterparts as a separate cluster. For consistency purposes, therefore, we decided to arbitrarily assign numerical identifiers to all the Sp-Il17 genes, and thus avoid the implication that they all have specific vertebrate orthologs (which are classified by the letters A-F).\n
SPU_019350	SPU_019350	none	This gene was annotated based on a manual inspection of protein alignments. \n \nTwo other adjacent models (SPU_019349 and SPU_019351) code for very similar sequences. It is yet to be determined to which extent this reflects true gene duplications or assembly problems. \n \nSeveral Sp-IL17 genes were identified computationally. Each of them shows a best Blast-p hit to specific vertebrate IL17 family members. However, multiple sequence alignments demonstrated that almost all the Sp-IL17 sequences segregate from their vertebrate counterparts as a separate cluster. For consistency purposes, therefore, we decided to arbitrarily assign numerical identifiers to all the Sp-Il17 genes, and thus avoid the implication that they all have specific vertebrate orthologs (which are classified by the letters A-F).\n
SPU_019351	SPU_019351	none	This gene was annotated based on a manual inspection of protein alignments. \n \nTwo other adjacent models (SPU_019349 and SPU_019350) code for very similar sequences. It is yet to be determined to which extent this reflects true gene duplications or assembly problems. \n \nSeveral Sp-IL17 genes were identified computationally. Each of them shows a best Blast-p hit to specific vertebrate IL17 family members. However, multiple sequence alignments demonstrated that almost all the Sp-IL17 sequences segregate from their vertebrate counterparts as a separate cluster. For consistency purposes, therefore, we decided to arbitrarily assign numerical identifiers to all the Sp-Il17 genes, and thus avoid the implication that they all have specific vertebrate orthologs (which are classified by the letters A-F).\n
SPU_011328	SPU_011328	none	This gene model may represent a pseudogene or contain a sequence error. Intron and 3'UTR sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified model contains some frame shifts, but reflects best gene structure. \n
SPU_012844	SPU_012844	none	This gene was annotated based on a manual inspection of protein alignments. \n \nThis prediction contains a sequence that seems a triplication of the sequence from an adjacent model (SPU_012845). It is yet to be determined if this reflects a true exon multiplication/gene duplication or problems with the assembly. In addition, another model (SPU_005983) codes for a very similar sequence (92% identical). Again, it is yet to be determined whether this reflects a true gene duplication or assembly problems. \n \nSeveral Sp-IL17 genes were identified computationally. Each of them shows a best Blast-p hit to specific vertebrate IL17 family members. However, multiple sequence alignments demonstrated that almost all the Sp-IL17 sequences segregate from their vertebrate counterparts as a separate cluster. For consistency purposes, therefore, we decided to arbitrarily assign numerical identifiers to all the Sp-Il17 genes, and thus avoid the implication that they all have specific vertebrate orthologs (which are classified by the letters A-F).\n
SPU_022838	SPU_022838	none	This gene was annotated based on a manual inspection of sequence alignments. \n \nThere seems to be an annotation problem with this gene: the first two exons, located in one contig, are highly similar in sequence to the last three exons, which are located in a separate contig of the same scaffold. Since no other models map between these contigs, and since they lie at the end of the scaffold, this may represent a case of exon amplification, or it may be a case of erroneous assembly (haplotypes?) that led to a duplicated sequence within this model. We have not modified the present model because we have no independent evidence to support either claim. \n \nSeveral Sp-IL17 genes were identified computationally. Each of them shows a best Blast-p hit to specific vertebrate IL17 family members. However, multiple sequence alignments demonstrated that almost all the Sp-IL17 sequences segregate from their vertebrate counterparts as a separate cluster. For consistency purposes, therefore, we decided to arbitrarily assign numerical identifiers to all the Sp-Il17 genes, and thus avoid the implication that they all have specific vertebrate orthologs (which are classified by the letters A-F).\n
SPU_012845	SPU_012845	none	This gene was annotated based on a manual inspection of protein alignments. \n \nAn adjacent prediction (SPU_012844) contains a sequence that seems a triplication of this model. It is yet to be determined if this reflects a true exon amplification/gene duplication case or problems with the assembly. In addition, another model (SPU_005983) codes for a very similar sequence (92% identical). Again, it is yet to be determined whether this reflects a true gene duplication or assembly problems. \n \nSeveral Sp-IL17 genes were identified computationally. Each of them shows a best Blast-p hit to specific vertebrate IL17 family members. However, multiple sequence alignments demonstrated that almost all the Sp-IL17 sequences segregate from their vertebrate counterparts as a separate cluster. For consistency purposes, therefore, we decided to arbitrarily assign numerical identifiers to all the Sp-Il17 genes, and thus avoid the implication that they all have specific vertebrate orthologs (which are classified by the letters A-F).\n
SPU_011481	SPU_011481	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 95% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor. \n
SPU_027904	SPU_027904	none	#\nThis gene was annotated based on a manual inspection of protein alignments. \n \nSeveral Sp-IL17 genes were identified computationally. Each of them shows a best Blast-p hit to specific vertebrate IL17 family members. However, multiple sequence alignments demonstrated that almost all the Sp-IL17 sequences segregate from their vertebrate counterparts as a separate cluster, except for this model that co-distributed with vertebrate IL17E (or IL25). For consistency purposes, however, we decided to arbitrarily assign numerical identifiers to all the Sp-Il17 genes, and thus avoid the implication that they all have specific vertebrate orthologs (which are classified by the letters A-F). At this point, a more careful analysis is needed to determine whether Sp-Il17-8 is indeed an ortholog of IL17E.\n
SPU_013111	SPU_013111	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 96% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor. \n
SPU_013162	SPU_013162	none	#\nPartial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 97% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor. \n
SPU_014352	SPU_014352	none	This gene model may represent a pseudogene or contain a sequence error. 3'UTR sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified model contains some frame shifts, but reflects best gene structure.\n
SPU_014548	SPU_014548	none	This gene model may represent a pseudogene or contain a sequence error. 3'UTR sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified model contains some frame shifts, but reflects best gene structure. \n
SPU_014929	SPU_014929	none	#\nThis gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model contains some frame shifts, but reflects best gene structure. \n
SPU_015553	SPU_015553	none	#\nPartial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 96% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor. \nThis is a member of sea urchin-specific Tlr Group IB.  \n
SPU_016388	SPU_016388	none	Unknown sequence (NNN...) in the intron of the current model could make this gene model incomplete. Intron sequence except NNN... matches a coding region of other Sp-Tlr genes. Modified gene model contains some frame shifts, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group ID.\n
SPU_022815	SPU_022815	none	similar to G-protein coupled receptor 64 precursor  \n            (Epididymis-specific protein 6) (He6 receptor)\n
SPU_015867	SPU_015867	none	#\nS.purpuratus EF1B alpha cloned (AJ973180) \n
SPU_015285	SPU_015285	none	Part of this sequence is also contained in SPU_026576.  \n"Additional evidences of the existence of the gene" have been obtained in Sphaerechinus granularis\n
SPU_016438	SPU_016438	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 98% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor. \n
SPU_016554	SPU_016554	none	#\nPartial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 97% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor. \nThis is a member of sea urchin-specific Tlr Group IC.\n
SPU_017735	SPU_017735	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 96% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor.\n
SPU_017794	SPU_017794	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 95% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor.\n
SPU_018380	SPU_018380	none	This gene model may represent a pseudogene or contain a sequence error. 5'UTR sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified model contains some frame shifts, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group ID. \n
SPU_020428	SPU_020428	none	Partial Toll-like receptor. This gene model is located at the end of a contig. The nucleotide sequence has 94% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor. \n
SPU_020644	SPU_020644	none	#\nUnknown sequence (NNN...) in the intron of the current model could make this gene model incomplete. But the second exon was eliminated based on BLASTN search. \nThis is a member of sea urchin-specific Tlr Group IB. \n
SPU_011299	SPU_011299	none	This gene was annotated based on a manual inspection of multiple protein alignments. \n \nSeveral gene models related to this one were found. A multiple sequence alignment with various vertebrate and nematode related proteins failed to strongly support any homologies among them, and we have therefore arbitrarily numbered the members of this S.purpuratus family of genes.\n
SPU_020652	SPU_020652	none	Unknown sequence (NNN...) in the 3' and 5' UTR of the current model could make the modified gene model still incomplete.  \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_017901	SPU_017901	none	This gene was annotated based on a manual inspection of multiple protein alignments. \n \nThis gene model was modified based on an overlapping FgeneshAB prediction which extends the protein sequence of this model and improves its alignment with related sequences from other phyla. Nonetheless, this model seems still incomplete after this modification (N-ter sequence missing). The model is situated in a small scaffold, and that likely accounts for the missing information. \n \nSeveral gene models related to this one were found. A multiple sequence alignment with various vertebrate and nematode related proteins failed to strongly support any homologies among them, and we have therefore arbitrarily numbered the members of this S.purpuratus family of genes.\n
SPU_019323	SPU_019323	none	This gene was annotated based on a manual inspection of multiple protein alignments. \n \nThe best Blast hit for this gene corresponds to human dopachrome tautomerase (a.k.a. phenylpyruvate tautomerase II), a gene very closely related to MIF (phenylpyruvate tautomerase). For this reason, we have arbitrarily named this gene Sp-Mif-like2. \n \nNote that the genome-wide tiling array data correlate with the exon structure of this model, but that they also indicate high expression levels of a genomic region that falls in the second intron of this model. No other models cover the region, and it is unclear at this point what might account for these observations.\n
SPU_020035	SPU_020035	none	This gene was annotated based on a manual inspection of multiple protein alignments. \n \nThe genome-wide tiling array data correlate with the exon structure of this model. \n \nSeveral gene models related to this one were found. A multiple sequence alignment with various vertebrate and nematode related proteins failed to strongly support any homologies among them, and we have therefore arbitrarily numbered the members of this S.purpuratus family of genes.\n
SPU_020036	SPU_020036	none	This gene was annotated based on a manual inspection of multiple protein alignments. \n \nNote that the genome-wide tiling array data correlate with the exon structure of this model, but that they also indicate high expression levels of a genomic region that falls in the only intron of this model. This region, however, is included in a Genscan model (Supertig1576_6) on the opposite strand, suggesting it may reflect the expression of an overlapping gene. This is yet to be determined experimentally. \n \nSeveral gene models related to this one were found. A multiple sequence alignment with various vertebrate and nematode related proteins failed to strongly support any homologies among them, and we have therefore arbitrarily numbered the members of this S.purpuratus family of genes.\n
SPU_016226	SPU_016226	none	This gene was annotated based on a manual inspection of multiple protein alignments. \n \nNote that the genome-wide tiling array data correlate with the exon structure of this model, but that they also indicate expression of a genomic region that falls in the second intron of this model. No other models cover the region, and it is unclear at this point what might explain this observation. \n \nSeveral gene models related to this one were found. A multiple sequence alignment with various vertebrate and nematode related proteins failed to strongly support any homologies among them, and we have therefore arbitrarily numbered the members of this S.purpuratus family of genes.\n
SPU_001152	SPU_001152	none	This gene was annotated based on a manual inspection of multiple protein alignments. \n \nThe genome-wide tiling array data correlate with the exon structure of this model. Note, however, that there are overlapping Fgenesh++/AB predictions that incorporate more C-ter sequence than this glean model, which is also supported by the genome-wide tiling array data. Since we have no experimental evidence to favor either model, we have accepted the glean model in its present form. \n \nSeveral gene models related to this one were found. A multiple sequence alignment with various vertebrate and nematode related proteins failed to strongly support any homologies among them, and we have therefore arbitrarily numbered the members of this S.purpuratus family of genes.\n
Mif-2	SPU_030001	none	This gene was created based on an NCBI prediction identified based on a manual inspection of multiple protein alignments. The NCBI prediction was accepted with no modifications. However, notice that this model is incomplete (CDS has no stop codon), which is likely due to the fact that it is located in a small scaffold. \n \nSeveral gene models related to this one were found. A multiple sequence alignment with various vertebrate and nematode related proteins failed to strongly support any homologies among them, and we have therefore arbitrarily numbered the members of this S.purpuratus family of genes.\n
SPU_019408	SPU_019408	none	This model was annotated based on a manual inspection of multiple protein sequence alignments.\n
SPU_010703	SPU_010703	none	This prediction includes Semaphorin 5'-terminus domain (predicted exon 1-5) and UDP-glucuronosyltransferase (predicted exon 6).  See SPU_022057 (scaffold 181) for the other part of Semaphorin sequence. \n \nThis prediction maps to Scaffoldi5629 and a hand-edited genewise prediction identifies a gene with the architecture  NH2, SEMA, PSI, TM, COOH.  This sequence is a class 6 Sema and the peptide sequence in Gene Sequences has been updated. (RDB 3 May06)\n
SPU_021931	SPU_021931	none	SPU_016610 (scaffold 50410) is also likely ortholog of AADC, but the prediction is missing 40 aa that is consistent with predicted exon 3 of SPU_021931.  \n
SPU_003528	SPU_003528	none	Part of this sequence is also contained in SPU_011191. Exon 3 and 4 are missing in SPU_011191. \n"Additional evidences of the existence of the gene" have been obtained in Sphaerechinus granularis\n
SPU_006778	SPU_006778	none	This gene is on two scaffolds (59529 and 1777). On scaffold 59529, one GLEAN model is predicted for this gene (SPU_006778 for exon 1-5). On scaffold 1777, there is one GLEAN model (SPU_008535 for exon 3-24) predicted for this gene. \nPlease refer to GLEAN_08535 for refined gene features.\n
SPU_008535	SPU_008535	none	This gene is on two scaffolds (59529 and 1777). On scaffold 59529, one GLEAN model is predicted for this gene (SPU_006778 for exon 1-5). On scaffold 1777, there is one GLEAN model (SPU_008535 for exon 3-24) predicted for this gene. \n
SPU_015654	SPU_015654	none	#\nOne more exon was accepted as coding region by comparison to the corresponding FgeneshAB and Fgenesh++ predictions.\n
SPU_007240	SPU_007240	none	Alignment with best blast sequence suggests that the model is good.\n
SPU_019579	SPU_019579	none	Blast alignments suggest that the model is good.\n
SPU_019610	SPU_019610	none	Blast alignments suggest that the model is good. \n \nThere are two excellent matches defined  by Genscan that are not on the glean3 list.  The first on scaffold 2001_1, 1128-31566, is likely to be a haplotype sequence.  The second is on scaffold 102441_1, 594-10744.\n
SPU_022407	SPU_022407	none	partial CDS;  \n \nThere are two excellent matches defined by Genscan that are not on the glean3 list.  Scaffold2--1_1, 1128-31455 and Scaffold102411_1, 591-10744\n
SPU_002939	SPU_002939	none	Long sequence showing homology with Neurofilament Heavy Polypeptide at the N-terminus and with acin1 at the C-terminus. \nFor SpAcin1 see the duplicated gene Glean_02578\n
SPU_010966	SPU_010966	none	Blast alignment suggests that the model is good.\n
SPU_007208	SPU_007208	none	THREE COMMENTS: \n \n1) Alignment with best blast sequence suggests that N-terminal sequences are missing in the model.   \n2) They also suggest that an internal exon is missing between the following two exons. \n>SPU_007208|Scaffold38521|27220|27355| DNA_SRC: Scaffold38521 START: 27220 STOP: 27355 STRAND: +  \nTGAGCTGAGTAAAAGCATTCATCCCTCCTCCTCTCATTGCTGAGCGTATGATGTAGAAGAGAAAACCCAG \nAACAAGGACTGTTCCCAGGATACTCATCATTCCATCAGAGCTAGAATGCTGGTAACGGATCTGAAT \n>SPU_007208|Scaffold38521|28790|28906| DNA_SRC: Scaffold38521 START: 28790 STOP: 28906 STRAND: +  \nCTCTCTACCAAAGACGATAGCACCATCATGCAGGAAGACATGGACTACATCAGACTCAGGATTCTTGCTC \nTCTCCACTGCCATCAGTATCATAATACGTTACTGACACATCCTTCAC \n3) There are matches to the model sequence on scaffolds 115226_1, 13 to 446 and 118973_1, 377 to 537. \n \n
SPU_025264	SPU_025264	none	Alignment to best blast sequence suggests that the model is missing an internal exon, N-terminal sequence and C-terminal sequence.  See notes for GLEAN3-07208, which probably is the same gene, concerning the internal exon. \n \n
SPU_007374	SPU_007374	none	#\nBlast alignments suggest that the model is good.  Sequence similarity with Hs sequence is 89% at amino acid level!\n
SPU_002234	SPU_002234	none	Alignment with best blast sequence suggests that model is good.  88% amino acid identity over 389 residues!!\n
SPU_016198	SPU_016198	none	Alignment with best blast sequence suggests that the N-terminal sequence in the model is not conserved although the remainder of the protein is 90% identical.  The N-terminal sequence cannot be confirmed by blast alignment.\n
SPU_028593	SPU_028593	none	Alignment with best blast sequence suggests that the N-terminal sequence in the model is not conserved, although the remainder of the protein is 90% identical.  Cannot confirm that the N-terminal sequence is correct.\n
SPU_018934	SPU_018934	none	Alignment with best blast sequence suggests assembly problems with this model.  N-terminal half sequences in the model match C-terminal half sequences in the best blast hit and C-terminal half in the model match N-terminal sequences in the best blast hit.  The N-terminal sequence in the model is not conserved and cannot be verified.\n
SPU_019415	SPU_019415	none	Alignment with best blast sequence suggests that an exon is missing between the following predicted exons: \n>SPU_019415|Scaffold469|23524|23696| DNA_SRC: Scaffold469 START: 23524 STOP: 23696 STRAND: +  \nGCTAAGATGGATGAGCTTCAGCTCTTCCGTGGAGACACAGTCATGCTCAAAGGCAAGAAAAGGCGAGACA \nCCGTCTGCATTGTACTCTCAGATGACACCGTAACAGATGACAAGATTCGTGTCAACCGAGTTGTCAGGAG \nTAATCTTCGCGTTCGTCTAGGAGACATTGTCAG \n>SPU_019415|Scaffold469|25849|26041| DNA_SRC: Scaffold469 START: 25849 STOP: 26041 STRAND: +  \nAAACCTCTTTGATGTATACCTGAGGCCGTACTTCCAGGAGGCGTACCGCCCCGTCAGGAAAGGTGACATC \nTTTCAAATCCGTGGAGGCATGAGGGCGGTAGAATTCAAAGTGGTGGAAACAGACCCCGGACCATACTGCA \nTCGTTTCACCTGATACAGTCATACACTTTGAGGGAGATGCAATCAAGCGAGAG \n
SPU_022919	SPU_022919	none	Alignment with best blast sequence suggests that only a small segment (~10%) of the model is conserved with the best blast hit. \n \nAlignment of the Genscan model shows a much longer alignment suggesting that SPU_022918, 22919, 22920 and 22921 should be combined in one model.  However two exons in the Genscan model are not conserved and cannot be confirmed.  They are: \n>Supertig39397_1|Scaffold39397|21179|21262| DNA_SRC: Scaffold39397 START: 21179 STOP: 21262 STRAND: +  \nGAGTGCATCCAACAGCTGACGTCAGATGATGCGTGGTATCCGTGCGGAGAAGGGCGCGAAAATTCAACAG \nATTATTGGCTGAAG \n>Supertig39397_1|Scaffold39397|26430|26602| DNA_SRC: Scaffold39397 START: 26430 STOP: 26602 STRAND: +  \nGGGCTATCTGCAAGGCTAAATGTCGTTTGGGTCGGTCATACCAGATTGCCCTGGAGGGTGCTACCTACAG \nAGAGCGGGGGCTGCTGGGAAGGCATGAGGTCATCCTGTGCACGGCCGGTGGTGTCTGGTACCCCAACCTC \nGACCAGATAGTATGTCATGAAAAATGCTTGGAG\n
SPU_018513	SPU_018513	none	Alignment with best blast sequence verifies that all but the N-terminal sequence is conserved.   This is another haplotype copy of the combination SPU_022918, 22919, 22920 and 22921 (see Genscan 61252_1 or annotation notes for SPU_022919)\n
SPU_006212	SPU_006212	none	Alignment with best blast sequence shows that model does not include N-terminal sequence.\n
SPU_013070	SPU_013070	none	Alignment with best blast sequence suggests that the N-terminal exon may not be part of the protein; it is not conserved and the remainder of the model matches the entire length of the best blast hit.  The questionable exon is: \n>SPU_013070|Scaffold31758|4248|4352| DNA_SRC: Scaffold31758 START: 4248 STOP: 4352 STRAND: +  \nTCAGGCTGCAGAAACAGCAGGAATAGACTTTGTGATGACAGCTCTTTGAGAAGGGATTCTGTGAGGTAGC \nTCAGGGCAGTTGCAGTACTGTCTAACCTGGATGGC\n
SPU_016511	SPU_016511	none	Alignment with best blast sequence suggests that the model may lack N- and C-terminal sequence; conserved sequences are located in the central region of the best blast hits.\n
SPU_010215	SPU_010215	none	Alignment with best blast sequence suggests that the model lacks conserved N-terminal sequence.\n
SPU_016327	SPU_016327	none	Alignment with best blast sequence suggests that the model is nearly complete, but lacks an internal exon between the following predicted exons: \n>SPU_016327|Scaffold102457|14501|14625| DNA_SRC: Scaffold102457 START: 14501 STOP: 14625 STRAND: +  \nTTGAGAGCCTCTCTTGAGAGAGACAGGTAGATTCCTACACACTCTGCTCTACATTCTTCATAAGGTGAGG \nCAATCACAGGAAACTTGGAATCCCAAACCTCTCCTGGCATGTACCATGATGAGAT \n>SPU_016327|Scaffold102457|18405|18511| DNA_SRC: Scaffold102457 START: 18405 STOP: 18511 STRAND: +  \nCTTGTCTGTATCTGTCAAGAAGGTGATCGTCTGATCTTTACCGTAAGCAGACAGCACATTGCCTAGAGAT \nACATTCTTAAATCCTTCATCCTGACGAATGTCATCAT\n
SPU_028721	SPU_028721	none	TWO COMMENTS: \n1) partial CDS; lacking C-terminal half and possibly short N-terminal region. \n2) There is a match to a short segment of this model on Scaffold40239_1, 687-798, that is not on the glean3 list.\n
SPU_004886	SPU_004886	none	partial CDS; lacking both N- and C-terminal sequences, probably because this model is on a short scaffold.\n
SPU_026005	SPU_026005	none	Alignment with best blast sequence suggests that the model is good. \n  \n
SPU_015187	SPU_015187	none	TWO COMMENTS: \n \n1) Alignment of best blast sequence suggests that the model is good. \n2) There is an excellent match on Scaffold131610, 127-321 that is not on the glean 3 list.\n
SPU_000976	SPU_000976	none	partial CDS, lacking C-terminal half\n
SPU_023965	SPU_023965	none	TWO COMMENTS: \n \n1) Alignment with top blast sequences suggests the model is good. \n2) There is an excellent match to a portion of this model on Scaffold17468_1, 217-384.\n
SPU_023686	SPU_023686	none	Alignment with best blast data and blasts with individual exons strongly suggest that the only exons in the model that should be included are: \n>SPU_023686|Scaffold1004|11066|11134| DNA_SRC: Scaffold1004 START: 11066 STOP: 11134 STRAND: +  \nATGGCGGATGAGCGCTACGTCATTGACGTTCTGGTTTGTTGTTGTCAAGAGATAAGGGTAGGGTTGCAT \n>SPU_023686|Scaffold1004|143554|143633| DNA_SRC: Scaffold1004 START: 143554 STOP: 143633 STRAND: +  \nGTCGCAGTGGACATGGAGTTTGCCAAGAACATGTTTGAGTTACATAAAAAGGTGAACTCATGGGAGAACA \nTTGTAGGATG \n>SPU_023686|Scaffold1004|143857|143994| DNA_SRC: Scaffold1004 START: 143857 STOP: 143994 STRAND: +  \nGTATGCAACAGGACGTGACATCACAGGTCATTCAGTGCTGATACACGACTACTACTCAAGAGAGTGTCAA \nAACCCGATTCACGTCACGGTCGATACAACGATGGTAGACCTCAATATGTCAGTCAAGACATGGGTTAG \n>SPU_023686|Scaffold1004|144623|144714| DNA_SRC: Scaffold1004 START: 144623 STOP: 144714 STRAND: +  \nGCAAAATATGGGCGTACCAGACAAGTCACAAGGCACCGTGTTCATTCCAATTCCCATGAAAATCTCCTTC \nCATCAACCAGAGAAAGTAGCAA \n>SPU_023686|Scaffold1004|145417|145553| DNA_SRC: Scaffold1004 START: 145417 STOP: 145553 STRAND: +  \nTGGATGCGCTTATAAGGGAGACAGAACCAAACAGAAAAACCATTGAGTTGACGACCGATCTTCAGTACGT \nGTCTAAAGCCTCTGGTAAACTTCAAGAGATGTTGACAAGAGTGCTCCAGTATGTTGATGATATCCTG \n>SPU_023686|Scaffold1004|146767|146880| DNA_SRC: Scaffold1004 START: 146767 STOP: 146880 STRAND: +  \nAGTGGAAAGATTCAAGCCGATAACCAGATTGGCCGGTTTCTGATGAATCTAGTTTCCAATGTTCCTAAGT \nTGCAGCCTGATGAGTTTGATGAGATGCTCAATAACAGTATGAAG \n>SPU_023686|Scaffold1004|147401|147496| DNA_SRC: Scaffold1004 START: 147401 STOP: 147496 STRAND: +  \nGATCTTCTGATGGTAGTCTACCTGTCCGGTCTGATTGAGACCCAGCTCACTCTCAACGAAAAGCTGACGT \nTATCCAAAGCAGCTAATGCAGTTGCA \nOthers blast to either different proteins or nothing.\n
SPU_002210	SPU_002210	none	"Additional evidences of the existence of the gene" have been obtained in Sphaerechinus granularis\n
SPU_012936	SPU_012936	none	SubgroupB thrombospondin. Large gap in the middle of the gene.\n
Sp-ACE I-like	SPU_030002	none	Genes or parts of genes predicted by Genscan that are not in the Glean3 list.\n
Sp-TAF1-like	SPU_030003	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list. \n \n
Sp-leishmanolysin-like	SPU_030004	none	#\nGenes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-TAF2-like	SPU_030005	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list. \n \n
SPU_013106	SPU_013106	none	This may be 2 concatenated genes because of the (TSP3)n-TSP_C-LamGL domain architecture. It's also likely to be a partial gene model missing the N-terminus due to end of contig.\n
Sp-MT4-mmp-like	SPU_030006	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-mmp2-like	SPU_030007	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-mmp27-like	SPU_030008	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-matrix metallopeptidase2-like	SPU_030009	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-matrix metallopeptidase3-like	SPU_030010	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-matrix metalloproteinase2-like	SPU_030011	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-matrix metalloproteinase3/14-like	SPU_030012	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
SPU_017370	SPU_017370	none	This gene encodes a protein with an unusual domain architecture. The N-terminus contains: \nPfam:F5_F8_type_C-Pfam:GCC2_GCC3 - which occurs in metazoa \n \nthe C-terminus looks like a subgroupB thrombospondin. Therefore, it might be a concatenation of 2 genes or a novel urchin architecture.  \nThere is no EST evidence or anything else to justify splitting it into 2. The NCBI gene model may be more accurate than the GLEAN3 model.\n
Sp-meprinA-like	SPU_030013	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-TLL1-like	SPU_030014	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
SPU_022667	SPU_022667	none	N-terminus may be truncated due to end of contig\n
Sp-tolloid2-like	SPU_030015	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-tolloid1-like	SPU_030016	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-meprinA, beta-like	SPU_030017	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-ADAMTS6-like	SPU_030018	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-ADAMTS6 metalloprotease-like	SPU_030019	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list. \n \n
Sp-metagidin-like	SPU_030020	none	#\nGenes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-ADAM10-like	SPU_030021	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-ADAMTS1-like	SPU_030022	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-ADAM10 metallopeptidase-like	SPU_030023	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-ADAMTS1 metallopeptidase-like	SPU_030024	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list. \n \n
Sp-ADAMTS1 metalloprotease-like	SPU_030026	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-endothelin converting enzyme1-like	SPU_030027	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-carboxypeptiaseD-like	SPU_030030	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-aminopeptidase1-like	SPU_030031	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-dipeptidease-like	SPU_030032	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-ACY1L2-like	SPU_030033	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-aminocyclase-like	SPU_030034	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
SPU_000959	SPU_000959	none	SPU_000959 corresponds to N-ter domain of Sp-EF1B delta. \nSPU_000959 contains three exons :  \nexon1 : scaffold1943|194310|194423|strand(-);   \nexon2 : scaffold1943|191630|191707|strand(-);  \nexon3 : scaffold1943|189643|189735|strand(-); \n \nTwo isoforms are expressed in sea urchin (Y14235 and AJ973181). Y14235 does not contain exon2. \n \nThe C-ter domain of Sp-EF1B delta is encoded by SPU_000960 \nThe entire sequence is given in annotation for SPU_000960 \n \n
SPU_012281	SPU_012281	none	#\nSame sequence as SPU_011711 except the most N-terminal part. \nDuplication most likely due to assembly process.\n
Sp-aminocyclase1-like	SPU_030035	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
SPU_011711	SPU_011711	none	Same sequence as SPU_012281 except the most N-terminal part. \nDuplication most likely due to assembly process.\n
Sp-O-sialoglycoprotein endopeptidase-like	SPU_030036	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
SPU_009593	SPU_009593	none	Binds the stem-loop structure of replication-dependent histone pre-mRNAs and contributes to efficient 3' end processing by stabilizing the complex between histone pre-mRNA and U7 small nuclear ribonucleoprotein (snRNP). Could play an important role in targeting mature histone mRNA from the nucleus to the cytoplasm and to the translation machinery. Stabilizes mature histone mRNA and could be involved in cell-cycle regulation of histone gene expression (By similarity). \n
SPU_024780	SPU_024780	none	see annotation to SPU_009593\n
Sp-O-peptidaseD-like	SPU_030037	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-O-NAALAD2-like	SPU_030038	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list. \n \n
Sp-N-acetylated alpha-linked acidic dipeptidase 2-like	SPU_030039	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
SPU_020741	SPU_020741	none	This gene model may represent a pseudogene or contain a sequence error. 3'UTR sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified model contains some frame shifts, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group ID.\n
Sp-N-acetylated alpha-linked acidic dipeptidase 1-like	SPU_030040	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-NAALAD2-like	SPU_030042	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-NAALAD2-like protease	SPU_030043	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
SPU_021075	SPU_021075	none	This gene model may represent a pseudogene or contain a sequence error. 5'UTR sequence matches coding sequence of other Sp-Tlr genes and contains stop codons (frame shift).\n
Sp-aspartate transcarbamylase-like	SPU_030044	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
SPU_024092	SPU_024092	none	Same sequence as SPU_000814 except the most N-terminal part. \nDuplication most likely due to assembly process.\n
Sp-carbamoylphosphate synthetase-like	SPU_030045	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
SPU_000814	SPU_000814	none	Same sequence as SPU_024092 except the most N-terminal part. \nDuplication most likely due to assembly process.\n
Sp-carbamoylphosphate	SPU_030046	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
SPU_021283	SPU_021283	none	The nucleotide sequence of this gene model has 100% identity to SPU_002442. It may be caused by an assembly error. \n
SPU_020322	SPU_020322	none	#\nmany similarities to plant HSP-90 also; see also (SPU_020322) looks to be the same as (SPU_001586) and above Glean\n
SPU_021362	SPU_021362	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold.\n
Sp-allantoinase-like	SPU_030047	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
SPU_021502	SPU_021502	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 94% identity to another Sp-Tlr gene, so it could be a member of the Toll-like receptor family. \nThis is a member of sea urchin-specific Tlr Group IA.\n
Sp-YME1-like	SPU_030048	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-paraplegin-like	SPU_030049	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
SPU_023179	SPU_023179	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 94% identity to another Sp-Tlr gene, so it could be a member of the Toll-like receptor family.\n
SPU_023491	SPU_023491	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 99% identity to another Sp-Tlr gene, so it could be a member of the Toll-like receptor family. \nThis is a member of sea urchin-specific Tlr Group ID.\n
SPU_024501	SPU_024501	none	Partial Toll-like receptor. The nucleotide sequence has 95% identity to another Sp-Tlr gene, so it could be a member of the Toll-like receptor family. \nThis is a member of sea urchin-specific Tlr Group ID.\n
SPU_028424	SPU_028424	none	this glean prediction contains only the c-term (tyrosine kinase domain) of the protein. it seems that SPU_007624 contains the N-term (extracellular domain).\n
SPU_007624	SPU_007624	none	this glean prediction contains only the N-term (extracellular domain) of the protein. it seems that SPU_028424 contains the C-term (tyrosine kinase domain).\n
SPU_024590	SPU_024590	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 97% identity to another Sp-Tlr gene, so it could be a member of the Toll-like receptor family. \nThis is a member of sea urchin-specific Tlr Group IC.\n
SPU_024847	SPU_024847	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 96% identity to another Sp-Tlr gene, so it could be a member of the Toll-like receptor family.\n
SPU_027735	SPU_027735	none	792 bp of 5'UTR that were highly similar to other Sp-Tlr genes were accepted as coding region based on a BLASTN search. The modified gene model is located at the end of a contig, so the model could still be incomplete. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_020718	SPU_020718	none	Exons 1-7 are present on this scaffold772 and SPU_020718 prediction. Exons 8-28 are present on scaffold772 and SPU_009002 prediction. Refer to SPU_009002 for the complete gene features of REJ3.\n
Sp-COP9signalosome/s6-like	SPU_030051	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list. \n \n
Sp-EIF3S3-like	SPU_030052	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-angiotensin I converting enzyme-like	SPU_030053	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
SPU_010180	SPU_010180	none	A part of the 3rd intron was accepted as coding region by comparison to the corresponding FgenesAB and ++ prediction. Unknown sequence (NNN...) in the 1st intron of the current model could make this gene model incomplete.\n
Sp-mmp20-like	SPU_030054	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list. \n \n
SPU_012211	SPU_012211	none	#\nThis gene model is located at the end of a short scaffold. Three exons were added at the 3'end of the GLEAN3 model by comparison to the corresponding Genscan prediction.\n
Sp-NAALAD2 protease-like	SPU_030055	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
SPU_018915	SPU_018915	none	Partial TNF receptor. This gene model is located at the end of a short scaffold\n
Sp-leishmanolysin-like metalloprotease	SPU_030056	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-ACE-T-like	SPU_030057	none	Genes or parts of genes predicted by one or more of the prediction programs (Genscan, Gnomen, fgenesh) that are not in the Glean3 list.\n
Sp-angiotensin I converting enzyme isoform 1 precursor-like	SPU_030058	none	Genes or parts of genes predicted by Genscan that are not in the Glean3 list.\n
SPU_020740	SPU_020740	none	Unknown sequence (NNN...) in the intron of the current model could make this gene model incomplete.\n
SPU_024584	SPU_024584	none	Partial TNF receptor. This gene model is located at the end of a short scaffold. The first exon was changed by comparison to the corresponding FgenesAB and ++ prediction. \n
SPU_010230	SPU_010230	none	This gene model was modified based on FgeneshAB ++ prediction. The model is located at the end of a scaffold, which could make it still incomplete.\n
SPU_020955	SPU_020955	none	Partial TNF receptor. This gene model is located at the end of a short scaffold. \n
SPU_006859	SPU_006859	none	Exons 1-4 are on this scaffold59004 and SPU_006859. Exons 5-7 are on scaffold83735 and SPU_021516.\n
Sp-Twist	SPU_030059	none	Determined by the sequence info from Lv-Twist\n
SPU_012238	SPU_012238	none	SPU_002950 is a partial sequence identical to this one, but on a different scaffold\n
SPU_019440	SPU_019440	none	Nectin sequence from L.variegatus hits 5 GLEAN predictions with a score of 0.0.  This prediction has 5 exons that are not in cDNA sequence, embryonic expression data agrees with cDNA and predicted protein mass is larger than that found in eggs.\n
SPU_011009	SPU_011009	none	Highest homology to CYP3 family genes. Phylogenetically falls at base of CYP3 family, within CYP clan 3 (MrBayes v3.0.1 WAG+gamma+I model). Percent ID (on masked basis) to CYP3A4 (human) of 42.3% (aa level). Gene model modified to include 2 more exons as predicted by FGENESH+ homology to SPU_014573\n
SPU_013642	SPU_013642	none	Exons 5-12 are present on this scaffold85921 with SPU_013642 and FgeneshAB predictions. Exons 1-4 are on scaffold6090 with SPU_020643 and FgeneshAB predictions.\n
SPU_020643	SPU_020643	none	Exons 1-4 are on this scaffold6090 with SPU_020643 and FgeneshAB predictions. Exons 5-12 are present on scaffold85921 with SPU_013642 and FgeneshAB predictions. Refer to SPU_013642 for the complete Sp-HCN2 gene features.\n
SPU_006462	SPU_006462	none	Scaffold1146 contains stretches of Ns in exon 1 of SPU_006462 (GCM), and the length of this region is shorter than that of the corresponding region in experimentally verified mRNA sequence. Scaffold sequences and coordinate of exon 1 should be revised.  \n
SPU_017511	SPU_017511	none	Although Type2 TGFbeta receptors belong to the "Ser/Thr Protein Kinase" family, and this sequence gives TGFbetaR2 as its best GenBank hit, NCBI Blast predicts a "Tyr Kinase" domain for this gene.\n
SPU_027254	SPU_027254	none	see SPU_008876\n
SPU_008197	SPU_008197	none	Highly similar to SPU_003264.\n
SPU_000229	SPU_000229	none	Closely related to SPU_009517 \nProbably missing the N-terminus due to end of contig\n
SPU_004340	SPU_004340	none	This sequence was modified by adding the 5' sequence from SPU_006789\n
SPU_026288	SPU_026288	none	Exons 2-9 are on this scaffold97931 and SPU_026288 prediction. Exon 1 is on scaffold96 with SPU_002214 prediction.\n
SPU_006789	SPU_006789	none	This sequence has been added to the 5' end of SPU_004340\n
SPU_013709	SPU_013709	none	There was a long unknown sequence (NNN) between the 7th and 8th exons. The 1st to 7th exons were eliminated.\n
SPU_023670	SPU_023670	none	also SPU_006070 \n
SPU_006070	SPU_006070	none	also SPU_023670\n
SPU_003292	SPU_003292	none	also SPU_016634\n
SPU_016634	SPU_016634	none	also SPU_003292\n
SPU_017117	SPU_017117	none	probable atp-dependent helicase ddx41 (dead-box protein 41)\n
SPU_000526	SPU_000526	none	Exons 1-28 are on this scaffold80510 and SPU_000526. Exons 29-58 are from scaffold1165 and SPU_018112. Exons 49-58 are repeats and may be alternatively spliced. Exons 59-60 are from scaffold100796 and SPU_008369.\n
SPU_005854	SPU_005854	none	The 3' end of SPU_005854 is located near the edge of the scaffold. By comparison to the published cDNA (AAB67801), the model is missing 3' exons coding for 106 C-terminal aa. Exon 6 was modified to agree with known cDNAs. \n
SPU_027915	SPU_027915	none	GLEAN prediction originally for 386 to 806 (end).  The front end 1-386 is on SPU_027915  Use Annotations on GLEAN 20342\n
SPU_020342	SPU_020342	none	GLEAN prediction originally for 386 to 806 (end).  The front end 1-386 is on SPU_027915.  Exon 1 does not agree with cDNA sequence, two small intron exon boundary problems when aligned with BetaC cDNA\n
SPU_014418	SPU_014418	none	SPU_014418, SPU_027648 are the same gene, two different alleles. \n
SPU_027648	SPU_027648	none	SPU_014418, SPU_027648 are the same gene, two different alleles. \n
SPU_002292	SPU_002292	none	SPU_002292 has a tandem duplication of the N-terminal half of the C2A domain.\n
SPU_015621	SPU_015621	none	partial CDS of N-terminal region of SPU_024838\n
SPU_011065	SPU_011065	none	CDS contains 26-631 of betaG cDNA sequence. First and last exons are missing from the scaffold\n
SPU_015213	SPU_015213	none	dna mismatch repair protein mlh1 (mutl protein homolog 1)\n
SPU_016009	SPU_016009	none	Prediction has small errors at exon boundaries by comparison with cDNA.  The last exon is incorrect.\n
SPU_028261	SPU_028261	none	SPU_028261 is the C-terminal match to MSH6 mouse. \nSPU_012960 is the N-terminal match to MSH6 mouse.\n
SPU_026225	SPU_026225	none	Highest homology to CYP3 family genes. Phylogenetically falls at base of CYP3 family, within CYP clan 3 (MrBayes v3.0.1 WAG+gamma+I model). Percent ID (on masked basis) to CYP3A4 (human) of 45.0% (aa level).\n
SPU_021255	SPU_021255	none	Highest homology to CYP3 family genes. Phylogenetically falls at base of CYP3 family, within CYP clan 3 (MrBayes v3.0.1 WAG+gamma+I model). Percent ID (on masked basis) to CYP3A4 (human) of 37.9% (aa level). GLEAN3 gene model altered to follow FGENESH+ predictions and cDNA sequences (added exon 3 and 4).\n
SPU_021588	SPU_021588	none	Predicted C-terminus sequence is longer than those of the other organism. \n
SPU_012985	SPU_012985	none	Part of a previously unknown beta subunit.  The 5' end is missing because scaffold is incomplete.  Good evidence for embryonic expression.\n
SPU_013102	SPU_013102	none	Highest homology to CYP3 family genes. Phylogenetically falls at base of CYP3 family, within CYP clan 3 (MrBayes v3.0.1 WAG+gamma+I model). Percent ID (on masked basis) to CYP3A4 (human) of 35.8% (aa level). By homology to other CYP3-like genes appears to be missing exons 1-4\n
SPU_013107	SPU_013107	none	See SPU_009766 (scaffold 98369). \n
SPU_007644	SPU_007644	none	Different parts of the cDNA are found in different strands in a non-linear organization. \n
SPU_011635	SPU_011635	none	Different parts of the cDNA are found in different strands in a non-linear organization. \n
SPU_017379	SPU_017379	none	SPU_017379 matches SIX1_mouse and SIX2_mouse in the homeobox with high identity. \n
SPU_017380	SPU_017380	none	Six5 also hits the same Glean3 gene.\n
SPU_012076	SPU_012076	none	This Glean model is a concatenation of 2 adjacent genes \nThe Cterminus is a Calcium channel alpha2 delta subunit. \nthe Nterminus is a homolog of human Q69YN2\n
SPU_014621	SPU_014621	none	SPU_004346 could be an alternate model of this gene.\n
SPU_004346	SPU_004346	none	see also SPU_014621\n
SPU_020140	SPU_020140	none	SPU_022433 has high sequence homology and could be a variant.\n
SPU_018112	SPU_018112	none	Exons 29-58 are from this scaffold1165 and SPU_018112. Exons 49-58 are repeats and may be alternatively spliced. Exons 1-28 are on scaffold80510 and SPU_000526. Exons 59-60 are from scaffold100796 and SPU_008369. Refer to SPU_000526 for the complete Sp-EBR1 gene features.\n
SPU_022433	SPU_022433	none	see also SPU_020140\n
SPU_027905	SPU_027905	none	Likely haplotype of SPU_028897, but 27905 gene model incomplete. See SPU_028897 (Sp-apn6) for annotation.\n
SPU_008369	SPU_008369	none	Exons 59-60 are from this scaffold100796 and SPU_008369. Exons 1-28 are on scaffold80510 and SPU_000526. Exons 29-58 are from scaffold1165 and SPU_018112. Exons 49-58 are repeats and may be alternatively spliced. Refer to SPU_000526 for the complete Sp-EBR1 gene features.\n
SPU_023808	SPU_023808	none	Highest homology to CYP3 family genes. Phylogenetically falls at base of CYP3 family, within CYP clan 3 (MrBayes v3.0.1 WAG+gamma+I model). Percent ID (on masked basis) to CYP3A4 (human) of 45.0% (aa level). Edited to add exons 2 and 3 based on FGENESH+ prediction and cDNA evidence, and exon 5 was shortened. Possible missing or mispredicted exon 1. Removed from gene model.\n
SPU_023898	SPU_023898	none	Scaffold75943 covers exons 3-25 by merging SPU_023898 and SPU_023899 modified predictions. Exons 3, 6, 13, and 19 have sequence gaps. Scaffold6755 covers exons 1-2 which have no GLEAN3 predictions. \n
SPU_016056	SPU_016056	none	Highest homology to CYP3 family genes. Phylogenetically falls at base of CYP3 family, within CYP clan 3 (MrBayes v3.0.1 WAG+gamma+I model). Percent ID (on masked basis) to CYP3A4 (human) of 42.3% (aa level). Missing exons 1 and 2 due to incomplete assembly.\n
SPU_007406	SPU_007406	none	Member of CYP1 family. Tentatively designated CYP1F1 pending CYP nomenclature committee approval. Phylogenetically falls at base of CYP1 family, within CYP clan 2 (MrBayes v3.0.1 WAG+gamma+I model). Percent ID (on masked basis) to CYP1C1  (fugu) of 37.4% (aa level).\n
SPU_006989	SPU_006989	none	Member of CYP1 family. Tentatively designated CYP1F2 pending CYP nomenclature committee approval. Phylogenetically falls at base of CYP1 family, within CYP clan 2 (MrBayes v3.0.1 WAG+gamma+I model). Percent ID (on masked basis) to CYP1A (Danio rerio) of 41.7% (aa level).\n
SPU_001262	SPU_001262	none	also SPU_005908 is the C-terminal portion only \n \nSPU_001262 annotated as Sp-Birc6 but aa5240-6588 similar to hypoxia-inducible factor 1 alpha (HIF-1a).  The assembly of nt sequence seems consistent with a modeled intronic region being, rather, an intergenic region.  The HIF-1a annotation was added as such by M. Hahn. \n
SPU_005908	SPU_005908	none	this glean model is a subset of SPU_001262\n
SPU_020646	SPU_020646	none	aa 1-215 do not align to closest homologs\n
SPU_023899	SPU_023899	none	Scaffold75943 covers exons 3-25 by merging SPU_023898 and SPU_023899 modified predictions. Exons 3, 6, 13, and 19 have sequence gaps. Scaffold6755 covers exons 1-2 which have no GLEAN3 predictions. Refer to SPU_023898 for the complete gene features.\n
SPU_024610	SPU_024610	none	Ig2-Fz-TM-kinase \nBest Blast hit is MUSK, which has one more Ig domain - gene may be missing N-terminal piece.\n
SPU_000310	SPU_000310	none	10 Ig-TM-kinase. Closest human protein is VEGFR-1, but only distantly related\n
SPU_021021	SPU_021021	none	7 Ig - TM - kinase \nBest Blast hit is VEGFR1\n
SPU_001905	SPU_001905	none	BLASTP search shows this gene model has very low similarity to MyD88 in other animals, but domain structure shows it could be a member of MyD88 gene family. \n
SPU_022684	SPU_022684	none	See SPU_011320, _22683, _28351. \n
SPU_016914	SPU_016914	none	Missing first 756bp of coding gene sequence.  Scaffold 42354,65902, and 40781 contain the first 756bp and scaffold 23542 contains 1420 to 2189bp of the coding region.\n
SPU_022683	SPU_022683	none	See SPU_011320, _22684, _28351. \n
SPU_028351	SPU_028351	none	See SPU_011320, _22683, _22684. \n
SPU_019779	SPU_019779	none	Overlaps with SPU_023615\n
SPU_023615	SPU_023615	none	Overlaps with SPU_019779\n
SPU_002050	SPU_002050	none	SPU_002050|Scaffold111471|764|1126|corresponds to the last exon for encoding Sp-EF1A,  \nother exons are contained in Glean3-00595 (see for complete gene)\n
SPU_000595	SPU_000595	none	EF1A is encoded by 5 exons in SPU_000595(scaffold1277) plus one exon in SPU_002050(scaffold111471) \nthe sequence was constructed on the basis of a fusion between the two scaffolds\n
Arnone1	SPU_030060	none	exon 1-7 are on Scaffold22273; exon 2-8 are on Scaffold59839\n
SPU_012295	SPU_012295	none	See SPU_015341 for annotation.\n
SPU_003996	SPU_003996	none	See SPU_015341 for annotation\n
SPU_008008	SPU_008008	none	The protein aligns with dentin and other proteins with long stretches of serine/Asp/Gln. This is not real.\n
SPU_022707	SPU_022707	none	This gene model has a Death domain and two TIR domains, which indicates a member of MyD88 family.\n
SPU_004955	SPU_004955	none	 partial\n
SPU_011843	SPU_011843	none	Aligns to vertebrate dentin due to serine repeat, not true homology.\n
SPU_000567	SPU_000567	none	Partial CDS.  This model contains the N-terminal 2/3 of the CDS and all of the predicted exons are correct based on alignment data.  A full length copy of hatching enzyme gene is adjacent on scaffold581, SPU_000566.  Where these sequences overlap, they are >90% identical at the amino acid level.\n
SPU_000343	SPU_000343	none	Allele of SPU_007948\n
Sp-astacin protease	SPU_030061	none	Alignment with best blast sequence, a C. elegans hatching enzyme, suggests that the model may be incomplete at both N- and C- termini.\n
SPU_028132	SPU_028132	none	SPU_028132 is located near the edge of the contig; it appears to be missing 5' exons. SPU_016698 appears to be an allele representing the 5' end of this gene, the overlapping region is nearly identical sequence. Sp-Syt15-1 has three (predicted) C2 domains.\n
SPU_016698	SPU_016698	none	SPU_016698 is located near the edge of the contig; it appears to be missing 3' exons. SPU_028132 appears to be an allele representing the 3' end of this gene, the overlapping region is nearly identical sequence. Sp-Syt15-1 has three (predicted) C2 domains.\n
SPU_022210	SPU_022210	none	SPU_022210 appears to be a tandem duplication of the 3' end of SPU_022209, located just 5' to SPU_022210. The 3' exons of these two gene models are identical sequence. \nSp-Syt15b only has one C2 domain.\n
SPU_021274	SPU_021274	none	Merge with SPU_021274 and SPU_021273 predictions.\n
SPU_019127	SPU_019127	none	Also related to electric ray synaptotagmin C, P24507.\n
SPU_004113	SPU_004113	none	This gene is in a cluster with 4 other genes closely related to SpAN which are called SpAN-like.  All of these genes encode proteins closely related to tolloid.\n
SPU_018035	SPU_018035	none	Contains a 60 kb intron. Appears to have an allele, SPU_020349, without this long intron.\n
SPU_004114	SPU_004114	none	This gene is one of 4 clustered SpAN-like genes; this cluster also contains SpAN.  All of these genes encode proteins closely related to tolloid.\n
SPU_020349	SPU_020349	none	SPU_020349 appears to have an allele, SPU_018035, containing a 60 kb intron.\n
SPU_005832	SPU_005832	none	This gene model may not be a typical Toll-like receptor. There are only four LRRs in the coding region and the nucleotides of the intron have no similarities. \n
SPU_005850	SPU_005850	none	#\nUnknown sequence (NNN) in the first intron of the current model makes this gene model incomplete. The first exon only shows typical Toll-like receptor structures (signal peptide, LRRNT, LRR(22), LRRCT, TIR(partial)). \nThis is a member of sea urchin-specific Tlr Group IC.\n
SPU_009970	SPU_009970	none	This gene model was fused to an adjacent glean model (SPU_009969) to obtain a full sequence. This fused gene model may represent a pseudogene or contain a sequence error. The sequence between these two models matches coding sequence of other Sp-Tlr genes and contains stop codons (frame shift).\n
SPU_009969	SPU_009969	none	This gene model was fused to an adjacent glean model (SPU_009970) to obtain a full sequence. This fused gene model may represent a pseudogene or contain a sequence error. The sequence between these two models matches coding sequence of other Sp-Tlr genes and contains stop codons (frame shift).\n
SPU_024207	SPU_024207	none	This gene model was fused to an adjacent glean model (SPU_024206) to obtain a full sequence. The nucleotide sequence between 024206 and 024207 has 94% identity to another Sp-Tlr gene. This fused gene model may represent a pseudogene or contain a sequence error, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group IC.\n
SPU_024206	SPU_024206	none	This gene model was fused to an adjacent glean model (SPU_024207) to obtain a full sequence. The nucleotide sequence between 024206 and 024207 has 94% identity to another Sp-Tlr gene. This fused gene model may represent a pseudogene or contain a sequence error, but reflects best gene structure.\n
SPU_007599	SPU_007599	none	Another predicted gene, SPU_015362, has the same domain as this one; there may be different isoforms of Glass protein.\n
SPU_019810	SPU_019810	none	There are many homologs of this protein.\n
SPU_004115	SPU_004115	none	This is one of 4 SpAN-like genes in a cluster that also contains SpAN.  All of these genes encode proteins closely related to tolloid.\n
SPU_004116	SPU_004116	none	This is one of four clustered SpAN-like genes; the cluster also contains SpAN.  All of these genes encode proteins closely related to tolloid.\n
SPU_004117	SPU_004117	none	This is one of four SpAN-like genes in a cluster; the cluster also contains SpAN.  All five genes encode proteins that are closely related to tolloid. \n \nBased on alignment with best blast sequence, it is likely that the model lacks the N-terminal exon.  Transcriptome expression data suggests that the missing exon lies between coordinates 13000 and 13500 on scaffold 61174.\n
SPU_026629	SPU_026629	none	SPU_026630 immediately downstream also has homology to fibulin, but differs from 26629.\n
SPU_023280	SPU_023280	none	Partial cds based on alignment with best blast sequence suggests that this model contains only a portion of the gene encoding the N-terminal half of the protein.  The model is located at one end of a short scaffold (73008) making it likely that the remainder of the gene is on another scaffold.\n
SPU_000881	SPU_000881	none	4 Glean3 models match the mvp sequence \nSPU_007085 \nSPU_000881 \nSPU_018647 \nand SPU_018164 partial\n
SPU_018647	SPU_018647	none	4 Glean3 models match the mvp sequence \nSPU_007085 \nSPU_000881 \nSPU_018647 \nand SPU_018164 partial\n
SPU_018164	SPU_018164	none	4 Glean3 models match the mvp sequence \nSPU_007085 \nSPU_000881 \nSPU_018647 \nand SPU_018164 partial\n
Sp-AN-like7	SPU_030062	none	Partial cds sequence based on alignment with best blast sequence suggests that this model encodes two CUB domains only.  The sequences of these domains are most closely related to those in SpAN, but it is not clear that they are linked to a metalloprotease domain.  Based on the position of this model on Scaffold16164, the remaining part of the gene is on another scaffold.\n
Sp-AN-like8	SPU_030063	none	Partial cds sequence based on alignment with best blast sequence suggests that this model encodes two CUB domains only.  The sequences of these domains are most closely related to those in SpAN, but it is not clear that they are linked to a metalloprotease domain.  Based on the position of this model on Scaffold37352, the remaining part of the gene is on another scaffold.\n
SPU_015333	SPU_015333	none	Partial Toll-like receptor. This gene is located at the end of a short scaffold.  Nucleotide seq has 94% similarity to another Sp-Tlr gene.\n
SPU_021495	SPU_021495	none	One of 4 genes containing multiple EGF and TB domains.  Appears that gene is truncated at 5' end due to scaffold being incomplete\n
SPU_001532	SPU_001532	none	One of two tandem genes containing EGF and TB repeats.\n
SPU_024479	SPU_024479	none	Partial Toll-like receptor. This gene is located at the end of a short scaffold.  Nucleotide seq has 95% similarity to another Tlr gene.\n
SPU_012550	SPU_012550	none	One of 4 genes containing multiple EGF and TB domains.\n
SPU_017530	SPU_017530	none	The first exon of this gene model was eliminated and a part of the intron was accepted as coding region by comparison to the corresponding FgenesAB and ++ prediction. \nThis is a member of sea urchin-specific Tlr Group I(orphan). \n
SPU_019661	SPU_019661	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 94% identity to another Sp-Tlr gene, so it could be a member of Toll-like receptor. \nThis is a member of sea urchin-specific Tlr Group IB. \n
SPU_026630	SPU_026630	none	This may be a splice variant of SPU_026629.\n
SPU_019882	SPU_019882	none	This gene model was fused to an adjacent glean model (SPU_019881) to obtain a full sequence. This fused gene model may represent a pseudogene or contain a sequence error. The nucleotide sequence between 19881 and 82 matches coding sequence of other Sp-Tlr genes and contains stop codons (frame shift).\n
SPU_019881	SPU_019881	none	This gene model was fused to an adjacent glean model (SPU_019882) to obtain a full sequence. This fused gene model may represent a pseudogene or contain a sequence error. The nucleotide sequence between 19881 and 82 matches coding sequence of other Sp-Tlr genes and contains stop codons (frame shift).\n
SPU_020258	SPU_020258	none	This gene model was fused to an adjacent glean model (SPU_020257) to obtain a full sequence. This fused gene model may represent a pseudogene or contain a sequence error. The nucleotide sequence of these two models matches coding sequence of other Sp-Tlr genes and contains stop codons (frame shift). \nThis is a member of sea urchin-specific Tlr Group IB. \n
SPU_020257	SPU_020257	none	This gene model was fused to an adjacent glean model (SPU_020258) to obtain a full sequence. This fused gene model may represent a pseudogene or contain a sequence error. The nucleotide sequence of these two models matches coding sequence of other Sp-Tlr genes and contains stop codons (frame shift).\n
SPU_020654	SPU_020654	none	This gene model was fused to an adjacent glean model (SPU_020653) to obtain a full sequence. This fused gene model may represent a pseudogene or contain a sequence error. The nucleotide sequence of these two models matches coding sequence of other Sp-Tlr genes and contains stop codons (frame shift). \nThis is a member of sea urchin specific Tlr Group IA.\n
SPU_020653	SPU_020653	none	This gene model was fused to an adjacent glean model (SPU_020654) to obtain a full sequence. This fused gene model may represent a pseudogene or contain a sequence error. The nucleotide sequence of these two models matches coding sequence of other Sp-Tlr genes and contains stop codons (frame shift). \nThis is a member of sea urchin specific Tlr Group IA.\n
SPU_026338	SPU_026338	none	missing 5' of CDS \n5' of CDS probably encoded by SPU_000937 + SPU_028620 or SPU_012324 \n
SPU_015080	SPU_015080	none	#\nThere are 10 ESTs. \nGene required correction from EST data\n
SPU_020849	SPU_020849	none	1 EST, CD342027 Has 1st exon and 5'UTR only \n \nGene model is missing the C-terminus \n
SPU_026767	SPU_026767	none	No ESTs \nModel has only the nucleotide binding domain and is missing the C-terminal half of the protein.\n
SPU_012874	SPU_012874	none	No ESTs \nMissing N-terminus.\n
SPU_024785	SPU_024785	none	1 EST DN561873, contains central half of the gene. \nDuplicated exon in the middle of the gene removed \n \nMissing N-terminus, has the wrong C-terminus\n
SPU_021184	SPU_021184	none	Amino Acid Sequence corrected. \n2 ESTs. CD311111 overlaps GLEAN_14007 \n
SPU_014013	SPU_014013	none	#\nNo ESTs \nOn the same contig as GLEAN_14007 another ABCG2-like gene. \nMissing the N-terminus and Walker A domain\n
SPU_015930	SPU_015930	none	Merged with SPU_015929 and SPU_015931. Refer to SPU_015929 for the complete gene features of REJ4.\n
SPU_015931	SPU_015931	none	Merged with SPU_015929 and SPU_015930. Refer to SPU_015929 for the complete gene features of REJ4.\n
SPU_028620	SPU_028620	none	missing 5' of CDS (domain I; HS attachment sites) \n5' of CDS probably encoded by SPU_000937 \nmissing 3' of CDS \n3' of CDS probably encoded by SPU_026338 \nhaplotype duplication of SPU_012324 \nSPU_028620 is shorter than SPU_012324 and is missing a 29 residue piece in the middle of the sequence that is present in SPU_012324\n
SPU_009876	SPU_009876	none	Remove exons belonging to another gene.\n
SPU_002758	SPU_002758	none	Similarity to vertebrate dentin due to serine repeats.\n
SPU_026442	SPU_026442	none	Aligns with vertebrate dentin primarily due to serine repeats. Alignment included.\n
SPU_019064	SPU_019064	none	Partial prediction for eIF4G. N-terminus part of the protein is predicted by SPU_024859 (gene model modified)\n
SPU_012174	SPU_012174	none	Aligns with vertebrate dentin due to serine rich repeat region.\n
SPU_003725	SPU_003725	none	Missing exon 4 and 6. \n
SPU_028124	SPU_028124	none	CDSs of this gene align exactly with CDSs of SPU_009718 on scaffold 31633. Sequence between CDS very similar also.\n
SPU_016409	SPU_016409	none	CDS of minus strand of this gene align exactly with CDS of the positive strand of SPU_016406, on same scaffold.  Sequence between CDS as well as the repeat structure in the two regions is very similar as well. This gene could be a haplotype that was put on the wrong scaffold. The contig that it is on is attached to the very end of the scaffold right next to SPU_016406. \n
SPU_018861	SPU_018861	none	The ligand binding domain is in SPU_025239 on scaffold46634.\n
SPU_016657	SPU_016657	none	see also SPU_018404 for very similar glean\n
SPU_016032	SPU_016032	none	Aligns with vertebrate dentin due to serine rich repeat.\n
SPU_002821	SPU_002821	none	Aligns with vertebrate protein phosphatase 4 regulatory subunit and dentin due to repetitive regions.\n
SPU_004861	SPU_004861	none	Alignment to vertebrate dentin due to serine rich repeat.\n
SPU_013358	SPU_013358	none	Alignment with vertebrate dentin is due to serine rich repeat in dentin.\n
SPU_019545	SPU_019545	none	Alignment to vertebrate dentin due to serine repeat in dentin. \nPossible additional exons 5' to Glean3 prediction.\n
SPU_007317	SPU_007317	none	Blast data suggest that this model could be tolloid1 rather than suBMP1, as it is currently named based on cDNA sequence.  However, this conclusion is tentative because the model lacks the last three domains (EGF, CUB, CUB) characteristic of tolloid proteins.  This is undoubtedly because the scaffold is too short and these remaining exons are on another scaffold. \n
SPU_026920	SPU_026920	none	Tiling data suggests exons in regions of repeated units.  These areas require further examination to determine if they are truly exons.\n
SPU_019152	SPU_019152	none	because the sequence is so short it is possible this is actually a fragment\n
SPU_013651	SPU_013651	none	some similarities to a part of a peptidase_M1 domain, but sequence is very short\n
SPU_006947	SPU_006947	none	SPU_002418 duplicate\n
SPU_005884	SPU_005884	none	great prediction of the N-terminus; c-terminus is in SPU_011914 (entered here)\n
SPU_010829	SPU_010829	none	77.8% identity with Aedes aegypti elongation factor 2 (AAK01430) \n \n \npredicted exon1, i.e. SPU_010829|Scaffold51549|194|325| has been deleted since it does not seem to exist in other eukaryotic EF2 sequences \nThe first exon in mRNA becomes SPU_010829|Scaffold51549|2642|2856|which lacks a methionine \n \npredicted exon6 SPU_010829|Scaffold51549|6024|6059| has been deleted since it was a repeated sequence of exon7\n
SPU_011914	SPU_011914	none	this is the C-terminus of Sp-RACK, which will be completely annotated with its N-terminal prediction (SPU_005884)\n
SPU_000923	SPU_000923	none	a duplicated exon of Sp-RACK, fully annotated as SPU_005884 \n
SPU_006221	SPU_006221	none	#\nN-terminus probably truncated due to end of contig\n
SPU_006547	SPU_006547	none	Single exon gene encoding a partial copine. This is a possible pseudogene.\n
SPU_000906	SPU_000906	none	best match of 4 good matches\n
SPU_007865	SPU_007865	none	one of 4 matches no obvious haplotypes\n
SPU_007866	SPU_007866	none	one of 4 matches no obvious haplotypes\n
SPU_014221	SPU_014221	none	one of 4 matches no obvious haplotypes\n
SPU_028566	SPU_028566	none	one of 4 matches no obvious haplotypes\n
SPU_026438	SPU_026438	none	The ABCH subfamily is found only in insect, Dictyostelium and zebrafish, to date. \nThere are 2 ESTs.  One has an inserted sequence corresponding to an exon I can't find in the genomic sequence. \n \nGLEAN model is missing the 3' part of the gene.\n
SPU_022633	SPU_022633	none	SPU_022633 and SPU_022634 appear to be a single gene (agrin) split into two models.  I have added the Gene features of 22634 to 22633.    \nSPU_002025 contains a single NtA domain like N-terminus of agrin. Other GLEAN  predictions contain FOLN and KAZAL repeats and may comprise the next segment (especially SPU_002467 and possibly SPU_024994).  A fourth gene looks like the next piece (SPU_022633) and the adjacent gene (SPU_022634) contains LamG repeats that look like the C-terminus. These five gene predictions may be adjacent and comprise a full agrin gene.\n
SPU_022634	SPU_022634	none	This appears to be the last 2 LamininG domains of Agrin.  The tandem gene, SPU_022633 has been amended to include the exons originally predicted for this gene\n
SPU_023247	SPU_023247	none	Small scaffold\n
SPU_007946	SPU_007946	none	Complete 5' end sequence?\n
SPU_003882	SPU_003882	none	Very short scaffold\n
SPU_002955	SPU_002955	none	Alignment with vertebrate dentin due to serine rich repeat in dentin.\n
Sp-PGRP5	SPU_030064	none	Small scaffold.\n
SPU_003669	SPU_003669	none	exons 107766-107856 & 108255-108332 & 132399-132485 do not have homology on a protein level to phospholipase C beta [Lytechinus pictus] but do have nucleotide level homology.   \n \nSPU_022715 is a haplotype of this gene \n
SPU_023216	SPU_023216	none	N-terminus of prediction is longer than nAChR of the other organism.  \n
SPU_003655	SPU_003655	none	#\nclose to Fz1,Fz2,Fz7, orthology to be precisely determined\n
SPU_022373	SPU_022373	none	This model was fused to another glean model (SPU_025278) to create a more accurate model for Sp-Triad. The Gene ID for such new model is: Sp-Triad.\n
SPU_025278	SPU_025278	none	This model was fused to another glean model (SPU_022373) to create a more accurate model for Sp-Triad. The Gene ID for such new model is: Sp-Triad.\n
Sp-Triad	SPU_030065	none	This model was created by fusing two overlapping glean models based on a manual inspection of multiple protein sequence alignments. Redundant exons were taken out from the final sequence.\n
SPU_012877	SPU_012877	none	Other names: KIF27 and KIF7 \nThis gene is part of the Hedgehog signaling pathway\n
SPU_003743	SPU_003743	none	PRP19/PSO4 pre-mRNA processing factor 19 homolog.\n
SPU_014295	SPU_014295	none	part of the Hedgehog signaling pathway \nthis model has been modified by adding the sequences of SPU_003312 and SPU_003313 in front of its sequence.\n
SPU_026178	SPU_026178	none	Gene is intact! \nNo ESTs.  There are no introns, but the ORF seems intact. \nThis would be the first intronless ABC gene in a multicellular organism! \nChip data suggests this is an expressed sequence.\n
SPU_017959	SPU_017959	none	This is an excellent match, although alignment is missing for the first 34 N-term AA's.  Entry appears to encode a complete ORF, however.\n
SPU_016028	SPU_016028	none	First few exons appear irrelevant, matching to mannose receptor or extracellular proteins. A 200 aa bcl domain matches exactly with Mil2 (SPU_001916).\n
SPU_003241	SPU_003241	none	Three ESTs. Two extend the 5' end, but the extra sequences are not on the contig (could be on another contig).\n
SPU_018342	SPU_018342	none	No ESTs \nGene appears complete!!\n
SPU_024666	SPU_024666	none	Missing C-term and N-term \nContig misassembled \n3 ESTs but they do not assemble into a single contig\n
SPU_019656	SPU_019656	none	It seems a pseudogene. It contains a big part of the homeodomain, but nothing else similar to other genes. \nThe homeodomain sequences are identical to those in SPU_024715. No Chip expression.\n
SPU_019327	SPU_019327	none	Gene contains a frameshift in element 17, which may make this a pseudogene.  The scaffold stops short around 50-100 nt before the stop codon.  \n
SPU_011836	SPU_011836	none	This 185 gene is partially present on Scaffold1870, but the location of the 3' end is unknown.  This scaffold contains the leader, intron, and part (elements 1-17) of the open reading frame.  The sequence contains subelement 15e, which makes it a member of the 185/333-E group, although the exact pattern is unknown due to the missing sequence.  The best BLAST hit is to Sp0368, or 185/333-E4.  A frameshift in element 1 may indicate that this is a pseudogene.\n
SPU_026825	SPU_026825	none	1 EST \nMissing N-terminus and C-terminus\n
SPU_001916	SPU_001916	none	First few exons appear irrelevant, matching to mannose receptor or extracellular proteins. A 200 aa bcl domain matches exactly with Mil1 (SPU_016028).\n
SPU_016525	SPU_016525	none	the evidence for this gene assignment is a similar domain organization\n
SPU_028724	SPU_028724	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nThe three last exons of this model do not align with corresponding Pelle/IRAKs from other animal groups. An overlapping Fgenesh++ prediction (S.P_Scaffold255.seq.N000008) does not include these three exons (which fall on a separate predicted gene), and would therefore appear as a better model. These three exons of SPU_028724 do not align strongly with any other protein, and they do not code for any detectable protein domain, which argues against them representing a separate gene. In addition, they are very strongly supported by tiling array data. For lack of better evidence for either alternative, we have decided to accept this glean model in its present form. It is however left to be determined experimentally whether these exons do correspond to Sp-Pik2 or a separate gene. \n \nA closely related model (SPU_000073) clusters with Irak4 in a multiple alignment tree. On the other hand, this model is equally distant to all other irak-related molecules, both based on alignments of their kinase domains or combined kinase and death domains. Because IRAKs and pelle proteins from insects are so similar, and because there is no clear co-clustering of both sea urchin homologs with specific Irak genes, we have decided to follow the approach taken for naming the C.elegans orthologs: to name them "Pik" genes after Pelle/Irak.\n
SPU_010283	SPU_010283	none	Alignment with vertebrate dentin due to serine-asp rich repeat.\n
SPU_015338	SPU_015338	none	Alignment to vertebrate dentin due to serine rich repeat in dentin.\n
SPU_000073	SPU_000073	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nThere seems to be duplicated exons towards the C-terminus of this model. For lack of better evidence, we cannot currently determine whether these are truly duplicated exons or due to assembly errors. If these "duplicated" exons are taken out, then the alignment with murine IRAK4 is significantly improved. \n \nThis model clusters with Irak4 in a multiple alignment tree. On the other hand, a closely related model (SPU_028724) is equally distant to all other irak-related molecules, both based on alignments of their kinase domains or combined kinase and death domains. Because IRAKs and pelle proteins from insects are so similar, and because there is no clear co-clustering of both sea urchin homologs with specific Irak genes, we have decided to follow the approach taken for naming the C.elegans orthologs: to name them "Pik" genes after Pelle/Irak.\n
SPU_008499	SPU_008499	none	Alignment to vertebrate dentin due to serine rich repeat in dentin. \nEST sequences align with Glean3 model.\n
SPU_020082	SPU_020082	none	This model is part of a novel gene (Sp-Jak) that results from fusing SPU_022023 and SPU_020082. This modification was made based on a manual inspection of sequence alignments at both the aminoacidic and nucleotide levels.\n
SPU_022023	SPU_022023	none	This model is part of a novel gene (Sp-Jak) that results from fusing SPU_022023 and SPU_020082. This modification was made based on a manual inspection of sequence alignments at both the aminoacidic and nucleotide levels.\n
Sp-Jak	SPU_030066	none	This model was created as a fusion of SPU_022023 and SPU_020082, based on a manual inspection of sequence alignments at both the aminoacidic and nucleotide levels, both between these models and between each model and vertebrate JAKs.\n
SPU_028494	SPU_028494	none	Alignment with vertebrate dentin due to serine rich repeat.\n
SPU_000825	SPU_000825	none	SPU_000825 was found to be very similar to previously cloned sea urchin SM30 genes.  Comparison to a previously isolated genomic clone (Akasaka et al 1994, JBC 269: 20592-20598) indicates that glean_00825 is probably not SM30-alpha or SM30-beta. SPU_000825, SPU_000826, SPU_000827, and SPU_000828 encode SM30 like proteins and they are tandemly arranged on Scaffold25604. \n \nMatched c-type lectin domain (cd00037).\n
SPU_013788	SPU_013788	none	See SPU_012296 for AHR-like model; see SPU_005022 for bHLH domain of AHR or AHRR homolog.  SPU_005022 could be the missing N-terminus of this model (SPU_013788) or of SPU_012296.\n
SPU_005762	SPU_005762	none	The C-terminal end of this sequence is also contained in SPU_023715\n
SPU_004230	SPU_004230	none	This glean result matches the C terminal region of Sp-PLC-delta.  The rest of the sequence is contained on scaffold 70915.  Part of scaffold 85759 is duplicated on scaffold 106154.  This gene was cloned by Coward et al., 2003. Its accession number is NP_001008790.1. \n \n***NOTE: only this annotation contains the complete data.\n
SPU_012103	SPU_012103	none	This scaffold has the N terminal region of the PLC-delta sequence.  NOTE: the fully annotated sequence can be found on scaffold 85789.   Part of scaffold 85759 is duplicated on scaffold 106154.  This gene was cloned by Coward et al., 2003. Its accession number is NP_001008790.1. \n
SPU_023044	SPU_023044	none	#\nincomplete\n
puromycin-sensitive aminopeptidease	SPU_030067	none	partial CDS\n
SPU_018530	SPU_018530	none	incorrect 5' exon prediction\n
SPU_028479	SPU_028479	none	The predicted N-terminal region was incorrect. The predicted C-terminal region was largely incomplete. \n
SPU_008595	SPU_008595	none	sequence incomplete\n
SPU_014645	SPU_014645	none	SPU_007315 is almost identical but shorter, duplication most likely due to assembly process. \n
SPU_012278	SPU_012278	none	sequence probably incomplete\n
SPU_007315	SPU_007315	none	Almost identical to SPU_014645 but shorter, duplication most likely due to assembly process. \n
SPU_005654	SPU_005654	none	We pulled the partial cDNA from an Sp egg library.  It aligns well with six of the middle Glean predictions.  SPU_020904 also aligns well with the partial cDNA.\n
SPU_004867	SPU_004867	none	Similar to SM30-alpha. Adjacent to SpSM30-like-B (SPU_004869), but on the opposite strand. \n \nMatches c-type lectin domain (cd00037) and pericardin like repeats (PR009765).\n
SPU_008747	SPU_008747	none	See putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137538013-8866-40191825170.BLASTQ4\n
SPU_004869	SPU_004869	none	Similar to SM30-alpha. Adjacent to SpSM30-E (SPU_004867) but on the opposite strand. \n \nMatches c-type lectin domain (smart00034).\n
SPU_007404	SPU_007404	none	Member of CYP1 family. Tentatively designated CYP1F6 pending CYP nomenclature committee approval. Phylogenetically falls at base of CYP1 family, within CYP clan 2 (MrBayes v3.0.1 WAG+gamma+I model). Percent ID (on masked basis) to   () of % (aa level). Single exon gene.\n
SPU_001365	SPU_001365	none	Duplicated Gene...see also SPU_008207. \nSee putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137535659-19561-20778833499.BLASTQ4\n
SPU_008207	SPU_008207	none	Duplicated gene...see also SPU_001365 \nSee putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137537713-13786-83388701897.BLASTQ1\n
SPU_018810	SPU_018810	none	Appears most similar to SpSM32 (Illies et al.)  But it also contains the first exon of SpSM50 (SPU_018811).  SpSM32 and SpSM50 share a first exon. \n \nMatches c-type lectin domain (cd00037).\n
SPU_007081	SPU_007081	none	Alignment with vertebrate dentin due to serine rich repeats. \nEmbryonic and larval EST sequences conform to gene prediction. \n
SPU_002589	SPU_002589	none	See SPU_006123, _00154 (C-terminus only). \n
SPU_010811	SPU_010811	none	Alignments with known proteins, including vertebrate dentin, are due to high number of repetitive amino acids.\n
SPU_012324	SPU_012324	none	missing 5' of CDS (domain I; HS attachment sites) \n5' of CDS probably encoded by SPU_000937 \nmissing 3' of CDS \n3' of CDS probably encoded by SPU_026338 \nhaplotype duplication of SPU_028620 \nSPU_012324 is longer than SPU_028620 and possesses a 29 residue piece in the middle of the sequence that is missing in SPU_028620 (possible intron)\n
SPU_009723	SPU_009723	none	Alignments with known proteins, including vertebrate dentin, are due to repetitive amino acids.\n
SPU_004183	SPU_004183	none	See SPU_006123, _02589, _00254. \n
SPU_020107	SPU_020107	none	Missing N-terminus.  See SPU_006123, _00154, _02589, _04183. \n
SPU_017634	SPU_017634	none	Several high scoring hits, entry had a BLAST score of 0.00. \nBLAST of ABP-120 revealed the same GLEAN3 prediction with a score of 0.00.\n
SPU_002605	SPU_002605	none	See SPU_006123, _00154, _02589, _04183, _20107. \n
SPU_019540	SPU_019540	none	Missing 230 amino acids found in mouse at the amino terminal.\n
SPU_009340	SPU_009340	none	Blasted with human homolog of Myosin IIIA and Myosin IIIB and obtained the same three highest scoring predictions.\n
SPU_019203	SPU_019203	none	N-terminus prediction is longer than those of other organism.  See SPU_006123, _00154, _02589, _04183, _20107, _02605.  \n
SPU_022763	SPU_022763	none	See SPU_006123, _00154, _02589, _04183, _20107, _02605, _19203. \n
SPU_012184	SPU_012184	none	Missing N-terminus.  High scoring hits to AChE (SPU_006123, _00154, _02589, _04183, _20107, _02605, _19203, _22763). \n
Sp-PLC-delta	SPU_030068	none	Part of scaffold 85759 is duplicated on this scaffold (106154). NOTE: the annotation on scaffold 85759 contains the complete data.  This gene was cloned by Coward et al., 2003. Its accession number is NP_001008790.1.\n
SPU_007745	SPU_007745	none	Pfam00484.11 match.   \n \nTranscriptome data indicate that it is expressed in embryos. \n \nA family of carbonic anhydrase-like proteins exists in the sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined.\n
SPU_014664	SPU_014664	none	N-terminus is missing.  See SPU_014664.  \n
SPU_024978	SPU_024978	none	See SPU_013765.\n
SPU_008910	SPU_008910	none	See SPU_028455, _14664. \n
SPU_018811	SPU_018811	none	SPU_018811 is on the same scaffold as SM37. This is to be expected for SM50 [Lee et al (1999) Develop. Growth Differ 41: 303-312 PUB MED:10400392]. SPU_018811 is most similar to the aa sequence of S. purpuratus SM50.  However it is missing the first exon and the intron one would expect from the canonical SM50 gene (Sucov et al and Katoh-Fukui et al.) Just 5' to this glean model is SPU_018810 that encodes SpSM32 which shares the first exon with SpSM50. Need to change this gene model to reflect this. \n \nMatched c-type lectin domain (cd00037).\n
SPU_016008	SPU_016008	none	2 exons are missing : \nexon 1, containing the signal peptide, found on scaffold 82776 \nexon 4 (instead of the 5 aa exon "GLFCF"), containing the transmembrane domain, found on scaffold 7868 \n
SPU_008058	SPU_008058	none	48% identity with corresponding region in human valyl-tRNA synthetase (VARS2) \n45% identity with human VARS2-like \nSp-VARS isoformA has 47% identity with the Sp-VARSisoformB (SPU_002908)  \n
SPU_019101	SPU_019101	none	#\nTWO COMMENTS: \n \n1) Alignment with best blast hit sequence suggests that the first exon (see below) is incorrect. This conclusion is supported by transcriptome signals. \n>SPU_019101|Scaffold112071|10738|10828| DNA_SRC: Scaffold112071 START: 10738 STOP: 10828 STRAND: +  \nCTGAAATTCTGAGATGTCGAATGGGGAGGTGCGTAGTGACAAGGGTGTGGTATAGGCCTAGCTGTAGAGG \nATGCTTCAGAAGAGCCGACAT \n \n2) This model is adjacent to a very similar gene model,  SPU_019102\n
SPU_021352	SPU_021352	none	This prediction of one of many ACE genes on scaffold 52540. \nAlignment with best blast sequence suggests there may be a missing exon between the following exons in this model: \n>SPU_021352|Scaffold52540|47460|47651| DNA_SRC: Scaffold52540 START: 47460 STOP: 47651 STRAND: +  \nCAAGGAGATGAGCGGGTATAGGTCCTGTGGGGTTAACCTTGTCCGTCCCGTACTTCTCTCCCAGCTTTCT \nTCTCACAAAGGCATGGATCTGGAGGTACATCGGTTTGACTGCATCCCAAAGGGCGTCGATCTTCTCCACG \nAAGTGTGGGTCTTCATAACGACGACGGAGTAAGTCGCCACGATCTTCATAAC \n>SPU_021352|Scaffold52540|49826|49971| DNA_SRC: Scaffold52540 START: 49826 STOP: 49971 STRAND: +  \nGTGATTCGCTCATCAGGTGTTGAAGGCCTGGTTCCATGTTCAAGCATTGCTCCGATCGTTTCTTCCTCTG \nTGCATGCTCTCTCTTCCGACAAACCTTTCCGGTAGCGAAGATAGTTGTCATGTTGTCCTGGACCTCACGT \nTCTTTA \n
SPU_011641	SPU_011641	none	45% identity with human EF2 kinase (AAH32665)\n
SPU_021837	SPU_021837	none	ESTs used to confirm model only cover 5' portion of gene (first 6 exons) and 3' UTR, however tiling array correlates well with model predictions throughout.  Multiple splice variants likely exist since some ESTs contain the 3rd exon in the prediction and some do not.  Length of 3'UTR based on tiling array data and the presence of AAUAAA and CA at most 3' end.  Start of transcript indicated here is different than that in original prediction and is based on EST data (BCM Exonerate CD304782 and CX555128) and tiling data.  The end of the CDS is based on the existence of a TAA at site indicated.\n
SPU_026627	SPU_026627	none	Missing N-terminus and should be combined with SPU_026626.  The prediction includes extra C-terminus.  \n
SPU_017426	SPU_017426	none	Missing N-terminus.  \n
SPU_008988	SPU_008988	none	See SPU_025315, _25999, _08988. \n
SPU_007186	SPU_007186	none	Missing N-terminus.  See SPU_025315, _25999, _08988.  \n
SPU_020213	SPU_020213	none	Missing N-terminus.  \n
SPU_023617	SPU_023617	none	Missing the N-terminal TM domains, they are off the contig. \nThere are two ESTs, one in the coding and one in the 3'UTR. \nOne exon in an NBD was deleted from the GLEAN model\n
SPU_005577	SPU_005577	none	SPU_005577 may be partial- at end of contig. It is similar to a pair of genes, 09924, 09925, that are adjacent to one another on a contig. The other of the pair (with 05723) may be 05723, which is also on a small contig and likely truncated.\n
SPU_012621	SPU_012621	none	See Developmental Biology 204(1) 293-304 (1998) for more information. \n \nUnable to verify all exons experimentally, but likely correct. \nPublished mRNA sequence does not extend along the scaffolding as far as the upstream UTR.\n
Sp-Zic4-like	SPU_030070	none	complete cds\n
SPU_026395	SPU_026395	none	This sequence represents only the N-terminus of the protein.  SP-ABCC1a has the full-length sequence of a closely related gene.\n
SPU_028797	SPU_028797	none	Unable to differentiate between ABCC8 and ABCC9 families.  This is orthologous to the human ABCC8/9 families.\n
SPU_005723	SPU_005723	none	SPU_005723 may be partial- at end of contig. It is similar to a pair of genes, 09924, 09925, that are adjacent to one another on a contig. The other of the pair (with 05723) may be 05577, which is also on a small contig and likely truncated.\n
SPU_009924	SPU_009924	none	Lies adjacent to a highly similar gene, 09925, on the same contig.\n
SPU_009925	SPU_009925	none	Adjacent to SPU_009924, a similar gene.\n
SPU_024191	SPU_024191	none	The first 2000 bases on the 5' end of this gene match very closely with the 5' half of SPU_020669.  The latter halves of both genes are quite divergent from each other.  This may be a case of duplication of all or half of one of these genes.  They are distant enough, however, that they are not being labeled as duplicates.\n
SPU_002411	SPU_002411	none	This gene is quite similar to SPU_020669.  However, there are several instances of large insertions/deletions as well as numerous amino acid differences.  Perhaps it is a relatively recent duplication.  The number of differences make it unlikely that this is simply due to haplotype variation.\n
SPU_025903	SPU_025903	none	Unable to differentiate between ABCC8 and ABCC9 families.  This is orthologous to the human ABCC8/9 families.\n
SPU_008351	SPU_008351	none	This prediction  is incomplete. The 5' end of the protein is not predicted. Refer to the modified sequence of SPU_028479 for the corrected sequence.\n
SPU_004417	SPU_004417	none	U5 snRNP-associated 102 kDa protein. First part of the gene on SPU_024258. Latter part on SPU_004417. Likely missing one exon between two parts.\n
SPU_021366	SPU_021366	none	U1 small nuclear ribonucleoprotein 70 kDa like. Likely missing 5' exon(s). Can't find the missing exon in protein predictions.\n
SPU_027526	SPU_027526	none	N-terminus of this gene is SPU_027525 and should be combined.   \n
SPU_027443	SPU_027443	none	Also, likely ortholog of plex B1.  \n
SPU_020916	SPU_020916	none	This gene model is possibly incomplete, since it is located at the end of the scaffold.\n
SPU_003553	SPU_003553	none	The CARD domain is located downstream of the NACHT. This is not seen in mammalian Nod1 and Nod2, where the CARD domain (or both) are located upstream of the NOD.\n
SPU_003539	SPU_003539	none	This gene model could be incomplete since it is located at the end of a scaffold.\n
SPU_006610	SPU_006610	none	This gene model could be incomplete since it is located at the end of a scaffold.\n
SPU_013619	SPU_013619	none	The DEATH domain encoded in this protein resembles the Dr5 receptor protein DEATH domain. \n
Sp-VC1_2	SPU_030071	none	#\nThis gene model has been predicted by fgeneshAB and ++ that is not in the Glean3 list.\n
Sp-Tnfsf_like4	SPU_030072	none	This gene model has been predicted by fgeneshAB and ++ that is not in the Glean3 list. The expression of this gene model has been confirmed with QPCR.\n
SPU_006529	SPU_006529	none	This gene includes most of SPU_006530. The missing exons were added to this model. The last exon does not belong to gene model as well as the first exon of SPU_006530. Fgenesh prediction is right. The SPU_006530 model was not modified to reflect these discrepancies.\n
SPU_006530	SPU_006530	none	This gene model is incomplete and combines with parts of SPU_006529 to make a complete gene model. The SPU_006529 model was modified to include the exons in this model. Please refer to this gene model for further detail.\n
SPU_024075	SPU_024075	none	Fgenesh prediction contains one additional exon in the 5' end, which contains the signal peptide. This exon has been added to the glean3 model.\n
Sp-NT1	SPU_030073	none	Neurotrophin found by Genscan but not by Glean. \nManual predictions agree with genscan except that the manual prediction has one main 3' exon containing most if not all of the translated part, whereas the genscan prediction will have the gene encoded by 2 separate exons.\n
SPU_010719	SPU_010719	none	Member of CYP1 family. Tentatively designated CYP1F3 pending CYP nomenclature committee approval. Phylogenetically falls at base of CYP1 family, within CYP clan 2 (MrBayes v3.0.1 WAG+gamma+I model). Percent ID (on masked basis) to CYP1A (Danio rerio) of 41.7% (aa level). Single exon, possibly part of tandem duplication.\n
SPU_010720	SPU_010720	none	Member of CYP1 family. Tentatively designated CYP1F4 pending CYP nomenclature committee approval. Phylogenetically falls at base of CYP1 family, within CYP clan 2 (MrBayes v3.0.1 WAG+gamma+I model). Percent ID (on masked basis) to CYP1A (Danio rerio) of 41.7% (aa level). Single exon, possibly part of tandem duplication.\n
SPU_010721	SPU_010721	none	Member of CYP1 family. Tentatively designated CYP1F5 pending CYP nomenclature committee approval. Phylogenetically falls at base of CYP1 family, within CYP clan 2 (MrBayes v3.0.1 WAG+gamma+I model). Percent ID (on masked basis) to CYP1A (Danio rerio) of 41.7% (aa level). Single exon, possibly part of tandem duplication.\n
SPU_011692	SPU_011692	none	Chip expression data indicates likely expression in embryo. Ortholog to honey bee voltage gated L type Ca channel. Related to Ca channel gene found in coral (Stylophora pistillata) by Allemand group in Monaco.\n
SPU_007770	SPU_007770	none	Chip expression data does NOT confirm embryonic expression. Closely related to SPU_011692, which is expressed in embryo. Best hit is L type Ca channel in the snail, Lymnaea stagnalis. Also related to the coral Ca Channel gene described by Allemand's group.\n
SPU_019522	SPU_019522	none	Beginning of gene on SPU_024095. End of gene is on SPU_019522. Possible haplotype for SPU_019522 is SPU_007176\n
SPU_007176	SPU_007176	none	Beginning of gene on SPU_024095. End of gene is on SPU_019522. Possible haplotype for SPU_019522 is SPU_007176. \nRECOMMEND DELETION\n
SPU_024095	SPU_024095	none	Beginning of gene on SPU_024095. End of gene is on SPU_019522. Possible haplotype for SPU_019522 is SPU_007176\n
SPU_006939	SPU_006939	none	This gene model was modified by comparison to the corresponding FgeneshAB prediction and other typical Sp-Tlr genes. \nThis gene model may represent a pseudogene or contain a sequence error. \n
SPU_019917	SPU_019917	none	Lacking N-terminus.  See SPU_027526, _09600. \n
SPU_018059	SPU_018059	none	See SPU_012977.\n
SPU_028135	SPU_028135	none	polyadenylated histone H10\n
SPU_006899	SPU_006899	none	This gene encodes a precursor for a vasopressin/oxytocin/vasotocin-like peptide CFISNCPKGamide, which I suggest is named Sp-echinotocin. This is the first vasopressin/oxytocin/vasotocin-like peptide to be identified in an echinoderm. It is likely that similar or identical peptides will be found in other echinoderms, which I suggest are known collectively as echinotocins. \n \nThe GLEAN model of this gene with 4 exons is wrong because the N-terminal signal peptide is encoded by a putative internal exon (exon 2 of the GLEAN model). The model that I have produced is comprised of 3 exons with: \n1. exon 1 encoding a signal peptide (confirmed by SignalP3.0 analysis), the echinotocin peptide and the N-terminal part of neurophysin. \n2. exon 2 encoding the main middle portion of neurophysin \n3. exon 3 encoding the C-terminal region of neurophysin. \nThe tiling data do not show signals that correspond with these exons, indicating that this gene is not expressed in the early stages of sea urchin development used as a source of mRNA for the tiling analysis. \nNo EST or cDNA data are available at present to confirm the prediction. \n \nThe model is however consistent with the structure of vasotocin/vasopressin/oxytocin genes in vertebrates, which are also comprised of 3 equivalent exons. \nBLAST analysis of GenBank with the predicted 165 amino-acid precursor shows a high level of sequence similarity with vasotocin precursors in fish.\n
SPU_015742	SPU_015742	none	divergently transcribed as a gene pair with GLEANH3_15741 \nSp-late-histone-H3b\n
SPU_016523	SPU_016523	none	Classic zinc finger. Closest Ciona hit (ensembl:ENSCING00000008508) logged as Thyroid hormone receptor.\n
SPU_011485	SPU_011485	none	Huntingtin-like gene in Sea Urchin shows more than 50 % similarity with Human huntingtin gene. This part of the gene codes for 844 amino acids making up the N-terminal part of the huntingtin protein from the start codon. The Scaffold 54379 which has this gene is reverse complemented. The C-terminal part of this gene has to be identified and might continue in another scaffold.\n
SPU_015064	SPU_015064	none	SPU_015064 has the first part of the gene. SPU_017197 should have the latter half. SPU_015899 is likely a haplotype of SPU_017197.\n
SPU_015899	SPU_015899	none	SPU_015064 has the first part of the gene. SPU_017197 should have the latter half. SPU_015899 is likely a haplotype of SPU_017197.\n
SPU_018213	SPU_018213	none	This gene model was modified based on Genscan and domain structures: the first exon + the following 200bp intron show a typical Toll-like receptor.  The third exon could belong to next glean model(18214). \n
SPU_024258	SPU_024258	none	U5 snRNP-associated 102 kDa protein. First part of the gene on SPU_024258. Latter part on SPU_004417. Likely missing one exon between two parts.\n
SPU_018214	SPU_018214	none	This gene model was modified based on BLASTN search and domain structures: the third exon of SPU_018213 + the upstream intron + this gene model + the gap between them show a typical Toll-like receptor.  This gene model may represent a pseudogene or contain a sequence error.\n
SPU_016594	SPU_016594	none	U5 snRNP-specific protein, 116 kD. Haplotype of SPU_014430\n
SPU_001709	SPU_001709	none	one nt sequencing error (or else is a very new pseudogene since the 3' end is correct)\n
SPU_014099	SPU_014099	none	The extreme similarity to TBP elsewhere in the genome, together with many fewer nts, suggests that this is an assembly error, not a real gene.  No transcription seen in the first two exons.  Plus we have evidence elsewhere that there is only one copy of TBP\n
SPU_021279	SPU_021279	none	cleavage stage histone H3 \n \nNote I found the first 126 nts of the coding region on scaffold 24661 \n \nnts  14772 to 14897\n
SPU_007165	SPU_007165	none	incomplete sequence truncated at 3' end  \ncould also be ortholog of mrp8\n
SPU_003833	SPU_003833	none	divergently transcribed gene pair with SPU_003828 \nSp-late-histone-H3e\n
SPU_003916	SPU_003916	none	incomplete\n
SPU_003407	SPU_003407	none	U5 small nuclear ribonucleoprotein 200 kDa helicase (U5 snRNP-specific 200 kDa protein) (U5-200KD) (Activating signal cointegrator 1 complex subunit 3-like 1). \n \nSPU_003407 and SPU_006432 are orthologs of human U5-snRNP-200kDa protein. \n \nSPU_013364 and SPU_028486 belong to second class similar to activating signal cointegrator 1. \n \nSPU_000121 and SPU_017541 are unknown DEAD-box proteins similar to 200kDa snRNP protein and activating signal cointegrator 1 complex subunit 3. \n \nSPU_017541 is the longest predicted protein.\n
SPU_006432	SPU_006432	none	U5 small nuclear ribonucleoprotein 200 kDa helicase (U5 snRNP-specific 200 kDa protein) (U5-200KD) (Activating signal cointegrator 1 complex subunit 3-like 1). \n \nSPU_003407 and SPU_006432 are orthologs of human U5-snRNP-200kDa protein. \n \nSPU_013364 and SPU_028486 belong to second class similar to activating signal cointegrator 1. \n \nSPU_000121 and SPU_017541 are unknown DEAD-box proteins similar to 200kDa snRNP protein and activating signal cointegrator 1 complex subunit 3. \n \nSPU_017541 is the longest predicted protein.\n
SPU_013364	SPU_013364	none	U5 small nuclear ribonucleoprotein 200 kDa helicase (U5 snRNP-specific 200 kDa protein) (U5-200KD) (Activating signal cointegrator 1 complex subunit 3-like 1). \n \nSPU_003407 and SPU_006432 are orthologs of human U5-snRNP-200kDa protein. \n \nSPU_013364 and SPU_028486 belong to second class similar to human U5-snRNP-200kDa protein and activating signal cointegrator 1. \n \nSPU_000121 and SPU_017541 are likely orthologs of activating signal cointegrator 1 complex subunit 3. \n \nSPU_017541 is the longest predicted protein.\n
SPU_017541	SPU_017541	none	U5 small nuclear ribonucleoprotein 200 kDa helicase (U5 snRNP-specific 200 kDa protein) (U5-200KD) (Activating signal cointegrator 1 complex subunit 3-like 1). \n \nSPU_003407 and SPU_006432 are orthologs of human U5-snRNP-200kDa protein. \n \nSPU_013364 and SPU_028486 belong to second class similar to activating signal cointegrator 1. \n \nSPU_000121 and SPU_017541 are unknown DEAD-box proteins similar to 200kDa snRNP protein and activating signal cointegrator 1 complex subunit 3. \n \nSPU_017541 is the longest predicted protein.\n
SPU_027604	SPU_027604	none	Alignment indicates the gene may be truncated at amino terminal- other gene models suggest additional sequences could be included at 5'end, but no cDNA data.\n
SPU_000121	SPU_000121	none	U5 small nuclear ribonucleoprotein 200 kDa helicase (U5 snRNP-specific 200 kDa protein) (U5-200KD) (Activating signal cointegrator 1 complex subunit 3-like 1). \n \nSPU_003407 and SPU_006432 are orthologs of human U5-snRNP-200kDa protein. \n \nSPU_013364 and SPU_028486 belong to second class similar to activating signal cointegrator 1. \n \nSPU_000121 and SPU_017541 are unknown DEAD-box proteins similar to 200kDa snRNP protein and activating signal cointegrator 1 complex subunit 3. \n \nSPU_017541 is the longest predicted protein.\n
SPU_028486	SPU_028486	none	U5 small nuclear ribonucleoprotein 200 kDa helicase (U5 snRNP-specific 200 kDa protein) (U5-200KD) (Activating signal cointegrator 1 complex subunit 3-like 1). \n \nSPU_003407 and SPU_006432 are orthologs of human U5-snRNP-200kDa protein. \n \nSPU_013364 and SPU_028486 belong to second class similar to activating signal cointegrator 1. \n \nSPU_000121 and SPU_017541 are unknown DEAD-box proteins similar to 200kDa snRNP protein and activating signal cointegrator 1 complex subunit 3. \n \nSPU_017541 is the longest predicted protein.\n
Sp-Tlr223	SPU_030075	none	Partial Toll-like receptor predicted by FgeneshAB and Genscan. The nucleotides of this gene model have 92% identity to a typical Sp-Tlr gene (SPU_012257). The model is located at the end of a short scaffold. \n
Sp-Tlr224	SPU_030076	none	#\nPartial Toll-like receptor. A part of this modified gene model is predicted by Genscan and NCBI. The nucleotides have 96% similarity to a typical Sp-Tlr gene (SPU_010940). The model occupies all sequence of a short scaffold. \n
Sp-Tlr225	SPU_030077	none	Partial Toll-like receptor predicted by FgeneshAB. The modified gene model occupies all sequence of a short scaffold and the nucleotides have 92% identity to a typical Sp-Tlr (SPU_000615).  \n
Sp-Tlr226	SPU_030078	none	#\nPartial Toll-like receptor predicted by FgeneshAB and Genscan. The coding region of this gene model occupies all sequence of a short scaffold and the nucleotides have 89% identity to a typical Sp-Tlr (SPU_023035).  \n
Sp-Tlr227	SPU_030079	none	Partial Toll-like receptor predicted by Fgenesh++. The nucleotides of the coding region have 92% identity to a typical Sp-Tlr (SPU_015066). This model is located at the end of a short scaffold. \n
SPU_015139	SPU_015139	none	paralog of Sp-tachykinin-receptor-2 (SPU_015140); is located on scaffold 49502 adjacent to Sp-tachykinin-receptor-2 (SPU_015140), indicating that these two genes arose by recent gene duplication event\n
SPU_015140	SPU_015140	none	paralog of Sp-tachykinin-receptor-1 (SPU_015139); is located on scaffold 49502 adjacent to Sp-tachykinin-receptor-1 (SPU_015139), indicating that these two genes arose by recent gene duplication event\n
SPU_000111	SPU_000111	none	 partial, missing N-terminus\n
SPU_000116	SPU_000116	none	 partial, missing N-terminus\n
SPU_000187	SPU_000187	none	 partial, missing C-terminus\n
SPU_000383	SPU_000383	none	 partial, missing N-terminus\n
SPU_000455	SPU_000455	none	 extra stretch of aminoacids in middle\n
SPU_000590	SPU_000590	none	 missing C-terminus\n
SPU_000597	SPU_000597	none	 extra C-terminus\n
SPU_000607	SPU_000607	none	 partial, missing N-terminus\n
SPU_000612	SPU_000612	none	 partial, missing C-terminus half\n
SPU_000771	SPU_000771	none	 extra N-terminus\n
SPU_000772	SPU_000772	none	 extra N-terminus\n
SPU_000774	SPU_000774	none	 missing N- and C-terminus\n
SPU_000781	SPU_000781	none	 extra N-terminus\n
SPU_000799	SPU_000799	none	 missing some C-terminus residues\n
SPU_000830	SPU_000830	none	 partial, missing N-terminus\n
SPU_001068	SPU_001068	none	 extra C-terminus\n
SPU_001141	SPU_001141	none	 partial, missing N- and C-terminus\n
SPU_001142	SPU_001142	none	 partial, missing N-terminus\n
SPU_001201	SPU_001201	none	 partial, missing some N-terminus residues\n
SPU_001214	SPU_001214	none	 extra N-terminus\n
SPU_001525	SPU_001525	none	 partial, missing N-terminus\n
SPU_001849	SPU_001849	none	 partial, missing N-terminus half\n
SPU_001860	SPU_001860	none	 partial, missing C-terminus\n
SPU_001864	SPU_001864	none	 missing N-terminus residues\n
SPU_001934	SPU_001934	none	 partial, missing N-terminus\n
SPU_002030	SPU_002030	none	 partial, missing C-terminus\n
SPU_002051	SPU_002051	none	 partial, missing C-terminus\n
SPU_002142	SPU_002142	none	 partial, missing N- and C-terminus\n
SPU_002201	SPU_002201	none	 partial, missing C-terminus\n
SPU_002209	SPU_002209	none	 partial, missing C-terminus\n
SPU_002429	SPU_002429	none	 partial, missing C-terminus\n
SPU_002491	SPU_002491	none	 partial, missing some N-terminus residues\n
SPU_002546	SPU_002546	none	 missing stretch in middle\n
SPU_002650	SPU_002650	none	 missing C-terminus\n
SPU_002686	SPU_002686	none	 extra residues on N-terminus\n
SPU_002736	SPU_002736	none	 partial, missing C-terminus\n
SPU_002773	SPU_002773	none	 partial, missing C-terminus\n
SPU_002881	SPU_002881	none	 partial, missing N-terminus\n
SPU_002901	SPU_002901	none	 partial, missing C-terminus\n
SPU_002903	SPU_002903	none	 partial, missing N-terminus\n
SPU_002909	SPU_002909	none	 partial, missing N-terminus\n
SPU_002911	SPU_002911	none	 partial, missing C-terminus\n
SPU_002963	SPU_002963	none	 extra stretch in middle\n
SPU_002998	SPU_002998	none	 missing stretch in middle\n
SPU_003021	SPU_003021	none	 extra stretch in middle\n
SPU_003138	SPU_003138	none	 partial, missing C-terminus\n
SPU_003325	SPU_003325	none	 extra stretch in middle\n
SPU_003329	SPU_003329	none	 partial, missing N- and C-terminus residues\n
SPU_003397	SPU_003397	none	 partial, missing N-terminus\n
SPU_003415	SPU_003415	none	 partial, missing N-terminus\n
SPU_003444	SPU_003444	none	 partial, missing N-terminus\n
SPU_003457	SPU_003457	none	 partial, missing N-terminus\n
SPU_003561	SPU_003561	none	 partial, missing C-terminus\n
SPU_003667	SPU_003667	none	 partial, missing C-terminus\n
SPU_003712	SPU_003712	none	 partial, missing N-terminus residues\n
SPU_003714	SPU_003714	none	 missing N- and C-terminus\n
SPU_003785	SPU_003785	none	 missing N- and C-terminus\n
SPU_003839	SPU_003839	none	 extra N-terminus, missing C-terminus\n
SPU_003913	SPU_003913	none	 missing two stretches in middle\n
SPU_003922	SPU_003922	none	 partial, missing N-terminus\n
SPU_003982	SPU_003982	none	 partial, missing C-terminus\n
SPU_004022	SPU_004022	none	 partial, missing N- and C-terminus\n
SPU_004075	SPU_004075	none	 partial, missing N- and C-terminus\n
SPU_004123	SPU_004123	none	 partial, missing N-terminus\n
SPU_004216	SPU_004216	none	 partial, missing N-terminus\n
SPU_004271	SPU_004271	none	 missing N-terminus\n
SPU_004304	SPU_004304	none	 missing N-terminus\n
SPU_004310	SPU_004310	none	 partial, missing C-terminus\n
SPU_004326	SPU_004326	none	 missing stretch in middle\n
SPU_004376	SPU_004376	none	 partial, missing C-terminus\n
SPU_004416	SPU_004416	none	 partial, missing N-terminus half\n
SPU_004593	SPU_004593	none	 partial, missing N-terminus\n
SPU_004673	SPU_004673	none	 partial, missing N-terminus\n
SPU_004694	SPU_004694	none	 missing N-terminus residues\n
SPU_004720	SPU_004720	none	 partial, missing N-terminus\n
SPU_004769	SPU_004769	none	 partial, missing C-terminus\n
SPU_004771	SPU_004771	none	 extra stretch in middle\n
SPU_004892	SPU_004892	none	 partial, missing N-terminus\n
SPU_004946	SPU_004946	none	 partial, missing C-terminus\n
SPU_004965	SPU_004965	none	 missing N-terminus residues\n
SPU_005056	SPU_005056	none	 partial, missing N- and C-terminus\n
SPU_005177	SPU_005177	none	 missing N-terminus residues\n
SPU_005203	SPU_005203	none	 partial, missing N-terminus\n
SPU_005246	SPU_005246	none	 missing N- and C-terminus residues\n
SPU_005253	SPU_005253	none	 partial, missing N-terminus\n
SPU_005261	SPU_005261	none	 partial, missing C-terminus\n
SPU_005274	SPU_005274	none	 partial, missing N-terminus\n
SPU_005285	SPU_005285	none	 partial, missing N-terminus\n
SPU_005331	SPU_005331	none	 partial, missing N-terminus\n
SPU_005356	SPU_005356	none	 partial, missing N-terminus\n
SPU_005407	SPU_005407	none	 partial, missing N-terminus\n
SPU_005423	SPU_005423	none	 missing N-terminus\n
SPU_005523	SPU_005523	none	 extra N-terminus\n
SPU_005534	SPU_005534	none	 missing stretch in middle\n
SPU_005665	SPU_005665	none	 partial, missing N-terminus\n
SPU_005675	SPU_005675	none	 partial, missing stretch in middle\n
SPU_005757	SPU_005757	none	 missing N-terminus residues\n
SPU_005811	SPU_005811	none	 partial, missing N- and C-terminus\n
SPU_005848	SPU_005848	none	 partial, missing N-terminus\n
SPU_005852	SPU_005852	none	 extra stretch in middle\n
SPU_005863	SPU_005863	none	 missing stretch in middle\n
SPU_005952	SPU_005952	none	 partial, missing C-terminus\n
SPU_005977	SPU_005977	none	 missing stretch in middle\n
SPU_006017	SPU_006017	none	 partial, missing N-terminus\n
SPU_006102	SPU_006102	none	 missing N-terminus, extra stretch in middle\n
SPU_006210	SPU_006210	none	 missing N-terminus residues\n
SPU_006291	SPU_006291	none	 missing N- and C-terminus\n
SPU_006301	SPU_006301	none	 partial, missing C-terminus\n
SPU_006339	SPU_006339	none	 partial, missing C-terminus\n
SPU_006361	SPU_006361	none	 partial, missing C-terminus\n
SPU_006381	SPU_006381	none	 partial, missing N-terminus\n
SPU_006383	SPU_006383	none	 partial, missing N-terminus\n
SPU_006414	SPU_006414	none	 partial, missing N-terminus and a stretch in middle\n
SPU_006426	SPU_006426	none	 partial, missing N-terminus\n
SPU_006437	SPU_006437	none	 partial, missing C-terminus\n
SPU_006509	SPU_006509	none	 missing stretch in middle\n
SPU_006565	SPU_006565	none	 partial, missing C-terminus\n
SPU_006603	SPU_006603	none	 partial, missing N-terminus\n
SPU_006630	SPU_006630	none	 partial, missing C-terminus\n
SPU_006663	SPU_006663	none	 partial, missing N-terminus\n
SPU_006693	SPU_006693	none	 partial, missing N-terminus\n
SPU_006790	SPU_006790	none	 extra N-terminus\n
SPU_006852	SPU_006852	none	 extra stretch in middle\n
SPU_006888	SPU_006888	none	 extra N-terminus\n
SPU_006890	SPU_006890	none	 extra C-terminus\n
SPU_006985	SPU_006985	none	 partial, missing N-terminus\n
SPU_007041	SPU_007041	none	 missing C-terminus\n
SPU_007114	SPU_007114	none	 partial, missing C-terminus\n
SPU_007122	SPU_007122	none	 partial, missing C-terminus\n
SPU_007155	SPU_007155	none	 partial, missing C-terminus\n
SPU_007228	SPU_007228	none	 partial, missing stretches in middle\n
SPU_007245	SPU_007245	none	 extra N-terminus\n
SPU_007246	SPU_007246	none	 partial, missing C-terminus\n
SPU_007249	SPU_007249	none	 partial, missing C-terminus, extra N-terminus\n
SPU_007284	SPU_007284	none	 extra stretch in middle\n
SPU_007499	SPU_007499	none	 partial, missing N-terminus\n
SPU_007511	SPU_007511	none	partial, missing N-terminus\n
SPU_007774	SPU_007774	none	 partial, missing N-terminus\n
SPU_007833	SPU_007833	none	 partial, missing N- and C-terminus\n
SPU_007905	SPU_007905	none	 partial, missing N-terminus\n
SPU_007906	SPU_007906	none	 partial, missing N-terminus and stretch in middle\n
SPU_007909	SPU_007909	none	 partial, missing N- and C-terminus\n
SPU_007920	SPU_007920	none	 missing N- and C-terminus\n
SPU_007927	SPU_007927	none	 missing C-terminus\n
SPU_008009	SPU_008009	none	 missing stretch in middle\n
SPU_008080	SPU_008080	none	 extra N-terminus, missing C-terminus\n
SPU_008094	SPU_008094	none	 missing N-terminus\n
SPU_008135	SPU_008135	none	 extra N-terminus\n
SPU_008142	SPU_008142	none	 missing C-terminus\n
SPU_008176	SPU_008176	none	 missing stretch in middle\n
SPU_008219	SPU_008219	none	 missing N-terminus and stretch in middle\n
SPU_008265	SPU_008265	none	 missing N-terminus\n
SPU_008318	SPU_008318	none	 missing C-terminus\n
SPU_008349	SPU_008349	none	 missing N-terminus residues\n
SPU_008458	SPU_008458	none	 missing C-terminus\n
SPU_008464	SPU_008464	none	 missing C-terminus\n
SPU_008542	SPU_008542	none	 missing C-terminus\n
SPU_008553	SPU_008553	none	 missing stretch in middle\n
SPU_008554	SPU_008554	none	 missing C-terminus\n
SPU_008585	SPU_008585	none	 missing N-terminus\n
SPU_008616	SPU_008616	none	 missing N-terminus residues\n
SPU_008646	SPU_008646	none	 missing C-terminus\n
SPU_008701	SPU_008701	none	 missing C-terminus, missing stretch in middle\n
SPU_008705	SPU_008705	none	 missing N- and C-terminus\n
SPU_008815	SPU_008815	none	 extra N-terminus\n
SPU_008875	SPU_008875	none	 extra N-terminus\n
SPU_008970	SPU_008970	none	 missing N- and C-terminus, missing small stretch in middle\n
SPU_009066	SPU_009066	none	 missing N-terminus residues\n
SPU_009082	SPU_009082	none	 missing C-terminus\n
SPU_009100	SPU_009100	none	 missing N- and C-terminus\n
SPU_009114	SPU_009114	none	 missing N-terminus\n
SPU_009249	SPU_009249	none	 missing N-terminus, missing stretch in middle\n
SPU_009359	SPU_009359	none	 partial, missing N-terminus\n
SPU_009372	SPU_009372	none	 extra N-terminus\n
SPU_009392	SPU_009392	none	 missing N- and C-terminus, extra stretch in middle\n
SPU_009437	SPU_009437	none	 partial, missing N-terminus\n
SPU_009462	SPU_009462	none	 partial, missing N-terminus\n
SPU_009491	SPU_009491	none	 partial, missing C-terminus\n
SPU_009564	SPU_009564	none	 partial, missing C-terminus\n
SPU_009617	SPU_009617	none	 partial, missing N-terminus\n
SPU_009644	SPU_009644	none	 extra N-terminus\n
SPU_009768	SPU_009768	none	 partial, missing N- and C-terminus\n
SPU_009773	SPU_009773	none	 partial, missing N-terminus\n
SPU_009853	SPU_009853	none	 partial, missing N-terminus\n
SPU_009929	SPU_009929	none	 partial, missing N- and C-terminus\n
SPU_009998	SPU_009998	none	 missing N-terminus\n
SPU_010108	SPU_010108	none	 partial, missing N- and C-terminus\n
SPU_010205	SPU_010205	none	 missing N-terminus\n
SPU_010209	SPU_010209	none	 missing N-terminus residues\n
SPU_010247	SPU_010247	none	 missing N-terminus\n
SPU_010500	SPU_010500	none	 missing stretch in middle\n
SPU_011041	SPU_011041	none	 partial, missing N-terminus\n
SPU_011234	SPU_011234	none	 missing N-terminus\n
SPU_011283	SPU_011283	none	 missing short stretch in middle\n
SPU_011329	SPU_011329	none	 missing C-terminus\n
SPU_011566	SPU_011566	none	 missing N-terminus residues\n
SPU_011589	SPU_011589	none	 partial, missing C-terminus\n
SPU_011596	SPU_011596	none	 partial\n
SPU_011713	SPU_011713	none	 partial, missing C-terminus\n
SPU_011928	SPU_011928	none	 partial, missing C-terminus\n
SPU_011980	SPU_011980	none	 partial, missing N-terminus\n
SPU_012279	SPU_012279	none	 missing some N-terminus residues\n
SPU_012344	SPU_012344	none	 missing some N-terminus residues\n
SPU_012348	SPU_012348	none	 missing some N-terminus residues\n
SPU_012369	SPU_012369	none	 missing stretch in middle\n
SPU_012390	SPU_012390	none	 partial, missing N-terminus\n
SPU_012615	SPU_012615	none	 missing stretch in middle\n
SPU_012642	SPU_012642	none	 partial, missing C-terminus\n
SPU_012644	SPU_012644	none	 extra N-terminus\n
SPU_012698	SPU_012698	none	 missing stretch in middle\n
SPU_012790	SPU_012790	none	 partial, missing N-terminus\n
SPU_012817	SPU_012817	none	 missing stretch in middle\n
SPU_013130	SPU_013130	none	 partial, missing N-terminus\n
SPU_013232	SPU_013232	none	 partial, missing N-terminus\n
SPU_013283	SPU_013283	none	 partial, missing N-terminus\n
SPU_013353	SPU_013353	none	 missing stretch in middle\n
SPU_013586	SPU_013586	none	 partial, missing C-terminus\n
SPU_013757	SPU_013757	none	 partial, missing N-terminus\n
SPU_013853	SPU_013853	none	 extra stretch in middle\n
SPU_013906	SPU_013906	none	 missing stretches in middle\n
SPU_013936	SPU_013936	none	 missing stretch, extra stretch in middle\n
SPU_013966	SPU_013966	none	 missing stretch in middle\n
SPU_014019	SPU_014019	none	 partial, missing N-terminus, missing stretch in middle\n
SPU_014062	SPU_014062	none	 missing N-terminus, extra stretch in middle\n
SPU_014176	SPU_014176	none	 missing N-terminus\n
SPU_014341	SPU_014341	none	 partial, missing N- and C-terminus\n
SPU_014342	SPU_014342	none	 partial, missing N-terminus\n
SPU_014372	SPU_014372	none	 partial, missing N-terminus\n
SPU_014460	SPU_014460	none	 partial, missing C-terminus\n
SPU_014504	SPU_014504	none	 partial, missing N-terminus\n
SPU_014595	SPU_014595	none	 partial, missing N-terminus\n
SPU_014653	SPU_014653	none	 missing central stretch\n
SPU_014694	SPU_014694	none	 partial, missing N-terminus\n
SPU_014710	SPU_014710	none	 partial, missing N-terminus and a stretch in middle\n
SPU_014718	SPU_014718	none	 missing N-terminus and central stretch\n
SPU_014939	SPU_014939	none	 partial, missing C-terminus\n
SPU_015089	SPU_015089	none	 partial, missing N-terminus\n
SPU_015155	SPU_015155	none	 partial, missing C-terminus and stretch in middle\n
SPU_015257	SPU_015257	none	 partial, missing C-terminus\n
SPU_015283	SPU_015283	none	 partial, missing N-terminus\n
SPU_015310	SPU_015310	none	 missing C-terminus\n
SPU_015323	SPU_015323	none	 missing N-terminus\n
SPU_015348	SPU_015348	none	 missing N-terminus and stretch in middle\n
SPU_015372	SPU_015372	none	 missing N-terminus\n
SPU_015485	SPU_015485	none	 missing N-terminus\n
SPU_015486	SPU_015486	none	 missing N-terminus\n
SPU_015507	SPU_015507	none	 missing N-terminus residues\n
SPU_015737	SPU_015737	none	 extra N-terminus\n
SPU_015738	SPU_015738	none	 extra N-terminus residues\n
SPU_015770	SPU_015770	none	 missing C-terminus\n
SPU_015842	SPU_015842	none	 missing N- and C-terminus\n
SPU_015851	SPU_015851	none	 missing C-terminus\n
SPU_015894	SPU_015894	none	 missing C-terminus\n
SPU_015966	SPU_015966	none	 missing C-terminus\n
SPU_016082	SPU_016082	none	 partial, missing N-terminus\n
SPU_016205	SPU_016205	none	 missing N-terminus\n
SPU_016277	SPU_016277	none	 missing N-terminus\n
SPU_016344	SPU_016344	none	 missing N-terminus residues\n
SPU_016345	SPU_016345	none	 missing N-terminus residues\n
SPU_016350	SPU_016350	none	 missing N-terminus residues\n
SPU_016377	SPU_016377	none	 missing N-terminus residues\n
SPU_016383	SPU_016383	none	 missing N-terminus\n
SPU_016494	SPU_016494	none	 partial, missing N-terminus\n
SPU_016520	SPU_016520	none	 missing N-terminus\n
SPU_016651	SPU_016651	none	 missing N-terminus\n
SPU_016702	SPU_016702	none	 missing C-terminus\n
SPU_016825	SPU_016825	none	 missing C-terminus\n
SPU_016826	SPU_016826	none	 missing C-terminus\n
SPU_016831	SPU_016831	none	 missing N-terminus\n
SPU_016860	SPU_016860	none	 missing stretch in middle\n
SPU_016882	SPU_016882	none	 missing stretch in middle\n
SPU_017058	SPU_017058	none	 missing N-terminus\n
SPU_017086	SPU_017086	none	 missing N-terminus\n
SPU_017099	SPU_017099	none	 missing C-terminus\n
SPU_017257	SPU_017257	none	 missing N-terminus\n
SPU_017403	SPU_017403	none	 missing C-terminus\n
SPU_017417	SPU_017417	none	 missing C-terminus\n
SPU_017478	SPU_017478	none	 missing N- and C-terminus; very similar to DAG kinase zeta form\n
SPU_017538	SPU_017538	none	 missing N-terminus\n
SPU_017573	SPU_017573	none	 missing C-terminus\n
SPU_017585	SPU_017585	none	 missing C-terminus\n
SPU_017661	SPU_017661	none	 missing N-terminus\n
SPU_017759	SPU_017759	none	 missing N-terminus\n
SPU_018110	SPU_018110	none	 missing N- and C-terminus\n
SPU_018127	SPU_018127	none	 extra N-terminus, missing C-terminus\n
SPU_018177	SPU_018177	none	 missing N-terminus\n
SPU_018270	SPU_018270	none	 missing N-terminus, missing stretch in middle\n
SPU_018322	SPU_018322	none	 missing N-terminus residues\n
SPU_018421	SPU_018421	none	 missing N-terminus\n
SPU_018431	SPU_018431	none	 missing C-terminus\n
SPU_018433	SPU_018433	none	 missing N-terminus\n
SPU_018446	SPU_018446	none	 missing C-terminus\n
SPU_018466	SPU_018466	none	 missing N-terminus residues\n
SPU_018527	SPU_018527	none	 missing C-terminus\n
SPU_018618	SPU_018618	none	 missing C-terminus\n
SPU_018620	SPU_018620	none	 missing N- and C-terminus\n
SPU_018748	SPU_018748	none	 missing N-terminus\n
SPU_018907	SPU_018907	none	 extra N-terminus\n
SPU_019001	SPU_019001	none	 missing N-terminus\n
SPU_019016	SPU_019016	none	 missing N-terminus\n
SPU_019288	SPU_019288	none	 partial, missing N-terminus\n
SPU_019357	SPU_019357	none	 missing stretch in middle\n
SPU_019398	SPU_019398	none	 missing stretch in middle\n
SPU_019420	SPU_019420	none	 missing N-terminus\n
SPU_019421	SPU_019421	none	 missing N-terminus\n
SPU_019428	SPU_019428	none	 missing N-terminus\n
SPU_019468	SPU_019468	none	 partial, missing C-terminus\n
SPU_019471	SPU_019471	none	 missing C-terminus\n
SPU_019505	SPU_019505	none	 missing N-terminus\n
SPU_019692	SPU_019692	none	 missing N-terminus\n
SPU_019744	SPU_019744	none	 missing N-terminus\n
SPU_019914	SPU_019914	none	 missing C-terminus\n
SPU_022002	SPU_022002	none	 extra stretch in middle\n
SPU_022028	SPU_022028	none	 missing N- and C-terminus\n
SPU_022045	SPU_022045	none	 missing C-terminus\n
SPU_022252	SPU_022252	none	 missing N-terminus\n
SPU_022260	SPU_022260	none	 missing short stretch in middle\n
SPU_022263	SPU_022263	none	 missing N-terminus\n
SPU_022326	SPU_022326	none	 missing stretch in middle\n
SPU_022399	SPU_022399	none	 extra stretch in middle\n
SPU_022597	SPU_022597	none	 missing stretches in middle\n
SPU_022651	SPU_022651	none	 missing N-terminus\n
SPU_022750	SPU_022750	none	 missing C-terminus, extra stretch in middle\n
SPU_022807	SPU_022807	none	 missing stretch in middle\n
SPU_022949	SPU_022949	none	 missing N-terminus\n
SPU_022986	SPU_022986	none	 missing N-terminus\n
SPU_023118	SPU_023118	none	 missing N-terminus\n
SPU_023270	SPU_023270	none	 missing N-terminus, missing stretch in middle\n
SPU_023332	SPU_023332	none	 missing N-terminus\n
SPU_023511	SPU_023511	none	 missing stretch\n
SPU_023691	SPU_023691	none	 missing N-terminus\n
SPU_023764	SPU_023764	none	 missing N-terminus\n
SPU_023829	SPU_023829	none	 missing C-terminus\n
SPU_023834	SPU_023834	none	 missing stretches\n
SPU_023842	SPU_023842	none	 extra stretch in middle\n
SPU_023859	SPU_023859	none	 missing C-terminus\n
SPU_023890	SPU_023890	none	 missing N-terminus\n
SPU_023942	SPU_023942	none	 missing N-terminus\n
SPU_020026	SPU_020026	none	 missing C-terminus\n
SPU_020089	SPU_020089	none	 missing N-terminus\n
SPU_020200	SPU_020200	none	 missing some N-terminus residues\n
SPU_020368	SPU_020368	none	 missing N- and C-terminus\n
SPU_020397	SPU_020397	none	 missing C-terminus\n
SPU_020402	SPU_020402	none	 missing N-terminus\n
SPU_020445	SPU_020445	none	 missing C-terminus\n
SPU_020497	SPU_020497	none	 missing N-terminus\n
SPU_020566	SPU_020566	none	 missing N-terminus residues\n
SPU_020576	SPU_020576	none	 missing C-terminus\n
SPU_020707	SPU_020707	none	 missing N- and C-terminus\n
SPU_020739	SPU_020739	none	 missing N-terminus\n
SPU_020886	SPU_020886	none	 missing stretch in middle\n
SPU_020970	SPU_020970	none	 missing N-terminus\n
SPU_021058	SPU_021058	none	 missing N-terminus\n
SPU_021355	SPU_021355	none	 missing N-terminus\n
SPU_021387	SPU_021387	none	 missing C-terminus\n
SPU_021465	SPU_021465	none	 missing C-terminus\n
SPU_021628	SPU_021628	none	 missing stretch in middle\n
SPU_021658	SPU_021658	none	 missing C-terminus\n
SPU_021772	SPU_021772	none	 missing N-terminus\n
SPU_021788	SPU_021788	none	 missing C-terminus\n
SPU_021802	SPU_021802	none	 missing N-terminus\n
SPU_021854	SPU_021854	none	 extra N-terminus\n
SPU_021867	SPU_021867	none	 missing N- and C-terminus\n
SPU_021895	SPU_021895	none	 missing C-terminus\n
SPU_021933	SPU_021933	none	 missing N- and C-terminus\n
SPU_021934	SPU_021934	none	 missing C-terminus\n
SPU_021979	SPU_021979	none	 missing N-terminus\n
SPU_024016	SPU_024016	none	 missing N-terminus residues\n
SPU_024174	SPU_024174	none	 missing N-terminus\n
SPU_024224	SPU_024224	none	 missing N-terminus\n
SPU_024261	SPU_024261	none	 missing C-terminus\n
SPU_024383	SPU_024383	none	 missing N-terminus\n
SPU_024483	SPU_024483	none	 missing N-terminus\n
SPU_024535	SPU_024535	none	 missing C-terminus\n
SPU_024622	SPU_024622	none	 missing N-terminus\n
SPU_024755	SPU_024755	none	 extra N-terminus; missing C-terminus\n
SPU_024775	SPU_024775	none	 missing C-terminus\n
SPU_024846	SPU_024846	none	 missing N-terminus\n
SPU_024873	SPU_024873	none	 missing N-terminus\n
SPU_024970	SPU_024970	none	 missing N-terminus, missing stretch in middle\n
SPU_025092	SPU_025092	none	 missing N-terminus\n
SPU_025100	SPU_025100	none	 missing C-terminus\n
SPU_025411	SPU_025411	none	 missing N-terminus\n
SPU_025567	SPU_025567	none	 missing N-terminus\n
SPU_025661	SPU_025661	none	 missing C-terminus\n
SPU_025697	SPU_025697	none	 missing N-terminus\n
SPU_025770	SPU_025770	none	 missing N-terminus\n
SPU_025858	SPU_025858	none	 missing C-terminus\n
SPU_025989	SPU_025989	none	 missing N-terminus, missing short stretch in middle\n
SPU_026127	SPU_026127	none	 missing C-terminus\n
SPU_026212	SPU_026212	none	 missing C-terminus\n
SPU_026291	SPU_026291	none	 missing some C-terminus residues\n
SPU_026437	SPU_026437	none	 missing N-terminus\n
SPU_026552	SPU_026552	none	 missing N-terminus\n
SPU_026556	SPU_026556	none	 missing N-terminus\n
SPU_026625	SPU_026625	none	 missing C-terminus\n
SPU_026639	SPU_026639	none	 missing some N-terminus residues\n
SPU_026702	SPU_026702	none	 missing C-terminus\n
SPU_026737	SPU_026737	none	 missing C-terminus\n
SPU_026881	SPU_026881	none	 extra N-terminus residues\n
SPU_027010	SPU_027010	none	 missing stretches in middle\n
SPU_027078	SPU_027078	none	 missing some N-terminus residues\n
SPU_027209	SPU_027209	none	 partial, missing C-terminus\n
SPU_027304	SPU_027304	none	 missing some N-terminus residues\n
SPU_027344	SPU_027344	none	 missing some N-terminus residues\n
SPU_027388	SPU_027388	none	 partial, missing N-terminus\n
SPU_027650	SPU_027650	none	 partial, missing C-terminus\n
SPU_027669	SPU_027669	none	 missing some N-terminus residues\n
SPU_028001	SPU_028001	none	 missing N-terminus residues\n
SPU_028105	SPU_028105	none	 missing stretch\n
SPU_028139	SPU_028139	none	 extra N-terminus\n
SPU_028141	SPU_028141	none	 missing some N-terminus residues\n
SPU_028167	SPU_028167	none	 partial, missing C-terminus\n
SPU_028178	SPU_028178	none	 missing N-terminus\n
SPU_028238	SPU_028238	none	 partial, missing N-terminus\n
SPU_028504	SPU_028504	none	 missing N-terminus, missing stretch in middle\n
SPU_028572	SPU_028572	none	 partial, missing N-terminus\n
SPU_028573	SPU_028573	none	 partial, missing N-terminus\n
SPU_028728	SPU_028728	none	 extra N-terminus residues\n
SPU_003616	SPU_003616	none	 partial, missing N-terminus\n
SPU_003731	SPU_003731	none	 partial, missing N-terminus\n
SPU_004870	SPU_004870	none	 extra N-terminus\n
SPU_008806	SPU_008806	none	 missing N-terminus half\n
SPU_010539	SPU_010539	none	 missing N-terminus half\n
SPU_013979	SPU_013979	none	 missing N-terminus\n
SPU_015618	SPU_015618	none	 missing most of the C-terminus\n
SPU_018531	SPU_018531	none	 missing N-terminus half\n
SPU_020972	SPU_020972	none	 missing some N-terminus residues\n
SPU_023656	SPU_023656	none	 missing N-terminus half\n
SPU_024406	SPU_024406	none	 missing N-terminus half\n
SPU_024714	SPU_024714	none	 missing some N-terminus residues\n
SPU_024790	SPU_024790	none	 missing N-terminus half\n
SPU_026832	SPU_026832	none	 missing most of the C-terminus\n
SPU_027484	SPU_027484	none	 missing N-terminus half\n
SPU_027859	SPU_027859	none	 missing N-terminus half\n
SPU_000334	SPU_000334	none	 extra stretch in middle\n
SPU_000967	SPU_000967	none	 partial, missing N-terminus\n
SPU_001285	SPU_001285	none	 missing stretch in middle\n
SPU_001563	SPU_001563	none	 partial, missing C-terminus\n
SPU_002346	SPU_002346	none	 partial, missing N-terminus\n
SPU_002465	SPU_002465	none	 partial, missing N-terminus\n
SPU_002709	SPU_002709	none	 missing N-terminus\n
SPU_003103	SPU_003103	none	 extra N-terminus\n
SPU_003394	SPU_003394	none	 partial, missing N-terminus\n
SPU_003812	SPU_003812	none	 partial, missing N-terminus\n
SPU_004751	SPU_004751	none	 partial, missing N-terminus\n
SPU_004990	SPU_004990	none	 partial, missing N-terminus\n
SPU_005195	SPU_005195	none	 partial, missing N-terminus\n
SPU_005196	SPU_005196	none	 missing stretches in middle\n
SPU_005590	SPU_005590	none	 partial, missing C-terminus\n
SPU_006347	SPU_006347	none	 extra stretch in middle\n
SPU_006438	SPU_006438	none	 partial, missing N-terminus\n
SPU_006617	SPU_006617	none	 partial, missing N- and C-terminus\n
SPU_009308	SPU_009308	none	 partial, missing N-terminus and stretches in middle\n
SPU_011046	SPU_011046	none	 extra N-terminus\n
SPU_011345	SPU_011345	none	 missing stretch in middle\n
SPU_012046	SPU_012046	none	 missing some N-terminus residues\n
SPU_012806	SPU_012806	none	 extra stretch in middle, missing another stretch\n
SPU_013454	SPU_013454	none	 missing stretches in middle\n
SPU_022704	SPU_022704	none	 partial, missing N- and C-terminus\n
SPU_027953	SPU_027953	none	 partial, missing C-terminus\n
SPU_011059	SPU_011059	none	 extra N-terminus\n
SPU_011934	SPU_011934	none	 extra N-terminus\n
SPU_012498	SPU_012498	none	 missing N-terminus\n
SPU_012676	SPU_012676	none	 missing C-terminus\n
SPU_012814	SPU_012814	none	 partial, missing N-terminus\n
SPU_013310	SPU_013310	none	 missing stretch in middle, missing C-terminus\n
SPU_013361	SPU_013361	none	 partial, missing N-terminus\n
SPU_013539	SPU_013539	none	 missing N-terminus\n
SPU_013587	SPU_013587	none	 partial, missing C-terminus\n
SPU_013735	SPU_013735	none	 extra stretch in middle\n
SPU_014024	SPU_014024	none	 missing C-terminus/central stretch\n
SPU_014152	SPU_014152	none	 missing C-terminus\n
SPU_014338	SPU_014338	none	 partial, missing C-terminus\n
SPU_014519	SPU_014519	none	 missing C-terminus\n
SPU_014596	SPU_014596	none	 extra C-terminus\n
SPU_014615	SPU_014615	none	 missing N-terminus\n
SPU_014937	SPU_014937	none	 missing N-terminus, extra stretch in middle\n
SPU_015162	SPU_015162	none	 missing C-terminus\n
SPU_015314	SPU_015314	none	 partial, missing N- and C-terminus\n
SPU_015853	SPU_015853	none	 extra stretch in middle\n
SPU_015946	SPU_015946	none	 missing C-terminus residues\n
SPU_016164	SPU_016164	none	 missing C-terminus and stretch in middle\n
SPU_016679	SPU_016679	none	 missing N-terminus\n
SPU_016853	SPU_016853	none	 missing N-terminus, missing stretch in middle\n
SPU_016947	SPU_016947	none	 missing C-terminus\n
SPU_017326	SPU_017326	none	 extra C-terminus\n
SPU_017681	SPU_017681	none	 extra stretch in middle\n
SPU_006711	SPU_006711	none	 missing C-terminus\n
SPU_007009	SPU_007009	none	 missing C-terminus, extra stretch in middle\n
SPU_007217	SPU_007217	none	 partial, missing C-terminus\n
SPU_007403	SPU_007403	none	 missing C-terminus\n
SPU_008057	SPU_008057	none	 extra C-terminus\n
SPU_008192	SPU_008192	none	 extra C-terminus\n
SPU_008230	SPU_008230	none	 partial, missing N- and C-terminus\n
SPU_008397	SPU_008397	none	 missing N- and C-terminus\n
SPU_008467	SPU_008467	none	 extra C-terminus\n
SPU_008631	SPU_008631	none	 partial, missing N- and C-terminus\n
SPU_008633	SPU_008633	none	 missing N-terminus, extra C-terminus\n
SPU_008869	SPU_008869	none	 partial, missing N- and C-terminus\n
SPU_008934	SPU_008934	none	 partial, missing N- and C-terminus\n
SPU_008937	SPU_008937	none	 missing C-terminus\n
SPU_009014	SPU_009014	none	 partial, missing C-terminus\n
SPU_009204	SPU_009204	none	 partial, missing C-terminus\n
SPU_009458	SPU_009458	none	 partial, missing C-terminus\n
SPU_009494	SPU_009494	none	 missing N-terminus\n
SPU_009871	SPU_009871	none	 extra N-terminus\n
SPU_014156	SPU_014156	none	 missing N- and C-terminus, extra stretches in middle\n
SPU_014280	SPU_014280	none	 partial, missing N-terminus half\n
SPU_014789	SPU_014789	none	 missing N-terminus\n
SPU_015067	SPU_015067	none	 extra N-terminus\n
SPU_016878	SPU_016878	none	 missing some N-terminus residues\n
SPU_016897	SPU_016897	none	 partial, missing N-terminus\n
SPU_016916	SPU_016916	none	 partial, missing C-terminus\n
SPU_017736	SPU_017736	none	 extra N-terminus residues\n
SPU_017988	SPU_017988	none	 partial, missing N-terminus half\n
SPU_018396	SPU_018396	none	 missing N-terminus\n
SPU_018653	SPU_018653	none	 extra N-terminus\n
SPU_019095	SPU_019095	none	 partial, missing N-terminus\n
SPU_019683	SPU_019683	none	 missing stretches in middle\n
SPU_020426	SPU_020426	none	 partial, missing N-terminus half\n
SPU_020467	SPU_020467	none	 partial, missing N- and C-terminus\n
SPU_020639	SPU_020639	none	 partial, missing N-terminus\n
SPU_020860	SPU_020860	none	 missing N- and C-terminus\n
SPU_021110	SPU_021110	none	 missing N-terminus\n
SPU_023232	SPU_023232	none	 extra stretch in middle, missing C-terminus\n
SPU_023320	SPU_023320	none	 missing C-terminus\n
SPU_023426	SPU_023426	none	 missing N-terminus\n
SPU_023457	SPU_023457	none	 partial, missing N-terminus half\n
SPU_023702	SPU_023702	none	 partial, missing C-terminus\n
SPU_023760	SPU_023760	none	 partial, missing N-terminus half\n
SPU_023846	SPU_023846	none	 partial, missing C-terminus half\n
SPU_025439	SPU_025439	none	 missing C-terminus\n
SPU_025470	SPU_025470	none	 missing N-terminus\n
SPU_025929	SPU_025929	none	 missing N-terminus\n
SPU_025934	SPU_025934	none	 partial, missing N-terminus half\n
SPU_026466	SPU_026466	none	 partial, missing N- and C-terminus\n
SPU_026857	SPU_026857	none	 missing C-terminus\n
SPU_027797	SPU_027797	none	 missing central stretch\n
SPU_027850	SPU_027850	none	 extra stretch in middle\n
SPU_028290	SPU_028290	none	 extra C-terminus\n
SPU_028310	SPU_028310	none	 extra N-terminus\n
SPU_028802	SPU_028802	none	 missing C-terminus\n
SPU_003330	SPU_003330	none	 partial, missing middle stretch and C-terminus\n
SPU_003559	SPU_003559	none	 partial, missing C-terminus\n
SPU_005150	SPU_005150	none	 partial, missing N-terminus\n
SPU_006125	SPU_006125	none	 partial, missing C-terminus\n
SPU_006705	SPU_006705	none	 partial, missing N-terminus\n
SPU_006845	SPU_006845	none	 partial, missing C-terminus\n
SPU_007043	SPU_007043	none	 missing N-terminus\n
SPU_018060	SPU_018060	none	 extra stretch on C-terminus\n
SPU_019510	SPU_019510	none	 extra N-terminus half\n
SPU_020937	SPU_020937	none	 missing C-terminus\n
SPU_021169	SPU_021169	none	 missing C-terminus\n
SPU_024001	SPU_024001	none	 missing N-terminus\n
SPU_024891	SPU_024891	none	 extra N-terminus\n
SPU_025779	SPU_025779	none	 missing N-terminus, extra C-terminus\n
SPU_026067	SPU_026067	none	 missing N-terminus\n
SPU_026777	SPU_026777	none	 extra N-terminus half\n
SPU_026817	SPU_026817	none	 missing N-terminus half\n
SPU_026904	SPU_026904	none	 extra stretch in middle\n
SPU_026913	SPU_026913	none	 missing C-terminus half\n
SPU_027400	SPU_027400	none	 extra C-terminus\n
SPU_027510	SPU_027510	none	 missing N-terminus half, extra C-terminus\n
SPU_027914	SPU_027914	none	 missing N-terminus half\n
SPU_028587	SPU_028587	none	 missing C-terminus\n
SPU_028628	SPU_028628	none	 missing C-terminus, extra N-terminus\n
SPU_028876	SPU_028876	none	 missing stretch in middle\n
SPU_000862	SPU_000862	none	 partial, missing N-terminus\n
SPU_001188	SPU_001188	none	 missing N-terminus\n
SPU_001363	SPU_001363	none	 partial, missing C-terminus\n
SPU_001561	SPU_001561	none	 partial, missing N- and C-terminus\n
SPU_002713	SPU_002713	none	 partial, missing N-terminus\n
SPU_003022	SPU_003022	none	 missing C-terminus\n
SPU_003273	SPU_003273	none	 partial, missing N-terminus\n
SPU_003429	SPU_003429	none	 partial, missing N-terminus\n
SPU_004380	SPU_004380	none	 partial, missing N-terminus\n
SPU_004666	SPU_004666	none	 missing N-terminus\n
SPU_005157	SPU_005157	none	 partial, missing N-terminus\n
SPU_006877	SPU_006877	none	 partial, missing N-terminus\n
SPU_011374	SPU_011374	none	 partial, missing N-terminus\n
SPU_012901	SPU_012901	none	 partial, missing N-terminus\n
SPU_015291	SPU_015291	none	 partial, missing N-terminus\n
SPU_015488	SPU_015488	none	 partial, missing N-terminus\n
SPU_016048	SPU_016048	none	 partial, missing N- and C-terminus\n
SPU_017522	SPU_017522	none	 partial, missing N-terminus\n
SPU_019648	SPU_019648	none	 partial, missing N-terminus\n
SPU_019864	SPU_019864	none	 partial, missing N- and C-terminus\n
SPU_021685	SPU_021685	none	 missing N-terminus\n
SPU_023801	SPU_023801	none	 partial, missing N-terminus\n
SPU_024740	SPU_024740	none	 partial, missing N- and C-terminus\n
SPU_024922	SPU_024922	none	 partial, missing N-terminus\n
SPU_027274	SPU_027274	none	 partial, missing N-terminus\n
SPU_027775	SPU_027775	none	 partial, missing N-terminus\n
SPU_027849	SPU_027849	none	 partial, missing N-terminus\n
SPU_028164	SPU_028164	none	 partial, missing C-terminus\n
SPU_028496	SPU_028496	none	 partial, missing N-terminus\n
SPU_010026	SPU_010026	none	 partial, missing C-terminus\n
SPU_010208	SPU_010208	none	 partial, missing N-terminus\n
SPU_010211	SPU_010211	none	 extra stretches\n
SPU_010270	SPU_010270	none	 partial, missing C-terminus\n
SPU_010310	SPU_010310	none	 partial, missing N-terminus\n
SPU_010479	SPU_010479	none	 partial, missing N-terminus\n
SPU_010549	SPU_010549	none	 partial, missing C-terminus\n
SPU_010644	SPU_010644	none	 partial, missing C-terminus\n
SPU_010645	SPU_010645	none	 partial, missing N-terminus\n
SPU_010646	SPU_010646	none	 partial, missing N- and C-terminus\n
SPU_010683	SPU_010683	none	 partial, missing C-terminus\n
SPU_010785	SPU_010785	none	 extra stretches in middle\n
SPU_011036	SPU_011036	none	 partial, missing N- and C-terminus\n
SPU_011133	SPU_011133	none	 extra residues on N-terminus\n
SPU_011213	SPU_011213	none	 partial, missing N-terminus\n
SPU_011438	SPU_011438	none	 partial, missing N-terminus\n
SPU_011449	SPU_011449	none	 partial, missing N-terminus\n
SPU_011604	SPU_011604	none	 extra N-terminus\n
SPU_011845	SPU_011845	none	 partial, missing N-terminus\n
SPU_011893	SPU_011893	none	 missing C-terminus residues\n
SPU_011978	SPU_011978	none	 partial, missing C-terminus\n
SPU_012289	SPU_012289	none	 extra N-terminus\n
SPU_012430	SPU_012430	none	 extra residues on C- and N-terminus\n
SPU_012473	SPU_012473	none	 partial, missing C-terminus\n
SPU_012675	SPU_012675	none	 missing N-terminus; extra C-terminus\n
SPU_012789	SPU_012789	none	 partial, missing C-terminus\n
SPU_012834	SPU_012834	none	 extra N-terminus residues\n
SPU_013242	SPU_013242	none	 extra N-terminus; extra residues in center\n
SPU_013338	SPU_013338	none	 extra N-terminus\n
SPU_013376	SPU_013376	none	 missing some N-terminus residues\n
SPU_013428	SPU_013428	none	 extra N-terminus residues\n
SPU_013440	SPU_013440	none	 partial, missing N-terminus\n
SPU_013750	SPU_013750	none	 partial, missing N-terminus\n
SPU_013912	SPU_013912	none	 missing N-terminus\n
SPU_013942	SPU_013942	none	 partial, missing C-terminus\n
SPU_013943	SPU_013943	none	 partial, missing C-terminus\n
SPU_014025	SPU_014025	none	 extra residues in center\n
SPU_014135	SPU_014135	none	 extra N-terminus\n
SPU_014136	SPU_014136	none	 extra N-terminus\n
SPU_014315	SPU_014315	none	 partial, missing N-terminus, matches on small C-terminus part\n
SPU_014406	SPU_014406	none	 partial, missing N-terminus\n
SPU_014507	SPU_014507	none	 partial, missing N-terminus\n
SPU_014569	SPU_014569	none	 partial, missing C-terminus\n
SPU_015048	SPU_015048	none	 partial, missing N-terminus\n
SPU_015062	SPU_015062	none	 extra N-terminus\n
SPU_015069	SPU_015069	none	 partial, missing C-terminus\n
SPU_015156	SPU_015156	none	 partial, missing N-terminus\n
SPU_015169	SPU_015169	none	 extra N-terminus\n
SPU_015343	SPU_015343	none	 missing N- and C-terminus\n
SPU_015380	SPU_015380	none	 partial, missing N- and C-terminus\n
SPU_015497	SPU_015497	none	 extra C-terminus\n
SPU_015568	SPU_015568	none	 partial, missing N-terminus and central stretch\n
SPU_015573	SPU_015573	none	 partial, missing N- and C-terminus\n
SPU_015586	SPU_015586	none	 partial, missing N-terminus\n
SPU_015781	SPU_015781	none	 missing C-terminus\n
SPU_015814	SPU_015814	none	 partial, missing C-terminus\n
SPU_015837	SPU_015837	none	 partial, missing C-terminus\n
SPU_015963	SPU_015963	none	 partial, missing C-terminus\n
SPU_015981	SPU_015981	none	 missing N-terminus\n
SPU_015997	SPU_015997	none	 extra N-terminus\n
SPU_016152	SPU_016152	none	 partial, missing central region\n
SPU_016169	SPU_016169	none	 extra stretch in middle\n
SPU_016302	SPU_016302	none	 missing N-terminus, extra C-terminus\n
SPU_016324	SPU_016324	none	 missing N-terminus\n
SPU_016347	SPU_016347	none	 partial, missing C-terminus\n
SPU_016408	SPU_016408	none	 extra C-terminus\n
SPU_016410	SPU_016410	none	 extra N-terminus\n
SPU_016456	SPU_016456	none	 extra C-terminus\n
SPU_016713	SPU_016713	none	 missing C-terminus and stretch in middle\n
SPU_016747	SPU_016747	none	 partial, missing N-terminus and stretch in middle\n
SPU_016824	SPU_016824	none	 extra N- and C-terminus\n
SPU_016988	SPU_016988	none	 missing N-terminus\n
SPU_016996	SPU_016996	none	 missing N-terminus\n
SPU_017107	SPU_017107	none	 partial, missing N- and C-terminus\n
SPU_017212	SPU_017212	none	 extra stretch in middle\n
SPU_017281	SPU_017281	none	 partial, missing C-terminus\n
SPU_017347	SPU_017347	none	 missing N-terminus\n
SPU_017415	SPU_017415	none	 extra N-terminus\n
SPU_017467	SPU_017467	none	 extra N-terminus\n
SPU_017481	SPU_017481	none	 partial, missing N- and C-terminus\n
SPU_017662	SPU_017662	none	 partial, missing C-terminus\n
SPU_017695	SPU_017695	none	 partial, missing C-terminus and central stretch\n
SPU_017801	SPU_017801	none	 extra N-terminus\n
SPU_017838	SPU_017838	none	 partial, missing N- and C-terminus\n
SPU_017965	SPU_017965	none	 partial, missing C-terminus\n
SPU_017990	SPU_017990	none	 partial, missing N-terminus\n
SPU_018166	SPU_018166	none	 partial, missing C-terminus\n
SPU_018179	SPU_018179	none	 partial, missing C-terminus and stretch in middle\n
SPU_018240	SPU_018240	none	 partial, missing N-terminus\n
SPU_018250	SPU_018250	none	 extra N-terminus\n
SPU_018251	SPU_018251	none	 partial, missing N-terminus\n
SPU_018271	SPU_018271	none	 missing stretch in middle\n
SPU_018489	SPU_018489	none	 partial, missing C-terminus\n
SPU_018522	SPU_018522	none	 partial, missing C-terminus\n
SPU_018751	SPU_018751	none	 partial, missing N- and C-terminus\n
SPU_018894	SPU_018894	none	 partial, missing C-terminus\n
SPU_018900	SPU_018900	none	 partial, missing N- and C-terminus\n
SPU_018975	SPU_018975	none	 extra N- and C-terminus\n
SPU_019004	SPU_019004	none	 extra stretch in middle\n
SPU_019007	SPU_019007	none	 partial, missing C-terminus\n
SPU_019118	SPU_019118	none	 partial, missing N-terminus\n
SPU_019177	SPU_019177	none	 extra stretch in middle\n
SPU_019278	SPU_019278	none	 extra C-terminus\n
SPU_019297	SPU_019297	none	 partial, missing N-terminus\n
SPU_019409	SPU_019409	none	 partial, missing C-terminus\n
SPU_019417	SPU_019417	none	 partial, missing N-terminus and central stretch\n
SPU_019634	SPU_019634	none	 extra stretch in middle\n
SPU_019640	SPU_019640	none	 partial, missing N-terminus\n
SPU_019712	SPU_019712	none	 extra C-terminus\n
SPU_019771	SPU_019771	none	 partial, missing N-terminus\n
SPU_019781	SPU_019781	none	 extra N-terminus\n
SPU_019811	SPU_019811	none	 partial, missing N-terminus\n
SPU_019885	SPU_019885	none	 partial, missing N-terminus\n
SPU_019970	SPU_019970	none	 partial, missing N-terminus\n
SPU_020038	SPU_020038	none	 missing C-terminus\n
SPU_020060	SPU_020060	none	 extra N-terminus\n
SPU_020138	SPU_020138	none	 missing N-terminus\n
SPU_020143	SPU_020143	none	 extra N-terminus\n
SPU_020155	SPU_020155	none	 partial, missing N-terminus\n
SPU_020162	SPU_020162	none	 missing N-terminus\n
SPU_020250	SPU_020250	none	 missing central region\n
SPU_020362	SPU_020362	none	 extra N-terminus, missing stretch in middle\n
SPU_020435	SPU_020435	none	 missing N-terminus\n
SPU_020530	SPU_020530	none	 missing N-terminus\n
SPU_020673	SPU_020673	none	 extra N- and C-terminus\n
SPU_020679	SPU_020679	none	 extra N-terminus\n
SPU_020757	SPU_020757	none	 extra stretch in middle\n
SPU_020758	SPU_020758	none	 partial, missing C-terminus\n
SPU_020759	SPU_020759	none	 missing N-terminus\n
SPU_020773	SPU_020773	none	 extra N-terminus\n
SPU_020819	SPU_020819	none	 extra N-terminus\n
SPU_020850	SPU_020850	none	 extra N-terminus\n
SPU_020880	SPU_020880	none	 extra N-terminus\n
SPU_021035	SPU_021035	none	 missing stretches in middle\n
SPU_021103	SPU_021103	none	 missing N-terminus\n
SPU_021267	SPU_021267	none	 missing N-terminus\n
SPU_021559	SPU_021559	none	 partial, missing N- and C-terminus\n
SPU_021598	SPU_021598	none	 extra N-terminus\n
SPU_021625	SPU_021625	none	 partial, missing C-terminus\n
SPU_021652	SPU_021652	none	 extra N-terminus\n
SPU_021775	SPU_021775	none	 missing N-terminus residues\n
SPU_021781	SPU_021781	none	 missing N-terminus\n
SPU_021862	SPU_021862	none	 extra N-terminus\n
SPU_021941	SPU_021941	none	 missing N-terminus\n
SPU_022108	SPU_022108	none	 extra N-terminus\n
SPU_022135	SPU_022135	none	 missing C-terminus\n
SPU_022159	SPU_022159	none	 missing N-terminus\n
SPU_022186	SPU_022186	none	 missing N-terminus\n
SPU_022349	SPU_022349	none	 missing N-terminus\n
SPU_022476	SPU_022476	none	 extra N-terminus\n
SPU_022548	SPU_022548	none	 missing C-terminus\n
SPU_022584	SPU_022584	none	 missing N-terminus\n
SPU_022628	SPU_022628	none	 missing C-terminus\n
SPU_022639	SPU_022639	none	 partial, missing N-terminus\n
SPU_022796	SPU_022796	none	 missing N-terminus\n
SPU_022900	SPU_022900	none	 extra stretch in middle\n
SPU_023059	SPU_023059	none	 extra N-terminus\n
SPU_023144	SPU_023144	none	 missing N-terminus\n
SPU_023220	SPU_023220	none	 extra N-terminus\n
SPU_023237	SPU_023237	none	 extra N-terminus\n
SPU_023311	SPU_023311	none	 partial, missing C-terminus\n
SPU_023630	SPU_023630	none	 partial, missing N-terminus\n
SPU_023634	SPU_023634	none	 partial, missing C-terminus\n
SPU_023638	SPU_023638	none	 partial, missing N-terminus\n
SPU_023689	SPU_023689	none	 partial, missing N-terminus\n
SPU_023693	SPU_023693	none	 extra N-terminus\n
SPU_023816	SPU_023816	none	 partial\n
SPU_023909	SPU_023909	none	 extra N-terminus\n
SPU_023972	SPU_023972	none	 missing C-terminus\n
SPU_023979	SPU_023979	none	 missing C-terminus\n
SPU_024234	SPU_024234	none	 missing C-terminus\n
SPU_024264	SPU_024264	none	 missing C-terminus\n
SPU_024312	SPU_024312	none	 partial, missing C-terminus\n
SPU_024355	SPU_024355	none	 partial, missing N-terminus\n
SPU_024450	SPU_024450	none	 missing N-terminus\n
SPU_024522	SPU_024522	none	 missing N-terminus\n
SPU_024629	SPU_024629	none	 missing N-terminus\n
SPU_024668	SPU_024668	none	 partial, missing C-terminus\n
SPU_024736	SPU_024736	none	 partial, missing C-terminus\n
SPU_024774	SPU_024774	none	 partial, missing N- and C-terminus\n
SPU_024862	SPU_024862	none	 partial, missing N-terminus\n
SPU_024895	SPU_024895	none	 partial, missing N- and C-terminus\n
SPU_024949	SPU_024949	none	 extra N-terminus\n
SPU_025014	SPU_025014	none	 partial, missing N-terminus\n
SPU_025024	SPU_025024	none	 missing N-terminus\n
SPU_025036	SPU_025036	none	 extra N-terminus\n
SPU_025052	SPU_025052	none	 partial, missing N-terminus\n
SPU_025123	SPU_025123	none	 missing C-terminus\n
SPU_025196	SPU_025196	none	 missing some N-terminus residues\n
SPU_025227	SPU_025227	none	 extra N-terminus, missing C-terminus\n
SPU_025313	SPU_025313	none	 missing C-terminus and stretch in middle\n
SPU_025469	SPU_025469	none	 partial, missing C-terminus\n
SPU_025545	SPU_025545	none	 missing C-terminus and stretch in middle\n
SPU_025546	SPU_025546	none	 missing N-terminus\n
SPU_025709	SPU_025709	none	 missing N-terminus\n
SPU_025728	SPU_025728	none	 missing N-terminus and stretch in middle\n
SPU_025751	SPU_025751	none	 missing N-terminus\n
SPU_025758	SPU_025758	none	 missing N- and C-terminus\n
SPU_025917	SPU_025917	none	 missing N-terminus\n
SPU_025958	SPU_025958	none	 partial, missing C-terminus\n
SPU_026160	SPU_026160	none	 extra stretch in middle\n
SPU_026220	SPU_026220	none	 extra N-terminus\n
SPU_026273	SPU_026273	none	 missing C-terminus\n
SPU_026311	SPU_026311	none	 missing N-terminus\n
SPU_026352	SPU_026352	none	 missing N-terminus\n
SPU_026475	SPU_026475	none	 partial, missing N- and C-terminus\n
SPU_026558	SPU_026558	none	 missing N-terminus and stretch in middle\n
SPU_026714	SPU_026714	none	 extra N-terminus\n
SPU_026718	SPU_026718	none	 partial, missing C-terminus\n
SPU_026807	SPU_026807	none	 extra N-terminus\n
SPU_026833	SPU_026833	none	 partial, missing N-terminus\n
SPU_026931	SPU_026931	none	 missing C-terminus and stretch in middle\n
SPU_027152	SPU_027152	none	 partial, missing N- and C-terminus\n
SPU_027176	SPU_027176	none	 extra N- and C-terminus\n
SPU_027179	SPU_027179	none	 partial, missing C-terminus\n
SPU_027374	SPU_027374	none	 partial, missing N-terminus\n
SPU_027728	SPU_027728	none	 partial, missing C-terminus\n
SPU_027756	SPU_027756	none	 partial, missing C-terminus\n
SPU_027852	SPU_027852	none	 extra N-terminus, missing stretch in middle\n
SPU_027870	SPU_027870	none	 partial, missing C-terminus\n
SPU_028097	SPU_028097	none	 extra stretch in middle\n
SPU_028111	SPU_028111	none	 partial, missing C-terminus\n
SPU_028274	SPU_028274	none	 missing C-terminus\n
SPU_028305	SPU_028305	none	 missing N-terminus\n
SPU_028451	SPU_028451	none	 partial, missing C-terminus\n
SPU_028493	SPU_028493	none	 extra N-terminus\n
SPU_028554	SPU_028554	none	 extra N-terminus\n
SPU_028562	SPU_028562	none	 extra N-terminus\n
SPU_028586	SPU_028586	none	 missing N- and C-terminus\n
SPU_028794	SPU_028794	none	 partial, missing C-terminus\n
SPU_028836	SPU_028836	none	 partial, missing C-terminus\n
SPU_000314	SPU_000314	none	  missing some N-terminus residues\n
SPU_002759	SPU_002759	none	 partial, missing N- and C-terminus\n
SPU_002977	SPU_002977	none	 partial, missing N-terminus\n
SPU_003360	SPU_003360	none	 partial, missing N- and C-terminus\n
SPU_004289	SPU_004289	none	 extra N-terminus, missing stretch in middle\n
SPU_005521	SPU_005521	none	 partial, missing stretch in middle\n
SPU_005684	SPU_005684	none	 partial, missing C-terminus\n
SPU_005984	SPU_005984	none	 partial, missing C-terminus\n
SPU_006142	SPU_006142	none	 partial, missing C-terminus\n
SPU_006330	SPU_006330	none	 extra stretch in middle\n
SPU_006470	SPU_006470	none	 partial, missing N-terminus\n
SPU_006968	SPU_006968	none	 partial, missing stretch in middle\n
SPU_007015	SPU_007015	none	  partial, missing N-terminus\n
SPU_008652	SPU_008652	none	 missing C-terminus\n
SPU_008683	SPU_008683	none	 extra residues on N-terminus, missing stretch in middle\n
SPU_010567	SPU_010567	none	 extra stretch in middle\n
SPU_011965	SPU_011965	none	 partial, missing C-terminus\n
SPU_011983	SPU_011983	none	 partial, missing N-terminus\n
SPU_014671	SPU_014671	none	 extra C-terminus\n
SPU_017318	SPU_017318	none	 missing stretches in middle\n
SPU_018102	SPU_018102	none	  partial, missing C-terminus\n
SPU_019330	SPU_019330	none	 missing N-terminus\n
SPU_023389	SPU_023389	none	 missing C-terminus\n
SPU_023503	SPU_023503	none	 missing N-terminus\n
SPU_021699	SPU_021699	none	 missing N-terminus\n
SPU_021700	SPU_021700	none	 missing N-terminus\n
SPU_021800	SPU_021800	none	 missing C-terminus\n
SPU_024133	SPU_024133	none	 missing C-terminus\n
SPU_027188	SPU_027188	none	 missing stretch in middle\n
SPU_024028	SPU_024028	none	 missing N-terminus half\n
SPU_010213	SPU_010213	none	 extra C-terminus\n
SPU_014975	SPU_014975	none	 extra N-terminus\n
SPU_026855	SPU_026855	none	 missing central and C-terminus\n
SPU_028730	SPU_028730	none	 extra stretch in middle\n
SPU_021364	SPU_021364	none	 missing N-terminus residues\n
SPU_023364	SPU_023364	none	 extra N-terminus and missing C-terminus\n
SPU_023440	SPU_023440	none	 missing N-terminus\n
SPU_025997	SPU_025997	none	 missing N-terminus\n
SPU_026183	SPU_026183	none	 missing N-terminus\n
SPU_028101	SPU_028101	none	 extra C-terminus\n
SPU_028199	SPU_028199	none	 extra N-terminus\n
SPU_020635	SPU_020635	none	 partial, missing N-terminus\n
SPU_010555	SPU_010555	none	 partial, missing N- and C-terminus\n
SPU_012120	SPU_012120	none	 partial, missing C-terminus\n
SPU_013820	SPU_013820	none	 missing N-terminus; unrelated stretch in middle\n
SPU_014982	SPU_014982	none	 partial, missing C-terminus\n
SPU_016013	SPU_016013	none	 partial, missing N-terminus\n
SPU_017727	SPU_017727	none	 \n
SPU_018659	SPU_018659	none	 extra C-terminus, missing stretch in middle\n
SPU_018893	SPU_018893	none	 extra C-terminus\n
SPU_019341	SPU_019341	none	 partial, missing N- and C-terminus\n
SPU_028146	SPU_028146	none	 partial, missing N-terminus\n
SPU_028175	SPU_028175	none	 extra stretches in middle\n
SPU_028446	SPU_028446	none	 missing N-terminus\n
Sp-Tlr228	SPU_030080	none	Partial Toll-like receptor predicted by Fgenesh, NCBI and Genscan. This gene model occupies the entire sequence of a short scaffold, and the nucleotides have 96% identity to a typical Sp-Tlr (SPU_027798).\n
SPU_021908	SPU_021908	none	#\nPartial Toll-like receptor. The nucleotides of the TIR domain have 99% identity to SPU_021907. Only 200 bp of the 5' upstream sequence has high similarity to another Sp-Tlr gene. This gene may represent a recent duplication or an assembly error. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_021907	SPU_021907	none	Partial Toll-like receptor.  The nucleotide sequence has 94% identity to another Sp-Tlr gene, so it could be a Toll-like receptor-related gene, although no LRR is found in the upstream sequence.  \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_004792	SPU_004792	none	Partial Toll-like receptor. The nucleotides of this gene have 94% identity to a typical Sp-Tlr gene (SPU_005950). This gene is located at the end of a contig, and unknown sequence (NNN) in the upstream region could make this gene model incomplete.\n
SPU_006278	SPU_006278	none	Possible duplicated gene: SPU_009260\n
SPU_009260	SPU_009260	none	Possible duplicated gene: SPU_006278\n
SPU_010513	SPU_010513	none	3' Partial   \nSPU_010488 is the 5' part of this gene\n
SPU_010488	SPU_010488	none	5' partial\n
SPU_020282	SPU_020282	none	Possible duplication: SPU_022904\n
SPU_022904	SPU_022904	none	Possible duplication: SPU_020282\n
SPU_021512	SPU_021512	none	Best empirically verified GenBank hit is poly(A) polymerase in Carassius auratus (goldfish), accession BAB39139, with E-value 0.0 and bit score 642. \n \nPSSMs producing significant alignments (indicating conserved domains) include pfam04928, "PAP_central, Poly(A) polymerase central domain" and pfam04926, "PAP_RNA-bind, Poly(A) polymerase predicted RNA binding domain". \n \nThe GLEAN gene model has been modified as follows: \n* The 3' UTR has been added, based on the Samanta embryonic expression data. \n* The first exon has been extended 5', based on BCM:Exonerate, NCBI:Splign, and the Stolc tiling array data.\n
SPU_021760	SPU_021760	none	Different parts of this gene are found in different scaffolds in a non-linear organization.\n
SPU_013967	SPU_013967	none	Different parts of this gene are found in different scaffolds in a non-linear organization.\n
SPU_013930	SPU_013930	none	Different parts of this gene are found in different scaffolds in a non-linear organization.\n
Sp-VC1_3	SPU_030081	none	This gene model is located at the end of a short scaffold (Scaffold120560). The nucleotides have 87% identity to another Sp-VC1 gene.\n
SPU_017609	SPU_017609	none	Duplicate prediction for SPU_015676\n
SPU_002696	SPU_002696	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nThere is an overlapping NCBI model that includes fewer exons but shows a similar alignment to vertebrate Map3k7. The size of this model is closer to that of its vertebrate counterpart, and thus we have decided to accept this model in its present form.\n
SPU_005254	SPU_005254	none	This model was annotated based on a manual inspection of multiple protein sequence alignments.\n
SPU_003955	SPU_003955	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nIt is possible that some N-terminal sequence is missing from this model, based on alignments to vertebrate Tab2/3 and given that this model is located next to a region with several gaps between contigs. \n \nThere seems to be a duplication of this model (SPU_012219). See the Gene Duplication page for further details.\n
SPU_012219	SPU_012219	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nIt is possible that some N-terminal sequence is missing from this model, based on alignments to vertebrate Tab2/3 and given that this model is located next to a region with several gaps between contigs. \n \nThere seems to be a duplication of this model (SPU_003955). See the Gene Duplication page for further details.\n
SPU_018598	SPU_018598	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nAll other gene prediction protocols provide an identical structure for this gene, which is also supported by the genome-wide tiling array hybridization data.\n
SPU_000742	SPU_000742	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nThis glean model is, in its present form, slightly larger than vertebrate Ube2 genes. An overlapping NCBI model (XM_791573.1) is slightly shorter; however, it does not show a better alignment to vertebrate Ube2 genes. Therefore, lacking additional evidence to favour either model, we have accepted the glean sequence in its present form.\n
SPU_028607	SPU_028607	none	first exon with cadherin-like sequences is most likely irrelevant\n
SPU_006829	SPU_006829	none	likely histone H2a pseudogene\n
SPU_012627	SPU_012627	none	likely histone H2a pseudogene\n
SPU_020123	SPU_020123	none	Sp-Elf has two splice variants differing in the 5' region: \nSp-Elf A       SPU_020124 \nSp-Elf B       SPU_020123 \n
SPU_021673	SPU_021673	none	This model was annotated based on manual inspections of multiple protein sequence alignments. \n \nThis model is identical in amino acid sequence to SPU_026252, and an inspection of the models strongly suggests the duplication is due to an assembly error. \n \nPlease refer to SPU_026252 for further annotation details (exon structure, sequence, etc).\n
SPU_026252	SPU_026252	none	This model was annotated based on manual inspections of multiple protein sequence alignments. \n \nThis model is identical in amino acid sequence to SPU_021673, and an inspection of the models strongly suggests the duplication is due to an assembly error.\n
SPU_010374	SPU_010374	none	Gene model includes 21 tandem Fibronectin Type 3 repeats\n
SPU_014498	SPU_014498	none	Predicted protein sequence matches the EST-derived prediction Sp-Gg1d exactly, except that an intron is found in the 3' UTR\n
SPU_018408	SPU_018408	none	The gene model contains two exons that are not present in this protein as determined by cDNA sequencing.\n
SPU_005096	SPU_005096	none	ATP-dependent RNA helicase A (Nuclear DNA helicase II) (NDH II) (DEAH-box protein 9)\n
SPU_004517	SPU_004517	none	This model was modified and annotated based on a manual inspection of multiple protein sequence alignments. \n \nWe found that there was a gap in the alignment of the original version of this model with vertebrate/insect Pellino, which mapped to exon#5. The corresponding NCBI model, otherwise identical, has a slightly shorter exon#5, and shows a better alignment to other Pellino proteins. Therefore, we have decided to modify the GLEAN3 prediction accordingly. \n \nNB: The CDS for this model does not end with a STOP codon (i.e. there might be some C-ter sequence missing for this gene).\n
SPU_008228	SPU_008228	none	There is unknown sequence (NNN) in the intron of this gene model, so it is still unknown whether this model is intronless or not. \n
SPU_019314	SPU_019314	none	cyclin G associated kinase/DnaJ (HSP 40) homolog. Gene prediction is not complete. SPU_000818 is a related/similar protein.\n
SPU_012096	SPU_012096	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nNote there is a slightly different FgeneshAB model for this gene; however, it does not provide a better alignment with other Ecsit proteins, and we have therefore decided to accept this model in its present form until additional evidence is obtained.\n
SPU_009399	SPU_009399	none	Sp-MAP2K5 spans two glean predictions: \nSPU_009399 and SPU_009398\n
SPU_000818	SPU_000818	none	cyclin G associated kinase/DnaJ (HSP 40) homolog. Gene prediction is not complete. SPU_019314 is a related/similar protein.\n
Sp-Gg2	SPU_030082	none	Gene model created on the basis of EST data\n
SPU_001588	SPU_001588	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nWhile the best Blast hit for this model is to sea star regeneration-associated protease SRAP, a careful inspection of its size and domain composition reveals that it generally resembles members of the granzyme family and vertebrate marapsin. We therefore propose to name this and related genes Sp-Gra[nzyme]mar[apsin]-like. \n \nThe location of this model in the scaffold (away from ends or gaps), and the fact that other gene prediction protocols generated very similar models strongly suggest that this is not an incomplete model.\n
SPU_016107	SPU_016107	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nWhile the best Blast hit for this model is to sea star regeneration-associated protease SRAP, a careful inspection of its size and domain composition reveals that it more generally resembles members of the granzyme family and vertebrate marapsin. We therefore propose to name this and related genes Sp-Gra[nzyme]mar[apsin]-like. \n \nThe location of this model in the scaffold (away from ends or gaps), and the fact that other gene prediction protocols generated very similar models strongly suggest that this is not an incomplete model.\n
SPU_022264	SPU_022264	none	E1B-55kDa-associated protein 5 isoform a. SPU_022264 and SPU_022265 are likely incomplete/incorrect predictions for this gene.\n
SPU_022265	SPU_022265	none	E1B-55kDa-associated protein 5 isoform a. SPU_022264 and SPU_022265 are likely incomplete/incorrect predictions for this gene.\n
SPU_006871	SPU_006871	none	Inspection of the tiling array suggests that glean may have missed the following exons: KLYPVTLWKAHLVTKSAKMQEMGWVRKIMHGRLRNDTSGLSAGRSSLNLLLTGCCGWGGCGCCWSCCCCCCGGGPRGCWSTCCGWPIARPPPAGCGWTG,YQHVSQRSLQDKSKLSKILFSHMAKYYSKLHSLKCDIKIIIMQNIVSPGLLLILNLLQKLVYCDCALAIPSLGLKTPFLEK\n
SPU_007046	SPU_007046	none	Inspection of the tiling array suggests that glean may have missed the following exons: NGKHGEEGQREGGKENGKHREEGQREGMRRMRDMGIRDREREGGEWETWRGGTERGREENKKHREEGQREGGRRMRNMGMRDRERE\n
SPU_007360	SPU_007360	none	Inspection of the tiling array suggests that glean may have missed the following exons: EDDEDNDYFDPNESVVEEEESMEQSGTDGEDDDGVGDRGQVLPKEPKRADTKKGRSTSNKALICNICGLECEHGKALKQHLISHDPKSLQCSYCKRYFKRKGCLVFHLRTKHQVSIGKKWSRHEKEDLMTPSKIDEKGDDGDNDYFDPNERVVEKDESQEQSGTDGEDDEWVEDRGKELRKRRKRAVTKMSRNTSDKALICKICGLECEHSKSLKQHLISHDPNALQCSYCKWYFRRKSCLVFHLRTKHRVNVGKKWSRGNELVKNAVRAKAPKALEPKDLGDLQGGASTQLEDSSTTTILYSCKFCTKKFTKPDFLLKHEAIVHVNFRRYRCRVCRKAFSTKYALQSHSHIHVGEKRFECFICNRKFNSNSLLVRHLMHHDKPDNSDLVFAAMQPHVLSESGDPIDETVEAESAAPSTIDV,AESDPRSTHERHDKDQEQSQKLDPKEVKDQTQDKIFEQDDENKKTLPVCEICGEECKHNMALKQHLLSHDPTTYQCQYCDWFFKRKGCLIFHLRTKHKISAGRKWLRGTIDELLERDNALEDDSEEARLEEELRKQKRLQRLANNPKGPRHRCKLCGKECEHTRALKQHVMSHDPKSFQCKFCKWYFKRKGCLVFHLRTKHQVSVGKKWSRLEKEDLMTRSKTDEEENDEDNDYFDPN\n
SPU_007361	SPU_007361	none	Inspection of the tiling array suggests that glean may have missed the following exons: VREGIPHQKVHAEAQAKAAHDTIQTLQVQLLRQDFQRQHWARAPRAPPQGYPASCLPDLWKGIWDQVFLADAPAGPYRREEVLVPHLRSEICLEQHPYTASPAA,KKFPSEGHLKEHAAFHKEMRDVRPICEVCGLECKHNKALKQHLLSHNPHAYQCEFCKRYFRRRGCLVYHLRHIHQTFVGKKWTRGNAQEQMTRYVGPGEEDEEEEFHPELAKDMLGKHILIKSKKPRVFKCQFCPKKFIRHNLVCKHERTVHMNNGRFKCEFCTKTFMEEYNYTLHKRKHTKERPFKCTECPQSFASEKALINHQPEHRGERPFKCDECGKAFRTRKYMLKHKRRQHMTPSRLFKCSYCDKTFKDNTGRERHERRHKGIRPHVCLTCGKAFGTKYSLQTHLQVHTGEKKFSCHICDQRFALNNTLIRHLLRHDKVAASEDPALITMQEEVTVQNNSTASTGLQEVQL,GSNVQGHYQQEPAPHSQHGMPLQQIPSLPSAQQPSPPLQQGQAHTGSGDAYHIEDLSHNTSHSRLATSTSGGPMNTTPGSIGGCANTTKSNSRVIASTGRTPKKRAAPGSSPKQPSKPPSMRAPMTQIVTPQAYQQAQMLQQQQQHQPRPRECMSFSKHCQTEPLFQQVYNASMQFTKENDLPDGLEFTEDEKEKKISGVVATKDFEPGVEFGPFTGEFVKEGLGCFNPNTWEVIEQNHK\n
SPU_009474	SPU_009474	none	Inspection of the tiling array suggests that glean may have missed the following exons: CQVACEPTGPIPYAPEHTETHEAEIIRVQQHPNISVQGVHCSEEWQRLYRERLWTQQTSTSSTSKLEAAIHSEVQRRSPGANGQRPYVYPASALIQSDASV\n
SPU_009553	SPU_009553	none	Inspection of the tiling array suggests that glean may have missed the following exons: IVCMGCSKMFCLEENMSQHLRRCKGLRELLKRKKNLRKISSNSDDDDDDGFIPNEEEESCGVLKALEGPDRAMGQSTMGDLNETETFGVETNRQGEDSKGRVIDMLGVRSFQDASKPFKCSYCTKRFLTKNRLLRHKHNRHPKSSVFKCDHCDQTFPYKHRLLKHLPTHNKDKRYKC,KQPLEGRSHLEPPETSWSIRELLCDKEEHFVCVTCGKHFPTNGRLKAHERFHESTCEKFECDMCGAVFKTSLSLMRHKKIHTEIQFKCTLCFKKYTCRSHLSRHMHTAHGFERVRGKILCMGCSRKFLLEDDMLKHLKSCKG\n
SPU_009642	SPU_009642	none	Inspection of the tiling array suggests that glean may have missed the following exons: RSLFLFLPLFFFAFSAKASTRLFWICSCSGSMMAPASISFAAYSRLIPSMDDVRLGPVLALQLWFFKYFSLAYPLLHLSQWKGKSLVWSFMCSARLGLRLNVLEQWKHLKGLIPLCVMM,CSFGTRFGVTVMVLQVLLPGISLVALVTMERKVVGVELHVFSQVRPATECLGAVEAFEGFDPTVRDDVSFELVGSVERHVAACHRVEWTLEFLIRFMDQHVSFEFVLTVELCGADLTAEWFLAGVNENVRLQIVLTLKLLVTDEAFMQGLGAVGDEMASQVPLTSKDLVAFWTVELM,LVYFYKELTPFWLSADEIFDKHVIPILITMPWKAVCESLLPRKLSTEMNIIFCYLFTGGGTKHSFNYQSTEKTIRQSLTGLKKKSFCLSNSKKGFTIREFTD\n
SPU_009685	SPU_009685	none	Inspection of the tiling array suggests that glean may have missed the following exons: LRPLLDAAICSLFCVFLPSNSLSLSLSLSLSLSLSLSPPPPPPPLRLYLYLSLPLLFSVFLPSNSISVSLPLPHSSLFVSSF,FVSLSSSLPLPPSSLFCVFLPPNSLSFFLSLSLTLYLFLSLLLSVYSNLLILSLSLFLSLPLLFSMSSYLLPSLSLPISSLM,MLLYVLFSVSSFLLILSRSLSLSLSLSLSLSLLLLLLLLFDSTSTSPSLFSFLSSFLLILSLSLFLSLTLLFLCLPSNLSLSLPLYLSLPLLFSVSSYLLILSPSSSPSLLLSTSSSLFSCPCIPTS,FSLLLPLPLSYSLPLPLSSLVRVFQPPNSLSLSLPLPPSSLFYVFLPPSLSLSPHIVFNVRQSIHTVIALLVVCLEMALSCRSSTVGSDCLAQGSSKTVRPRDSSSPSGLGICHRTKQ\n
SPU_009831	SPU_009831	none	Inspection of the tiling array suggests that glean may have missed the following exons: ANILSLNQQSATGGQQIVTNGNVQYNITPQYQIDSEGNLITTHVATPVSVSQAQAQTTQTVVRQAPTTTSQATNVAGSNVAQIALPGAGVQYIQNGQIIQAVAPQPAPTQQRIMLGSQTITLQLAPNAVMSNANSHDQPVVTPVYNIISSTPPQGNQDNSAQTQQLQIQDQLQNAQIISNSGQIQAGNAANSNQATFVQVAGKPGQIILQQPQQAGQVQQIQVSGLQTSNTNVQTTQIRQQSGKVVASVIPQQVQQVQQVQQVQQQQTTTSQIISQQSSQTVQQQPQAQVIQIHQPIQGQAGTQISLQQQPGTGYYTI,VKPKLKLPKQLSGKHPLQHPKQQTWLEAMLHKLLCLVPGYSIFRMGRSFRRLHHNQHRLSNALCSALKPSPYNLLRMQSCPMQIHMINLSLHQSTISSAPPRHRATRTILPKLSSFRSKINFRMHKSSQIVAKFKQGMPPTVTRQHLCKLPVNQDKLSCNNHNKQDRCNKYKLAGSRHRIQMFRRRRSDNKVARLWHLLSHSRYNKCNKCSKYSSSRQQHHRLFHSSHHRQFNSSHRHRSYRSINPSRGKRALKSHYSNNQVLATTQ,CKSSPSSNYPNSCQASTHYNIPSNKRGWKQCCTNCFAWCRGTVYSEWADHSGGCTTTSTDSATHYARLSNHHPTTCSECSHVQCKFT,RAPDIEYKCSDDADQTTKWQGCGICYPTAGTTSATSAASTAAADNNITDYFTAVITDSSTAATGTGHTDPSTHPGASGHSNLITATTRYWLLHN,AAKSEHQFPCDWISMFVLVKRQPSGLLLEWNKVQKHNCGGPAVSSLTDIYKNPLFSKSTDWGNRYVRALVRLCRNWSRETTLVKRLPSCKSILTQIHAA,PPPLSLSLSSPPHTPATQQQITQQQQQQISQALVGMKAEKQQQASWQGVIQTEATGGAQGGTVTTISTNGTNYPPTMASYELQIDQSQIPKQEPPKKVRRLACTCPNCKDGDGR,LILSSLISFFFVLGEKKFVCKSCGKKFMRSDHLAKHQRTHIRKPGTVSMKGQSGGGDAPLQVDLSQGVDEFDEEMEKVMRDPQHEAMVQDAVQEAVY,SVLQSIQQINGGQFMQNPIVLKAPQTQVQTVHLQHGGTPVATPQSSSSQVQQQVITTDVSPAISTANNSTIAGLPNYNVHLAPLSPGPGPAGNTSAVNINTSTAGMFNSHLLNIP,VIFRPSDQFQLAPLYVFPHLFVCLFFCRNSEKGKKQHICHIADCGKIYGKTSHLRAHLRWHTGERPFVCDWLFCGKRFTRSDELQRHRRTHTGIPTQIII\n
SPU_009832	SPU_009832	none	Inspection of the tiling array suggests that glean may have missed the following exons: ALLAATCSKIGTPAEGQAGAVNQGQTVTVLGQNQGQAVQIPGGFINAANAQQIQQALGLPPGFPLQFTTASAGVAGQQGTAAGGPMYIEVGPGGNIPSSSVGGATPTKSINAANILSLNQQGATGGQQIVTNGNVQYNITPQYQIDSEGNLITTHVATPVSVSQAQAQTTQTVVRQAPTTTSQATNVAGSNVTQIALPGAGVQYIQNGQIIQAVAPQPAPTQQRIMLGSQTITLQLAPNAVMSNANSHDQPVVTPVYNIISSTPPQGNQDNSAQTQQLQIQDQLQNAQIISNSGQIQAGNAANSNQATFVQVAGKPGQIILQQPQQAGQVQQIQVSGLQTSNTNVQTTQIRQQSGKVVASVIPQQVQQVQQVQQVQQQQTTTSQIISQQSSQTVQQQPQAQVIQIHQPIQGQAGTQISLQQQPGTGYYTI,VKPKLKLPKQLSGKHPLQHPKQQTWLEAMLHKLLCLVPGYSIFRMDRSFKRLHHNQHQLSNALCSALKPSPYNLLRMQSCPMQIHMINLSLHQSTTSSAPPRHRATRTILPKLSSFRSKINFRMHKLSQIVAKFKQGMQPTVTRQHLCKLPVNQDKLSCNNHNKQDRCNKYKLAGSRHLIQMFRRRRSDNKVARLWHLLSHSRYNKCNKCSKYSSSRQQHHRLFHSSHHRQFNSSHRHRSYRSINPSRGKRALKSHYSNNQVLATTQY,FRGKFDHHSCCYPCQCKSSPSSNYPNSCQASTHYNIPSNKRGWKQCYTNCFAWCRGTVYSEWTDHSSGCTTTSTNSATHYARLSNHHPTTCSECSHVQCKFT,YKCSDDADQTTKWQGCGICYPTAGTTSATSAASTAAADNNITDYFTAVITDSSTAATGTGHTDPSTHPGASGHSNLITATTRYWLLHN,ILILSSLISFFFVPGEKKFVCKSCGKKFMRSDHLAKHQRTHIRKPGTVSMKGQSGGGDAPLQVDLSQGVDEFDEEMEKVMRDPQHEAMVQDAVQEAVY,SVLQSIQQINGGQFMQNPIVLKTPQTQVQTVHLQHGGTPVATPQSSSSQVQQQVITTDVSPAISTANNSTIAGLPNYNVHLAPLSPGPGPAGNTSAVNINTSTAGMFNSHLSNIP,LPPPPLSLSSPPHTPATQQQITQQQQQQISQALVGMKAEKQQQAPWQGVIQTEATGGAQGGTVTTISTNGTNYPPTMASYELQIDQSQIKQEPPKKVRRLACTCPNC,SIPTDTPVRLPSFICMPFFCRNSEKGKKQHICHIADCGKIYGKTSHLRAHLRWHTGERPFVCDWLFCGKRFTRSDELQRHRRTHTGIPTQIII\n
SPU_010295	SPU_010295	none	Inspection of the tiling array suggests that glean may have missed the following exons: IYLFSIPYIHSHHQPCLPLLLFVCPISSRDNVAQVNIAALVGNALRSVVPSPLLITRIPSLRILFRMTSMLPLYLFFPSNPSA\n
SPU_010922	SPU_010922	none	Inspection of the tiling array suggests that glean may have missed the following exons: ISDSRVQQLDDNLSSQAERGTPLDSSESDSSTSKCGPCNYANDSGKKQSVAERETLDQGLSHHVYGQMDTERTKECDEKCEETLIEGIGQSSQFGGTEENLAKPLFTLVAGEPYISQCRIL,SFNPHEKPPSRKPYQCLVCTVTGCGLYRNRVWFVPQQGVVCTATGCGLYRNRVWFVPLQGGVCTATGCGSYRNRVWFVPQQGVVCTATGCGLYRYRVGFVPQQGGVCTFTGCGSYRNMVWFVS,FRIVEYNSLMTIYLHKRSGELLWIHPKVTVLQVSVALVTMPTILVKNKVSLKERHWTRGFHIMFTDRWIQKEQKNVTRSVKRL,LKALDNLPSLEEQRKILPSPCLPSLQESRTSASVEFCNKEFSNVLDVESHTCLMRPTREMIFKCSLCNKKFTQSTHLLRHATDARNHKGMKSLYQCSLCNQRFFYLSSLLKHVKLHSQRYPCLVCDLRFSSEKRLSMHSWSHRKDEPCECAVCKKTFPDAMSLAYHTRTHVGRNPYQYSACDLRFSDEKRPTRATRYERNRTGKNLFECHICNKIYLYERALTSHMETHTIKKLLCSSCGELFHDNFDLSLHMRSSHTGEKPYQCSVCSERFSKANSLLIHMKSHPAENPTNVWFVP,QGVVCIATGCGLYRNRVWFVPQQGVVCTATGCGLYRYRVGFVPQQGVVRTATGCGSYRNRVWFVPQQGVVCTVTGWGLYRNRVGFVPLQGVDRTATWCGLYRNRVWFVPLQGGVCTVTGWGLYRNRVGFVP,KATQQKTLPMFGLYRNRVWFVSQQGVVCTATGCGLYRNRVWFVPQQGVVCTVTGWGLYRNRVWFVPQQGVVRTATGCGLYRNRVWFVPLQGGVCTATGWGLYLYRVWIVPQHGVVCIVTGCGSYRYRVGFVP,QGGVCTATGWGLYRNRVWFVPLQGGVCTVTGWGLYRNRVWFVPQQGGVCTVTGCGLYRYRVWFVPLQGGVCTVTGCGLYRNRVWFLP,ESSEEGDTLHHRESQPGRQRHGYAIGHRTPNFTRATTAITSSHDVQDPPPPDSHPSSRLHLTERQSYKKSTSSSFHQTWYIL,NSLEKVQRRGIRFITGNHSREDSVTAMQLDIGLPTLQERRLQSRLAMMYKILHHQIAIPLPDYISQKGRATRSQHHLRFTRLGTSSDSYKNSFFPRTMKGWDELP,XXXXXXXXITRKDKDGDCAEEDGAIEMNLQLSDQVHVNDGCSKEDGGKEIHASQLGEVNYLTLTGYVKEEPLDSDSMEGVLNGANWGMIKPQPLGEEG\n
SPU_011272	SPU_011272	none	Inspection of the tiling array suggests that glean may have missed the following exons: GWSSDGTSHHTRDIGRAFHPCGLFGGTARKLSKKIVFHIPHTRVTFPLRETSVYEKQGSPSKNKFYHNQSNGYAQHPGVFSKRVHVDCPL,VTWCTKTFLCHFALEPVARDRTQDTWCTITFLRHFALEPVARDRTQDTWCTITFLRHFELEPVARDRTQDTWCTITFLRHFALEPVARDQTW,TAHCSKFFITLFAFVWPSVFTIGMYSQMVILQGTLCEEFPTAFGTMIQTFFRVKSHDVSIATPLGSELDIAQGTRKSLDTGMYCKLMSLKVTGIAKCFVAL,PPEEVREPMRLSIVDSHVYTDEALLTEGLITDGTLMFFSFLHLSFKHLSFLHLSFLHLSFKHLSFLHLSFKHLSLKHLSFKHPFFLHLSFLHPSFKHLSFIHLSLKHISVMHLSFLHLSFMQSHMFRKATICKKLFITLRTLRRLLIIVPSHVHREGTLSQEPDTTH,VTYTKEPFIALWAWKRLLDIFRTFSPCIMTIFTMFFSGSGGSCKTVDQDLTSSFCRVDCVRITTLLSQRAYSCLFLTFECEQWSVQSTDDICH\n
SPU_011280	SPU_011280	none	Inspection of the tiling array suggests that glean may have missed the following exons: PSPYPISDIHVHTRGSLNVITEKRSSLKTWEDLTPGHKAFRPAQLKAHHTGPKLTALSNAILIFITGWMFDLKSHYDGGGGGTFVISDFDHIVQCEIIYMFSC\n
SPU_011583	SPU_011583	none	Inspection of the tiling array suggests that glean may have missed the following exons: NLPYPFLISLCLSLICGFRVILFLPLETYRPTSVNNSGIGSYGLLENTTFEQIRRVMETNFFGAVRMTQEVIPIMKKQRSGRIINISSTTGIFGEWKDAFIIE\n
SPU_012083	SPU_012083	none	Inspection of the tiling array suggests that glean may have missed the following exons: CISVSVSAVSLSLSLSSSIFSNNMSLSVTLFSLTTSLFSGFYLLPSFTSSHSFLYNLYISLSLSDLPSLSPTTCHIHCIMKTYT,YKFKHISIDAYQSPSLLSLSLSLSLLLSSLIICLCLSLSSPSLHLSFQVFTFFHLLLPLIHSFIISISLSLSLTYPPSLLLHVTYIVL\n
SPU_012546	SPU_012546	none	Inspection of the tiling array suggests that glean may have missed the following exons: LVVHSHRRGRGSKQQCKTVHEFHSFEEEGVVRCELYGDVLEPAFRALGQTIYHLTSVPAIVLSFLIQYHTDISELAPVEVEAL,LSAIKASVLSLIDIANWLRSILRSWWCIVTEGAGGPNSSARLFMNSIPLKKRGLSVVNCMVMSWSLRSGLSARRFITSQAFLRLSSPF,IFLMGESPEYAGKCVVNLAADKDVIKKTGRVLLTMELAEEYGFTDVDGHRPMNYRQLKALALMGGHTWVGAMIPGFIKIPFWALAAFTHKF,LLDGREPRVRRQVRGQSRGRQGRHKEDGPCAPHDGACRRVRFHGRRRPQADELPSAEGAGPDGWTHLGWSYDPWLHQDSLLGPGRLYSQILIGHIFLFLIVKNMEEERPEISMCNSLLYVFLYNDRKKSYMLAMQYCQS\n
SPU_012632	SPU_012632	none	Inspection of the tiling array suggests that glean may have missed the following exons: CSLCSGGDSSLQSSYQSMRNSQEMSGDRSCNGARHLPEDFPLQDYLGTQSDSSERESCTSENNSYIIPSQVPYFHEKRAANRETPDATTSYHKPGGMDKERNEDIMMELDETINQSSQLCATGSVQNDPSCSLAKEKPFLCCVCSKGFALRISLSRHMTIHG,EDFPLQDYLGTQSDSSERESCTSENNSYIIPSQVPYFHEKRAADRETPDATTSYHKPGGIDKDRNEDIMMEFDETINQSSQLCATGSVQNDPSCSLAKEKPFLCCVCSKGFALRISLSRHMTIHGG\n
SPU_012911	SPU_012911	none	Inspection of the tiling array suggests that glean may have missed the following exons: KSPSRKRGRPRGKTSSSVKSPKIAFSPEPATSQDDPGHLGRGMRKRKKKLSLDEVNSEDVDDGDDDDGDDDHGDDDDEEDILDDEEHDNDSEVEDESTEEQISNEANKNTIKLKPLEFKPRRRGRPRKNERRTHRKRDRNWSTMDDVKPKNEERVKLNLVVPMKVFKKILRDRADEWQKTYDDEHSLNLFRCEVERCRGMGMPEAEFDVHMKCHVSNMEGFRCFICQFMCLHWRNMRHHYCKVHDQMLSKVTCDFEGCQKEFPKYGALRTHVTISHIKPDLVTKLSTSTSDLSEFSSYLDNVKIKKIGKEDHEEDDEDDGEEPRKQRGRPPKRKRPVGRPPKDTEGYSRRQNLQRRVHGEDREKFQRRLVAFVCEVCSAKFNEEEKLMEHSLRHYHNDNDQINCTECEAFVTAEESSLRIHMSEEHKRLLQLHRCDKCNFSSNRFHDLKKHNIVHTGAKNFMCDKCGKCTTTPYNLKVHYRRMHASDEEKKIKCISCEYRCADKAVLKVTFHL,LFSNFIGATNKIISEHVMCKHANVRPYHCNICGWSTAYSGNMWKHVDTHQKELGDKMPEFPVNVVSTENHSVPTPLRAPSGKKRGMNKASNFKLKLAKPGKTRRQQQAQKTQQMEQTATISILDDNVQTIQVQAGGNLPEGVLMQVSE,QPTCIDCLISFYPKVPIHMGPNGQMMVTKGLSEEESIGSSALSRLAAAVASAQEVHIIQGNEGLEGGGQHQEHRIIATQVCLAFCPFF\n
SPU_012912	SPU_012912	none	Inspection of the tiling array suggests that glean may have missed the following exons: SKSEIGNHFLTDHMEAYVSPIKEEGKTVSTNDSKEPDGEKEMEEKAGEEIGEDEENLAEEVEEKRPAPRPRGRPPKKRNQKPIKKQPYYYIEEVEVKDEGEETGEGEEEPKEFGRGMRRRKKAIPRHILRYELDDDEEFMEDEDYNENEERNVLPKKRVPTIPPIVMKGKRGRPRSSGSQEKGRQDEGSPKLKTPKIFKKTPTVSKLPLSDKVIERILDDRVNEWYLVFQEKHVLPIPCPFDGCALDVTQAELDVHLQCHAANLEGFRCPIDECNFLCPHWSNMRVHYRKTHEPTFYRLLCDLDGCAMVFPRIDKKSIHIHVTRKHLRPELHDELSSKDFNIEKYEKYISIVKAEEADLMKQVGEQAEDDEGEAGTSLGEHSNVVHVQQMSAVEQLETESDFDTENSRLKKRGRPRGSTKAAKLARIAAGEVFEKKGKRDKKGKGKRSFQRVMNNVSFFCDVCGGKYKSEQAVFDHKTLHYRDENCNVLRCTECTEYSTEESSELREHVALSHKSLLHLHRCDECQFSTNRYHDLKKHILVHSGSKDYMCDKCGTCTTTAYNLRVHWRRYHAPESEKNVKCFACDYMCADNGILKVLFFSFSF,VTRASCICIAVMSVNSQPIATMISRSTFSSIRAARTTCAISAAPVPPRLTISESTGVATTLLNRRRMSSASLVTTCVLIMAF,LFIFILQEHIRSKHGLMVYGKDFDNARPLPTYACSQCDYIGRKKSSLAYHMRIHTENRQFKCHICPYASKTKNNLLLHIRTHEGLQPLKCPECDFRGKNDKGLDNSFER,HLGNYVHIFYPFLQDHIKQHHKSMLKNPIYHNCPHCDYVGHKRQSLEFHMRIHMEQRRFKCHLCPYASKTKNHLKIHMQTHDGFQSASCPDCNFKGL\n
SPU_012914	SPU_012914	none	Inspection of the tiling array suggests that glean may have missed the following exons: SKSEIGNHFLTDHMEAYVSPIKEEGKTVSTNDSKEPDGEKEMEEKAGEEIGEDEENLAEEVEEKRPAPRPRGRPPKKRNQKPIKKQPYYYIEEVEVKDEGEETGEGEEEPKEFGRGMRRRKKAIPRHILRYELDDDEEFMEDEDYNENEERNVLPKKRVPTIPPIVMKGKRGRPRSSGSQEKGRQDEGSPKLKTPKIFKKTPTVSKLPLSDKVIERILDDRVNEWYLVFQEKHVLPIPCPFDGCALDVTQAELDVHLQCHAANLEGFRCPIDECNFLCPHWSNMRVHYRKTHEPTFYRLLCDLDGCAMVFPRIDKKSIHIHVTRKHLRPELHDELSSKDFNIEKYEKYISIVKAEEADLMKQVGEQAEDDEGEAGTSLGEHSNVVHVQQMSAVEQLETESDFDTENSRLKKRGRPRGSTKAAKLARIAAGEVFEKKGKRDKKGKGKRSFQRVMNNVSFFCDVCGGKYKSEQAVFDHKTLHYRDENCNVLRCTECTEYSTEESSELREHVALSHKSLLHLHRCDECQFSTNRYHDLKKHILVHSGSKDYMCDKCGTCTTTAYNLRVHWRRYHAPESEKNVKCFACDYMCADNGILKVLFFSFSF,VTRASCICIAVMSVNSQPIATMISRSTFSSIRAARTTCAISAAPVPPRLTISESTGVATTLLNRRRMSSASLVTTCVLIMAF,TVLDGQTITLAEGISEEAAMGASALSRLSQGGEITVREVHFLQGGDNQQQHHEMITYNVPPVAVTQQIIAGGDMGHVISEAHYQQQQQQLQQEHEVEQHHYVEAGLQTVQVVTSHQDNGVPRAVTEQVVHAMPHPQDHHQDHHQDHRGADQNQDQPVVHQLIPMTLPHEAELVMSMMQAHQASISQSQ,SRDGTGGARHAAPSRPPPGSSSGSQRCRSKPGSACRAPAHSDDSATRGRASHEHDAGTPSKYLSVTVTYHQEICGSRGGAGAMRPFI,LFIFILQEHIRSKHGLMVYGKDFDNARPLPTYACSQCDYIGRKKSSLAYHMRIHTENRQFKCHICPYASKTKNNLLLHIRTHEGLQPLKCPECDFRGKNDKGLDNSFER\n
SPU_013406	SPU_013406	none	Inspection of the tiling array suggests that glean may have missed the following exons: FLSFSLSLSLSLHAPPLSLPLYNLFXXXXXXXXXXXXXXXXXESSEVLKEQAASCCNSNISGERNFAQLDSHLHHAPNIGIGKIESKVMFKANATRHWLQNKPQGSRKELIRNNIKAGAKERKREMENKVTYREKVKRRVREKQQKLTEKAEKARNKVEELIESILQDGVIEHRDECETKVNGMSKTRATGMLKAQIQFRTKILGQDIGKHALSKCSVDELKNILLSIPEPTDKNFLTDVKDPGQIINREFAQKWEKDEKEQWYNATVINLKDGEFEVSYQGNNELFYMTIAEFFTDIHL\n
SPU_013407	SPU_013407	none	Inspection of the tiling array suggests that glean may have missed the following exons: YHLITFLILWSFLRSYPSHNTRSFNTRSSNMKPADAGVRRGAILVLILPHPPHQLKHQANPSRLLNPSLQSPSLQKQLQASPFW,TKFGWMGFPQAAPINEESRLLYLVIQLMKNHPAVSSLSPTKTATNIRLRYKTICDRIMDDPLLSTLNLPLPNINSKSITNFISKQETKYNLMSTAQPKVVSHRRVISFDNIPDPVELPEVIPKPQYQELQYKIVQHEAGRRRGEKRRHPGSDPSSSTTPTEASGEPITSPKPIAPKPIAPKAAAGIPILVV\n
SPU_014197	SPU_014197	none	Inspection of the tiling array suggests that glean may have missed the following exons: TPFLLHLEISHYLLSLPSFLLPSLVLTDFFCTLYLLSPFYLFSFLPGSHILFLACFLSLSLSLSLSLSLPLYVYFSLLTHHPSSLSILHELRIMRICSNPEFSLLSA,CSLQVALHVSSLSSRLSFSLPLSLSLSPSLHFSDSLPLRGYYVVCGFANELELPGRCAAMYKCHWPLRTHTHTRCIHLQHTLALSHTHNTHTHSELSAHTAGKGGLIHAYRTKGLCS,CSCSHRALHSKLLLVSCASSSMLKIKPLLMILPTHSPCYYLYCHNTCHLVKICNSWFLFCAHPNICFRSLSSSPARFVVNIQVMNLYFREKSHSLLMSIPHPSDI,TGVGCMLLADYAIHDRAVHQSDRGTVFLHLDHNSRNCTTPLTTNCWKCIISLKEKEREGKNTLPTQHSNKSRKKRGTVITDFDWVCMYDIGSAPPIHGAGSVTHSRHLNTVHAWSSNDNEER\n
SPU_014684	SPU_014684	none	Inspection of the tiling array suggests that glean may have missed the following exons: LNFSDAQKFQCSLCDGLFTSAKLILRHIRCEHNSGDEMIPVLTWKKKKKKKYVAIEVSSKQQLATQIKVSDQEESAFKCGTCKKVFPSFGRLMAHELFHEKEQASPVNVKDSLTVTKQMKPQGSKQYVCSECTKEYKSWRSLNRHEREAHGYRCDFCLERFPKKKDCLTHEQTHQAFKSSQPAGKSKASPTKSRAASTGIQPSEPAPDEPKDMLGRPTDYYKRPYKCRFCTKRYSSRGTAERHEKEVHKGEGDFKCSYCTKVFATVSRLKDHLVLHKYVNMYRCTECPRSFASESALNNHQGEHTGLKPFKCEVCGRGFRTRKLALKHKRRIHQERPKRFLCTFCDKGFADKSDWKVHERRHKGIRQYVCLECGKGFTSSTSLAAHKQAMHIKVKPFSCAVCSKSFALNHQYNHHMAKHRLEGEGNALASMQQS,SGPVHSSANHLHHAHLQRPVAFGGHGVKFSSLPASYTPPITAQEPAVERNDVPVSLTTCVPVVERTDLPTTTRESVIDIPPISMRESPTDSCIPPTTKQELAEDSYMSPSTAHESPIDSSISPTTTQESVEDSSIPATTTSEQVTDSINIEPVATSEVAMDDDIPPAKSWGTVEERSIRLVTTHKPQAEGNTLRAQGEDSL,LGQSSIKSAPSLQSTDHRKQCLPSSKPFHQANASPPAPDVSAAEPPLFAALNLTKTISVMDLPLSLSLRVASQDKVEGVVAKDTVEKGVEFGPYTGTLLDEEQGSSKETTWEV,VSHLLNQLLHYKAPTIESSVYLRQSRFIKLMQVLRPQTSLLPNLLFLQRSISQRQSPSWIFHCLSHYELHRRTRSKEWLPRIQLRRGWSLDPTQEHCWMRSRDRLRRQPGRY,ANIIHQTPPQPPVTLPVHIRGHGVKSSTLSANYAPPINTTHGAVEEERNDLQIATHGSVEASKMLPLAFHKSVVERNFQPTTSYESGLHVESNALPIITCKSA,TRCPGVRDRTLVVTQRSVNKLFVARNCHSFTLKKPRQFILWQFTAPYLNKPLLSLSLSLSPSLYATSLNLENELSSASTDSNLTLYH\n
SPU_014685	SPU_014685	none	Inspection of the tiling array suggests that glean may have missed the following exons: SPVISDAQKFQCSHCEGLFSSAKLILRHIRCEHSDGEPCEMMPALAWKRKGKKKGREKSVAIKFKFNHPINHPIVRKRKNSEEEEECDFRCGTCVKSFPSLGRLKEHELFHEMMHGDKPYECSECNQRYTAQSSLNRHEREVHGFLDDYKPRSRPKRLKAHVPKKPLHCRYCGQGYKSRGALANHERRIHGSRHPIREPDLPNDEPKDMLGRPSDYYQRPFKCRFCPKRYVSWTTVEQHEKEVHTREGTFKCSHCPKVCASESRLKEHLVVHKYMHMHRCTLCPRSFASESALNNHQGEHTGLKPFKCEICSRGFRTRKLTLKHKQRMHQERPKRYICSICNKGFAEKCNLKVHERRHKGIRQFVCLECGKGFTARFSLTAHMQAMHIKERPFACEICGKSFALNHHYNHHMAKHRLDGDDSIPQ,RRMYRKSHFTVVTVAKGTNHAVHSRTTRGESMALGTRFGNRTYRTMSPRICLVDPLITTSDPSSADFVQRDTFPGQRLNNTRRRSTREKALSSAVIVPRFAPVRAV,SVKEIQTIKQREQCSSSSHQASASSSSSDTSNPTPNTSKDESQLLAALNLKKTKSIQDLPQNLLFRATPEGKVDGVVAKERIEKGVEFGPYAGTLLDEEQGWTRDTTWEVRRAVFHKTVF,FPLDSAHGVSNAGIIHQARQQLPVHLRGHGVKSSTLSANYAPPITTHEPIRERNDLPITTHESVSSIIQPLTTPESGAKSNVPRPQGTVCNFCLVGFC,MKKHEPKFYRCKKCNQKCKTKTALNKHEREVHGHQCRFCSERFFKKSECMKHEQTHQAFKSLKPAVKKHESLSKTQASSPTLIHQPSEPSPSEPKDMLGKSTNYY,TRCPGVRDRTLVVTQRSVNKLFVARNCHSFTLKKPRQFILWQFTAPYLNKPLLSLSLSLSPSLYATSLNLENELSSASTDSNLTLYH\n
SPU_015358	SPU_015358	none	Inspection of the tiling array suggests that glean may have missed the following exons: STASWRTESVSALTLMTSSDLRAWQCGSTVLSMSDVRAVLALLHHSASSSRCSKGVLFKTLPYSPEVKLALSDIMKTRSFRR,PIIFFHFFVRLSPTAYHSNFCEGYCPFPLDSHFNGTNHAAVQAILHTRKMKRRDGRRIPSPCCVPNSFTGLSVLYLNEEKNVVIKDFEQMVATSCGCH,RVLPLPPRLSLQRDQSCCRTSYSTYQENETQRWKTDSKSLLCTQQFHWSLGALFERGEERGHQGLRTNGCYKLRMSLMSVMQPRDRHRLWFHQ,LHGLDNQADDVSDGLLIIVAIASLIVLWSLSLVRFLFLSLSFGSSEFGLVISMVPHLEVLGWYEGCDEESRHDDQAQIQPAQMENFKRKTRSM,AGTGGCIVGAGSTPSEAAGTGLVAEGAACNGVGACGAVARLVLGAVRVLGAPYGFSVTLVVVAGAGGVGLVLGGAVAVLGAWLMMVVVVAAPGVRNWVCC,PSPHPFSSSPLFHVSLSLSTSPLFPFPLLSLTLLLYISLHLSFTLAAPLSKYIHNPLLNRPPSSLPAFMPFISFFPLVSPSRSLHYPISPSSLSSSSSIIFPLYILLFPKEEPC,SKKPNPGFYENILLFVIDPKPVLTFSSPLFFLSSLPCLTVSVYLSFIPLPPVIPYPLALYFSPSFLHFSCTSLKIYTQSSA,KPAFSMIHTHTHTHAPESHTQTHAHILQYSKPKLSMYVNTPLPASIQHAPLRTYSSLHRTPRKQAVLYIEAFDLKINRSRTEM,CDGLVLLVAPPFSSGIVHLKEGINGHGSLLLGLIATDNLYVFTQSDRNDATLLGARILCRGNKVSGNESQRSHMRNENTILFFLFSILKT,VHSKVNQVGLYHYRFLPAYRTHQTGLPFNNRCWLSSNDISRHLTQFNPHTITMHESGKPTSAMDRKARFTQRTTPDSQLASKVCIGFIARVVN,INRSRSTVKLIKLVCIIIGFYQHIEHIKRAYHLITGAGYHQMISPVTLLSSIHTLSPCTKVVSLQARWIGKRDSHNAPHQTHN,EPRSPPRSSTKGLHPRSQQGFEQTQTLFILPCRLNILNITIIDFDVDDGAYDDEHDDSADHDHGGVQWTSDDVDVDEAMSRSSMLQSTLK,GNLTLTGYKDEKSFKRSSSSTTKTSTLGRQRKTSSAPVETIRLMTSLKVHKFEMKTKKKVENALIIVNGSHPKFFRPDNK,RLRRRWKMLSSSSMDRILNSSVQITSELIIKIYCVYSLCLPCPLPLCPSLSFSLSLSLSLSLSLSYNLSYLLLPQFIYMTQSLLYV,WLDGKKKRNTLFLVFSKQYTLSTTTYCRKKKFLAKGVHSVQHMTVFRVHNHPLNMKQKHFFYAINLTIRMREKQTVKQGSISRTNH,LNDFMFQQERDSGISASFTRSTCESERSSSSSMSSEELNDLSPQRFFPTTVSFDLEDDDHNTSPVPSPTLREPSGRQANSIVRRSTFYRNSEGNTQNIY,AYYTTAKTEVTNGIPYLYDDDVLVLGRLLYFVKCGVFSSRCSHEGSQNHDDQLHHSTCYMHCYSNCIPAKICPTARSQWST,SLFHLSSLLSLFSLFSLSPSSHFLLYFLLSVSLFLIYKVYIQTEIIISNNYVCKERRRGTECGLPLQLCGSCLVLVQLHVHVFHLYSN,FSSKGYIFPQNHIHFVPLCIGTKLDVEGSIDIYCGGYISISYFYDIDRIFCLNELHMNIVLAKQMHSLLSLFVFDRLTRY\n
SPU_015640	SPU_015640	none	Inspection of the tiling array suggests that glean may have missed the following exons: LIHQKEYGMKSKSWRANFHLQHQMSSSTVHVSVCPNFLYRLLINRRKCKRRAHHPAKGHDEMLSIGSNKYHRVFLESSSTRKGERF,SLVYRHTWFTLLRVVCSKYFNINLVHSQLFYHSSRTFVLLFLPINSSMIYIFFLKTASPFPLSAQALSAAPFSTTWPLLGPRGSLQSCPQSEMFH,SELVNSLERQPTSIALYLMCYLGGGMSSKGYRLRSVRRLHVSRAQERKKRPVRFLCNLSPYRVAVLMTIFGHLVAWQSWRGHHLFSYHCAPILFISFHLSLSLSLSLS,SHCLSASLSLSLFIFLFYPPQSLVTSPSCCPSLHFFKSLNLSSHLPLFSTFSPSLPISSLSFSLVISLISSLDLSCKIYPGHFSFLPPQLYLIPFIMYLYLYFIITFVF,KPTRPIFPHSRFALFPYGINQCDAATASTSLSFPLALFTPLSIYLYHHSFFLEGERKKNKSRHTEARPGFSPKLGSEILKQ,SARNRKQNLTVTTRHLDYFSYDNQLPLTTPAPPFSISLSLSLSLSLSLSFYFFLPLSLSPSLSSSSPSPLLLCNHPFLLFSP,STPSYHPRSPFLYLSLSLSFSLSLSLFLFLSPPLSLSLSLFFFSLPSTPLQPPFSPLFSIILFLSNFLSLSFHNASAISLPVLSSN\n
SPU_015688	SPU_015688	none	Inspection of the tiling array suggests that glean may have missed the following exons: LAYSNSMLPTCRHHPFSDLNANFSHLSFIDFKFLVQNAIHNQRHIGITCYLINTDNYFVFIPPNQAILAPVCKFHQSSSTG\n
SPU_015772	SPU_015772	none	Inspection of the tiling array suggests that glean may have missed the following exons: PLLILPKTDFEFPVSSPEKNSMSKRQRMHMVIESTCYDQFVVEPIRSGKRGFFTYLVGLIFVRIVDDIFSHLSLPLFYLVLLLQ,IYQVSISVEFVSPTYLHVFICFPLLSLALSIVYTNATSFSMFVCFHFLNLFSPLLFVFSSSLFKTKSFLIFSPLLLPPSILCSVFILISSPSLFNSLLDINLHLSISLSPLTLPPSLPHAHCVTHFPFFTRNEFKEGYFLA,TCLYLVIKVPPTLASMCSSKVPTLASMCSSKVESYFGNRKHIWGKERDVMNLPSIYFCRICLPNLSTCIYMFSSTLFSLVHCLYQCNLIFYVCLFSFFKFVFSFVVCVLLISL\n
SPU_016250	SPU_016250	none	Inspection of the tiling array suggests that glean may have missed the following exons: RSCLPNSSQSKSSTLELSARTFMYLNTDTMPFLFKGLIRPILEYGQAAWSPYRLGEQRILESVQRRATKIIPGLRNLSYQERLTQLQLPTLIHRRIRGDMIDVY\n
SPU_017427	SPU_017427	none	Inspection of the tiling array suggests that glean may have missed the following exons: PPSSLPSSILSSILSPSLSLSLFLFLPLSLSPTPFVSPSPLLSHSFLYSSSFTLPHLHMILSCPSESHCFHFLFFSFLSFSSITPSLPSSSILSFSLSLSHPLCVSLSLIQSLFSLLQHIYPPPPPSTLIC,LCEDRIWTCMILSCPSENHFVFIFYFLFFILLFYNPLPPFPPPFSPPFSLPLSPSLSFSFSLFLSPPPPLCLHLPYSVTLFSTRAHLPSPIYI,FFLAHLKTTSFSFSIFSFLSFSSITPFLPSLLHSLLHSLSLSLPLSLSLSPSFSLPHPLCVSISLTQSLFSLLELIYPPPSTYDSFLPI,NRKKISYMATMGAPNNSYRYTCSWACPRYLAMWCPNKILWIYEHNFTSNLSFYIVSCLTLDPVELLINVKAKRFGTEHLITNSKMILQISRSKK,DCTESSLLTVPLSRVDSFFVVLEALTFVLLLDNSDCTESSLLDVPLSRVDSFFAVLEALTFALLLDNLDCAVLSLLLEILIDFVFFVVLSFKGSQSSTFLTIGNLLDFVLLLL\n
SPU_017656	SPU_017656	none	Inspection of the tiling array suggests that glean may have missed the following exons: IFGSFHHHHCFPLHHYMSSSFLRHFWISESYRHHHHYHHHLFLLLLLHYYQWLISFSFSSVAGYFEPANLLHYSHAHYHYPYLPS,LSIFDPLLFPGVLLHFWITSSSSPSSSSSSSSSLSSSSLLLSIFDLLLLLGVFVDFWIIPSSPLLSSSSLHVFFFPEAFLDF\n
SPU_017750	SPU_017750	none	Inspection of the tiling array suggests that glean may have missed the following exons: YLERSKFTLSLLSLILLLFIARCFLGRCRLRGLVQTGLSSRQMTEAMNHVQDWYNDPRSIHANEVEPEVERVTILAMGKSITELGHEENDSTCNEHIL,DLFSYSCERSSYQKVASLKTASSIYIIYRNTNGPLSSNSYRADICTLRKDNCLKLSFTGITSKLQNADCTRPLTKNIESTIHKTCTWKIFTVR\n
SPU_017811	SPU_017811	none	Inspection of the tiling array suggests that glean may have missed the following exons: GFRNQVAKILSISQQCSCAPARGQQMMCLLHTILETPGIVHITSCTCAVSERSLFIYLFLKYIPLFIIAAKVPSRCIWLAKCRDMPGEHEALFRLRNPLL,IQMEFLRCYCVMYVHENGRIFLIIDQNSLCYFILFVHSNTHLGLPQSGCQNFKHFAAMFVRTCAWPADDVFTTHNTRNSRHCAYNQLHLRSFRTFSIHLFISKIHSAVHYCC,TVYVLRSRREHMSCFAYGCGWFVFTVGKKIFIAYVSLVILCVVYFLFSLMFTHVVVYCERGNDTKYEGHILGRYVQFVTRWRQDDRVVSQ\n
SPU_017846	SPU_017846	none	Inspection of the tiling array suggests that glean may have missed the following exons: ESPVNPIELWKKGPIVARSVVKTDEDVLPVRVFNPTREQRNIKAGTAIASLSAVEEVGTEMCNQTMPTETSASRNMPPDKQGNKDVDARRTTTHFFQCDSCTKSYEKKDSLQRHKREVHILKYRCDQCDYRTGRKTEMERHQGAHEVAVVPVRKHSLTPVSSPRCKPTSTPITGTSEKADASWRHVRMDWRSPPRDDPRLDRPRGRSPRRYRSSRRKSKSPQRDLTKSSCQKTPESPQHDGSEAS,RCVNCLSGCSIQQGNMGSQRQALPLPHCQRWKKWAQKCAIRRCPLKPVQAETCHPTSRETRTLMPEVRLPTSFNVTLVPKVMRRKTL,DFLVQINAVLDCQKMELRTEWGIIPCLDSEGESFCRRIVAGEEYSIPPGHEMVLANRVTGEKIALCEGLVESPVNPTELWKKGPIVARSVVKADEDVLIACQGVQFNKGTWDHKGRHCHCLIVSAGRSGHRNVQSDDAH,LPVRVFNSTREHGITKAGTAIASLSALEEVGTEMCNQTMPIETSASRNMPPDKQGDKDVDARGTTTHFFQCDSCTKSYEKKDSL,WCQKWDPVTMAQGRNRQNRPTPGEMMKHLERSMTEVDQMMEQLRASMSQVPVADPEPGRQEAFPPTLSGQTASGPTADSQMPAAQRVRFSSTPREQSRRLPSAGTGVHITPPLFSPP\n
SPU_018373	SPU_018373	none	Inspection of the tiling array suggests that glean may have missed the following exons: LKMKDKVSRMETLLVELHKVVVERSKQTTVRGQPYEGNESLILIGDSEHAVHVDKRRFQRAVHLATSVHLTKQTTERGQPYKAVEGNESLILIGNSEYAVHVDKRRFQRAVHLANSGRALLLKLMSMVIQPDELGNFSYRGDRNMEPPLDSLIDDDRFKAIQLQVRKSFPCFDAPKNLRRIRDAVNGKCRKLRRISSPS\n
SPU_018812	SPU_018812	none	Inspection of the tiling array suggests that glean may have missed the following exons: RVLSKKYHYNPLRIHRGETLLKVCTLYIVIRESLPKDNITCHLRIRYIQAREKQFGFGAGQEFQGAHCFLPPPPSTPGTSAYI\n
SPU_018850	SPU_018850	none	Inspection of the tiling array suggests that glean may have missed the following exons: VRDIYINTQTCTYQVASDHINNMSVSKNIYSSSTDVCTFAVQNFLYIKQKQLESGLRKFYIVSVCYLVLLLNLISCKLYIDSL\n
SPU_019435	SPU_019435	none	Inspection of the tiling array suggests that glean may have missed the following exons: YCEEVYMLERNEYALCRLCELCKPLQSFIVYGSSWRSCYFGNPHGSCPQNICQIFLDFRRCCILVSCKMIRYVYCRHEEYQSL\n
SPU_019824	SPU_019824	none	Inspection of the tiling array suggests that glean may have missed the following exons: NCQVSFSTEASPAGLARKRPDPGVGAEVISQIRACAELLVTYPALVRLLACMHAYMLDEVTLRCVSLGTYSAAVGSLPGMTTHVEGEASLCTAGFVAQLAGEWTFTCMNSHVNCQVSFTTEAPPAGLARKRLDPGVGAEVISQVGRLAERFLADGACEWLLTRMATHVAFKVASFDKALVADTAFEWSL,CAMVLWPLARCSFQRICLFVLFGYKIYLTVNSNSRFTNHSHTLSRGFYIQTLPAIYLNHVIGWRSWICIRYMLALKLSCTQTIFHQVISIRSL\n
SPU_019825	SPU_019825	none	Inspection of the tiling array suggests that glean may have missed the following exons: AEFFQTYITFVRFLVGVGSHVPGQIRACAELLVAYAALVRLLACMHADVLDEVALRFKRLWTCSAAIRSLPSMATHVKGQVALRTAGLVAQLAGEWTFTCMNSHVNCHVSFTTKAPPAGLARKGPDPGVGA\n
SPU_020311	SPU_020311	none	Inspection of the tiling array suggests that glean may have missed the following exons: STNGIHNIAYAQCRKQLRTPKCDEVARSMPKRCGRTEGSCSVAQVSLEPRERSNRPRFQTNKQHHVVRIILKTEHGPIISIFYRFSKVFRSEFVECI,LCKLFFVKCLFEVQMVFTISLMRNAESSFGRRSVTKLLEACRKDAVEPRVRVASLKCRLSLANVPTDQDFRPINSTMLSE\n
SPU_020481	SPU_020481	none	Inspection of the tiling array suggests that glean may have missed the following exons: MARVFAFACPSAHTGQACSWLFSWNNSCAFSLPQEGKIFPQVPHLKTNSFLSSLSFMSLSSFSFLTCCLMVNSVSAGFFFFFLYVRTGIISSGIRLVCSHLMCLRISLADVNRPSHWLHWNF,RLESLVCLLMLHTFTLLKKPFRTEPALMPMDFPLMFIEGSLCFALLVAFLTPVELGFMFFHVNGKGLCIRLSFSTHRTSMFLAVFMEQLVCLQPTAGRKDLPTGSTLENQLFLVVTVIHVTV,KHEREVHGHQCRFCSERFFKKSECMKHEQTHQAFKSLKPAAKKHESLSKTQASPPTIHQPSEPSPSEPRDMLGKSTNYYLQQRPFKCRFCPKRYVLRKKVNEHEKECHTGEAAFKCTHCPKIFTSKAAMMIHMKCHEQHRMYRCTLCPRSFASESALNNHQGEHTGLKPFKCEVCGKGFRVKKAVYAHRRRMHQERPKRFFCSVCDKGFADKANLVKHERRHKGIRPY,RKKSQCKEQSAPATNSSSSDTAQLLVALNLKKNAIISSSQELPKGLIFKVASEDKVEGVVAKDTMEKGVEFGPYTGTLLDEEQGWSKDTSWEVGRNSGNNNIDRMFL,VRNHNVKNKVLQPQTPHPRIRPSFSWLSISKRMQSYPRVKNFQRASSSRWLRRTRWREWLPRIQWRKGWSLDHTQEHYLMRSRDGPRTHHGR,ISGTQKFQCSQCEGLFTSAKLILRHIRCEHTSRIPDEMIPVLTYKKKKKKPAETELTIKQHVRKEKLDSDMNDSDDTKELVFKCGTCGKIFPSCGRLKAHELFHENSQEHACP\n
SPU_021205	SPU_021205	none	Inspection of the tiling array suggests that glean may have missed the following exons: PRPFYYLLHILSPPVCFFLPLFISSPPPPPLSVSLSAYISPPSPFPSASRSFSLSPLLPSLSAYQSLLSPLPSASSSLSEAPLLLLLLSLSLSPLSPPIYLHPIPSRLLLTPSLSLLLLLLLSLSLSPPISIRFFLSRPHSL,FDHALSTTCSTSSPLPSASSSLSLSPLLLLLLSLSLSPPIYLHPLPFRLLLAPSLYLLSFPLSPRINLYSLLSRLLLPPSLKLLSSSSSSLSLSPRSLRLYISTRSPPVCFSLPLFLSSSSSFSLSLSLRLSLSASSSLVPIPYKTQFT,LFLPRTIDLTTPFLLLAPHPLPSRLLLPPSLYLLSSSSSSLCLSLRLYISTLSLSVCFSLLLFISSPSLSLRVSISTLSSPVCFFLPL\n
SPU_021382	SPU_021382	none	Inspection of the tiling array suggests that glean may have missed the following exons: FGEEDPFTERLEHFMTMPQDQGQFRYTDMMDTNLPHTMDGEEYLQQTMDTSSNSFTCDLCQKPCGNRTNYFVHLRIKHAEELPYECGICEERFSTEIILKDHIHSSHTGDEGDFFKCDVCAARFSEKQYLHAHMLSHNEYATFPCHICRKTFMKRKDLQKHLSSHAKTIRESIPCPMCNKVFRLPRNLAIHLRTHSATFICQLCSEKPQSQEVPTSDDHKDGLGRTSTPEENCTEQEEIPGDEQQLFCKVCSITFTDQEDRIGHKCKVNRCSNCLKDFSCPSLLAADGESCSSEENPMCDTCNKAFSLMDKVKPDEHASKVEKGTVQCRMCLKRFPKPILNYSKPISNHPKRNQPCAAKEASNSPLRFPCRICGNIYFMKSTLRKHVKVHEREHIYLCEICDTIFHKKKTYKRHLKVHDEKRLKCPKCKVT,GPTVLPLYVSFVLRNLSRRKFLHLMTTKMALVGLLLLKKTVQSRKRYQGMSNNSFVKYVASLLQIKKTASDINARLTAAVTA\n
SPU_021383	SPU_021383	none	Inspection of the tiling array suggests that glean may have missed the following exons: IEKSPSVQEIKSDLDTKGGENQPQEALHQKIHLPGESQTSSENLVAVGSSPEQLPRHHCIYCKEEFQSMQTLLHVDHACVRINSTTKRCPICPKQVGSRKKFRQHIVSHNLTCKTRHKNQAANRMMDISQVKSPGQTCRCEWCHKEFDDRNKLIDHTRIHFAYGRNQCPVCERWYTNTTYFRQHVRVHGIILSDKDRQYNIQQAGPSKEQHRRCLWCDKKFQSSDKLAAHTYLHVARGRNCCPVCGKWVGRRKRSVFRAHLLTHGIKTFCDDKGQSCKRSVKGSHATQGLKQEVKSVVDGLTQSEVQGENSSRVPTHRCAWCHLGFHDLETLVNHTHDHIHSGMNRCPVCCLSFSGVWGFKLHLSVHGIPTPYRK,QDDVLSLGRSPLSSHTQTFADSLTSGTPPCVSNSDQGPPHPDENVQAILDTVSMAASQVQTSCRLPEHCCQWCGREFDGLEELVAHTRIHIKDGKNQCPVCKKWLSNKSNFKQHLKRHGIIPPFENKSVSIEHIRSNSMFISQMGKQRKSSTSIKKSYLSRKSSKYKTSLSIGSSDLKSLRSAHKRMLIDHPSSASSHSQRDEMSLDLKSLRSAHKRMLIDHPSSASSHSQRDEMSLDLKSLRSAHKRQLIDHPSSASSHSQRDEMSLDLKSPRSAHKRQLIDH,VIDGREKGLLEENVAGDLESPITRVVRVQITKENYQDLIEDETFDFQNEMDPCQELPHAKEPCILSDPQSGPFEYGTCRLCKKACGNRRMYLMHLRTKHSEELPYECQVCKARYLDEEDLESYNRKRVGQIENGSKEENTCELCGDKTSYACHTCNMEFDNRPKLIVHQKVHCKGKHCYCHLC,SGELPVSQIDSKDEDVHRAKGQNDDISCKRPHLAKDTWTMREQTKAETPHGDGRCDEVQQSSDERGRSNSSKSHVPHVLPVGRQRQNQESQQSLEKKQNVQITGNEMSGSQVGQKGMKKQTAGLGKNDDSVLTLY,SLDANSDASASQGQRKEVEQQTNQCHIDHISHSQRSLHLTSDTQALTESPEPANQPIHRCLWCVQEFSSLNELNVHLQIHIVKGKNRCPVCKKWLCNEYYFKTHLQLHGFKIEKQRQIKSLKKKSETNQITKSQIIFRRAPVRGQRH\n
SPU_022474	SPU_022474	none	Inspection of the tiling array suggests that glean may have missed the following exons: SEDGCLEVDNEGASEGEHLKKEDHEDSESGWLPVLEGISAVSQDSSQVQPRLENSSQSERELDESMNDPTNSPPTCDEDSTGKQGFDCTKCKKRFSVESDLGSHVMTCQSARRFSRQSRI\n
SPU_022892	SPU_022892	none	Inspection of the tiling array suggests that glean may have missed the following exons: KASFSIASLPSLFSILLRLPSPSLVLPLSFPWSLSFLYLVLSLSLTLSPSSSHCFPLPLPLSLCLCLSFSTLSPFYLHHLRLAHLYL,RHLSQLHLSPLSFLSCSVFLHLVLFCPSPSLGLSHFYISFSPSLLHFHPLPLTVSLSLFLSRSVSVSLSLLSPLFICIISDWLIYIYNNLIQGNSRQFLSFGHSLPCTLTIVSRFHIIGVGY\n
SPU_022893	SPU_022893	none	Inspection of the tiling array suggests that glean may have missed the following exons: NPSMETGKNFKCKKCSRLFTTSGGLRKHLLRCHEKHHVMKSRLRSRQTQTPEACMTRAGRDSSDQPIDTIKTSAGNLTPGEESEEFCFKTKCLNQRKVCNICLKSFAMFKF\n
SPU_022955	SPU_022955	none	Inspection of the tiling array suggests that glean may have missed the following exons: ISICPIFEHSMNLNSRIINSFAVSLPSLLYSHHPRLLTLHPFLLSHQRHLLLAASHPRPRLCCCSATCVTGWYSYVCTTRCGGSGGLQRDAVAYGWPVLGSIRLFDVAFQSIPACTDLAPLTTTLI,IRGLLIRLPFLSRLFCIRIILASSHFILSSSPISAICFLPHPILVLGFVVVAPPASPDGTVTSAPPDAVAAVGFSGTQLPMGGLYSEVSGSSM\n
SPU_023727	SPU_023727	none	Inspection of the tiling array suggests that glean may have missed the following exons: LALYALYQLHERPPMQVPPPPQQDRFLNICLGGAVRHKRSFSKFASRITFWGLKFFWSREWAINLLYRNCPHPPLIFIKTARS,QVAICGWYRDGDVNAAIQKIVQIVQNTICPLIALHIYQPPFYFAPPPPPFSLSFSLCPSVKLEPYPAPFSPSLFPLKTSAWSNKE\n
SPU_024745	SPU_024745	none	Inspection of the tiling array suggests that glean may have missed the following exons: HMCVQSEERPYQCSLCNKTFSTSRQCLKHIRAHADGKPYQCPHCPRRYAEESTLVGHVSRAHSAGKTYQCPLCNKSFSRMSNLKLHARSHTGEKPYKCSVCGKAFSRMSNLTRHTRFHTGAKPFECSYCNGRFTEKRNLIQHMRIHTGEKPFECSICNVTFSRNGSLTR,FFSSDGNCSLDSTKQKTRLLHQNLGERELKDEQRQLLDGKVKCVVIRKRGSDSLQTVEIHVEAPVDKSKREDLLNGSSCSCSACNRETRFVETAERNTCTSDPASFEDEGGWITIEKEKEFDDE,LNADHNNHRGSCSEEKLIQCLDHDKKPYKCSFCGKAFSIMSNLTQHSRFHTGAKPFECSYCDGRFTEKKNLIKHMKIHTGENKPYECSICNKTFSRYGSLTSHKRT\n
SPU_024748	SPU_024748	none	Inspection of the tiling array suggests that glean may have missed the following exons: RIDICGKQSTGSLSVRSSFDNPVSASSTARLLSRGSPVTLSSVGIASAVPPSASSFVESSSGESPAALSSSLGLVVEASVCNTGPTSFSPFCSSPKLLAVSGTFSEMDITCLSNPSHGIDCQGVHSSILRQSSSACI,RSWAASWLTGPPPDISLAASSTAKPVPGGSLVNASPARGSTSVGSNQQALCLSGHHLTTLYLRPRLLGCCPVGLQLLCLLLG,GSYTVISTMKNKPPICNSMAAFPRESKCEPSAWCMVFTTLCKTGCLPSPILLKTLSAMNDCVDPVSIRNVTGCPSTLATMKRPCRF,FSHQRKRRRSLSSLGSRTCENMPLISAIKATLSCRNLVKTPNSVFVRSGPCNSSRLRQKTPNSAEQSKTTLSFFFCGCTTPWCGRYQTQPSSLLVSCSGTFSI\n
SPU_024901	SPU_024901	none	Inspection of the tiling array suggests that glean may have missed the following exons: FFLSLFRFLHMKKSMCVDQSTQTMPDIYSCSHCGSSLLAKPMPPLARPQPSGPQQLPGLESSQARCINKKDNGDDYTYDDGIVTLDNEKELLGPDKWVSEAPDHPVAIFVKEEVMSDLEQSNSMHSDKEESFFDPRQDQEFRSDSKLEHSETGWYEEEEEEEEEEEEDEDEMEDDEEIDYEALERLDPTYDPFIGSRPHQLVSIVNF\n
SPU_024953	SPU_024953	none	Inspection of the tiling array suggests that glean may have missed the following exons: MDLQNHRVPVQCSDSSEDILPCSDYIKFVQNSFNKTYKDSLYSERGVQTRCQHNIKKYHYRNSYEEVAMPVLSLFIRRQWNSGVWVHLDLDEIYLNLLDAPCDLRVSRE,WIYRIIGCLFNVQTLQKIFCPVQTILNLFKTHLTKRIRIHCIVKGAFRLDANIISRNTITEIVMKRSLCQCFLCLLDVNGTPGCGFI,ISIIITVIIIFTFINLILTNIYFTTINFTIFTSIIITVIIIFTFISIIVTFIIIFITIMEIILLIIVITNINVISITRSPHFSSFFWSSSSSSSLSYII,SASSLLSSSSSPSSTSSSRTFTLLPSISPSSPASSLPSSSSSPSSASSLPSSSSSLPSWKSSSSSLSSPTSTSSALPVPLIFLPSSGLPLHHHHCHISYR,QHHHYCHHHLHLHQPHPHEHLLYYHQFHHLHQHHHYRHHHLHLHQHHRYLHHHLHYHHGNHPPHHCHHQHQRHQHYPFPSFFFLLLVFLFIIIIVIYHIE\n
SPU_025848	SPU_025848	none	Inspection of the tiling array suggests that glean may have missed the following exons: NEVFFILFFNLLCSIPLTLGFIVPSFTFENLYPPLIQTDTKSLGKLRGIAFCLLFHFLHISIISFLAPCDLLLCCFPSSFSEAWTFQGFCLRWKTFFIFISFCLPFCTFQFLKFLHYFRMSHVVIMNFA,LLSMCLFRSCFVSNCESQSRHTMAFSSFSIDVLMVVKSFLSLPTSNLLSSFVSFCAVSWIAFCSIISLGSHVPSCDSWTFKSAVCLFLSK,TLRKWATIKALLRNDFLHWSQLYTGTCRMKIFDAVQSCNLCRSKSHVLFKITWQSGHGCPSLGWRVELGQSLSTTFAVPDVWVSLSSPSSGSVIHAA,LHALAKDEHIIGVFMDLSKAFDTLDLTLDHDILLHKLYHYGVRGVSLNWSCSFLSCRSQYTVFDNAKSTMSSFTCGNVCHRALF\n
SPU_025850	SPU_025850	none	Inspection of the tiling array suggests that glean may have missed the following exons: IYSCLFFSFLFSVSIHFGDEQHIKTEHGVAQPVQTEDTVPIREKLVTPATVSQHQDSKEVQIDYDDDEDNSYAAWMTEPDDGDDNDTQTSGTAKVVESDCPSSTRQPREGHPCPDCHVILNSTWDLDLHRLHDCTASKIFIRHVPVYNCDQCRKSFRRRALMVAHLRKVHDNNMTHSEIVEKLEELKSTERQTKGNKDEKCLPSETKALKRPGLREGRGKTTKKKVAGSKKRDDGDMQEMKQQTKSNAPQLSKRLCIRLNKGWIKILESERWNDEPQGQGNTAEQVEEKDEEDLILTRQVELVIEDGERETSTMMPISESKISEAVLKEAAEGVEENQDASTISHLERKRQTADLKVHESQEGTCEPREMIEQKAIHETAQKETKEESKLDVGKDKNDFTTIKTSIENEENAIVCLLCDSQFETKHDRNKHMLNSHTEHRQLYKCSTCGKTFVQK,HIKETKYTCELCGKLFYTTGAIKLHVDSHNKERAFKCEECGKGFLRAYLLKVHNETVHSNASHCLCEVCGSAFKSQSNLKQHNLTAHTDVYKYSCDVCGKKFKRTTHRNAHMKVHSNDPANKPFKCKLCSKVFAAQARLKVHMDWHYNIRSHTCDVCGKSFLTKGNLDKHQYVHKDKKPHECQICFHGFVD,RSRPGAFLVLGGFCLGAGIGLLGAGTGLCCTVLSASLCNNPPSDMSCICSVSVSAWTPTPSSTRSSVCDIPLSSSDWSTC\n
SPU_026209	SPU_026209	none	Inspection of the tiling array suggests that glean may have missed the following exons: SRDGARHLPEDFPLQDYLGTQSDSSERESCTSENDSYLIPSQVAYFHEKRAADRETPDATTSSHEPGRMDIDRNADITMEYNEMNNQSSQLHATGSDQNDPSCSLTEEKRFLCHVCSKGFYFKCRLSRHMEIHGIEKAPSKKSHQCMVCDLRFSRV,EGSHFKSKIHISSHSLCSGGDSSLQSSYQSMRNSQEMSGDRSCDGARHLPEDFPLQDYLGTQSDSSERESCTSENDSYLIPSQVACFHEKQA,SVCGKSFREKSTHTKHMTTHSGEKPHVCLICNKAFSNTSGLSRHNKIHTGERPYECSFCKKTFSQTHHLSRHIKIHTGERPFECSVCSKTFSERGYLTEHQRVHTGEKPYFCSICEKRFTSNSCCKRHMRIHTGEKPFPCS\n
SPU_027147	SPU_027147	none	Inspection of the tiling array suggests that glean may have missed the following exons: KPEWRPIPFYVELPPFWLLADEISSPIPHIFPLSPFRATPPFLPQVFLVPNPYTPPCPLAYTTFYLLMYTTLAHSSCHTPIPSRLPPLYHNSHLLLLTTPFFTLSFIIFVLGGVLYLSNVFQEGKKKRGKDN,STYYYTCTCTIVIVAAFFCTLLVCCNDEKPYLTLKSAPHHASLWSRLNLYSLEASSAAMRLARMLVPLCPGLWCQVKRFCGRHPGHYNKVISKDRPKEKQKVVSIDRWSLY\n
SPU_027709	SPU_027709	none	Inspection of the tiling array suggests that glean may have missed the following exons: SSTVFHHQHLSSPLHAPPSCARSKPSANCRPAHIHYIQRVLSCDPVCDSLGTTCWGKRSRNSGKGSNPIGHRPPLLLHLLRGPDSGGTSHA,LVFVHELTCAKPTLADIALVRFLTCVFLLVVIKCAFRGKGSGAKGALERSVLGMLLDMNQQLAFHVKAFLTEAAGEFAAAEVDLVLVALENVAIDEPSRTKLALERAGTIVIVHGFPEHVDLFDLVLVLCWVGSILFNLAYLRCLLYLGTSSCQWLNTMMLPQMIMQEAVSDECLTADIASSRP,CHIGSCWKNSNLSFGSSLLQVTGMWFNAMYFLKMCIKGCWVCETLYTKITFKITRPLSFGTTELVLSFKHFPMFLLLVSRQALLIDVSLVTICTRPRGKRRVEGAWLVGGGSTMFHTFMSHYKLFLCVHQPTHITSIWSFIVILDMCFERVFVRAIKPTLGALVFVTFCLTGLSFSIGGTVPFLKMKQKSVLLFIKTSTLRAPKGIQLVSFHMALKLLRTCTGVATSLAKILTRSVPPYPQLNLQPFFTTSTFRVHCMLRLLVPGQSLPRIADLPTFITSKGCFLVTQYVIR,PLKGPKRTPAGSVRSATTQTVFCPSFLYASPSGRGPPRGASSGSLVVSAASTGASACSDGARLGAELVCGGPDCSPAGTMCG,YAIHNNIMCRAPKGENRGEKQISMRLFYKTNRIYLQICTHDPLTKLPTRDLHRLGDLLFETFITLILLVRFISNFCHSVHLVYLLSYK\n
SPU_027753	SPU_027753	none	Inspection of the tiling array suggests that glean may have missed the following exons: KNFTHIKKGYHQSPPTVCYQLPLFVCLVFPSLSPSVSVSLSHSVSTALYPPVSPSLSVSLSLSKVYLSHLLASCKSFSFRCLRIVSLSEDDLENRLE,SSQPATHMGLCCCCSLRRAASNAAALTLCLSLPAALLPEARKNRSEWLADGWSTEVESGACLVADVSWLAADGGAEYVDDVDGCEWDVPC\n
SPU_027912	SPU_027912	none	Inspection of the tiling array suggests that glean may have missed the following exons: RSFPLALELAVGQTSPEEIVALQHKWLESHLSPVGAASISARACTFLTVLTRNPPLGSGHCIMIFSCMSLLLRDQTPASSDLRNQPKLAELW,SGSFFRAFASRLSLTSSFSSGGSSSSSGGSSSSSHAHSTSSSAPSTSSNGPSTSSNGPSTSSDGPTSMSASGSPAVSASVAWLVGSSSYCVHESLV,GFRFPFEPDFIFFLRWLILFFRWLILFLPCPLYFLQCPFYFLQWPIYFLQWPIYFLRWPHFYVCFRVSSCVCFSSLAGWILLILCP\n
SPU_027919	SPU_027919	none	Inspection of the tiling array suggests that glean may have missed the following exons: NVSDAQKFQCGMCEDLFTSAKLILRHMRCEHSQDRTEDLYPVLMWKRKKKKKALETENSSQEDLTIENKVSEQGDLVFKCRTCEKVFSCHGRLKEHESFHKFSQGHACPVCDKKETNSRTLAKHMKTHEPLVLKCKECNRIYKTKSALRKHLNEFHGHQCRICSERFPYMTDCKKHEQTHQGSTKNGAHTGKSLLSASSPEEPKDMLGKTSNYYQRPFKCRYCPKRYSLRSSVKTHEKERHTGDLVFKCPHCPKVFGREYRLIDHLRSHEENRMYRCKLCPKTFGSESALTNHQGEHTGLKPFSCDICSKGFRIKKAVQDHKRRIHQKRQMRFFCSVCNKGFADKGNFTKHERRHKGVRPYVCLECGKGFTAKSCLTTHIKAMHTAEKPFSCELCGKTFSLNQNYTYHMFRHKEQGDISSIQQ,SYLKDSGPVHSSANHLHHAHLQRPVAFGGHGVKFSSLPASYTPPITAQEPAVERNDVPVSLTTCVPVVERTDLPTTTRESVIDIPPISMRESPTDSCIPPPTKQELAEDSYMSPSTAHESPIDSSISPTTTQESVEDSSIPATTTNEQVTDSINIEPVTTSEVAMDEDIPPAKSWGTVEERSIRLVTTHKPQAEGNT,LYFFCHFFLSPGQTPIAPPAPSKPLHSKEQRKQDTSSSSQSNTSPPASAVEPSLLAALNLKKSRMLSSTPELPQGLSFRWTLEGKVEGVVAKGTVEKGSEFGPYPGSLMNEEQGLSKD,ALTSNLVEFFPLYHMTTCWLDSNIVFEGIAYDRVVLEAVLCNLEIYYGRAIKFCVICHNQQTEVALLNNIIDPGFGEPTHLAIG,SSFYEQWKVITPLVTTSLKGINKKTDNSNNRHHSSNHNHNSNHSSSNMLHTIPLICQYHTTHPLSISLSIWGTINRPEFSQYHTQDLWHKCSVQFIMVILINLILIF,AMEGNYPPGYYLPERDQQENRQQQQPPPQQQPQPQQQPQQQQHVAHYPFDLPIPYHTSSQHQSEYMGDDQPTRVFPVPHPGPMAQMFSPVYHGNINKSDFDI\n
SPU_028222	SPU_028222	none	Inspection of the tiling array suggests that glean may have missed the following exons: ADNCSGLQGFLIFHSFGGGTGSGLNALLMERLSVDFGKKSKLEFAIYPAPQVSTAVVEPYNSILTTHTTLEHSDCAFMVDNEAIYDICRRNLDIERPSYQNLNRLIGQIVSSITASLRFDGALNVDLTEFQTNLVPYPRIHFPLVTYAPVISSEKAFHEQLSVSEITTSCFEPLNQMVKCDPRHGKY,TLSSWSVSPSTSARSPNWSSPSIRHLRFPPLLSSHTTPSLPLTPPSSTPTVPSWSTTKPSTISAVVISTSSVRHTRTSTV,LAKVQRAVCMLSNTTAIAEAWGRLNHKFDLMYAKRAFVHWYVGEGMEEGEFAEAREDLAALEKDYEEVGIDSCDAEAEDDEDY\n
SPU_028746	SPU_028746	none	Inspection of the tiling array suggests that glean may have missed the following exons: FRSFTEIDFWQNEEDGRCFDGETRESGDDDHGGNNKSCECKVTEPSTGSVCSYCQKKHERTDSGGKPSLQCFFCDCSFSIECHLTRHLQFHVGMKTYDTFHCSLCKKSFLSKSDLVKHKTKCTGEKPYECIHCTSTFAKQTDLKVHIRTHNQVKNILTVQTQDQTEHSYGQSQSQCPYCKRAFKTKSTLDSHIGTMTFENSYSCSHCSSTFRSKCSLTLHNRTHKYQCFLCNKRFASLDGRNTHVKWHTGVKPHHECSYCSKKFSKKCHLDEHVRIHTGEKPYRCSYCEKGFRTKGNFTKHLKIHNGGNNEEG,KQQGEREQRLTPVKEVGLCLACYMKEESSMEFYIKEEKLLFYEAETGKSDRDQESLQEDVKQSCVDEKGWIDMFAEPEIASSALSSDPQASETVLIAVGQESVLEDDERIGDTSQESSERESVPPTQ\n
SPU_028753	SPU_028753	none	Inspection of the tiling array suggests that glean may have missed the following exons: RISCYNTNIHDEALLLYAISHVYFSHPMMQTSCYIENIDIYVSCCEELDYTWIEMYCYNENMNKNFQRSLFLCVSQGWTLF\n
SPU_000017	SPU_000017	none	Inspection of the tiling array suggests that glean may have missed the following exons: KLKPLETRMMDRLEERHQKERPWETRMIDRPEERHQKERPWETRMDRPEERHQKEKPWKTKMMERPWETRMMERPWETRVMDRPEERHQKEKPWETRMIDRPEERHQKEKPWETRMMDRPEERHQKEKP,QTLAEVKVETIGDKNDGQAGGKTSKGETMGDENDRQTGGKTSKGETMGDEDGQAGGKTSKGETMEDENDGETMGDENDGETMGDESDGQTRGKTSKGETMGDENDRQTGGKTSKGETMGDENDGQTGGKTSKGETIGDENDSSIQFISHHF\n
SPU_000437	SPU_000437	none	Inspection of the tiling array suggests that glean may have missed the following exons: VVALLPADTDLLVVELEVVGAQDLAVAAEYSDSDFLDVVSLDVVVLEAVGLFVVAAGLTDVADVVLFEVAVGLHDVVVFALHLVDDVMSEDQQEVVPGVALLLVAVILDYWAVPGSHLVDILVIGLHFH,HIIIKYRSADVVCTRRLGYWFQTTELAHEVHCNITQAMGRRRAKSIFGTGVISTPLFSTKLFGTGNVDNLANMKSHFGTLYYRPM,FEAEQFCMFIYNNITIYMFLHSVSVDTIVQCRYYPCLCHFTVPTSQIIHLQIAMLCIKITRSPRTPKWYPNQPRYISHNTPSPCFALETKIHTSNIELPETCI\n
SPU_000440	SPU_000440	none	Inspection of the tiling array suggests that glean may have missed the following exons: PLQEFTLFFAPNCVLFLHESVMLSVQSREFGKKKFCCVTSSLFLSLRQTGSVHSTVGVHFNYDDAGTSRMHESLSSLSLPFQYQ,SRCLRIRPRRLLRRLTKRLPLRLSRRLLLRLSQRLLKRLPRRLTRRRPRRFLRRPLRSPPRRLSKSESSLRTPRVESMSS,IKMPQNKTKKAAEKANEKAPIKAVKKASVTAVAKTSEKAAKKAHEKAPKKVPQETSEKPAEKVVKKRVKSEDPKSGVHVFLGAKDFPRWCVMKEKSGCKTNVQLMRWLLGIAEKYFG\n
SPU_000540	SPU_000540	none	Inspection of the tiling array suggests that glean may have missed the following exons: IQININKGENHITRRERILFLHSDNYFDLPVSNSLAFKQLTMAHTLIVSYPTHLAPTGLFHHLTSLKNRHTHDLPNDYFC\n
SPU_000603	SPU_000603	none	Inspection of the tiling array suggests that glean may have missed the following exons: HKIFIYFWSGCVNGGGQIKAEEGITAKELLVEVEELYNSAPTRTPEENEVVQTLYEEWLGGVGSEKARTMLHTQYHALEKNTNALNIKW,TEEGRSRLKKASQRRNYSWKLKNCTTLLLRGHPRRTRWSRLFTRSGSEEWAVRKQGLCCIRSTMRWRRTQMLLTSSGRTGIGICRLGN,QPNINYLFLTQQRGXXXXXXXXXXXXXXXXXXAQSLRVKPDCIYHVTVMPCYDKKLEASRDDFYDDVYRTRDVDCVITSGMYLLFCLALF\n
SPU_001483	SPU_001483	none	Inspection of the tiling array suggests that glean may have missed the following exons: NPKQNIFVFFLNPEKPWQGEDKPSPSSQPKKSSSSRMKRKIKEENTPNDEVLDHVTRVTGLETVDGRIVDEEGEPEPESKKKMKKAKRKHI\n
SPU_002586	SPU_002586	none	Inspection of the tiling array suggests that glean may have missed the following exons: SLSCVYYPKKYIFIKFKHLPGLPYINYAVLAWGKSLITQLDKLFLAQKRVIRTICNADFCAHTNPLFYHHRILKVEDIYYMQLGSLMYDLNSGVLALAKIFKKNNQIHNYTTRSASAFHLPHARTKFTLNSLVCNRPRFWSTLVLTPVSICLQT,QGQTITKAFFILPSPYFIPGYFLCFSFLFPIPLLYLQMFYSLFLLHLSVPPPPPTNLSSSIILSIYLLFLIFSPLSHSLSISPLHLFPFCVKGSIAFFNFL,INNIFILPIYIHVHLPWIHLCTAYVHHLGYQGVFLPCNHSKGKLSQKLSSSSQVLISFLGIFFVFLFFSPFPFCIFKCSIPFSFFISLYPPPPPPISLPQSYFPYISFS,IISLFYQFIYMCIYLGFIYAQRMCTISATRECSYHVIIARANYHKSFLHPPKSLFHSWVFSLFFFSFPHSPFVSSNVLFPFPSSSLCTPPPPHQSLFLNHTFHISPFLNLFPPLSLAIHFPPPPIPLLCQRFNCFLQFS,LYSVTCTCTCMLGKLENPFCQNSPFPESLKKECTTNILFTLIILIILLDCTFPLNFCGALKIMMRSIIIKYLASSVPGRRKAN\n
SPU_002674	SPU_002674	none	Inspection of the tiling array suggests that glean may have missed the following exons: SWNSSDLMLRVSASWRVCSQLLVPFGCELHPTLMILFLSGSETCVWYSGILYLLLSISNKMYNGGIRWRISILDSMVNLNGASTCGC\n
SPU_002708	SPU_002708	none	Inspection of the tiling array suggests that glean may have missed the following exons: LNEWLTSKSTCTCALTTSFLINLLVLKLLPIGLACISSTNELLILPSGHSILKPWHGHINIPRTRVSCAWPKCFFNQGKGPLQWNNVV,LSSRFLHMKKRVCVDQATQTMPYVYSCNHCGSSPLANKPMPPLARIQPSGPQKIKLMDRVSNPLFSGPKTPNIKVIMIKKSTAGVLKRDSRMTPQALNNLMQLGSNLQAAQPLQGNLTGSS\n
SPU_003490	SPU_003490	none	Inspection of the tiling array suggests that glean may have missed the following exons: IGWFQRIRSVAAKMLILLPQPTMDLTACTNAKNTILKVRFSYSLLEFWYILLQPSPSALSVTFLSPKEKQLVLPAYTYLYVSC,RVALRDYVFSHGPTRGPIEYNTNTDDATAAITCVQHTQKPKCEENFGLFSGTRADHVEIKLLTNSNQVHDVNQHLPKLHPL\n
SPU_003649	SPU_003649	none	Inspection of the tiling array suggests that glean may have missed the following exons: IYSVSRKDGYRPYYIFPLMKYHISNEPPVIIYNFIKKKIHLLVRKVTYNSSRHHSVTSSPITELCKADRVKTSAVIAPGVNTSEEIARGKTGVLADRCVCVCR\n
SPU_003848	SPU_003848	none	Inspection of the tiling array suggests that glean may have missed the following exons: EGESCKKAKRKRKPTKPIKFMEYCSEAERIVVGNLPSPKATASYSTDQVMLMETDSSALNNEQFSQSSTQHMYCLLNFKIYECLQCGLGFASEKAMNVHIRTHSKEKPYWCTECDIGFTEHQLYMAHKQSHRPCKCDECGASFGNGSTLKNHKLLHLQSKNFKCSVCPKMFKQRAGLTCHMRSHTDERPYLCKECGAAFVDNKSLQNHMSVHSDEKAFKCSVCPKMFKQRAGLAHHMKHHNDDKQYLCKECGAAFAYNIHLQNHKAIHSDEKTFKCTICPKMFKQRAGLTGHMKAHTDEKPFMCELCGKSVKTKSTLKKHRMIHSEEKPYQCPLCPQAFKQRAGLSQHSHKHGEGNPY\n
SPU_004802	SPU_004802	none	Inspection of the tiling array suggests that glean may have missed the following exons: KLMNTFLLLFPPQSFYKNQTPLRARQTIKMKLKKQQKNPITTIVIMMMMTALQEKEVGRKAPWLEGVKAPNLRRNRQRNHLNHHRQRNTNATSVTLS,YLEINEHFSFTFSSPVILQKPDTSQGKTDNKDETQEAAEESNHNNCNNDDDDSPAGEGGREEGTLAGGGQGPEPEAEQAEEPFEPPPPEEHECNICNSVLTSLWELD\n
SPU_004806	SPU_004806	none	Inspection of the tiling array suggests that glean may have missed the following exons: EGYSIFHTFLVVHLALSFPTRDLHLRPLVLLLLLLLELRGRGTYCALFGLCDLFLGMYLHVSVKMETFADLGADGTGLELGRGRFTHVGVHVGIDLHLI,GIQHLPYLSRSPSRPELPYSRSSSEASCAAAAAAAGAAGKRNLLRPLWPLRSLPWNVSSCVGKDGDLCRPWCRWDRSGAGARALHPCWRSRRN\n
SPU_004807	SPU_004807	none	Inspection of the tiling array suggests that glean may have missed the following exons: NPVWMFVCLFMSFWELNSLKQCGHGCFRFGSVRSGCSCAANVPPLMSWWRRSFFWARDLQMSQMYENSVRSGEVSSWFKTSASFPQPPHLTLTVQGSRK\n
SPU_006058	SPU_006058	none	Inspection of the tiling array suggests that glean may have missed the following exons: FTKYQLSIHMKNHPEVKPFQCSACDKRFSLKSYLAQHMKYHSDKKTHQCPMCPKGFIRNSVLQEHIKTHASEKPFECAMCGKRFSSKISLAVHMKKVCKKRPDREQQDNPLPSV\n
SPU_009398	SPU_009398	none	Sp-MAP2K5 spans two glean predictions: \nSPU_009399 and SPU_009398\n
SPU_000751	SPU_000751	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix D.1 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees.\n
SPU_001794	SPU_001794	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix C.1 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees. \n \nIt is noteworthy that SPU_001797 (Sp-MACPF-C.2), a model adjacent to this gene, also contains the MACPF domain. A comparison of their protein sequences reveals high similarity but a fair number of differences as well. It is yet to be determined whether this reflects the erroneous assembly of different haplotypes (both genes are indeed located in an area of numerous contigs) or a true gene duplication event.\n
SPU_005223	SPU_005223	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix A.1 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees. \n \nNB: Given the position of this model in a small scaffold and its similarity to typically larger models, it is likely that this is an incomplete model. Also, other gene prediction protocols generated slightly different models for this gene. For lack of better evidence, we have decided to accept this glean model in its present form. In addition, SPU_022091 shows high sequence similarity to this model, but some differences as well (including sequence gaps); it is yet to be determined if these two models might represent haplotypes wrongly assembled as two different genes.\n
SPU_006818	SPU_006818	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix G.1 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees. \n \nNB: This model is located at the end of a scaffold. We cannot rule out that this model was forcibly truncated during the gene prediction process. In fact, its only functional domain (MACPF) lies next to the C-terminal end of the predicted protein, which is uncharacteristic of the other members of this family of genes.\n
SPU_014677	SPU_014677	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix C.3 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees. \n \nNB: Other gene prediction protocols incorporate additional C-terminal sequence in their models for this gene, but without adding new identifiable domains to the predicted protein. For lack of better evidence, we have therefore decided to accept this model in its current form.\n
SPU_016546	SPU_016546	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix B.3 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees. \n \nThe structure of this gene is highly supported by the embryonic genome-wide tiling array hybridization data, and by identical models generated by almost all gene prediction protocols.\n
SPU_017952	SPU_017952	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix A.4 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees.\n
SPU_022091	SPU_022091	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix A.2 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees. \n \nNB: SPU_005223 shows high sequence similarity to this model, but some differences as well (including sequence gaps); it is yet to be determined if these two models might represent haplotypes wrongly assembled as two different genes.\n
SPU_007159	SPU_007159	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix E.1 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees. \n \nNB: This model is located at the end of a scaffold. We cannot rule out that this model was forcibly truncated during the gene prediction process. In fact, its only functional domain (MACPF) lies next to the C-terminal end of the predicted protein, which is uncharacteristic of the other members of this family of genes.\n
Sp-Gg1d	SPU_030084	none	Gene model derived from EST matches. The predicted protein sequence exactly matches SPU_014498 (Sp-Gg1), but an additional intron seems to be present in the 3' UTR.\n
Sp-Gg4	SPU_030086	none	Gene predicted based on homology to human sequences, but this locus might be a pseudogene (because it has no introns). Tiling array data nevertheless suggest that it is expressed.\n
SPU_000986	SPU_000986	none	This gene model is located at the end of a very short contig. The nucleotide sequence of the first exon has 94% similarity to that of another Sp-Tlr gene. It could be a member of the Toll-like receptor family. The second exon, which was separated from the first by 100000 bp, was eliminated. \n
SPU_013279	SPU_013279	none	#\nGene fragment- likely haplotype of SPU_018156- see annotation of that gene.\n
SPU_018156	SPU_018156	none	Complex gene with varying exon predictions among different models. Tiling data inconclusive.\n
SPU_007980	SPU_007980	none	Haplotype of SPU_025345; see that gene for annotation.\n
SPU_005834	SPU_005834	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nThis model Blasts best to Cathepsin C, and it strongly co-clusters with it in a NJ phylogenetic tree. We have therefore decided to name it "Cathepsin1" but note this high similarity by making "CathepsinC" one of the synonyms for this model.\n
SPU_009042	SPU_009042	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nThis model Blasts best to Cathepsin L, and it strongly co-clusters with Cathepsin L and several other glean models in a NJ multiple protein alignment tree. We have therefore decided to name this model "Cathepsin 2" for consistency, while noting its high similarity to Cathepsin L by making "CathepsinL-like1" one of its synonyms. \n \nNB: The structure of this model is highly supported by the fact that other gene prediction protocols generated almost identical models and by the genome-wide tiling array hybridization data.\n
SPU_009368	SPU_009368	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nThis model Blasts best to Cathepsin L2; however, a NJ multiple protein alignment tree shows that its sequence is equally related to that of Cathepsin L, L2, K and S. For this reason we decided to name this model only "Cathepsin 3" and not group it with other Sp-CathepsinL-like models. \n \nNB: The structure of this model is highly supported by the fact that other gene prediction protocols generated almost identical models.\n
SPU_009601	SPU_009601	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nThis model Blasts similarly to Cathepsins X, Y and Z, and it strongly co-clusters with them and SPU_013893 in a NJ multiple protein alignment tree. We have therefore decided to name this model "Cathepsin 4" for consistency, while noting its high similarity to Cathepsin Z by making "CathepsinZ-like1" one of its synonyms. \n \nNB: Other gene prediction protocols generated noticeably different models for this gene. However, the genome-wide tiling array hybridization data indicate that all exons of this glean model would be expressed at similar levels during embryonic development, which supports the idea that they all belong to the same gene. For lack of better evidence we have decided to accept this glean model in its current form.\n
SPU_013893	SPU_013893	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nThis model Blasts similarly to Cathepsins X, Y and Z, and it strongly co-clusters with them and SPU_009601 in a NJ multiple protein alignment tree. We have therefore decided to name this model "Cathepsin 5" for consistency, while noting its high similarity to Cathepsin Z by making "CathepsinZ-like2" one of its synonyms. \n \nNB: The structure of this model is highly supported by the fact that other gene prediction protocols generated almost identical models and by the genome-wide tiling array hybridization data. It must be noted, however, that the N-terminus of this model is located very close to one end of the scaffold. For this reason, we cannot rule out that additional N-terminal sequence not included in this model exists.\n
SPU_014767	SPU_014767	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nThis model Blasts best to Cathepsin L, and it strongly co-clusters with Cathepsin L and several other glean models in a NJ multiple protein alignment tree. We have therefore decided to name this model "Cathepsin 7" for consistency, while noting its high similarity to Cathepsin L by making "CathepsinL-like3" one of its synonyms. \n \nThe structure of this model is highly supported by the fact that other gene prediction protocols generated almost identical models. It must also be noted that both adjacent models (SPU_014766/SPU_014768) show significant similarity to this model, which raises the question of whether these may reflect gene duplications. A careful inspection of their sequences and domain structure suggests that SPU_014766 is a true cathepsin gene (annotated as such by Esther Miranda - Duke University), whereas SPU_014768 may have been generated as a result of an assembly error (see SPU_014768 for more details). Both models show significant differences at the amino acid level with SPU_014767, which would argue that they are due to true gene duplication events, something that is also observed among vertebrate cathepsin L and L-like genes.\n
SPU_014768	SPU_014768	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. Based on such analysis, we propose that this is very likely an incomplete/wrong model. A careful analysis of its sequence and domain structure indicates that its true last coding exon (containing a STOP codon) may fall in a sequence gap adjacent to its second-to-last coding exon (i.e. the last coding exon may have been forcibly incorporated into this model). For lack of supporting evidence for this claim, we have accepted this glean model in its present form. \n \nThis model Blasts best to Cathepsin L, and it strongly co-clusters with Cathepsin L and several other glean models in a NJ multiple protein alignment tree. We have therefore decided to name this model "Cathepsin 8" for consistency, while noting its high similarity to Cathepsin L by making "CathepsinL-like4" one of its synonyms. \n \nThe (wrong?) structure of this model is highly supported by the fact that other gene prediction protocols generated almost identical models. It must also be noted that its adjacent model (SPU_014767) shows significant similarity, which raises the question of whether this may reflect a gene duplication. A careful inspection of their sequences shows some noticeable differences at the amino acid level which would argue that they are due to a true gene duplication event, something that is also observed among vertebrate cathepsin L and L-like genes.\n
SPU_014765	SPU_014765	none	This model was annotated and modified based on a manual inspection of multiple protein sequence alignments. We have modified this model based on the models generated by NCBI and FgeneshAB, both of which show a good alignment to cathepsin L-like genes. The original glean3 prediction presented exon and domain structures that clearly resembled artificially fused genes. In fact, the remaining of the original glean3 prediction, which is represented as well by respective NCBI and FgeneshAB predictions, closely resembles genes present in othey phyla, which supports the claim that the original version of SPU_014765 wrongfully fused to separate genes. \n \nThis modified model Blasts best to Cathepsin L, and it strongly co-clusters with Cathepsin L and several other glean models in a NJ multiple protein alignment tree. We have therefore decided to name this model "Cathepsin 9" for consistency, while noting its high similarity to Cathepsin L by making "CathepsinL-like5" one of its synonyms. \n \nNB: The adjacent model (SPU_014766) shows signficant similarity to this modified SPU_014765, which raises the question of whether this may reflect a gene duplication. A careful inspection of their sequences shows some noticeable differences at the aminoacidic level which would argue that they are due to a true gene duplication event, something that is also observed among vertebrate cathepsin L and L-like genes. Although we cannot rule out, based on this analysis, that they may be different alleles of the same gene and that the apparent duplication be due to an assembly error.\n
SPU_028748	SPU_028748	none	Amino terminal domain truncated- no obvious 5' exons. Tiling data not consistent with gene models.\n
SPU_014914	SPU_014914	none	This model was annotated based on a manual inspection of multiple protein sequence alignments. \n \nThis model Blasts best to Cathepsin F, and it strongly co-clusters with Cathepsin F in a NJ multiple protein alignment tree. We have therefore decided to name this model "Cathepsin 10" for consistency, while noting its high similarity to Cathepsin F by including "CathepsinF-like1" as one of its synonyms. \n \nNB: The structure of this model is highly supported by the fact that other gene prediction protocols generated very similar models and by the genome-wide tiling array hybridization data.\n
SPU_020838	SPU_020838	none	This model was annotated and modified based on a manual inspection of multiple protein sequence alignments. We have modified this model based on the corresponding model generated by NCBI, which shows a better alignment to cathepsin L-like genes (although it generates an incomplete cds - i.e. no stop codon). The remainder (C-ter) of the original glean3 prediction does not Blast back to any known sequence in nr, nor does it contain any identifiable functional domain, which suggests it may represent an artificial fragment fused to the model for lack of an earlier stop codon in the scaffold. \n \nThe modified model Blasts best to Cathepsin L, and it strongly co-clusters with Cathepsin L and several other glean models in a NJ multiple protein alignment tree. We have therefore decided to name this model "Cathepsin 13" for consistency, while noting its high similarity to Cathepsin L by making "CathepsinL-like8" one of its synonyms. \n \nNB: The adjacent model (SPU_020837) shows significant similarity to this modified SPU_020838, which raises the question of whether this may reflect a gene duplication. A careful inspection of their sequences shows some noticeable differences at the amino acid level which would argue that they are due to a true gene duplication event, something that is also observed among vertebrate cathepsin L and L-like genes. However, we cannot rule out, based on this analysis, that they are different alleles of the same gene and that the apparent duplication is due to an assembly error.\n
SPU_020837	SPU_020837	none	This model was annotated and modified based on a manual inspection of multiple protein sequence alignments. Its structure is supported by almost identical models generated by all other gene prediction protocols. \n \nThis model Blasts best to Cathepsin L, and it strongly co-clusters with Cathepsin L and several other glean models in a NJ multiple protein alignment tree. We have therefore decided to name this model "Cathepsin 12" for consistency, while noting its high similarity to Cathepsin L by making "CathepsinL-like8" one of its synonyms. It should be noted, however, that this model lies close to one end of the scaffold, and we therefore cannot rule out that additional N-ter sequence has been left out in the assembly process. \n \nNB: The adjacent model (SPU_020838) shows significant similarity to this model, which raises the question of whether this may reflect a gene duplication. A careful inspection of their sequences shows some noticeable differences at the amino acid level which would argue that they are due to a true gene duplication event, something that is also observed among vertebrate cathepsin L and L-like genes. However, we cannot rule out, based on this analysis, that they are different alleles of the same gene and that the apparent duplication is due to an assembly error.\n
SPU_012930	SPU_012930	none	This gene is present on three GLEAN predictions. SPU_017141 contains the first ~890 AA and SPU_012930 and SPU_021952 have the rest. SPU_012930 is the largest piece with ~1400 aa.\n
SPU_021952	SPU_021952	none	This gene is present on three GLEAN predictions. SPU_017141 contains the first ~890 AA and SPU_012930 and SPU_021952 have the rest. SPU_012930 is the largest piece with ~1400 aa.\n
SPU_002324	SPU_002324	none	embryonic lethal, abnormal vision, drosophila, homolog-like 1; Hu antigen R\n
SPU_010853	SPU_010853	none	Putative pre-mRNA splicing factor ATP-dependent RNA helicase DHX15 (DEAH box protein 15)\n
SPU_008911	SPU_008911	none	This model was annotated based on a manual analysis of multiple protein sequence alignments and domain composition. Its structure is supported by almost identical models generated by other gene prediction protocols. \n \nThis gene aligns well with and shows a domain structure that resembles genes of the heme-dependent peroxidase superfamily, which includes a large number of genes from various phyla. These genes are most typically identified based on their biological function (e.g. ovoperoxidase, lactoperoxidase, myeloperoxidase, eosinophil peroxidase, etc), but they all present a secretory signal peptide and a single haem-peroxidase domain. \n \nNB: The N-ter of this model is likely missing, based on the structure of the genes with which this gene aligns best and the fact that this model is located at the end of a scaffold.\n
SPU_019097	SPU_019097	none	This model was annotated and modified based on a manual analysis of multiple protein sequence alignments and domain composition. \n \nThis modified model aligns well with and shows a domain structure that resembles genes of the heme-dependent peroxidase superfamily, which includes a large number of genes from various phyla. These genes are most typically identified based on their biological function (e.g. ovoperoxidase, lactoperoxidase, myeloperoxidase, eosinophil peroxidase, etc), but they all present a secretory signal peptide and a single haem-peroxidase domain. \n \nNB: Different gene prediction protocols generated noticeably different models for this gene (mostly towards the N-ter of the model). The NCBI model shows the best alignment with other heme-dependent peroxidases to which this model Blasts back, and thus we chose it to modify this glean prediction.\n
SPU_002004	SPU_002004	none	This model was annotated based on a manual analysis of multiple protein sequence alignments and domain composition. Its structure is supported by almost identical models generated by other gene prediction protocols and by the embryonic genome-wide tiling array hybridization data. \n \nThis gene aligns well with and shows a domain structure that resembles genes of the heme-dependent peroxidase superfamily, which includes a large number of genes from various phyla. These genes are most typically identified based on their biological function (e.g. ovoperoxidase, lactoperoxidase, myeloperoxidase, eosinophil peroxidase, etc), but they all present a secretory signal peptide and a single haem-peroxidase domain. \n \nNB: The N-ter of this model is missing, based on the structure of the genes with which this gene aligns best and the fact that this model is located at the end of a scaffold.\n
SPU_010343	SPU_010343	none	DEAD Box containing protein. Similar to either DEAD (Asp-Glu-Ala-Asp) box polypeptide 5 OR DEAD (Asp-Glu-Ala-Asp) box polypeptide 17.\n
SPU_024969	SPU_024969	none	the original glean is a C-terminus of snx13 gene; the N-terminus is on the same scaffold - the previous glean SPU_024968; I've combined predictions here.\n
SPU_006014	SPU_006014	none	Added 3' exon from Angerer model due to alignment to mmp13.\n
SPU_024968	SPU_024968	none	n-terminal piece of Sp-RGS-PX1; full annotation is in Glean SPU_024969\n
SPU_018333	SPU_018333	none	First exon position is uncertain: there is some discrepancy with the EST. Neither EST nor homology searches give a good first exon prediction; the exact region is probably not in the assembly yet.\n
SPU_007675	SPU_007675	none	pre-mRNA splicing factor SF3a (60kD). SPU_016711 is a duplicate prediction.\n
SPU_011993	SPU_011993	none	Human gene is significantly longer, but the prediction may be complete, as the C. elegans and Drosophila genes are about the same size. \n \n \nThis gene encodes subunit 2 of the splicing factor 3a protein complex. The splicing factor 3a heterotrimer includes subunits 1, 2 and 3 and is necessary for the in vitro conversion of 15S U2 snRNP into an active 17S particle that performs pre-mRNA splicing. Subunit 2 interacts with subunit 1 through its amino-terminus while the single zinc finger domain of subunit 2 plays a role in its binding to the 15S U2 snRNP. Subunit 2 may also function independently of its RNA splicing function as a microtubule-binding protein.\n
SPU_002029	SPU_002029	none	Haplotype of SPU_004897 (Sp-mmp24d).\n
SPU_013513	SPU_013513	none	Duplicate prediction for SPU_026806\n
SPU_026806	SPU_026806	none	Distal part of the SF3a120. First part is predicted by SPU_027895.\n
SPU_000866	SPU_000866	none	Likely missing 5' exon based on blast alignments.\n
SPU_027966	SPU_027966	none	PHD finger-like domain protein 5A (Splicing factor 3B associated 14 kDa protein) (SF3b14b)\n
SPU_002161	SPU_002161	none	This gene encodes one of four subunits of the splicing factor 3B. The protein encoded by this gene cross-links to a region in the pre-mRNA immediately upstream of the branchpoint sequence in pre-mRNA in the prespliceosomal complex A. It also may be involved in the assembly of the B, C and E spliceosomal complexes. In addition to RNA-binding activity, this protein interacts directly and highly specifically with subunit 2 of the splicing factor 3B. \n            This protein contains two N-terminal RNA-recognition motifs (RRMs), \n            consistent with the observation that it binds directly to pre-mRNA.\n
SPU_000101	SPU_000101	none	Likely has an extra-exon predicted.\n
SPU_017104	SPU_017104	none	The nucleotides of the second exon have 93% identity to another typical Sp-Tlr gene, while those of the first exon do not. This gene model may be a member of the Toll-like receptor family. \n \n
SPU_003478	SPU_003478	none	Missing one or more exons at the beginning of the gene. SPU_003477 may represent one of the missing exons or it may be a duplication of the SPU_003478.\n
SPU_003477	SPU_003477	none	Missing one or more exons at the beginning of the gene. SPU_003477 may represent one of the missing exons or it may be a duplication of the SPU_003478.\n
SPU_013007	SPU_013007	none	SPU_017802 is a duplicate prediction.\n
SPU_017802	SPU_017802	none	This is a duplicate prediction for SPU_013007\n
SPU_021629	SPU_021629	none	similar to mitosis-specific chromosome segregation protein SMC1. SPU_026628 appears to be a duplicate prediction over the last exon(s).\n
SPU_026628	SPU_026628	none	SPU_026628 appears to be a duplicate prediction over the last exon(s).\n
SPU_012607	SPU_012607	none	small nuclear ribonucleoprotein D1 polypeptide (16kD); snRNP core protein D1; Sm-D autoantigen\n
SPU_008908	SPU_008908	none	Modified gene model to reflect cloned cDNA. Included sequences present on small scaffolds 85810, (167442 and 56237 both match same region), 160020. \n
SPU_010849	SPU_010849	none	SPU_010849 is likely a duplicate prediction for SPU_024030\n
SPU_019147	SPU_019147	none	SPU_021883 is a duplicate prediction of SPU_019147\n
SPU_021883	SPU_021883	none	SPU_021883 is a duplicate prediction of SPU_019147\n
SPU_012978	SPU_012978	none	Lacking C-terminus.  See SPU_028726, _27443, _08472. \n
SPU_005089	SPU_005089	none	This gene model has no TIR domain, but the nucleotides encoding the signal peptide, LRRNT, LRR(15-23), and CT have 88% similarity to another typical Sp-Tlr. The unknown sequence (NNN) at the 3' end could make the gene model incomplete. \n
SPU_001231	SPU_001231	none	#\nThis gene model has no TIR domain, but the nucleotides encoding LRR(12-21) and CT have 85% similarity to another typical Sp-Tlr. Unknown sequence (NNN) at the 3' end of this gene model could make the gene model incomplete. \n
SPU_006611	SPU_006611	none	This gene model has no TIR domain, but the nucleotides encoding SP, NT, LRR(12-23), and CT have 86% similarity to another typical Sp-Tlr. The unknown sequence (NNN) in the 3' UTR could make the gene model incomplete. \n
SPU_008150	SPU_008150	none	#\nThis gene model has no TIR domain, but the nucleotides encoding SP, NT, LRR(12-23), and CT have 86% similarity to another typical Sp-Tlr. The unknown sequence (NNN) in the 3' UTR could make the gene model incomplete. \n
SPU_009172	SPU_009172	none	#\nThis gene model has no TIR domain, but the nucleotides encoding SP, NT, LRR(9-21), and CT have 85% similarity to another typical Sp-Tlr. Unknown sequence (NNN) at the 3' end of this gene model could make the gene model incomplete. \n
SPU_028478	SPU_028478	none	Sp predicted similar to CG5680-PB\n
SPU_000151	SPU_000151	none	duplicate accession: NP_999689 \nSPU_010284 is a nearly identical internal duplicate\n
SPU_017385	SPU_017385	none	One of 2, SPU_016748 overlaps with this sequence. They match exactly (DNA & protein). This gene is longer at N term. and appears to contain the start codon, although internal sequences appear to be bad, and the long runs of Es aren't 'real'.\n
SPU_008111	SPU_008111	none	matches middle of human/mouse gene sequence \nappears to be missing exons on beginning/end\n
SPU_024925	SPU_024925	none	Notes:  \n-1 bp missing after 34146 \n-1 bp insertion at 36263 \n-2 bp insertion at 39462-3 \n-2 bp missing after 39814 \n-18 bp missing after 41009 \n-15 bp missing after 41117 \n-mismatch btwn 41178-94 \n-12 bp missing after 41291 \n-6 bp insertion after 41348 \n-mismatch btwn 41465-73 \n-6 bp missing after 41496\n
SPU_013575	SPU_013575	none	Prediction of last exon is likely to be incorrect.\n
SPU_018780	SPU_018780	none	1 of 2, other is SPU_014846. These are overlapping genes that are nearly identical where they overlap, although each has gaps with respect to the other. This gene BLASTs with an e of zero to both CG7337-PA (XP_781545.1) and to MAPKBP1-like.\n
SPU_009620	SPU_009620	none	1 of 2, the other is SPU_009206, which seems to be an internal fragment of this gene. They are identical in the overlapping region. Both genes hit XP-794873.1 as well.\n
SPU_009206	SPU_009206	none	1 of 2, other is SPU_009620. This one is a shorter version of 09620, and is identical. Both genes BLAST to both XP_794873.1 and XP_783732.1\n
SPU_024798	SPU_024798	none	SPU_024798 represents the first half of the gene. SPU_009840 is the latter half.\n
SPU_009840	SPU_009840	none	SPU_024798 represents the first half of the gene. SPU_009840 is the latter half.\n
SPU_019790	SPU_019790	none	glean prediction looks like a duplicated part of a larger rgs12-containing region. See full annotation with SPU_004238.\n
SPU_002103	SPU_002103	none	Obtained SPU_002103 from S. purpuratus genome by using N-terminal peptide sequence of NM_003972 (human gene BTAF1) with score of 72 bits and E-value 3e-13. Other Glean3 candidates had poor scores.  \n \nThe best genbank hits (XP_795066, E=3e-59; and XP_788365, E=2e-55) are predicted partial peptides similar to TBP-associated factor 172 (TAF-172) (TAF(II)170) of Strongylocentrotus purpuratus and are incomplete at the carboxy terminal. Predicted TAFs from other organisms also appear with high scores. \n \nBlasting Genbank yields some sequence support from empirical data, yet raises some warning flags. Human BTAF1 RNA polymerase II, B-TFIID transcription factor-associated,170kDa, has an E-value of 2e-24. However, sequence data for an endonuclease/reverse transcriptase (presenilin gene) from Branchiostoma floridae (amphioxus) is also returned, with an E-value of 1e-24, and other reverse transcriptases from schistosomes, several mosquito species and chickens. \n \nUpon examining the Genboree presentation of SPU_002103, one observes: 1) The Glean, NCBI, and FgeneshAB predictions appear to span two different contigs while the Genscan model does not, by virtue of omitting the segment at the 5' end (this gene is on the - strand); 2) The NCBI model lacks the largest exon predicted by all other gene models; and 3) No Exonerate or Splign data appear.  \n \nMicroarray tiling data from Systemix seem to indicate weak signals supporting two of the seven exons (5175-5338; 3816-4478), no support for two exons (1323-1481; 317-496), and questionable support for the remainder. \n \nUpon observing the questionable support for the gene in its entirety, I selected the sequence corresponding to the exon missing in the NCBI data and blasted the translated peptide sequence against Genbank. 
This time, my query obtained a hit corresponding to XP788888, which is the predicted CDS for an endonuclease/exonuclease/phosphatase family and RNA-directed DNA polymerase (5R694) of Strongylocentrotus purpuratus. Moreover, when I blasted the N-terminal 240 aa against Glean3, the best hit was SPU_003468 (484 bits, E=e-137) instead of the initial GLEAN_02103, and is a stronger match than the original.   \n \n
SPU_009223	SPU_009223	none	#\nThis gene model doesn't have a TIR domain, but the nucleotides encoding SP, NT, LRR(8-22), and CT have 90% similarity to another typical Sp-Tlr. The unknown sequence (NNN) at the 3' end of the coding sequence could make the gene model incomplete.\n
SPU_001621	SPU_001621	none	internal duplication: looks like the ends of two contigs in the scaffold (aagj01193203 and aagj01193210) are actually overlapping... hence one of the exons was duplicated in prediction & deleted in annotation (actually, 2 exons are duplicated because of this, but only one extra made it into prediction)\n
SPU_009450	SPU_009450	none	This gene model may represent a pseudogene or contain sequence error. A part of intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified model contains some frame shifts, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_011156	SPU_011156	none	This gene model doesn't have a TIR domain, but the nucleotides encoding SP, NT, and LRR(15-22) have 90% similarity to another typical Sp-Tlr. The unknown sequence (NNN) at the 3' end of the coding sequence could make the gene model incomplete. \n
SPU_016689	SPU_016689	none	Sp-seawi is made up of SPU_016689 on scaffold 19510 and SPU_024002 on scaffold 55433\n
SPU_025996	SPU_025996	none	This is only the N-terminus and should be combined with SPU_025997.  \n
SPU_011775	SPU_011775	none	Partial Toll-like receptor. The nucleotides encoding SP, NT, and LRR(5-17) have 89% similarity to another typical Sp-Tlr gene. This gene model is located at the end of the scaffold. \n
SPU_012584	SPU_012584	none	#\nThis gene model doesn't have a TIR domain. The nucleotides of the first exon have 90% similarity to another typical Sp-Tlr gene, while the second exon could be a wrong prediction. The first exon is located at the end of the contig, which could make the gene model incomplete. \n
SPU_002257	SPU_002257	none	SPU_018588 is a duplicate prediction for SPU_002257.\n
SPU_018588	SPU_018588	none	SPU_018588 is a duplicate prediction for SPU_002257.\n
SPU_002983	SPU_002983	none	Partial sequence. \nNaked cuticle-2 is an EF hand calcium-binding domain protein similar to the recoverin family of myristoyl switch proteins.\n
SPU_025144	SPU_025144	none	SPU_025144 contains the first part of the gene. SPU_023888 contains the second half. Both the predictions overlap significantly.\n
SPU_023888	SPU_023888	none	SPU_025144 contains the first part of the gene. SPU_023888 contains the second half. Both the predictions overlap significantly.\n
SPU_016974	SPU_016974	none	This gene model doesn't have a TIR domain, but the nucleotides encoding SP, NT, and LRR(7) have 98% similarity to another Sp-Tlr gene (SPU_013751).  This gene model is located in a short scaffold, which could make it incomplete. \n
SPU_014928	SPU_014928	none	#\nSPU_014928 was divided into two Sp-Tlr genes (Sp-TlrP10 and Sp-TlrP11). This gene model doesn't have  a TIR domain, but the nucleotides encoding LRR have high similarity to other typical Sp-Tlr genes. The model is located at the end of a contig.\n
Sp-TlrP11	SPU_030087	none	SPU_014928 was divided into two Sp-Tlr genes (Sp-TlrP10 and Sp-TlrP11). This gene model has unknown sequence in the 3' region, which could make the model incomplete.  \n
SPU_011546	SPU_011546	none	Prediction possibly too long.\n
SPU_014932	SPU_014932	none	SPU_014932 was divided into two Sp-Tlr genes (Sp-TlrP12 and Sp-TlrP13). This gene model doesn't have  a TIR domain, but the nucleotides encoding SP, NT, LRR (13-23) have high similarity to other typical Sp-Tlr genes. The model is located at the end of a contig. \n
SPU_028188	SPU_028188	none	#\nScaffold_80160 is incomplete.  Appears to be complement factor B, but missing vWF domain and most of serine protease domain, probably because of incomplete sequence data.\n
Sp-TlrP13	SPU_030088	none	SPU_014932 was divided into two Sp-Tlr genes (Sp-TlrP12 and Sp-TlrP13). This gene model doesn't have  a TIR domain, but the nucleotides encoding SP, NT, LRR (16-23), CT and TM have high similarity to other typical Sp-Tlr genes. The model is located at the end of a contig. \n
SPU_015299	SPU_015299	none	This gene model doesn't have a TIR domain, but the nucleotides encoding SP and LRR(7-13) have 89% similarity to another Sp-Tlr gene. This gene model is located at the end of a scaffold, which could make it incomplete. \n
SPU_025703	SPU_025703	none	Incomplete gene model: expected 5' part of the gene is missing\n
SPU_015534	SPU_015534	none	This gene model doesn't have a TIR domain, but the nucleotides of the first exon have 91% similarity to another Sp-Tlr gene (SPU_015533).  The first exon is located at the end of a contig, which could make it incomplete. \n
SPU_018099	SPU_018099	none	The gene model doesn't have a TIR domain, but the nucleotides encoding SP, NT, and LRR(6-13) have 89% identity to another Sp-Tlr gene (SPU_015066). This gene model is located in a short contig, which may make it incomplete. \n
SPU_019041	SPU_019041	none	#\nThis gene model doesn't have a TIR domain, but the nucleotides encoding SP, NT, and LRR(4-14) have 90% identity to another Sp-Tlr gene (13751). This exon is located at the end of a contig, which could make this model incomplete. \n
SPU_007261	SPU_007261	none	Partial sequence. \nSee putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137537553-3176-9890171404.BLASTQ4\n
SPU_019835	SPU_019835	none	This gene model doesn't have a TIR domain, but the nucleotides encoding SP, NT, and LRR(3-7) have 98% identity to another Sp-Tlr gene (26274). It is located at the end of a contig that is far from the next one, which may make this model incomplete. \n
SPU_001665	SPU_001665	none	Similar to Dual specificity phosphatase 11. \nSee putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137536261-8980-151643868263.BLASTQ4\n
SPU_020666	SPU_020666	none	This gene model doesn't have a TIR domain. But the nucleotides encoding SP, NT, LRR(9-21), CT have 88% identity to another Sp-Tlr gene (05950). There seems to be an assembly error in the contig of this model, which makes it incomplete.   \n
SPU_024003	SPU_024003	none	Based on position on contig and alignment with best blast hit (human) it is likely this gene model is missing a 3' exon.\n
SPU_028884	SPU_028884	none	SPU_028884 represents 5' end (exons 1-3) of the Sp-Stx16 gene. SPU_028885 represents the 3' end (exons 4-11) of this gene.\n
SPU_020266	SPU_020266	none	Similar to Protein phosphatase 1D magnesium-dependent delta isoform .\n
SPU_028885	SPU_028885	none	SPU_028885 represents the 3' end (exons 4-11) of the Sp-Stx16 gene. SPU_028884 represents the 5' end (exons 1-3)of this gene.\n
SPU_021194	SPU_021194	none	This gene model doesn't have a TIR domain, but the nucleotides encoding SP, NT, and LRR(8-18) have 91% identity to another Sp-Tlr gene (00615). This gene model is located at the end of a scaffold. \n
SPU_007282	SPU_007282	none	Similar.  Portions of this sequence are nearly identical to Tyrosine-protein phosphatase, non-receptor type 23.  \nSee putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137537628-5867-112430593701.BLASTQ4\n
SPU_021299	SPU_021299	none	This gene model doesn't have a TIR domain, but the nucleotides encoding SP, NT, and LRR(18-23) have 89% identity to another Sp-Tlr gene (14548). This gene model is located at the end of a contig, which could make it incomplete. \n
SPU_018583	SPU_018583	none	It's missing 5' and 3' end of the gene.  It overlaps with SPU_007337.\n
SPU_021421	SPU_021421	none	This gene model doesn't have a TIR domain. But the nucleotides encoding SP, NT, LRR(13-23) have 87% identity to another Sp-Tlr gene (16468). This gene model is located at the end of a contig that is far from next contig. \n
SPU_021425	SPU_021425	none	The nucleotides of the first exon plus the following 200 bp have 87% identity to another Sp-Tlr gene. The first exon is located at the end of a contig and is far from the second one (76000 bp).  The second and subsequent exons could be a wrong prediction. \n
SPU_023505	SPU_023505	none	#\nSplign sequences support gene model.\n
SPU_000663	SPU_000663	none	This is clearly a partial sequence of a DUSP.  It has been tentatively identified as DUSP24.  Duplicate of SPU_018623.   \nSee putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137535380-30970-10768348136.BLASTQ4\n
SPU_021914	SPU_021914	none	Partial Toll-like receptor. The nucleotides encoding SP, NT, LRR(14-23), CT, and TM have 86% identity to another Sp-Tlr gene (00615). Unknown sequence (NNN) at the 3' end of the coding sequence could make this model incomplete. \n
SPU_009937	SPU_009937	none	Similar to Intestinal acid phosphatase PHO-1.  Partial sequence.\n
SPU_023031	SPU_023031	none	Partial Toll-like receptor. The nucleotides of the first exon plus the following 683 bp of intron have 90% identity to another Sp-Tlr gene (21162). This first exon is located at the end of a contig and is far from the second one (13500 bp).  \n
SPU_018623	SPU_018623	none	#\nDuplicated gene.  See SPU_000663.\n
SPU_017995	SPU_017995	none	Similar to amPTPN3 and to Gallus gallus protein tyrosine phosphatase, non-receptor type 1.\n
SPU_016144	SPU_016144	none	The sequences for SPU_016143 were added in front of 16144.  They appear to be part of the same gene.  Also, there is an RVT domain in this gene that probably should not be there.  May be a sequencing error. \nMultiple duplications.  See also SPU_008253, SPU_016053, SPU_019852, SPU_020604, SPU_022839, SPU_024537, and SPU_027101.  Blasts to PTPRA but part of a novel clade in phylogenetic analysis.\n
SPU_016053	SPU_016053	none	Blasts to PTPRA, but in phylogenetic analysis it forms part of a novel clade with PTPRLec1, PTPRLec2, PTPRLec3, PTPRLec5, PTPRLec6, PTPRFn1, and PTPRFn2.  \n
SPU_022839	SPU_022839	none	Blasts to PTPRA, but forms part of a novel clade in phylogenetic analysis.  See also PTPRLec1, PTPRLec3-6, PTPRFn1, and PTPRFn2.\n
SPU_023934	SPU_023934	none	Partial Toll-like receptor. The nucleotides of the first exon have 90% identity to another Sp-Tlr gene (09435). The first exon is located at the end of a contig and far from the 2nd - 4th exons.  The 2nd - 4th exons were eliminated. \n
SPU_023936	SPU_023936	none	#\nThis gene model doesn't have a TIR domain. But the nucleotides encoding SP, LRR(9-19) have 87% identity to another Sp-Tlr gene (14352). This gene model is located at the end of a contig, which could make it incomplete. \n
SPU_027101	SPU_027101	none	Multiple duplications.  See also SPU_008253, SPU_016053, SPU_016144, SPU_019852, SPU_020604, SPU_022839, and SPU_024537.\n
SPU_022506	SPU_022506	none	Missing N-terminus.  \n
SPU_001698	SPU_001698	none	Missing C- and N-terminus, but a strong hit to GABA transporter.  For the N-terminus, this prediction should be combined with SPU_001697.  See SPU_006561. \n
SPU_000076	SPU_000076	none	Missing N-terminus.  See SPU_006561, _01698.  \n
SPU_014977	SPU_014977	none	Missing N-terminus.  See SPU_003832. \n
Sp-TlrP34	SPU_030089	none	The nucleotides of the first and second exons in SPU_026438 and the intron between them have 88% identity to another Sp-Tlr gene (SPU_015066). The second exon is located at the end of a contig that is far from the third one. The third to 10th exons could belong to Sp-ABCH1. \n
SPU_028380	SPU_028380	none	This gene model doesn't have a TIR domain, but the nucleotides encoding SP, NT, and LRR(8-15) have 88% identity to another Sp-Tlr gene (14548). This gene model is located at the end of a scaffold, which could make it incomplete. \n
SPU_007205	SPU_007205	none	#\nMissing N-terminus.  See SPU_003832, _14977. \n
SPU_009356	SPU_009356	none	Missing N-terminus.  See SPU_003832, _14977, _07205.  \n
SPU_008246	SPU_008246	none	Missing N-terminus.  See SPU_003832, _14977, _07205, _09356.  \n
SPU_008617	SPU_008617	none	Full length.  See SPU_003832, _14977, _07205, _09356, _08246.  \n
SPU_016011	SPU_016011	none	Missing N-terminus.  See SPU_003832, _14977, _07205, _09356, _08246, _08617. \n
SPU_027712	SPU_027712	none	highly homologous to SPU_027713, located on the same scaffold\n
SPU_027713	SPU_027713	none	highly homologous to SPU_027712\n
Sp-PLN	SPU_030090	none	Absence of a complete cDNA prevents further identification of the gene, but it probably extends to another scaffold based on the protein's size and the position of the existing cDNA at the end of the scaffold.   \nNOTE:  \n- exon 8 is missing 9 bp after 2555.   \n- exon 4 probably falls in the poly-N region \n
SPU_022727	SPU_022727	none	Haplotype of SPU_005376.\n
SPU_018472	SPU_018472	none	Has similarity to hatching enzyme. Tiling data supports glean3 gene model.\n
SPU_005663	SPU_005663	none	Similar to Protein phosphatase 1 regulatory subunit 12B. Partial sequence. \nSee putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137537271-5077-208204720659.BLASTQ4\n
SPU_005182	SPU_005182	none	Sp-064 has many exons that are found on several Glean3 models; 05182, 12439, 18503, 17239.   \nThere are additional Glean3 models that overlap and may represent the other allele.  These are:  \n08381 overlaps with 05182;  \n10678 also overlaps with 05182 but in a different region than 08381;  \n18503 overlaps with 12439;  \n10474 overlaps with 18503; \n07883 overlaps with 17239. \n \nGLEAN3-05182 has exons 1 through 14 of 23.\n
SPU_013607	SPU_013607	none	Similar to Sp-R-PTP-delta.  Partial sequence.  May be a portion of a duplicate gene.  Another Sp-R-PTP-delta, SPU_000831, is not on the same scaffold.  SPU_000831 is probably a duplicate.\n
SPU_012439	SPU_012439	none	This scaffold has exons 16 through 18.  Exon 15 is missing from the assembly. \nOther Glean models overlap this region and may be the other allele.  SPU_012439 overlaps with 18503.\n
SPU_025759	SPU_025759	none	Similar to R-PTP-delta. Partial sequence. See also SPU_000831 and SPU_013607. See structure of this gene at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137533949-23252-71761336427.BLASTQ4\n
SPU_018503	SPU_018503	none	GLEAN3-18503 includes exons 19-22 and part of exon 23 of this gene, which is also found on GLEAN3-05182, GLEAN3-12439 and SPU_017239.  However, it does not include the 3' end of the gene, which is located on GLEAN3-17239. \nSPU_018503 also overlaps with SPU_010474.\n
SPU_017239	SPU_017239	none	Glean3-17239 includes the 3' end of this large gene.  It overlaps with Glean3-07883\n
SPU_010823	SPU_010823	none	Similar to R-PTP-mu. See also SPU_022405.\n
SPU_006528	SPU_006528	none	Similar to R-PTP-mu. Partial sequence. See also SPU_016411, SPU_022686, SPU_026582, and SPU_018743. \nSee putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137537418-19107-58243529506.BLASTQ1\n
SPU_023162	SPU_023162	none	Blasts to PTPRT, but in phylogenetic analysis it forms part of a novel clade that also includes PTPRLec1, PTPRLec2, PTPRLec3, PTPRLec4, PTPRLec6, PTPRFn1, and PTPRFn2.  \n
SPU_001207	SPU_001207	none	Incomplete prediction/assembly problem. Only the first 305 AA of the 544 AA protein are present. \n
SPU_001597	SPU_001597	none	A description of a homologue of this gene appears in: \nEmery,P., So,W.V., Kaneko,M., Hall,J.C. and Rosbash,M. Cell 95 (5), 669-679 (1998) \nCRY, a Drosophila clock and light-regulated cryptochrome, is a \nmajor contributor to circadian rhythm resetting and \nphotosensitivity\n
SPU_011174	SPU_011174	none	Matches_SPU_008752. The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: FRELDSLYEGISSITSSRRRVEVTTFLLDQLESNDRSNGQVMVAQQPTNPTNNHVNNNMNSGNLEQPMHHDSESDEGFEEMDTGVEGAVGQSHSVSPTPSDE\n
SPU_008752	SPU_008752	none	Matches_SPU_011174.\n
SPU_017642	SPU_017642	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: STYIGPWCAGFGDLRARGSSGGSRPYTCTCTYTRILAIWGTGMIRDPVMKNKKFTVWFPKIGVHLIHRDLRYCAFVINLLICSSLATCFLSDGCQCLFKLL,DRFLSLFAHLECVSETLIGILSVCLLSSSQTWKWKSFIPVNIFLHLSCLQERCAANCGICFCYLQYLRMVVENIIKTFRD\n
SPU_023867	SPU_023867	none	Exon 2 is alternatively spliced in Paracentrotus lividus, Arbacia punctulata, and Sphaerechinus granularis.\n
SPU_009262	SPU_009262	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: ASLSPTPHVHQRYPQGLGHLQSSPASLAYQTASRSQETCWRGVYQMVKLHQGQYSSVTVSCDLNSFGVCRRVLFQRLPSAGSASLVSSGCSCPVLGVFRIPLMRDIYQVTLASSLQTLYYL,QFQGTLCLLFWKVFLSFASLAFLILHQYLVYLLFLVYLFFVSPLLLLDFVCGSHWCLSLLNEFLLVWCVLHDYSQGFVLSEG,NGFYQQILRWLNSLYPSYTKDTSSRFLDREENGGLFLWLRFNDRFSLYPDLLHDLHDFLHGRLADDIRHKQVFLQHLTFTKGIHKV,TVFIPHTRRIHPQDFLIEKKMEVFFCGCGSTIDSLCTPTSFTIFMTSFMAVWLTISVISKSFSNTSRSPKVSTRSRASSKLTRFFGLPNSKSLSGNMLARGLSNGETSSRTVLICDSFLRSKLFWRLSASSFPTFTFSGLRFFGFFRMLLPCAWCVPDSFDERYLSSDTCFLTSDTVLSVTNSLVGIGLVTEVSSGALPNS,SRLLVESMPASLNVLSADGSSTSSILDSVCVCLILASFSGDWLLCALLSITISGDPVSTVLEGLPLFRFFGFSDPASVSGLPAFLGLPLFRFSSPSAGLCVWLTLVSFPTE,PMLISFSAGWCVTTASSKPTRDSVLNSLGGLPLFCFSSSCFISKMHALDLSFKAPLLSSVNLSNLISTVSILDLSIVDVLATTLGDKQDPGTIPLLGLPCFCLSPFPDPELDIFGWYGLNSLKGLPTFPVLFFSDSTSDSNCSVRASLPVD,FSFRGLPLFLFSLLSVPESIPDRLTSSSLSVDWLSSTLACKLSTDESDIDTFRGLPLFLFSELESTSESVPDFPNPAILSNDWPRLTLDPLLGLPLFLFCGLSAEVSTPS,SSIHTLPLHCANKHGFIPSPPTLSSSGLPPPPPSPPPPRPLLGSHKHAASPTTLAATTAATAATTAAESGPATGAVATVACRHAPPGSFHSSSH,TISSPCYVSPDVHKYRYTSACCLGLFCCASAIQIEIKILCWVIQNLYFLLLQYQGFYLSQQTKLRFSICVCVYLFMNYLCTCTMQPLIPCHCSLAPGFTFIWRV\n
SPU_023730	SPU_023730	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 5,8.The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: AEPKLCRGTPTHSITVQHSIIWSMLTKPCASMHTFSRSNQGEKSKHFFIDIWTGSVKHVIFESCANSIDLSPSAYAIGDCLCRSRYAQTPMVDFIRTLKLVQ,QSSFDSGIILVTSLHSICHTNPLFPHHQSSPCRNLTQTKILHSPLHTTNLPHAETSRKPRYSIPLSHHPSSPCRNLTQTKILHPPLHRHTDYKSTDDLQHLSYSEQCNFLLGLLLD,HLYTPSATPTPSSHTTNLPHAETSRKPRYSTPPFTPPIFPMQKPHENQDTPSPFHTTHLPHAETSRKPRYSTPLYTATLTTSRPMIYNILAIRSSVTFCSDFCWIDIDLTGVTGVLWMRMGNIIFQQIIIKRGLL,FRFGHHPRDIFTLHLPHQPPLPTPPIFPMQKPHANQDTPLPPSHHQSSPCRNLTKTKILHPPFTPPIFPMQKPHANQDTPPPFTPPH,VERSSNGIKPRATSLRPPLEYIITRLLQEQKPDRGKLDRETRTLRKRLVTLRASLVIPSAPPAQKAQKHGCSPDLIMLSLTVISSSLHPSRAAVHHSNPPPPLFHPIPSLHFILPLFLVPLRFRV,PGNKDPEETFSDITRFTCNSLCSTRPEGAKTRMQSRSHHAIPYGDIIVVASFQSSRASFQPPPPSLPSHSIPPLHPSSVSCPSSLPCR,LNHVLNRIKTMKEQDTLRHHSLHLAFREPYPTIPVSQRRQLHDNGTEISLTLSNMQRPKHLLHSPPLLLSHLNYHPKCSIHKKTQNQIESSTLIRSCPLFSITLVLYP\n
SPU_008177	SPU_008177	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 16.\n
SPU_020637	SPU_020637	none	Matches_SPU_026099.\n
SPU_018126	SPU_018126	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: TSSGGMFHLNTPLKLGKALHQWGGIVFIFTSILWCIKIIRPIFFQAIQPTGDKAIYFSGSSDRDKRAHLMIVLPVNSRGRPLM,VQAGQSQSRSNPIEISKVSTMKLPFNLFLNSHTPFLKPCFQSACCLLIVETIRHSFLTPYSILVHVSCTNVERTLNPIIASTCSHTRLR,NRHTPTITHTSQTGCHPHVPQSSTGFNLNDKSASSTRCRFFAHIHRLIKTYGSENVDIEIRHVYSLSWQHMGASMRFVLVCSGSSGGVHKQCNVYRNRLPHRV,KPIHPRWFCSFGTGVALGMIGRFEKGQNVRLIRREKVWHNHDNGNSEPRITLRPQLSKFIWRTPSRTLLTSVSAFPLIRGIKPGGFIRQNEYLPVYANTDIERACA\n
SPU_014539	SPU_014539	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 1.The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: PYPLRPIAHTPLSITIRRSLPLPCSFPFPFPSLLPLYLACPLSSLLPSLFLLFSLSFPCLLSLLPLSTSPLSILCSLLPLYFTSFST,PTPRSPSRSVDHSLSLALSPFPSPLSSLSILHVPFPLFFPLFSFSSPSLSHVYCLSSPSPPRPFLSSAPSSLFTLLHFLLDIIAVYAFPNTNSLPLLN,LPPSSHSPHPALHHDPSITPSPLLFPLSLPLSPPSLSCMSPFLSSSLSFPSLLPLFPMSIVSPPPLHLAPFYPLLPPPSLLYFIFYLILLQCMLSQTPTLCLFLIEF\n
SPU_016343	SPU_016343	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 1. The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: SHTNPSFFQSLYLPKEWYKSLLTYSVCFPRNSSEDTKIAIKSRDKNPFPFLLYDLLFMITVHFPRGLLLHSDKSPSLSQDV\n
SPU_002677	SPU_002677	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: LHASPPCSPSPYLHLFFPPSLSIPSTMIAPRITSLLSVSLSPSFLPSIALHPFRCDHLAPRIPSLLSVSLSPSFLPSIALHPFRHDHL,SLYFSFSLSFRLPISPSILPSVVLHPFRHDHVAPRIPSLLSVSLSPSFLSSIALHPFHHDSSTHHLLAFRLPISIFSSLHRSPSLPL\n
SPU_002603	SPU_002603	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: RARSEKRSTSLSKLLIDLPINCQKYARYCSEIRAAVSLFKQNSSNCLQCANLLRTILVQYTSHCYCYFIHLLYLVTSSRLDTSFVCLTSNFEETVNCHFYILYNEQSGNNRIRIR\n
SPU_005435	SPU_005435	none	Matches_SPU_023739. The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: CMSFSLLVLYLFFCISLSISPSLPLHYASLSSFNLCLSVSVSLSLSLSLSLFPFLSLSLPFSIMVMSLSIHLPSLIFPLPLSFYVHLGLFLLLCLSFPIF\n
SPU_023739	SPU_023739	none	Matches_SPU_005435. The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: VIITTTIIIIIISNLTTPPLSLTMTLEIPFKRGWDLSRRGCSSLCLVNTDKLNNVYTVCVVCHPFSTHRINGTFPCMCGHLQGILRVRTTISFSPCHL\n
SPU_004598	SPU_004598	none	Matches_SPU_006159.\n
SPU_006159	SPU_006159	none	Matches SPU_004598. Transcriptome data indicates that Glean may have falsely predicted the following exons: 2,4.\n
SPU_014461	SPU_014461	none	Matches SPU_024163. Transcriptome data indicates that Glean may have falsely predicted the following exons: 1.\n
SPU_024163	SPU_024163	none	Matches_SPU_014461.\n
SPU_025486	SPU_025486	none	Matches_SPU_025486. Transcriptome data indicates that Glean may have falsely predicted the following exons: 2,3.\n
SPU_023868	SPU_023868	none	Matches_SPU_023868.\n
SPU_020346	SPU_020346	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: RASLLYPTFPDSLCLSLSLSLSLSVCVCVCVCVCVCVCLPPSLSLSLSLPALLAFSIFPSPHIFFTLSFLDSILHLGQPVFLSSLSLSLSLYMYLPTLSFSNPPSFCYVFLNYLFAYSLSKFSFFFLNL,LISSPLFQYCHVFICHNTISHPLNAPLYSTLLSLILSVSLCLSLSLCPSVCVCVCVCVCVCVSPPLSLSLFLSQLSWPSPSSPLPIFSLP\n
SPU_014177	SPU_014177	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 1.\n
SPU_011315	SPU_011315	none	Matches_SPU_011315. The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: LHVCRLLQAICLSSSLFPLTLSSVSSSLKCSHFPIYSLSSSSPFSLSVSLPPSLSLSPSPFHSPPLIILCLSGDASSLALSLPSMPAHFTAVC,KCIEGDYMYVVCFRQSVCLLPSFPSPSPVSLHLSNAPISPSTPSLLRLRSLYLSLSLLPSLSLHLPFILPRSLFSVSLAMLLRLLFPSRPCRPTSQLFA,ILRDENVLRVITCMSFASGNLSVFFPLSPHPLQCLFISQMLPFPHLLPLFFVSVLFICLSPSFPLSLSISLSFSPAHYSLSLWRCFFACSFPPVHAGPLHSCLP,HQHSLPVAPCKTPCRSRETIPIIFFHFKTLMQPPQMTNIVQYFIIKTRARGLLCTFEAALPNPPPLLLSLSLLHHPQTIQSTHPRPKQMPGKFTNDHTSGGSRIL\n
SPU_017983	SPU_017983	none	Matches_SPU_017983.\n
SPU_018954	SPU_018954	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: LDDFKRINKPNGVDDNLQKKYQVLERHVTSLRQYGQKMNQRLEMLETSNNGVCVWKIANYNEKKKDAMKTNVKSICSPPFYTSQYGYKLCGRVFLMGDGVGKGTYISLFLTIMKGSFDAVLPWPFKERITFQLVNQDDSINKSIVEAFRPDPASSSFKKPTTEKNIGAGCPLFAKIQIIEDPKSGFIRDNTMYLKIICQTSDVPEIK\n
SPU_006753	SPU_006753	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: XXXXXRLHLSGVWCSLSDNAAGSETLDFMCLVFSVKPLDLEVFYLHYLKFGVRQGYWIWSTDLTAIIWCEVPANVTSFGALHLT\n
SPU_013305	SPU_013305	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: ALDKGNTWQFPELHPNWNLNATSNLMKIFSHIFIRSTFLSLLILSVPVPMITSITTIPKIFNLVVANYHVFTKNTSSYQLKI,ISLSGGRGHLCDCLGGQSSSSIEAAYCYSHGRVGDGVTSPSTSSSLTKTTSTKLSPHSSLSSSSGCYIIPNLMGHVIYLIIFLWDSPCNYFI\n
SPU_027487	SPU_027487	none	Matches_SPU_008936.\n
SPU_016650	SPU_016650	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: GMSKVIWYSLLQSDLSAEEIQAWLDRKVHGTVLESIHTCRDTDPAMQDNNYARYERSDKALKPLIGQVMCTRPSISCATCRSSIVNFPSPGASEDQLEQLGLREQVSMVYMIEWMCCTIF,SYYVDKHDQRERTEPSPFSFRLPSVALFSHLTFHPLFCFQFDIKGKCPHSRATPLMLLICPCTKASVNKTSDIGRRRPLTGSPLFSV\n
SPU_006150	SPU_006150	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 4,7.\n
SPU_015456	SPU_015456	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: RSWTEAMTERIRDVHILREGERERGGDREKERERRRERERGDRGLEEMRKKAYAWWEKQQCSIHRRVQSSMRMYPGGMTGCYARPQDHRKQKRKTTTTTT\n
SPU_000485	SPU_000485	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: SLSVSLYIYIFPLFFPFFDTTLSPSLPPFFYFSTTQDLYMSPSIFYFYFSSLFHLYLSLSLSLSLFISLSPSLSLSHSFLSSYSVLSSLSVSSFSFFLSPTENA,LSLSLSIYISFPYSFPFLTPLFLHPFPPFSIFPPHKTSTCLPLFSISISLRSSISISLSLSLSLSLSLSLPLSLFLTPSSLHIPSFLLSPSPLSLSFFLQLRTR,HHSFSIPSPLFLFFHHTRPLHVSLYFLFLFLFALPSLSLSLSLSLSLYLSLSLSLSFSLLPLFIFRPFFSLRLLFLFLSFSN\n
SPU_002592	SPU_002592	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: IFATKRPYIVVAVQPVIKCELLYILFLPMSPLHAPSLPFFLDNIIWSSSFLCLSISLKYPPDTNATFSNIITIITQNRLLLSPSISPILPFVYIK,VAGSKKKTLVSFISCPKPESGRACTPLVPLWPVFTRCKPPKRGQTLLDIEWMRTYLIRLPPSRLLCATLEIFSQTYESNS,SLLSLSLTPLTSLSYLPPFPFLPSLSFSYSSSLSSSSSFPLLLLLFVFLFVFFPFPFSRLLSLFLSFSVSPSSWVLAGGSRGHCPQISCPCRCPSFGHIVMWLLMCLSSTMTQQ,HFSLFLLPLSLHYPIFLPSLFSPLSLSLILLLYLLLLLFLFFFFFLSSSSCSSLSPSLVSSPCFFHSQSHPLPGSLQEDLEGIAPRFRAHADVLRLVTLSCGC,FVNRRTPLTQNLSLRLLSQPYHLPPPSLSLTPFLFQFNHTSTRIHCLIGLFSDKRGNKSTKLMSRPDDDTISKTNMSERINYQNIMERGCKSNSSCDGNGCIGIDG\n
SPU_016685	SPU_016685	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: PLVTYAYLPDCLSGSSRGQVRKARRSGACFVEFQFHFCLPSVSLPLSPSLSLSLSLPLSPSLHLFLHDSFHPSLTLFSPLQ,LPTLTFLTVYRGAPGVRYEKLVARGPVLSSSNFIFAFPPSLSHSLPLSLSPSLSLSPLLSISFFMTLFIPRLLFFPPCNENGLYVSHLSVV\n
SPU_013843	SPU_013843	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 2,4.\n
SPU_013178	SPU_013178	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: MEQVDPPKDDATSSESLGEQREKTVPDVNSAGDQCSKESDGNEGKEETAKIPNSKEEDPALPSTSTGEEGMTADSSSHDDPEGNADEKMEESKDTDDKIEERQGTDDKGAKQVDGDDQLEEGEDRNNEHPGREPRDAEFTSEIFKIMLRNLPTRFGFQVGV\n
SPU_022971	SPU_022971	none	Matches_SPU_022971.\n
SPU_015425	SPU_015425	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: FKSFQKTTNGGMSVSFSPNLSSNLEKNTLQNFDLNCHDKTVFYDIGHRSRLMQDMCTYKTFMQKKISQNNIQEHYYLFEEKQNGRRNASNPCSTVSISILNIRILIP,HDLNLSKRQQTGECLFHFLLILAAIWKRIRSKTLILTAMTKLYSMILDTEAGSCKTCVLIKPSCKRRYHRIIYRSITIYLKKSRMEGEMHPILAALCQFPF,LGVTLRDRRRNKEIRKELKVGNILELARDMRLRWFGQSEWADEGKPAKDRMTRAVEGSRGRGRPETCWKEGYLKKELNLTAAQTGNRREWRLRIRPTNPC\n
SPU_011080	SPU_011080	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 1.\n
SPU_005022	SPU_005022	none	This is the bHLH domain of Sp-ahr.  The C-terminal sequence is  either in SPU_013788 (more complete) or SPU_012296 (one PAS domain only).  \n
SPU_025737	SPU_025737	none	Matches_SPU_025737 The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: AIFFNSLESLTREALPSSPTLESVLPYPESSVNLPNAPPACSILVRGVLSEEFSSTLGPVLLGVLSSVGRGRINEVMSEPGGALSLFAGTSWPVLG,RILLFTVAFCLSHIFQQLGVPHKRSPAIIPHTGIGASIPRVQCKSSKCTTGLFHSRKGGAVRGVQLYFRTSALGCIVFSRAGENQ\n
SPU_025589	SPU_025589	none	Matches_SPU_025589 The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: ITSFMRQGPDLLLSSIRDLLTDNGHNILETIPLYLIQFSLVKRRRKELSLRQGTIRRTHFFEFVSEKAFSVIRKQIFFIFPQQ,IYSIYFTGGGELSNQLMFPTSMLSGSTSKFTSPNIFSKLEFSIESFPESLPSTYLLGVIIRDLRELRIPPDVLGMFLFTGDNMWCLGVLECPTRLL\n
SPU_011348	SPU_011348	none	Matches_SPU_011348. Transcriptome data indicates that Glean may have falsely predicted the following exons: 5.The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: TGGEPRQNIWTLIRSRRYGTTSSRSYDRLYMENNGLNWWRTPVESPDINPIEKVWNDLKRFLRQIVHGKQWTQLVANPSRISRH,TGGEPQQNLQISIRSRRYGTTLRGSYERSYMENNGLNWWRTPAESPDINPIEKVWNDLKIDIYIYIIIFAFFFILYTREWKTATKPEHLIGIEVF,CWKRCSLFASGSICNFEGHCRRSLTKNVKYRSIETETAVDHGVLNLEQEKSSKFLIIKYRLEGTRMKSLALLSRIVESSIARFIVESFQGPGSSP,VSNTFLFCQQVPHIFDTSGDKVKVRTEVYSHDALKMWWAALIACGIYIHAHLTMLCQRLINKYREKFMYGAPTIISIARPG\n
SPU_027598	SPU_027598	none	Matches_SPU_027598. The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: RISRRSELSNVKGSFLLRETTSELATSRVQFRIGFSTIQYSTDGTISRSPAKVMLEALQYICLRKHLQLRRPLQNGHRQRMINTVV\n
SPU_024486	SPU_024486	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 2,3.\n
SPU_014405	SPU_014405	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 8.\n
SPU_019444	SPU_019444	none	Matches_SPU_019444.\n
SPU_028093	SPU_028093	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: KPRWTYKQVACQNITAAKARQKWHKQSSKIGCSGGVSRFSVCTLPILMSVILFINESNLKLGSKLRIRGVVPYFVPLNNSLAIIEAKNRITWAKFLNRHKHVL\n
SPU_028148	SPU_028148	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: SYLSLHPYLFVCLPVFLSFCLSGFCRPLHILTPPVYICLLVCLSVCLSVWLSLCLRLSVCSCLYISLSLSVCPCMPVCLSVCVCMRVWVGGGVRVSLSFSILATPQPSSNYPALSPSLPLSQSLSHSFSLSSSLSF,LILSFSPSIPLCLSACLSVFLSLWLLPTTPHLNSPCLYLFVSMSVSLSVCMVVSLSPPLCLFLSVYISFSVCLSVYACLSVRVRMYACVGGWGRTCVSLFFNLSDPSTILKLSSSFPFSPPLPISLPFFLTLFISLILTLMSLFSLWAQLHSDPPS,ERQIKREKYRDKVCVSVLPVCVCVRVRERERERERGRERGGGAGKREKGVNERDRERERNTEIKCALVCCLCVRVCVCVCVCM\n
SPU_003704	SPU_003704	none	see SPU_009520 for information about correct glean model assembly. \n \nnote missing sequence: \nITPKCGVPNVFPSPLRLGE \n \nalso there is either alternate splicing, or an extraneous exon.\n
SPU_011576	SPU_011576	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 5.\n
SPU_001519	SPU_001519	none	Matches_SPU_001519. Transcriptome data indicates that Glean may have falsely predicted the following exons: 5.\n
SPU_003920	SPU_003920	none	Matches_SPU_003920. Transcriptome data indicates that Glean may have falsely predicted the following exons: 4.\n
SPU_027215	SPU_027215	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: VKWVCACYERYYFVLSCRFMVLNIITTDGGVVQITIYGGGGGVKSRLETHLCLHVKHSARYKRPSHSSAITRNGNVLSHTPVTLTILAPDRHQTDTRPTPDRHYPEQ\n
SPU_012491	SPU_012491	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: NSLLSCREQVSRKLAKLYFLHLGIRGGPISLIMITLASKIGLLKFTWKKVFDTGIANPLALMSPRHRSSQIQYFSTSTAGNRFKLSLSVKWDRPRILRNQSSSSIILRTFFQFIKNMMVSLIRYHFLMFLI,REREWWRMRYMQGGEDERGRDRWLDREKGRKRDRERERERRGYKWVQKIDHGLFEIDLYHHLSIPLFLPLHTIISCQKSKHSNACLTKYSLSLFKP,TTEPQTVFVCHVCSITLHLYLPLSLSLSITLYFSFSCSFYYYFSPFFHSHYFSSNALLPPHFRSLALSLSTNHLVSFSRSLLSLLFIFSYSPLRSLYGYDTCMWQIYRGS\n
SPU_013047	SPU_013047	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 4.\n
SPU_011297	SPU_011297	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 3.\n
SPU_027334	SPU_027334	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 3,4,6. The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: YCCPKPFRLVTATLRTMVKLTNEAMDIFSKTLSFLNIEPKMINATRPVGRPFSLGKEDVSVFHAVRHDHHHHIITMSDHHLMLIPDMNDEVTDAH\n
SPU_018351	SPU_018351	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: TQPCFKLNVFENRDARVKKCRLNAPLAKHPDKTKPSCPFLHLRPGSKLLCLQICAKRLGVGNSEKLRNFEACFIFFIKLLPIWFPQLILFCTQGLSYQDCYFPPIFIYFRFGIFNQGNRRSVIF,VSKSMVFTNTHLFLPFLISVGEALKVRFLLIKFAYSTSTLSLYIRSAPSGTYRIPVISMCDIILNIQLYANGNDHTFSNIKD,LSLVLLTIFYDTGRRSREFRLRAEPHFRSVFVYSSILFPYFQKLLNSKYSKINCFHGNLERNFPMPLNDCMRRIAATQCDL\n
SPU_007981	SPU_007981	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 5.The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: PSISKSPFLYQSSIIPPPFLHHSSTIPPPFLHHSSTGRLSFLHHSSTIPPPFLHHSSTSLLPFLHQSSSIPPPFLHHSSTSCLQFLHHSSTSPLLFLHQSSTNPPPLFHHPSTTPTLNLYQSFTNHNPKCLPH,FLHHSSTIPPPFLHHSSTIPPPVVYHSSTIRLPFLHHSSTIPPPVFFHSSTSRLPFLRHSSTIPPPVVYNSSTIPPPVLYYSSTNPPPILHHSSTIPPPLPHSISTSHLPITIPSVFPI,KSIPLPILHNSSTIPPPFLHHSSTIPPPFLHRSSIIPPPFVYHSSTIPPPFLHQSSSIPPPVVFHSSAIPPPFLHQLSTIPPPFLHQSSTIPPPILHQSSTTLPPSLHHSHTQSLPVIYQSQSQVSSPY\n
SPU_004414	SPU_004414	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: ASARVLCLYSLCIVRFLILPSDLCQGFPNLQWVHFCRRCHLHVHSMCMMDLSSSCKSFNICSNLQFKPFRSLPQFRLITFGFPVCRSHFKEILNYKFMA\n
SPU_004844	SPU_004844	none	Matches_SPU_004844.\n
SPU_017725	SPU_017725	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: KRGIRSASTQLIVHFVRLAFGYLHVKFFYLTCLHVCKMLIEIQQYYGLFNSMLTKEHCTLHVVAYSKGDREQVIFLNGFPCSRLKC,CPSHNMHSSLFQILVANMMSWLYSSQHSTHFNISICFAKFVIFHPGHVFLIRVSGRICSCSIASCKHTNCVFLPIHVYCLGCSHYSYNFFSIIRVGIKLYSPYLIRLYSEFTFF\n
SPU_001998	SPU_001998	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 2.The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: NTTCAARKLVYANETQNHHLFPIIRIMLFCHIFGIDLHLCGPYTGMHFHCKKYIKNEMIMSSPKINERIFPDVLASLLRSPFEIEEAPVAACVELYSSLINTTRCRRGNSEHNFIFA,LIRHDVEEATVSIISFLLSQPASQGVPHHTNYKYFHFIRASHCPRAFSFWLLAGFARIFGSWGRGFSCCSCDNRGKENMVGKEEEKNRFNIVSKKRVVAESVYHRLPRST,FSNNHHQSFLFLPSLSLSLSLCLSLSFCRLIIVCELFLYQSMASPLQSPPPIVRLPPLGTKLSTGWSCMHAEKTPQFWSP,LFSPFLSFLIITISHSYFYPLSLCHSLSVFLSLSVALSLFVSFSCTSQWLAPFNPLHPLFACLLWAPSCPQVGHACMQRRLHSSGPPKWPCLPCVVLRLPFFGGIDSSPIVSASVFLSPHSSGFGS,MKSIYKCNCSSCFNLGIVSKLESTAPCILPSLQLSGSNANQNKDILGKPFVETQWGCWSNWRRDDWPCDLKCHAGNIPTTSNSTDTDCLNTYVMVLVMNVFGYVCTCAVVFTLR \n
SPU_021210	SPU_021210	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: GEISCFSVHRLVTFLKVLFFRKCYLGSIKGSPHRHDDDRSSYNHQVNSRKYSVNMLMTSSLMHCGSICRKRTATFLCGLSLNLETSVEPLNVLTYMY,DYVHSNRRGCTPVSRVVCCRDIWQCGTKGSLCCAGDRDVYYRTELSRLVRRNALLCTCMICTLTEMAFLKGFDRFQTVSRLHIVCCSKRFCHFYRSTLYS\n
SPU_026962	SPU_026962	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: CVFHSFVYCKYDKLRKGDDRSQDLGSLQCVSYKDVIWTFTTGHKILSSSIDMILLFSCVTSLVYVCHYFMYVFLTPSPVPFVYPFILIETCSVLLVNSFTSEDT\n
SPU_010404	SPU_010404	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 1.\n
SPU_019268	SPU_019268	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 2.\n
SPU_018951	SPU_018951	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 2,3.\n
SPU_014157	SPU_014157	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 8. The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: LHLPILIYSSLIPSSTYPLVHLTFSFSYSFLLSIFPILLFSPRPASHDCVPRQPRFYPKPHLISCIHPVLHVCQESVLINLPTKLDPSFHSRLSTCMTV\n
SPU_014170	SPU_014170	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: QREEESEKEEKKGEGRKEKRKECRIESNLGFPISRVPILRLELKRGRGKERERERERGRGKERDIITCQKPLEESFRKEYNKTTAPGISDKQKLVYFKTSLSISQGALIHFKHVSQTHAQDNHYF,FLSCRGKLMMAEGNACMIYIRNTRVESSMCMVTNKVVWLDTKQKNVCQNIMITRPPGTVMRPTFFFLMKRRKDMNSCRARSKIWRKWVDGDIKQKVCAYVCVNEENVNSATVVWGQDCIK,LSTMLSTQEYQYRIMFRVTKLPRAVLHFFPVVFDVSISWQLPPPYFLLLSFFLTPPSNIFLPLFISPSLYLSICTMTFNGRRLKLHILKSNHTLPRSYQTPELAEGESRSTRD\n
SPU_019129	SPU_019129	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 2,3,5,7,8.\n
SPU_018366	SPU_018366	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 3,4.The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: KFTRIWSKLLGRTNICGFITNGSAVVGREGILSRSCFSDQCLKIGFVSPSPHPKGGHGVASGGGGGSKHANCWNLRCDMCNCNWCMLMNK,KFTRIWSKLLGRTNICGFITNGSAVVGREGILSRSCFSDQWSEDRVCFALTSSERGSRSGLGGGGGNLNTRIVGICVVTCVTGIFASSHIYMCMCIY\n
SPU_028827	SPU_028827	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: TDNLQVALHKKGLKNVKPFFAVIIDIFSTFERNNISVPSQKGVNSVLEDMSLKCLSGSTYINACNANEKPTCLRESDGEFPNSNQVGSCSNPLEKILVIVNTTLTLEYS\n
SPU_014802	SPU_014802	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 2.The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: LDNHYVILKFGKEMLKVSRIHHQPRAGPPPPPNILFLLFLLLLLLLLLLLLLLLLLLLLPLPWTWELQQQIIFLSFFLFFFREL\n
SPU_011202	SPU_011202	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 6,7. The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: RENNLQHQSTGNSGLFSASFLHSPFLLSSLPSFIHPTNPKVELIDTKLAQMLLNVKRSAEGIRIRLHIVERWGRNETESLWMQTDQSDMYLDRDR\n
SPU_026877	SPU_026877	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: QNCGCLAYLFTVVVMFYVNVRLIWKYTLKKESITLFESKTSFFSEALCNCENTVCALLNEMSVCLRVRVCVCEAEGGRERGTDRDEEKGDETRGSENVRLAEILEDEDAGWKEEGEPKKRSYDAHVLYWNLVLSSSLSWVPGATRALTK,NVGFRFLSKSNLFSLVAHDSFAIECTKYNYSACKYLNKKKTLFSQDILLFIKGHVSFVIALPPKMLKIFLHRIYFYIHAPTAESYSNDFRIGGKSNCILSPSLYNSSTILP\n
SPU_016168	SPU_016168	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: QEIVRWPALCHTPSLPPFHSATTPPTRVDLSVTPPTFIVYSFPFYTYQTQTNKKQTKNICVKERRTVRPSTLYFSILTFLYFI\n
SPU_026905	SPU_026905	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 9.The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: MKSSAVFCNSSMASSFISSIISSTSFFTFFFTFFLTFFLTFFLALSLAFFFAFFFTFFFALSLTFFFTFFFAFFLAFFFTCFFTRFFACFFACFLTCFLACFLICFFATFFPVFLAALLPPFLAADPATFIILTVC,FIYGIILHILHHIVYFFFHLLFYFLLDLFLDFLLGLILGLFLRLFLYFLLRFVLDLLFHLLFRLLFSLLLHLLFYSLLCLLLCLLLDLFLGLLFDLFLRHFLSGFLGRPLATFLGR,NLPDDEENNLQNNLDEPWPFISCQRACSGYVLFKDLYGWPSTLILLLAALVFSFLLFFFLPWRHFPVHFFLIIVIRPFACT\n
SPU_017404	SPU_017404	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 1,7.\n
SPU_007242	SPU_007242	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: QKVEDEHSSRPESLPDHHTDFQFLGELGSSQSLEYFLGKLGLSQSLEYFPEGQDCSSYEWNSYESIENADELTYNCTWCSIPKPCREGRGKQNNISSKI,LSMKLQCWVKNLRRLGGTRRRHNSRRLKTSTPAVRNRCQITTQTFSFLVNLGRLRALNIFLVNLGCLRALSIFPKVKIVVPTNGIPMRA\n
SPU_010438	SPU_010438	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: LLPTPFQVFLHNRKQMCSIIHFLYVIALNEMQNVENVELIQVRVNFEASVHTALNECRLELHVCLCLLFFMSYMNIIHSPSELH\n
SPU_005831	SPU_005831	none	This gene model doesn't have a TIR domain. But the nucleotides encoding SP, NT, LRR(16-21), CT, and TM have 97% identity to another Sp-Tlr gene (SPU_018838). So it could be a member of the Toll-like receptor family.\n
SPU_024820	SPU_024820	none	Blasted to PTPRA and PTPRT, but phylogenetic analysis showed that it is part of a novel clade also containing PTPRLec1, PTPRLec2, PTPRLec3, PTPRLec4, PTPRLec5, PTPRFn1, and PTPRFn2.\n
SPU_011286	SPU_011286	none	This sequence apparently encodes the first exon of Sp p38, the rest of which is contained by SPU_010118. Amino acids 1-47 (approximately) correspond to p38. Correct sequences below: \n \nDNA: \nATGTCTGCTTTTCATTCACTGCCAGAGGACTTCCATCACATTGAACTCAATAAAACGATATGGGAAGTCCCCAATCGGTATGTGCGACTGGAACCTGTGGGCTCAGGAGCGTATGGGCAAGTATGTTCAACAGAA \n \nProtein: \nMSAFHSLPEDFHHIELNKTIWEVPNRYVRLEPVGSGAYGQVCSTE\n
SPU_028213	SPU_028213	none	SPU_009602 may be a duplicate prediction.\n
SPU_019852	SPU_019852	none	Blasts to PTPRA, but forms a novel clade in phylogenetic analysis with PTPRFn1, PTPRFn2, and PTPRLec2-6.  \n
SPU_000237	SPU_000237	none	partial sequence only, internal. Also seems to be missing an exon.\n
SPU_019770	SPU_019770	none	SPU_019770 codes the first exon(s) for this gene. Rest of the gene is present in SPU_019769.\n
SPU_019769	SPU_019769	none	SPU_019770 codes the first exon(s) for this gene. Rest of the gene is present in SPU_019769.\n
SPU_017187	SPU_017187	none	One of 3. This gene is a partial sequence, and is identical to 19022, which is longer and encompasses this gene. SPU_001396 also is a JIP3, but is distinct from 19022 and 17187.  This gene and 19022 also BLAST well to XP_782498.1, sperm-associated antigen 9 isoform 1. \n
SPU_009388	SPU_009388	none	Similar to phosphohistidine phosphatase 1.\n
SPU_026582	SPU_026582	none	Partial sequence.  See also SPU_006528, SPU_016411, SPU_022686, and SPU_018743. \n
SPU_028046	SPU_028046	none	Similar to Protein phosphatase PP2A regulatory subunit A. Partial sequence.\n
SPU_001694	SPU_001694	none	Similar to Receptor-type tyrosine-protein phosphatase R. Partial sequence.\n
SPU_015535	SPU_015535	none	Similar to Receptor-type tyrosine-protein phosphatase R. See also SPU_020488.\n
SPU_023889	SPU_023889	none	homolog: arrestin beta-1 from human, isoform B\n
SPU_020488	SPU_020488	none	Similar to Receptor-type tyrosine-protein phosphatase R. See also SPU_015535.\n
SPU_011457	SPU_011457	none	Missing N-terminus.  See SPU_014876.  \n
SPU_005942	SPU_005942	none	See SPU_014876, _11457.  \n
SPU_020542	SPU_020542	none	Blasts to PTPRK, but didn't clade with these genes in phylogenetic analysis.  Formed a unique clade with SPU_015923. Partial sequence.\n
SPU_008253	SPU_008253	none	Similar to R-PTP-alpha.  See also SPU_016053, SPU_016144, SPU_019852, SPU_020604, SPU_024537, SPU_027101, and SPU_022839.  \nSee putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137537808-25802-57187428179.BLASTQ4\n
SPU_017329	SPU_017329	none	The ATG (exon 1) probably lies in an unsequenced area.  Other notes for exon 15:  \n- 23 bp missing after 23565 \n- 16 bp mismatch btwn 23566-81 \n- 24 bp missing after 24183 & 24237\n
SPU_006155	SPU_006155	none	Glean modified to correspond to EST. \nIt seems to be missing the C-terminal part of the expected protein. It could be contained in the prediction SPU_017460, but the EST doesn't line up with the exon sequences there\n
SPU_024537	SPU_024537	none	Similar to R-PTP-alpha.  See also SPU_008253, SPU_016053, SPU_016144, SPU_019852, SPU_020604, SPU_027101, and SPU_022839.\n
SPU_028020	SPU_028020	none	it seems that this prediction has two matches to Nef3 in C-terminus and N-terminus.\n
SPU_015941	SPU_015941	none	It hits the same query, Mouse Rufy3, as SPU_028460. They may or may not be the same gene. It is named Sp-Rufy4.\n
SPU_028184	SPU_028184	none	has one kazal and two TY domains - like a splice isoform of SMOC (Q9H4F8)- a SPARC homologue\n
SPU_002025	SPU_002025	none	Contains single NtA domain like N-terminus of agrin. \nOther GLEAN predictions contain FOLN and KAZAL repeats and may comprise the next segment (especially SPU_002467 and possibly SPU_024994).  A fourth gene looks like the next piece (SPU_022633) and the adjacent gene (SPU_022634) contains LamG repeats that look like the C-terminus. These five gene predictions may be adjacent and comprise a full agrin gene.\n
SPU_017460	SPU_017460	none	Glean describes the C-terminal part of the gene \nPotentially could be the C-terminal part of the gene described as SPU_006155, but no linking EST data is available\n
SPU_015605	SPU_015605	none	Hh signaling pathway member\n
SPU_019022	SPU_019022	none	One of 3. This Glean is identical to and encompasses SPU_017187, both of which BLAST to JIP3 as well as sperm-associated antigen 9. In addition, SPU_001396 is also a JIP3, but does not match these others and so probably represents a true duplication.\n
SPU_024688	SPU_024688	none	See also SPU_005592 and SPU_006723.\n
SPU_005592	SPU_005592	none	See also SPU_006723 and SPU_024688. \nSee putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137537110-23138-78233751102.BLASTQ4 \n \nAnd this sequence is a duplicate of SPU_006723\n
SPU_001396	SPU_001396	none	One of 3. SPU_017187 and 19022 also encode a JIP3, distinct from this gene.\n
SPU_025413	SPU_025413	none	Similar to c-myc binding protein.\n
SPU_018908	SPU_018908	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: FTFTCYLSGQKKDPTILVLNRRSLLINIHEVPHPRAIWFYCYPPLHLNSSQPPDEYYKYSRSLCCVFVSCLCLYQAARSGL\n
SPU_007911	SPU_007911	none	Using SpADAM cDNA, two predictions align: SPU_007911 and SPU_020545.  Both are nearly identical from 1010 to 3072.  Missing exons are mostly on Scaffold_317, but some are only on Scaffold_663 (2 exons encoding 681-865).  I have corrected gene features of SPU_020545.\n
SPU_020545	SPU_020545	none	Using SpADAM cDNA, two predictions align: SPU_007911 and SPU_020545.  Both are nearly identical from 1010 to 3072.  Missing exons are mostly on Scaffold_317, but some are only on Scaffold_663 (2 exons encoding 681-865).  I have corrected gene features of SPU_020545.\n
SPU_010565	SPU_010565	none	looks like part of slit - the C-terminal half - perhaps the last few domains are artefacts/duplication - missing N-terminal half with more LRR repeats \n \nADJACENT GENE (SPU_010564) - LOOKS LIKE THE N-TERMINAL HALF - could be one or two exons encoding LRR repeats missing at junction\n
SPU_018348	SPU_018348	none	FA58C-LamG-LamG structure defines this as relative of CASPR - probably missing C-terminal half that should contain other domains (maybe FBG,more LamGs and EGFs, TM and/or 4.1m) \n \nThe prediction from SPU_007341 looks like a likely candidate for C-terminus\n
SPU_023409	SPU_023409	none	Has EGF and BNR repeats but no N-terminal reeler domain - probably a fragment. \n \nSPU_008268 is very similar in structure\n
SPU_023757	SPU_023757	none	A cDNA containing everything but the 5' end of the gene was used in a BLAST search of the contig database, which resulted in the identification of an overlapping sequence that starts with a signal peptide.  The first exon appears to be on the - strand whereas the other exons are + strand, suggesting an assembly error. \n
SPU_027145	SPU_027145	none	The full-length cDNA was assembled from several overlapping cDNA fragments and ESTs and confirmed by PCR of a full length ORF.  Gene features have been altered to comply with cDNA.\n
SPU_028236	SPU_028236	none	Likely the unique ortholog of human CDC2L5 and CrkRS \n"Additional evidences of the existence of the gene" have been obtained in Sphaerechinus granularis\n
SPU_023951	SPU_023951	none	SPU_023951 lacks C-terminal SH3 domain present in homologs.\n
SPU_021161	SPU_021161	none	partial sequence,  identical to SPU_028463 from aa 35-249 (with one exception). Mismatched short ends do not appear to result from frame shifts. Also BLASTs to XP_781571.1 (MAPKAPK5) with high e value.\n
SPU_013910	SPU_013910	none	Duplicate gene (non-identical) to other MAPKAPK5s: SPU_028463 and SPU_021161 (these latter 2 appear to be the same gene). The termini of this gene appear to be incorrect.\n
SPU_023676	SPU_023676	none	Also BLASTs strongly to XP_789413. Identical and internal to SPU_006513\n
SPU_011714	SPU_011714	none	sequence is only partial. SPU_020782 is an internal identical duplicate of this gene. \n
SPU_020782	SPU_020782	none	internal identical duplicate of SPU_011714. Also BLASTs strongly to XP_797035.1\n
SPU_026498	SPU_026498	none	This is the C terminal part of the protein; the N terminal portion is encoded by SPU_027848. These 2 gleans overlap (nucleotide level): bases 1-353 (this  glean).  \n
SPU_017694	SPU_017694	none	duplicate, non-identical to SPU_010805. Partial sequence.\n
SPU_007222	SPU_007222	none	appears to be missing the start codon, but 3rd aa is present\n
SPU_027370	SPU_027370	none	The N-terminal sequence (exons from nt 8030 to 19367) is not part of Sp-CDK7. This sequence is similar to the sequence NP_000327.1 encoding sodium channel, nonvoltage-gated 1, beta [Homo sapiens].  \nLikely due to a problem of contig assembly. \nThe N terminus of Sp-CDK7 is missing and the C-terminus (last two exons) is conflicting.\n
SPU_004845	SPU_004845	none	adhesion protein or cell surface receptor - novel architecture - FBG, an N-terminal  MNNL Notch ligand domain and multiple EGF-Ca repeats. Good Blast match with Notch homolog but that may be spurious (EGFs). \nCould be a Notch or Notch ligand but does not have ankyrin repeats characteristic of Notch or DSL characteristic of Notch ligands\n
SPU_011551	SPU_011551	none	Based on best blast hit data, this protein is closely related to tolloid but lacks the C-terminal EGF, CUB and CUB domains.  One C-terminal predicted exon and two N-terminal exons do not encode conserved sequence and may not be part of this gene. \n
SPU_007341	SPU_007341	none	LamG-EGF-LamG-4.1m - looks like C-terminus of CASPR or neurexin. \nNeurexin gene (SPU_024416) has its C-terminus - CASPR gene (SPU_018348) does not. Suggests this is part of the CASPR gene.\n
SPU_014828	SPU_014828	none	All predicted exons supported by EST data.\n
SPU_012138	SPU_012138	none	EGF-LAMG-LAMG-EGF \n \nthese two domains occur in intermingled fashion in quite a few known proteins - neurexins, perlecans, CASPR \n \nURCHINS APPEAR ALSO TO HAVE NOVEL EGF/LAMG PROTEINS\n
SPU_015404	SPU_015404	none	membrane-proximal portion of an adhesion receptor - a bit like Crumbs - has several LamG/EGF pairs and a TM domain \n \nthese two domains (or other EGF variants) occur in intermingled fashion in quite a few known proteins - neurexins, perlecans, agrin, crumbs, CASPR, some cadherins \n \nurchins appear also to have novel EGF/LAMG proteins \n
SPU_024257	SPU_024257	none	#\nEGF-EGF-LAMG-LAMG-EGF-EGF-LAMG \n \nthese two domains (or other EGF variants) occur in intermingled fashion in quite a few known proteins - neurexins, perlecans, agrin, crumbs, CASPR, some cadherins \n \nurchins appear also to have novel EGF/LAMG proteins \n
SPU_004895	SPU_004895	none	Multiple EGFCa repeats and C-terminal Lamg/EGFCa modules \nNo obvious TM domain \nNovel architecture \nEssentially same structure as >SPU_016555 \nLook a bit like Crumbs but much larger and not the same domain organization\n
SPU_016555	SPU_016555	none	Multiple EGFCa repeats and C-terminal Lamg/EGFCa modules \nNo obvious TM domain \nNovel architecture \nEssentially same structure as >SPU_004895 \nLook a bit like Crumbs but much larger and not the same domain organization\n
SPU_009927	SPU_009927	none	This model is on a short scaffold and is probably lacking Both N and C-terminal exons.  One predicted exon, given below, cannot be validated by sequence similarity to members of the M12A class of proteases. \n>SPU_009927|Scaffold82736|3265|3440| DNA_SRC: Scaffold82736 START: 3265 STOP: 3440 STRAND: +  \nGAGAAGAAGAAGAAGAAAAAGATGATGAAGAAGAAGATGAGGAGGAGGAGGATGATGAAGAAGAAGAAGA \nAGGAGAAGATGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAAAAGAA \nGAAGACAAACAAGAAGAGGAGGAGGAAGACGAAGAA \n
SPU_020365	SPU_020365	none	#\nMultiple EGFCa repeats and three LamG domains interspersed before a TM segment \nLooks rather like Crumbs in overall organization but larger. \n \nSPU_016807 is similar\n
SPU_021812	SPU_021812	none	Likely unique ortholog of human cyclin T1 and T2\n
SPU_028742	SPU_028742	none	This model contains exons encoding a protein most similar to SpAN, a sea urchin astacin protease, but lacks a C-terminal domain found in SpAN.\n
SPU_009203	SPU_009203	none	Model probably contains partial CDS because it is at the end of a scaffold.  Of the 5 predicted exons, only 3 and 4 contain conserved sequences.\n
SPU_014989	SPU_014989	none	The N-terminal sequence of this cyclin is probably the one encountered in SPU_011295.    \nThree GLEAN: SPU_000328,14989 and 0011295 encode the cyclin L protein. They differ in the N-terminal end. \n
SPU_000218	SPU_000218	none	#\nThe model contains exons encoding CUB domains similar to those found in tolloid-like proteins within the M12A metalloprotease subfamily.  These sequences are very similar to those in SPU_017070, but are too divergent to be allelic.  This is very likely to be partial CDS.\n
SPU_027114	SPU_027114	none	Comparison to best blast hit sequence suggests that the gene model contains exons encoding a tolloid/BMP-1 like protein.  The model is likely to be partial because there are only 3 cub domains instead of the 5 normally associated with this subclass of astacin proteases.  Note that this model is adjacent to a very similar gene, SPU_027115.\n
SPU_027115	SPU_027115	none	Comparison to best blast sequences suggests that this model contains exons encoding a protein related to tolloid and bmp-1.  It is likely to be partial because there is only 1 cub domain rather than the 5 characteristic of proteins in this subclass of M12A.  Note that it is adjacent to a very similar gene, SPU_027114\n
SPU_019518	SPU_019518	none	identical to SPU_000129 over >200 aa; possible mis-assembly\n
SPU_026758	SPU_026758	none	strong identity to HIF2a through first 239 aa\n
SPU_008353	SPU_008353	none	needs to be split \naa 1-~900 = similar to Biotin protein ligase \naa ~900-1676 = similar to SIM but missing N-term. \nN-terminal likely found in SPU_013962\n
SPU_021277	SPU_021277	none	gi|72046985|ref|XP_786603.1|  PREDICTED: similar to ataxin 2 [Strongylocentrotus purpuratus]Length=898 \n
SPU_024739	SPU_024739	none	Comparison to best blast sequence suggests that all but the first and last exons in this model are conserved with tolloid-like proteins.  While the first predicted exon may be part of this gene, the last one ( >SPU_024739|Scaffold97632|8420|9724| ) encodes peptide sequence similar to other proteins.\n
SPU_003612	SPU_003612	none	This model contains exons encoding an astacin protease of the tolloid family most closely related to the sea urchin protein SpAN, but lacks other exons characteristic of this group of proteins, such as CUB domains probably because they are on other scaffolds and this model is located at the end of scaffold 72693.\n
SPU_005781	SPU_005781	none	NOVEL ARCHITECTURE - TY-TY-WAPx5-VWCx4-WAP domains\n
SPU_012001	SPU_012001	none	NOVEL ARCHITECTURE - EGFCa interspersed with 3 TY repeats\n
SPU_019601	SPU_019601	none	NOVEL ARCHITECTURE - WAPx4-TY-x3-EGFx3  domains\n
SPU_000957	SPU_000957	none	probably an ECM protein given its domain composition - TSPN - many VWC -VWD at C-terminus - novel architecture \nSPU_004940 has very similar structure minus the TSPN \nthe FN1 predictions are likely alternative predictions for the VWC repeats\n
SPU_004940	SPU_004940	none	probably an ECM protein given its domain composition -   \nmany VWC and VWD at C-terminus - novel architecture \nSPU_000957 has very similar structure plus a TSPN domain at N-terminus \nthe FN1 predictions are likely alternative predictions for the VWC repeats\n
SPU_027525	SPU_027525	none	C-terminus of this gene is SPU_027526 and should be combined.  \n
SPU_026798	SPU_026798	none	This is part of a sea urchin specific group of ADAM-TS genes.  There are two worm ADAMTS genes with some sequence similarity (wormbase# F08C6.1a.1 and C02B4.1)\n
SPU_012296	SPU_012296	none	aa 1-235 strong id to ss and AHRs \nSee also SPU_013788 and SPU_005022 for AHR-like sequences. \nSPU_005022 may be the bHLH domain of this model or of SPU_013788.\n
SPU_016604	SPU_016604	none	SPU_015419 model on scaffold 76434 is also part of predicted Sp-Pask.   FgeneshAB prediction S.P_Scaffold70175 may have additional exons, based on alignment with mammalian PAS-K.\n
SPU_013296	SPU_013296	none	only domain is the reprolysin domain\n
SPU_013297	SPU_013297	none	only domain it contains is a part of the reprolysin domain\n
SPU_025061	SPU_025061	none	contains TSP1 repeats found in ADAM-TS sequences\n
SPU_001545	SPU_001545	none	contains a TSP1 and calcium-binding domain \n
SPU_005275	SPU_005275	none	#\nPREDICTED: Strongylocentrotus purpuratus similar to Machado-Joseph disease protein 1 (Ataxin-3) (LOC581652), mRNA \n
SPU_005234	SPU_005234	none	Comparison to best blast sequence suggests that the model contains exons encoding an astacin protease.  It lacks domains characteristic of the closely related astacins, tolloid and BMP1.  One of the predicted exons (>SPU_005234|Scaffold6580|26133|26420) probably belongs to another gene.  The inferred amino acid sequence from the last two predicted exons (>SPU_005234|Scaffold6580|26133|26420; >SPU_005234|Scaffold6580|30923|31151| ) is not conserved.\n
SPU_010948	SPU_010948	none	This model encodes a protein with the same domain architecture as the sea urchin protein SpAN; the primary sequence indicates that it is a different gene.\n
SPU_001560	SPU_001560	none	Comparison to best blast sequence suggests that the model contains some but not all of the domains characteristic of tolloid/bmp1 proteins.  The last 4 predicted exons given below contain sequences similar to other kinds of proteins and therefore may not be part of this model. \n>SPU_001560|Scaffold70398|20995|21158| DNA_SRC: Scaffold70398 START: 20995 STOP: 21158 STRAND: +  \nCCTTGGGCCTTGAGAGTTATGTCATCCCAGATTCAAGTCTGACAGCTTCCAGTGAATTTAATGCTGACCA \nTGGTGCAAAGAGAGGTCGTCTTAACCTGGCCAGAGTCGGGGATCTGCGTGGAGGCTGGAATCCAATGGAC \nAACGATGCAAACCCGTGGATCCAG (no blasts to something else; maybe EGF) \n \n>SPU_001560|Scaffold70398|23209|23361| DNA_SRC: Scaffold70398 START: 23209 STOP: 23361 STRAND: +  \nGTGGATCTTCTGGACCTTTACCGTATCATTTCAGTTGCGACTCAAGGGCGACAAGATCTTGACCAGTGGG \nTTAATAGCTACAAGCTTGCTTGGAGTACTGATGGCACGACCTTTCGCACAGTGCAGGACATTCCCGGGCC \nAGGAGCTGACAAG (blasts as previous exon) \n \n>SPU_001560|Scaffold70398|23720|23890| DNA_SRC: Scaffold70398 START: 23720 STOP: 23890 STRAND: +  \nATCTTCATCGGTAATGTTGACCGCAACACCATCATGACCAACACTCTGCCTGTGTCCCAGGTTTGCCGCT \nATTTCCGCTTGATGCCTGTCAGCTGGTATAAACACATTAGTGTTCGTATGGAGATATATGGATATGGTGA \nAGGCCCTGTCACAGGTCAGTATGAAAACTAG (blasts as previous two exons. \n \n>01560 \nMSRTLLLSGLVAMLMAYSLAKPLRKQKGYTKTKVPQIKKVEFNGEILEIAVEEDDPFHRPIPADEGYSPNAYETDMMLNPEQEAALSDPKNSRNKRKASKDTTKYWPKKIIDQATSQHVINVPYEFGLGVDRTAIKAAMAHWQDQTCVRFEIHDRSVSSLWQHRLKFIKSDGCYSYLGLQSKIGFQDVSIGKGCTRLGTVSHEIGHALGFWHEQSRPDRDEFVTVNFANIIQDKMNAFRKHTTDDVMTNVPYDYNSVMHYGAYGFGIDAKVPTLIPKDPLSMGEIGQRLGLSYLDVKLANFMYECDSHCPGASSCHSGFRDMNCKCRCPESHKGDYCEVVALNFPGNLGNPDEQIRLKFDALDMEPFDTSSKKCLDYINIRAGGNLYYEGTDFCGNTLPPEIIADEIILSFHSDETNTNKGFHGTYTREKISALGLESYVIPDSSLTASSEFNADHGAKRGRLNLARVGDLRGGWNPMDNDANPWIQVDLLDLYRIISVATQGRQDLDQWVNSYKLAWSTDGTTFRTVQDIPGPGADKIFIGNVDRNTIMTNTLPVSQVCRYFRLMPVSWYKHISVRMEIYGYGEGPVTGQYEN \n
SPU_026547	SPU_026547	none	#\nThere may be an assembly problem with this model since part of the protease domain is repeated.  In this model the order of domains is partial astacin protease, cub, cub, then what is probably the beginning of the protein - N-terminal signal peptide, activation domain, astacin protease, cub, cub.  \n
SPU_018198	SPU_018198	none	Matches the Lysosomal trafficking regulator from rat along the entire coding sequence.  Conservation very high at 3' end.  Tiling experiment indicates high expression in embryos.\n
SPU_016045	SPU_016045	none	One of 2. SPU_004024 is an exact duplicate of this protein, although 04024 is shorter and missing the N terminus.\n
SPU_015349	SPU_015349	none	the N terminal region in particular matches AMPK-like, while the C terminus does not BLAST strongly\n
SPU_017949	SPU_017949	none	One of 2. An almost perfect duplicate of SPU_009559. This protein is longer, appears to contain the true N terminus and an exon missing from 09559.\n
SPU_009878	SPU_009878	none	one of 2. Non-identical duplicate of SPU_003844\n
SPU_023875	SPU_023875	none	One of 4. Non-identical duplicate of SPU_000442, 23876, 08085.\n
SPU_005676	SPU_005676	none	SPU_004836 is a partial duplicate prediction.\n
SPU_004836	SPU_004836	none	SPU_004836 is a partial duplicate prediction for SPU_005676.\n
SPU_009559	SPU_009559	none	One of 2. This is an almost-perfect duplicate of SPU_017949. This protein is missing the N terminus and an internal exon, but otherwise is an exact match.\n
SPU_026779	SPU_026779	none	Partial sequence identical and included in SPU_024526\n
SPU_000442	SPU_000442	none	One of 5. Non-identical duplicate of SPU_023875, 23876, 08085, 19751\n
SPU_017487	SPU_017487	none	One of 2. This protein appears to be a shortened version of SPU_005613, which has a much longer N terminus. This protein (17487) has a slightly longer C terminus.\n
SPU_017903	SPU_017903	none	First half completely predicted. Last half of the gene missing. SPU_028480 is a partial duplicate prediction.\n
SPU_005613	SPU_005613	none	One of 2. SPU_017487 overlaps C-terminus and is nearly identical. 17487 is slightly longer at C-terminus, but this protein (05613) is considerably longer in the N terminus.\n
SPU_009980	SPU_009980	none	- shows comparable homology to vertebrate terminal deoxyribonucleotidyltransferase (TdT) and vertebrate polymerase mu \n- one intron was skipped by GLEAN3 prediction\n
SPU_026447	SPU_026447	none	PREDICTED: similar to fragile X mental retardation gene 1, \nautosomal homolog [Strongylocentrotus purpuratus].\n
SPU_023876	SPU_023876	none	One of 3. This one is the longest, and is non-identical to either SPU_000442 or SPU_023875\n
SPU_008085	SPU_008085	none	One of 5. Non-identical duplicates of 00442, 19751 and 23875. Nearly identical (and internal) to 23876, although the C terminus of this protein diverges. \n
SPU_019009	SPU_019009	none	Strongylocentrotus purpuratus mRNA for SuDp98 protein Length=3650 \n
SpRag2L	SPU_030091	none	This gene has been verified by Race and RT-PCR.  It is expressed at low levels in early gastrula embryos, adult coelomocytes, and other adult tissues.  Though it has only low sequence identity with vertebrate Rag2 it is predicted to have the same structure and is encoded in reverse  orientation downstream of SpRag1L (a Rag1-like gene). \n
SPU_001138	SPU_001138	none	Pfam00194 match.  \n \nTranscriptome data indicate that it is expressed in the embryo. \n \nA family of carbonic anhydrase-like proteins exists in the sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined.\n
SPU_010623	SPU_010623	none	SPU_010623 is a partial duplicate prediction for SPU_027004.\n
SPU_027004	SPU_027004	none	SPU_010623 is a partial duplicate prediction for SPU_027004.\n
SPU_012518	SPU_012518	none	pfam00194 match.  \n \nTranscriptome data indicate that it is expressed in the embryo. \n \nA family of carbonic anhydrase-like proteins exists in the sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined.\n
SPU_024809	SPU_024809	none	Pfam00194 match.   \n \nTranscriptome data indicates that it is expressed in embryo. \n \nA family of carbonic anhydrase-like proteins exists in sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined.\n
SPU_014262	SPU_014262	none	these other Glean3 sequences had high similarity to endonuclease-reverse transcriptase: SPU_024197, SPU_026145, SPU_002879 \n
SPU_024197	SPU_024197	none	these other Glean3 sequences also had high similarity to endonuclease-reverse transcriptase: SPU_014262, SPU_026145, SPU_002879\n
SPU_002879	SPU_002879	none	these other Glean3 sequences also have high similarity to endonuclease-reverse transcriptase: SPU_014262, SPU_024197, SPU_026145\n
SPU_008844	SPU_008844	none	This blasts to PPEF1, but phylogenetic analysis showed that it was a homologue of human PPEF2.  SPU_011860 is likely the identical protein.\n
SPU_022254	SPU_022254	none	This sequence was a partial sequence of SPU_019367 and has been modified to include the sequence from SPU_019367\n
SPU_011655	SPU_011655	none	domains LDLa - CCP x4 - EGFCa x3 - Ig - SEA 7TM_2 \n \nNo GPS but otherwise looks like a member of the LNB-7TM family of adhesion domain GPCRs\n
SPU_015198	SPU_015198	none	Lyx4- EGFCa- Lyx2-LDLa- EGFCax2-Ig 7TM_2 \n \nNo GPS but otherwise looks a bit like a member of the LNB-7TM family of adhesion domain GPCRs \nNo known LDLa or LY members of LNB7TM GPCR family \nNovel architecture\n
SPU_023185	SPU_023185	none	EGFCa x13-Ig 7TM_2 \n \nNo GPS but otherwise looks like a member of the LNB-7TM family of adhesion domain GPCRs\n
SPU_026721	SPU_026721	none	EGFCax3-Ig 7TM_2 \n \nNo GPS but otherwise looks like a member of the LNB-7TM family of adhesion domain GPCRs\n
SPU_007507	SPU_007507	none	NIDO and VWD domains characteristic of mucins\n
SPU_007661	SPU_007661	none	NIDO and VWD domains characteristic of mucins\n
SPU_018197	SPU_018197	none	NIDO, IPT, AMOP, VWD,CCP - LOOKS LIKE A MUCIN but also has hyalin repeats\n
SPU_020968	SPU_020968	none	NIDO and VWD domains characteristic of mucins\n
SPU_025062	SPU_025062	none	NIDO, VWD AND EGF_CA TM - very similar structure to mucin4d of chickens\n
SPU_005955	SPU_005955	none	DSL and EGFCa domains characteristic of Notch ligands \nGene structure looks like partial duplication - assembly problems?\n
SPU_011976	SPU_011976	none	DSL and EGFCa domains characteristic of Notch ligands \nGene structure looks like partial duplication - assembly problems?\n
SPU_013510	SPU_013510	none	DSL and EGFCa domains characteristic of Notch ligands \nGene structure looks like partial duplication - assembly problems? \nAlso very short - fragment?\n
SPU_013646	SPU_013646	none	DSL and EGFCa domains characteristic of Notch ligands \nRather short - fragment?\n
SPU_016194	SPU_016194	none	contains no domains\n
SPU_010547	SPU_010547	none	DSL and EGFCa domains characteristic of Notch ligands\n
SPU_021193	SPU_021193	none	This sequence roots the vertebrate clade containing both ADAM-TS16 and ADAM-TS18 genes\n
SPU_016016	SPU_016016	none	DSL and EGFCa domains characteristic of Notch ligands\n
SPU_021044	SPU_021044	none	DSL and EGFCa domains characteristic of Notch ligands \n \nGene structure looks like a duplication - assembly problems? \n
SPU_025985	SPU_025985	none	#\nDSL and EGFCa domains characteristic of Notch ligands\n
SPU_000680	SPU_000680	none	the sequence roots the vertebrate clade containing both ADAM-TS6 and ADAM-TS10\n
SPU_018098	SPU_018098	none	contains a reprolysin domain and adams spacer\n
SPU_016423	SPU_016423	none	similar to gamma-interferon inducible lysosomal thiol reductase (GILT) - vertebrate GILT cleaves disulfide bonds in proteins and is involved in MHC class II-restricted antigen processing.   \n
SPU_027456	SPU_027456	none	contains only part of the reprolysin domain\n
SPU_008756	SPU_008756	none	This gene roots the vertebrate clade containing both ADAM-TS7 and ADAM-TS12.  \n
SPU_014521	SPU_014521	none	appears to be a haplotype, but is lacking a portion of the sequence.\n
SPU_003170	SPU_003170	none	This is part of a sea urchin specific group of ADAM-TS genes.  There are two worm ADAMTS genes with some sequence similarity (wormbase# F08C6.1a.1 and C02B4.1). may be a haplotype.\n
SPU_004710	SPU_004710	none	This gene roots the clade of AdamTS2 and ADAMTS3\n
SPU_010597	SPU_010597	none	contains metalloprotease and reprolysin domains which are the first parts of an ADAM-TS gene.  \n
SPU_004088	SPU_004088	none	only domain it contains is the reprolysin domain\n
SPU_018171	SPU_018171	none	haplotype found\n
SPU_018228	SPU_018228	none	Similar to Tyrosine-protein phosphatase 10D precursor.  Receptor-linked protein-tyrosine phosphatase 10D.  Partial Sequence\n
SPU_026428	SPU_026428	none	Blasts to Ppm1h, but is not homologous to human Ppm1h in PP2C subfamily tree.\n
SPU_003844	SPU_003844	none	One of 2. Non-identical duplicate of SPU_009878\n
SPU_014936	SPU_014936	none	Also BLASTs to XP_783422.1 (it's a tie). One of 2, identical duplicate of SPU_022395\n
SPU_022395	SPU_022395	none	Also BLASTs to XP_783422.1 (it's a tie). One of 2, identical duplicate of SPU_014936\n
SPU_004024	SPU_004024	none	One of 2. This is a shorter, identical duplicate of SPU_016045. This lacks the N terminus\n
SPU_025169	SPU_025169	none	One of 2. SPU_024676 is an identical duplicate in most of the N terminal portion (although the extreme N termini diverge); however the C terminal portions are divergent. The divergences are not due to frame shifts. This GLEAN appears to contain a start codon, unlike 24676.\n
SPU_024676	SPU_024676	none	One of 2. SPU_025169 is an identical duplicate in most of the N terminal portion (although the extreme N termini diverge); however the C terminal portions are divergent. The divergences are not due to frame shifts. This GLEAN does not appear to contain a start codon, unlike 25169.\n
SPU_001928	SPU_001928	none	This is the closest genbank match to the published hyalin clone. That sequence was incomplete, and the hyalin repeats appear in many genes so it is unclear whether this is the "authentic" hyalin, or whether there is a family of matrix proteins expressed in embryos.  The identity between the cloned gene and this glean model covers much of the model with missing Hyalin repeats at the N terminus.\n
SPU_016839	SPU_016839	none	One of 3. Partially overlaps with 08255, which is identical in the N terminal part of the overlapping sequence, but divergent in the C terminal part. Also overlaps (in a distinct region) with 08254, which is identical and entirely contained in 16839. Note that 08254 and 08255 do NOT overlap.\n
SPU_008254	SPU_008254	none	This protein is identical and completely internal to SPU_016839.\n
SPU_008255	SPU_008255	none	Partially duplicated by SPU_016839. The overlapping region is identical in the N terminal part, but divergent in the C terminal part, protein and nucleotide.\n
SPU_004671	SPU_004671	none	Non-identical to other IKKs: GLEANs 16839, 08254, 08255. SPU_011356 is a shorter, internal identical duplicate, although 11356 seems to contain a spurious stretch of amino acids (see that sequence).\n
SPU_007638	SPU_007638	none	One of 2. SPU_027909 is an almost identical duplicate \n
SPU_027909	SPU_027909	none	One of 2. SPU_007638 is an almost identical duplicate\n
SPU_000053	SPU_000053	none	52 hyalin repeats, 10 EGF repeats plus 6 other exons.  Incomplete gene at end of scaffold 1258.  Scores at high level in tiling experiment against embryos.  Placed 5th in hyalin family because its match against hyalin1 is in HYR domains and is significant but intermittent.\n
SPU_008700	SPU_008700	none	SPU_011110 is a partial sequence with an  exact match to this sequence\n
SPU_011110	SPU_011110	none	This is a partial sequence that is an exact match with SPU_008700 \n
SPU_002435	SPU_002435	none	Similar to Sidekick 2. Partial sequence. \nSee putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137536477-1679-116181527840.BLASTQ4\n
SPU_019632	SPU_019632	none	This is the closest match to Sp-hyalin1. It matches at HYR's.  It has 18 hyalin repeats and 9 EGF repeats in 27 exons.  It spans the complete scaffold and is incomplete\n
SPU_028412	SPU_028412	none	One of 2. SPU_016505 is a nearly identical duplicate\n
SPU_000806	SPU_000806	none	possibly an FGF-R, probably a RTK\n
SPU_003870	SPU_003870	none	This gene model doesn't have a TIR domain, but the nucleotides encoding SP, NT, LRR(12-22), CT, TM have 90% identity to another Sp-Tlr gene (SPU_020741) and it is located at the end of a contig.  So it could be a member of the Toll-like receptor family. \n
SPU_002779	SPU_002779	none	#\nsee also Sp-HSP70(3)A \nFirst described by, \nAUTHORS   Foltz,K.R., Partin,J.S. and Lennarz,W.J. \n  TITLE     Sea urchin egg receptor for sperm: sequence similarity of binding \n            domain and hsp70 \n  JOURNAL   Science 259 (5100), 1421-1425 (1993)\n
SPU_022798	SPU_022798	none	The BLAST hit is quite weak and only picks up the STK domain. \n
SPU_002418	SPU_002418	none	One of 2. SPU_006947 is an identical duplicate, but is missing the N term. This protein appears to contain the start codon.\n
SPU_025210	SPU_025210	none	Duplicate prediction for SPU_007944.\n
SPU_025819	SPU_025819	none	Duplicate prediction for SPU_026605\n
SPU_009909	SPU_009909	none	Similar to SpRag1L (SPU_027600), a sea urchin Rag1-like gene.  One of the more complete matches.  Probably a pseudogene.  Matches Rag1 core region, but c-terminal matching (SpRag1L 753-862) is attached to N-terminal (As for SPU_014069). Region of match is SpRag1L: 399-862, ~41% AA identity.   \n
SPU_023673	SPU_023673	none	This Glean3 sequence appears to be a duplication; SPU_023674 and SPU_023672 are on the same scaffold and also match to endonuclease reverse transcriptase\n
SPU_023674	SPU_023674	none	This sequence appears to be a duplication; SPU_023672 and SPU_023673 are on the same scaffold and also match to endonuclease reverse transcriptase\n
SPU_023672	SPU_023672	none	This sequence appears to be a duplication; SPU_023674 and SPU_023673 are on the same scaffold and also match to endonuclease reverse transcriptase\n
SPU_015136	SPU_015136	none	Similar to SpRag1L (SPU_027600), a sea urchin Rag1-like gene.  One of the more complete matches.  Probably a pseudogene.  Matches Rag1 core region. Region of match is SpRag1L: 607-977, 38% AA identity.   \n
SPU_009908	SPU_009908	none	Similar to SpRag1L (SPU_027600), a sea urchin Rag1-like gene. Probably a pseudogene.  Matches Rag1 core region. Region of match is SpRag1L: 557-647, 37% AA identity.   \n
SPU_026698	SPU_026698	none	Similar to portion of Sp-Rag1L (SPU_027600). Probably a pseudogene in combination with transposase.  Matches Sp-Rag1L N-terminal putative Zn-binding region: AA 7-105, 32%.  \n
SPU_024394	SPU_024394	none	One of 2. SPU_024395 is almost identical, but slightly shorter.\n
SPU_020853	SPU_020853	none	Similar to UBXD1.\n
SPU_012427	SPU_012427	none	Similar to UBXD1.\n
SPU_024395	SPU_024395	none	One of 2, GLEAN_24394 is almost identical but is slightly longer\n
SPU_007508	SPU_007508	none	possibly an SNF-1 like serine threonine kinase\n
SPU_014978	SPU_014978	none	Similar to VEGF.\n
SPU_027566	SPU_027566	none	similar to protein tyrosine phosphatase, receptor type, D isoform 2 precursor\n
SPU_020281	SPU_020281	none	Similar to Tyrosine-protein phosphatase, non-receptor type 1/2 (Protein-tyrosine phosphatase 1B) (PTP-1B), partial \n
SPU_018356	SPU_018356	none	has 1 cub domain and 25 HYR domains each as a distinct exon.  Low to no expression in embryos.  Likely to be a complete gene.  Similar to Sp-hyalin1 due to homology of hyalin repeats.  \n
SPU_019751	SPU_019751	none	One of 5. Non-identical duplicate of SPU_000442, 08085, 23876. Identical to and completely internal to SPU_023875\n
SPU_025766	SPU_025766	none	Similar to Receptor-type tyrosine-protein phosphatase mu precursor.  Duplicates\n
SPU_021590	SPU_021590	none	Bucentaur contains a LINE repeat sequence in some species\n
Sp-Tlr001	SPU_030092	none	Partial Toll-like receptor. The nucleotides encoding CT, TM, TIR have 96% identity to another Sp-Tlr gene (08963). This gene model occupies the entire sequence of a short scaffold. \n
Sp-Tlr002	SPU_030093	none	Partial Toll-like receptor. The nucleotides encoding CT, TM and TIR have 98% identity to another Sp-Tlr gene (24205). This gene model is located at the end of a short scaffold. \n
SPU_023993	SPU_023993	none	#\nDuplicate prediction for SPU_006932\n
Sp-Tlr210	SPU_030094	none	Partial Toll-like receptor. The nucleotides encoding CT, TM and partial TIR have 98% identity to another Sp-Tlr gene (21936). This gene model occupies the entire sequence of a short scaffold. \n
SPU_023091	SPU_023091	none	Partial sequence of a prickle protein. The sequence presents homology with the SPU_023090 but is not identical.\n
SPU_005447	SPU_005447	none	there's an internal duplication in the predicted protein, which is most likely an assembly problem\n
SPU_001586	SPU_001586	none	probably missing part of carboxy end\n
Sp-Tlr183	SPU_030095	none	Partial Toll-like receptor. The nucleotides encoding CT, TM and partial TIR have 95% identity to another Sp-Tlr gene (19834). This gene model is located at the end of a short scaffold. \n
SPU_018199	SPU_018199	none	Except the first 50 aa, this sequence is also contained in SPU_005021 however with further interspersed sequences. \n
SPU_004983	SPU_004983	none	5 prime and 3 prime wrongly predicted \n5 prime is on Scaffold80207, no Glean model, my own genscan analysis identifies a 169_aa gene (exon) (pred. 8 on Scaffold80207): \n 8.00 Prom +  96623  96662   40                              -5.75 \n 8.01 Init +  96914  97145  232  1  1   71  111   260 0.725  25.07 \n 8.02 Intr +  98822  98985  164  0  2   28   25   110 0.308  -2.53 \n 8.03 Term + 101288 101401  114  1  0   79   42   218 0.969  13.89 \n 8.04 PlyA + 101438 101443    6                               1.05 \n \n>Scaffold80207|GENSCAN_predicted_peptide_8|169_aa \nMYRAVIYTIFVGLVCLDSVVEYGVEARRNGRKRNRNPGAGDVLSASGGDVVKVRPTPRRP \nQIPLKAEVQPPHSRGVPGVQNWAQCQRLVVQLQVDAEAMRNSSNLSRQKYHFVEINLIRK \nTYGTEQGDNHLVIIYFIVLSRLIIESIRFDDRMRSNNAERCDEQCRAGR \n \n>Scaffold80207|GENSCAN_predicted_CDS_8|510_bp \natgtaccgtgcagtaatttacaccatcttcgtgggcctggtgtgcctggacagcgtggtt \ngagtacggagtcgaagctcgcaggaatggaagaaagaggaacaggaatcctggagcaggg \ngatgttttatctgcatccggtggtgatgttgtcaaggtgagaccgacaccaagaaggcct \ncagattccactcaaagccgaggtacagcccccacattcaagaggtgttccaggggtgcaa \naattgggctcaatgtcaacgactggtagtgcaattacaagttgacgccgaggctatgcgt \naattcgagcaatttgtcgcgtcaaaaatatcactttgtcgaaataaatttgataaggaaa \nacttatggtaccgagcaaggggataaccacttagtgataatctactttattgtcctttca \ncgactaatcatcgagtctatacgatttgatgaccgaatgcggtcgaacaatgccgagcgg \ntgtgacgaacaatgccgagcgggccgctaa \n \n \n
SPU_005021	SPU_005021	none	#\nThis sequence contains most of the SPU_018199 \n
Sp-TlrP41	SPU_030096	none	Partial Toll-like receptor. The nucleotides encoding LRR(9-19), CT, TM and partial TIR have 87% identity to another Sp-Tlr gene(SPU_007850). This gene model occupies entire sequence of a short scaffold. \n
Sp-TlrP42	SPU_030097	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding SP, LRR(3-9) have 97% identity to another Sp-Tlr gene(11537). This gene model is at the end of a short scaffold. \n
Sp-TlrP43	SPU_030098	none	#\nPartial Toll-like receptor does not have a TIR domain. The nucleotides encoding SP, NT, LRR(11-22), CT, TM have 87% identity to another Sp-Tlr gene(07850). This gene model is at the end of a scaffold. \n
SPU_022562	SPU_022562	none	Missing one (or more) exons at the beginning.\n
Sp-TlrP44	SPU_030099	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding SP, NT, LRR(0-2) have 87% identity to another Sp-Tlr gene(15066). This gene model is at the end of a scaffold. \n
Sp-TlrP45	SPU_030100	none	#\nPartial Toll-like receptor does not have a TIR domain. The nucleotides encoding LRR(5-9) have 84% identity to another Sp-Tlr gene(25312). This gene model is at the end of a short scaffold. \n
Sp-TlrP46	SPU_030101	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding LRR (0-10) have 87% identity to another Sp-Tlr gene(15066). This gene model occupies entire sequence of a short scaffold. \n
Sp-TlrP47	SPU_030102	none	#\nPartial Toll-like receptor does not have a TIR domain. The nucleotides encoding LRR (1-6) have 89% identity to another Sp-Tlr gene(18519). This gene model occupies entire sequence of a short scaffold. \n
SPU_020306	SPU_020306	none	May have one exon too many at the 3'-end.\n
SPU_013377	SPU_013377	none	Using HsSERPINE2 in BLASTP search of the GLEAN3 models yielded 15 genes. Each was analyzed by SMART and the top 12 hits were found to have a SERPIN domain and were annotated Sp-serpin-like 3 through 12 (2 were already annotated by A.Cameron). BLASTP searches of the GLEAN3 models with other vertebrate Serpins yielded the same top 12 hits.\n
Sp-TlrP48	SPU_030103	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding LRR (6-10) have 87% identity to another Sp-Tlr gene(11541). This gene model occupies entire sequence of a short scaffold. \n
SPU_013378	SPU_013378	none	#\nUsing HsSERPINE2 in BLASTP search of the GLEAN3 models yielded 15 genes. Each was analyzed by SMART and the top 12 hits were found to have a SERPIN domain and were annotated Sp-serpin-like 3 through 12 (2 were already annotated by A.Cameron). BLASTP searches of the GLEAN3 models with other vertebrate Serpins yielded the same top 12 hits.\n
SPU_018632	SPU_018632	none	Using HsSERPINE2 in BLASTP search of the GLEAN3 models yielded 15 genes. Each was analyzed by SMART and the top 12 hits were found to have a SERPIN domain and were annotated Sp-serpin-like 3 through 12 (2 were already annotated by A.Cameron). BLASTP searches of the GLEAN3 models with other vertebrate Serpins yielded the same top 12 hits.\n
Sp-Tlr114	SPU_030104	none	This gene model could be caused by assembly error. The nucleotides encoding LRR (9-17) have more than 99.5% identity to another Sp-Tlr gene(21936). This gene model occupies entire sequence of a short scaffold. \n
Sp-TlrP49	SPU_030105	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding LRR (5-9), CT, TM have 91% identity to another Sp-Tlr gene(15066). This gene model occupies entire sequence of a short scaffold. \n
SPU_024263	SPU_024263	none	#\nUsing HsSERPINE2 in BLASTP search of the GLEAN3 models yielded 15 genes. Each was analyzed by SMART and the top 12 hits were found to have a SERPIN domain and were annotated Sp-serpin-like 3 through 12 (2 were already annotated by A.Cameron). BLASTP searches of the GLEAN3 models with other vertebrate Serpins yielded the same top 12 hits.\n
Sp-TlrP50	SPU_030106	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding SP, NT, LRR(5-15) have 98% identity to another Sp-Tlr gene(05950). This gene model is located at the end of a contig.\n
Sp-TlrP51	SPU_030107	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding LRR (12-20) have 86% identity to another Sp-Tlr gene(11537). This gene model is located at the end of a scaffold.\n
SPU_014670	SPU_014670	none	The SPU_014670 prediction apparently missed exons 5-18, present in other gene models, coding for the highly conserved catalytic domain of synaptojanin.\n
Sp-TlrP52	SPU_030108	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding LRR (5-12) have 93% identity to another Sp-Tlr gene(00199). This gene model is located at the end of a scaffold. \n
Sp-TlrP53	SPU_030109	none	Partial Toll-like receptor. The nucleotides encoding SP, NT, LRR(16-24), CT, TM and partial TIR have 90% identity to another Sp-Tlr gene(06164). This gene model is located at the end of a scaffold. \n
Sp-TlrP54	SPU_030110	none	#\nPartial Toll-like receptor. The nucleotides encoding CT, TM and partial TIR have 93% identity to another Sp-Tlr gene (13536). This gene model occupies entire sequence of a short scaffold. \n
Sp-TlrP55	SPU_030111	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding SP, NT, LRR (3) have 87% identity to another Sp-Tlr gene(20741). This gene model is located at the end of a scaffold. And it may represent a pseudogene or contain stop codons. \n
Sp-TlrP56	SPU_030112	none	#\nPartial Toll-like receptor does not have a TIR domain. The nucleotides encoding SP, NT, LRR (3) have 87% identity to another Sp-Tlr gene(24960). This gene model is located at the end of a scaffold. \n
Sp-TlrP57	SPU_030113	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding  LRR (8-16), CT, TM have 88% identity to another Sp-Tlr gene(16536). This gene model is located at the end of a scaffold. \n
Sp-TlrP58	SPU_030114	none	#\nPartial Toll-like receptor does not have a TIR domain. The nucleotides encoding  SP, NT, LRR (13-22), CT, TM have 92% identity to another Sp-Tlr gene(20741). This gene model is located at the end of a scaffold. \n
SPU_008658	SPU_008658	none	Pfam00194 match.  Partial gene.  \n \nTranscriptome data indicate that it is expressed in the embryo. \n \nA family of carbonic anhydrase-like proteins exists in the sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined.\n
SPU_013459	SPU_013459	none	Pfam00194 match. \n \nA family of carbonic anhydrase-like proteins exists in the sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined.\n
SPU_025722	SPU_025722	none	Pfam00194 match.   \n \nA family of carbonic anhydrase-like proteins exists in the sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined.\n
SPU_013458	SPU_013458	none	Partial gene. Pfam00194 match.  \n \nA family of carbonic anhydrase-like proteins exists in the sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined.\n
SPU_010471	SPU_010471	none	Pfam00194 match. \n \nPretty strong similarity to PMC EST (accession no.DN577792). \n \nA family of carbonic anhydrase-like proteins exists in the sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined.\n
Sp-TlrP59	SPU_030115	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding  SP, NT, LRR (13-21), CT, TM have 88% identity to another Sp-Tlr gene(20741). This gene model is located at the end of a scaffold. \n
Sp-TlrP60	SPU_030116	none	#\nPartial Toll-like receptor does not have a TIR domain. The nucleotides encoding  LRR (2), CT, TM have 96% identity to another Sp-Tlr gene(08278). This gene model occupies entire sequence of a short scaffold. \n
Sp-TlrP61	SPU_030117	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding SP, LRR (5) have 98% identity to another Sp-Tlr gene(24208). This gene model occupies entire sequence of a short scaffold and may represent a pseudogene or contain a sequence error. \n
Sp-TlrP62	SPU_030118	none	#\nPartial Toll-like receptor does not have a TIR domain. The nucleotides encoding SP, NT, LRR (0-7) have 89% identity to another Sp-Tlr gene(14352). This gene model is located at the end of a contig. \n
Sp-TlrP63	SPU_030119	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding LRR (5-6) have 89% identity to another Sp-Tlr gene(03419). This gene model occupies entire sequence of a short scaffold. \n
Sp-TlrP64	SPU_030120	none	#\nPartial Toll-like receptor does not have a TIR domain. The nucleotides encoding NT, LRR (4-7) have 89% identity to another Sp-Tlr gene(09435). This gene model is at the end of a scaffold. \n
Sp-TlrP65	SPU_030121	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding SP, NT, LRR (5) have 87% identity to another Sp-Tlr gene(14352). This gene model is at the end of a short scaffold. \n
Sp-TlrP66	SPU_030122	none	#\nPartial Toll-like receptor does not have a TIR domain. The nucleotides encoding LRR (5), CT, TM have 89% identity to another Sp-Tlr gene(03419). This gene model occupies entire sequence of a short scaffold. \n
Sp-TlrP67	SPU_030123	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding LRR (5-6), CT, TM have 94% identity to another Sp-Tlr gene(09435). This gene model occupies entire sequence of a short scaffold. \n
Sp-TlrP68	SPU_030124	none	#\nPartial Toll-like receptor does not have a TIR domain. The nucleotides encoding SP, NT, LRR(8-14) have 90% identity to another Sp-Tlr gene(14352). This gene model occupies entire sequence of a short scaffold. \n
Sp-TlrP69	SPU_030125	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding NT, LRR(4-9) have 89% identity to another Sp-Tlr gene(14352). This gene model occupies entire sequence of a short scaffold. \n
Sp-TlrP70	SPU_030126	none	#\nPartial Toll-like receptor does not have a TIR domain. The nucleotides encoding SP, NT, LRR(12-21) have 90% identity to another Sp-Tlr gene(14352). This gene model is located at the end of a contig and may represent a pseudogene or contain sequence error. \n
Sp-TlrP71	SPU_030127	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding LRR(6-7), CT, TM, TIR(partial) have 91% identity to another Sp-Tlr gene(09435). This gene model occupies entire sequence of a short scaffold and may represent a pseudogene or contain sequence error. \n
Sp-TlrP72	SPU_030128	none	#\nPartial Toll-like receptor does not have a TIR domain. The nucleotides encoding SP, NT, LRR(14-23) have 87% identity to another Sp-Tlr gene(09435). This gene model is located at the end of a short scaffold. \n
Sp-TlrP73	SPU_030129	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding LRR(5), CT, TM have 91% identity to another Sp-Tlr gene(03419). This gene model is located at the end of a short scaffold.\n
Sp-TlrP74	SPU_030130	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding LRR(4-6), CT, TM have 92% identity to another Sp-Tlr gene(21225). This gene model occupies entire sequence of a short scaffold. \n
SPU_003171	SPU_003171	none	LY4/CUB/SR/ZP\n
SPU_004061	SPU_004061	none	NIDO/EGF/ZP\n
SPU_004611	SPU_004611	none	LY2/MAM/ZP\n
SPU_005270	SPU_005270	none	ZP/CCP3\n
SPU_013342	SPU_013342	none	SR/ZP\n
SPU_014213	SPU_014213	none	EGFCa3/ZP\n
SPU_016300	SPU_016300	none	CCP14/ZP\n
SPU_016840	SPU_016840	none	EGF2/ZP\n
SPU_018648	SPU_018648	none	CUB4/ZP\n
SPU_022873	SPU_022873	none	EGFCa/ZP\n
SPU_022889	SPU_022889	none	EGFCa/ZP\n
SPU_024217	SPU_024217	none	CCP/CLECT/CCP2/ZP\n
SPU_026587	SPU_026587	none	EGF/ZP\n
SPU_027535	SPU_027535	none	LY/ZP\n
SPU_028276	SPU_028276	none	CUB6/ZP\n
SPU_028843	SPU_028843	none	CCP11/ZP\n
SPU_028844	SPU_028844	none	CCP3/ZP\n
Sp-TlrP75	SPU_030131	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding LRR(5-7), CT, TM have 91% identity to another Sp-Tlr gene(21225). This gene model occupies entire sequence of a short scaffold. \n
SPU_020943	SPU_020943	none	This sequence is identical to the one deposited in GenBank as NP_999657 (derived from RefSeq NM_214492), except for an 18-amino acid (54 nucleotide) gap between the predicted initiator methionine (i.e. residue 1) and the second predicted amino acid residue in the GLEAN sequence.  This sequence (CCTCGAGAAATTATTACCTTACAGCTAGGACAATGTGGGAACCAGATTGGGATG in the RefSeq entry) is not detected in the Baylor DNA sequence dataset by BLASTP or TBLASTN. \n \nAnnotation entered by Bob Obar (robar@scientist.com).\n
Sp-TlrP76	SPU_030132	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding SP, NT, LRR(15-23) have 90% identity to another Sp-Tlr gene(14548). This gene model is located at the end of a short scaffold. \n
Sp-TlrP77	SPU_030133	none	#\nPartial Toll-like receptor does not have a TIR domain. The nucleotides encoding SP, NT, LRR(15-23) have 91% identity to another Sp-Tlr gene(14548). This gene model is located at the end of a short scaffold. \n
Sp-TlrP78	SPU_030134	none	Partial Toll-like receptor does not have a TIR domain. The nucleotides encoding SP, NT, LRR(4-6) have 90% identity to another Sp-Tlr gene(27162). This gene model is located at the end of a short scaffold. \n
Sp-TlrP79	SPU_030135	none	#\nPartial Toll-like receptor does not have a TIR domain. The nucleotides encoding SP, LRR(2-4) have 93% identity to another Sp-Tlr gene(26274). This gene model is located at the end of a short scaffold. \n
SPU_000444	SPU_000444	none	 fragment\n
SPU_000472	SPU_000472	none	 fragment\n
SPU_000591	SPU_000591	none	 fragment\n
SPU_000592	SPU_000592	none	 fragment\n
SPU_000769	SPU_000769	none	 fragment\n
SPU_000777	SPU_000777	none	 fragment\n
SPU_000791	SPU_000791	none	 fragment\n
SPU_000820	SPU_000820	none	 fragment\n
SPU_000859	SPU_000859	none	 fragment\n
SPU_000901	SPU_000901	none	 fragment\n
SPU_001023	SPU_001023	none	 fragment\n
SPU_001036	SPU_001036	none	 fragment\n
SPU_001064	SPU_001064	none	 fragment\n
SPU_001286	SPU_001286	none	 fragment\n
SPU_001350	SPU_001350	none	 fragment\n
SPU_001364	SPU_001364	none	 fragment\n
SPU_001417	SPU_001417	none	 fragment\n
SPU_001453	SPU_001453	none	 fragment\n
SPU_001766	SPU_001766	none	 fragment, should join with SPU_001767, still incomplete gene\n
SPU_001767	SPU_001767	none	 fragment, should join with SPU_001766, still incomplete gene\n
SPU_001770	SPU_001770	none	 fragment\n
SPU_001828	SPU_001828	none	 partial\n
SPU_002544	SPU_002544	none	 fragment\n
SPU_002741	SPU_002741	none	 partial\n
SPU_002804	SPU_002804	none	 fragment\n
SPU_002907	SPU_002907	none	 fragment\n
SPU_002910	SPU_002910	none	 fragment\n
SPU_002996	SPU_002996	none	 fragment\n
SPU_003311	SPU_003311	none	 fragment\n
SPU_003331	SPU_003331	none	 fragment\n
SPU_003374	SPU_003374	none	 fragment, should join SPU_003375, still incomplete gene\n
SPU_003375	SPU_003375	none	 fragment, should join SPU_003374, still incomplete gene\n
SPU_003431	SPU_003431	none	 fragment\n
SPU_003443	SPU_003443	none	 fragment\n
SPU_003465	SPU_003465	none	 fragment\n
SPU_003510	SPU_003510	none	 fragment\n
SPU_003623	SPU_003623	none	 fragment\n
SPU_003625	SPU_003625	none	 partial; missing C-terminus\n
SPU_003631	SPU_003631	none	 fragment\n
SPU_003663	SPU_003663	none	 fragment\n
SPU_003729	SPU_003729	none	 fragment\n
SPU_003784	SPU_003784	none	 fragment\n
SPU_003886	SPU_003886	none	 partial, missing N- and C-terminus\n
SPU_003977	SPU_003977	none	 fragment\n
SPU_003983	SPU_003983	none	 fragment\n
SPU_004226	SPU_004226	none	 fragment\n
SPU_004450	SPU_004450	none	 fragment\n
SPU_004499	SPU_004499	none	 fragment\n
SPU_004567	SPU_004567	none	 fragment\n
SPU_004621	SPU_004621	none	 fragment\n
SPU_004633	SPU_004633	none	 fragment\n
SPU_004708	SPU_004708	none	 fragment\n
SPU_004725	SPU_004725	none	 extra C-terminus\n
SPU_004743	SPU_004743	none	 fragment\n
SPU_004815	SPU_004815	none	 insertions\n
SPU_004816	SPU_004816	none	 insertions\n
SPU_004911	SPU_004911	none	 fragment\n
SPU_004930	SPU_004930	none	 fragment\n
SPU_004935	SPU_004935	none	 fragment\n
SPU_005047	SPU_005047	none	 fragment\n
SPU_005183	SPU_005183	none	 partial, missing C-terminus region\n
SPU_005249	SPU_005249	none	 fragment\n
SPU_005344	SPU_005344	none	 fragment\n
SPU_005357	SPU_005357	none	 fragment\n
SPU_005370	SPU_005370	none	 fragment, has extra residues on C-terminus\n
SPU_005625	SPU_005625	none	 fragment\n
SPU_005717	SPU_005717	none	 fragment\n
SPU_005754	SPU_005754	none	 fragment, missing stretch in middle\n
SPU_005803	SPU_005803	none	 small fragment\n
SPU_005995	SPU_005995	none	 fragment\n
SPU_006257	SPU_006257	none	 fragment\n
SPU_006323	SPU_006323	none	 small fragment\n
SPU_006521	SPU_006521	none	 fragment\n
SPU_006556	SPU_006556	none	 fragment\n
SPU_006698	SPU_006698	none	 small fragment\n
SPU_006779	SPU_006779	none	 fragment\n
SPU_006870	SPU_006870	none	 fragment\n
SPU_006887	SPU_006887	none	 small fragment\n
SPU_007045	SPU_007045	none	 fragment\n
SPU_007321	SPU_007321	none	 fragment, extra N-terminus residues\n
SPU_007657	SPU_007657	none	 fragment\n
SPU_007739	SPU_007739	none	 fragment\n
SPU_007748	SPU_007748	none	 fragment\n
SPU_007853	SPU_007853	none	 fragment\n
SPU_008032	SPU_008032	none	 fragment\n
SPU_008069	SPU_008069	none	 fragment\n
SPU_008130	SPU_008130	none	 fragment\n
SPU_008288	SPU_008288	none	 fragment\n
SPU_008308	SPU_008308	none	 fragment\n
SPU_008320	SPU_008320	none	 fragment\n
SPU_008372	SPU_008372	none	 fragment\n
SPU_008459	SPU_008459	none	 fragment\n
SPU_008685	SPU_008685	none	 fragment\n
SPU_008734	SPU_008734	none	 fragment\n
SPU_008748	SPU_008748	none	 fragment\n
SPU_008759	SPU_008759	none	 fragment\n
SPU_009056	SPU_009056	none	 fragment, extra mismatch stretch on C-terminus\n
SPU_009158	SPU_009158	none	 fragment\n
SPU_009281	SPU_009281	none	 small fragment\n
SPU_009733	SPU_009733	none	 fragment\n
SPU_009844	SPU_009844	none	 fragment\n
SPU_010234	SPU_010234	none	 fragment\n
SPU_010289	SPU_010289	none	 fragment\n
SPU_011052	SPU_011052	none	 fragment\n
SPU_011215	SPU_011215	none	 fragment\n
SPU_011390	SPU_011390	none	 partial, missing C-terminus half\n
SPU_011435	SPU_011435	none	 fragment\n
SPU_011587	SPU_011587	none	 fragment\n
SPU_011791	SPU_011791	none	 fragment\n
SPU_012316	SPU_012316	none	 fragment\n
SPU_012600	SPU_012600	none	 partial, missing C-terminus half\n
SPU_012832	SPU_012832	none	 fragment\n
SPU_012941	SPU_012941	none	 fragment\n
SPU_012945	SPU_012945	none	 fragment\n
SPU_012947	SPU_012947	none	 fragment\n
SPU_013036	SPU_013036	none	 fragment\n
SPU_013051	SPU_013051	none	 fragment\n
SPU_013215	SPU_013215	none	 fragment\n
SPU_013473	SPU_013473	none	 fragment\n
SPU_013529	SPU_013529	none	 fragment\n
SPU_013546	SPU_013546	none	 small fragment\n
SPU_013637	SPU_013637	none	 fragment\n
SPU_013678	SPU_013678	none	 fragment\n
SPU_014043	SPU_014043	none	 fragment\n
SPU_014856	SPU_014856	none	 fragment\n
SPU_015031	SPU_015031	none	 fragment\n
SPU_015060	SPU_015060	none	 fragment\n
SPU_015061	SPU_015061	none	 fragment\n
SPU_015095	SPU_015095	none	 fragment\n
SPU_015100	SPU_015100	none	 fragment\n
SPU_015101	SPU_015101	none	 fragment\n
SPU_015194	SPU_015194	none	 missing N-terminus, extra C-terminus\n
SPU_015476	SPU_015476	none	 partial, missing most of the C-terminus\n
SPU_015492	SPU_015492	none	 fragment\n
SPU_015571	SPU_015571	none	 fragment\n
SPU_016020	SPU_016020	none	 fragment\n
SPU_016176	SPU_016176	none	 partial, missing C-terminus\n
SPU_016633	SPU_016633	none	 fragment\n
SPU_016636	SPU_016636	none	 fragment\n
SPU_016901	SPU_016901	none	 fragment\n
SPU_016917	SPU_016917	none	 small fragment\n
SPU_017139	SPU_017139	none	 fragment, unmatched residues on C-terminus\n
SPU_017143	SPU_017143	none	 fragment\n
SPU_017170	SPU_017170	none	 fragment\n
SPU_017174	SPU_017174	none	 missing N- and C-terminus residues\n
SPU_017608	SPU_017608	none	 fragment\n
SPU_017696	SPU_017696	none	 partial, missing C-terminus half\n
SPU_017703	SPU_017703	none	 fragment\n
SPU_018027	SPU_018027	none	 fragment, unmatched stretch of aminoacids on N-terminus\n
SPU_018092	SPU_018092	none	 fragment\n
SPU_018766	SPU_018766	none	 missing some aminoacid stretches in middle\n
SPU_018767	SPU_018767	none	 extra stretch of aminoacids in middle\n
SPU_018929	SPU_018929	none	 fragment\n
SPU_019020	SPU_019020	none	 fragment, should join SPU_019027, still missing the N-terminus region\n
SPU_019027	SPU_019027	none	 fragment; should join SPU_019020, still missing the N-terminus region\n
SPU_019298	SPU_019298	none	 fragment\n
SPU_019402	SPU_019402	none	 fragment\n
SPU_019780	SPU_019780	none	 fragment\n
SPU_020174	SPU_020174	none	 fragment\n
SPU_022715	SPU_022715	none	 fragment\n
SPU_022752	SPU_022752	none	 fragment\n
SPU_022980	SPU_022980	none	 fragment\n
SPU_023129	SPU_023129	none	 small fragment\n
SPU_023227	SPU_023227	none	 small fragment\n
SPU_023266	SPU_023266	none	 small fragment\n
SPU_023323	SPU_023323	none	 fragment\n
SPU_023515	SPU_023515	none	 fragment\n
SPU_023680	SPU_023680	none	 fragment\n
SPU_023772	SPU_023772	none	 small fragment\n
SPU_023841	SPU_023841	none	 small fragment\n
SPU_024227	SPU_024227	none	 fragment\n
SPU_024229	SPU_024229	none	 fragment\n
SPU_024313	SPU_024313	none	 fragment\n
SPU_024396	SPU_024396	none	 fragment\n
SPU_024430	SPU_024430	none	 fragment\n
SPU_024541	SPU_024541	none	 fragment\n
SPU_024557	SPU_024557	none	 fragment\n
SPU_025087	SPU_025087	none	 fragment\n
SPU_025224	SPU_025224	none	 fragment\n
SPU_025596	SPU_025596	none	 fragment\n
SPU_026372	SPU_026372	none	 fragment\n
SPU_026682	SPU_026682	none	 fragment\n
SPU_026700	SPU_026700	none	 fragment\n
SPU_026839	SPU_026839	none	 fragment\n
SPU_026988	SPU_026988	none	 fragment\n
SPU_026994	SPU_026994	none	 fragment\n
SPU_027071	SPU_027071	none	 fragment\n
SPU_027495	SPU_027495	none	 fragment\n
SPU_027591	SPU_027591	none	 partial, missing C-terminus half\n
SPU_027901	SPU_027901	none	 fragment\n
SPU_028013	SPU_028013	none	 fragment\n
SPU_028083	SPU_028083	none	 fragment\n
SPU_028129	SPU_028129	none	 fragment\n
SPU_028237	SPU_028237	none	 fragment\n
SPU_028307	SPU_028307	none	 fragment, missing stretch in middle\n
SPU_028454	SPU_028454	none	 fragment\n
SPU_028651	SPU_028651	none	 fragment\n
SPU_028709	SPU_028709	none	 fragment\n
SPU_001454	SPU_001454	none	21 CADH repeats but no TM or Cyto domain-probably a partial-could it be hitched up to SPU_001452-CLEARLY A CADHERIN BUT CLASS UNCLEAR\n
SPU_002742	SPU_002742	none	11 cadh REPEATS BUT NO TM OR CYTO DOMAIN-CLEARLY A CADHERIN BUT CLASS UNCLEAR\n
SPU_003730	SPU_003730	none	4 CADH + TM AND CYTO BUT NO CAT-BD-NON-CLASSICAL CADHERIN of the vertebrate type\n
SPU_004074	SPU_004074	none	Single CADH domain-nothing else-obviously a cadherin fragment.\n
SPU_004556	SPU_004556	none	2 CADH domains-nothing else-obviously a cadherin fragment.\n
SPU_005228	SPU_005228	none	Single CADH domain followed by a LAMG domain-looks like a fly cadherin\n
SPU_008299	SPU_008299	none	5 CADH domains and a TM domain-probable cadherin fragment of vertebrate type\n
SPU_008380	SPU_008380	none	Single CADH domain and a TM domain-probably a cadherin fragment\n
SPU_009073	SPU_009073	none	12 CADH domains and a set of EGF and LAMG repeats before a TM domain.  Cytoplasmic domain present-but no catenin-binding domain-probable non-classical cadherin of the fly type.\n
SPU_010840	SPU_010840	none	13 CADH domains and EGF/LAMG/EGF_ probable cadherin fragment of the fly type.\n
SPU_011375	SPU_011375	none	8 CADH domains-no TM-probable cadherin fragment.\n
SPU_013323	SPU_013323	none	8 CADH domains-no TM-probable cadherin fragment.\n
SPU_014606	SPU_014606	none	6 CADH repeats and TM-cyto domain present but w/o catenin-binding domain-probable non-classical cadherin of the vertebrate type\n
SPU_015210	SPU_015210	none	6 CADH domains-no TM-probable cadherin fragment\n
SPU_016980	SPU_016980	none	"20 CADH repeats, single EGF and TM-non-classical cadherin of the fly type"\n
SPU_017039	SPU_017039	none	"3 CADH domains, 2 EGF and LAMG-no TM or cytoplasmic domain-probable cadherin fragment of the fly type"\n
SPU_019394	SPU_019394	none	11 CADH repeats and TM-cyto domain present but w/o catenin-binding domain-probable non-classical cadherin\n
SPU_019783	SPU_019783	none	"partial classical cadherin of the fly type, no CADH repeats but has classical cadherin cyto domain"\n
SPU_023005	SPU_023005	none	Single CADH domain-nothing else-obviously a cadherin fragment.\n
SPU_025622	SPU_025622	none	7 CADHs plus TM and cytoplasmic domain but no catenin-binding site-probable non-classical cadherin of the vertebrate type-previously reported as homolog of protocadherin 9\n
SPU_026595	SPU_026595	none	Single CADH domain-nothing else-obviously a cadherin fragment\n
SPU_027133	SPU_027133	none	"LOOKS A BIT LIKE FLAMINGO (SEVERAL TM DOMAINS-BUT NO GPS DOMAIN), FLAMINGO/CELSR subfamily"\n
SPU_027356	SPU_027356	none	25 CADH PLUS EGF/LAMG/EGF AND TM-cyto domain present but w/o catenin-binding domain-probable non-classical cadherin of the fly type\n
SPU_028328	SPU_028328	none	3 CADH domains-nothing else-obviously a cadherin fragment.\n
SPU_000958	SPU_000958	none	collagen fragment\n
SPU_004531	SPU_004531	none	collagen fragment\n
SPU_005187	SPU_005187	none	collagen fragment\n
SPU_006067	SPU_006067	none	collagen fragment\n
SPU_007582	SPU_007582	none	collagen fragment\n
SPU_011736	SPU_011736	none	collagen fragment\n
SPU_012707	SPU_012707	none	collagen fragment\n
SPU_013354	SPU_013354	none	collagen fragment\n
SPU_014619	SPU_014619	none	adjacent fragment of a fibrillar collagen\n
SPU_017571	SPU_017571	none	large fragment with a few collagen repeats-could be an N-terminal pro piece\n
SPU_021235	SPU_021235	none	NOVEL COLLAGEN ARCHITECTURE-FUSION??\n
SPU_022116	SPU_022116	none	possible relative of human col24a1\n
SPU_022882	SPU_022882	none	collagen fragment\n
SPU_022896	SPU_022896	none	collagen fragment\n
SPU_022936	SPU_022936	none	possible relative of col3a1\n
SPU_023283	SPU_023283	none	possible relative of col9a3\n
SPU_025369	SPU_025369	none	collagen fragment\n
SPU_026786	SPU_026786	none	collagen fragment\n
SPU_027250	SPU_027250	none	collagen fragment\n
SPU_013557	SPU_013557	none	"C-terminal fragment of a fibrillar collagen, possible relative of col27a1"\n
SPU_014618	SPU_014618	none	C-terminal fragment of a fibrillar collagen-adjacent gene 14619 contains another fragment\n
SPU_017791	SPU_017791	none	N-terminal fragment of a fibrillar collagen\n
SPU_028613	SPU_028613	none	C terminus of fibrillar collagen\n
SPU_005167	SPU_005167	none	fibrillar collagen of the I/II/III subclass-partial_lacks C-terminus\n
SPU_026008	SPU_026008	none	fibrillar collagen of the I/II/III subclass -appears complete\n
SPU_026009	SPU_026009	none	fibrillar collagen of the I/II/III subclass -appears complete\n
SPU_009076	SPU_009076	none	Fibrillar collagen of the V/XI type-PROBABLY COMPLETE\n
SPU_003768	SPU_003768	none	type IV collagen-could be complete\n
SPU_015708	SPU_015708	none	C-terminal fragment of a type IV collagen\n
SPU_000142	SPU_000142	none	collagen XV/XVIII-partial-lacks N-terminal TSPN/LamG domain\n
SPU_000691	SPU_000691	none	NOVEL ARCHITECTURE - CLECT and TSP1 domains alternating plus FA58C and FTP-"LINK" at C-terminus \n \n- a bit similar to SPU_019437 and SPU_005426\n
SPU_005426	SPU_005426	none	NOVEL ARCHITECTURE - CLECT and TSP1 domains alternating plus EGFCA repeats at N-terminus and FA58C and "LINK"/PANAP at C-terminus \n \n- a bit similar to SPU_019437 and SPU_000691\n
SPU_001768	SPU_001768	none	"PUTATIVE LAM G CHAIN; has LamNT, LamB"\n
SPU_006118	SPU_006118	none	"partial LAM A OR G CHAIN, has LamB"\n
SPU_006558	SPU_006558	none	"Looks complete-has lamB domain-so most like lam g chain has LamNT, LamB"\n
SPU_007555	SPU_007555	none	"partial LAM A OR G CHAIN, has LamB"\n
SPU_009846	SPU_009846	none	"partial LAM A OR G CHAIN, has LamB"\n
SPU_014257	SPU_014257	none	"putative laminin fragment, LamB only, best blast hits are proteoglycan"\n
SPU_015482	SPU_015482	none	Looks complete-has NO lamB domain-so most like lam b chain has LamNT\n
SPU_020192	SPU_020192	none	"PUTATIVE LAM A CHAIN; has LamNT, LamB, missing C-terminus"\n
SPU_022929	SPU_022929	none	"LAM A1/2 CHAIN; has LamNT, LamB, LamG, missing C-terminus"\n
SPU_026039	SPU_026039	none	"LAM A3/5 CHAIN; has LamNT, LamB, LamG, looks complete  "\n
SPU_027389	SPU_027389	none	"putative laminin fragment-lam a or g, has LamB"\n
SPU_025057	SPU_025057	none	has LamNT-looks incomplete-could be a laminin or a netrin\n
SPU_026322	SPU_026322	none	has LamNT-looks incomplete-could be a laminin or a netrin\n
SPU_028368	SPU_028368	none	has LamNT-Looks incomplete-could be a laminin or a netrin\n
SPU_006557	SPU_006557	none	has LamNT-looks incomplete-could be a laminin or a netrin\n
SPU_000176	SPU_000176	none	novel architecture-no known proteins with this composition/organization\n
SPU_001328	SPU_001328	none	"has a reeler domain and an EGF, may belong with reelin genes"\n
SPU_002367	SPU_002367	none	"just a reeler domain, may belong with reelin genes"\n
SPU_007070	SPU_007070	none	novel membrane protein with reeler domain and multiple EGF repeats-no known proteins with this composition/organization\n
SPU_011038	SPU_011038	none	"just a reeler domain, may belong with reelin genes"\n
SPU_012092	SPU_012092	none	"just a reeler domain, may belong with reelin genes"\n
SPU_013071	SPU_013071	none	"just a reeler domain, may belong with reelin genes"\n
SPU_014572	SPU_014572	none	Enormous protein with multiple EGF and EGFCA domains-N-terminal Reeler and a CUB domain near C-terminus. No known proteins with this composition/organization\n
SPU_015603	SPU_015603	none	"has a reeler domain and two EGFs, may belong with reelin genes"\n
SPU_015604	SPU_015604	none	"has a reeler domain and an EGF, may belong with reelin genes"\n
SPU_016091	SPU_016091	none	reeler domain plus DoH catecholamine-binding domain\n
SPU_016188	SPU_016188	none	"has a reeler domain and an EGF, may belong with reelin genes"\n
SPU_016612	SPU_016612	none	"just a reeler domain, may belong with reelin genes"\n
SPU_024087	SPU_024087	none	has a reeler domain but also a RING domain-probably not really related to reeler-novel domain combination\n
SPU_026165	SPU_026165	none	has a reeler domain but also a large block of repetitive simple sequences and two CCP domains-probably not really related to reeler\n
SPU_026222	SPU_026222	none	"just a reeler domain, may belong with reelin genes-note that adjacent gene very similar"\n
SPU_026223	SPU_026223	none	"just a reeler domain, may belong with reelin genes-note that adjacent gene very similar"\n
SPU_026550	SPU_026550	none	"enormous protein with reeler domain, CUB domain and multiple internal EGF repeats-looks like assembly problem"\n
SPU_013829	SPU_013829	none	large protein with few defined domains-2CCP at one end-SEA/EGF at the other\n
SPU_016171	SPU_016171	none	"SEA, LDLa and CUB domains-novel architecture"\n
SPU_021919	SPU_021919	none	CCP and SEA domains- novel architecture\n
SPU_022753	SPU_022753	none	CCP and SEA domains- novel architecture\n
SPU_019795	SPU_019795	none	two sea domains-there is a human protein-interphotoreceptor proteoglycan-with a similar structure\n
SPU_001785	SPU_001785	none	hyalin protein with SEA domain\n
SPU_002355	SPU_002355	none	very large protein with many hyalin repeats and SEA domain and EGF repeats near the C-terminus \n \nexpressed during embryonic development\n
SPU_003635	SPU_003635	none	"hyalin repeat protein with some other domains interspersed-SEA, VWD, CUB, LDLa"\n
SPU_004945	SPU_004945	none	SEA/HYR/CLECT - short hyalin protein\n
SPU_009594	SPU_009594	none	has Spond_N and TSP1 domains-related to vertebrate spondins\n
SPU_020379	SPU_020379	none	has Spond_N and TSP1 domains-related to vertebrate spondins\n
SPU_013393	SPU_013393	none	Annotation entered by Bob Obar (robar@scientist.com). \nThe epsilon-tubulin protein family is not yet a coherent one, and it is impossible at this time to determine whether the lack of similarity observed between the amino-terminal ~100 amino acids of this sequence and the corresponding segments of other epsilon-tubulin database entries is due to divergence or error.\n
SPU_003119	SPU_003119	none	See putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137536744-28277-70510656271.BLASTQ4\n
SPU_002663	SPU_002663	none	Annotation entered by Bob Obar (robar@scientist.com). \nThis GLEAN originally represented an amino-terminal segment of an epsilon-tubulin encoded by Scaffold498.  The gene model was later manually extended using sequence from Scaffoldi3903, which appears to encode the entire epsilon-tubulin polypeptide.\n
SPU_012141	SPU_012141	none	Annotation entered by Bob Obar (robar@scientist.com). \nThis GLEAN represents an amino-terminal segment of an epsilon-tubulin.  Except for 6 codons near the amino terminus of each, it is identical to SPU_002663.\n
SPU_000178	SPU_000178	none	See putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137535187-7613-25495531689.BLASTQ4\n
SPU_022346	SPU_022346	none	Pfam00194 match. \n \nA family of carbonic anhydrase-like proteins exists in the sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined.\n
SPU_026483	SPU_026483	none	Pfam00194 match. \n \nA family of carbonic anhydrase-like proteins exists in the sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined.\n
SPU_016740	SPU_016740	none	Pfam00194 match.  Transcriptome data indicate that it is expressed in the embryo. \n \nA family of carbonic anhydrase-like proteins exists in the sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined. \n \n
SPU_009509	SPU_009509	none	Pfam00194 match. \n \nA family of carbonic anhydrase-like proteins exists in the sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined.\n
SPU_026747	SPU_026747	none	Pfam00194 match. \n \nA family of carbonic anhydrase-like proteins exists in the sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined.\n
SPU_012995	SPU_012995	none	Pfam00194 match. Transcriptome data indicate it is expressed in the embryo. \n \nA family of carbonic anhydrase-like proteins exists in the sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined.\n
SPU_000702	SPU_000702	none	Pfam00194 match. \n \nA family of carbonic anhydrase-like proteins exists in the sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined.\n
SPU_008894	SPU_008894	none	Pfam00194 match.  Transcriptome data indicate that gene is expressed in the embryo. \n \nA family of carbonic anhydrase-like proteins exists in the sea urchin. Orthologous relationships with vertebrate carbonic anhydrases and related proteins have not been carefully examined.\n
SPU_006053	SPU_006053	none	gene model spans at least 3 glean predictions  \nSPU_006053 (N-ter) \nSPU_006054 (middle) located on the same scaffold but on opposite strand (assembly problem?) \nSPU_024289 (end) \n \n2 glean predictions are duplicates of some exons SPU_006612 (middle) and SPU_001692 (end)\n
SPU_016272	SPU_016272	none	There seem to be some extra predicted exons in the GLEAN prediction. NCBI GNOMON prediction seems to be more accurate (XP_797469) \n \nBest homology with vertebrate Alk6, but as the closely related vertebrate Alk3 doesn't seem to have a counterpart in the sea urchin, I called it Alk3-6.\n
SPU_014830	SPU_014830	none	This model contains exons identical to SPU_028742 and is either a duplication or an allele.\n
SPU_004951	SPU_004951	none	Partial Toll-like receptor. This gene model is located at the end of a short scaffold. The nucleotide sequence has 95% identity to another Sp-Tlr gene, so it could be a member of the Toll-like receptor family.\n
SPU_012005	SPU_012005	none	Possible Gene Prediction issue - it could be several concatenated proteins\n
SPU_025902	SPU_025902	none	Possible Gene Prediction issue - it could be several concatenated proteins\n
SPU_018488	SPU_018488	none	highly conserved 69% identical to the human protein\n
SPU_010750	SPU_010750	none	trypsin-like protease with SR and several FN3 repeats - novel architecture\n
SPU_000406	SPU_000406	none	Ig domains (9) and a few FN3 - may be a fragment\n
SPU_010007	SPU_010007	none	Ig3/FN3/Ig2 - may be a fragment\n
SPU_010124	SPU_010124	none	Ig3/FN3-2/Ig7 - may be a fragment\n
SPU_010125	SPU_010125	none	Ig13/FN3 - may be a fragment\n
SPU_012992	SPU_012992	none	Igc2-9/FN3-2 - may be a fragment \ndomain structure  (9-2) is consistent with Ds-CAM (9-4-1-2) missing C-terminus. \nBlast match to DCC is fifth after other better matches and domain structure not right for DCC (4-6)- probably not DCC/neogenin but similar molecule\n
SPU_025374	SPU_025374	none	novel architecture -SEMA and FN3\n
SPU_020387	SPU_020387	none	ROS-RELATED -single LY and several FN3 BUT lacks TM and kinase domains\n
SPU_027290	SPU_027290	none	FN3-TM-PPase - TM NOT PREDICTED - overlap with hits is limited\n
SPU_015923	SPU_015923	none	FN3-TM-Ppase - overlap with hits is limited\n
SPU_023943	SPU_023943	none	has Ig - EGF - SEVERAL FN3-GPS-7TM\n
SPU_025788	SPU_025788	none	has  9 FN3s - 4 EGFCas - 3 more FN3-GPS-TM \n \nApart from lack of 7tm_2 domain this matches to overall pattern of LNB-7TM-GPCRs \nMay be missing C-terminus\n
SPU_016363	SPU_016363	none	has mixture of FN3/EGFCa-GPS-7TM_2 \n \nmatches to overall pattern of LNB-7TM-GPCRs\n
SPU_009018	SPU_009018	none	novel architecture - LIPOXYGENASE (LH2) HOMOLOGIES FLANKING MULTIPLE FN3 REPEATS\n
SPU_021340	SPU_021340	none	LEUCINE-RICH/FN3 PROTEIN -NOVEL ARCHITECTURE - HAS TBC DOMAIN\n
SPU_000435	SPU_000435	none	myosin light chain kinase - structure predicted may suggest duplication or assembly problems\n
SPU_019190	SPU_019190	none	Best matches are with the Rho-GEF kinases, Duet,  but that gene has an additional N-terminal segment - missing RhoGEF, PH and SH3 domains\n
SPU_013917	SPU_013917	none	lots of Ig domains and a few FN3 plus a mixed-function kinase - good matches to projectin and twitchin - clearly a muscle-specific structural kinase\n
SPU_028470	SPU_028470	none	Kallmann syndrome 1 homolog - WAP/FN3n/LDLa\n
SPU_004282	SPU_004282	none	Ig-FN3-2-BTB\n
SPU_026161	SPU_026161	none	Ig/FN3/Ig/FN3 alternating\n
SPU_000159	SPU_000159	none	hyalin protein - prediction is longer than cDNA sequences\n
SPU_017178	SPU_017178	none	FN3-SPRY - there are a couple of classes of proteins with that combination, sometimes plus other domains - no good Blast hits - domain structure is very similar to GL3_28216 - a good homolog of tripartite motif proteins - suspect this is one of those\n
SPU_028216	SPU_028216	none	FN3-SPRY - best matches are to tripartite motif-containing proteins from several species\n
SPU_023539	SPU_023539	none	FN3-SPRY - ALSO RING and BBOX - there are several chordate proteins with that combination - probably transcription factors - matches with five best hits do not include the RING domain but the domain structure of the matching segment is very similar to GL3_28216 - a good homolog of tripartite motif proteins - suspect this is one of those\n
SPU_027121	SPU_027121	none	"ECM or membrane adhesion protein - probably a fragment - many VWD/EGF/FN3 domains - somewhat homologous with human Fc gamma Ig-binding protein [AAD39266.1] and with zonaadhesins of pig and rabbit [NP_999548.1, AAF63342.2] but there are gaps"\n
SPU_022706	SPU_022706	none	"ECM or membrane adhesion protein - probably a fragment - many VWD and FN3 domains - somewhat homologous with human Fc gamma Ig-binding protein [AAD39266.1] and with human zonaadhesins [AAL04410.1, AAL04412.1, AAL04413.1] - but there are gaps"\n
SPU_003992	SPU_003992	none	CLECT/FN3/TM - MEMBER OF A LINKED CLUSTER OF THREE\n
SPU_003993	SPU_003993	none	CLECT/FN3-2/TM - MEMBER OF A LINKED CLUSTER OF THREE\n
SPU_003994	SPU_003994	none	CLECT-FN3-TM - MEMBER OF A LINKED CLUSTER OF THREE\n
SPU_002394	SPU_002394	none	CLECT-FN3-TM\n
SPU_005115	SPU_005115	none	CLECT-FN3-TM\n
SPU_012215	SPU_012215	none	CLECT-FN3-TM\n
SPU_023488	SPU_023488	none	CLECT-FN3-PANAP-TM -novel architecture\n
SPU_021126	SPU_021126	none	CLECT-FN3- noTM\n
SPU_008066	SPU_008066	none	CLECT-FN3- no TM\n
SPU_023487	SPU_023487	none	CLECT2-FBG-FN3-PANAP-TM -novel architecture\n
SPU_013668	SPU_013668	none	CLECT2-EGF3-FN3-TM\n
SPU_014375	SPU_014375	none	CLECT2-EGF3-FN3-TM\n
SPU_024360	SPU_024360	none	CLECT-EGF-FN3-TM\n
SPU_028703	SPU_028703	none	CLECT-EGF3-FN3-TM\n
SPU_019135	SPU_019135	none	CLECT-EGF2-FN3 - noTM\n
SPU_012463	SPU_012463	none	CLECT-EGF2-FN3 - no TM\n
SPU_000346	SPU_000346	none	CLECT/EGF2/FN3/TM - ALSO POSSIBLE PHOSPHATASE although PPase domain looks to be outside??\n
SPU_012855	SPU_012855	none	CLECT/FN3 alternating WITH BLOCK OF CUB DOMAINS IN MIDDLE -novel architecture\n
SPU_010216	SPU_010216	none	CLECT/FN3 alternating\n
SPU_027967	SPU_027967	none	ANK2/FN3/RA RA domains - looks like some sort of intracellular adaptor\n
SPU_018475	SPU_018475	none	Ig/FN3/TM - may be fragment - see adjacent gene\n
SPU_027951	SPU_027951	none	Ig3/FN3-2/TM - may be fragment - see adjacent gene\n
SPU_005899	SPU_005899	none	Ig2/FN3/TM - fragment\n
SPU_006614	SPU_006614	none	Ig2/FN3-3/TM -N-term half matches with Robo - Cterm half does not\n
SPU_008663	SPU_008663	none	Ig-4/FN3-4/TM -N-term half matches with Robo - Cterm half does not\n
SPU_010851	SPU_010851	none	Ig5/FN3-2/TM -looks like quite a good domain match for CDO\n
SPU_012530	SPU_012530	none	Ig/FN3/TM-fragment\n
SPU_015844	SPU_015844	none	Ig4/FN3/Ig/TM\n
SPU_015846	SPU_015846	none	Ig2/FN3/TM - fragment\n
SPU_024482	SPU_024482	none	Ig2/FN3/TM - fragment\n
SPU_024708	SPU_024708	none	Ig4/FN3-2/TM\n
SPU_003460	SPU_003460	none	Ig/FN3-3/Ig/FN3-2 /TM -may be fragment\n
SPU_009818	SPU_009818	none	Igc2-4/FN3-4/Igc2/FN3-2/TM \n \nthis does look like a Ds-CAM homolog - Ds-CAM has 9-4-1-2 arrangement of Ig/FN3 domains - this has 4-4-1-2 - suggests it's missing the N-terminal 5 Ig domains \n \nNB - there are quite a few genes with 5 Ig repeats and nothing else - that might comprise the N-terminus \n(SPU_012351, SPU_015273, SPU_008772, SPU_007532, SPU_010002, SPU_013487, SPU_016889, SPU_000469, SPU_025431, SPU_025430, SPU_015584, SPU_009608, SPU_009024) \nNone of these is obviously adjacent - nor are any other Ig only genes from the numbering - needs browser work.\n
SPU_009318	SPU_009318	none	Ig6/FN3/TM -partial overlap only\n
SPU_015705	SPU_015705	none	Ig2/FN3/TM - MAY BE fragment\n
SPU_017022	SPU_017022	none	IG5/FN3/TM\n
SPU_018030	SPU_018030	none	Ig2/FN3/TM - may be a fragment\n
SPU_021725	SPU_021725	none	Ig5/FN3/TM - may be a fragment\n
SPU_024019	SPU_024019	none	Ig8/FN3/TM - may be a fragment\n
SPU_025975	SPU_025975	none	Ig3/FN6/TM - partial overlap only - does not have predicted TM and C-terminus (putative cyto domain) does not have Neogenin_C \nand is not homologous with DCC in Blast \n \nDomain sequence (3-6) is consistent with 4-6 arrangement in DCC/neogenin - would suggest it's missing N-terminal Ig domain \n
SPU_026725	SPU_026725	none	Ig4/FN3-2/TM\n
SPU_027415	SPU_027415	none	Ig2/FN3-3/TM\n
SPU_025966	SPU_025966	none	LRR repeats-Ig-EGF-FN3-TM\n
SPU_001538	SPU_001538	none	FN3/TM - probably a fragment\n
SPU_001957	SPU_001957	none	FN3/TM - probably a fragment\n
SPU_004072	SPU_004072	none	FN3-2/TM - probably a fragment\n
SPU_004128	SPU_004128	none	FN3-2/TM - probably a fragment\n
SPU_013954	SPU_013954	none	FN3/TM - probably a fragment\n
SPU_019460	SPU_019460	none	FN3/TM - probably a fragment\n
SPU_020590	SPU_020590	none	FN3-2/TM - probably a fragment\n
SPU_022063	SPU_022063	none	FN3-2/TM - probably a fragment\n
SPU_002657	SPU_002657	none	FN3-3/TM - probably a fragment\n
SPU_002763	SPU_002763	none	EGF3FN3-3/TM - no Ig (?) - probably missing C-terminus\n
SPU_004758	SPU_004758	none	FN3-2/TM - probably a fragment\n
SPU_007789	SPU_007789	none	FN3-3/TM - probably a fragment\n
SPU_011473	SPU_011473	none	FN3-8/TM\n
SPU_012998	SPU_012998	none	FN3-6/TM\n
SPU_015746	SPU_015746	none	FN3-5/TM\n
SPU_016936	SPU_016936	none	FN3-31/TM\n
SPU_017208	SPU_017208	none	FN3-14/TM\n
SPU_018070	SPU_018070	none	FN3-4/TM- probably a fragment\n
SPU_020335	SPU_020335	none	FN3-3/TM - probably a fragment\n
SPU_021244	SPU_021244	none	FN3-3/TM - probably a fragment\n
SPU_022585	SPU_022585	none	FN3-4/TM- probably a fragment\n
SPU_026349	SPU_026349	none	FN3-4/TM- probably a fragment\n
SPU_026991	SPU_026991	none	FN3-3/TM - probably a fragment\n
SPU_022586	SPU_022586	none	EGF/Ig/FN3-2/TM - probably missing both ends\n
SPU_019051	SPU_019051	none	EGF/FN3-2/TM - see adjacent gene\n
SPU_027686	SPU_027686	none	EGF/FN3/TM - see adjacent gene - missing kinase\n
SPU_000673	SPU_000673	none	EGF5/FN3-3/TM - probably fragment\n
SPU_002870	SPU_002870	none	EGF-3/FN3-2/TM - missing kinase\n
SPU_007983	SPU_007983	none	EGF2/FN3/TM - missing kinase\n
SPU_010587	SPU_010587	none	EGF/FN3/TM\n
SPU_011180	SPU_011180	none	EGF/FN3/TM\n
SPU_012810	SPU_012810	none	EGF/FN3-2/TM\n
SPU_013172	SPU_013172	none	EGF2/FN3/TM -partial overlap only - missing kinase\n
SPU_018153	SPU_018153	none	EGF2/FN3/TM- partial overlap only\n
SPU_023923	SPU_023923	none	EGF-2/FN3-3/TM - partial overlap only\n
SPU_004347	SPU_004347	none	EGF-2/FN3-3/TM - very partial overlap\n
SPU_009654	SPU_009654	none	EGF-3/FN3-2/TM - very partial overlap\n
SPU_011804	SPU_011804	none	EGF/FN3-3/TM - very partial overlap\n
SPU_014858	SPU_014858	none	EGF-2/FN3-3/TM -partial overlap - missing kinase\n
SPU_015381	SPU_015381	none	EGF/FN3-4/TM - partial overlap\n
SPU_016530	SPU_016530	none	EGF-2/FN3-3/TM - very partial overlap\n
SPU_021253	SPU_021253	none	adhesion receptor - FN3/EGF_Ca intermingled - single CCP - a bit similar in composition to FLJ00133 protein but domain organisation different - see adjacent gene - pfam gives some hyalin repeats\n
SPU_023398	SPU_023398	none	two FN3 - could be ECM or receptor - see adjacent genes\n
SPU_023400	SPU_023400	none	two FN3 - could be ECM or receptor - see adjacent genes\n
SPU_022518	SPU_022518	none	two FN3 - could be ECM or receptor - see adjacent gene\n
SPU_002470	SPU_002470	none	two FN3 - could be ECM or receptor\n
SPU_002531	SPU_002531	none	two FN3 - could be ECM or receptor\n
SPU_002801	SPU_002801	none	two FN3 - could be ECM or receptor\n
SPU_003777	SPU_003777	none	two FN3 - could be ECM or receptor\n
SPU_004756	SPU_004756	none	two FN3 - could be ECM or receptor\n
SPU_006052	SPU_006052	none	two FN3 - could be ECM or receptor\n
SPU_011337	SPU_011337	none	two FN3 - could be ECM or receptor\n
SPU_016143	SPU_016143	none	two FN3 - could be ECM or receptor\n
SPU_017760	SPU_017760	none	two FN3 - could be ECM or receptor\n
SPU_017979	SPU_017979	none	two FN3 - could be ECM or receptor\n
SPU_021268	SPU_021268	none	two FN3 - could be ECM or receptor\n
SPU_022616	SPU_022616	none	two FN3 - could be ECM or receptor\n
SPU_022764	SPU_022764	none	two FN3 - could be ECM or receptor\n
SPU_023304	SPU_023304	none	two FN3 - could be ECM or receptor\n
SPU_023502	SPU_023502	none	two FN3 - could be ECM or receptor\n
SPU_022519	SPU_022519	none	three FN3 - could be ECM or receptor - see adjacent gene\n
SPU_003995	SPU_003995	none	three FN3 - could be ECM or receptor\n
SPU_004950	SPU_004950	none	three FN3 - could be ECM or receptor\n
SPU_011397	SPU_011397	none	three FN3 - could be ECM or receptor\n
SPU_011648	SPU_011648	none	three FN3 - could be ECM or receptor\n
SPU_013667	SPU_013667	none	three FN3 - could be ECM or receptor\n
SPU_017929	SPU_017929	none	three FN3 - could be ECM or receptor\n
SPU_019818	SPU_019818	none	three FN3 - could be ECM or receptor\n
SPU_022314	SPU_022314	none	three FN3 - could be ECM or receptor\n
SPU_024061	SPU_024061	none	three FN3 - could be ECM or receptor\n
SPU_025664	SPU_025664	none	three FN3 - could be ECM or receptor\n
SPU_028131	SPU_028131	none	three FN3 - could be ECM or receptor\n
SPU_028665	SPU_028665	none	three FN3 - could be ECM or receptor\n
SPU_023737	SPU_023737	none	Ig2/FN3-3 - could be ECM or receptor - see adjacent gene\n
SPU_022705	SPU_022705	none	four FN3 - could be ECM or receptor - see adjacent gene\n
SPU_023738	SPU_023738	none	four FN3 - could be ECM or receptor - see adjacent gene\n
SPU_022580	SPU_022580	none	four FN3 - could be ECM or receptor\n
SPU_026676	SPU_026676	none	four FN3 - could be ECM or receptor\n
SPU_022535	SPU_022535	none	five FN3 - could be ECM or receptor\n
SPU_004309	SPU_004309	none	five FN3 - could be ECM or receptor\n
SPU_014850	SPU_014850	none	four FN3 - could be ECM or receptor\n
SPU_017765	SPU_017765	none	five FN3 - could be ECM or receptor\n
SPU_021252	SPU_021252	none	12 FN3 - could be ECM or receptor\n
SPU_018476	SPU_018476	none	three FN3 - could be ECM or receptor - see adjacent gene\n
SPU_021082	SPU_021082	none	three FN3 - could be ECM or receptor - see adjacent gene\n
SPU_017766	SPU_017766	none	three FN3 - could be ECM or receptor - last of several similar adjacent fragments\n
SPU_015008	SPU_015008	none	lots of EGF - a few FN3 repeats - one CUB domain - Blast hits with Notch but does not really look like Notch\n
SPU_000554	SPU_000554	none	Ig4/FN3-2 - weak match with NCAM\n
SPU_001159	SPU_001159	none	EGF/FN3-2\n
SPU_005508	SPU_005508	none	Ig6/DISIN/FN3-2/IG3/FN3-4 - patchy match with contactin\n
SPU_009039	SPU_009039	none	Ig/FN3-5 - weak match with neogenin\n
SPU_009757	SPU_009757	none	Ig5/FN3-4 - no TM - incomplete?\n
SPU_013497	SPU_013497	none	Ig/FN3-3\n
SPU_013927	SPU_013927	none	Ig/FN3-3\n
SPU_026307	SPU_026307	none	Ig4/FN3-2\n
SPU_028098	SPU_028098	none	Ig/FN3-4\n
SPU_028100	SPU_028100	none	Ig/FN3-5\n
SPU_024882	SPU_024882	none	Ig3/FN3  - see adjacent gene\n
SPU_027952	SPU_027952	none	Ig4/FN3  - see adjacent gene\n
SPU_004501	SPU_004501	none	Ig5/FN3- weak match with nephrin\n
SPU_004746	SPU_004746	none	Ig3/FN3 - weak match with NCAM\n
SPU_005900	SPU_005900	none	Ig3/FN3 - weak match with nephrin\n
SPU_006087	SPU_006087	none	Ig4/FN3\n
SPU_007323	SPU_007323	none	Ig4/FN3\n
SPU_008387	SPU_008387	none	Ig/FN3\n
SPU_008771	SPU_008771	none	Ig2/FN3\n
SPU_009571	SPU_009571	none	Ig/FN3\n
SPU_010291	SPU_010291	none	Ig2/FN3-2\n
SPU_014759	SPU_014759	none	Ig2/FN3\n
SPU_017488	SPU_017488	none	Ig5/FN3\n
SPU_017889	SPU_017889	none	Ig/FN3-2\n
SPU_021745	SPU_021745	none	Ig3/FN3\n
SPU_023843	SPU_023843	none	Ig2/FN3 - weak match with sidekick\n
SPU_024986	SPU_024986	none	Ig/FN3-2\n
SPU_026399	SPU_026399	none	EGF/Ig/EGF3/FN3-4\n
SPU_001092	SPU_001092	none	Ig4/FN3\n
SPU_021083	SPU_021083	none	four FN3 - could be ECM or receptor - see adjacent gene\n
SPU_022641	SPU_022641	none	four FN3 - could be ECM or receptor\n
SPU_015007	SPU_015007	none	FN3-3/EGF/FN3-7/EGF2 - possible TENASCIN - BUT NO FBG\n
SPU_001682	SPU_001682	none	FN3 domain\n
SPU_002469	SPU_002469	none	FN3 domain\n
SPU_009658	SPU_009658	none	FN3 domain\n
SPU_014031	SPU_014031	none	FN3 domain\n
SPU_014854	SPU_014854	none	FN3 domain\n
SPU_019478	SPU_019478	none	FN3 domain\n
SPU_020676	SPU_020676	none	FN3 domain\n
SPU_021011	SPU_021011	none	FN3 domain\n
SPU_025558	SPU_025558	none	FN3 domain\n
SPU_027203	SPU_027203	none	FN3 domain\n
SPU_002112	SPU_002112	none	FN3-18/EGF12 - possible TENASCIN - BUT no FBG\n
SPU_003659	SPU_003659	none	FN3-7/EGF2 - conceivably TENASCIN - BUT NO FBG\n
SPU_001994	SPU_001994	none	EGF/Ig/FN3\n
SPU_003814	SPU_003814	none	EGF/Ig/FN3\n
SPU_004830	SPU_004830	none	Ig/EGF2/FN3-4\n
SPU_008166	SPU_008166	none	EGF5/Ig5/FN3-2\n
SPU_016493	SPU_016493	none	Ig/EGF-2/FN3-2\n
SPU_023397	SPU_023397	none	FN3/EGF/FN3-2 - see adjacent genes\n
SPU_019052	SPU_019052	none	EGF/FN3 - see adjacent gene\n
SPU_024881	SPU_024881	none	EGF/FN3  - see adjacent gene\n
SPU_001907	SPU_001907	none	EGF3/FN3-3\n
SPU_003293	SPU_003293	none	EGF/FN3\n
SPU_003931	SPU_003931	none	EGF/FN3\n
SPU_013220	SPU_013220	none	EGF/FN3-3\n
SPU_019659	SPU_019659	none	EGF/FN3\n
SPU_021589	SPU_021589	none	EGF/FN3\n
SPU_025304	SPU_025304	none	EGF7/FN3\n
SPU_026416	SPU_026416	none	EGF2/FN3\n
SPU_003418	SPU_003418	none	EGF/FN3-2\n
SPU_004018	SPU_004018	none	EGF/FN3-3/EGF\n
SPU_005820	SPU_005820	none	EGF2/FN3\n
SPU_006514	SPU_006514	none	EGF/FN3\n
SPU_007791	SPU_007791	none	EGF2/FN3\n
SPU_009637	SPU_009637	none	EGF/FN3-3\n
SPU_011344	SPU_011344	none	EGF/FN3\n
SPU_023165	SPU_023165	none	EGF2/FN3-3\n
SPU_001463	SPU_001463	none	all FN3 - could be ECM or receptor - see adjacent gene\n
SPU_024778	SPU_024778	none	all FN3 - could be ECM or receptor - see adjacent gene\n
SPU_024777	SPU_024777	none	all FN3 - could be ECM or receptor  - see adjacent gene\n
SPU_000748	SPU_000748	none	all FN3 - could be ECM or receptor\n
SPU_000797	SPU_000797	none	all FN3 - could be ECM or receptor\n
SPU_001610	SPU_001610	none	all FN3 - could be ECM or receptor\n
SPU_001658	SPU_001658	none	all FN3 - could be ECM or receptor\n
SPU_002802	SPU_002802	none	all FN3 - could be ECM or receptor\n
SPU_002820	SPU_002820	none	all FN3 - could be ECM or receptor\n
SPU_003086	SPU_003086	none	all FN3 - could be ECM or receptor\n
SPU_003583	SPU_003583	none	all FN3 - could be ECM or receptor\n
SPU_005060	SPU_005060	none	all FN3 - could be ECM or receptor\n
SPU_005960	SPU_005960	none	all FN3 - could be ECM or receptor\n
SPU_007364	SPU_007364	none	all FN3 - could be ECM or receptor\n
SPU_009767	SPU_009767	none	all FN3 - could be ECM or receptor\n
SPU_009943	SPU_009943	none	all FN3 - could be ECM or receptor\n
SPU_010144	SPU_010144	none	all FN3 - could be ECM or receptor\n
SPU_010788	SPU_010788	none	all FN3 - could be ECM or receptor\n
SPU_011113	SPU_011113	none	all FN3 - could be ECM or receptor\n
SPU_014469	SPU_014469	none	all FN3 - could be ECM or receptor\n
SPU_015918	SPU_015918	none	all FN3 - could be ECM or receptor\n
SPU_015919	SPU_015919	none	all FN3 - could be ECM or receptor\n
SPU_017207	SPU_017207	none	all FN3 - could be ECM or receptor\n
SPU_017762	SPU_017762	none	all FN3 - could be ECM or receptor\n
SPU_017937	SPU_017937	none	all FN3 - could be ECM or receptor\n
SPU_028716	SPU_028716	none	all FN3 - could be ECM or receptor\n
SPU_010429	SPU_010429	none	VWC AND FN3 DOMAINS INTERMINGLED - NO TM PREDICTED BUT THERE ARE TM RECEPTORS KNOWN WITH THESE TWO DOMAINS\n
SPU_009576	SPU_009576	none	all Ig - could be ECM or receptor\n
SPU_001222	SPU_001222	none	all FN3 - could be ECM or receptor\n
SPU_005015	SPU_005015	none	all FN3 - could be ECM or receptor\n
SPU_026321	SPU_026321	none	all FN3 - could be ECM or receptor\n
SPU_000997	SPU_000997	none	This is a second C3 gene in the sea urchin.  The encoded protein has a conserved thioester site and a single cleavage site to generate alpha and beta chains.  There is a histidine in the C-terminal direction that functions in substrate binding choice. \n \nThe gene model is missing the 5' end - about the first 130 amino acids.  However, Genboree shows an overlap with NCBI:prediction XM-775838.1 (scaffold 1499) that may contain the missing part of the gene.\n
SPU_005193	SPU_005193	none	The encoded protein has a thioester site and two cleavage sites: the first to cleave the alpha and beta chains and the second to cleave the alpha and gamma chains.  This structure is typical of C4 proteins in mammals but also the C3 proteins in the cyclostomes.  \n
SPU_028445	SPU_028445	none	likely missing two exons on the C-terminus.\n
SPU_021668	SPU_021668	none	Annotation entered by Bob Obar (robar@scientist.com). \nThis is one of 4 tandem alpha-tubulin Gene Models (SPU_021667 - 21770).\n
SPU_021669	SPU_021669	none	Annotation entered by Bob Obar (robar@scientist.com). \nThis is one of 4 tandem alpha-tubulin Gene Models (SPU_021667 - 21770).\n
SPU_016746	SPU_016746	none	Annotation entered by Bob Obar (robar@scientist.com).\n
SPU_028221	SPU_028221	none	Annotation entered by Bob Obar (robar@scientist.com).\n
SPU_024615	SPU_024615	none	Annotation entered by Bob Obar (robar@scientist.com).  This Gene Model contains a full-length alpha-tubulin with a duplication of 125 amino acids near the amino terminus of the predicted protein.\n
SPU_012679	SPU_012679	none	Annotation entered by Bob Obar (robar@scientist.com).  This Gene Model contains a nearly full-length alpha-tubulin that is missing 15 amino acids at the amino terminus of the predicted protein.\n
SPU_027848	SPU_027848	none	This is the N terminal part of the protein; the C terminal portion is encoded by SPU_026498. These 2 gleans overlap (nucleotide level): bases 724-1066 (this  glean).  The sequence after 1066 {TAGGATTATTGAGAAATCTTTAA} probably does not really belong to this gene\n
SPU_027380	SPU_027380	none	3' of CDS missing (TM domain)\n
SPU_005205	SPU_005205	none	similar to Phosphatidyl Serine Receptor (PSR), involved in phosphatidylserine-specific apoptotic cell clearance (e.g. macrophages engulfing apoptotic T cells) \n \n-model assembled/modified from SPU_005204 and SPU_005205\n
SPU_005885	SPU_005885	none	Partial sequence. \nSee putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137537340-17614-146655645239.BLASTQ1\n
SPU_016937	SPU_016937	none	The 5' end of the sequence annotated here is SPU_016936.  SPU_016936 sequences have been pasted in front of SPU_016937 sequences.  A portion of SPU_016936 at the boundaries of contig AAGJ01178647 and contig AAJ01178648 contains an identical repeat.  This is probably a genome assembly error and the duplicate sequence has been removed from the peptide sequence reported here (but not from the DNA sequence).   The deleted sequence in SPU_016936 is: \nITTGLYNDEMVTSSTTRNCSTTDCESFTVDFDTLNSGTLYTLYAGVVQSSGREVVPLLAKAATIPESAVDLQFTSIGRNYVVLTWDNPAGMIDSYNISYYPVNDITKLMFEVVQAAAESNVLRVDDLNEGMNYSFTVVSLLEVEADLQEMGAPVEVFAVVGVLGSLNITAFDETTMNIEWEQVDVED.\n
SPU_005602	SPU_005602	none	Partial sequence. \nSee putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137537204-5277-205811584566.BLASTQ4\n
SPU_006699	SPU_006699	none	Similarity to Chlamydomonas reinhardtii Inner_Dynein_Arm_1_Intermediate Chain IC140 (C_530081|166736|IA1-IC140|) and Homo sapiens axonemal dynein intermediate polypeptide 2.  Contains WD-40 motifs.  The protein is also essentially identical to the Anthocidaris crassispina (gi|2494216|sp|Q16960|DYI3_ANTCR Dynein intermediate chain 3, ciliary).\n
SPU_012809	SPU_012809	none	Similarity to Chlamydomonas reinhardtii Inner_Dynein_Arm_1_Intermediate Chain IC140 (C_530081|166736|IA1-IC140|) and Homo sapiens testis development protein NYD-SP29 (NP_660155).\n
SPU_008502	SPU_008502	none	Similar to protein serine/threonine phosphatase 4 regulatory subunit 1.\n
SPU_026101	SPU_026101	none	Similar to PP4R1\n
SPU_015320	SPU_015320	none	This Gene Model is a close homolog of Chlamydomonas reinhardtii Inner Dynein Arm Light Chain p28 (IA-IC28, C_740003) and Homo sapiens axonemal dynein light chain (NP_003453).  It has a GenBank ID gi|1354084|gb|AAC47111.1| (axonemal dynein light chain p33).\n
SPU_018432	SPU_018432	none	Similar to Dentin sialophosphoprotein precursor (DMP-3).  Partial sequence.\n
SPU_017261	SPU_017261	none	Similar to Neurabin 1\n
SPU_000937	SPU_000937	none	This looks like the N-terminus of perlecan - with other parts of the gene in different gene predictions. \n \nSPU_012324/SPU_028620 BOTH look like the middle of the gene and they are duplications - SPU_012324 is longer   \n \nSPU_026338 looks like the C-terminus\n
SPU_000974	SPU_000974	none	This gene matches predicted gene products for coronin 1A   \n(gi|72123047|ref|XP_791382.1| PREDICTED: similar to coronin, actin binding protein, 1A [Strongylocentrotus purpuratus])\n
SPU_002812	SPU_002812	none	EGF,EGF_Lam and FAS1 domains and a TM domain \nClosest match in human is stabilin but the urchin gene lacks the LINK domain and has a somewhat different pattern of EGF/FAS1 domains\n
SPU_015419	SPU_015419	none	SPU_015419 model is part of predicted Sp-Pask. Other, non-overlapping part is in SPU_016604 on scaffold 70175.  FgeneshAB prediction S.P_Scaffold70175 may have additional exons, based on alignment with mammalian PAS-K.\n
SPU_003678	SPU_003678	none	two FAS1 domains - member of a family of genes with similar structures and sequences \n \nVirtually identical to SPU_020485 - latter has an additional  short segment in the middle(Exon??) \n \nAlso virtually identical to SPU_003676 - but there are some minor sequence differences\n
SPU_020485	SPU_020485	none	two FAS1 domains - probably a fragment \n \nVirtually identical to SPU_003678 - latter is missing a short segment in the middle \n
SPU_005198	SPU_005198	none	two FAS1 domains \n \nmember of a family of genes with similar structures and sequences\n
SPU_006345	SPU_006345	none	two Fas1 domains - member of a family of genes with similar structures and sequences \n \nNote that SPU_006346 has same structure and virtually same sequence\n
SPU_006346	SPU_006346	none	two Fas1 domains - member of a family of genes with similar structures and sequences \n \nNote that SPU_006345 has same structure and virtually same sequence\n
SPU_015670	SPU_015670	none	two Fas1 domains - member of a family of genes with similar structures and sequences\n
SPU_000070	SPU_000070	none	NIDO, AMOP, VWD,CCP - LOOKS LIKE A MUCIN\n
SPU_001942	SPU_001942	none	VWCs, two VWDs and a couple of EGFs - novel architecture\n
SPU_003277	SPU_003277	none	FOUR VWD -also TIL and VWC -looks like a mucin\n
SPU_003376	SPU_003376	none	AMOP, VWD,CCP - LOOKS LIKE A MUCIN - MISSING N-TERMINAL NIDO\n
SPU_005406	SPU_005406	none	NIDO, AMOP, VWD AND EGF_CA TM - rather similar structure to mucin4d of chickens\n
SPU_009378	SPU_009378	none	LPD_N, VWD only - common structure for vitellogenin\n
SPU_009395	SPU_009395	none	NIDO/VWD - RATHER MUCIN-LIKE\n
SPU_013189	SPU_013189	none	three VWD domains, also TIL and VWC - looks like a mucin\n
SPU_013334	SPU_013334	none	NIDO, AMOP, VWD,CCP - LOOKS LIKE A MUCIN\n
SPU_015633	SPU_015633	none	NIDO, AMOP, VWD,CCP - LOOKS LIKE A MUCIN\n
SPU_016052	SPU_016052	none	LPD_N, VWD only - common structure for vitellogenin\n
SPU_020744	SPU_020744	none	enormous protein - his-rich domain at N-terminus, 3-4 VWD or TIL/VWC domains,  a run of LDLa, a long segment of low complexity and LDLa/FA58C at C-terminus - some compositional similarity with SCO-spondin but not same\n
SPU_021744	SPU_021744	none	FA58C and VWD with a couple of EGFs - novel architecture\n
SPU_027118	SPU_027118	none	VWC/VWD - two repeats -  note that adjacent gene (27119) has multiple repeats of the same kind\n
SPU_027119	SPU_027119	none	VWC/VWD - in multiple repeats - novel architecture - note that adjacent gene (27118) has similar composition\n
SPU_028683	SPU_028683	none	LPD_N, VWD, VWA only - vitellogenin in Anopheles contains extra VWA also\n
SPU_000538	SPU_000538	none	EGF x 5 - CCP - TM  \nNOVEL ARCHITECTURE\n
SPU_002986	SPU_002986	none	CCP-CLECT-CCP-EGF-EGF-TM \nNOVEL ARCHITECTURE \n
SPU_009610	SPU_009610	none	EGF-EGF-VWD - probably a fragment\n
SPU_000782	SPU_000782	none	VWD only - probably a fragment\n
SPU_016089	SPU_016089	none	EGF-VWD-EGF - probably a fragment\n
SPU_018155	SPU_018155	none	VWD-EGF - probably a fragment\n
SPU_020181	SPU_020181	none	VWF only - almost certainly a fragment\n
SPU_016222	SPU_016222	none	VWF only - almost certainly a fragment\n
SPU_028685	SPU_028685	none	large protein with a single VWD towards the C-terminus\n
SPU_017171	SPU_017171	none	FBG - EGF x2 - CLECT  - novel architecture \nno particularly informative Blast hits\n
SPU_023671	SPU_023671	none	novel architecture - several domains characteristic of adhesion proteins - FA58C, MAM, FBG, SR \nNo homologues but this is a known St.purp cDNA \nPancer,Z. Dynamic expression of multiple scavenger receptor cysteine-rich genes in coelomocytes of the purple sea urchin \nProc. Natl. Acad. Sci. U.S.A. 97 (24), 13156-13161 (2000)\n
SPU_021993	SPU_021993	none	EGF-FBG - similar to C-terminus of SPU_024020 which also has  a pfam:Nacht domain \n \nNovel architecture - FBG may imply role in innate immunity.\n
SPU_006084	SPU_006084	none	large protein with mixture of multiple CCP, EGFCa and HYR domains and one TSP1 domain at the C-terminus \nNovel architecture - shared with SPU_010445 and SPU_019944\n
SPU_010445	SPU_010445	none	large protein with mixture of multiple CCP, EGFCa and HYR domains and one TSP1 domain at the C-terminus \nNovel architecture - shared with SPU_006084 and SPU_019944\n
SPU_019944	SPU_019944	none	large protein with mixture of multiple CCP, EGFCa and HYR domains and one TSP1 domain at the C-terminus \nNovel architecture - shared with SPU_010445 and SPU_006084\n
SPU_004017	SPU_004017	none	NOVEL ARCHITECTURE - WAP-IG-KU-KU-C345C\n
SPU_006068	SPU_006068	none	novel architecture - intermingled SR and FU repeats followed by a series of Ig domains and a TM \nlooks like a fragment of the predicted gene UPI0000583F83 - similar to deleted in malignant brain tumors 1 isoform c precursor \n \ncompare SPU_024528 which is a more complete version of this gene\n
SPU_017202	SPU_017202	none	novel architecture - intermingled EGF/EGFCa/Igv/Igc2 domains followed by a set of VWC domains\n
SPU_022250	SPU_022250	none	looks like a good model of an LRR/Ig membrane receptor - has complete set of LRR with NT and CT domains followed by Igc2 and TM - there are quite a few receptors of this type in humans \n \nBEST MATCH IS LRIG receptors - some homology with Gp-V of platelets\n
SPU_000425	SPU_000425	none	looks like a pretty good model of an LRR/Ig membrane receptor - has complete set of LRR with NT and CT domains followed by Igc2 - no TM - there are quite a few receptors of this type in humans \n \n
SPU_005538	SPU_005538	none	#\nlooks like a pretty good model of an LRR/Ig membrane receptor - has a set of 7 LRR repeats (no NT) and CT domain followed by Igc2 and TM - there are quite a few receptors of this type in humans\n
SPU_000186	SPU_000186	none	looks like a good model of an LRR/Ig membrane receptor - has complete set of LRR with NT and CT domains followed by Igc2 FN3 and TM - there are quite a few receptors of this type in humans\n
SPU_012819	SPU_012819	none	looks like a pretty good model of an LRR/Ig membrane receptor - has a set of 5 LRR (no NT) and a CT domain followed by Igc2 and TM - there are quite a few receptors of this type in humans\n
SPU_020790	SPU_020790	none	looks like a partial model of an LRR/Ig membrane receptor - has 4 LRR repeats followed by a CT domain and an Igc2 -  \nlacks LR_NT at N-terminus and TM at C-terminus - there are quite a few receptors of this type in humans\n
SPU_002757	SPU_002757	none	looks like a good model of an LRR/Ig membrane receptor - has complete set of LRR with NT and CT domains followed by 2 Igc2 and TM - there are quite a few receptors of this type in humans\n
SPU_015612	SPU_015612	none	looks like a good model of an LRR/Ig membrane receptor - has complete set of LRR with NT and CT domains followed by Ig and TM - there are quite a few receptors of this type in humans\n
SPU_017564	SPU_017564	none	looks like a good model of an LRR/Ig membrane receptor - has complete set of LRR with NT and CT domains followed by Ig but no TM - there are quite a few receptors of this type in humans\n
SPU_018080	SPU_018080	none	looks like a good model of an LRR/Ig membrane receptor - has complete set of LRR with NT and CT domains followed by Ig but no TM - there are quite a few receptors of this type in humans\n
SPU_001129	SPU_001129	none	looks like a pretty good model of an LRR/Ig membrane receptor - has a set of 12 LRR repeats (no NT) and a CT domain followed by Igc2 and TM - there are quite a few receptors of this type in humans\n
SPU_018608	SPU_018608	none	looks like a partial model of an LRR/Ig membrane receptor - has a set of 4 LRR repeats (no NT) and a CT domain followed by Ig but no TM - there are quite a few receptors of this type in humans\n
SPU_011759	SPU_011759	none	looks like a pretty good model of an LRR/Ig membrane receptor - has a set of LRR repeats (no NT) and a CT domain followed by Igc2 but no TM - there are quite a few receptors of this type in humans\n
SPU_011637	SPU_011637	none	looks like a pretty good model of an LRR/Ig membrane receptor - has a set of  6 LRR repeats (no NT) and a CT domain followed by Ig and TM - there are quite a few receptors of this type in humans\n
SPU_004660	SPU_004660	none	looks like a good model of an LRR/Ig membrane receptor - has complete set of LRR with NT and CT domains followed by six Ig domains and TM - there are quite a few receptors of this type in humans\n
SPU_014240	SPU_014240	none	looks like a good model of an LRR/Ig membrane receptor - has complete set of LRR with NT and CT domains followed by Igc2 and TM - there are quite a few receptors of this type in humans\n
SPU_021853	SPU_021853	none	multiple TSP1 repeats with one Igc2 domain in the middle  \n \npossibly an ADAM-TS\n
SPU_023512	SPU_023512	none	four TSP1 repeats and an Ig domain - probably a fragment\n
SPU_010954	SPU_010954	none	e val = 0.0 to NP_999823 from S. purpuratus, but SPU_010954 is missing aa's at its start and has an insertion.  Gene model still to be modified to agree with cloning data. \ne val = 0.0 to NP_055785 from Homo sapiens. \ne val = e-125 to C_620048 from Chlamydomonas. \nAnnotated by RL Morris, A Musante, K Judkins, B Rossetti, A Rawson.\n
SPU_019990	SPU_019990	none	The automated GLEAN prediction for this Gene Model contained a duplication of 76 amino acids near the amino terminus of the predicted protein. On the assumption that this apparent duplication is an assembly error rather than a true sequence duplication, it was edited out of the GLEAN sequence.\n
SPU_017705	SPU_017705	none	A sequence containing the conserved motif PSSALRE, characteristic of this kinase in other species, is missing in SPU_017705.\n
SPU_004406	SPU_004406	none	Similar to Homo sapiens neuroglobin (NGB) mRNA, complete cds\n
SPU_010004	SPU_010004	none	This is a gene with 19 perfect hyalin repeats in tandem, each present as a separate exon,each with a high pfam score as a hyalin repeat.  Expressed in the embryo at a modest level.\n
SPU_017620	SPU_017620	none	This hyalin-like gene may be incomplete, is expressed in embryos at very low levels, if at all based on the tiling experiment.  It has an EGF - 11 hyalin repeats - EGF.  \n
SPU_012490	SPU_012490	none	The GLEAN3 prediction doesn't include the C-terminus of the protein\n
SPU_028266	SPU_028266	none	Gene accepted as is.  Not expressed in embryo but has 6 EGF repeats, 49 hyalin repeats \n
SPU_028066	SPU_028066	none	Most likely artifactual (haplotype ?) duplication of SPU_016079 (based on proximity of all other identified Type I Activin Like Receptor  to SPU_016079) \n
SPU_016079	SPU_016079	none	SPU_028066 seems an artifactual duplication of this gene \nIt seems also that the first exon of this prediction is an extra one that is not predicted in the Angerer genescan.\n
SPU_014949	SPU_014949	none	This hyalin-like gene is not expressed in the embryo, has 23 hyalin repeats, each with a low pfam score.\n
SPU_019338	SPU_019338	none	Sequence discrepancies with cDNA data. \nGLEAN3 predicts a longer protein.\n
SPU_001500	SPU_001500	none	Expressed in the embryo based on tiling experiment.  Highly conserved at N terminus with rat.  Exons predicted to be correct in glean model\n
SPU_021497	SPU_021497	none	unclear duplication SPU_000669\n
SPU_008552	SPU_008552	none	Gene model corrected (2 exons modified) after careful comparison with vertebrate orthologs and domain analysis. The 3' part of the gene is supported by EST evidence (CD323084 StrPu537.001446 from 20hr blastula stage library)\n
SPU_022049	SPU_022049	none	GLEAN3 prediction corresponds to the NP_999778.1 cDNA sequence only for the first 2/3 of the sequence\n
SPU_022153	SPU_022153	none	Highly expressed in embryos,  has high homology to rat casein kinase, gene looks to be complete as modeled\n
SPU_013435	SPU_013435	none	The BRCA2 repeats are highly conserved and the gene is similar to the length of\n
SPU_023599	SPU_023599	none	It is very similar to the P.l Dnmt1, with almost the same number of introns\n
SPU_006612	SPU_006612	none	duplicated exons of Sp-FRAP, see SPU_006053 for gene model\n
SPU_001692	SPU_001692	none	duplication, see SPU_006053 for  gene model\n
SPU_009520	SPU_009520	none	Annotated using P.lividus(AM179826 and CAJ47350) and S.purpuratus cDNA and protein sequences. \nGlean 09520 contains the first exons.The C terminal exons have to be taken from Glean 03704 (Meredith Ashby). Protein sequences from the last exon of 09520 and the first exon of 03704 are not encoded in the cDNAs, and conversely a sequence present in both cDNAs is not present in the models. Alternative splicing or erroneous models. \nWarning :possible assembly problem. The fragment of the protein sequence missing is coded for by a short sequence from scaffold 2003. This sequence is incorporated in other Glean and NCBI models, within an intron in one case, as an exon read in a different frame, in the other. \nReconstructed protein sequence is given. \nIndicated highest blast hit is for non sea urchin sequences.     \n
SPU_006406	SPU_006406	none	The protein encoded by this gene has a thioester site but no cleavage site to separate alpha and beta chains as in Sp-C3.  The protein sequence is too short to be either a complement protein or alpha 2 macroglobulin. \n \nGLEAN3-06406 overlaps with GLEAN3-26313.  An alignment showing this overlap is attached to this annotation.\n
SPU_019422	SPU_019422	none	The protein encoded by this gene has a thioester site, but no beta chain and no histidine to regulate thioester attack on the target sequence.  The sequence is too short to be complement or alpha 2 macroglobulin.  The sequence does not overlap with any other GLEAN sequences.\n
SPU_019612	SPU_019612	none	Similar to gi|72041679|ref|XM_793147.1|PREDICTED: Strongylocentrotus purpuratus similar to ataxin 7-like  \n2 (LOC593678), partial mRNA \n
Sp-Tlr170	SPU_030136	none	Partial Toll-like receptor predicted by FgeneshAB and ++. The nucleotides have 89% identity to a typical Sp-Tlr (SPU_023035). This gene model is located at the end of a contig, making the gene model incomplete. \n
SPU_025786	SPU_025786	none	#\nhigh expression after UV-B irradiation of embryos\n
Sp-Tlr215	SPU_030137	none	Partial Toll-like receptor predicted by FgeneshAB and Genscan. The nucleotides have 98% identity to a typical Sp-Tlr (SPU_013751). There are 5 LRRs and no more LRRs were found in the 5' upstream region. This gene model is located at the end of a short scaffold. \n
SPU_005856	SPU_005856	none	Exon 2 was missing in the original Glean3 model.  mRNA sequence was obtained from Bill Marzluff.\n
SPU_027840	SPU_027840	none	The protein encoded by this gene matches to an EST with similarities to FKBP-12.\n
SPU_016548	SPU_016548	none	Significant partial overlap with SPU_016547\n
SPU_018378	SPU_018378	none	   e val for NP_999777 = 0.0; KRP85 [Strongylocentrotus purpuratus].   \nUpdated sequence by replacing peptide seq with NP_999777 and nucleotide sequence with "NM_214612, 2213 bp, mRNA, linear, INV 12-AUG-2005" \n   e val = e-135 for C_1880008 (http://genome.jgi-psf.org/Chlre3/Chlre3.home.html), Cr-Fla10 which is same as NCBI P46869 \n   e val = e-140 for NP_004789, KIF3B [Homo sapiens]. \n   Annotation by RL Morris, B Rossetti, and A Rawson. \n
SPU_026281	SPU_026281	none	#\nSegment of KRP95, annotated fully in SPU_026280. \nAnnotation by RL Morris, R.A.Obar, AP Rawson, and B Rossetti.\n
SPU_027012	SPU_027012	none	This glean model encodes the N-terminal part of the protein.  The C-terminus of the protein is contained in SPU_014185.\n
SPU_014185	SPU_014185	none	Encodes the C-terminus of Wee1; see annotation to SPU_027012.\n
SPU_011239	SPU_011239	none	e val to Q9P2H3 [Homo sapiens] = 0.0 \ne val to C_120075 (FAP167, IFT80, Intraflagellar Transport protein 80, http://genome.jgi-psf.org/Chlre3/Chlre3.home.html) = 3e-57 \nWD repeats. \nAnnotated by RL Morris.\n
SPU_010564	SPU_010564	none	three and a bit complete LRRNT-LRRn-LRRCT repeats \n \nTogether with adjacent gene (SPU_010564) comprises a complete Slit gene - could be one or two exons encoding LRR repeats missing at junction\n
SPU_021581	SPU_021581	none	The start methionine of the ORF described here aligns with the query sequence (Mouse dystrophin) after the first 300 N-terminal amino acids.  GLEAN3 prediction 21580 lies immediately upstream on the same scaffold and encodes  a partial spectrin motif, it is possible that this should be included, however this entry constitutes a well aligned ORF against known homologs. \n
SPU_021228	SPU_021228	none	This model was annotated based on a manual inspection of multiple protein sequence and domain structure comparisons. \n \nThis and a very similar adjacent model (SPU_021229) predict proteins with a domain structure very similar to that of coagulation factors 5 and 8 (long N-terminus of little complexity and a C-terminal Pfam F5_F8_type_C domain). Their C-terminal F5_F8_type_C blasts best to coagulation factor 8. It should be noted, however, that there is a predicted EGF domain at the N-terminus of this model, which is absent from coagulation factors 5 and 8. \n \nIts adjacent model (SPU_021229) is similar in sequence but far from identical, which suggests that these models might represent a true gene duplication event.\n
SPU_021229	SPU_021229	none	This model was annotated based on a manual inspection of multiple protein sequence and domain structure comparisons. \n \nThis and a very similar adjacent model (SPU_021228) predict proteins with a domain structure very similar to that of coagulation factors 5 and 8 (long N-terminus of little complexity and a C-terminal Pfam F5_F8_type_C domain). Their C-terminal F5_F8_type_C blasts best to coagulation factor 8. It should be noted, however, that there is a predicted Cadherin domain at the N-terminus of this model, which is absent from coagulation factors 5 and 8. \n \nIts adjacent model (SPU_021228) is similar in sequence but far from identical, which suggests that these models might represent a true gene duplication event.\n
SPU_012598	SPU_012598	none	e val to XP_787973 = e-122. \ne val to Q9P2H3 [Homo sapiens] = e-73 \nSome similarity to FAP167, IFT80, Intraflagellar Transport protein 80, http://genome.jgi-psf.org/Chlre3/Chlre3.home.html). \nAnnotated by RL Morris.\n
SPU_026298	SPU_026298	none	one complete LRR unit LRR-NT/17LRR/LRR-CT \n \nadjacent gene (SPU_026299) has very similar structure - probably part of the same LRR protein\n
SPU_026299	SPU_026299	none	one complete LRR unit LRR-NT/11LRR/LRR-CT \n \nadjacent gene (SPU_026298) has very similar structure - probably part of the same LRR protein\n
SPU_000831	SPU_000831	none	Similar to R-PTP-delta.  Partial sequence. May be a portion of a duplicate gene.  Another Sp-R-PTP-delta, SPU_013607, is not on the same scaffold.  SPU_013607 is probably a duplicate. \nSee putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137535563-11783-53029041881.BLASTQ1\n
SPU_000164	SPU_000164	none	One of 59 models with only one clectin motif and no others\n
SPU_000294	SPU_000294	none	One of 59 models with only one clectin motif and no others\n
SPU_001182	SPU_001182	none	One of 59 models with only one clectin motif and no others\n
SPU_001274	SPU_001274	none	One of 59 models with only one clectin motif and no others\n
SPU_002144	SPU_002144	none	One of 59 models with only one clectin motif and no others\n
SPU_002697	SPU_002697	none	One of 59 models with only one clectin motif and no others\n
SPU_003227	SPU_003227	none	One of 59 models with only one clectin motif and no others\n
SPU_003618	SPU_003618	none	One of 59 models with only one clectin motif and no others\n
SPU_003774	SPU_003774	none	One of 59 models with only one clectin motif and no others\n
SPU_004015	SPU_004015	none	One of 59 models with only one clectin motif and no others\n
SPU_005111	SPU_005111	none	One of 59 models with only one clectin motif and no others\n
SPU_005594	SPU_005594	none	One of 59 models with only one clectin motif and no others\n
SPU_005989	SPU_005989	none	One of 59 models with only one clectin motif and no others\n
SPU_005991	SPU_005991	none	One of 59 models with only one clectin motif and no others\n
SPU_006508	SPU_006508	none	One of 59 models with only one clectin motif and no others\n
SPU_007040	SPU_007040	none	One of 59 models with only one clectin motif and no others\n
SPU_007576	SPU_007576	none	One of 59 models with only one clectin motif and no others\n
SPU_007766	SPU_007766	none	One of 59 models with only one clectin motif and no others\n
SPU_008393	SPU_008393	none	One of 59 models with only one clectin motif and no others\n
SPU_009504	SPU_009504	none	One of 59 models with only one clectin motif and no others\n
SPU_010100	SPU_010100	none	One of 59 models with only one clectin motif and no others\n
SPU_010101	SPU_010101	none	One of 59 models with only one clectin motif and no others\n
SPU_010212	SPU_010212	none	One of 59 models with only one clectin motif and no others\n
SPU_010297	SPU_010297	none	One of 59 models with only one clectin motif and no others\n
SPU_011163	SPU_011163	none	One of 59 models with only one clectin motif and no others\n
SPU_012314	SPU_012314	none	One of 59 models with only one clectin motif and no others\n
SPU_012942	SPU_012942	none	One of 59 models with only one clectin motif and no others\n
SPU_013649	SPU_013649	none	One of 59 models with only one clectin motif and no others\n
SPU_014081	SPU_014081	none	One of 59 models with only one clectin motif and no others\n
SPU_014082	SPU_014082	none	One of 59 models with only one clectin motif and no others\n
SPU_014184	SPU_014184	none	One of 59 models with only one clectin motif and no others\n
SPU_014222	SPU_014222	none	One of 59 models with only one clectin motif and no others\n
SPU_017860	SPU_017860	none	One of 59 models with only one clectin motif and no others\n
SPU_019213	SPU_019213	none	One of 59 models with only one clectin motif and no others\n
SPU_020517	SPU_020517	none	One of 59 models with only one clectin motif and no others\n
SPU_022152	SPU_022152	none	One of 59 models with only one clectin motif and no others\n
SPU_022396	SPU_022396	none	One of 59 models with only one clectin motif and no others\n
SPU_022861	SPU_022861	none	One of 59 models with only one clectin motif and no others\n
SPU_023797	SPU_023797	none	One of 59 models with only one clectin motif and no others\n
SPU_024127	SPU_024127	none	One of 59 models with only one clectin motif and no others\n
SPU_024382	SPU_024382	none	One of 59 models with only one clectin motif and no others\n
SPU_025181	SPU_025181	none	One of 59 models with only one clectin motif and no others\n
SPU_025184	SPU_025184	none	One of 59 models with only one clectin motif and no others\n
SPU_025248	SPU_025248	none	One of 59 models with only one clectin motif and no others\n
SPU_025874	SPU_025874	none	One of 59 models with only one clectin motif and no others\n
SPU_025875	SPU_025875	none	One of 59 models with only one clectin motif and no others\n
SPU_025892	SPU_025892	none	One of 59 models with only one clectin motif and no others\n
SPU_026524	SPU_026524	none	One of 59 models with only one clectin motif and no others\n
SPU_027079	SPU_027079	none	One of 59 models with only one clectin motif and no others\n
SPU_028229	SPU_028229	none	One of 59 models with only one clectin motif and no others\n
SPU_028432	SPU_028432	none	One of 59 models with only one clectin motif and no others\n
SPU_028538	SPU_028538	none	One of 59 models with only one clectin motif and no others\n
SPU_028564	SPU_028564	none	One of 59 models with only one clectin motif and no others\n
Sp-Tfpi-like	SPU_030138	none	This model was created based on a Fgenesh++ prediction on scaffold98422 and a manual inspection and comparison of the predicted protein to similar genes in other groups. \n \nBased on our analysis thus far, this model likely corresponds to a partial prediction, as indicated by the similarity of the available sequence from this model to Tissue factor pathway inhibitor genes in vertebrates.  \n
SPU_015878	SPU_015878	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure analyses. \n \nThe structure of this model is supported by the fact that other gene prediction protocols generated identical models. \n \nThe domain structure of this model differs slightly from that of vertebrate kallikrein B1 genes, in that no Pfam PAN domain is predicted in its N-terminus. Otherwise, the size and structure of this prediction is similar to that of kallikrein B1.\n
SPU_006723	SPU_006723	none	See also SPU_005592 and SPU_024688. \nSee putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137537488-29699-97084085825.BLASTQ4\n
SPU_008466	SPU_008466	none	Blasts to PTPRT, but doesn't clade with the PTPR K/M/T/U group in phylogenetic analysis using PTPc domains 1 or 2.  Renamed PTPRorph2. \nSee putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137537867-27665-146362170620.BLASTQ4\n
SPU_008878	SPU_008878	none	Partial sequence.  Similar to Survivin 2. \nSee putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137538166-23506-35039912798.BLASTQ4\n
SPU_000276	SPU_000276	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 3.The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: RERERERERWRDGRGEREIATERERERERERERERERKRNMERERERERERERERIGNKSEYGIVRYVXXXXXXXXXXXXXXXXXGGGRDSIPFIENP\n
SPU_012586	SPU_012586	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 2,5,6,7.\n
SPU_013962	SPU_013962	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 2. \n \nC-terminal has likely been wrongly attached to SPU_008353.\n
SPU_004028	SPU_004028	none	Matches_SPU_004028. Transcriptome data indicates that Glean may have falsely predicted the following exons: 1.\n
SPU_002815	SPU_002815	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: KMEYILRERQLRLGRRLQCHHIHPQMSKATKSNQLQWHHIPMGTHNRLVLTAKLSSRRNGNFSVTIRGILFVTIILLLVIVVVTFSVGFVMVFLTDKWPRRYQSYPVIGDDWIKSGFRKRTESRINLPTYTYYGY,FVPSEHLLNARLSAVEKSVGKALISGVSSIHGQNQPLTDQAKPQDLNQEDNQTTAQPTQPTQQDEGDDSGITHQPLNVTTDSIEDGVHTEGTTTQVGQETAMPPHTSANVKGDQKQPTTMAPHTNGDSQPPSADGEVVIKAKREFLGDHPRYSFRDNNPFVGDRRGDVLGRLRDGLPHRQVASQVPVLPRHRRRLDQKWFQKKDGIEN\n
SPU_022816	SPU_022816	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: HPSSSSPLYTPNFSSPSSSSLPSVYLFPKSFIQPLLYLQHRPLLLNNIVLLLLSFLTPSTTSSSSSSPSLISFFFSSSIFSKIVFPSSSSPPQPPPKLPPPPPPPLLPNPFHNHILQYFLPPPLPCLLCIPFPQSFIQPLLYLHLKLRLFHSSSSPLFSYLL,FCSSSSSSSSCSSSFSSSTTSPPPPLLPNPLHNFLLSLFLLDILLLLLLYILQIFLPPPPPPCLQYISFLRVSSNRSCTCNIAPSS,EFHPTAPVPATSPPPPKQHRPPPPLLPNPLHNFVLLFIPLLDILLLLLLHILQNCLPLLLLSSPTPSKTSSSSSSSSPSQPLPQPYSPIFSSPSSPLPSVYPLSSEFHPTAPVPALKAPPFPLLLLSPFLLSP\n
SPU_023386	SPU_023386	none	The tiling data indicates that in the vicinity of the gene are significant ORFs which might represent additional exons. ORFs: AKMLAFITSHRTKARGKKEYDIVFPALFSWLYRVCFLAKRFIYAKQPQAMKPGGIGCLNEEGRKVLYKFLSFSFSPSTPILHAQINSRH,MFFVSPTNSTPNFSLATIRPRVVSAIALAAGLHPSNSIQHKSVKPSLGECRFEAEASSLTSFDLCLFALRFPLSLHSDIA,IRAYITVYNINIYVRHLEYDSSTNTLCVYVLSPSLYYSPSPSYLPLPISLSPSPFPKYFYVAYNALVISYRTLVDIFLGFCLNVFCFAN,LIHHPDMPAEIFTTPTSNTDTLLVQPRHFVYTIISIRPHAYKLMNPPPPSTSLFLSPSPSSTPFYYFFEVFPLAFRALVR,ERPNICPLASPDLSVLTKEAPVRINRTLLFLSLSPDPGPDNIQGISKSHSLFSLYFLSFFFQSPIPPPPPSHHPPQISSLP,SFTSIKYICLKDCSTISAHYPKHAHYQADTLLWFSLHSRMRRRWAIYCTRKIITRIPNDRAATLPDIHLFSLPASLPPTRYSPVIVH,IYLFKRLFYNICSLSQARTLSGRYPTVVLTPFTHEEKMGNLLHEENHNENSKRQGSNLARHTPVLLTCLIASYQVFTCNRPLRRDNSLPLSTVTDALFDAYAEVY\n
SPU_007882	SPU_007882	none	Matches c-type lectin domain (cd00037).\n
SPU_022405	SPU_022405	none	Ig-EGF-FN3 - a domain sequence unique to Tie1/2 in chordates \nLacks TM and kinase domains - suspect this is missing 3' end of gene.\n
SPU_010943	SPU_010943	none	The GLEAN model directly corresponds to the previously cloned sea urchin (purpuratus) fascin 1 gene.\n
SPU_023200	SPU_023200	none	variant b, 2 DSRM domains, the other gene has just one. \n
SPU_017592	SPU_017592	none	partial CDS containing an exon encoding part of the protease domain.  Nucleic acid sequence is very close to that of SPU_011551.\n
SPU_018708	SPU_018708	none	partial cds on a short scaffold.  Nucleic acid sequence is identical to that of SPU_028742.\n
SPU_019655	SPU_019655	none	partial CDS; predicted exons supported by weak signals on the transcriptome.\n
SPU_006214	SPU_006214	none	PREDICTED: Strongylocentrotus purpuratus similar to CG11793-PA (LOC579361), mRNA Length=893 \n
SPU_007151	SPU_007151	none	PREDICTED: Strongylocentrotus purpuratus similar to CG1548-PA (LOC575021), mRNA \n
SPU_015385	SPU_015385	none	#\nPREDICTED: Strongylocentrotus purpuratus similar to lipocalin 7 (LOC576823), mRNA \n
SPU_021839	SPU_021839	none	SPU_021839 encodes a partial 3'-terminal sequence of the Cdk2 mRNA \nThe sequence is entirely contained in SPU_007655, Refer to it for further annotation\n
SPU_001005	SPU_001005	none	4 SRCR domains. Hits DMBT1 (probably superficial).  Probably incomplete model.  Possibly part of SPU_001004.\n
SPU_001004	SPU_001004	none	TM-SRCR(8).  Probably partial model.  Possibly part of gene that includes SPU_001005.  \n
SPU_000984	SPU_000984	none	SRCR(5).  Probably incomplete.\n
SPU_000672	SPU_000672	none	Two gene models seem to be fused together. The first 7 exons appear to belong to the NLR gene, although the first exon is also questionable. \nDomains: DEATH-NACHT-LRRs \n
SPU_013206	SPU_013206	none	#\nThe Genscan model contains additional sequences at the 5' end and seems more accurate. The gene features and sequences were annotated according to this model. \nDomains: DEATH-NACHT-PYD-LRRs \n
SPU_024528	SPU_024528	none	novel architecture - intermingled SR and FU repeats followed by a series of Ig domains and a TM \nlooks like a more complete version of the predicted gene UPI0000583F83 - similar to deleted in malignant brain tumors 1 isoform c precursor \n \ncompare SPU_006068 - a duplication of the C-terminal half of this gene\n
SPU_006952	SPU_006952	none	has all the extracellular features of a toll receptor - no TIR in cyto domain - could be incomplete gene or simply an LRR receptor \n \nNote - adjacent gene (SPU_006951) is very similar\n
SPU_006951	SPU_006951	none	has all the extracellular features of a toll receptor - no TIR in cyto domain - could be incomplete gene or simply an LRR receptor \n \nNote - adjacent gene (SPU_006952) is very similar\n
SPU_021178	SPU_021178	none	has all the extracellular features of a toll receptor - no TIR in cyto domain - could be incomplete gene or simply an LRR receptor\n
SPU_001106	SPU_001106	none	has all the extracellular features of a toll receptor - no TIR in cyto domain - could be incomplete gene or simply an LRR receptor\n
SPU_021751	SPU_021751	none	has all the extracellular features of a toll receptor - no TIR in cyto domain - could be incomplete gene or simply an LRR receptor\n
SPU_014039	SPU_014039	none	has the extracellular features of a toll receptor ( a bit spaced out) - no TIR in cyto domain - could be incomplete gene or simply an LRR receptor\n
SPU_014845	SPU_014845	none	has the extracellular features of a toll receptor ( a bit spaced out) - no TIR in cyto domain - could be incomplete gene or simply an LRR receptor\n
SPU_007065	SPU_007065	none	has the extracellular features of a toll receptor BUT no predicted TM and no TIR in putative cyto domain - could be incomplete gene or simply an LRR receptor\n
SPU_022463	SPU_022463	none	has the extracellular features of a toll receptor ( a bit spaced out) BUT no predicted TIR in putative cyto domain - could be incomplete gene or simply an LRR receptor\n
SPU_018237	SPU_018237	none	has the extracellular features of a toll receptor ( maybe fewer LRRs - could be missing N-terminus) BUT no predicted TIR in putative cyto domain - could be incomplete gene or simply an LRR receptor\n
SPU_001436	SPU_001436	none	has the extracellular features of a toll receptor ( maybe fewer LRR repeats) BUT no predicted TIR in putative cyto domain - could be incomplete gene or simply an LRR receptor\n
SPU_018238	SPU_018238	none	has the extracellular features of a toll receptor ( maybe fewer LRR repeats) BUT no predicted TIR in putative cyto domain - could be incomplete gene or simply an LRR receptor\n
SPU_020527	SPU_020527	none	has the extracellular features of a toll receptor BUT no predicted TM or TIR cyto domain - could be incomplete gene or simply an LRR receptor\n
SPU_019916	SPU_019916	none	has the extracellular features of a toll receptor ( maybe fewer LRR repeats) BUT no predicted TM or TIR cyto domain - could be incomplete gene or simply an LRR receptor\n
SPU_026114	SPU_026114	none	has the extracellular features of a toll receptor ( maybe a bit spaced out) BUT no predicted TM or TIR cyto domain - could be incomplete gene or simply an LRR receptor\n
SPU_008477	SPU_008477	none	has the extracellular features of a toll receptor ( maybe a few more than usual LRRs) BUT no predicted TM or TIR cyto domain - could be incomplete gene or simply an LRR receptor\n
SPU_007476	SPU_007476	none	looks rather like  an LRR/Ig membrane receptor - a set of LRR with NT a CT domain followed by Ig-like and TM - lacks an LRR-NT domain - could be incomplete \n \nBEST MATCH IS LRIG receptors \n
SPU_024251	SPU_024251	none	looks like a good model of an LRR/Ig membrane receptor - has complete set of LRR with NT and CT domains followed by Ig-like and TM - there are quite a few receptors of this type in humans \n \nBEST MATCH IS LRIG receptors - some homology with Gp-V of platelets\n
SPU_001749	SPU_001749	none	has the extracellular features of a toll receptor (plus an LRR-NT domain at N-terminus) BUT no predicted TM or TIR cyto domain - could be incomplete gene or simply an LRR receptor\n
SPU_026296	SPU_026296	none	has the extracellular features of a toll receptor ( maybe a few more than usual LRRs and has an N-terminal LRR-NT domain) BUT no predicted TIR cyto domain - could be incomplete gene or simply an LRR receptor\n
SPU_000740	SPU_000740	none	Signal Peptide-SRCR(4)-TM.\n
SPU_000654	SPU_000654	none	Signal Peptide-SRCR(2).  Possibly partial\n
SPU_002925	SPU_002925	none	has partial LRR unit - LRR4/LRRCT and TM - could be fragment of a toll receptor or of another type of LRR receptor\n
SPU_000646	SPU_000646	none	Sig Pep - SRCR(5).  Possibly partial.\n
SPU_007463	SPU_007463	none	one complete LRR unit LRR-NT/21LRR/LRR-CT\n
SPU_001172	SPU_001172	none	SRCR(3)-TM.  Probably partial. (DMBT1)\n
SPU_001177	SPU_001177	none	SRCR(2).  Probably Partial.(DMBT1)\n
SPU_001229	SPU_001229	none	SRCR(5). Probably partial. (DMBT1)\n
SPU_001266	SPU_001266	none	SRCR(5)-TM. Probably partial.  (DMBT1)\n
SPU_001601	SPU_001601	none	F5_F8_type_C(1)-SRCR(3). Probably partial.\n
SPU_008669	SPU_008669	none	Putative conserved TAFII28 domain detected in BlastP search. \n \nBest Genbank hit was a predicted protein similar to TFIID subunit 11 in S. purpuratus (XP_789830; 477 bits, 3e-133). \n \nBest Genbank empirical support for TFIID subunit 11 is TAF11 RNA polymerase II, TATA box binding protein (TBP)-associated factor, 28kDa [Homo sapiens], accession AAV38212, score 170 bits, 1e-40. \n \nAll exons are supported by tiling path array data. \nExonerate and Splign both support expression of this protein. \nHowever, query of Poustka's database does not retrieve significant hits. \n \nModified glean model by accepting Davidson's 3' UTR. No other changes were made.\n
SPU_008435	SPU_008435	none	fragment of an Igc2/FN3 protein - 3 Ig and 1 FN3 - no TM\n
SPU_001727	SPU_001727	none	SRCR(9). Probably partial. (DMBT1)\n
SPU_001763	SPU_001763	none	SigPep-SRCR(2). Probably partial.(Hensin/DMBT1)\n
SPU_007715	SPU_007715	none	EGF-Ca x3 - Igc2 x2 - FN3 - probably a fragment - this domain organisation is not particularly informative as to identity \n \nlooks a little like a fragment of Tie 1/2 but the exact domains are not quite right\n
SPU_007629	SPU_007629	none	Ig-EGF-FN3 - probably a fragment - this domain organisation is not particularly informative as to identity\n
SPU_001863	SPU_001863	none	SRCR(2). Probably partial. (DMBT1)\n
SPU_014732	SPU_014732	none	Blasted protein sequence of human gene NM_005645 against Baylor to obtain SPU_014732. \n \nBlasted glean gene against NCBI. Putative conserved domains detected: TFIID-18kDa (Pfam). Best Genbank hit: XP_796890. Best empirical data support (provisional acceptance at NCBI): TAF13 RNA polymerase II, TATA box binding protein (TBP)-associated factor, 18kDa [Accession: NP_001016066, cDNA, Xenopus tropicalis; bits 145, 4e-34]. and an unknown protein sequence [Accession: AAH74456, cDNA, Xenopus laevis; score, bits 145, 5e-34]. \n \nExonerate and Splign data exist that support all four exons of the gene. Poustka's database lacks support for any exon. Tiling path array data support exons 2-4, but are inconclusive for the first exon. \n \nI omitted Davidson's 3' UTR because it was HUGE: longer than all the exons of the CDS combined. If the 3'UTR modification were accepted, the resultant gene would contain both SPU_014732 (this gene) and SPU_014733 (the next gene on the scaffold in the 3' direction). \n \n \n \n
SPU_002028	SPU_002028	none	SRCR(2). Probably partial. (DMBT1)\n
SPU_024857	SPU_024857	none	six Igc2 repeats - probably a fragment of some adhesion protein or receptor\n
SPU_008622	SPU_008622	none	eight Ig and Igv repeats - probably a fragment of some adhesion protein or receptor \n \nC-terminal DEATH domain could be an artefact - no known proteins with this structure\n
SPU_003127	SPU_003127	none	SRCR(4)-TM. Probably partial. (DMBT1)\n
SPU_013889	SPU_013889	none	seven Ig-like repeats and a heme peroxidase domain - homolog of peroxidasin, although missing LRR repeats at N-terminus\n
SPU_003384	SPU_003384	none	SRCR(4). Probably incomplete. \n
SPU_012354	SPU_012354	none	nine Igc2 and Ig repeats - probably a fragment of some adhesion protein or receptor\n
SPU_003526	SPU_003526	none	SRCR(6). Probably partial. (hensin/dmbt1)\n
SPU_020291	SPU_020291	none	seven Ig and Igc2 repeats - probably a fragment of some adhesion protein or receptor\n
SPU_006024	SPU_006024	none	fifteen Ig Igc2 and Igv repeats - probably a fragment of some adhesion protein or receptor\n
SPU_003963	SPU_003963	none	SRCR(6)-TM. Probably partial. (DMBT1). Maybe continuous with SPU_003964, SPU_003965, SPU_003966\n
SPU_022647	SPU_022647	none	eight Ig and Igc2 repeats - probably a fragment of some adhesion protein or receptor\n
SPU_025432	SPU_025432	none	fourteen Ig and Igc2 repeats - probably a fragment of some adhesion protein or receptor\n
SPU_003964	SPU_003964	none	SRCR(3)-TM. Probably partial. (DMBT1). Maybe continuous with SPU_003963, SPU_003965, SPU_003966\n
SPU_010123	SPU_010123	none	eight Ig and Igc2 repeats - probably a fragment of some adhesion protein or receptor\n
SPU_003965	SPU_003965	none	SRCR(2). Probably partial. (DMBT1). Maybe continuous with SPU_003963, SPU_003964, SPU_003966\n
SPU_003966	SPU_003966	none	SRCR(3)-TM. Probably partial. (DMBT1). Maybe continuous with SPU_003963, SPU_003964, SPU_003965\n
SPU_022021	SPU_022021	none	six Igc2 repeats - probably a fragment of some adhesion protein or receptor \n \nlong low-complexity sequence preceding Ig domains is suspicious \n
SPU_000532	SPU_000532	none	eight Ig and Igc2 repeats - probably a fragment of some adhesion protein or receptor\n
SPU_012350	SPU_012350	none	ten Ig and Igc2 repeats - probably a fragment of some adhesion protein or receptor\n
SPU_006022	SPU_006022	none	eleven Ig and Igc2 repeats - probably a fragment of some adhesion protein or receptor\n
SPU_012355	SPU_012355	none	six Ig and Igc2 repeats - probably a fragment of some adhesion protein or receptor\n
SPU_000453	SPU_000453	none	ten Igc2 repeats - probably a fragment of some adhesion protein or receptor\n
SPU_013978	SPU_013978	none	six Ig and Igc2 repeats - probably a fragment of some adhesion protein or receptor\n
SPU_021326	SPU_021326	none	six Ig and Igc2 repeats - probably a fragment of some adhesion protein or receptor\n
SPU_021747	SPU_021747	none	fourteen Ig and Igc2 repeats - probably a fragment of some adhesion protein or receptor\n
SPU_000533	SPU_000533	none	eight Ig and Igc2 repeats - probably a fragment of some adhesion protein or receptor\n
SPU_027927	SPU_027927	none	Exon 2 of the Glean model does not Blast to NLR proteins. May be erroneous. There are unresolved (NNN) sequences within this model as well. \nDomains: NACHT, DEATH, LRR \n
SPU_003942	SPU_003942	none	#\nHomo sapiens mRNA for polyglutamine binding protein variant 14(PQBP1 gene).\n
SPU_027335	SPU_027335	none	PREDICTED: Strongylocentrotus purpuratus similar to Galanin receptor type 2 (GAL2-R) (GALR2) (LOC580125), mRNA \nLength=1029 \n
SPU_011732	SPU_011732	none	#\nStrongylocentrotus purpuratus similar to gamma-aminobutyric acid  A receptor, epsilon (LOC585340), mRNA\n
SPU_008212	SPU_008212	none	PREDICTED: Strongylocentrotus purpuratus similar to Somatostatin receptor type 2 (SS2R) (SRIF-1) (LOC581556), mRNA\n
SPU_016241	SPU_016241	none	PREDICTED: Strongylocentrotus purpuratus similar to Glycogenin-1 (LOC589298), partial mRNA \n
SPU_027885	SPU_027885	none	PREDICTED: Strongylocentrotus purpuratus similar to ceruloplasmin (LOC586705), mRNA \n
SPU_020534	SPU_020534	none	#\nPREDICTED: Strongylocentrotus purpuratus similar to Surfeit locus protein 1 (LOC592318), mRNA \n
SPU_019373	SPU_019373	none	PREDICTED: Strongylocentrotus purpuratus similar to surfeit 5 isoform b (LOC583077), mRNA. \n
SPU_018029	SPU_018029	none	#\nTSPN and TSP1 domain combination. This domain combination is usually found in subgroup A thrombospondins. One fly protein has just these 2 domains. \nThe TSPN domain is most closely related to the one found in human collagen11a1. \nThe gene prediction probably includes some repetitive sequence elements\n
SPU_025559	SPU_025559	none	Model contains only a TSPN domain but blast suggests that this domain is related to the Col15/18 family. The gene model is probably partial.\n
SPU_004595	SPU_004595	none	Partial glean model for Nek8, duplicate of a longer glean model SPU_005411\n
SPU_028861	SPU_028861	none	Two overlapping glean predictions match DAP5 on scaffold 22799 (SPU_023932) and scaffold 105341 (SPU_028861).  \nNew gene model proposed: \nExon 1 Scaffold105341|5215|5219|+ \nExon 2 Scaffold105341|6813|6935|+ \nExon 3 Scaffold105341|8718|8820|+  \nExon 4 Scaffold22799|7457|7567|+ \nExon 5 Scaffold22799|7779|7805|+ \nExon 6 Scaffold22799|8433|8499|+ \nExon 7 Scaffold22799|8977|9069|+ \nExon 8 Scaffold22799|9731|9789|+ \nExon 9 Scaffold22799|11308|11490|+ \nExon 10 Scaffold22799|11835|11961|+ \nExon 11 Scaffold22799|13490|13579|+ \nExon 12 Scaffold22799|15302|15734|+ \nExon 13 Scaffold22799|16419|16569|+ \nExon 14 Scaffold22799|18222|18533|+ \nExon 15 Scaffold22799|19645|19860|+ \nExon 16 Scaffold22799|20736|21041|+ \nExon 17 Scaffold22799|21914|22071|+ \nExon 18 Scaffold22799|23074|23282|+ \nExon 19 Scaffold22799|23856|23977|+ \nExon 20 Scaffold22799|24686|24757|+ \nExons 4 to 8 are present in both scaffolds.  Homologous sequences to exon 5 are found twice nearby (e.g. on scaffold22799, 7657-7683 and 8050-8076), multiple isoforms? \nExon 15 is also duplicated in SPU_004171.\n
SPU_011191	SPU_011191	none	The encoded aa sequence is entirely contained in SPU_003528.  \nIn SPU_011191 the sequence encoded by exons 3 and 4 of SPU_003528 is missing. \n
SPU_015935	SPU_015935	none	There is only one gene representing SNRPA and U2B" in Urchin.\n
SPU_005411	SPU_005411	none	Partial gene model. The Cter part of the protein is on glean 04595 (duplicated exons identical). The Nterminal part of the protein is missing (not found on this genome assembly); the kinase domain is therefore incomplete. The beginning of SPU_005411 is located at the border of the scaffold and does not reflect the real Nter.\n
SPU_028068	SPU_028068	none	PREDICTED: Strongylocentrotus purpuratus similar to large conductance calcium-activated potassium channel subfamily M alpha member 1 isoform b (LOC578468), mRNA \n
SPU_010354	SPU_010354	none	#\nSPU_010354 lies on minus strand of Scaffold90906. Protein is probably missing N-ter (comparative analysis of transmembrane helices in the prot family).  \nAnalysis of genomic region 5' of glean prediction (used Genescan and GeneMark to look for additional exons): not conclusive. \nScaffold90906 seq is incomplete 5' to glean_10354, additional exons may lie there. Alternatively, N-ter may possibly lie on SPU_006608 (Scaffold109591). \n \n
SPU_022934	SPU_022934	none	This model was annotated based on multiple protein sequence alignments and a manual analysis of predicted domain structures. \n \nThe annotation of this model is supported by reciprocal blasting and the observation that the domain structure of this model is similar to that of plasminogen genes. It should be noted, however, that given the position of this model in the current assembly (in a scaffold containing various sequence gaps), this model may be only partial and could be significantly improved as updated versions of the assembly become available.\n
SPU_000633	SPU_000633	none	This model was annotated based on multiple protein sequence alignments and a manual analysis of predicted domain structures. \n \nThe annotation of this model is supported by reciprocal blasting and the observation that the domain structure of this model is similar to that of plasminogen genes. It should be noted, however, that the protein predicted by this model resembles a partial plasminogen. The position of this model in the current assembly is inconclusive with regards to the possibility that this model may be incomplete or that it may represent a novel protein related in structure to plasminogen.\n
SPU_004292	SPU_004292	none	This gene is split between two scaffolds.  This model encodes the N-terminal portion of the protein.  The C-terminal portion, with some overlap, is found in SPU_025615.\n
SPU_025615	SPU_025615	none	This model encodes the C-terminal portion of the Rbl-1 protein.  The N-terminal portion (with some overlap) is found in SPU_004292.\n
SPU_004011	SPU_004011	none	SRCR(3). Probably partial. (DMBT1)\n
SPU_014731	SPU_014731	none	This model may be duplicated in SPU_014730. Please refer to SPU_014730 for details and comments.\n
SPU_004086	SPU_004086	none	SRCR(4). Probably partial. (DMBT1)\n
SPU_004100	SPU_004100	none	SigPep-SRCR(2)-TM. Possibly part of gene that includes SPU_0101(and 102?). (Brain Ser prot/hensin/DMBT1)\n
SPU_006242	SPU_006242	none	Gene correctly predicted\n
SPU_004101	SPU_004101	none	SigPep-SRCR(7)-TM. Possibly part of gene that includes SPU_0100(and 102?). (DMBT1)\n
SPU_014730	SPU_014730	none	This model was annotated based on multiple protein sequence alignments and a manual analysis of predicted domain structures. \n \nThe annotation of this model is supported by reciprocal blasting and the observation that the domain structure of this model is similar to that of plasminogen genes. It should be noted, however, that given the position of this model in the current assembly (close to the end of a scaffold), this model may be only partial and could be significantly improved as updated versions of the assembly become available. \n \nIt should also be noted that this model may be duplicated in an adjacent model (SPU_014731). They're 98% identical at the protein level and they are located on separate contigs, which might indicate a very recent true gene duplication event or an assembly problem (haplotypes?).\n
SPU_004160	SPU_004160	none	SRCR(8). Possibly partial. (DMBT1)\n
SPU_002088	SPU_002088	none	SPU_013821 appears to be identical.\n
SPU_004642	SPU_004642	none	SRCR(2)-Sushi(1?).  Possibly partial. (DMBT1)\n
SPU_005000	SPU_005000	none	SigPep-SRCR(5).  Possibly partial.  (DMBT1)\n
SPU_005154	SPU_005154	none	SRCR(7). Possibly partial.\n
SPU_005420	SPU_005420	none	SRCR(5)-Sushi(1)-TM.  Possibly partial.\n
SPU_005464	SPU_005464	none	SRCR(6)-TM. Possibly partial.\n
SPU_005414	SPU_005414	none	This model was annotated based on multiple protein sequence alignments and a manual analysis of predicted domain structures. \n \nThe annotation of this model is supported by reciprocal blasting and the observation that the domain structure of this model is similar to that of plasminogen genes. It should be noted, however, that given the position of this model in the current assembly (close to an end of a small scaffold), this model may be only partial and could be significantly improved as updated versions of the assembly become available.\n
SPU_006608	SPU_006608	none	Partial sequence \nSPU_006608 (Scaffold109591) probably contains the N-ter of Rh50, C-ter to be found on SPU_010354 (minus strand of Scaffold90906): Or else these are two different genes. \n \nProbably only the first two exons code for Rh50 protein \nThird exon (pos.18813-18900) probably not real:deleted \n \n
SPU_005556	SPU_005556	none	#\nSRCR(9). Possibly partial.\n
SPU_027802	SPU_027802	none	PREDICTED: Strongylocentrotus purpuratus similar to Ubiquitin ligase protein RNF8 (RING finger protein 8) (LOC583203), mRNA \n
SPU_016506	SPU_016506	none	Model inaccurately predicts N-terminus (>100 amino acids have been omitted!) \n \n \nSPU_021385 appears to be identical.\n
SPU_005860	SPU_005860	none	SRCR(3)-TM-PTPc. Unique domain structure.  Possibly partial.\n
SPU_009188	SPU_009188	none	Pfam PF00909\n
SPU_018141	SPU_018141	none	Pfam PF00909\n
SPU_000856	SPU_000856	none	PREDICTED: Strongylocentrotus purpuratus similar to potassium voltage gated channel, Shab-related subfamily, member 2 (LOC593326), mRNA Length=3318 \n
SPU_013823	SPU_013823	none	#\nOne EST (CD295368) appears to include the N-terminus of the protein and suggests the true start methionine is slightly upstream of the predicted start methionine in the GLEAN model. Based on this EST the model has been modified to include these additional amino acids at the N-terminus (MFCFRAILVLSACVVYGQKKEKTNVFTIKPVSSIYLPHYVAGKKSWGINKDAAVKTAYD...) Note that this added sequence contains a predicted signal sequence (a feature of other MSP130 proteins). \n \nSPU_006387 appears to be identical over much of its sequence but contains additional proline/glutamine-rich repeats.\n
SPU_006386	SPU_006386	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 30.71% over 368 BLAST alignment positions. 759 of 1189 Muscle alignment positions masked (63.800 %; 430 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_011265	SPU_011265	none	Partial sequence: N-ter only \n \nCDS at the end of scaffold \n \nPfam PF00909\n
SPU_014492	SPU_014492	none	Very similar at both the amino acid and nucleotide levels to 3 other GLEAN models: 21242, 15326, and 12567. It seems likely that at least some of these are haplotypes or improperly assembled genes. \n
SPU_005008	SPU_005008	none	Partial CDS based on alignment with best blast hit sequence.  Transcriptome data strongly suggests that the following exons belong to this gene.  The first of the following predicted exons blasts to the same metalloprotease. \n>Supertig106616_1|Scaffold106616|20708|20789| DNA_SRC: Scaffold106616 START: 20708 STOP: 20789 STRAND: +  \nGGCTATTGAATTTACGTGGAGAAGATATTCCATACAACCCTCTCTTTATATCGTTTGCTATCGTCGGGAC \nCCAGTACATCAA \n>Supertig106616_1|Scaffold106616|21416|21513| DNA_SRC: Scaffold106616 START: 21416 STOP: 21513 STRAND: +  \nCCTATACCTCTATGATGCTAGCCGCCGGTTGGACACGGAGAGATATTCGGACTTGAGAGAACATTTGGAG \nCTGGACTCGTCTAGGTGCTCAAGCTACA \n>Supertig106616_1|Scaffold106616|23148|23305| DNA_SRC: Scaffold106616 START: 23148 STOP: 23305 STRAND: +  \nCTTCGTTCCCGACTACTTGCTTGAGAGCAAGGGATTTATCAGCCTTCACCGATGATCTCAGTACCGTTTC \nGTTTAGTAAGAAGGTCTGGTTTAGTAACTCATCTAACTACTTCGTCTACAAAACTATCACTGAGAGGCCT \nGATGTCAACAATGATAGG \n>Supertig106616_1|Scaffold106616|24152|24238| DNA_SRC: Scaffold106616 START: 24152 STOP: 24238 STRAND: +  \nAGTAAATATATCATGGAGCCCTCTCCCGTCCTGCTCATGAAAGCTGTCAAAAATGACGTCGAAGTGGAGG \nCGATGAACCAGGCATTC \n>Supertig106616_1|Scaffold106616|25692|25762| DNA_SRC: Scaffold106616 START: 25692 STOP: 25762 STRAND: +  \nGATCCAAAAGAAGGAGACGATCGATCGTTGACGGAATGGCTGGTTGCTCAGAAGACTGAAACATTCAGAG \nA \n>Supertig106616_1|Scaffold106616|26458|26535| DNA_SRC: Scaffold106616 START: 26458 STOP: 26535 STRAND: +  \nATCACATAGCAGTTATCAATACCCGAGTTACGAGACGATAGCCGCCGTAGGATACCACAGTGCCGACTAT \nTATTACCA \n>Supertig106616_1|Scaffold106616|27073|27144| DNA_SRC: Scaffold106616 START: 27073 STOP: 27144 STRAND: +  \nCCCTATAGAGGATGACCGGTTTGCCATACCTACTGGTAAGATGTTCCTCTATGACATGGGAGGACAGTAT \nAG \n>Supertig106616_1|Scaffold106616|27693|27813| DNA_SRC: Scaffold106616 START: 27693 STOP: 27813 STRAND: +  \nAGAAGGGACGACTACCCTCGCCCGAACCTTCTTCTTTGCCAAGGAATGGTATGAGGAGAATGAGAACCGT \nTATGAGTTTGATCGCACTTATGATCCTGCAAGGCCAACTGAATTTCAGCAG \n>Supertig106616_1|Scaffold106616|27868|27960| DNA_SRC: 
Scaffold106616 START: 27868 STOP: 27960 STRAND: +  \nGCTCTGACGCTCATGGCGCTTTACATATTTTTACCGTCACTGGGCACTGGTCATAGCCGAGCTGCCAAAG \nTAACTAGCGCTCAAGCATATCAG \n>Supertig106616_1|Scaffold106616|28493|28607| DNA_SRC: Scaffold106616 START: 28493 STOP: 28607 STRAND: +  \nTTTTCAGTGGACGAGGAGGGGGTGGCTAGTTGGGTTGGTCAACTTGACCGCTTTGGACGTCGTCGGTCGT \nTTATCGAAGTGATATATCGCGAGGATAAGGGAACCAATGTAGGGG \n\t\t \n \n \n \n
SPU_007203	SPU_007203	none	Analysis indicates typical cysteines positioned for post-translational processing, protein folding and disulphide bonding in the mature peptide. An additional cysteine residue in a B domain is similar to the sequence of Ciona INS-L3. The B domain is very long in Sp-IGF1. Two dibasic sites in the sequence corresponding to a short C-peptide make it vertebrate insulin- and relaxin-like. All true IGFs in vertebrates have lost dibasic sites in C-peptide. They are cleaved at the far end of the long C terminus, in an E domain. Two aromatic residues (YY) in the end of the putative B domain are crucial for biological activity of IGFs. \nUsually the D domain is short in length. Dibasic residues (RASR) at the beginning of the long E domain mark the C-terminal of D domain in IGFs. This makes the D domain of Sp-IGF 1 very long in contrast with other IGF sequences. \nAnnotated with the help of Robert Olinski (Robert.Olinski@neuro.uu.se) and Mohammed Idris (Idris@szn.it)  \n
SPU_004690	SPU_004690	none	#\nNek11 is split on two overlapping glean models with 100% identity (exon 3 and 4). SPU_004690 and SPU_014659 \n
SPU_014659	SPU_014659	none	Nek11 is split on two overlapping glean models with 100% identity. SPU_004690 and SPU_014659 \nsee SPU_004690 for gene model\n
SPU_015736	SPU_015736	none	Probably has an extra exon predicted towards beginning.\n
SPU_000811	SPU_000811	none	FIRST 170 AA corresponds to RAB3.  The rest-contains lots of repeated sequences - probably artifact\n
SPU_015145	SPU_015145	none	Missing an exon at the beginning?\n
SPU_000253	SPU_000253	none	contains rab domain \nNo signal on tiling array and no EST.  May be pseudogene or expressed in adult only.\n
SPU_023334	SPU_023334	none	SPU_016582 and SPU_017442 have near exact match to c-terminal portions of the corrected gene model.  Exon 2 is present in the scaffold at 2 different positions.  \n \nQuery: 34   GISMQGMDPSHVALTVLMLQNDLFDQFRCDRNVTLGLNHAT 74 \n            GISMQGMDPSHVALTVLMLQNDLFDQFRCDRNVTLGLNHAT \nSbjct: 5190 GISMQGMDPSHVALTVLMLQNDLFDQFRCDRNVTLGLNHAT 5068 \n \nand \n \nQuery: 34   GISMQGMDPSHVALTVLMLQNDLFDQFRCDRNVTLGLNHAT 74 \n            GISMQGMDPSHVALTVLMLQNDLFDQFRCDRNVTLGLNHAT \nSbjct: 8917 GISMQGMDPSHVALTVLMLQNDLFDQFRCDRNVTLGLNHAT 8795 \n \nExon 6 is out of the scaffold on separate small scaffold as listed in gene features \n \n
SPU_001288	SPU_001288	none	missing n-term\n
SPU_013132	SPU_013132	none	SPU_003272 is a partially similar prediction.\n
SPU_003272	SPU_003272	none	SPU_013132 is partially overlapping similar prediction.\n
SPU_001818	SPU_001818	none	appears to be partial\n
SPU_022309	SPU_022309	none	Part of sequence of this gene model is also found on another scaffold in SPU_001872 (sequence almost identical for part; 3' end missing).  Is this a duplication or part of the genome that has been included in the assembly twice?\n
SPU_006254	SPU_006254	none	SRCR(4)-TM. Possibly incomplete.\n
SPU_006264	SPU_006264	none	SRCR(4). Probably incomplete.\n
SPU_006531	SPU_006531	none	SRCR(4). Probably incomplete.\n
SPU_002507	SPU_002507	none	rab and DNAj domain\n
SPU_006538	SPU_006538	none	SRCR(2). Probably incomplete. Check in reference to SPU_006539.\n
SPU_006539	SPU_006539	none	SigPep-SRCR(12). Probably incomplete. Check SPU_006538.\n
SPU_021493	SPU_021493	none	AA 193-340 in prediction does not match human homologue; in fact matches nothing (except for another predicted S. purp sequence [XP_791406]) when blasted by itself.  AA 497-682 of prediction matches a P. liv EST in the Max Planck database.\n
SPU_006659	SPU_006659	none	SRCR(6)-LRR(3). Possibly incomplete. Unique domain organization.\n
SPU_006731	SPU_006731	none	SigPep-SRCR(6)-TM.\n
SPU_007110	SPU_007110	none	SRCR(7). Probably incomplete.\n
SPU_021121	SPU_021121	none	Overlap with SPU_019174. This model appears to encode the C-terminal part of SpCul-3.\n
SPU_007349	SPU_007349	none	WSC(2)-SRCR-WSC-SRCR-WSC-TM. Possibly incomplete. Unique domain composition.\n
SPU_001926	SPU_001926	none	Encodes sequences in middle of protein, corresponding to amino acids 447-659 of human cullin 4A.\n
SPU_028294	SPU_028294	none	This protein contains the following domains: \nDEATH,NACHT and LRRs\n
SPU_009731	SPU_009731	none	Appears to encode C-terminal portion of the protein, corresponding to amino acids downstream of residue number 704 of human cullin 4.\n
SPU_018555	SPU_018555	none	Appears to encode the N-terminal portion of the protein, corresponding to amino acids 27-346 of human cullin 4.  SPU_018556 also appears to have portions of this gene, assembled with parts of various other genes including PEX1.\n
SPU_007370	SPU_007370	none	SigPep-SRCR(2)-HYR. Possibly incomplete.\n
SPU_000832	SPU_000832	none	Previously cloned.  GLEAN prediction contains GAPS that should be rectified.  \n
SPU_007372	SPU_007372	none	SRCR(2)-HYR(2)-IgC2-GPS-SRCR(2). Probably partial. See SPU_007370.\n
SPU_007618	SPU_007618	none	SigPep-SRCR(2)-TM.\n
SPU_007660	SPU_007660	none	SRCR(3)-TM. Possibly partial.\n
SPU_007718	SPU_007718	none	SRCR(6)-TM. Possibly partial.\n
SPU_001439	SPU_001439	none	SPU_017618 was another high scoring blast hit for POLR2C in the glean database.\n
SPU_007781	SPU_007781	none	SRCR(10)-TM. Possibly partial. See SPU_007782.\n
SPU_011776	SPU_011776	none	This gene contains 2 NACHT domains which is very unusual. Also, it is located at the end of a Scaffold and could be incomplete. \nDomains: DEATH,NACHT,LRR,NACHT,LRRs\n
SPU_013170	SPU_013170	none	The protein encoded by this model appears to be a duplication of the 5' and 3' halves.  The sequence shows two thioester sites, which is unheard of, and no cleavage sites for alpha and beta chains.  \n
SPU_013169	SPU_013169	none	SPU_013169 appears to be a duplication of the 5' end of SPU_013170.\n
SPU_007782	SPU_007782	none	SRCR(5). Probably partial.  See SPU_007781.\n
SPU_007893	SPU_007893	none	SRCR(6). Probably partial. See SPU_007894, 07895, 07896, 07897, 07899, 07900.\n
SPU_007840	SPU_007840	none	cub(4)-SRCR(2). Possibly partial.\n
SPU_007894	SPU_007894	none	SRCR(4). Probably partial. See SPU_007893, 07895, 07896, 07897, 07899, 07900.\n
SPU_007895	SPU_007895	none	SRCR(2). Probably partial. See SPU_007893, 07894, 07896, 07897, 07899, 07900.\n
SPU_007896	SPU_007896	none	SRCR(7)-TM. Probably partial. See SPU_007893, 07894, 07895, 07897, 07899, 07900.\n
SPU_007897	SPU_007897	none	SRCR(2)-TM. Probably partial. See SPU_007893, 07894, 07895, 07896, 07899, 07900.\n
SPU_007899	SPU_007899	none	SigPep-SRCR(5)-TM. Probably partial. See SPU_007893, 07894, 07895, 07896, 07897, 07900.\n
SPU_007900	SPU_007900	none	SRCR(4)-TM. Probably partial. See SPU_007893, 07894, 07895, 07896, 07897, 07899.\n
SPU_012678	SPU_012678	none	This GLEAN is missing the N-terminal amino acid sequence of an alpha-tubulin, and is adjacent to another alpha-tubulin Gene Model, SPU_012679.\n
SPU_010277	SPU_010277	none	Only exons 3-22 are present on this GLEAN model on scaffold464. There is a gap of ~100bp where exon 15 (assuming that this part represents only one exon) could be located. Exons 1-2 are present on scaffold35149, which has no GLEAN prediction.\n
SPU_011107	SPU_011107	none	matches to the 3' end of SPU_011106\n
SPU_006756	SPU_006756	none	This GLEAN contains an insertion at its amino terminus that is inconsistent with assignment as a conventional alpha-tubulin.\n
SPU_007984	SPU_007984	none	This GLEAN contains an insertion at its amino terminus that is inconsistent with assignment as a conventional alpha-tubulin.  There is a "T" missing, probably from sequencing error.\n
SPU_027579	SPU_027579	none	This GLEAN has a good full-length match with Chlamydomonas reinhardtii RIB43A (E = 2.00E-42).\n
SPU_025241	SPU_025241	none	One of 2. SPU_022153 is a non-identical duplicate\n
SPU_008432	SPU_008432	none	SRCR(4). Probably incomplete. \n
SPU_028690	SPU_028690	none	This gene model has one thrombospondin domain and does not show high scoring matches to other sequences on GenBank.  \n
SPU_008504	SPU_008504	none	SRCR(16). Probably incomplete.\n
SPU_008514	SPU_008514	none	SRCR(2)-TM. Possibly incomplete.\n
SPU_011197	SPU_011197	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure analyses. \n \nThis annotation is supported by reciprocal blasting to Inhibitor of NFkappaB genes from various taxa and an identical domain structure. Its structure also strongly correlates with genome-wide tiling array hybridization data.\n
SPU_008836	SPU_008836	none	SigPep-SRCR(2). Possibly incomplete.\n
SPU_008885	SPU_008885	none	SRCR(3). probably incomplete.\n
SPU_005782	SPU_005782	none	contains rab, ras and trunk domains\n
SPU_006362	SPU_006362	none	PARTIAL--MISSING N-TERMINUS \nNo signal in tiling array or EST.  May be pseudogene or adult only expression.\n
SPU_024635	SPU_024635	none	 partial, missing C-terminus\n
SPU_027868	SPU_027868	none	 missing central stretch\n
SPU_028499	SPU_028499	none	 extra N- and C-terminus\n
SPU_004006	SPU_004006	none	One of 2. GLEAN_21174 is a shorter internal version of this GLEAN. 21174 is mostly identical (protein level), but does not match on either predicted end. \n
SPU_008777	SPU_008777	none	This GLEAN shares ~50% sequence identity over nearly its entire length with NP_653306 (Homo sapiens tektin-1).  Other than a discrepancy of a 39-amino acid insertion at the amino terminus of SPU_019591, the coding regions of SPU_008777 and SPU_019591 are identical.\n
SPU_019591	SPU_019591	none	This GLEAN shares ~50% sequence identity over nearly its entire length with NP_653306 (Homo sapiens tektin-1).  But the first 39 predicted amino acids after the initiator M (DAGATLLSRSYAPTIPVYPTQTTVGTKTDQALSQDLAKM) look like they don't belong.  Other than this discrepancy, the coding regions of SPU_008777 and SPU_019591 are identical.\n
SPU_023618	SPU_023618	none	This GLEAN shares >50% sequence identity over nearly its entire length with NP_114104 (Homo sapiens tektin-3).\n
SPU_006388	SPU_006388	none	IDENTICAL TO SPU_006389 except at 3'end\n
SPU_006453	SPU_006453	none	This GLEAN shares ~50% sequence identity over nearly its entire length with NP_444515.1 (Homo sapiens tektin-1).  The coding regions of SPU_013841 and SPU_006453 are identical.\n
SPU_006392	SPU_006392	none	contains ADP-ribosyl-GH domain and rab domain\n
SPU_020728	SPU_020728	none	This GLEAN shares ~52% sequence identity over ~400 amino acids with NP_055281 (Homo sapiens tektin-2).\n
SPU_000049	SPU_000049	none	 missing N-terminus\n
SPU_007237	SPU_007237	none	 fragment\n
SPU_021063	SPU_021063	none	 half-molecule\n
SPU_005461	SPU_005461	none	 fragment\n
SPU_004038	SPU_004038	none	This is a short fragment that appears to encode a protein identical to the Sp-betaL Integrin. It is an incomplete sequence at the end of a short scaffold.  \n
SPU_000348	SPU_000348	none	 fragment, should join with SPU_000349, still incomplete gene\n
SPU_000349	SPU_000349	none	 fragment, should join with SPU_000348, still incomplete gene\n
SPU_024384	SPU_024384	none	 extra C-terminus half\n
SPU_001041	SPU_001041	none	 contains 2 repeats matching a similar stretch\n
SPU_003580	SPU_003580	none	 extra N- and C-terminus stretches\n
SPU_004561	SPU_004561	none	 extra N-terminus region\n
SPU_004242	SPU_004242	none	 unrelated stretch on N-terminus, partial match to the gene\n
SPU_007951	SPU_007951	none	 fragment\n
SPU_014894	SPU_014894	none	 partial, missing C-terminus half, some extra stretches in middle\n
SPU_016354	SPU_016354	none	 unrelated N-terminus half, only the C-terminus half matches the gene\n
SPU_014094	SPU_014094	none	 fragment\n
SPU_011806	SPU_011806	none	 fragment\n
SPU_011846	SPU_011846	none	 fragment\n
SPU_011894	SPU_011894	none	 fragment\n
SPU_011897	SPU_011897	none	 fragment\n
SPU_012522	SPU_012522	none	 fragment\n
SPU_013365	SPU_013365	none	 fragment\n
SPU_013505	SPU_013505	none	 fragment\n
SPU_013566	SPU_013566	none	 partial, missing N-terminus\n
SPU_014209	SPU_014209	none	 fragment\n
SPU_014873	SPU_014873	none	 fragment\n
SPU_014903	SPU_014903	none	 fragment\n
SPU_015530	SPU_015530	none	 fragment\n
SPU_016239	SPU_016239	none	 fragment\n
SPU_016280	SPU_016280	none	 fragment\n
SPU_016701	SPU_016701	none	 fragment\n
SPU_017893	SPU_017893	none	 extra N-terminus\n
SPU_018043	SPU_018043	none	 fragment\n
SPU_018288	SPU_018288	none	 fragment\n
SPU_019705	SPU_019705	none	 fragment\n
SPU_019912	SPU_019912	none	 fragment\n
SPU_008410	SPU_008410	none	IDENTICAL TO 15580\n
SPU_014036	SPU_014036	none	This gene is on three scaffolds (544, 81593 and 56300). On scaffold544 (exon 1) there is one GLEAN model (SPU_025514) which covers exon1-3 for this gene. On scaffold81593 GLEAN_14036 is predicted (exon 2-15). For scaffold 56300, the gene prediction is SPU_017882 (exon 8-24) for this gene. Exon 2 and 3 are overlapped between two scaffolds (544 and 81593) and exon 8-15 are overlapped between scaffolds 81593 and 56300. Please refer to SPU_025514 and GLEAN_17882 for gene features for the far N-terminal and C-terminal portions, respectively.\n
SPU_017882	SPU_017882	none	This gene is on three scaffolds (544, 81593 and 56300). On scaffold544 there is a GLEAN model (SPU_025514) which covers exon1-3 for this gene. On scaffold81593 GLEAN_14036 is predicted (exon 2-15). For scaffold 56300, the gene prediction is SPU_017882 (exon 8-24) for this gene. Exon 2 and 3 are overlapped between two scaffolds (544 and 81593) and exon 8-15 are overlapped between scaffolds 81593 and 56300. Please refer to GLEAN_14036 and SPU_025514 for gene features for the N-terminal portion.\n
SPU_009123	SPU_009123	none	poor conservation \n
SPU_025502	SPU_025502	none	Ig7/FN5/TM \n \nbest hit is Ds-CAM but the domain organization is not quite right for either Ds-CAM (9-4-1-2) or DCC (4-6) \nDoes not have Neogenin_C cytoplasmic match \n \nprobably neither but some related gene\n
SPU_004328	SPU_004328	none	Igc2-4/FN3-5/TM \n \nHigh Blast hit (not #1) with DCC \nC-terminus (putative cyto domain) does not have Neogenin_C \nand is not homologous with DCC in Blast \n \nDomain organisation not consistent with Ds-CAM or DCC (4-6)\n
SPU_009354	SPU_009354	none	SRCR(4). Possibly partial.\n
SPU_009496	SPU_009496	none	SigPep-SRCR(6). Possibly incomplete.\n
SPU_009562	SPU_009562	none	SRCR(2). Probably partial.\n
SPU_009676	SPU_009676	none	SRCR(3). Probably incomplete. See SPU_009677.\n
SPU_005039	SPU_005039	none	has GPS and a single TM - probably missing C-terminus. \nfive LDLa and one EGF in exodomain - there are no reported LDLa-LNB7TM GPCRs but St purp has several possible members \n \nNovel domain structure \n
SPU_009677	SPU_009677	none	SRCR(3). Probably incomplete. See SPU_009676.\n
SPU_009753	SPU_009753	none	SRCR(3). Probably partial.\n
SPU_005758	SPU_005758	none	has GPS and four TM - probably missing C-terminus. \nthree LDLa in exodomain - there are no reported LDLa-LNB7TM GPCRs but St purp has several \n \nNovel domain structure \n
SPU_009869	SPU_009869	none	contains Rab, Arf-GAP, and Ank domains\n
SPU_009988	SPU_009988	none	#\nIg(2)-SRCR(6). Unique structure. Probably incomplete. See SPU_009989\n
SPU_026060	SPU_026060	none	has GPS but no TM - probably missing C-terminus. \ntwo LDLa and two EGF in exodomain - there are no reported LDLa-LNB7TM GPCRs but St purp has several \n \nNovel domain structure \n
SPU_009910	SPU_009910	none	PARTIAL, MISSING N-TERMINUS\n
SPU_009989	SPU_009989	none	SRCR(6). Probably incomplete. See SPU_009988.\n
SPU_010001	SPU_010001	none	SRCR(2). Probably incomplete.\n
SPU_010062	SPU_010062	none	SRCR(6). Possibly incomplete.\n
SPU_010226	SPU_010226	none	SRCR(4). Possibly incomplete. See GLEAN3_10227.\n
SPU_000002	SPU_000002	none	CUB-LDLa x2 7TM \n \nNo GPS but otherwise looks like a member of the LNB-7TM family of adhesion domain GPCRs \nNo known LDLa members of LNB7TM GPCR family \nNovel architecture\n
SPU_010227	SPU_010227	none	F5_F8_type_C-SRCR(9). Probably incomplete. See SPU_010226.\n
SPU_023577	SPU_023577	none	FA58C-CUB-CLECT-LDLa x4-LRR x3 - 7TM \n \nNo GPS but otherwise looks like a member of the LNB-7TM family of adhesion domain GPCRs \nNo known FA58C or LDLa members of LNB7TM GPCR family \nNovel architecture\n
SPU_010232	SPU_010232	none	SigPep-SRCR(4). Possibly incomplete. \n
SPU_027136	SPU_027136	none	LDLa-LRRNT-LRRtypx3 - 7TM_1 \n \nNo GPS - looks most like glycoprotein hormone receptors \n
SPU_010240	SPU_010240	none	SRCR(2). Possibly incomplete. See SPU_010241.\n
SPU_010241	SPU_010241	none	SRCR(2). Possibly incomplete. See SPU_010240\n
SPU_014777	SPU_014777	none	LDLa x5-EGFx2-Igc2-GPS - 7tm_2 \nLooks like a member of the LNB-7TM family of adhesion domain GPCRs \nNo known LDLa members of LNB7TM GPCR family \nNovel architecture\n
SPU_011062	SPU_011062	none	CUB - LDLa - 7tm_1 \nNo GPS and 7tm_1 - could be a member of the LNB-7TM subfamily - more likely a member of glycoprotein hormone receptor family. \nNo known LDLa members of LNB7TM GPCR family \n
SPU_010330	SPU_010330	none	#\nSigPep-SRCR(2)-HYR\n
SPU_020284	SPU_020284	none	identical to parts of SPU_027935\n
SPU_010409	SPU_010409	none	SRCR(5)\n
SPU_010501	SPU_010501	none	SigPep-SRCR(4). Possibly incomplete.\n
SPU_022714	SPU_022714	none	LDLa x9-EGFCa x4-Ig 7TM_2 \n \nNo GPS but otherwise looks like a member of the LNB-7TM family of adhesion domain GPCRs \n \nNo known LDLa members of LNB7TM GPCR family \nNovel architecture\n
SPU_005239	SPU_005239	none	LDLa x10-LRRtyp x5 7TM \n \n \nNo GPS - looks most like glycoprotein hormone receptors \n \n
SPU_010832	SPU_010832	none	SRCR(3)-TM. Possibly incomplete.\n
SPU_010909	SPU_010909	none	SRCR(3)-TM. Possibly incomplete.\n
SPU_004837	SPU_004837	none	LDLa x2-LRRtyp x5 7TM \n \nNo GPS - looks most like glycoprotein hormone receptors \n \nNo known LDLa members of LNB7TM GPCR family \n
SPU_010953	SPU_010953	none	#\nSRCR(5). Possibly incomplete.\n
SPU_009242	SPU_009242	none	CUBx5-FA58C-CUB-LDLa x3 5TM \n \nNo GPS but otherwise looks like a member of the LNB-7TM family of adhesion domain GPCRs \n \nNo known FA58C or LDLa members of LNB7TM GPCR family \nNovel architecture\n
SPU_010991	SPU_010991	none	SRCR(6)-TM. Possibly incomplete. See SPU_010992. 10993, 10994.\n
SPU_005132	SPU_005132	none	CUB-CLECT-LDLa-EGF-LDLa x3 7TM \n \nNo GPS but otherwise looks like a member of the LNB-7TM family of adhesion domain GPCRs \n \nNo known LDLa members of LNB7TM GPCR family \nNovel architecture\n
SPU_010992	SPU_010992	none	SigPep-SRCR(2)-TM. Possibly incomplete. See SPU_010991. 10993, 10994.\n
SPU_010993	SPU_010993	none	SigPep-SRCR(5). Possibly incomplete. See SPU_010991. 10992, 10994.\n
SPU_012382	SPU_012382	none	CLECT-LDLa x6-LRRNT-LRRtyp x4 7TM-1 \n \nNo GPS and 7tm-1 rather than 7tm_2 but otherwise looks like a member of the LNB-7TM family of adhesion domain GPCRs \n \nNo known LDLa members of LNB7TM GPCR family \nNovel architecture\n
SPU_010994	SPU_010994	none	SRCR(2)-TM. Possibly incomplete. See SPU_010991, 10992, 10993.\n
SPU_007191	SPU_007191	none	fragment; see SPU_007191 for larger Sp-Trh\n
SPU_015872	SPU_015872	none	CUB-LDLa x2-LRRtyp x2 7TM_1 \n \nNo GPS but otherwise looks a bit like a member of the LNB-7TM family of adhesion domain GPCRs or, perhaps more likely, a glycoproteinhormone receptor \n \nNo known LDLa members of LNB7TM GPCR family \nNovel architecture\n
SPU_011101	SPU_011101	none	SigPep-SRCR(2). Possibly incomplete.\n
SPU_011146	SPU_011146	none	SRCR(3). Possibly incomplete.\n
SPU_015161	SPU_015161	none	LDLa-LRRtyp x5 7TM \n \nNo GPS - looks most like glycoprotein hormone receptors \n
SPU_028049	SPU_028049	none	CUB-CLECT-LDLa x5/6-LRRtyp x 7TM_1 \n \n \nNo GPS - looks most like glycoprotein hormone receptors\n
SPU_008926	SPU_008926	none	LDLa-Ig-GPS-7TM \nLooks like a member of the LNB-7TM family of adhesion domain GPCRs \n \nNo known LDLa members of LNB7TM GPCR family \nNovel architecture\n
SPU_011752	SPU_011752	none	SRCR(2). Probably incomplete.\n
SPU_013248	SPU_013248	none	LRRtyp x3 - 7TM \n \nNo GPS - looks most like glycoprotein hormone receptors\n
SPU_011977	SPU_011977	none	SigPep-SRCR(4). Possibly incomplete\n
SPU_012039	SPU_012039	none	SigPep-SRCR(7). possibly incomplete.\n
SPU_027096	SPU_027096	none	SR x4 7TM \n \nNo GPS but otherwise looks a bit like a member of the LNB-7TM family of adhesion domain GPCRs\n
SPU_012159	SPU_012159	none	SRCR(3)-TM. Probably incomplete.\n
SPU_019239	SPU_019239	none	SR x4 - LRRtyp x7 - 7TM_1 \n \n \nNo GPS - looks most like glycoprotein hormone receptors\n
SPU_022189	SPU_022189	none	CUB-LRRtyp 4 - 7TM_1 \n \nNo GPS - looks most like glycoprotein hormone receptors\n
SPU_012410	SPU_012410	none	SigPep-SRCR(6). Possibly incomplete.\n
SPU_012888	SPU_012888	none	#\nSRCR(3)-TM. Possibly incomplete.\n
SPU_007937	SPU_007937	none	LDLa x2 - 7TM \nNo GPS but otherwise looks like a member of the LNB-7TM family of adhesion domain GPCRs \n
SPU_013650	SPU_013650	none	SigPep-SRCR(3). possibly incomplete.\n
SPU_013831	SPU_013831	none	SRCR(3). probably incomplete.\n
SPU_016033	SPU_016033	none	LRRtyp x2 - 7TM-1 \n \nNo GPS - looks most like glycoprotein hormone receptors\n
SPU_019238	SPU_019238	none	SR x3 - LRRtyp x7 - 7TM_1 \n \n \nNo GPS - looks most like glycoprotein hormone receptors\n
SPU_014079	SPU_014079	none	SRCR(4)-TM. probably incomplete. See SPU_014080.\n
SPU_014080	SPU_014080	none	SigPep-SRCR(2). Probably incomplete. See SPU_014079.\n
SPU_014095	SPU_014095	none	SigPep-SRCR(4). Possibly incomplete.\n
SPU_014769	SPU_014769	none	LRRtyp x10 - 7TM_1 \n \nNo GPS - looks most like glycoprotein hormone receptors \n \nNOTE SPU_014765 contains a very similar gene fused with a Cathepsin gene \n
SPU_014602	SPU_014602	none	SigPep-SRCR(6). Possibly incomplete.\n
SPU_014829	SPU_014829	none	SRCR(7). Probably incomplete.\n
SPU_012610	SPU_012610	none	LRRtyp x3 7TM \n \nNo GPS - looks most like glycoprotein hormone receptors\n
SPU_018295	SPU_018295	none	LRRtyp x2 7TM_1 \n \n \nNo GPS - looks most like glycoprotein hormone receptors\n
SPU_008259	SPU_008259	none	LRRtyp x3 7TM  ShKT domain at N-terminus may be an artefact \n \nNo GPS but otherwise looks like a member of the LNB-7TM family of adhesion domain GPCRs or like glycoprotein hormone receptors\n
SPU_018294	SPU_018294	none	LRRtyp x2 - 7TM_1 \n \n \nNo GPS - looks most like glycoprotein hormone receptors\n
Sp-HIF-1a	SPU_030140	none	This is the last 18 exons of SPU_001262. The rest of SPU_001262 has been annotated as a separate gene, Sp-Birc6\n
SPU_018887	SPU_018887	none	This is part of a sea urchin specific group of ADAM-TS metalloproteinase genes.  There are two worm ADAMTS genes with some sequence similarity (wormbase# F08C6.1a.1 and C02B4.1).   \n \nThe homeobox at the N-terminal is clearly a prediction/ assembly error.\n
SPU_010561	SPU_010561	none	LRRtyp x3 - 7TM-1 \n \nNo GPS - looks most like glycoprotein/thyrotropin hormone receptors\n
SPU_021131	SPU_021131	none	 This is part of a sea urchin specific group of ADAM-TS genes.  There are two worm ADAMTS genes with some sequence similarity (wormbase# F08C6.1a.1 and C02B4.1).  This sequence also appears to be a haplotype but is missing a portion.\n
SPU_024052	SPU_024052	none	LRRtyp x2 - 7TM-1 \n \nNo GPS - looks most like glycoprotein/thyrotropin hormone receptors\n
SPU_024290	SPU_024290	none	LRRtyp x2 - 7TM-1 \n \nNo GPS - looks most like glycoprotein/thyrotropin hormone receptors\n
SPU_015464	SPU_015464	none	LRRtyp x6 - 7TM-1 \n \nNo GPS - looks most like glycoprotein/thyrotropin hormone receptors\n
SPU_012494	SPU_012494	none	LRRNT/LRRtyp/LRRCT-GPS- 7TM-2 \n \nlooks like a member of the LNB-7TM subfamily of GPCRs\n
SPU_003202	SPU_003202	none	only domain it contains is a part of the reprolysin domain\n
SPU_007355	SPU_007355	none	only domain it contains is the reprolysin domain\n
SPU_004726	SPU_004726	none	SR x4 - LRRtyp x3 - 7TM_1 \n \n \nNo GPS - looks most like glycoprotein hormone receptors\n
SPU_006020	SPU_006020	none	may be part of the sea urchin only family of genes, but it contains only the metalloprotease and reprolysin domains which are the first parts of an ADAM-TS gene.   There are two worm ADAMTS genes with some sequence similarity (wormbase# F08C6.1a.1 and C02B4.1)\n
SPU_018297	SPU_018297	none	SR x3 - LRRtyp x4 - 7TM_1 \n \n \nNo GPS - looks most like glycoprotein hormone receptors\n
SPU_023004	SPU_023004	none	contains only the reprolysin domain\n
SPU_015133	SPU_015133	none	LRRtyp x3 - 7TM-1 \n \nNo GPS - looks most like glycoprotein/thyrotropin hormone receptors\n
SPU_009503	SPU_009503	none	the sequence is similar to both ADAM-TS6 and ADAM-TS10, but it looks like it is a partial sequence containing only a TSP1 and ADAMs spacer.  \n
SPU_011913	SPU_011913	none	roots the clade that contains vertebrate ADAM-TS2 and ADAM-TS3\n
SPU_020686	SPU_020686	none	LRRtyp x3 - 7TM-1 \n \nNo GPS - looks most like glycoprotein/thyrotropin hormone receptors\n
SPU_002876	SPU_002876	none	LRRtyp x5 - 7TM-1 \n \nNo GPS - looks most like glycoprotein/thyrotropin hormone receptors\n
SPU_006540	SPU_006540	none	LRRtyp x2 - 7TM-1 \n \nNo GPS - looks most like glycoprotein/thyrotropin hormone receptors\n
SPU_023903	SPU_023903	none	This gene is more common in arthropods than in vertebrates.  \n \nIt has some domains that are characteristic of ADAM-TS proteins including TSP-1 and N-term ADAM spacer domain.  However, it has TY and KU domains which are novel for an ADAM-TS gene found in vertebrates, but normal for Papilin found in arthropods.  \n
SPU_002031	SPU_002031	none	This gene is more common in arthropods than in vertebrates.  \n \nIt has some domains that are characteristic of ADAM-TS proteins including TSP-1 and N-term ADAM spacer domain.  However, it has TY and KU domains which are novel for an ADAM-TS gene found in vertebrates, but normal for Papilin found in arthropods.   \n
SPU_020547	SPU_020547	none	#\nsimilar to ADAM15, and to the ADAM15-like alleles, but distinct enough to likely be a separate gene. \n \nDisintegrin/ACR/TM - domain structure characteristic of an ADAM. \nNote that the gene is near to another gene with a very similar  \nstructure (SPU_020545) \n
SPU_022131	SPU_022131	none	PREDICTED: Strongylocentrotus purpuratus similar to 106 kDa O-GlcNAc transferase-interacting protein (LOC589805),\n
SPU_023613	SPU_023613	none	PREDICTED: Strongylocentrotus purpuratus similar to Ubiquitin-conjugating enzyme E2-17 kDa (Ubiquitin-protein ligase)(Ubiquitin carrier protein) (Effete protein) (LOC586593) \n
SPU_012475	SPU_012475	none	PREDICTED: Strongylocentrotus purpuratus similar to Separin \n(Separase) (Caspase-like protein ESPL1) (Extra spindle poles-like 1 protein) (LOC576865), mRNA. \n
SPU_013841	SPU_013841	none	#\nThis GLEAN shares ~50% sequence identity over nearly its entire length with NP_444515.1 (Homo sapiens tektin-1).  The coding regions of SPU_013841 and SPU_006453 are identical.\n
SPU_003152	SPU_003152	none	This F-box protein is most similar to Fbw7 (Drosophila archipelago), the F-box protein that targets cyclin E for destruction.  However, it is significantly less similar to human Fbw7 than is SPU_019951, which is likely the Fbw7 ortholog, and I have thus named it "Fbw7-like".\n
SPU_000883	SPU_000883	none	Likely to be incomplete.   Has 1 EGF, 1HYR, and 1 complete C lectin domain\n
SPU_001854	SPU_001854	none	multiple EGFCa - GPS -7tm_2 \n \ngood match to overall pattern of LNB-7TM-GPCRs\n
SPU_007778	SPU_007778	none	See also SPU_013579, which encodes a nearly identical protein.\n
SPU_007916	SPU_007916	none	CUB plus multiple EGFCa - GPS NO TMs predicted \n \nApart from lack of -7tm_2 this is a good match to overall pattern of LNB-7TM-GPCRs \nProbably missing C-terminus\n
SPU_021402	SPU_021402	none	CUB x2 -multiple EGFCa - HormR-GPS x2 - NO TMs predicted \n \nApart from lack of 7tm_2 a good match to overall pattern of LNB-7TM-GPCRs \n \nC-terminus may be off\n
SPU_007933	SPU_007933	none	Note that the protein sequence of this model is identical to that of SPU_006781.\n
SPU_007915	SPU_007915	none	multiple EGFCa - GPS -7tm_2 \n \ngood match to overall pattern of LNB-7TM-GPCRs \n \nNote adjacent gene is similar\n
SPU_018381	SPU_018381	none	PREDICTED: Strongylocentrotus purpuratus similar to Huntingtin (Huntingtons disease protein) (HD protein) (LOC590871), partial mRNA. \n
SPU_022093	SPU_022093	none	LDLa x2 - EGF x2- Ig - GPS - NO TMs predicted \n \nApart from lack of 7tm_2 domain this matches to overall pattern of LNB-7TM-GPCRs \nMay be missing C-terminus\n
SPU_026092	SPU_026092	none	multiple EGF_Ca - Ig - GPS - 7tm_2 \nGood match to overall pattern of LNB-7TM-GPCRs \n
SPU_019231	SPU_019231	none	#\nCUB - multiple EGF_Ca - HormR - GPS - 7tm_2 \nGood match to overall pattern of LNB-7TM-GPCRs \n
SPU_008005	SPU_008005	none	2 EGF/EGFCas -GPS-7TM_2 \nmatches to overall pattern of LNB-7TM-GPCRs \n
SPU_003304	SPU_003304	none	#\nEGFCa -GPS-7TM_2 \nmatches to overall pattern of LNB-7TM-GPCRs \n
SPU_007742	SPU_007742	none	#\nmultiple EGFCa - Ig -GPS-  three TM segments \nApart from absence of complete 7tm_2 domain matches to overall pattern of LNB-7TM-GPCRs \nMay be missing C-terminus\n
SPU_023715	SPU_023715	none	The sequence coded by this GLEAN is entirely contained in SPU_005762; refer to this one for further annotation\n
SPU_012362	SPU_012362	none	EGF - Ig -GPS- 7tm_2 \nmatches to overall pattern of LNB-7TM-GPCRs \n
SPU_022320	SPU_022320	none	LDLa - EGF - Ig -GPS- single TM \nApart from lack of full 7tm_1 or 7tm_2 domain \nmatches to overall pattern of LNB-7TM-GPCRs or to hormone receptors\n
SPU_014667	SPU_014667	none	CUB - SO - NO GPS- 7tm_2 \nBecause of complete 7tm_2 domain matches to overall pattern of LNB-7TM-GPCRs but no GPS - could also be a member of the glycoprotein hormone receptor subfamily \n
SPU_024566	SPU_024566	none	#\nCUB - NO GPS - 7tm_2 \nBecause of complete 7tm_2 domain matches to overall pattern of LNB-7TM-GPCRs but no GPS - could also be a member of the glycoprotein hormone receptor subfamily \n
SPU_022019	SPU_022019	none	CUB - SO - one pfam:collagen repeat - NO GPS- 7tm_2 \nBecause of complete 7tm_2 domain matches to overall pattern of LNB-7TM-GPCRs but no GPS - could also be a member of the glycoprotein hormone receptor subfamily \n
SPU_009826	SPU_009826	none	#\nNOT Embryonically expressed.  May be pseudo-gene or adult-only gene.\n
SPU_023643	SPU_023643	none	NOT Embryonically expressed.  May be pseudo-gene or adult-only gene.\n
SPU_013269	SPU_013269	none	NOT Embryonically expressed.  May be pseudo-gene or adult-only gene.\n
SPU_028112	SPU_028112	none	NOT Embryonically expressed.  May be pseudo-gene or adult-only gene.\n
SPU_004496	SPU_004496	none	IDENTICAL sequence to SPU_004096   \nNOT Embryonically expressed.  May be pseudo-gene or adult-only gene.\n
SPU_004085	SPU_004085	none	Gene fragment.\n
SPU_014859	SPU_014859	none	SRCR(4). Possibly partial. See SPU_014860.\n
SPU_015163	SPU_015163	none	CCP- 4XHYR - GPS-7TM_2 \n \nmatches general pattern for LNB-7TM-GPCR receptors\n
SPU_025755	SPU_025755	none	Scaffoldi4711 has both ends of what appears to be a single beta integrin subunit - BetaD (SPU_012985). Amino acids 1-420 appear to be a novel integrin beta subunit.  The scaffold has joined two unrelated genes and I have corrected the model to remove the 3 exons that are not part of this subunit.  The model is incomplete.  \n
SPU_014860	SPU_014860	none	SRCR(2). Possibly partial. See SPU_014859.\n
SPU_016723	SPU_016723	none	3 CCP - GPS-7TM_2 \n \nmatches general pattern for LNB-7TM-GPCR receptors\n
SPU_014992	SPU_014992	none	SRCR(8)-TM. Possibly incomplete. See SPU_014993, 14994.\n
SPU_020525	SPU_020525	none	FA58C- HYR - GPS-7TM_2 \n \nmatches general pattern for LNB-7TM-GPCR receptors \n \nSIMILAR DOMAIN COMPOSITION TO SPU_013084 \nNOVEL ARCHITECTURE\n
SPU_017226	SPU_017226	none	2x HYR -EGF - GPS-7TM_2 \n \nmatches general pattern for LNB-7TM-GPCR receptors\n
SPU_004759	SPU_004759	none	NOT Embryonically expressed.  May be pseudo-gene or adult-only gene.\n
SPU_014994	SPU_014994	none	SRCR(9). Possibly incomplete. See SPU_014992, 14993.\n
SPU_015123	SPU_015123	none	SRCR(3). Possibly incomplete. \n
SPU_015325	SPU_015325	none	SRCR(2). Probably incomplete.\n
SPU_020144	SPU_020144	none	NOTE only last 170-ish aa of this GLEAN match to Rheb\n
SPU_015387	SPU_015387	none	SigPep-SRCR(4). Possibly incomplete.\n
SPU_015539	SPU_015539	none	SigPep-SRCR(2)-WSC. \n
SPU_015548	SPU_015548	none	SRCR(4)-TM. Possibly incomplete.\n
SPU_015937	SPU_015937	none	SRCR(2)-TM. Possibly incomplete. See SPU_015938.\n
SPU_015938	SPU_015938	none	SigPep-SRCR(3). Possibly incomplete. See SPU_015937.\n
SPU_015989	SPU_015989	none	SRCR(2). Possibly incomplete.\n
SPU_007557	SPU_007557	none	Missing 5' segment, likely to be on a different scaffold\n
SPU_015991	SPU_015991	none	SigPep-SRCR(2)-TM. See SPU_015989.\n
SPU_016195	SPU_016195	none	SRCR(3). Possibly incomplete.\n
SPU_016373	SPU_016373	none	SRCR(2)-TM. Possibly incomplete. See SPU_016374.\n
SPU_016374	SPU_016374	none	SRCR(9). Possibly incomplete. See SPU_016373.\n
SPU_008979	SPU_008979	none	This gene model predicts a protein with two separate domains that are normally on different genes: a p53 DNA binding domain at its N-terminus, and the SPOC domain (from the spen transcriptional regulator) at its C-terminus.  There is a large gap in the sequence of the scaffold that lies between these domains however, which leads me to suspect that these are actually two separate genes that have been artificially fused in the computational predictions.  If this is the case then the C-terminal gene with the SPOC domain would be a homologue of Drosophila transcriptional regulator split ends (spen), and should be named "Sp-Spen".\n
SPU_016531	SPU_016531	none	SRCR(2). Possibly incomplete.\n
SPU_016880	SPU_016880	none	SRCR(4). Possibly incomplete.\n
SPU_028905	SPU_028905	none	PREDICTED: similar to Prostaglandin E2 receptor, EP4 subtype  \n(Prostanoid EP4 receptor) (PGE receptor, EP4 subtype)\n
SPU_015849	SPU_015849	none	poor homology (BLAST score = 5e-05)\n
SPU_010562	SPU_010562	none	NOT Embryonically Expressed, may be pseudo-gene or adult-only gene\n
SPU_003857	SPU_003857	none	PREDICTED: Strongylocentrotus purpuratus similar to thyroid hormone receptor interactor 12 (LOC578329), mRNA. \n
SPU_014345	SPU_014345	none	IDENTICAL to 01459 except at very 3' end\n
SPU_001459	SPU_001459	none	Identical to 14345 except at very 3' end.   \nNOTE no tiling data or EST.  May be pseudogene or expressed in adult only.\n
SPU_026783	SPU_026783	none	This model encodes the majority of the ATR protein, but is missing the C-terminus, which is encoded in SPU_011017.  The N-terminus (~200 amino acids) predicted by this model is quite different from vertebrate ATR.\n
SPU_011017	SPU_011017	none	This model appears to encode the C-terminus of the ATR protein.  There is significant overlap with SPU_026783, which encodes the N-terminal portion (minus the N-terminus).\n
SPU_009522	SPU_009522	none	SPU_023882 and SPU_003142 are PTEN hits\n
SPU_024659	SPU_024659	none	Partial sequence.\n
SPU_022223	SPU_022223	none	Contains C-lectin and PAN- Apple domains. Aligns to carboxy end of human versican-like protein.\n
SPU_026772	SPU_026772	none	Partial sequence.\n
SPU_014959	SPU_014959	none	SPU_026257 is a high score hit with less coverage of the mouse sequence used for Blast Query.\n
SPU_019405	SPU_019405	none	Base pairs 528-885 of this glean sequence are PNKP-like.\n
SPU_010324	SPU_010324	none	SPU_021010 is a significant hit limited to N-terminal coverage of the human xrcc1 sequence as Query.\n
SPU_020872	SPU_020872	none	Partial sequence.\n
SPU_003158	SPU_003158	none	Similar to Slingshot 1 and 2.\n
SPU_000287	SPU_000287	none	N-term of protein similar to human vPARP. C-term has no similarity\n
SPU_000991	SPU_000991	none	N-term of gene\n
SPU_000992	SPU_000992	none	C-term of gene (see SPU_000991)\n
SPU_002082	SPU_002082	none	similar to C. elegans unnamed gene\n
SPU_022638	SPU_022638	none	Possible haplotype pair of SPU_013171\n
SPU_026339	SPU_026339	none	N-terminal portion of this gene might be SPU_018354\n
SPU_017320	SPU_017320	none	Possibly the N-term of SPU_017321\n
SPU_017321	SPU_017321	none	Possibly the C-term of SPU_017320\n
SPU_004377	SPU_004377	none	This GLEAN represents the sea urchin Outer Arm Dynein Light Chain 4, as defined by the Anthocidaris crassispina cDNA (gi|2754612|dbj|BAA24152.1| outer arm dynein light chain 4 [Anthocidaris crassispina]) and the RefSeq gi|72093505|ref|XP_794465.1| PREDICTED: similar to dynein, axonemal, light chain 4 [Strongylocentrotus purpuratus]. \n
SPU_011682	SPU_011682	none	The predicted amino-terminal 50 amino acids of SPU_011682 do not agree with the comparable regions of other Dynein Light Chain-2 sequences and are almost certainly incorrect.  SPU_011682 is essentially identical to SPU_011683, and identical for an extended length with SPU_011681 and SPU_011684. \nBecause there is a good chance that they do not represent four distinct gene products, they were named as follows: \nSPU_011682: Sp-Dynein Light Chain-2-4a \nSPU_011681: Sp-Dynein Light Chain-2-4b \nSPU_011683: Sp-Dynein Light Chain-2-4c \nSPU_011684: Sp-Dynein Light Chain-2-4d.\n
SPU_024497	SPU_024497	none	SPU_024497 is essentially identical to SPU_024498, SPU_024499, and SPU_024500.  Because there is a good chance that they do not represent four distinct gene products, they were named as follows: \nSPU_024497: Sp-Dynein Light Chain-2-5a \nSPU_024498: Sp-Dynein Light Chain-2-5b \nSPU_024499: Sp-Dynein Light Chain-2-5c \nSPU_024500: Sp-Dynein Light Chain-2-5d.\n
SPU_024498	SPU_024498	none	SPU_024498 is essentially identical to SPU_024497, SPU_024499, and SPU_024500.  Because there is a good chance that they do not represent four distinct gene products, they were named as follows: \nSPU_024497: Sp-Dynein Light Chain-2-5a \nSPU_024498: Sp-Dynein Light Chain-2-5b \nSPU_024499: Sp-Dynein Light Chain-2-5c \nSPU_024500: Sp-Dynein Light Chain-2-5d. \n \n
SPU_024499	SPU_024499	none	#\nSPU_024499 is essentially identical to SPU_024497, SPU_024498, and SPU_024500.  Because there is a good chance that they do not represent four distinct gene products, they were named as follows: \nSPU_024497: Sp-Dynein Light Chain-2-5a \nSPU_024498: Sp-Dynein Light Chain-2-5b \nSPU_024499: Sp-Dynein Light Chain-2-5c \nSPU_024500: Sp-Dynein Light Chain-2-5d.\n
SPU_024500	SPU_024500	none	SPU_024500 is essentially identical to SPU_024497, SPU_024498, and SPU_024499.  Because there is a good chance that they do not represent four distinct gene products, they were named as follows: \nSPU_024497: Sp-Dynein Light Chain-2-5a \nSPU_024498: Sp-Dynein Light Chain-2-5b \nSPU_024499: Sp-Dynein Light Chain-2-5c \nSPU_024500: Sp-Dynein Light Chain-2-5d.\n
SPU_011684	SPU_011684	none	Most of the predicted amino acid sequence of SPU_011684 does not agree with the comparable regions of other Dynein Light Chain-2 sequences and is almost certainly incorrect.  SPU_011684 is very similar to the neighboring gene models SPU_011681, SPU_011682, and SPU_011683. \nBecause there is a good chance that they do not represent four distinct gene products, they were named as follows: \nSPU_011682: Sp-Dynein Light Chain-2-4a \nSPU_011681: Sp-Dynein Light Chain-2-4b \nSPU_011683: Sp-Dynein Light Chain-2-4c \nSPU_011684: Sp-Dynein Light Chain-2-4d.\n
SPU_011681	SPU_011681	none	The predicted amino-terminal 150 amino acids of SPU_011681 do not agree with the comparable regions of other Dynein Light Chain-2 sequences and may be incorrect.  However, SPU_011681 is identical with RefSeq XP_795373.1. SPU_011681 is essentially identical for an extended length with SPU_011682, SPU_011683, and SPU_011684. \nBecause there is a good chance that they do not represent four distinct gene products, they were named as follows: \nSPU_011682: Sp-Dynein Light Chain-2-4a \nSPU_011681: Sp-Dynein Light Chain-2-4b \nSPU_011683: Sp-Dynein Light Chain-2-4c \nSPU_011684: Sp-Dynein Light Chain-2-4d.\n
SPU_011683	SPU_011683	none	The predicted amino-terminal 50 amino acids of SPU_011683 do not agree with the comparable regions of other Dynein Light Chain-2 sequences and are almost certainly incorrect.  SPU_011683 is essentially identical to SPU_011682, and identical for an extended length with SPU_011681 and SPU_011684. \nBecause there is a good chance that they do not represent four distinct gene products, they were named as follows: \nSPU_011682: Sp-Dynein Light Chain-2-4a \nSPU_011681: Sp-Dynein Light Chain-2-4b \nSPU_011683: Sp-Dynein Light Chain-2-4c \nSPU_011684: Sp-Dynein Light Chain-2-4d.\n
SPU_008800	SPU_008800	none	The predicted amino acid sequence of SPU_008800 is identical to those of SPU_008799 and SPU_008801. \nBecause there is a good chance that they do not represent three distinct gene products, they were named as follows: \nSPU_008799: Sp-Dynein Light Chain-2-3a \nSPU_008800: Sp-Dynein Light Chain-2-3b \nSPU_008801: Sp-Dynein Light Chain-2-3c\n
SPU_011875	SPU_011875	none	Partial duplicate prediction for SPU_011837\n
SPU_017987	SPU_017987	none	Contains MMR_HSR1 domain (GTPase of unknown function domain)\n
SPU_011970	SPU_011970	none	IDENTICAL TO 12085\n
SPU_011134	SPU_011134	none	Duplicate partial prediction for SPU_001605\n
SPU_010689	SPU_010689	none	Incomplete prediction? SPU_009456 is almost complete duplicate prediction for SPU_010689.\n
SPU_028471	SPU_028471	none	No signal in tiling array or EST.  May be pseudogene or adult only expression.\n
SPU_021593	SPU_021593	none	This model appears to encode the middle part of the ATM protein.  The N- and C-terminal parts are encoded by SPU_005652 and SPU_011072, respectively.  There is sequence overlap between all three models.\n
SPU_011072	SPU_011072	none	This model encodes the C-terminus of the ATM protein.  The other parts are encoded by SPU_005652 (N-terminus) and SPU_021593 (middle).  There is sequence overlap between all three models.  This model also has significant sequence overlap with SPU_025176.\n
SPU_005652	SPU_005652	none	This model appears to encode the N-terminal portion of ATM.  The rest of the protein falls on SPU_011072 (C-terminus) and SPU_021593 (middle).  There is sequence overlap between all three models.\n
SPU_027613	SPU_027613	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 17, subfamily A, polypeptide 1 [Danio rerio]" (NP_997971.1) is 25.41% over 488 BLAST alignment positions. 383 of 817 Muscle alignment positions masked (46.800 %; 434 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_027796	SPU_027796	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 3, subfamily a, polypeptide 13 [Rattus norvegicus]" (NP_671739.1) is 40.64% over 342 BLAST alignment positions. 178 of 677 Muscle alignment positions masked (26.200 %; 499 positions used for tree generation) with a Muscle scorefile cutoff of 25. SPU_027795 may be N terminus. Not very CYP3 like.   COMMENTS from arm@stowers-institute.org:  partial, missing N-terminus\n
SPU_028152	SPU_028152	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 4, subfamily v [Gallus gallus]" (NP_001001879.1) is 49.20% over 502 BLAST alignment positions. 531 of 956 Muscle alignment positions masked (55.500 %; 425 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_028699	SPU_028699	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 38.24% over 421 BLAST alignment positions. 271 of 732 Muscle alignment positions masked (37.000 %; 461 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing N-terminus\n
SPU_028922	SPU_028922	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " Cyp2f2 protein [Xenopus tropicalis]" (NP_001010999.1) is 31.80% over 217 BLAST alignment positions. 138 of 589 Muscle alignment positions masked (23.400 %; 451 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing C-terminus\n
SPU_028934	SPU_028934	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily J, polypeptide 2, B [Danio rerio]" (NP_956914.1) is 38.64% over 471 BLAST alignment positions. 238 of 714 Muscle alignment positions masked (33.300 %; 476 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_008978	SPU_008978	none	This Glean3 model probably encodes the N-terminus of the p53 homologue predicted by the 5' end of SPU_008979, and should be fused therewith.  The third exon is probably not real, as there is no evidence for expression in the tiling data.  At the same time, SPU_008979 probably artifactually fuses two different genes, and needs to be broken up (see annotation to that gene model).\n
SPU_015880	SPU_015880	none	IDENTICAL TO 08410\n
SPU_013863	SPU_013863	none	IDENTICAL TO 18282\n
SPU_004692	SPU_004692	none	Partial gene.  Missing N-terminus.\n
SPU_012404	SPU_012404	none	Prediction is incomplete.\n
SPU_013183	SPU_013183	none	Genescan predicted as sea squirt Halocynthia roretzi troponin 1\n
SPU_024320	SPU_024320	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 46.11% over 193 BLAST alignment positions. 445 of 886 Muscle alignment positions masked (50.200 %; 441 positions used for tree generation) with a Muscle scorefile cutoff of 25. partial   COMMENTS from arm@stowers-institute.org:  missing stretch in middle\n
SPU_025299	SPU_025299	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC393108 [Danio rerio]" (NP_956433.1) is 41.15% over 384 BLAST alignment positions. 225 of 696 Muscle alignment positions masked (32.300 %; 471 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  missing N-terminus\n
SPU_025595	SPU_025595	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 51 [Strongylocentrotus purpuratus]" (NP_001001906.1) is 99.66% over 297 BLAST alignment positions. 436 of 902 Muscle alignment positions masked (48.300 %; 466 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  missing N&C-terminus\n
SPU_025829	SPU_025829	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 46, subfamily a, polypeptide 1 [Mus musculus]" (NP_034140.1) is 40.65% over 492 BLAST alignment positions. 239 of 633 Muscle alignment positions masked (37.700 %; 394 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_025830	SPU_025830	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC613109 [Xenopus tropicalis]" (NP_001027517.1) is 39.67% over 489 BLAST alignment positions. 241 of 639 Muscle alignment positions masked (37.700 %; 398 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  missing some N-terminus residues\n
SPU_025863	SPU_025863	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC393108 [Danio rerio]" (NP_956433.1) is 38.95% over 439 BLAST alignment positions. 235 of 699 Muscle alignment positions masked (33.600 %; 464 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_025956	SPU_025956	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 33.50% over 397 BLAST alignment positions. 670 of 1128 Muscle alignment positions masked (59.300 %; 458 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  missing N-terminus\n
SPU_025969	SPU_025969	none	Missing N terminus due to scaffold truncation. \nSingle exon! \nBLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 1, subfamily A, polypeptide 1 [Gallus gallus]" (NP_990477.1) is 36.00% over 300 BLAST alignment positions. 184 of 658 Muscle alignment positions masked (27.900 %; 474 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  missing N-terminus\n
SPU_026188	SPU_026188	none	Incomplete - runs off end of scaffold missing last 2 exons \nBLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " thromboxane A synthase 1 [Danio rerio]" (NP_991172.1) is 27.33% over 311 BLAST alignment positions. 89 of 573 Muscle alignment positions masked (15.500 %; 484 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_026360	SPU_026360	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily j, polypeptide 9 [Rattus norvegicus]" (NP_786942.1) is 40.98% over 388 BLAST alignment positions. 254 of 733 Muscle alignment positions masked (34.600 %; 479 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  missing N-terminus\n
SPU_026373	SPU_026373	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily u, polypeptide 1 [Rattus norvegicus]" (NP_001019950.1) is 35.43% over 446 BLAST alignment positions. 358 of 810 Muscle alignment positions masked (44.100 %; 452 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_026477	SPU_026477	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC572000 [Danio rerio]" (NP_001020728.1) is 33.95% over 433 BLAST alignment positions. 269 of 726 Muscle alignment positions masked (37.000 %; 457 positions used for tree generation) with a Muscle scorefile cutoff of 25.    \nExon 2 duplicated as exon 3: misassembly problem \nCOMMENTS from arm@stowers-institute.org:  extra N-terminus\n
SPU_027153	SPU_027153	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC393204 [Danio rerio]" (NP_956529.1) is 30.21% over 480 BLAST alignment positions. 1574 of 1945 Muscle alignment positions masked (80.900 %; 371 positions used for tree generation) with a Muscle scorefile cutoff of 25. SPU_027152 may be N terminus   COMMENTS from arm@stowers-institute.org:  missing central stretch and C-terminus\n
SPU_020876	SPU_020876	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, subfamily XXVIA, polypeptide 1 [Danio rerio]" (NP_571221.2) is 35.74% over 484 BLAST alignment positions. 246 of 704 Muscle alignment positions masked (34.900 %; 458 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_021087	SPU_021087	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC334097 [Danio rerio]" (NP_997884.1) is 34.84% over 442 BLAST alignment positions. 256 of 679 Muscle alignment positions masked (37.700 %; 423 positions used for tree generation) with a Muscle scorefile cutoff of 25. Early diverging CYP1?\n
SPU_021185	SPU_021185	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 17, subfamily A, polypeptide 1 [Danio rerio]" (NP_997971.1) is 32.89% over 377 BLAST alignment positions. 202 of 609 Muscle alignment positions masked (33.100 %; 407 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  missing N- and C-terminus\n
SPU_021251	SPU_021251	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, subfamily 3A, polypeptide 62 [Rattus norvegicus]" (NP_001019403.1) is 31.63% over 332 BLAST alignment positions. 83 of 580 Muscle alignment positions masked (14.300 %; 497 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  missing C-terminus and stretch in middle\n
SPU_022109	SPU_022109	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 35.99% over 389 BLAST alignment positions. 229 of 671 Muscle alignment positions masked (34.100 %; 442 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  missing N-terminus\n
SPU_022110	SPU_022110	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 36.19% over 409 BLAST alignment positions. 211 of 667 Muscle alignment positions masked (31.600 %; 456 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_022432	SPU_022432	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC393108 [Danio rerio]" (NP_956433.1) is 38.48% over 434 BLAST alignment positions. 268 of 728 Muscle alignment positions masked (36.800 %; 460 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_022590	SPU_022590	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 37.72% over 501 BLAST alignment positions. 299 of 770 Muscle alignment positions masked (38.800 %; 471 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_022593	SPU_022593	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 38.52% over 514 BLAST alignment positions. 392 of 833 Muscle alignment positions masked (47.000 %; 441 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_023067	SPU_023067	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily u, polypeptide 1 [Rattus norvegicus]" (NP_001019950.1) is 34.01% over 444 BLAST alignment positions. 334 of 773 Muscle alignment positions masked (43.200 %; 439 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_023068	SPU_023068	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily u, polypeptide 1 [Rattus norvegicus]" (NP_001019950.1) is 32.14% over 448 BLAST alignment positions. 242 of 682 Muscle alignment positions masked (35.400 %; 440 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_024078	SPU_024078	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC393108 [Danio rerio]" (NP_956433.1) is 33.23% over 310 BLAST alignment positions. 251 of 718 Muscle alignment positions masked (34.900 %; 467 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_024113	SPU_024113	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " MGC107863 protein [Xenopus tropicalis]" (NP_001015719.1) is 41.63% over 430 BLAST alignment positions. 248 of 726 Muscle alignment positions masked (34.100 %; 478 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_018515	SPU_018515	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 37.69% over 459 BLAST alignment positions. 212 of 662 Muscle alignment positions masked (32.000 %; 450 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_019082	SPU_019082	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 43.40% over 235 BLAST alignment positions. 479 of 889 Muscle alignment positions masked (53.800 %; 410 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing N-terminus\n
SPU_019464	SPU_019464	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450 A 37 [Gallus gallus]" (NP_001001751.1) is 39.19% over 296 BLAST alignment positions. 41 of 534 Muscle alignment positions masked (7.600 %; 493 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  missing N-terminus\n
SPU_019898	SPU_019898	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily J, polypeptide 2 [Homo sapiens]" (NP_000766.2) is 32.91% over 468 BLAST alignment positions. 170 of 636 Muscle alignment positions masked (26.700 %; 466 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_019899	SPU_019899	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily J, polypeptide 2, B [Danio rerio]" (NP_956914.1) is 34.40% over 436 BLAST alignment positions. 223 of 693 Muscle alignment positions masked (32.100 %; 470 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_020229	SPU_020229	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 4, subfamily v [Gallus gallus]" (NP_001001879.1) is 52.84% over 388 BLAST alignment positions. 738 of 1149 Muscle alignment positions masked (64.200 %; 411 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  missing N-terminus\n
SPU_020233	SPU_020233	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily d, polypeptide 22 [Rattus norvegicus]" (NP_612524.1) is 38.37% over 258 BLAST alignment positions. 201 of 633 Muscle alignment positions masked (31.700 %; 432 positions used for tree generation) with a Muscle scorefile cutoff of 25. N terminus is SPU_020232   COMMENTS from arm@stowers-institute.org:  missing N-terminus\n
SPU_020234	SPU_020234	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily j, polypeptide 11 [Mus musculus]" (NP_001004141.1) is 33.48% over 466 BLAST alignment positions. 80 of 538 Muscle alignment positions masked (14.800 %; 458 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_020752	SPU_020752	none	Fragment of CYP2. Possible misassembly with SPU_020751 \n \nBLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 46.67% over 225 BLAST alignment positions. 457 of 896 Muscle alignment positions masked (51.000 %; 439 positions used for tree generation) with a Muscle scorefile cutoff of 25. \n
SPU_020753	SPU_020753	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 37.15% over 463 BLAST alignment positions. 214 of 667 Muscle alignment positions masked (32.000 %; 453 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_020756	SPU_020756	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 39.10% over 445 BLAST alignment positions. 193 of 642 Muscle alignment positions masked (30.000 %; 449 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_014843	SPU_014843	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 1, subfamily A, polypeptide 1 [Gallus gallus]" (NP_990477.1) is 34.70% over 464 BLAST alignment positions. 173 of 659 Muscle alignment positions masked (26.200 %; 486 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_015096	SPU_015096	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily j, polypeptide 11 [Mus musculus]" (NP_001004141.1) is 29.45% over 292 BLAST alignment positions. 51 of 521 Muscle alignment positions masked (9.700 %; 470 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing N- and C-terminus\n
SPU_015256	SPU_015256	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 36.93% over 501 BLAST alignment positions. 179 of 644 Muscle alignment positions masked (27.700 %; 465 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_015442	SPU_015442	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 28.47% over 439 BLAST alignment positions. 351 of 798 Muscle alignment positions masked (43.900 %; 447 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_016251	SPU_016251	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " MGC107863 protein [Xenopus tropicalis]" (NP_001015719.1) is 39.87% over 449 BLAST alignment positions. 406 of 858 Muscle alignment positions masked (47.300 %; 452 positions used for tree generation) with a Muscle scorefile cutoff of 25. SPU_016152 may be C terminus   COMMENTS from arm@stowers-institute.org:  missing N-terminus residues\n
SPU_016442	SPU_016442	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily u, polypeptide 1 [Rattus norvegicus]" (NP_001019950.1) is 36.18% over 398 BLAST alignment positions. 353 of 803 Muscle alignment positions masked (43.900 %; 450 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  missing C-terminus\n
SPU_016816	SPU_016816	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily u, polypeptide 1 [Rattus norvegicus]" (NP_001019950.1) is 38.73% over 377 BLAST alignment positions. 194 of 646 Muscle alignment positions masked (30.000 %; 452 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_017582	SPU_017582	none	Missing middle exon - part of I helix - due to missing contig data on scaffold. \nBLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 1, subfamily B, polypeptide 1 [Danio rerio]" (NP_001013285.2) is 27.59% over 464 BLAST alignment positions. 1535 of 1933 Muscle alignment positions masked (79.400 %; 398 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  missing C-terminus\n
SPU_017986	SPU_017986	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 36.96% over 514 BLAST alignment positions. 328 of 785 Muscle alignment positions masked (41.700 %; 457 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_018242	SPU_018242	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC393433 [Danio rerio]" (NP_956755.1) is 39.11% over 427 BLAST alignment positions. 308 of 696 Muscle alignment positions masked (44.200 %; 388 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_018372	SPU_018372	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 37.80% over 508 BLAST alignment positions. 398 of 845 Muscle alignment positions masked (47.100 %; 447 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_018468	SPU_018468	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC393108 [Danio rerio]" (NP_956433.1) is 35.41% over 466 BLAST alignment positions. 35 of 514 Muscle alignment positions masked (6.800 %; 479 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_000279	SPU_000279	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 17, subfamily A, polypeptide 1 [Danio rerio]" (NP_997971.1) is 44.06% over 202 BLAST alignment positions. 1620 of 2057 Muscle alignment positions masked (78.700 %; 437 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing N-terminus\n
SPU_000645	SPU_000645	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 27, subfamily B, polypeptide 1 [Xenopus tropicalis]" (NP_001006907.1) is 39.75% over 483 BLAST alignment positions. 494 of 938 Muscle alignment positions masked (52.600 %; 444 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_000746	SPU_000746	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 35.25% over 278 BLAST alignment positions. 372 of 798 Muscle alignment positions masked (46.600 %; 426 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing N- and C-terminus\n
SPU_001622	SPU_001622	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, subfamily IID (debrisoquine, sparteine, etc., -metabolizing), polypeptide 6 [Bos taurus]" (NP_776954.1) is 24.95% over 477 BLAST alignment positions. 65 of 554 Muscle alignment positions masked (11.700 %; 489 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_001792	SPU_001792	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " MGC107863 protein [Xenopus tropicalis]" (NP_001015719.1) is 39.95% over 398 BLAST alignment positions. 262 of 726 Muscle alignment positions masked (36.000 %; 464 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  missing some N-terminus residues\n
SPU_002371	SPU_002371	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily j, polypeptide 6 [Mus musculus]" (NP_034138.2) is 30.14% over 554 BLAST alignment positions. 416 of 879 Muscle alignment positions masked (47.300 %; 463 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_002380	SPU_002380	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 4, subfamily v [Gallus gallus]" (NP_001001879.1) is 53.59% over 362 BLAST alignment positions. 753 of 1162 Muscle alignment positions masked (64.800 %; 409 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing N-terminus\n
SPU_002590	SPU_002590	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC613109 [Xenopus tropicalis]" (NP_001027517.1) is 43.27% over 342 BLAST alignment positions. 576 of 968 Muscle alignment positions masked (59.500 %; 392 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing N-terminus\n
SPU_002656	SPU_002656	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " MGC107863 protein [Xenopus tropicalis]" (NP_001015719.1) is 38.43% over 458 BLAST alignment positions. 202 of 653 Muscle alignment positions masked (30.900 %; 451 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_002658	SPU_002658	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " CYP2A13 protein [Xenopus tropicalis]" (NP_001010998.1) is 34.52% over 478 BLAST alignment positions. 117 of 569 Muscle alignment positions masked (20.500 %; 452 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_002660	SPU_002660	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC613109 [Xenopus tropicalis]" (NP_001027517.1) is 42.62% over 298 BLAST alignment positions. 533 of 935 Muscle alignment positions masked (57.000 %; 402 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_002831	SPU_002831	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily b, polypeptide 19 [Mus musculus]" (NP_031840.1) is 35.96% over 178 BLAST alignment positions. 84 of 548 Muscle alignment positions masked (15.300 %; 464 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial\n
SPU_002832	SPU_002832	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 4, subfamily v [Gallus gallus]" (NP_001001879.1) is 54.14% over 423 BLAST alignment positions. 512 of 927 Muscle alignment positions masked (55.200 %; 415 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing N-terminus\n
SPU_002884	SPU_002884	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 26, subfamily b, polypeptide 1 [Danio rerio]" (NP_997831.1) is 37.76% over 241 BLAST alignment positions. 267 of 709 Muscle alignment positions masked (37.600 %; 442 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing N-terminus\n
SPU_002898	SPU_002898	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily d, polypeptide 9 [Rattus norvegicus]" (NP_695225.1) is 25.47% over 475 BLAST alignment positions. 287 of 759 Muscle alignment positions masked (37.800 %; 472 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_003038	SPU_003038	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 1, subfamily A, polypeptide 1 [Gallus gallus]" (NP_990477.1) is 36.04% over 455 BLAST alignment positions. 949 of 1435 Muscle alignment positions masked (66.100 %; 486 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_003231	SPU_003231	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450 1A1 [Sus scrofa]" (NP_999577.1) is 33.88% over 487 BLAST alignment positions. 163 of 595 Muscle alignment positions masked (27.300 %; 432 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_003232	SPU_003232	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 1, subfamily a, polypeptide 1 [Rattus norvegicus]" (NP_036672.2) is 34.65% over 456 BLAST alignment positions. 183 of 617 Muscle alignment positions masked (29.600 %; 434 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_003606	SPU_003606	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily J, polypeptide 2, B [Danio rerio]" (NP_956914.1) is 38.82% over 492 BLAST alignment positions. 281 of 755 Muscle alignment positions masked (37.200 %; 474 positions used for tree generation) with a Muscle scorefile cutoff of 25.   Original GLEAN3 prediction interrupted by non-LTR retrotransposon (SPU_003604) and AP1 (SPU_003603) possibly also from phage event. \nCOMMENTS from arm@stowers-institute.org:  extra N-terminus in original GLEAN model\n
SPU_003607	SPU_003607	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily j, polypeptide 9 [Rattus norvegicus]" (NP_786942.1) is 40.05% over 382 BLAST alignment positions. 269 of 736 Muscle alignment positions masked (36.500 %; 467 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing N-terminus\n
SPU_003687	SPU_003687	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC613109 [Xenopus tropicalis]" (NP_001027517.1) is 33.94% over 330 BLAST alignment positions. 317 of 697 Muscle alignment positions masked (45.400 %; 380 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial\n
SPU_005160	SPU_005160	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC572000 [Danio rerio]" (NP_001020728.1) is 33.69% over 466 BLAST alignment positions. 200 of 651 Muscle alignment positions masked (30.700 %; 451 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_005439	SPU_005439	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC613109 [Xenopus tropicalis]" (NP_001027517.1) is 42.06% over 504 BLAST alignment positions. 244 of 646 Muscle alignment positions masked (37.700 %; 402 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_005655	SPU_005655	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily J, polypeptide 2, B [Danio rerio]" (NP_956914.1) is 31.26% over 531 BLAST alignment positions. 192 of 658 Muscle alignment positions masked (29.100 %; 466 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_005668	SPU_005668	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 34.34% over 530 BLAST alignment positions. 230 of 693 Muscle alignment positions masked (33.100 %; 463 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_005931	SPU_005931	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450 4F6 [Rattus norvegicus]" (NP_695230.1) is 36.55% over 435 BLAST alignment positions. 681 of 1195 Muscle alignment positions masked (56.900 %; 514 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  extra C-terminus\n
SPU_006574	SPU_006574	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " MGC107863 protein [Xenopus tropicalis]" (NP_001015719.1) is 39.67% over 484 BLAST alignment positions. 308 of 772 Muscle alignment positions masked (39.800 %; 464 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_007306	SPU_007306	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 37.73% over 440 BLAST alignment positions. 182 of 638 Muscle alignment positions masked (28.500 %; 456 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  missing N-terminus\n
SPU_007335	SPU_007335	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450 CYP4F18 [Mus musculus]" (NP_077764.1) is 45.73% over 199 BLAST alignment positions. 658 of 1166 Muscle alignment positions masked (56.400 %; 508 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing N-terminus\n
SPU_007409	SPU_007409	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 17, subfamily A, polypeptide 1 [Danio rerio]" (NP_997971.1) is 35.33% over 450 BLAST alignment positions. 127 of 571 Muscle alignment positions masked (22.200 %; 444 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_007558	SPU_007558	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450 4F6 [Rattus norvegicus]" (NP_695230.1) is 30.44% over 450 BLAST alignment positions. 579 of 1091 Muscle alignment positions masked (53.000 %; 512 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing C-terminus\n
SPU_008152	SPU_008152	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " MGC107863 protein [Xenopus tropicalis]" (NP_001015719.1) is 40.68% over 440 BLAST alignment positions. 273 of 739 Muscle alignment positions masked (36.900 %; 466 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_008632	SPU_008632	none	Most CYP2-like urchin sequence. BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 41.78% over 450 BLAST alignment positions. 403 of 859 Muscle alignment positions masked (46.900 %; 456 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_009118	SPU_009118	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450 2AD3 [Danio rerio]" (NP_001020725.1) is 45.61% over 171 BLAST alignment positions. 379 of 822 Muscle alignment positions masked (46.100 %; 443 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_009512	SPU_009512	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " MGC107863 protein [Xenopus tropicalis]" (NP_001015719.1) is 42.69% over 431 BLAST alignment positions. 210 of 690 Muscle alignment positions masked (30.400 %; 480 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  missing N-terminus\n
SPU_009692	SPU_009692	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " MGC107863 protein [Xenopus tropicalis]" (NP_001015719.1) is 40.48% over 294 BLAST alignment positions. 221 of 692 Muscle alignment positions masked (31.900 %; 471 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing C-terminus\n
SPU_009825	SPU_009825	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 38.85% over 296 BLAST alignment positions. 302 of 737 Muscle alignment positions masked (40.900 %; 435 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing N-terminus\n
SPU_010143	SPU_010143	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 38.30% over 376 BLAST alignment positions. 264 of 730 Muscle alignment positions masked (36.100 %; 466 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  missing N-terminus\n
SPU_010246	SPU_010246	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily U, polypeptide 1 [Homo sapiens]" (NP_898898.1) is 38.19% over 199 BLAST alignment positions. 187 of 653 Muscle alignment positions masked (28.600 %; 466 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing N-terminus\n
SPU_010563	SPU_010563	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily J, polypeptide 2 [Homo sapiens]" (NP_000766.2) is 36.23% over 494 BLAST alignment positions. 238 of 714 Muscle alignment positions masked (33.300 %; 476 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_010576	SPU_010576	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily J, polypeptide 2, B [Danio rerio]" (NP_956914.1) is 40.34% over 471 BLAST alignment positions. 76 of 553 Muscle alignment positions masked (13.700 %; 477 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_012080	SPU_012080	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 4, subfamily v, polypeptide 2 [Homo sapiens]" (NP_997235.2) is 50.75% over 469 BLAST alignment positions. 541 of 957 Muscle alignment positions masked (56.500 %; 416 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_012081	SPU_012081	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC510156 [Bos taurus]" (NP_001029545.1) is 51.39% over 469 BLAST alignment positions. 520 of 940 Muscle alignment positions masked (55.300 %; 420 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_012148	SPU_012148	none	Assembly error: see SPU_025595 Percent ID to " cytochrome P450, family 51 [Strongylocentrotus purpuratus]" (NP_001001906.1) is 98.59% over 142 BLAST alignment positions.\n
SPU_012500	SPU_012500	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily J, polypeptide 2, B [Danio rerio]" (NP_956914.1) is 38.26% over 311 BLAST alignment positions. 128 of 591 Muscle alignment positions masked (21.600 %; 463 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing N-terminus, missing stretch in middle\n
SPU_012926	SPU_012926	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " hypothetical protein LOC553543 [Danio rerio]" (NP_001018358.1) is 33.33% over 312 BLAST alignment positions. 116 of 584 Muscle alignment positions masked (19.800 %; 468 positions used for tree generation) with a Muscle scorefile cutoff of 25.   COMMENTS from arm@stowers-institute.org:  partial, missing C-terminus\n
SPU_013039	SPU_013039	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " MGC107863 protein [Xenopus tropicalis]" (NP_001015719.1) is 41.07% over 431 BLAST alignment positions. 233 of 700 Muscle alignment positions masked (33.200 %; 467 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_013199	SPU_013199	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " cytochrome P450, family 2, subfamily J, polypeptide 4 [Rattus norvegicus]" (NP_075414.2) is 27.34% over 523 BLAST alignment positions. 307 of 771 Muscle alignment positions masked (39.800 %; 464 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_014092	SPU_014092	none	BLAST identities confined to RefSeq only. Name assigned by phylogenetic methods (ME tree with ML branch lengths or heuristic ML). Percent ID to " thromboxane A synthase 1 [Danio rerio]" (NP_991172.1) is 36.29% over 463 BLAST alignment positions. 80 of 574 Muscle alignment positions masked (13.900 %; 494 positions used for tree generation) with a Muscle scorefile cutoff of 25.\n
SPU_008801	SPU_008801	none	The predicted amino acid sequence of SPU_008801 is identical to those of SPU_008799 and SPU_008800. \nBecause there is a good chance that they do not represent three distinct gene products, they were named as follows: \nSPU_008799: Sp-Dynein Light Chain-2-3a \nSPU_008800: Sp-Dynein Light Chain-2-3b \nSPU_008801: Sp-Dynein Light Chain-2-3c\n
SPU_027937	SPU_027937	none	The predicted amino terminal 43 amino acids of SPU_027937 are inconsistent with the comparable regions of other Dynein Light Chain Type 2 sequences, and are almost certainly incorrect; otherwise, this GLEAN matches the neighboring SPU_027938 very well. \nBecause these may not represent independent gene products, they were named as follows: \nSPU_027937: Sp-Dynein Light Chain-2-6a \nSPU_027938: Sp-Dynein Light Chain-2-6b\n
SPU_027938	SPU_027938	none	The predicted amino terminal 50 amino acids of SPU_027938 are inconsistent with the comparable regions of other Dynein Light Chain Type 2 sequences, and are almost certainly incorrect; otherwise, this GLEAN matches the neighboring SPU_027937 very well. \nBecause these may not represent independent gene products, they were named as follows: \nSPU_027937: Sp-Dynein Light Chain-2-6a \nSPU_027938: Sp-Dynein Light Chain-2-6b\n
SPU_018579	SPU_018579	none	LDLa - 4-5 EGF - Igc2 - GPS x2 - NO TM predicted \n \nApart from lack of 7tm_2 domain, looks like a member of LNB-7TM-GPCR subfamily \nC-terminal end probably misassembled\n
SPU_013084	SPU_013084	none	SR - FA58C X3 - GPS - 7tm_2 \nmatches well with LNB-7TM-GPCR subfamily \n \nNOVEL ARCHITECTURE \nSIMILAR DOMAIN COMPOSITION TO SPU_020525\n
SPU_001671	SPU_001671	none	SR x4 - EGF - GPS - three TM segments \nApart from lack of a complete 7tm_2 domain, looks like a member of LNB-7TM-GPCR subfamily \n \nNovel architecture \n
SPU_001056	SPU_001056	none	Gal-lectin/GPS/7tm_2 \n \na bit like latrophilin but lacks OLF domain\n
SPU_003929	SPU_003929	none	Gal-lectin/HormR/GPS/7tm_2 \n \na bit like latrophilin but lacks OLF domain\n
SPU_020012	SPU_020012	none	LamNT/EGF_Lam-8/FN3 x3 \n \nlikely N-terminus of usherin2A - mammalian forms have N-terminal LamGL domain in addition \n \nC-terminus encoded in SPU_017733 \n \nPreviously vertebrate-restricted\n
SPU_017733	SPU_017733	none	Large ECM or surface membrane protein with LamG and lots of FN3 repeats - homologous with the Usherin2A protein of vertebrates. \nMissing N-terminal part - LamNT AND EGF-Lam domains - probably encoded in SPU_020012 \n \nPreviously vertebrate-restricted\n
SPU_004543	SPU_004543	none	Using HsSERPINE2 in BLASTP search of the GLEAN3 models yielded 15 genes. Each was analyzed by SMART and the top 12 hits were found to have a SERPIN domain and were annotated Sp-serpin-like 3 through 12 (2 were already annotated by A.Cameron). BLASTP searches of the GLEAN3 models with other vertebrate Serpins yielded the same top 12 hits. \nThis predicted protein had 2 serpin domains. This GLEAN model could be a fusion of two genes. The Fgenesh predictions show two separate genes. There are very few serpins with 2 serpin domains. Some were found in Ciona, rat and chicken.\n
SPU_005904	SPU_005904	none	SR/SO/7TM_2 \n \nNovel architecture\n
SPU_014551	SPU_014551	none	SO - 7tm_2 \n \nNovel architecture\n
SPU_013142	SPU_013142	none	four SO repeats\n
SPU_016249	SPU_016249	none	two SO repeats\n
SPU_018658	SPU_018658	none	three SO repeats\n
SPU_027632	SPU_027632	none	two SO repeats\n
SPU_002711	SPU_002711	none	The gene prediction could be incomplete. It is at the end of a short scaffold. \nUsing HsSERPINE2 in BLASTP search of the GLEAN3 models yielded 15 genes. Each was analyzed by SMART and the top 12 hits were found to have a SERPIN domain and were annotated Sp-serpin-like 3 through 12 (2 were already annotated by A.Cameron). BLASTP searches of the GLEAN3 models with other vertebrate Serpins yielded the same top 12 hits.\n
SPU_007225	SPU_007225	none	homolog of Tubulointerstitial nephritis antigen -  \nSO-cysteine-type endopeptidase - peptidase domain scores as inactive as in human homolog \n \nWidely distributed phylogenetically\n
SPU_013734	SPU_013734	none	FA58C/CLECT/EGF-Ca/CUB x3\n
SPU_015863	SPU_015863	none	SEMA/PSI/IPT x2 \n \nlooks like a plexin - could be a c-met-related RTK\n
SPU_026514	SPU_026514	none	SEMA/PSI/IPT x2 \n \nlooks like a plexin - could be a c-met-related RTK\n
SPU_019919	SPU_019919	none	SEMA/PSI \ncould be a semaphorin or plexin or a c-met relative\n
SPU_026431	SPU_026431	none	SEMA/PSI \ncould be a semaphorin or plexin or a c-met relative\n
SPU_024673	SPU_024673	none	SEMA/PSI \ncould be a semaphorin or plexin or a c-met relative\n
SPU_018779	SPU_018779	none	SEMA/PSI \ncould be a semaphorin or plexin or a c-met relative\n
SPU_006483	SPU_006483	none	SEMA/PSI \ncould be a semaphorin or plexin or a c-met relative\n
SPU_012034	SPU_012034	none	PSI/IPT x3 \n \nlooks like a fragment of plexin - maybe c-met family\n
SPU_022072	SPU_022072	none	same as SPU_012389 \n \nadjacent to cleavage histone H2a  SPU_022075\n
SPU_012389	SPU_012389	none	This is a duplicate of SPU_022072 and likely should be deleted\n
SPU_025514	SPU_025514	none	This gene is on three scaffolds (544, 81593 and 56300). On scaffold 544 (exon 1) there is one GLEAN model (SPU_025514) which covers exon 1-3 for this gene. On scaffold 81593 GLEAN_14036 is predicted (exon 2-15) for this gene. For scaffold 56300, gene prediction is SPU_017882 (exon 8-24) for this gene. Exon 2 and 3 are overlapped between two scaffolds (544 and 81593) and exon 8-15 are overlapped between scaffolds 81593 and 56300. Please refer to GLEAN_14036 and GLEAN_17882 for gene features for C-terminal portion.\n
SPU_022075	SPU_022075	none	Adjacent to SPU_022072 cleavage stage histone H4\n
SPU_024567	SPU_024567	none	similar to oocyte specific human and mouse H1oo;  there are no other known cleavage stage orthologues of the histones in mammals.\n
SPU_026501	SPU_026501	none	This is a fragment of the H2aZ/x gene or a pseudogene; should be deleted\n
SPU_011356	SPU_011356	none	This is an internal duplicate of SPU_004671; this sequence contains a stretch of possibly spurious amino acids, absent from 04671.  \nSee protein: \nMFNHLSYMRPRDVQQREFEVLLKLNHKNIIHLDQIEEDQISKQPVIVMELCTGGSLYTYLDTPENLYGLKEKEFLQVLNDVSAGMKHLRDKGIVHRDIKPGNIMMVKGEDGVIVYKLADFGAARELEDDEAFKSLYGTEEYL \n \n{LADFGAARELEAEAFKSLYGTEKYL} spurious? \n \nHPDLYERAVLGKRNRTEFTAQTDLWSLGVTFFHTATGSLPFRAHGGRNNREVMHNITTTKKSSMISGVQEVPNGPIIWSEDLPKTCHFSPDMRHQVKTLLSHTMECNIQKMWTFDIFFDFVQRLTSMRPVDVFVAPTCECHIIYVEPQRKVVDFQDKMAHLTGIQPDTQSLLWDKEVFDLKKCLKCEDLPQTSPDNPLILVRMGAEVFPSVKVPSLR\n
SPU_003438	SPU_003438	none	this is likely to be part of the same gene as 03534\n
SPU_012899	SPU_012899	none	Overlaps with SPU_022035: the N terminal part of this glean is identical to 22035, then the C terminus diverges and is longer than the C term of 22035.\n
SPU_022035	SPU_022035	none	Overlaps with SPU_012899. The N term of this protein is longer. 12899 is an internal identical match over its N terminal portion, but the proteins diverge in their C termini.\n
SPU_005514	SPU_005514	none	overlapping part of SPU_027573\n
SPU_008172	SPU_008172	none	contains RRM (RNA Recognition motif)\n
SPU_014468	SPU_014468	none	LY4-LDLA5\n
SPU_006165	SPU_006165	none	LDLA/EGF2/LY5/EGF/MULTIPLE LDLA/EGF2/LY2/EGF2\n
SPU_005642	SPU_005642	none	LY/LDLA/LY4/EGF4/CCP - likely a fragment \n \nnovel architecture - LRPs in other species do not have CCP\n
SPU_020978	SPU_020978	none	LDLA2/EGF2/LY3/EGF/LY6/EGF/MULTIPLE LDLA/EGF2/LY5/EGF/LY4/EGF. \nLY5/EGF/LY5/EGF/MULTIPLE LDLA/EGF/LY5/EGF/MULTIPLE LDLA/EGF.. \nLY3/EGF6-TM\n
SPU_024306	SPU_024306	none	very similar to SPU_008172, contains 2 RRM motifs \n
SPU_007072	SPU_007072	none	multiple LDLA/EGF2/LY3/EGF-TM\n
SPU_022132	SPU_022132	none	LDLA/EGF2/LY5/EGF/multiple LDLA/EGF2/LY3/EGF2-TM\n
SPU_009930	SPU_009930	none	LDLA3/EGF2/LY4 - TM\n
SPU_025837	SPU_025837	none	EGF/LY/EGF/LY2/EGF2/LY/LDLA/LY4/EGF2/CCP - TM \n \nnovel architecture - LRPs in other species do not have CCP\n
SPU_027625	SPU_027625	none	LY/LDLA/LY4/EGF/CCP - likely a fragment \n \nnovel architecture - LRPs in other species do not have CCP\n
SPU_021270	SPU_021270	none	highly similar to SPU_017749\n
SPU_017749	SPU_017749	none	Highly similar to SPU_021270\n
SPU_025763	SPU_025763	none	DNA mismatch repair: The Escherichia coli MutHLS system has been highly conserved throughout evolution. The eukaryotic pathway results in a specialization of MutS homologs that have evolved to play crucial roles in both DNA mismatch repair and meiotic recombination. In Saccharomyces cerevisiae, MSH4 (MutS homolog 4) is a meiosis-specific protein that is not involved in mismatch correction. This protein is required for reciprocal recombination and proper segregation of homologous chromosomes at meiosis I. Paquis-Flucklinger et al identified the human MSH4 homolog gene. The predicted amino acid sequence shows 28.7% identity with the S. cerevisiae MSH4 protein \n
SPU_017127	SPU_017127	none	SRCR(5). Possibly incomplete.\n
SPU_017194	SPU_017194	none	SRCR(2). Possibly incomplete.\n
SPU_017453	SPU_017453	none	SigPep-SRCR(2)-TM.  \n
SPU_017933	SPU_017933	none	SRCR(4). Possibly incomplete.\n
SPU_018252	SPU_018252	none	SRCR(8)-TM. Possibly incomplete.\n
SPU_018429	SPU_018429	none	SRCR(13). Probably incomplete. See SPU_018430.\n
SPU_018430	SPU_018430	none	SigPep-SRCR(5). Probably incomplete. See SPU_018429.\n
SPU_018508	SPU_018508	none	SigPep-SRCR(2)-TM.\n
SPU_018737	SPU_018737	none	SigPep-SRCR(3)-TM. \n
SPU_018939	SPU_018939	none	SRCR(5)-TM. Possibly incomplete.\n
SPU_018985	SPU_018985	none	SRCR(3). Possibly incomplete.\n
SPU_020917	SPU_020917	none	allele: SPU_021818\n
SPU_021818	SPU_021818	none	Short, 8kb contig containing 6 exons coding for N-terminal 254 aa; allele of full length SPU_020917 (17 exons).\n
SPU_004177	SPU_004177	none	3 exons in 8 kb contig coding for C-terminal 75 aa.\n
SPU_015969	SPU_015969	none	SPU_015969 may contain exons from two different genes. The N-terminal cds (182 aa) is homologous to cytochrome P450; the C-terminal cds is homologous to Exoc4. Alleles: SPU_001083, SPU_001084, SPU_022381.\n
SPU_001084	SPU_001084	none	SPU_001084 contains 15 exons coding for C-terminal 2/3 of Sp-Exoc4; SPU_001083 contains 8 exons coding for the N-terminus. Alleles: SPU_022381, SPU_015969.\n
SPU_001083	SPU_001083	none	SPU_001083 contains 8 exons coding for N-terminal 1/3 of Sp-Exoc4; SPU_001084 contains 15 exons coding for the C-terminus. Alleles: SPU_022381, SPU_015969. \n
SPU_022381	SPU_022381	none	SPU_022381 occupies the entire length of a short (13 kb) contig; contains 8 exons coding for the N-terminus of Sp-Exoc4. Alleles: SPU_001083, SPU_001084, SPU_015969.\n
SPU_019241	SPU_019241	none	SigPep-SRCR(4). Possibly incomplete.  Near 7TM SRCR [SPU_019239].\n
SPU_020544	SPU_020544	none	SPU_020544 appears to be truncated, containing 3 exons located near the edge of the contig, coding for the C-terminal 155 aa of Sp-Exoc6. SPU_013045 is a full-length allele, 19 exons.\n
SPU_019263	SPU_019263	none	SRCR(3). Probably incomplete. See SPU_019262.\n
SPU_001461	SPU_001461	none	Similar to PTPR phi, short or long insert variant...a member of the PTPR BHJOQ superfamily? \n
SPU_019291	SPU_019291	none	SRCR(2). Probably incomplete.\n
SPU_019370	SPU_019370	none	SRCR(5)-Sushi-TM. Possibly incomplete. Like gi|8547243|gb|AAF76316.1|AF228824_1 scavenger receptor cysteine-rich protein variant 1 [Strongylocentrotus purpuratus] and >gi|8547245|gb|AAF76317.1|AF228825_1 scavenger receptor cysteine-rich protein variant 2 [Strongylocentrotus purpuratus] from coelomocytes (as published by Z. Pancer).\n
SPU_019374	SPU_019374	none	SRCR(2)-HYR-SRCR(2). possibly incomplete. See SPU_019370.\n
SPU_019479	SPU_019479	none	SRCR(7). Possibly incomplete.\n
SPU_019826	SPU_019826	none	SRCR(5)-TM. Possibly incomplete.\n
SPU_020081	SPU_020081	none	SRCR(5). Possibly incomplete.\n
SPU_020161	SPU_020161	none	SRCR(3). Possibly incomplete.\n
SPU_018071	SPU_018071	none	Part of PTPRB??\n
SPU_020273	SPU_020273	none	GF(2)-SRCR(2). Possibly incomplete.\n
SPU_020650	SPU_020650	none	TIL-SRCR(4). Possibly incomplete. Similar to >gi|8547241|gb|AAF76315.1|AF228823_1 scavenger receptor cysteine-rich protein [Strongylocentrotus purpuratus] published by Z. Pancer.\n
SPU_020822	SPU_020822	none	SigPep-SRCR(4)-TM.\n
SPU_020868	SPU_020868	none	SigPep-SRCR(2). Possibly incomplete.\n
SPU_021124	SPU_021124	none	SigPep-SRCR(6)-TM.\n
SPU_021348	SPU_021348	none	SigPep-SRCR(3). Possibly incomplete.\n
SPU_021457	SPU_021457	none	SRCR(2). Probably incomplete.\n
SPU_021509	SPU_021509	none	SigPep-SRCR(6). Possibly incomplete.\n
SPU_021691	SPU_021691	none	SigPep-SRCR(2). Probably incomplete. See SPU_021692.\n
SPU_004193	SPU_004193	none	This Glean sequence corresponds to 2 different genes: the complete sequence for Hhat and the 3' portion of PTPRscav, a novel PTPR, containing the PTP domain.\n
SPU_021692	SPU_021692	none	SRCR(4)-TM. Probably incomplete. See SPU_021691.\n
SPU_021782	SPU_021782	none	SRCR(3). Possibly incomplete.\n
SPU_008670	SPU_008670	none	Partial sequence.  Contains one protein tyrosine phosphatase domain.\n
SPU_021890	SPU_021890	none	SRCR(3). Probably incomplete.\n
SPU_021987	SPU_021987	none	SRCR(5)-TM. Probably incomplete. See SPU_021988.\n
SPU_021988	SPU_021988	none	SigPep-SRCR(5). Probably incomplete. See SPU_021987.\n
SPU_022000	SPU_022000	none	Somat-SRCR-Somat-SRCR-CUB.\n
SPU_028592	SPU_028592	none	Homologous to PTPN 14/21.  Also similar to the novel Drosophila protein PTPpez.\n
SPU_022085	SPU_022085	none	SRCR(5)-TM. Possibly incomplete.\n
SPU_022145	SPU_022145	none	SigPep-SRCR(6). Probably incomplete. See SPU_022146, 22147, 22148, 22149, 22150, 22151.\n
SPU_022146	SPU_022146	none	SRCR(5). Probably incomplete. See SPU_022145, 22147, 22148, 22149, 22150, 22151.\n
SPU_022147	SPU_022147	none	SRCR(3). Probably incomplete. See SPU_022145, 22146, 22148, 22149, 22150, 22151.\n
SPU_022148	SPU_022148	none	SRCR(4). Probably incomplete. See SPU_022145, 22146, 22147, 22149, 22150, 22151.\n
SPU_022149	SPU_022149	none	SRCR(8). Probably incomplete. See SPU_022145, 22146, 22147, 22148, 22150, 22151.\n
SPU_022150	SPU_022150	none	SRCR(5). Probably incomplete. See SPU_022145, 22146, 22147, 22148, 22149, 22151.\n
SPU_003149	SPU_003149	none	Partial sequence.\n
SPU_003479	SPU_003479	none	Similar to CDKN3\n
SPU_002928	SPU_002928	none	Partial sequence. Homologous to human TPIP/TPTE.\n
SPU_003142	SPU_003142	none	Partial sequence. PTENb...See also SPU_023882 (identical gene) and SPU_009522 (PTENa).\n
SPU_022991	SPU_022991	none	Identical to 11961 (11961 is missing aa 73-109) except for 5'end \n
SPU_028117	SPU_028117	none	AA 28-89 IDENTICAL TO SPU_020429, but 5' and 3' end differ\n
SPU_011467	SPU_011467	none	Identical to 06006 (except 06006 is missing aa 1-69)\n
SPU_006389	SPU_006389	none	same as SPU_006388 (IDENTICAL except at 3' END)\n
SPU_003228	SPU_003228	none	almost IDENTICAL TO 17155\n
SPU_017155	SPU_017155	none	almost IDENTICAL TO 03228\n
SPU_018282	SPU_018282	none	identical to 13863\n
SPU_018645	SPU_018645	none	PREDICTED: Strongylocentrotus purpuratus similar to RNA binding motif protein 13 (LOC586606), mRNA\n
SPU_007493	SPU_007493	none	PREDICTED: Strongylocentrotus purpuratus similar to Beta-amyloid-like protein precursor (LOC585391), mRNA \n
SPU_017861	SPU_017861	none	involved in meiotic recombination found originally in yeast,  \n \nDomain search \ngnl|CDD|16546 cd00223, Topo6_Spo, DNA topoisomerase VI subunit A. Homologous to type II topoisomerase, meiotic recombination factor, Spo11; generates double stranded breaks that initiate homologous recombination during meiosis. Subunit A forms homodimers which contain a deep groove that spans both protomers; the dimer architecture suggests that DNA is bound in the groove across the A subunit interface, and that the two monomers separate during DNA transport.. \n
SPU_015850	SPU_015850	none	PREDICTED: Strongylocentrotus purpuratus similar to CUG triplet repeat,RNA-binding protein 2 (LOC576912), partial mRNA.\n
SPU_022717	SPU_022717	none	non-identical duplicate of SPU_027144, _02836. A portion of the overlap is identical, but these do appear to be distinct genes.\n
SPU_006197	SPU_006197	none	This is the N terminal part of the protein. The middle portions are in SPU_027144 and _02836\n
SPU_006784	SPU_006784	none	Non-identical to other PI3K p110 GLEANs: 22717, 06197, 27144, 02836. This gene may be full length.\n
SPU_017593	SPU_017593	none	PREDICTED: Strongylocentrotus purpuratus similar to baculoviral IAP repeat-containing 2 (LOC583399)\n
SPU_004809	SPU_004809	none	partial sequence\n
SPU_028521	SPU_028521	none	PREDICTED: Strongylocentrotus purpuratus similar to Elongation factor G 2, mitochondrial precursor (mEF-G 2) (Elongation factor G2) (LOC584381)\n
SPU_010910	SPU_010910	none	partial sequence; non-identical to other PI3K p110 proteins. This sequence seems to be internal to 27144 and 02836. This sequence may be the C terminus that corresponds to the N terminus in SPU_005073\n
SPU_013194	SPU_013194	none	contains SINA domains \n \ngnl|CDD|26029 pfam03145, Sina, Seven in absentia protein family. The seven in absentia (sina) gene was first identified in Drosophila. The Drosophila Sina protein is essential for the determination of the R7 pathway in photoreceptor cell development: the loss of functional Sina results in the transformation of the R7 precursor cell to a non- neuronal cell type. The Sina protein contains an N-terminal RING finger domain pfam00097. Through this domain, Sina binds E2 ubiquitin-conjugating enzymes (UbcD1) Sina also interacts with Tramtrack (TTK88) via PHYL. Tramtrack is a transcriptional repressor that blocks photoreceptor determination, while PHYL down-regulates the activity of TTK88. In turn, the activity of PHYL requires the activation of the Sevenless receptor tyrosine kinase, a process essential for R7 determination. It is thought that thus Sina targets TTK88 for degradation, therefore promoting the R7 pathway. Murine and human homologues of Sina have also been identified. The human homologue Siah-1 also binds E2 enzymes (UbcH5) and through a series of physical interactions, targets beta-catenin for ubiquitin degradation. Siah-1 expression is enhanced by p53, itself promoted by DNA damage. Thus this pathway links DNA damage to beta-catenin degradation. Sina proteins, therefore, physically interact with a variety of proteins. The N-terminal RING finger domain that binds ubiquitin conjugating enzymes is described in pfam00097, and does not form part of the alignment for this family. The remainder C-terminal part is involved in interactions with other proteins, and is included in this alignment. In addition to the Drosophila protein and mammalian homologues, whose similarity was noted previously, this family also includes putative homologues from Caenorhabditis elegans, Arabidopsis thaliana..\n
SPU_001697	SPU_001697	none	This gene prediction is only N-terminus of GAT and should be combined with SPU_001698.  \n
SPU_000102	SPU_000102	none	SPU_004413 is a perfect duplicate of this glean (protein level)\n
SPU_004413	SPU_004413	none	This is a perfect duplicate of SPU_000102 (protein level)\n
SPU_021153	SPU_021153	none	partial protein; SPU_019900 is a nearly perfect internal duplicate\n
SPU_007693	SPU_007693	none	Groups with caspase 9 subfamily by neighbor joining of multiple sequence alignment.  Appears to lack an N-terminus.\n
SPU_008540	SPU_008540	none	Groups with caspase 8 subfamily in neighbor-joining of multiple sequence alignment.\n
SPU_009497	SPU_009497	none	Very high similarity to SPU_011339, SPU_022941, SPU_026645, SPU_021561, and SPU_026743.\n
SPU_011339	SPU_011339	none	Very high similarity to SPU_009497, SPU_022941, SPU_026645, SPU_021561, and SPU_026743.\n
SPU_011471	SPU_011471	none	Nearly identical to C-terminus of SPU_017523, SPU_009653, and also with significant similarity to SPU_019822, SPU_009497, SPU_011339.  The tiling expression data indicate that some exons are missing from this model.\n
SPU_018137	SPU_018137	none	Highly similar to SPU_018315, may be a duplication.  See annotation to that gene.  It is not clear where the N-terminus of this model is.\n
SPU_018315	SPU_018315	none	Most similar to SPU_018137; the latter may be a duplication of the C-terminus of this gene, possibly an artifact.  This model contains two caspase domains that are not identical, and may represent a tandem duplication.  N-terminus is ambiguous (no Methionine).\n
SPU_018506	SPU_018506	none	Groups with the caspase 9 subfamily by neighbor joining of multiple sequence alignment.\n
SPU_020623	SPU_020623	none	Groups with caspase 8 subfamily by neighbor joining of multiple sequence alignment.\n
SPU_022177	SPU_022177	none	Similar to SPU_008540, SPU_019839, SPU_016039.\n
SPU_014473	SPU_014473	none	partial sequence. \nSPU_026766 is a non-identical duplicate \n
SPU_019900	SPU_019900	none	SPU_021153 is a nearly identical duplicate that is longer on both ends.\n
SPU_023504	SPU_023504	none	SPU_008853 is a non-identical duplicate \n
SPU_010284	SPU_010284	none	#\npartial sequence; this GLEAN is a nearly identical internal duplicate of SPU_000151\n
SPU_008853	SPU_008853	none	SPU_023504 is a non-identical duplicate\n
SPU_025578	SPU_025578	none	SPU_016326 is a partial sequence of this one\n
SPU_016326	SPU_016326	none	SPU_025578 is a partial sequence of this one\n
SPU_026766	SPU_026766	none	SPU_014473 is a non-identical duplicate\n
SPU_019045	SPU_019045	none	this is a partial sequence of SPU_004954\n
SPU_026272	SPU_026272	none	not clear which RTK this is\n
SPU_018556	SPU_018556	none	PREDICTED: Strongylocentrotus purpuratus similar to peroxisome biogenesis factor 1 (LOC592476)\n
SPU_000400	SPU_000400	none	PREDICTED: Strongylocentrotus purpuratus similar to folliculin isoform 1 (LOC587400)\n
SPU_007667	SPU_007667	none	This GLEAN may code for either the RBM16 or SFRS15 protein. \n
SPU_006990	SPU_006990	none	SPU_001101 matches the first half of RBM19. SPU_006990 matches the latter half.\n
SPU_003581	SPU_003581	none	There appear to be 2 gene sequences in this Glean site.  The first one corresponds to a partial sequence for a receptor protein tyrosine phosphatase (type Mu?) and the second one is similar to Alpha-2,6-sialyltransferase.\n
SPU_001558	SPU_001558	none	DNA polymerase lambda mediates a back-up base excision repair activity by similarity. \n
SPU_003548	SPU_003548	none	The zf-C4 domain is located in SPU_003547.\n
SPU_018631	SPU_018631	none	#\nUsing HsSERPINE2 in BLASTP search of the GLEAN3 models yielded 15 genes. Each was analyzed by SMART and the top 12 hits were found to have a SERPIN domain and were annotated Sp-serpin-like 3 through 12 (2 were already annotated by A.Cameron). BLASTP searches of the GLEAN3 models with other vertebrate Serpins yielded the same top 12 hits.\n
SPU_018630	SPU_018630	none	Using HsSERPINE2 in BLASTP search of the GLEAN3 models yielded 15 genes. Each was analyzed by SMART and the top 12 hits were found to have a SERPIN domain and were annotated Sp-serpin-like 3 through 12 (2 were already annotated by A.Cameron). BLASTP searches of the GLEAN3 models with other vertebrate Serpins yielded the same top 12 hits.\n
SPU_018196	SPU_018196	none	Using HsSERPINE2 in BLASTP search of the GLEAN3 models yielded 15 genes. Each was analyzed by SMART and the top 12 hits were found to have a SERPIN domain and were annotated Sp-serpin-like 3 through 12 (2 were already annotated by A.Cameron). BLASTP searches of the GLEAN3 models with other vertebrate Serpins yielded the same top 12 hits.\n
SPU_020278	SPU_020278	none	This gene model could be incomplete. It is located at the end of a scaffold. \nUsing HsSERPINE2 in BLASTP search of the GLEAN3 models yielded 15 genes. Each was analyzed by SMART and the top 12 hits were found to have a SERPIN domain and were annotated Sp-serpin-like 3 through 12 (2 were already annotated by A.Cameron). BLASTP searches of the GLEAN3 models with other vertebrate Serpins yielded the same top 12 hits.\n
SPU_006446	SPU_006446	none	SPU_023112 encodes first part of the gene and SPU_006446 the latter half.\n
SPU_002644	SPU_002644	none	Alternative splicing; DNA damage; DNA repair; Lyase; Nuclear protein by similarity.\n
SPU_009715	SPU_009715	none	DNA damage; DNA repair; DNA replication; DNA synthesis; DNA-binding; DNA-directed DNA polymerase; Magnesium; Metal-binding; Mutator protein; Nuclear protein by similarity.\n
SPU_012887	SPU_012887	none	Anti-oncogene; ATP-binding; Cell cycle; Disease mutation; DNA damage; DNA repair; DNA-binding; Nuclear protein; Nucleotide-binding; by similarity.\n
Sp-CD59/Sca2-like1	SPU_030142	none	This model was created based on alignments and reciprocal BLASTing of a FgeneshAB/++ prediction with CD59 and Sca2 (two closely related genes in sequence and domain structure). Given the simple structure/sequence of these genes, and since this annotation was based purely on bioinformatic evidence, we have decided to name this gene CD59/Sca2-like.\n
Sp-CD59/Sca2-like2	SPU_030143	none	This model was created based on alignments and reciprocal BLASTing of a FgeneshAB/++ prediction with CD59 and Sca2 (two closely related genes in sequence and domain structure). Given the simple structure/sequence of these genes, and since this annotation was based purely on bioinformatic evidence, we have decided to name this gene CD59/Sca2-like.\n
SPU_019819	SPU_019819	none	A segment of this model is duplicated on SPU_025718. \n \nNTH1 is a DNA glycosylase that excises 5-formyluracil, \n5-hydroxymethyluracil and thymine glycol in human cells\n
SPU_027975	SPU_027975	none	Role in DNA mismatch repair by similarity.\n
SPU_017492	SPU_017492	none	Transcriptome data indicates that Glean may have falsely predicted the following exons: 1.\n
SPU_025239	SPU_025239	none	also see glean model 18861.\n
SPU_003547	SPU_003547	none	Ligand binding domain is in SPU_003548\n
SPU_022735	SPU_022735	none	SPU_002789 has the first part of the PPIE gene and SPU_022735 has the latter half. There appears to be a small overlap between the two predictions.\n
SPU_002789	SPU_002789	none	SPU_002789 has the first part of the PPIE gene and SPU_022735 has the latter half. There appears to be a small overlap between the two predictions.\n
SPU_000875	SPU_000875	none	e-val for NP_878906 = e-136. \nThis peptide is identical in length (476aas) and sequence to SPU_010081 on scaffold 6159. \nAnnotated by RA Obar, RL Morris and AM Musante.\n
SPU_002954	SPU_002954	none	e-val for NP_060111= 8e-123, [Homo sapiens].  \nKinesin-2 family member. \nAnnotated by RA Obar, RL Morris, SC Cummings, EA Kovacs, and EJ Jin.\n
SPU_004939	SPU_004939	none	#\ne val for NP_085118 = 1e-69. \nProbably incomplete based on its size (390 AA) \nSame length but different sequence as SPU_012452. \nAnnotation by RA Obar, RL Morris, BA Jeffrey,  J Bhatia, AM Musante.\n
SPU_005509	SPU_005509	none	e val for NP_008985= 6e-42; kinesin family member 3A [Homo sapiens]. \nAnnotation by RA Obar, RL Morris, LE Shorey, SA Tower, and EJ Jin.\n
SPU_005741	SPU_005741	none	e val = e-148 for NP_006836, KIF2C [Homo sapiens]. \nAnnotated by RA Obar, RL Morris, R Yen,  JM Fess, and AP Rawson.\n
SPU_006602	SPU_006602	none	The e value was 0 for CAC20443/Q9H193. "KINESIN-13A2, ACCESSION Q9H193, swissprot locus Q9H193_HUMAN"   \nMotor domain is KIF1-like. \nKinesin-3 family. \nAnnotated by RA Obar, BD Dyer, RL Morris, and B Rossetti. \n \n \n-BDD \n \n  \n \n
SPU_007505	SPU_007505	none	 e = 0 for NP_659464. \nAnnotated by RA Obar, BD Dyer, RL Morris, and B Rossetti. \n \n
SPU_010570	SPU_010570	none	e-val for NP_005724= 5e-93, [Homo sapiens]. \nAnnotated by RA Obar, RL Morris, SC Cummings, EA Kovacs, and B Rossetti.\n
SPU_011353	SPU_011353	none	e val = 3e-61 for NP_071396, KIF13A [Homo sapiens]. \nAnnotated by RA Obar, RL Morris, R Yen, JM Fess, and B Rossetti.\n
SPU_012452	SPU_012452	none	e val = 1e-49 for NP_085118. \nSame length but different sequence as SPU_004939. \nAnnotated by RA Obar, BD Dyer, RL Morris, AM Musante.\n
SPU_015247	SPU_015247	none	May play a role in nucleotide excision repair (NER) and RNA polymerase II (POL II) transcription by interacting with ERCC2/XPD and ERCC3/XPB helicases, both subunits of NER-transcription factor TFIIH (by similarity).\n
SPU_014451	SPU_014451	none	e val for NP_002254 = 3e-34 \nExact match of XP_796037: PREDICTED: similar to kinesin family member C1 [Strongylocentrotus purpuratus]. Explained by Scaffold3656. Contains 8 exons. \nAnnotation by RA Obar, RL Morris, BA Jeffrey, J Bhatia, B Rossetti, EJ Jin, KM Judkins\n
SPU_015354	SPU_015354	none	e val = 6e-28 for NP_004789, KIF3B [Homo sapiens]. \nContains partial kinesin motor domain, N terminus missing,  when compared to human CENP-E motor domain.  Likely to be a fragment. \nAnnotated by RA Obar, RL Morris, R Yen, JM Fess, and EJ Jin.\n
SPU_016067	SPU_016067	none	e val for CAI12999 = 3e-84. \ne val for NP_006836 = e-52; kinesin family member 2C [Homo sapiens]. \nAnnotation by RA Obar, RL Morris, SA Tower, and B Rossetti.\n
SPU_017809	SPU_017809	none	accession number (Swiss-Prot): Q02224 \nLikely a fragment based on its short length. \ne val for Q02224 is 3e-25 \nAnnotation by RA Obar, RL Morris, BA Jeffrey, KM Judkins\n
SPU_017289	SPU_017289	none	e val for AAH73878 = 2e-45. \ne val for NP_002254 = 4e-47. \nAnnotated by RA Obar and RL Morris, KM Judkins\n
SPU_021586	SPU_021586	none	Hydrolysis of the deoxyribose N-glycosidic bond to excise 3-methyladenine, and 7-methylguanine from the damaged DNA polymer formed by alkylation lesions (by similarity).\n
SPU_028109	SPU_028109	none	SPU_018262 sequence is a close but not identical match of the model \nalso \nSPU_014629 is a duplicate of the N-terminal region of the model \nalso \nSPU_018263 contains a sequence match to mid region section of the model\n
SPU_018388	SPU_018388	none	e val = e-109 for NP_904325, KIF1B [Homo sapiens]  \nsee also SPU_018764, also likely ortholog of KIF1B isoform alpha. \nAnnotated by RA Obar, RL Morris, R Yen, and IJ Strachan\n
SPU_018533	SPU_018533	none	e val = 8e-147 against AAH35896.  \ne val = e-150 against Q8IUN3_HUMAN Q8IUN3 (UniProtKB/TrEMBL Accession Number) \nAnnotation by RA Obar, RL Morris, AL Silverio, BJ Chick, and EJ Jin\n
SPU_018764	SPU_018764	none	e val for NP_904325 is 8e-109. \nAnnotation by RA Obar, RL Morris, BA Jeffrey, and IJ Strachan\n
SPU_019165	SPU_019165	none	e val 0 against AAG18582/NP_999659, KRP110 [Strongylocentrotus purpuratus]. \ne val for NP_612565=e-161; kinesin family member 23 Isoform 1 [Homo sapiens].   \nAnnotation by RA Obar, RL Morris, LE Shorey, SA Tower, and EJ Jin.\n
SPU_019775	SPU_019775	none	e value = 5e-37 for NP_612433; kinesin family member 12 [Homo sapiens]. \nAnnotation by R. A. Obar and R.L. Morris\n
SPU_020767	SPU_020767	none	e val for NP_056069=e-172; kinesin family member 13B [Homo sapiens]. \nAnnotation by RA Obar, RL Morris, SA Tower\n
SPU_023560	SPU_023560	none	Q4VXC4 is UniProtKB accession number. \ne val for BAE02544 = 8e-106, and for Q4VXC4 = e-104. \nsee also: SPU_018388, SPU_018764, SPU_020634, which are all KIF1B-like. \nAnnotation by: RA Obar, RL Morris, BA Jeffrey, and IJ Strachan\n
SPU_022982	SPU_022982	none	e val = 8e-29 for NP_004789, KIF3B [Homo sapiens]. \nAnnotated by RA Obar, RL Morris, AM Musante, and EJ Jin\n
SPU_021317	SPU_021317	none	e val for NP_999656 is 0.0 \nAnnotation by RA Obar, RL Morris, BA Jeffrey, and AP Rawson\n
SPU_022160	SPU_022160	none	e val for NP_112494 = 3e-120 \nAnnotated by RA Obar and RL Morris, KM Judkins\n
SPU_026503	SPU_026503	none	See SPU_026503. \nThe KRP170 gene spans SPU_020414 (Scaffold1612/Scaffoldi17703) and SPU_026503 (Scaffold56862/Scaffoldi4507).  The mRNA was published as Chui,K.K., Rogers,G.C., Kashina,A.M., Wedaman,K.P., Sharp,D.J., Nguyen,D.T., Wilt,F. and Scholey,J.M.  "Roles of two homotetrameric kinesins in sea urchin embryonic cell division."   J. Biol. Chem. 275 (48), 38005-38011 (2000).  The GenBank entry for this gene is gi|10697491|gb|AF292395.2|. \nAnnotation by RA Obar, RL Morris, AL Silverio, BJ Chick, EJ Jin.\n
SPU_027784	SPU_027784	none	e val = 2e-128 for CAI95220.  e val = e-130 for NP_055889, KIF1B_beta [Homo sapiens]. \nAnnotation by R.A. Obar, R.L. Morris, and AP Rawson 013106\n
SPU_027552	SPU_027552	none	   e val = e-103 for NP_060046, KIF27 [Homo sapiens]. \n   Annotation by R.A.Obar and R.L. Morris, BJ Rossetti and AP Rawson 013106\n
SPU_012960	SPU_012960	none	SPU_012960 is the N-terminal match to MSH6 mouse. \nSPU_028261 is the C-terminal match to MSH6 mouse. \nSPU_015322 is the full match to the Query used.\n
SPU_023112	SPU_023112	none	SPU_023112 encodes first part of the gene and SPU_006446 the latter half.\n
SPU_009456	SPU_009456	none	SPU_009456 is almost complete duplicate prediction for SPU_010689 and contains the first half of the gene. SPU_015881 contains the latter half.\n
SPU_015881	SPU_015881	none	SPU_009456 is almost complete duplicate prediction for SPU_010689 and contains the first half of the gene. SPU_015881 contains the latter half.\n
SPU_003050	SPU_003050	none	The N-terminal sequence of the mouse RAD52 used for Query of the GLEAN3 models is covered by combining \n \nSPU_003050  \nwith \nSPU_006081  \n \n
SPU_008906	SPU_008906	none	regions of SPU_008906 are present in SPU_026677 and SPU_028391 also.\n
SPU_026263	SPU_026263	none	A region of SPU_026263 is present in SPU_023612 also.\n
SPU_010556	SPU_010556	none	Regions of SPU_010556 are present in SPU_000741 (covers 299-621) and SPU_004597 (covers 1-203) also.\n
SPU_002282	SPU_002282	none	SPU_002282 covers the C-terminal region of Rev3. \nSPU_002281 covers the N-terminal region of Rev3.\n
SPU_002281	SPU_002281	none	SPU_002281 covers the N-terminal region of Rev3. \nSPU_002282 covers the C-terminal region of Rev3.\n
SPU_026626	SPU_026626	none	This prediction includes only N-terminus and should be combined with SPU_026627.  \n
SPU_017523	SPU_017523	none	Highly similar to SPU_009653, SPU_011471, SPU_013801.  Presence of a sushi domain and patchiness of scaffold sequence suggests that this model might artifactually fuse two different genes.\n
SPU_021133	SPU_021133	none	Amino acid sequence highly similar but not identical to that of SPU_001683; may be haplotype based on high similarity of intron sequences.  Model incomplete, lacking many exons both 5' and 3'.  (Or, may be a pseudogene.)\n
SPU_010755	SPU_010755	none	Partial sequence.\n
SPU_011070	SPU_011070	none	Partial sequence.  This contains the serine/threonine phosphatase, family 2C, catalytic (Pp2Cc) domain .  See also SPU_011294.\n
SPU_011705	SPU_011705	none	Identical to SPU_017161\n
SPU_017161	SPU_017161	none	See also SPU_011705.\n
SPU_015869	SPU_015869	none	The beginning of the gene is missing (probably in the gap between contigs in the scaffold). The end of the gene is probably on another scaffold as well.\n
SPU_013821	SPU_013821	none	SPU_002088 appears to be identical.\n
SPU_005107	SPU_005107	none	UTRs were added based on EST and tiling data\n
SPU_006579	SPU_006579	none	In light of evidence from hybridization array, the Fgenesh gene prediction model does a better job than the Glean3 model, though the 3' UTR predicted by Glean3 appears accurate and has been included here.\n
SPU_008840	SPU_008840	none	Shows similar genbank hits to SPU_009651, though the two sequences show little similarity to each other.  Not highly confident in this annotation.\n
SPU_009651	SPU_009651	none	Unclear what the exact homolog in non-urchins is.  Clusters tightly with SPU_008840 (annotated as SpTFIIA alpha).  \n
SPU_005661	SPU_005661	none	Little expression evidence for exons one, two and six.  But this is likely the correct gene and for at least the other exons shows good evidence for expression.  \n
SPU_002637	SPU_002637	none	Putative conserved RPB5 domain detected in BlastP search. \nAll exons are supported by tiling path array data. \nHowever, query of Poustka's database does not retrieve significant hits.\n
SPU_014575	SPU_014575	none	This GLEAN is the C terminal part of DKG-beta; the N terminal part is in SPU_014574. The sequence has been modified below to what is probably correct: the first few amino acids were deleted, while a spurious, non-homologous internal sequence was sectioned off and indicated as not real. \nDNA: \nGATTCAAGGAAAACACTCAAAGATGTTTTAGAAGAATTCCATGGAAATGGAGTATTATCCAAATACAACCCCGAGCAGCCCATCAACTATGAAGGCTTCAAGCTCTTCATGGAGACTTACCTTGACGTTGACATGCCAGAGGATCTCTGTCGACACCTCTTCCTCTCCTTCGTCAAGAAGACACCTTCC \n \n{GTGTCCAGGTCTACAAGTAGAGAGAAGGAGAGGGGCGCCATCTATGATGTCGGCAGTGCCGTCGCTACCGTGACCACCACCGCCGCGTGTGCCGCCATCACAGGGAGCAGCTCGCCGATTGAAATCGCTGGGCAGACGAGCGGCGGCAACAACAACACAGAAGGAAATGAGCTCACAAACGGTCATGGGCAAGAGTCAGGGGTTGTCACAAGAGGGAGCCCCACCCAGTCACGGAGTTCATCCAAGAAGTCCAATGCCTCTAACGGAACCCCTTCAAGTCATTCCATTCGCACCGATGCACTCAGGGTCGAACAAACCACTAATGATCATGGAGATATCAAGAAAGGGAAATCCATGGAGAACAATAGGCGGAAAAAGACAGCACTCTTCACCGCCCTGCGCAAG} NOT REAL! \n \nACTAAAAAGAATACCAAAGACACGGACTCACTAGGGGCATTGGCCGCCTCCCGGGGGTCACTCGCACATCCAGATGTCACCAACCAGGTCGTCTACATGAAAGACATCGTCTGCTACTTGTCGTTGCTAGAGGGCGGCAAACCAGAGGATAAATTAGAATTCATGTTCAGGCTTTATGACACAGATGACAATGGTATCCTTGACAGCAGTGAGCTAGATTGTATAGTAAACCAGATGATGCATGTAGCTGAGTATCTAGGCTGGGATGTCACTGAATTAAGACCAATCTTACAAGATATGATGATAGAGATTGACTTTGATTCAGATGGGACCGTATCGTTAGAAGAATGGATACGTGGCGGCATGACAACCATTCCTCTCTTAGTTCTCCTTGGCCTTGAATCAAATGTGAAAGATGATGGCAGTCATGTATGGAGGTTAAAACACTACAACAAACCAGCCTATTGTAACCTCTGCCTCAATCTCCTTGTAGGGTTTGGAAAACAAGGTCTATCATGTACATTCTGCAAATACACGGTTCATGAACGCTGTGTACAACGTGCCCCAGCCTGTTGCATTAGTACATATGTCAAGTCAAAACGGACAACGAATATCATGAACCATCACTGGGTAGAAGGTAACAGCCCGGGTAAATGTGACCGGTGTAAGAAGTCGATTAAGAGCTACAACGGTATTACTGGCCTTCACTGCAGATGGTGCAAGATAACGCTCCATAACAAATGTGCCTCCCACGTCAAACCAGAGTGTAACATGGGAGAATTCCGAGACCACATTCTACCTCCTACGGCTATCTGCCCAGCAGTTCTGGTAAGTTTAGTCACATTTTAA \n \nprotein: \nDSRKTLKDVLEEFHGNGVLSKYNPEQPINYEGFKLFMETYLDVDMPEDLCRHLFLSFVKKTPS \n 
\n{VSRSTSREKERGAIYDVGSAVATVTTTAACAAITGSSSPIEIAGQTSGGNNNTEGNELTNGHGQESGVVTRGSPTQSRSSSKKSNASNGTPSSHSIRTDALRVEQTTNDHGDIKKGKSMENNRRKKTALFTALRK} NOT REAL! \n \nTKKNTKDTDSLGALAASRGSLAHPDVTNQVVYMKDIVCYLSLLEGGKPEDKLEFMFRLYDTDDNGILDSSELDCIVNQMMHVAEYLGWDVTELRPILQDMMIEIDFDSDGTVSLEEWIRGGMTTIPLLVLLGLESNVKDDGSHVWRLKHYNKPAYCNLCLNLLVGFGKQGLSCTFCKYTVHERCVQRAPACCISTYVKSKRTTNIMNHHWVEGNSPGKCDRCKKSIKSYNGITGLHCRWCKITLHNKCASHVKPECNMGEFRDHILPPTAICPAVLVSLVTF\n
SPU_023664	SPU_023664	none	internal exon appears to be incorrect; indicated in sequences below: \nATGCCGAAGACGTACAAGCTGGAGTACTTTAACCTTCGTGGTCGCGCGGAATTGTCCCGTCTGCTTATGGCACAAGCTGATATGAAGTATGAAGATGTTCGGCTCTCATTCGCCGATTGGGGGACAGCAAAGGGAAATCAAGATAAGTACCCCTTGGGATTTCTGCCAGTGCTGGAGGAAGATGGAAAAGTCATTTCACAGAGTATGACCATCGCTCGTCATCTGGCGAGAGAGTTTGGAATGGCGGGACAGAATGAAGAAGAAATGGTAATGATTGATATGATTTGTGAGACGTGTAACGAATTGCTGAGCAAGATGATTGAGATAGCCCTGATGCAAGGCGAAGCC \n \n{AAGCCAAACGCGGTGAAAGAGTTTACGGAAGTCAAATCTCTACTCCCTATGAAAAATATTACAACATGGCTGGAGATGAACGGCAAGGGAAATGGATACTTCGTGGGCGAA} not real \n \nAAGATGTCGGTGGCAGACCTTTTCGTCTTCAGCATCATGGAACACCTCTCTGGAAAATACCCAAATATCCTCACCAAGCAACCCCTTCTCCAAGCCTTCTATGAGAGAATGATGAAGGAACCCAAGTTAGCTGCCTGGATCGTGAAGCGTCCAAATCAAGATATAGATATCTAA \n \nProtein: \nMPKTYKLEYFNLRGRAELSRLLMAQADMKYEDVRLSFADWGTAKGNQDKYPLGFLPVLEEDGKVISQSMTIARHLAREFGMAGQNEEEMVMIDMICETCNELLSKMIEIALMQGEA \n \n{KPNAVKEFTEVKSLLPMKNITTWLEMNGKGNGYFVGE} not real  \n \nKMSVADLFVFSIMEHLSGKYPNILTKQPLLQAFYERMMKEPKLAAWIVKRPNQDIDI\n
SPU_005626	SPU_005626	none	Only the 5' part of this GLEAN is homologous to this kinase; this is noted in the sequences below. The C terminal protein sequence does not independently produce a significant BLAST hit.  \nATGAACATTGATGAAAAGCTTACAGCAAAGCAACGTGAGGAAAGCAAAACTAAGATTCGGCATAAAGCAGATGGCCGTGTGATGGTGGCAAAGATCGGCAAGCAGAAGATATGGAGAGCCAACCAGCACAAAATGATCCAAGAACTTGAGCTTCTCAATAAACTACAACACCCTAACGTTGTCAGATATATGGGAGCTTGTGTAAAAGATGGCCATATCCATCCTGTACTCGAGTATGTATCTGGTGGATGCTTGACGGACATCTTGGCTGATGAGAGCTTGGCGTTGTCATGGAGACAGAAGGGTGACTTAGCAACAGACATCGCTCGTGGAATGACCTACCTCCACTCACAGAATGTGTGTCATCGAGATCTCACGTCAGCGAATTGCCTCGTTCGTCAAAAGCCGAATAATGTCCTCGAGGCCATACTCACCGACTTCGGCCTCGCTCGTGTGCTCGGCTGCATGCCCGACCCTCCTCCAAACTCTCCCAGAACGCCCGAGTCTCCAGAACCGGACATAATCGACGCACCGAACGGTGGTCCGATGTTGCCTCGGATACCGTCGGCCTGCATGGACGTGCCTCGGAAGATGTCGGTTGTCGGCACCGCGTTCTGGATGGCTCCCGAAGTTTTACGAGGAGAGGAATACACTCGCCAAGTGGATGTTTTCTCGTTTGGTATCGTGGTATGCGAGATTGTAGCAAGAATAACGGCCAATCCAGACGACCTCCCGAGGACTGGGAAGTTCGGTCTCGACCTGCAGCTTTTCAAAGAGAAATGTCCAGGGATACCTGAACCCTTCCTACAGATCGCTGAAGACTGTTGTTCCATGGATCCCAGGGATCGGCCGGTTTTCGCCGAGCTCGTCCGCCGTTTCGAGATCATCCGCGGTACGTTGGACACAGAAACGAGCGACACAACGTGTTACGATGTCAACCTTACGGACATCATCAGGACAAACGATTCGGACGATGATGACGATGATTGTAGCTTTGGGTTTCAGTTTCAAACGGATCTGGATGACAAACGCAGAAGTGGACGAACAAGATTGCGAGAAGTCCTTTGTGGATGTTCTAAAGGA \n 
\n{TGGATGACAAACGCAGAAGTGGACGAACAAGATTGCGAGAAGTCCTTTGTGGATGTTCTAAAGGAGTACGCTTATGCTGCAAGACAGTATTTCAGGGTTGTTACGGGTTTCATCGTCTGCTTGTTGCCGATGCTTTCACTGTGGATGTATTGTGATTTGAACAATGTCCTCCTGTGGACCTCATTAATATATGCAAGCATAGGGTTGGTCTTAGAGAACTCGACAAGATGTGTAGAAGTGCTTCATTCTTCTCAATCAATCTTCAGGACTTTGTTCAATCTAATATCCAGATTATTCTCCAGACTGTGGAATGTTGTATCGCTTTTGTTGCAAAGAGTGCATTGTACGAGATCCGCAGGCTCGGACCTGAACGAAAACGTGACGTACCCAAATCAAAATGGCGGTCCGACCAAATCGTTGCAACATGGTGATACCCCCAGCGAGGTCCTGAGGAATACATCAACCCGCGTTGCGGACTTGGTCCCAATCTTGAAGAATCGTCAGAAAGGTCCAGGACCCGGCGCCGAGGATCCAGAGTCGCAGGGTGCAAGAAAGAAGACGCTGCTTGCAAACGAATTGGAAGAAACAGAACTGGATAAGTTACATAATCCAAACATTAACAGGGTATTGCTGAATGGCCGCATGTCGCAAAAGCGAGTCAGCTTTTCTTTACAAAACAGTATTGACCGAGATGAAGGTGACCCAAATCCATGA} non-homologous \n \nProtein: \nMNIDEKLTAKQREESKTKIRHKADGRVMVAKIGKQKIWRANQHKMIQELELLNKLQHPNVVRYMGACVKDGHIHPVLEYVSGGCLTDILADESLALSWRQKGDLATDIARGMTYLHSQNVCHRDLTSANCLVRQKPNNVLEAILTDFGLARVLGCMPDPPPNSPRTPESPEPDIIDAPNGGPMLPRIPSACMDVPRKMSVVGTAFWMAPEVLRGEEYTRQVDVFSFGIVVCEIVARITANPDDLPRTGKFGLDLQLFKEKCPGIPEPFLQIAEDCCSMDPRDRPVFAELVRRFEIIRGTLDTETSDTTCYDVNLTDIIRTNDSDDDDDDCSFGFQFQTDLDDKRRSGRTRLREVLCGCSKG \n \n{WMTNAEVDEQDCEKSFVDVLKEYAYAARQYFRVVTGFIVCLLPMLSLWMYCDLNNVLLWTSLIYASIGLVLENSTRCVEVLHSSQSIFRTLFNLISRLFSRLWNVVSLLLQRVHCTRSAGSDLNENVTYPNQNGGPTKSLQHGDTPSEVLRNTSTRVADLVPILKNRQKGPGPGAEDPESQGARKKTLLANELEETELDKLHNPNINRVLLNGRMSQKRVSFSLQNSIDRDEGDPNP} non-homologous  \n
SPU_021174	SPU_021174	none	#\nOne of 2, duplicate of SPU_004006. This glean is much shorter and is identical (protein level) to 04006, but not on the ends. This is indicated in the sequences below \n \n{ATGGCGGAAGACCTGCTGCATCCAGGCGCCATCGTCAAAGATCGATGGAAAGTTACCAAGAAAATTGGTGGTGGAGGCTTCGGTGAGATCTACGAAGCCCTTGACCAAGTCATTGATGAGTGCGTAGCCATCAAGCTAGAATCTGCTCTCCAACCTAAGCAGGTGCTCAAGATGGAAGTTGCCGTCCTCAAGAAACTTCAGGGG} incorrect? \n \nCGGGATCATATCTGCAAGTTCATAGGCTGCGGTCGCAACGATCAGTTCAACTACGTTGTGATGACCGTCCAGGGCCAGAACCTTGCAGAGCTCCGCCGTGCACAGCCCCGTGGCACGTTCTCCGTCAGCACCATGCTCAGGCTTGGAGTACAGATTCTTGAATCGATAGAAAGCATCCACGAAGTGGGCTTTCTACACAGAGACATCAAACCTAGCAACTTTGCCATTGGGAAAGCTGCTGCTAACACAAGAAAGGTGTACATGTTAGACTTTGGTCTGGCAAGGCAGTATACCAATTCTCAGGGTCAAGTTAGAACGCCAAGGCCAGTTGCTGGGTTTCGTGGAACTGTTCGCTATGCTTCTGTCAATGCTCATAGAAATAGAGAGATGGGTCGTCATGATGATTTGTGGTCGTTGTTCTACATGTTAGTAGAGTTTGTCATTGGTCAACTTCCATGGAGAAAAATCAAAGACAAGGAACAAGTAGGTTTGTTGAAGGAGAAGTACGATCATCGTTTACTACTGAAACACATGCCCATGGAGTTCAAGCAGATATTAGAACAATTCCAGTCCTTAGAATATGCAGACAAACCAGATTACAAGTGCATCCATTCTTTATTAGAGCGATGTATGAACAGGAAGAATATCAAGGAGAATGATGCCTATGATTGGGAAAGACCACCAGTAGATGGAACTCATAACTTACTTCCTTCTTCAACTAGTCCTGCTCGA \n \n{AGGAAGACTATTTTTTCTCCTAAACATCTCTTTTGGACGATTTGTAGGATTTAA} incorrect? \n \n{MAEDLLHPGAIVKDRWKVTKKIGGGGFGEIYEALDQVIDECVAIKLESALQPKQVLKMEVAVLKKLQG} incorrect? \n \nRDHICKFIGCGRNDQFNYVVMTVQGQNLAELRRAQPRGTFSVSTMLRLGVQILESIESIHEVGFLHRDIKPSNFAIGKAAANTRKVYMLDFGLARQYTNSQGQVRTPRPVAGFRGTVRYASVNAHRNREMGRHDDLWSLFYMLVEFVIGQLPWRKIKDKEQVGLLKEKYDHRLLLKHMPMEFKQILEQFQSLEYADKPDYKCIHSLLERCMNRKNIKENDAYDWERPPVDGTHNLLPSSTSPAR \n \n{RKTIFSPKHLFWTICRI} incorrect?\n
SPU_010118	SPU_010118	none	This sequence does not include the first exon, which is instead present on SPU_011286. The correct sequence is below: \nDNA: \nGCAGAAGATACCCGGCAGAAAGGTTTGAGGGTAGCCATTAAGAAGCTGTCCAGACCGTTCCAAACAGTCATACACGCCAAGAGGACCTACAGAGAACTACGCCTTCTCAAACATATGAGACATGAAAATGTAATCAGCCTGCTTGACTGTTTCACCCCTGACCGAGTCAACTTCTCAGATGTTTACATGGTGACCCATCTCATGGGAGCCGACCTTAACAACATCATCAAGTGTCAGAAACTCTCTGATGACCATGTCCAGTTCCTCATCTATCAGGTTCTCAGGGGCCTCAAGTACATCCATTCTGCTGGTGTGATTCATCGAGATCTCAAGCCCAGTAACATAGCTGTCAATGAAGACTGTGAACTCAGGATCCTGGACTTTGGATTAGCACGTAGCACAGACGATGAGATGACAGGATATGTAGCTACCAGATGGTATAGGGCACCTGAAATCATGCTCAATTGGATGCACTACACTGAGAAAGTTGACATCTGGTCCGTAGGCTGTATCATGGCAGAGCTCCTCACACAGAAAACCCTCTTCCCAGGGTGTGATCACATAGACCAACTGAATAAGATCATTGCTATCACAGGGAAACCAGACGAGACCTTCTTACAGAAGATCGCAAGTGAGAGTGCAAAGACATACCTGATGAGCATGGCTGCCTACCCTAAGAGGGACTTCAGCACTATCTTCCTAGGGGCCAGTCGCAAGGCTGTCGATCTTCTGGAGAAGATGCTACAATTGGATGAGGACAGAAGGCTGAGTGCTGAGCAGGCTCTTCAGCACCCCTATCTGTCTAAGTACCATGATCCAGATGATGAACCAATTGCAGCCATGTTTGATGATAGTCAGGAGAACAGCGACATCGTAATAGATGAATGGAGACAACGCGTTTTGAAAGAAGTAACAGAATTTGTTGCAGACCCAGCTCCGATGGATTGA \nProtein: \nAEDTRQKGLRVAIKKLSRPFQTVIHAKRTYRELRLLKHMRHENVISLLDCFTPDRVNFSDVYMVTHLMGADLNNIIKCQKLSDDHVQFLIYQVLRGLKYIHSAGVIHRDLKPSNIAVNEDCELRILDFGLARSTDDEMTGYVATRWYRAPEIMLNWMHYTEKVDIWSVGCIMAELLTQKTLFPGCDHIDQLNKIIAITGKPDETFLQKIASESAKTYLMSMAAYPKRDFSTIFLGASRKAVDLLEKMLQLDEDRRLSAEQALQHPYLSKYHDPDDEPIAAMFDDSQENSDIVIDEWRQRVLKEVTEFVADPAPMD\n
SPU_024598	SPU_024598	none	Internal exon may be incorrect. See sequences below. \n \nDNA: \nATGTTTACTACTCACCAGAAATCAGGCCACGCAGGAGGCGGGGGCGTGAAGCGAATCGAGATCTTCTTGACGATGGTGGAACCGACGATACCAGACAGGAGATTCTTGAAGGTTGTCGTGACCAACAACGCTAAGGTCCAGGATCTGATCGGTCTCATCTGCTGGCACTATGTCAACAAGGGTCTTCAACCAGAACTCAATAAATCTTCAGATGGTGAATCTAGGCTTGCGGTGGACTCGACACAAATCACCCTCCGACAGATCTTAACCAAAGCTCTCAGAAAAAGAAAGGGTATCATTCCAACTGCAGGACCTCATTATCTACTGGAGAAGAAGTCTGCACCAGGGGTTCCACTAGACCTGGACCTGAAACTCTGCGAAACAGAATCCATGGACTTCATTATGGTCAGAGAACATAG \n \n{TAGAAGAGACTACCTGAAGAGCGACAGGACGCGACCCTACTCCGGCGATAGAGAGCCCCCTCTAGTGTTGAGTACTCAGTACCGATCGTTCCGAGTCAGCATGCTGCACAAGCTCAGACCTGCCACTGAGATCCAGCTAGGT} incorrect? \n \nATCTCAGGGGATAAGATTGAAATCGACCCGGTAGCCCAACCAAGGAACACGCCGGCCAAGTTCTGGGGCAAGCAGAAAGCCGTCTCCATCGAGTCCGACAGGCTTGCCTTCTGTAACATCACAGATGATAAACCATCAGGGAAGTCCACATTCCGACTGACATTCAAGAGTCCCAACCATGAGTTCAAGCACTACGACTTTGAGACCGGAACCAACCTCACCAAACAGATCGTCAACCGTATCAACCACATTCTCGAGCTTCGAGCAAGCTCCGTACGAAACGACTACACGCTGTGGAGAGAGAGGAGACATAGTAGAAAGGCCCACGATAAATAG \n \nProtein: \nMFTTHQKSGHAGGGGVKRIEIFLTMVEPTIPDRRFLKVVVTNNAKVQDLIGLICWHYVNKGLQPELNKSSDGESRLAVDSTQITLRQILTKALRKRKGIIPTAGPHYLLEKKSAPGVPLDLDLKLCETESMDFIMVREHS \n \n{RRDYLKSDRTRPYSGDREPPLVLSTQYRSFRVSMLHKLRPATEIQLG} incorrect? \n \nISGDKIEIDPVAQPRNTPAKFWGKQKAVSIESDRLAFCNITDDKPSGKSTFRLTFKSPNHEFKHYDFETGTNLTKQIVNRINHILELRASSVRNDYTLWRERRHSRKAHDK\n
SPU_004964	SPU_004964	none	internal exon that is likely to be erroneous has been deleted in sequences below: \n \nDNA: \n \nATGTTATGCCTTGGGATCGTGTCCAAGCAGGTTATCCGTGATGCCATCCTTCTCAATGACTTCACCAAGAACTTCGACAGCTCACAAACCCGCGAGATAGTGGAGTGCATGTTCCCTATCGACTATAAGAAGGGCCAAATAGTCATCAATGAGGGCGACTCAGGAGCACACTTCTACGTCGGAGCAACGGGTACCCTTGAGGTGAGCCAAGGTGATCGCGTCCTGGCCACTATGGGACCGGGAAAGGTCTTCGGGGAACTGGCCATCCTCTATAACTGCACCAGAACAGCCACCATCACTGCCGTCACTGACGCGCAAGTATGGGCGATCGATAGAAAAGTGTTCCAGCTGATCATGATGAAAACTGGGATGCAGCGCCATGAAGAGTATTTCAACTTCCTTAAGAGTGTGCCTTTGCTCAAAGATTTGTCTTCCGATAACCTCTTCAAGTTGGCGAACAGTTTGGAAGTAGACTTCTTCCATGAAGGTGAATACATTATAGTGGAGGGCTCCAGGGGAGATACCTTCTACATCATTAGTAAGGGGGAGGTCCGGATAACCCAATCCGTCCAAGGACAGAGAGAACCCCAGGAGGTTCGAAGCCTCCAGAAAGGAGACTTCTTCGGTGAGAAAGCGCTCCTTGGTGAGGACGTACGAACAGCGAATGTCTTGGCCAGCAAAGGGGGATGCGAGTGCTTGGCCGTTGATAGACAGTCTTTCAACGAACTGATCGGCAACATGCAGGCACTCCAGGACAAGAATTATGGAGACAAAGAAAGGGGAGCAACCAGGTCGAGCTCGGAGATGGATAATACAGAGATTGCACGAATCAAGCCGATACAAGATGAGCTAGCTGCTATACATCTCAACGATCTGGATATCATCGCTACATTGGGTGTTGGAGGGTTCGGTCGGGTCGAACTGGTTCAACTGGCAGGCGATAAGCGGACATTCGCCCTCAAGTGTTTGAAGAAACATCACATCGTAGAAACTCGGCAACAGGAACATATCTTTTCTGAGAAGAAGATCATGATGGAATCTAGCTCCCCCTTTATTGTCAAATTGTTCAAGACATTCCGTGATCAGAAGTATATCTACATGCTTATGGAAGTCTGCTTAGGAGGAGAGCTCTGGACTATCCTCAGGGACAAGGGTCATTTTGATGACCGGACAGCAAGGTTTTCCACCGCATGCGTAGTTGAAGCTTTCCACTATTTGCACAGTCGCGGCATCGTCTACCGCGATCTCAAGCCTGAGAATCTGCTCCTTGACAACAAAGGCTACGTCAAATTGGTCGACTTTGGTTTCGCGAAGAAGATCGGTTTTGGTCGTAAGACCTGGACCTTTTGTGGCACTCCCGAGTACGTGGCACCGGAAATCATCCTCAACAAAGGTCATGACCTGTCATGTGACTACTGGTCCCTGGGAATCCTCATCTTTGAGCTTTTGACCGGAAATCCGCCATTCACTGCCAATGATCCCATGAAGACATACAACGTTATTCTAAAGGGTATCGACATGGTCGAGTTTCCACGGAAGATTCCTCGTAGTGCTGGTAACCTTATCAAGCGACTCTGTCGGGACAATCCAGGCGAGAGAATCGGCTACCAGAAGAATGGCATTAGTGATATCAAGAAGCACAAATGGTTCCAAGGTTTTGACTGGGAAGGTCTCAGGAAGCAAGAAATTGCCGCCCCTCTTCCTCCAAAGGTGAAAGGCTCAAGCGACTGCAGCAACTTCGACAGCTACCCGAAAGATGTCGATATCCCGGCCGATGAAACGTCGGGATGGGACGAACACTTTTAA \n \nProtein: 
\nMLCLGIVSKQVIRDAILLNDFTKNFDSSQTREIVECMFPIDYKKGQIVINEGDSGAHFYVGATGTLEVSQGDRVLATMGPGKVFGELAILYNCTRTATITAVTDAQVWAIDRKVFQLIMMKTGMQRHEEYFNFLKSVPLLKDLSSDNLFKLANSLEVDFFHEGEYIIVEGSRGDTFYIISKGEVRITQSVQGQREPQEVRSLQKGDFFGEKALLGEDVRTANVLASKGGCECLAVDRQSFNELIGNMQALQDKNYGDKERGATRSSSEMDNTEIARIKPIQDELAAIHLNDLDIIATLGVGGFGRVELVQLAGDKRTFALKCLKKHHIVETRQQEHIFSEKKIMMESSSPFIVKLFKTFRDQKYIYMLMEVCLGGELWTILRDKGHFDDRTARFSTACVVEAFHYLHSRGIVYRDLKPENLLLDNKGYVKLVDFGFAKKIGFGRKTWTFCGTPEYVAPEIILNKGHDLSCDYWSLGILIFELLTGNPPFTANDPMKTYNVILKGIDMVEFPRKIPRSAGNLIKRLCRDNPGERIGYQKNGISDIKKHKWFQGFDWEGLRKQEIAAPLPPKVKGSSDCSNFDSYPKDVDIPADETSGWDEHF\n
SPU_014574	SPU_014574	none	This is the N terminal part of DGK-beta; SPU_014575 encodes the C terminal part. The sequence has been modified below, since the 3' part of the original sequence is actually non-coding, as has been indicated below. \nDNA: \nATGCCTGCTTGTAAGGGAACATGGCCTTTCTCTCAGTCCTATGCCTCCCTCGGCCAAACGGACCAAAAGAGACAGAACTCCAAGAAAGAGCGACCGAGCTGGAAGTACCGTCTCTTTCGCAACTCCAAACGAGGCCGGCAGAAAAAGGAAAAGGATAATAGGATTTCGAAACCCCTTTTTGTGCCTTGTACCTCGGCCCTTGTGATGGCGCTGCAGGGCCAAGCTCAGGATGTAGTGGATATCGTCATG \n \n{CCTTGTCCTTTCAAGACCAATGCTGGCACCTTAATGTTGTTGAAGCGTGTCACATCTAATCCCCCGAAGCCGTATGTCCCGCTTCTGCATTTTCCTTCACTGGAATTCGCAGTTTACACGGGACCCCTGACCTTCTTGGTCTTGACACCGATCAATTTAAGCATCACTCCGGGGTAA} Not Real! \n \nProtein: \nMPACKGTWPFSQSYASLGQTDQKRQNSKKERPSWKYRLFRNSKRGRQKKEKDNRISKPLFVPCTSALVMALQGQAQDVVDIVM \n \n \n{PCPFKTNAGTLMLLKRVTSNPPKPYVPLLHFPSLEFAVYTGPLTFLVLTPINLSITPG} Not Real!\n
SPU_023009	SPU_023009	none	this protein lacks a start codon and has several small regions that are possibly spurious, and one missing short internal sequence. These are indicated in the protein sequence below: \nVPQPYFNLKKRISEEVEVRQKADPPILPIMTKTELADDIVKLIPDSGLRTPIELQEAILFLHEIGSIVHFTDHLNGLNDLYFIDPVWLACTLQRVTALPIGSLKGGKVHVETLRELSKKSSIEEDKFEQYLQLLARFEIVVPISHHWYLVPARLPRDNPGVMLSPHNTDDAPFHYLRRIYKMPYLPPGFWIRLVSRLIADLQMRDKKKKISSAGNERNLGSSKR \n{LTDDEMFPFQR} spurious? \nKSSISEAISFHQTESIYWREGIFFRHNTGQILVRSMVFPSTDKSPGVDILISCQEGHFSAMGCVVDQIEGLIKDWYP \n{GLCTSIHESIQPKVQRLVPCPICVIHGPDDFEPVT} spurious? \nEDLPHCYTVEELAQTYVRGETHITACANSKEKPITISVLIPDMFMKDLSIRHFKQEDFTLHMVSGQSLGQGGFGEVFRAKFRGETVAAKTMLPSRLLKNRMFSSASEGYASCASTSSSTSNRTGESTSTENDSLEAAMLMESFHKLRNEVAIMAKLDHPYIVNLVGVSIRHLCFAMDYAPLGDLRSYLFAEHQSARPHFVKRNIVLEPVLSRMLTYKISLQVASAVGYLHRKDIIYCDLKTDNILLFSSDVNEDVNIKLIDYGISKKYDLMGAMGMAGTPGFCAPEILQGKTFDEKVDWFSYGMFLYHLMTGLVPYYDQHSRIEIELAVNEGRKPTFNFHEYTMPPKQVFPALGALMESCWQNKPGERPHGETTLQLLSEPSFLCLRRVVEVEEEEGVSLAFSQGSQDE \n{LQAFAKQPLAVNANGCKTASLFVI } this is missing from the GLEAN \nDKVVHLIVESGRGTSVRSFQVDEDGCYKSSLLNELQCPMIRTAIATPCGTKIVVGTGGDCVQLYHLPSSHSSHASLLVEARVAGQPTSLHYIQKPSGQEHSLLFVGQANGVLTVLSHETEDSGHHITDDLKLVTRMQLSKHNLPCSSIVAVSRKKNGDSMAEQRRRYEEVVYNGANGVWNRSAKSNGTREERRDETTARKMRGGRNSLEPRERTGGSRDEADGTEVWVGCGNKLRIILLDDITLEPDGIQVAAGMEGIIEGIVQSQGSVWCFTSSALYVYQYSTETRSCLAILDCRESILVPGSFLPLYQEKRQEL \n{VRSWEEKREKEQELASATA} spurious? \nERTVNIIRPRSVGQLSVFAYKLARRPQF\n
SPU_020070	SPU_020070	none	This model was annotated based on a manual inspection of protein alignments and domain structures. The features of this glean model are supported by other predictions and genome-wide tiling array embryonic hybridization data. \n \n
SPU_010307	SPU_010307	none	binds to CREB-binding protein (CBP); related to Snf2 family of proteins (by similarity).\n
SPU_012027	SPU_012027	none	Conserved domain DEAD/H box 1 identified as expected for smarcad homolog \n \ncd00046, DEXDc, DEAD-like helicases superfamily. \nscored 75.9  expectation 1e-14 \n
SPU_002950	SPU_002950	none	partial sequence of SPU_012238, different scaffolds\n
SPU_015432	SPU_015432	none	Protein involved in transcription-coupled nucleotide excision repair of UV-induced DNA lesions; homolog of human CSB protein; Rad26p [Saccharomyces cerevisiae] --(By Similarity). \n
SPU_019459	SPU_019459	none	Human Chrom-1 sequence match confined to residues 750 to end of the GLEAN3 model.  The N-terminal region of the model may be more similar to other isoforms of the same family.\n
SPU_019921	SPU_019921	none	Similar to Saccharomyces cerevisiae RAD26, Homo sapiens ERCC6 and chromodomain helicase proteins of the SNF2 family.\n
SPU_024818	SPU_024818	none	Related to SPU_028391 with higher coverage of the Query used.\n
SPU_007862	SPU_007862	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThe structure of this model is supported by the genome-wide tiling array embryonic hybridization experiment and by very similar models generated by other gene prediction protocols. \n \nIts structure resembles that of a partial Cbl gene, with most of the N-terminal domains of Cbl genes missing from this model, and we have therefore named this gene "Sp-Cbl-related 1". Sp-Cbl [SPU_007863] is located immediately upstream of this gene and in the opposite orientation. Given that both models map to a large region of uninterrupted sequence, that they are in opposite orientations and the strong correlation with the tiling array hybridization data, we believe it is unlikely these models represent an assembly error but that they may represent a true localized gene rearrangement event. Nonetheless, additional experimental data are needed to confirm these observations.\n
SPU_028332	SPU_028332	none	SWI/SNF-related matrix-associated actin-dependent  \nregulator of chromatin subfamily A member 3. \n \nTNF-response element binding protein.\n
SPU_028391	SPU_028391	none	Related to SPU_024818 as a subset.\n
SPU_002003	SPU_002003	none	Partial prediction. Missing the last 150 AA from human protein.\n
SPU_007863	SPU_007863	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThe structure of this model is supported by the genome-wide tiling array embryonic hybridization experiment and by very similar models generated by other gene prediction protocols. \n \nIts structure resembles that of a partial Cbl gene, with the most N-terminal Pfam Cbl_N domain missing from this model. \n \nA closely related model [SPU_007862] is located immediately upstream of this gene and in the opposite orientation. Given that both models map to a large region of uninterrupted sequence, that they are in opposite orientations and the strong correlation with the tiling array hybridization data, we believe it is unlikely these models represent an assembly error but that they may represent a true localized gene rearrangement event. Nonetheless, additional experimental data are needed to confirm these observations.\n
SPU_011695	SPU_011695	none	Likely missing a terminal exon (or more).\n
SPU_013568	SPU_013568	none	Close match to SPU_023027.\n
SPU_023027	SPU_023027	none	Close match to SPU_013568.\n
SPU_003313	SPU_003313	none	This gene model has been modified by adding SPU_003312 to the 5' end.  This sequence blasts to patched most highly,  but is most similar to human Niemann-Pick C2 by phylogenetic analysis.\n
SPU_003312	SPU_003312	none	This gene model has been modified by adding it to the 5' end of SPU_003313.  This sequence blasts to patched most highly,  but is most similar to human Niemann-Pick C2 by phylogenetic analysis.\n
SPU_003472	SPU_003472	none	The GLEAN3 model does not cover the c-terminal helicase region of the mouse Brip-1 Query.\n
SPU_009499	SPU_009499	none	The GLEAN3 model does not cover the c-terminal region of the Trel-1 Query.\n
SPU_012100	SPU_012100	none	The SPU_012100 sequence coverage is limited to the N-terminal 250 amino acids of the Query sequence presumably due to the short sequence length of the SPU_012100 model. \n \nThe SPU_012100 sequence is contained within SPU_023149.\n
SPU_023149	SPU_023149	none	SPU_012100 is "contained" within this GLEAN3 model.\n
SPU_028874	SPU_028874	none	Very similar to SPU_013756.\n
SPU_002790	SPU_002790	none	Appears to be identical to SPU_022735.\n
SPU_022479	SPU_022479	none	Appears to be identical to SPU_004237\n
SPU_010625	SPU_010625	none	C-terminal half, probably missing an exon in the middle \nmissing N-terminal half\n
SPU_005083	SPU_005083	none	ectopic transmembrane domain at the N-terminus\n
SPU_010032	SPU_010032	none	shorter than expected, missing N-terminus?\n
SPU_022285	SPU_022285	none	SRCR(3). Probably incomplete. See SPU_022286, 22287, 22288, 22289.\n
SPU_022286	SPU_022286	none	SRCR(10)-TM. Probably incomplete. See SPU_022285, 22287, 22288, 22289.\n
SPU_022287	SPU_022287	none	SRCR(4)-TM. Probably incomplete. See SPU_022285, 22286, 22288, 22289.\n
SPU_022288	SPU_022288	none	SRCR(3). Probably incomplete. See SPU_022285, 22286, 22287, 22289.\n
SPU_022289	SPU_022289	none	SRCR(4)-TM. Probably incomplete. See SPU_022285, 22286, 22287, 22288.\n
SPU_022423	SPU_022423	none	SRCR(4). Probably incomplete. See SPU_022424.\n
SPU_022424	SPU_022424	none	SRCR(8)-Sushi(2). Probably incomplete.  See SPU_022423. Like >gi|8547249|gb|AAF76319.1|AF228827_1 scavenger receptor cysteine-rich protein [Strongylocentrotus purpuratus]. \n
SPU_022528	SPU_022528	none	SRCR(9)-EGF-SRCR(5). Possibly incomplete.\n
SPU_022567	SPU_022567	none	SRCR(5)-TM. Probably incomplete.  See SPU_022568, 22569.\n
SPU_022568	SPU_022568	none	SRCR(5). Probably incomplete.  See SPU_022567, 22569.\n
SPU_022569	SPU_022569	none	SRCR(3). Probably incomplete.  See SPU_022567, 22568.\n
SPU_022814	SPU_022814	none	SRCR(4). Probably incomplete.\n
SPU_023641	SPU_023641	none	SigPep-SRCR(4)-TM.  \n
SPU_023677	SPU_023677	none	SRCR(5)-TM. Possibly incomplete.\n
SPU_023840	SPU_023840	none	SRCR(2). Probably incomplete.\n
SPU_023153	SPU_023153	none	Very similar but not identical in sequence to SPU_023152.\n
SPU_025824	SPU_025824	none	Groups with caspase 9 subfamily in neighbor joining of multiple sequence alignment.  Model may be missing an exon, as the predicted protein contains a CARD domain, but no caspase (peptidase C14) domain.  Similar but not identical to N-terminus of SPU_000882.\n
SPU_023991	SPU_023991	none	SRCR(2). Probably incomplete.\n
SPU_024084	SPU_024084	none	SRCR(4). Probably incomplete. \n
SPU_024390	SPU_024390	none	SigPep-SRCR(2)-WSC-TM.\n
SPU_024408	SPU_024408	none	SigPep-SRCR(7)-TM.\n
SPU_024487	SPU_024487	none	SRCR(9). Probably incomplete. Like gi|8547243|gb|AAF76316.1|AF228824_1 scavenger receptor cysteine-rich protein variant 1 [Strongylocentrotus purpuratus] and >gi|8547245|gb|AAF76317.1|AF228825_1 scavenger receptor cysteine-rich protein variant 2 [Strongylocentrotus purpuratus]\n
SPU_025862	SPU_025862	none	SRCR(3). Probably incomplete. See SPU_025865.\n
SPU_025865	SPU_025865	none	SRCR(3). Probably incomplete. See SPU_025862.\n
SPU_025968	SPU_025968	none	SigPep-SRCR(3). Probably incomplete.\n
SPU_025983	SPU_025983	none	SRCR(27). Probably incomplete.\n
SPU_026234	SPU_026234	none	SRCR(3). Probably incomplete.\n
SPU_026241	SPU_026241	none	SRCR(2). Probably incomplete.\n
SPU_026408	SPU_026408	none	SigPep-SRCR(3). Probably incomplete.\n
SPU_026709	SPU_026709	none	SigPep-SRCR(4). Probably incomplete.\n
SPU_026848	SPU_026848	none	SRCR(8). Probably incomplete.  See SPU_026849.\n
SPU_027037	SPU_027037	none	SigPep-SRCR(2). Possibly incomplete.\n
SPU_027287	SPU_027287	none	SRCR(4). Probably incomplete.  See SPU_027288.\n
SPU_027288	SPU_027288	none	SigPep-SRCR(17). Probably incomplete. See SPU_027287. Like >gi|4165053|gb|AAD08654.1| scavenger receptor cysteine-rich protein type 12 precursor [Strongylocentrotus purpuratus].\n
SPU_027379	SPU_027379	none	SRCR(9). Probably incomplete.\n
SPU_027619	SPU_027619	none	SRCR(5). Probably incomplete.\n
SPU_028382	SPU_028382	none	SRCR(3). Probably incomplete.\n
SPU_028612	SPU_028612	none	SRCR(6). Probably incomplete.\n
SPU_028669	SPU_028669	none	SigPep-SRCR(3). Possibly incomplete.\n
SPU_028680	SPU_028680	none	EGF_CA(6)-EGF-SRCR(2)-EGF(2).\n
SPU_028804	SPU_028804	none	SRCR(3). Probably incomplete.\n
SPU_008981	SPU_008981	none	From Best Accession annotation - \n"OB-fold nucleic acid binding domain. This family contains OB-fold domains that bind to nucleic acids. The family includes the anti-codon binding domain of lysyl, aspartyl, and asparaginyl -tRNA synthetases (See pfam00152). Aminoacyl-tRNA synthetases catalyse the addition of an amino acid to the appropriate tRNA molecule EC:6.1.1.-. This family also includes part of RecG helicase involved in DNA repair. Replication factor A is a heterotrimeric complex, that contains a subunit in this family. This domain is also found at the C-terminus of bacterial DNA polymerase III alpha chain."\n
SPU_002063	SPU_002063	none	SPU_002063 is a partial duplicate prediction for SPU_004403. \n
SPU_018479	SPU_018479	none	PARTIAL\n
SPU_010770	SPU_010770	none	e val for AAH01211 = 1e-75; Kinesin family member C3 [Homo sapiens]. \ne val for NP_005541 = 2e-77; KIFC3 [Homo sapiens]. \nSPU_010770 overlaps the entire consensus motor domain when compared to human CENP-E, and has a long C-terminal domain. \nAnnotation by RA Obar, RL Morris, AL Silverio, BJ Chick, and B Rossetti.\n
SPU_013729	SPU_013729	none	#\ne val = 7e-77 for AAH01211, Kinesin family member C3 [Homo sapiens]. \ne val = 1e-78 for NP_005541, KIFC3 [Homo sapiens]. \nAnnotation by RA Obar, RL Morris, AL Silverio, BJ Chick, and B Rossetti.\n
SPU_019636	SPU_019636	none	PARTIAL \n
SPU_015809	SPU_015809	none	e val for NP_065867 is 8e-45. \nLikely to be a fragment based on its short length \nAnnotation by RA Obar, RL Morris, BA Jeffrey, and B Rossetti.\n
SPU_015437	SPU_015437	none	e val = 6e-168 against NP_004511, and e-149 for NP_006836; KIF2C [Homo sapiens]. \nAnnotation by RA Obar, RL Morris, AL Silverio, BJ Chick, and B Rossetti.\n
SPU_020634	SPU_020634	none	#\ne val for NP_004312, "axonal transport of synaptic vesicles" [Hs], is 3e-118.  \ne val for NP_904325, kinesin family member 1B isoform alpha [Hs], is e-116 \nSee also SPU_018764. \nAnnotation by: RA Obar, RL Morris, BA Jeffrey, and IJ Strachan\n
SPU_021656	SPU_021656	none	CAA40175 is KHC cloned from purp. \nQ66K46_HUMAN Q66K46 (UniProtKB/TrEMBL accession number) \ne val = 0.0 for Q66K46 \ne val = 0.0 for NP_004512 KIF5B [Homo sapiens]. \nThrough comparison with CENP-E (NP_001804.2) as defined by Pfam PF00225, N-terminus of motor domain is likely incomplete.   \nPeptide length=1,077 AA.  \nAnnotation by RA Obar, RL Morris, AL Silverio, BJ Chick, SA Tower, LE Shorey, and AP Rawson.   \n
SPU_022296	SPU_022296	none	e val for AAH28155=7e-115, and against NP_015556 is e-117. \nAnnotation by RA Obar, RL Morris, BA Jeffrey, AM Musante.\n
SPU_012645	SPU_012645	none	Inspection of the tiling array suggests that glean may have missed the following exons: TSVTQISRPLSLSLPLSSVAIHTFPSGSFPLYSSPYSPSLPLLFRCLHLSLCLLSLTLFSPFSFSPFSNYSFPSLHLPPSQSDT,ARIFHGRFAILLLGKWSLRDSERPKIILLWRGGKTARIIVLLLCFLNPHESYMIYDNLNIDLHEHELQLLRSVGLSLSLSLSPQLQYTPSLLVRSPCIPLRILLLYLFCFAVSISHFVCSV\n
SPU_026237	SPU_026237	none	e val for CAI43180 and for NM_024704 is 0.0.  \nAnnotation by: RA Obar, RL Morris, BA Jeffrey.\n
SPU_009940	SPU_009940	none	e val for NP_05541 is 3e-40. \ne val = 5e-40 against "CAK04214.1| novel kinesin motor domain containing protein [Danio rerio], Length=690". \nSee also SPU_013729 and SPU_010770, also KIFC3-like. \nAnnotations by RA Obar, RL Morris, BA Jeffrey, and B Rossetti.\n
SPU_011200	SPU_011200	none	Strongylocentrotus purpuratus similar to F-box only protein 28 (LOC574995)\n
SPU_004722	SPU_004722	none	Strongylocentrotus purpuratus similar to WD-repeat  \nprotein 26 (LOC579141), mRNA \n
SPU_006169	SPU_006169	none	#\ne val = 2e-160 for XP_780214 "PREDICTED: similar to breast cancer metastasis-suppressor 1-like [Strongylocentrotus purpuratus]" \ne val = e-55 for XP_789383 "PREDICTED: similar to kinesin-like motor protein C20orf23 [Strongylocentrotus purpuratus]" \ne val = e-32 for NP_115728, breast cancer metastasis-suppressor 1-like [Homo sapiens] \nAnnotation by R.A.Obar and R.L. Morris 020106\n
SPU_001683	SPU_001683	none	Very similar to SPU_000882 and SPU_013850; may be a duplication or haplotype of the latter.\n
SPU_009653	SPU_009653	none	Very similar to SPU_017523, may be a duplication or haplotype.  Also significant similarity to C-terminus of SPU_011471.\n
SPU_013850	SPU_013850	none	Very similar to SPU_001683; may be a duplication or haplotype.\n
SPU_021561	SPU_021561	none	Very high similarity to C-terminal sequences of SPU_009497, SPU_011339, SPU_026645, SPU_022941, and SPU_026743.  Also has significant sequence similarity to parts of SPU_001472, SPU_009653, and SPU_017523\n
SPU_011916	SPU_011916	none	This GLEAN MAY be similar to the human nuclear receptor coactivator 5 (NCOA5).\n
SPU_026645	SPU_026645	none	Very high similarity to C-terminal sequences of SPU_009497, SPU_011339, SPU_021561, SPU_022941, and SPU_026743.  Also has significant sequence similarity to parts of SPU_001472, SPU_009653, and SPU_017523\n
SPU_022941	SPU_022941	none	Very similar to SPU_009497, SPU_011339, SPU_021561, SPU_026645, and SPU_026743.  Also has significant sequence similarity to parts of SPU_001472, SPU_009653, and SPU_017523.  Missing N-terminus (no methionine).\n
SPU_000205	SPU_000205	none	Contains MMR_HSR1 domain (GTPase of unknown function domain)\n
SPU_001505	SPU_001505	none	Contains MMR_HSR1 domain (GTPase of unknown function domain)\n
SPU_004604	SPU_004604	none	Contains MMR_HSR1 domain (GTPase of unknown function domain)\n
SPU_005393	SPU_005393	none	Contains MMR_HSR1 domain (GTPase of unknown function domain)\n
SPU_009555	SPU_009555	none	Contains MMR_HSR1 domain (GTPase of unknown function domain)\n
SPU_010114	SPU_010114	none	Contains MMR_HSR1 domain (GTPase of unknown function domain)\n
SPU_013859	SPU_013859	none	Contains MMR_HSR1 domain (GTPase of unknown function domain)\n
SPU_014831	SPU_014831	none	Contains MMR_HSR1 domain (GTPase of unknown function domain)\n
SPU_016886	SPU_016886	none	Contains MMR_HSR1 domain (GTPase of unknown function domain)\n
SPU_017772	SPU_017772	none	Contains MMR_HSR1 domain (GTPase of unknown function domain) \nNote: Identical to SPU_016886 except missing 3'end\n
SPU_020224	SPU_020224	none	Contains MMR_HSR1 domain (GTPase of unknown function domain)\n
SPU_020366	SPU_020366	none	Contains MMR_HSR1 domain (GTPase of unknown function domain)\n
SPU_021394	SPU_021394	none	Contains MMR_HSR1 domain (GTPase of unknown function domain)\n
SPU_024411	SPU_024411	none	Contains MMR_HSR1 domain (GTPase of unknown function domain)\n
SPU_025797	SPU_025797	none	Contains MMR_HSR1 domain (GTPase of unknown function domain)\n
SPU_002178	SPU_002178	none	Model contains two adjacent astacin protease domains followed by an EGF domain.  This architecture is unique among members of this group of metalloproteases.\n
SPU_000238	SPU_000238	none	Portion of the early histone gene repeat\n
SPU_000366	SPU_000366	none	Portion of the early histone gene repeat\n
SPU_001565	SPU_001565	none	Portion of the early histone gene repeat\n
SPU_001759	SPU_001759	none	Portion of the early histone gene repeat\n
SPU_002067	SPU_002067	none	Portion of the early histone gene repeat\n
SPU_002683	SPU_002683	none	Portion of the early histone gene repeat\n
SPU_002822	SPU_002822	none	Portion of the early histone gene repeat\n
SPU_002867	SPU_002867	none	Portion of the early histone gene repeat\n
SPU_004191	SPU_004191	none	Portion of the early histone gene repeat\n
SPU_004693	SPU_004693	none	Portion of the early histone gene repeat\n
SPU_005923	SPU_005923	none	Portion of the early histone gene repeat\n
SPU_006695	SPU_006695	none	Portion of the early histone gene repeat\n
SPU_007047	SPU_007047	none	Portion of the early histone gene repeat\n
SPU_008044	SPU_008044	none	Portion of the early histone gene repeat\n
SPU_009492	SPU_009492	none	Portion of the early histone gene repeat\n
SPU_012220	SPU_012220	none	Portion of the early histone gene repeat\n
SPU_012387	SPU_012387	none	Portion of the early histone gene repeat\n
SPU_012571	SPU_012571	none	Portion of the early histone gene repeat\n
SPU_013691	SPU_013691	none	Portion of the early histone gene repeat\n
SPU_014169	SPU_014169	none	Portion of the early histone gene repeat\n
SPU_015602	SPU_015602	none	Portion of the early histone gene repeat\n
SPU_018349	SPU_018349	none	Portion of the early histone gene repeat\n
SPU_018424	SPU_018424	none	Portion of the early histone gene repeat\n
SPU_019037	SPU_019037	none	Portion of the early histone gene repeat\n
SPU_019246	SPU_019246	none	Portion of the early histone gene repeat\n
SPU_020433	SPU_020433	none	Portion of the early histone gene repeat\n
SPU_020553	SPU_020553	none	Portion of the early histone gene repeat\n
SPU_023417	SPU_023417	none	Portion of the early histone gene repeat\n
SPU_023579	SPU_023579	none	Portion of the early histone gene repeat\n
SPU_023581	SPU_023581	none	Portion of the early histone gene repeat\n
SPU_023748	SPU_023748	none	Portion of the early histone gene repeat\n
SPU_024254	SPU_024254	none	Portion of the early histone gene repeat\n
SPU_025387	SPU_025387	none	Portion of the early histone gene repeat\n
SPU_026420	SPU_026420	none	Portion of the early histone gene repeat\n
SPU_026642	SPU_026642	none	Portion of the early histone gene repeat\n
SPU_026749	SPU_026749	none	Portion of the early histone gene repeat\n
SPU_026754	SPU_026754	none	Portion of the early histone gene repeat\n
SPU_027950	SPU_027950	none	Portion of the early histone gene repeat\n
SPU_028440	SPU_028440	none	Portion of the early histone gene repeat\n
SPU_028648	SPU_028648	none	Portion of the early histone gene repeat\n
SPU_000445	SPU_000445	none	Portion of the early histone gene repeat\n
SPU_000465	SPU_000465	none	Portion of the early histone gene repeat\n
SPU_000466	SPU_000466	none	Portion of the early histone gene repeat\n
SPU_000609	SPU_000609	none	Portion of the early histone gene repeat\n
SPU_000833	SPU_000833	none	Portion of the early histone gene repeat\n
SPU_000894	SPU_000894	none	Portion of the early histone gene repeat\n
SPU_001635	SPU_001635	none	Portion of the early histone gene repeat\n
SPU_001693	SPU_001693	none	Portion of the early histone gene repeat\n
SPU_001908	SPU_001908	none	Portion of the early histone gene repeat\n
SPU_002377	SPU_002377	none	Portion of the early histone gene repeat\n
SPU_002621	SPU_002621	none	Portion of the early histone gene repeat\n
SPU_002721	SPU_002721	none	Portion of the early histone gene repeat\n
SPU_002805	SPU_002805	none	Portion of the early histone gene repeat\n
SPU_003694	SPU_003694	none	Portion of the early histone gene repeat\n
SPU_003790	SPU_003790	none	Portion of the early histone gene repeat\n
SPU_004339	SPU_004339	none	Portion of the early histone gene repeat\n
SPU_005077	SPU_005077	none	Portion of the early histone gene repeat\n
SPU_005245	SPU_005245	none	Portion of the early histone gene repeat\n
SPU_005535	SPU_005535	none	Portion of the early histone gene repeat\n
SPU_005934	SPU_005934	none	Portion of the early histone gene repeat\n
SPU_006734	SPU_006734	none	Portion of the early histone gene repeat\n
SPU_007281	SPU_007281	none	Portion of the early histone gene repeat\n
SPU_008232	SPU_008232	none	Portion of the early histone gene repeat\n
SPU_008425	SPU_008425	none	Portion of the early histone gene repeat\n
SPU_008451	SPU_008451	none	Portion of the early histone gene repeat\n
SPU_008765	SPU_008765	none	Portion of the early histone gene repeat\n
SPU_009051	SPU_009051	none	Portion of the early histone gene repeat\n
SPU_010863	SPU_010863	none	Portion of the early histone gene repeat\n
SPU_011373	SPU_011373	none	Portion of the early histone gene repeat\n
SPU_012033	SPU_012033	none	Portion of the early histone gene repeat\n
SPU_012737	SPU_012737	none	Portion of the early histone gene repeat\n
SPU_014033	SPU_014033	none	Portion of the early histone gene repeat\n
SPU_014153	SPU_014153	none	Portion of the early histone gene repeat\n
SPU_014506	SPU_014506	none	Portion of the early histone gene repeat\n
SPU_014870	SPU_014870	none	Portion of the early histone gene repeat\n
SPU_014988	SPU_014988	none	Portion of the early histone gene repeat\n
SPU_015532	SPU_015532	none	Portion of the early histone gene repeat\n
SPU_015646	SPU_015646	none	Portion of the early histone gene repeat\n
SPU_015686	SPU_015686	none	Portion of the early histone gene repeat\n
SPU_016297	SPU_016297	none	Portion of the early histone gene repeat\n
SPU_016809	SPU_016809	none	Portion of the early histone gene repeat\n
SPU_016905	SPU_016905	none	Portion of the early histone gene repeat\n
SPU_016951	SPU_016951	none	Portion of the early histone gene repeat\n
SPU_018034	SPU_018034	none	Portion of the early histone gene repeat\n
SPU_018161	SPU_018161	none	Portion of the early histone gene repeat\n
SPU_018700	SPU_018700	none	Portion of the early histone gene repeat\n
SPU_018896	SPU_018896	none	Portion of the early histone gene repeat\n
SPU_019122	SPU_019122	none	Portion of the early histone gene repeat\n
SPU_019321	SPU_019321	none	Portion of the early histone gene repeat\n
SPU_019619	SPU_019619	none	Portion of the early histone gene repeat\n
SPU_019867	SPU_019867	none	Portion of the early histone gene repeat\n
SPU_020189	SPU_020189	none	Portion of the early histone gene repeat\n
SPU_020285	SPU_020285	none	Portion of the early histone gene repeat\n
SPU_020438	SPU_020438	none	Portion of the early histone gene repeat\n
SPU_020439	SPU_020439	none	Portion of the early histone gene repeat\n
SPU_020521	SPU_020521	none	Portion of the early histone gene repeat\n
SPU_021756	SPU_021756	none	Portion of the early histone gene repeat\n
SPU_022660	SPU_022660	none	Portion of the early histone gene repeat\n
SPU_023533	SPU_023533	none	Portion of the early histone gene repeat\n
SPU_023891	SPU_023891	none	Portion of the early histone gene repeat\n
SPU_023940	SPU_023940	none	Portion of the early histone gene repeat\n
SPU_024477	SPU_024477	none	Portion of the early histone gene repeat\n
SPU_025050	SPU_025050	none	Portion of the early histone gene repeat\n
SPU_025461	SPU_025461	none	Portion of the early histone gene repeat\n
SPU_025562	SPU_025562	none	Portion of the early histone gene repeat\n
SPU_027043	SPU_027043	none	Portion of the early histone gene repeat\n
SPU_028654	SPU_028654	none	Portion of the early histone gene repeat\n
SPU_028929	SPU_028929	none	Portion of the early histone gene repeat\n
SPU_000306	SPU_000306	none	Portion of the early histone gene repeat\n
SPU_000467	SPU_000467	none	Portion of the early histone gene repeat\n
SPU_000846	SPU_000846	none	Portion of the early histone gene repeat\n
SPU_001111	SPU_001111	none	Portion of the early histone gene repeat\n
SPU_001234	SPU_001234	none	Portion of the early histone gene repeat\n
SPU_001375	SPU_001375	none	Portion of the early histone gene repeat\n
SPU_001553	SPU_001553	none	Portion of the early histone gene repeat\n
SPU_001637	SPU_001637	none	Portion of the early histone gene repeat\n
SPU_001661	SPU_001661	none	Portion of the early histone gene repeat\n
SPU_001961	SPU_001961	none	Portion of the early histone gene repeat\n
SPU_002318	SPU_002318	none	Portion of the early histone gene repeat\n
SPU_002354	SPU_002354	none	Portion of the early histone gene repeat\n
SPU_002362	SPU_002362	none	Portion of the early histone gene repeat\n
SPU_002502	SPU_002502	none	Portion of the early histone gene repeat\n
SPU_003081	SPU_003081	none	Portion of the early histone gene repeat\n
SPU_004281	SPU_004281	none	Portion of the early histone gene repeat\n
SPU_004696	SPU_004696	none	Portion of the early histone gene repeat\n
SPU_004847	SPU_004847	none	Portion of the early histone gene repeat\n
SPU_005943	SPU_005943	none	Portion of the early histone gene repeat\n
SPU_006062	SPU_006062	none	Portion of the early histone gene repeat\n
SPU_006184	SPU_006184	none	Portion of the early histone gene repeat\n
SPU_006480	SPU_006480	none	Portion of the early histone gene repeat\n
SPU_006827	SPU_006827	none	Portion of the early histone gene repeat\n
SPU_007002	SPU_007002	none	Portion of the early histone gene repeat\n
SPU_007525	SPU_007525	none	Portion of the early histone gene repeat\n
SPU_008092	SPU_008092	none	Portion of the early histone gene repeat\n
SPU_008706	SPU_008706	none	Portion of the early histone gene repeat\n
SPU_008729	SPU_008729	none	Portion of the early histone gene repeat\n
SPU_008925	SPU_008925	none	Portion of the early histone gene repeat\n
SPU_009102	SPU_009102	none	Portion of the early histone gene repeat\n
SPU_009779	SPU_009779	none	Portion of the early histone gene repeat\n
SPU_010037	SPU_010037	none	Portion of the early histone gene repeat\n
SPU_010166	SPU_010166	none	Portion of the early histone gene repeat\n
SPU_010251	SPU_010251	none	Portion of the early histone gene repeat\n
SPU_010558	SPU_010558	none	Portion of the early histone gene repeat\n
SPU_010732	SPU_010732	none	Portion of the early histone gene repeat\n
SPU_011590	SPU_011590	none	Portion of the early histone gene repeat\n
SPU_011627	SPU_011627	none	Portion of the early histone gene repeat\n
SPU_011788	SPU_011788	none	Portion of the early histone gene repeat\n
SPU_012030	SPU_012030	none	Portion of the early histone gene repeat\n
SPU_012346	SPU_012346	none	Portion of the early histone gene repeat\n
SPU_012718	SPU_012718	none	Portion of the early histone gene repeat\n
SPU_012796	SPU_012796	none	Portion of the early histone gene repeat\n
SPU_012836	SPU_012836	none	Portion of the early histone gene repeat\n
SPU_012916	SPU_012916	none	Portion of the early histone gene repeat\n
SPU_013518	SPU_013518	none	Portion of the early histone gene repeat\n
SPU_013548	SPU_013548	none	Portion of the early histone gene repeat\n
SPU_014356	SPU_014356	none	Portion of the early histone gene repeat\n
SPU_014675	SPU_014675	none	Portion of the early histone gene repeat\n
SPU_014825	SPU_014825	none	Portion of the early histone gene repeat\n
SPU_015193	SPU_015193	none	Portion of the early histone gene repeat\n
SPU_015236	SPU_015236	none	Portion of the early histone gene repeat\n
SPU_016072	SPU_016072	none	Portion of the early histone gene repeat\n
SPU_016281	SPU_016281	none	Portion of the early histone gene repeat\n
SPU_016620	SPU_016620	none	Portion of the early histone gene repeat\n
SPU_016985	SPU_016985	none	Portion of the early histone gene repeat\n
SPU_017565	SPU_017565	none	Portion of the early histone gene repeat\n
SPU_017576	SPU_017576	none	Portion of the early histone gene repeat\n
SPU_018089	SPU_018089	none	Portion of the early histone gene repeat\n
SPU_018398	SPU_018398	none	Portion of the early histone gene repeat\n
SPU_018699	SPU_018699	none	Portion of the early histone gene repeat\n
SPU_019093	SPU_019093	none	Portion of the early histone gene repeat\n
SPU_019342	SPU_019342	none	Portion of the early histone gene repeat\n
SPU_019660	SPU_019660	none	Portion of the early histone gene repeat\n
SPU_019832	SPU_019832	none	Portion of the early histone gene repeat\n
SPU_020352	SPU_020352	none	Portion of the early histone gene repeat\n
SPU_020583	SPU_020583	none	Portion of the early histone gene repeat\n
SPU_021094	SPU_021094	none	Portion of the early histone gene repeat\n
SPU_021349	SPU_021349	none	Portion of the early histone gene repeat\n
SPU_021730	SPU_021730	none	Portion of the early histone gene repeat\n
SPU_021848	SPU_021848	none	Portion of the early histone gene repeat\n
SPU_022453	SPU_022453	none	Portion of the early histone gene repeat\n
SPU_022688	SPU_022688	none	Portion of the early histone gene repeat\n
SPU_022751	SPU_022751	none	Portion of the early histone gene repeat\n
SPU_023418	SPU_023418	none	Portion of the early histone gene repeat\n
SPU_023419	SPU_023419	none	Portion of the early histone gene repeat\n
SPU_023580	SPU_023580	none	Portion of the early histone gene repeat\n
SPU_023699	SPU_023699	none	Portion of the early histone gene repeat\n
SPU_023857	SPU_023857	none	Portion of the early histone gene repeat\n
SPU_024550	SPU_024550	none	Portion of the early histone gene repeat\n
SPU_024562	SPU_024562	none	Portion of the early histone gene repeat\n
SPU_025109	SPU_025109	none	Portion of the early histone gene repeat\n
SPU_026158	SPU_026158	none	Portion of the early histone gene repeat\n
SPU_026519	SPU_026519	none	Portion of the early histone gene repeat\n
SPU_026812	SPU_026812	none	Portion of the early histone gene repeat\n
SPU_026911	SPU_026911	none	Portion of the early histone gene repeat\n
SPU_027184	SPU_027184	none	Portion of the early histone gene repeat\n
SPU_027190	SPU_027190	none	Portion of the early histone gene repeat\n
SPU_027569	SPU_027569	none	Portion of the early histone gene repeat\n
SPU_028350	SPU_028350	none	Portion of the early histone gene repeat\n
SPU_028515	SPU_028515	none	Portion of the early histone gene repeat\n
SPU_028657	SPU_028657	none	Portion of the early histone gene repeat\n
SPU_028815	SPU_028815	none	Portion of the early histone gene repeat\n
SPU_000307	SPU_000307	none	Portion of the early histone gene repeat\n
SPU_000464	SPU_000464	none	Portion of the early histone gene repeat\n
SPU_000474	SPU_000474	none	Portion of the early histone gene repeat\n
SPU_000892	SPU_000892	none	Portion of the early histone gene repeat\n
SPU_001021	SPU_001021	none	Portion of the early histone gene repeat\n
SPU_002486	SPU_002486	none	Portion of the early histone gene repeat\n
SPU_002772	SPU_002772	none	Portion of the early histone gene repeat\n
SPU_002828	SPU_002828	none	Portion of the early histone gene repeat\n
SPU_002905	SPU_002905	none	Portion of the early histone gene repeat\n
SPU_003454	SPU_003454	none	Portion of the early histone gene repeat\n
SPU_003576	SPU_003576	none	Portion of the early histone gene repeat\n
SPU_003611	SPU_003611	none	Portion of the early histone gene repeat\n
SPU_004020	SPU_004020	none	Portion of the early histone gene repeat\n
SPU_005019	SPU_005019	none	Portion of the early histone gene repeat\n
SPU_005645	SPU_005645	none	Portion of the early histone gene repeat\n
SPU_005897	SPU_005897	none	Portion of the early histone gene repeat\n
SPU_006563	SPU_006563	none	Portion of the early histone gene repeat\n
SPU_007637	SPU_007637	none	Portion of the early histone gene repeat\n
SPU_007666	SPU_007666	none	Portion of the early histone gene repeat\n
SPU_007835	SPU_007835	none	Portion of the early histone gene repeat\n
SPU_008980	SPU_008980	none	Portion of the early histone gene repeat\n
SPU_008993	SPU_008993	none	Portion of the early histone gene repeat\n
SPU_009181	SPU_009181	none	Portion of the early histone gene repeat\n
SPU_011743	SPU_011743	none	Portion of the early histone gene repeat\n
SPU_012142	SPU_012142	none	Portion of the early histone gene repeat\n
SPU_013389	SPU_013389	none	Portion of the early histone gene repeat\n
SPU_013690	SPU_013690	none	Portion of the early histone gene repeat\n
SPU_014085	SPU_014085	none	Portion of the early histone gene repeat\n
SPU_014175	SPU_014175	none	Portion of the early histone gene repeat\n
SPU_015261	SPU_015261	none	Portion of the early histone gene repeat\n
SPU_015560	SPU_015560	none	Portion of the early histone gene repeat\n
SPU_015871	SPU_015871	none	Portion of the early histone gene repeat\n
SPU_016236	SPU_016236	none	Portion of the early histone gene repeat\n
SPU_016420	SPU_016420	none	Portion of the early histone gene repeat\n
SPU_017399	SPU_017399	none	Portion of the early histone gene repeat\n
SPU_017732	SPU_017732	none	Portion of the early histone gene repeat\n
SPU_018526	SPU_018526	none	Portion of the early histone gene repeat\n
SPU_018560	SPU_018560	none	Portion of the early histone gene repeat\n
SPU_018569	SPU_018569	none	Portion of the early histone gene repeat\n
SPU_019977	SPU_019977	none	Portion of the early histone gene repeat\n
SPU_020103	SPU_020103	none	Portion of the early histone gene repeat\n
SPU_020800	SPU_020800	none	Portion of the early histone gene repeat\n
SPU_020913	SPU_020913	none	Portion of the early histone gene repeat\n
SPU_022170	SPU_022170	none	Portion of the early histone gene repeat\n
SPU_022188	SPU_022188	none	Portion of the early histone gene repeat\n
SPU_023994	SPU_023994	none	Portion of the early histone gene repeat\n
SPU_025103	SPU_025103	none	Portion of the early histone gene repeat\n
SPU_025148	SPU_025148	none	Portion of the early histone gene repeat\n
SPU_025417	SPU_025417	none	Portion of the early histone gene repeat\n
SPU_025463	SPU_025463	none	Portion of the early histone gene repeat\n
SPU_025602	SPU_025602	none	Portion of the early histone gene repeat\n
SPU_025804	SPU_025804	none	Portion of the early histone gene repeat\n
SPU_026462	SPU_026462	none	Portion of the early histone gene repeat\n
SPU_026670	SPU_026670	none	Portion of the early histone gene repeat\n
SPU_027750	SPU_027750	none	Portion of the early histone gene repeat\n
SPU_028062	SPU_028062	none	Portion of the early histone gene repeat\n
SPU_028816	SPU_028816	none	Portion of the early histone gene repeat\n
SPU_000685	SPU_000685	none	Portion of the early histone gene repeat\n
SPU_000768	SPU_000768	none	Portion of the early histone gene repeat\n
SPU_001509	SPU_001509	none	Portion of the early histone gene repeat\n
SPU_002089	SPU_002089	none	Portion of the early histone gene repeat\n
SPU_002699	SPU_002699	none	Portion of the early histone gene repeat\n
SPU_002794	SPU_002794	none	Portion of the early histone gene repeat\n
SPU_004941	SPU_004941	none	Portion of the early histone gene repeat\n
SPU_004970	SPU_004970	none	Portion of the early histone gene repeat\n
SPU_005173	SPU_005173	none	Portion of the early histone gene repeat\n
SPU_005210	SPU_005210	none	Portion of the early histone gene repeat\n
SPU_005424	SPU_005424	none	Portion of the early histone gene repeat\n
SPU_005591	SPU_005591	none	Portion of the early histone gene repeat\n
SPU_005704	SPU_005704	none	Portion of the early histone gene repeat\n
SPU_005933	SPU_005933	none	Portion of the early histone gene repeat\n
SPU_006061	SPU_006061	none	Portion of the early histone gene repeat\n
SPU_006205	SPU_006205	none	Portion of the early histone gene repeat\n
SPU_007032	SPU_007032	none	Portion of the early histone gene repeat\n
SPU_007314	SPU_007314	none	Portion of the early histone gene repeat\n
SPU_007489	SPU_007489	none	Portion of the early histone gene repeat\n
SPU_007705	SPU_007705	none	Portion of the early histone gene repeat\n
SPU_007762	SPU_007762	none	Portion of the early histone gene repeat\n
SPU_007953	SPU_007953	none	Portion of the early histone gene repeat\n
SPU_008055	SPU_008055	none	Portion of the early histone gene repeat\n
SPU_008489	SPU_008489	none	Portion of the early histone gene repeat\n
SPU_009064	SPU_009064	none	Portion of the early histone gene repeat\n
SPU_009329	SPU_009329	none	Portion of the early histone gene repeat\n
SPU_009565	SPU_009565	none	Portion of the early histone gene repeat\n
SPU_009569	SPU_009569	none	Portion of the early histone gene repeat\n
SPU_011273	SPU_011273	none	Portion of the early histone gene repeat\n
SPU_011680	SPU_011680	none	Portion of the early histone gene repeat\n
SPU_011907	SPU_011907	none	Portion of the early histone gene repeat\n
SPU_012107	SPU_012107	none	Portion of the early histone gene repeat\n
SPU_012144	SPU_012144	none	Portion of the early histone gene repeat\n
SPU_012231	SPU_012231	none	Portion of the early histone gene repeat\n
SPU_012331	SPU_012331	none	Portion of the early histone gene repeat\n
SPU_012802	SPU_012802	none	Portion of the early histone gene repeat\n
SPU_013496	SPU_013496	none	Portion of the early histone gene repeat\n
SPU_014872	SPU_014872	none	Portion of the early histone gene repeat\n
SPU_014878	SPU_014878	none	Portion of the early histone gene repeat\n
SPU_014893	SPU_014893	none	Portion of the early histone gene repeat\n
SPU_014933	SPU_014933	none	Portion of the early histone gene repeat\n
SPU_015147	SPU_015147	none	Portion of the early histone gene repeat\n
SPU_015166	SPU_015166	none	Portion of the early histone gene repeat\n
SPU_015201	SPU_015201	none	Portion of the early histone gene repeat\n
SPU_015645	SPU_015645	none	Portion of the early histone gene repeat\n
SPU_015663	SPU_015663	none	Portion of the early histone gene repeat\n
SPU_015694	SPU_015694	none	Portion of the early histone gene repeat\n
SPU_015911	SPU_015911	none	Portion of the early histone gene repeat\n
SPU_015956	SPU_015956	none	Portion of the early histone gene repeat\n
SPU_016469	SPU_016469	none	Portion of the early histone gene repeat\n
SPU_017435	SPU_017435	none	Portion of the early histone gene repeat\n
SPU_017482	SPU_017482	none	Portion of the early histone gene repeat\n
SPU_017997	SPU_017997	none	Portion of the early histone gene repeat\n
SPU_018670	SPU_018670	none	Portion of the early histone gene repeat\n
SPU_019218	SPU_019218	none	Portion of the early histone gene repeat\n
SPU_019474	SPU_019474	none	Portion of the early histone gene repeat\n
SPU_019847	SPU_019847	none	Portion of the early histone gene repeat\n
SPU_020135	SPU_020135	none	Portion of the early histone gene repeat\n
SPU_020454	SPU_020454	none	Portion of the early histone gene repeat\n
SPU_020503	SPU_020503	none	Portion of the early histone gene repeat\n
SPU_020862	SPU_020862	none	Portion of the early histone gene repeat\n
SPU_021741	SPU_021741	none	Portion of the early histone gene repeat\n
SPU_022016	SPU_022016	none	Portion of the early histone gene repeat\n
SPU_022963	SPU_022963	none	Portion of the early histone gene repeat\n
SPU_023075	SPU_023075	none	Portion of the early histone gene repeat\n
SPU_023698	SPU_023698	none	Portion of the early histone gene repeat\n
SPU_026174	SPU_026174	none	Portion of the early histone gene repeat\n
SPU_026434	SPU_026434	none	Portion of the early histone gene repeat\n
SPU_026542	SPU_026542	none	Portion of the early histone gene repeat\n
SPU_027036	SPU_027036	none	Portion of the early histone gene repeat\n
SPU_027210	SPU_027210	none	Portion of the early histone gene repeat\n
SPU_027269	SPU_027269	none	Portion of the early histone gene repeat\n
SPU_027412	SPU_027412	none	Portion of the early histone gene repeat\n
SPU_027474	SPU_027474	none	Portion of the early histone gene repeat\n
SPU_027801	SPU_027801	none	Portion of the early histone gene repeat\n
SPU_028399	SPU_028399	none	Portion of the early histone gene repeat\n
SPU_028656	SPU_028656	none	Portion of the early histone gene repeat\n
SPU_028658	SPU_028658	none	Portion of the early histone gene repeat\n
SPU_028932	SPU_028932	none	Portion of the early histone gene repeat\n
SPU_001465	SPU_001465	none	Model contains exons encoding cub repeats that are nearly identical to SPU_008802.  it probably is a partial CDS of an allele in 08802 or another closely related gene.\n
SPU_011658	SPU_011658	none	The predicted ORF has an N-terminal sequence longer than that of homologous cyclin H proteins in other species. The first Met of these cyclin H proteins is conserved in Sp, raising the possibility that the initiation codon predicted in the features is not the true one.\n
SPU_000328	SPU_000328	none	Three GLEAN models (SPU_000328, SPU_014989 and SPU_011295) encode the cyclin L protein. They differ at the N-terminal end.\n
SPU_020986	SPU_020986	none	Unknown protein containing a Cyclin domain\n
SPU_011295	SPU_011295	none	Potential N-terminal sequence of Sp-Cyclin L found in SPU_014989 \nThree GLEAN models (SPU_000328, SPU_014989 and SPU_011295) encode the cyclin L protein. They differ at the N-terminal end.\n
SPU_011190	SPU_011190	none	This gene was annotated and modified based on bioinformatic evidence (analysis of multiple protein sequence alignments and domain structures). \n \nThe original version of SPU_011190 showed a domain composition/structure very similar but not identical to that of vertebrate and Drosophila Stam genes. Inspection of other predictions revealed that an otherwise almost identical Genscan model incorporates an additional exon (supported by noticeable signal from the genome-wide tiling array hybridization data). When translated, this Genscan model showed an improved alignment to Stam genes and a domain structure now identical to that of vertebrate and fruit fly Stams. We have therefore decided to modify SPU_011190 accordingly.\n
SPU_028525	SPU_028525	none	This gene may represent a partial duplication of SPU_011190. It is located at the end of a relatively small scaffold, and their sequence identity is 99% at the amino acid level and >94% at the nucleotide level (including intronic and flanking sequences from the contigs where both models map), which suggests they may reflect an assembly error.\n
SPU_008053	SPU_008053	none	This GLEAN sequence corresponds to an exact duplication of the N-terminal region of Sp-Faim (SPU_003262).\n
SPU_005441	SPU_005441	none	Partial sequence longer than its duplicate SPU_003281 but still partial. N.B.: the two duplicates are not identical\n
SPU_001228	SPU_001228	none	Unknown CYP. Fragmentary. Last 3 or 4 exons of a P450 with insufficient homology to known proteins to identify. First exon may not be good.\n
SPU_002899	SPU_002899	none	Partial CYP2-like gene. Near SPU_002898, Sp-Cyp2-like8, suggesting possible tandem duplication as is known for other CYP2s in many species.\n
SPU_001773	SPU_001773	none	Only the N-terminal region of the Query sequence is covered by the GLEAN3 model. \n \nThe first 42 residues of SPU_001773 are unique whereas the remainder of the sequence is identical to and contained within SPU_001777. \n
SPU_018723	SPU_018723	none	Possible extra exon in the GLEAN3 model, length extended relative to the Query sequence used. \n \nSPU_018723 is a near exact match to SPU_028113 with the exception of an extended N-terminal region.\n
SPU_021197	SPU_021197	none	Possible missing exon C-terminal region.  The GLEAN3 model only covers the N-terminal region of the Query sequence used. \nSPU_021197 is contained within SPU_021198.\n
SPU_021198	SPU_021198	none	SPU_021198 contains SPU_021197.\n
SPU_028113	SPU_028113	none	Possible exon duplication.  The GLEAN3 model may have a duplication of the C-terminal region revealed by the alignment with the Query sequence used. \nSPU_028113 is a near exact alignment to SPU_018723 and is contained within it.\n
SPU_001777	SPU_001777	none	Only the N-terminal region of the Query sequence is covered by the GLEAN3 model. \n \nSPU_001773 is contained within SPU_001777.\n
SPU_004494	SPU_004494	none	A clear sequence match to Msh5 but with low coverage of the Query sequence.\n
SPU_011199	SPU_011199	none	SPU_011199 contains SPU_021406.\n
SPU_021406	SPU_021406	none	SPU_021406 is a fragment of SPU_011199.\n
SPU_018944	SPU_018944	none	SPU_018944 contains an extended c-terminus of low complexity sequence relative to the query sequence used.\n
SPU_007033	SPU_007033	none	Fragment, missing C terminus due to incomplete scaffold\n
SPU_028358	SPU_028358	none	Fragment, missing C terminus, possibly other exons due to incomplete scaffolds\n
SPU_003760	SPU_003760	none	Allele: SPU_003908\n
SPU_003908	SPU_003908	none	Allele: SPU_003760\n
SPU_000064	SPU_000064	none	gi|68420855|ref|XP_700381.1|  PREDICTED: similar to Muscarinic acetylcholine receptor M3, partial  \n[Danio rerio]\n
SPU_000078	SPU_000078	none	G-protein coupled receptor 88\n
SPU_006016	SPU_006016	none	The first 60 aa of this glean number (KDIGRRLGLLEADLENIESDYPKQKERGYQMLLKWRQMTRNKDLVKTLVQGLQSVQRVDLADKYGPRFEALFPSEIESD) \nshows homology with the death domains of proteins from the TNFR family \n \nHowever the rest of the sequence is more closely related to NOD/NALP proteins although the last four exons encode a sterol-desaturase domain which normally does not belong to this type of molecule.  \n \nAn assembly problem must have occurred during the generation of this sequence\n
SPU_000283	SPU_000283	none	NB: sequence identical to SPU_007382\n
SPU_007382	SPU_007382	none	NB: sequence identical to SPU_000283\n
SPU_023408	SPU_023408	none	end of Nek10 sequence. See SPU_018375 from complete gene features\n
SPU_018440	SPU_018440	none	SPU_018441 predicts the first half of SND1 and SPU_018440 has the rest of the gene.\n
SPU_018441	SPU_018441	none	SPU_018441 predicts the first half of SND1 and SPU_018440 has the rest of the gene.\n
SPU_026759	SPU_026759	none	SPU_002501 is a partial duplicate prediction.\n
SPU_002501	SPU_002501	none	Partial duplicate prediction for SPU_026759\n
SPU_002533	SPU_002533	none	This prediction is likely incorrect. There are at least two separate genes present in this GLEAN. Later half of the prediction matches the human MELK gene well. See the alignment.\n
SPU_007194	SPU_007194	none	SPU_007194, SPU_024350 and SPU_027411 are tudor domain containing proteins with weak homology to human tudor domain containing protein 1 (TDRD1). TDRD1 ortholog in urchin is represented by SPU_017916. These may be novel tudor domain proteins or may be incorrect predictions.\n
SPU_024350	SPU_024350	none	SPU_007194, SPU_024350 and SPU_027411 are tudor domain containing proteins with weak homology to human tudor domain containing protein 1 (TDRD1). TDRD1 ortholog in urchin is represented by SPU_017916. These may be novel tudor domain proteins or may be incorrect predictions.\n
SPU_027411	SPU_027411	none	SPU_007194, SPU_024350 and SPU_027411 are tudor domain containing proteins with weak homology to human tudor domain containing protein 1 (TDRD1). TDRD1 ortholog in urchin is represented by SPU_017916. These may be novel tudor domain proteins or may be incorrect predictions.\n
SPU_011603	SPU_011603	none	SPU_011603 covers to 700 of 1087 residues in the Query.\n
SPU_012136	SPU_012136	none	SPU_012136 coverage limited to first 432 of 615 residues in the Query sequence used.\n
SPU_019559	SPU_019559	none	SPU_019559 contains Uba Domain N-terminal but by sequence match it is as named.\n
SPU_022259	SPU_022259	none	SPU_022259 missed 55 residues relative to Query.\n
SPU_025405	SPU_025405	none	Possible missing upstream exon relative to Query.\n
SPU_000486	SPU_000486	none	SPU_000486 coverage of the Query is 267-477 of 608 aa protein sequence.  Additional regions of the Query are present on scaffold 26695.  Add exons to the features table.\n
SPU_015411	SPU_015411	none	SPU_025651 has the first part of the SKIV2L gene and SPU_015411 has the latter half.\n
SPU_025651	SPU_025651	none	SPU_025651 has the first part of the SKIV2L gene and SPU_015411 has the latter half.\n
SPU_018262	SPU_018262	none	SPU_018262 sequence is close but not identical to SPU_028109 sequence.\n
SPU_011552	SPU_011552	none	This prediction should be combined with SPU_011551.  It contains exons encoding cub domains that are very likely to complete the C-terminal sequence of the astacin protease in 11551.\n
SPU_007658	SPU_007658	none	From Pfam 19.0 \n \nAccession number: PF02301 \nHORMA domain \n \nThe HORMA (for Hop1p, Rev7p and MAD2) domain has been suggested to recognise chromatin states that result from DNA adducts, double stranded breaks or non-attachment to the spindle and acts as an adaptor that recruits other proteins. MAD2 is a spindle checkpoint protein which prevents progression of the cell cycle upon detection of a defect in mitotic spindle integrity.  \n
SPU_009486	SPU_009486	none	e val = 2e-67 for NP_005724 \nAlmost exact match to XP_790534 : PREDICTED: similar to kinesin family member 20A, partial [Strongylocentrotus purpuratus]  \n1197 nt spread out over 8 or 9 exons. \nExon 8 may represent a false prediction of an exon or may include sequence errors.  All other exons were perfect matches to accession #  XP_790534.  With a 1 nucleotide shift, exon 8 is approximately 80% identical between the described SPU_009486 exon 8 and XP_790534. \nSame sequence is found on Scaffoldi2484 from sp_20060316.asm. \nAnnotation by RA Obar, RL Morris, J Bhatia, BA Jeffrey, AM Musante, EJ Jin, BJ Rossetti and AP Rawson\n
SPU_001874	SPU_001874	none	When blasted with mus, homo sapiens gene did not obtain same glean3 hit.\n
SPU_022840	SPU_022840	none	e val for NP_524883=3e-100. \ne val for NP_612433=1e-63; kinesin family member 12 [Homo sapiens].   \nSimilarity to NP_612433 is based on overlap of C terminal half of SPU_022840 with N terminus of 612433.  612433 contains only partial kinesin motor domain when compared with human CENP-E.  \nAnnotation by RA Obar, RL Morris, SA Tower, SC Cummings, EA Kovacs, and AP Rawson.  \n
SPU_011918	SPU_011918	none	SPU_025220 also has very good alignment.\n
SPU_007768	SPU_007768	none	From Swiss-Prot entry - \n"Interacts with BIRC4/XIAP. These two proteins are likely to coexist in a complex with TAK1, TRAF6, TAB1 and TAB2 (By similarity)."  \n
SPU_021537	SPU_021537	none	myotubularin-related protein 9\n
SPU_025276	SPU_025276	none	myotubularin related protein 12 \nNo myotubularin domain in this protein.\n
SPU_014336	SPU_014336	none	From Swiss-Prot  \n"May function as a ubiquitin-protein or polyubiquitin hydrolase. This deubiquitinating enzyme which functions at the endosome, is able to oppose the ubiquitin-dependent sorting of receptors to lysosomes (By similarity)."  \n
SPU_026259	SPU_026259	none	May be PTPR10D. Partial sequence.  Contains PTP catalytic domain.\n
SPU_022005	SPU_022005	none	From Swiss-Prot \n"Probable component of the SCF (SKP1-CUL1-F-box protein) E3 ubiquitin ligase complex which mediates the ubiquitination and subsequent proteasomal degradation of target proteins involved in cell cycle progression, signal transduction and transcription. Through the RING-type zinc finger, seems to recruit the E2 ubiquitination enzyme to the complex and brings it into close proximity to the substrate. May play a role in protecting cells from apoptosis induced by redox agents. " \n
SPU_026336	SPU_026336	none	SPU_026336 likely codes for part 1 of the DHX34 gene. \nSPU_027857 likely codes for part 2 of the DHX34 gene. \nSPU_011079 likely codes for part 3 of the DHX34 gene.\n
SPU_027857	SPU_027857	none	SPU_026336 likely codes for part 1 of the DHX34 gene. \nSPU_027857 likely codes for part 2 of the DHX34 gene. \nSPU_011079 likely codes for part 3 of the DHX34 gene.\n
SPU_011079	SPU_011079	none	SPU_026336 likely codes for part 1 of the DHX34 gene. \nSPU_027857 likely codes for part 2 of the DHX34 gene. \nSPU_011079 likely codes for part 3 of the DHX34 gene.\n
SPU_010141	SPU_010141	none	Same as SPU_000897.\n
SPU_000897	SPU_000897	none	Same as SPU_010141.\n
SPU_009949	SPU_009949	none	SPU_009949 is a fragment of SPU_000595.\n
SPU_011022	SPU_011022	none	Comparison to best blast hit suggests that the gene model lacks both N- and C- terminal sequences.  Note: the best blast hit encodes a huge protein more than 2800 amino acids.\n
SPU_001344	SPU_001344	none	Partial sequence.\n
SPU_017839	SPU_017839	none	the encoded protein has several ankyrin repeats\n
SPU_001143	SPU_001143	none	the encoded protein has several ankyrin repeats\n
SPU_015973	SPU_015973	none	the encoded protein has several ankyrin repeats\n
SPU_001707	SPU_001707	none	the encoded protein has several ankyrin repeats\n
SPU_015269	SPU_015269	none	the encoded protein has several ankyrin repeats\n
SPU_019767	SPU_019767	none	the encoded protein has several ankyrin repeats\n
SPU_022497	SPU_022497	none	the encoded protein has several ankyrin repeats\n
SPU_010321	SPU_010321	none	the encoded protein has several ankyrin repeats\n
SPU_015601	SPU_015601	none	the encoded protein has several ankyrin repeats\n
SPU_004926	SPU_004926	none	Partial sequence of DUSP4\n
SPU_009401	SPU_009401	none	Sequence spans collagen and head domains.  Profile scan using ScanProsite identified C1q profile from residues 133 to 269 (score 28.379).\n
SPU_021191	SPU_021191	none	May have an extra exon towards the end of the prediction.\n
SPU_000282	SPU_000282	none	Possible missing C-terminal coding exon relative to query.\n
SPU_007253	SPU_007253	none	predicted:similar to sterol regulatory element binding protein\n
SPU_017267	SPU_017267	none	predicted: similar to  mucin19 in S.purp\n
SPU_014795	SPU_014795	none	PREDICTED: similar to TRPC4-associated protein isoform b \n(transient receptor potential cation channel)\n
SPU_023016	SPU_023016	none	PREDICTED: similar to Fras1 related extracellular matrix protein  \n1\n
SPU_023856	SPU_023856	none	 \nThe AF-6 protein (also known as afadin and canoe) is a multidomain cell junction protein that contains two N-terminal Ras-associating (RA) domains in addition to FHA (forkhead-associated), DIL (class V myosin homology region), and PDZ domains and a proline-rich region\n
SPU_009389	SPU_009389	none	Hypothetical protein similar to CG15216-PA gene model  \nNo other info known\n
SPU_025634	SPU_025634	none	glean feature incomplete, missing exons predicted by genesh. \n
SPU_009107	SPU_009107	none	It is a homolog of the Mus musculus RIKEN cDNA 2310067G05 gene; its function is not known. If you know the gene function, you can rename it and take it.\n
SPU_022632	SPU_022632	none	The protein is shorter than its mouse homolog.\n
SPU_005033	SPU_005033	none	There are about 7 more copies of Calm1 in sea urchin genome. SPU_005032, SPU_005033, SPU_021511, SPU_010425, SPU_008000, SPU_017814, SPU_024195. \n
SPU_018898	SPU_018898	none	#\nThis sequence was published in: Wedaman, K.P., Knight, A.E., Kendrick-Jones ,J. and Scholey, J.M. "Sequences of sea urchin kinesin light chain isoforms."  J. Mol. Biol. 231 (1), 155-158 (1993). \nThere are 4 known spliceoforms (mRNAs) encoded by this gene, "kinesin light chain isoform 1" through "isoform 4."  The one predicted in SPU_018898 has been named  "kinesin light chain isoform 4" (KLC-4).\n
SPU_023443	SPU_023443	none	VERY LARGE ADHESION PROTEIN - MAY BE A CONCATENATION \n \nLDLa x5 CCP2-EGFCa x3-NIDO-aAMOP-VWD-CA -EGF-many EGFCA-FOLN-TM \n \nNovel architecture\n
SPU_021797	SPU_021797	none	SPU_001260 is a partial duplicate prediction for SPU_021797.\n
SPU_013905	SPU_013905	none	SPU_008049 and SPU_013905 are duplicate predictions. \n
SPU_008049	SPU_008049	none	SPU_008049 and SPU_013905 are duplicate predictions. \n
SPU_013521	SPU_013521	none	SPU_013521 is a partial duplicate prediction for SPU_010382 (First 180 AA from both GLEAN's). Rest of 13521 does not appear to be similar to any protein in database.\n
SPU_010338	SPU_010338	none	This prediction is most similar to DDX43, though it could be a different DDX protein. It certainly is like a DDX protein in any case.\n
SPU_016062	SPU_016062	none	This GLEAN is almost certainly an incorrect version of the gene represented by SPU_016061, which is a Tubulin binding cofactor A (TBCA) homolog.  These two (adjacent) GLEANs differ only in their predicted amino-termini, but the predicted amino-terminus of SPU_016061 matches the rest of the proteins in the TBCA family well, while the predicted amino-terminus of SPU_016062 does not.\n
SPU_001260	SPU_001260	none	SPU_001260 is a partial duplicate prediction for SPU_021797.\n
SPU_010463	SPU_010463	none	This gene is in three GLEANs. SPU_010463 has part 1 (to ~570 AA), SPU_024100 has part 2 (from ~ 200-1246 AA) and SPU_024101 has part 3 (~ 1028-1304 AA). AA numbers refer to human protein. In addition, SPU_010463 prediction overlaps SPU_024100 (~200-600 AA from 10463 overlap with 1-396 AA from 24100). \n
SPU_024100	SPU_024100	none	This gene is in three GLEANs. SPU_010463 has part 1 (to ~570 AA), SPU_024100 has part 2 (from ~ 200-1246 AA) and SPU_024101 has part 3 (~ 1028-1304 AA). AA numbers refer to human protein. In addition, SPU_010463 prediction overlaps SPU_024100 (~200-600 AA from 10463 overlap with 1-396 AA from 24100). \n
SPU_024101	SPU_024101	none	This gene is in three GLEANs. SPU_010463 has part 1 (to ~570 AA), SPU_024100 has part 2 (from ~ 200-1246 AA) and SPU_024101 has part 3 (~ 1028-1304 AA). AA numbers refer to human protein. In addition, SPU_010463 prediction overlaps SPU_024100 (~200-600 AA from 10463 overlap with 1-396 AA from 24100). \n
SPU_001832	SPU_001832	none	Even though this is only a partial prediction, it precisely matches human SF3B14 protein. The rest of the human SF3B14 protein is not represented by other GLEANs. \nWGS clone SPWDP1E744370A may have the missing 50 AA at end.\n
SPU_021088	SPU_021088	none	Likely missing initial ~200 AA as compared to the human protein. \n \nSPU_004163 is a duplicate prediction for SPU_021088.\n
SPU_015453	SPU_015453	none	SPU_015453 is a partial duplicate prediction for SPU_005681 and MAY represent a better model for the latter half of this gene.\n
SPU_005681	SPU_005681	none	SPU_015453 is a partial duplicate prediction for SPU_005681 and MAY represent a better model for the latter half of this gene.\n
SPU_009634	SPU_009634	none	SPU_009634 is a duplicate prediction for SPU_012779.\n
SPU_008095	SPU_008095	none	SPU_005311 is a duplicate prediction for SPU_008095.\n
SPU_018805	SPU_018805	none	From Swiss Prot entry \n"FUNCTION: Substrate-recognition component of the SCF (SKP1-CUL1-F-box protein)-type E3 ubiquitin ligase complex (By similarity)." \n
SPU_010835	SPU_010835	none	Possibly missing the first exon.\n
SPU_005311	SPU_005311	none	SPU_005311 is a duplicate prediction for SPU_008095.\n
SPU_020121	SPU_020121	none	SPU_020121 has the first part of the gene. SPU_014430 has the rest of the gene.\n
SPU_000801	SPU_000801	none	SPU_023248 has first part of the gene. SPU_000801 has the rest.\n
SPU_023248	SPU_023248	none	SPU_023248 has first part of the gene. SPU_000801 has the rest.\n
SPU_024643	SPU_024643	none	SPU_024643 is a partial duplicate prediction for SPU_024644.\n
SPU_024644	SPU_024644	none	SPU_024643 is a partial duplicate prediction for SPU_024644.\n
SPU_024273	SPU_024273	none	Phylogenetic analysis shows that this glean model is in a clade with human Niemann-Pick C1 which is a patched related protein.\n
SPU_028882	SPU_028882	none	Phylogenetic analysis shows that this glean is highly similar to SPU_024273.  They both are in a clade with human Niemann Pick C1.\n
SPU_027985	SPU_027985	none	SPU_026660 has the same hit.\n
SPU_024492	SPU_024492	none	SPU_022249 has the same hit.\n
SPU_017600	SPU_017600	none	May be missing an exon at the beginning and end.\n
SPU_000170	SPU_000170	none	SPU_000170 has the first part of the gene and SPU_000171 has the rest.\n
SPU_002448	SPU_002448	none	same as SPU_013119.\n
SPU_023519	SPU_023519	none	SPU_023520 is a partial duplicate prediction of SPU_023519.\n
SPU_000171	SPU_000171	none	SPU_000170 has the first part of the gene and SPU_000171 has the rest.\n
Sp-185/333-01	SPU_030144	none	A partial gene on the end of the scaffold that includes the leader, and the start of the open reading frame (Elements 1-2).  \n
Sp-185/333-02	SPU_030145	none	Scaffold65222 is entirely 185/333 sequence, but the scaffold starts in the intron, so there is no start codon.  The sequence is element pattern most likely C4, although the scaffold sequence ends just before the putative stop codon.\n
Sp-185/333-03	SPU_030146	none	Partial gene: Leader, Intron, elements 1-3.\n
SPU_002645	SPU_002645	none	This GLEAN MAY code for SRPK1.\n
SPU_027906	SPU_027906	none	Portion of derived peptide sequence matches c-lectin domain (smart00034-E value=3e-04). \n \nExpressed in PMC EST libraries.  On same scaffold as PM27.\n
SPU_028945	SPU_030147	none	Partial gene--continues off the beginning of Scaffold1445.  Did not appear in the original SPU_0XXXXX models.\n
SPU_022984	SPU_022984	none	SPU_022984 and SPU_024392 are both likely candidates for SFRS8. They internally have a significant overlap as well.\n
SPU_024392	SPU_024392	none	SPU_022984 and SPU_024392 are both likely candidates for SFRS8. They internally have a significant overlap as well.\n
SPU_017377	SPU_017377	none	SPU_017377 is a duplicate prediction for SPU_022855.\n
SPU_022855	SPU_022855	none	SPU_017377 is a duplicate prediction for SPU_022855.\n
SPU_010081	SPU_010081	none	e val = e -136 for NP_878906. \nThis peptide is identical in length (476aas) and sequence to SPU_000875 on scaffold 113994. \nAnnotated by RA Obar, BD Dyer, RL Morris.\n
SPU_025021	SPU_025021	none	Possible duplicated gene, SPU_027654\n
SPU_027654	SPU_027654	none	Possible duplicated gene, SPU_025021\n
SPU_001555	SPU_001555	none	Possible assembly error; SPU_008070 may belong to the 3' end of this gene\n
SPU_008070	SPU_008070	none	Possible assembly error; SPU_001555 may belong to the 5' end of this gene\n
SPU_013727	SPU_013727	none	SPU_013727 has first part of the LSM14A gene and SPU_013728 has the latter half.\n
SPU_013728	SPU_013728	none	SPU_013727 has first part of the LSM14A gene and SPU_013728 has the latter half.\n
SPU_025049	SPU_025049	none	SPU_016338 is a duplicate prediction for SPU_025409\n
SPU_016338	SPU_016338	none	SPU_016338 is a duplicate prediction for SPU_025409\n
SPU_025266	SPU_025266	none	Missing N-terminus.  N-terminus is SPU_028560.  \n
SPU_028560	SPU_028560	none	Missing C-terminus.  C-terminus is predicted in SPU_025266.  The central domain overlaps.  \n
SPU_020437	SPU_020437	none	SPU_003537 has first part of USP52 and SPU_020437 has the rest.\n
SPU_016447	SPU_016447	none	SPU_027408 has the first part and SPU_016447 has the rest of the EXOSC10 gene. In addition, SPU_027408 and SPU_016447 share a significant partially identical overlap.\n
SPU_006216	SPU_006216	none	SPU_019209 is a significant partial duplicate prediction for SPU_006216.\n
SPU_022282	SPU_022282	none	May be the PTP domain of PTPRA. Partial gene.  Best hit was PTPSp8.\n
SPU_009557	SPU_009557	none	Partial sequence. PTPN3? PTPN4?\n
SPU_002877	SPU_002877	none	SPU_002877, 06741, 17776, 17115 and 10544 are partially identical duplicate predictions of varying degree.\n
SPU_006741	SPU_006741	none	SPU_002877, 06741, 17776, 17115 and 10544 are partially identical duplicate predictions of varying degree.\n
SPU_017776	SPU_017776	none	SPU_002877, 06741, 17776, 17115 and 10544 are partially identical duplicate predictions of varying degree.\n
SPU_017115	SPU_017115	none	SPU_002877, 06741, 17776, 17115 and 10544 are partially identical duplicate predictions of varying degree.\n
SPU_010544	SPU_010544	none	SPU_002877, 06741, 17776, 17115 and 10544 are partially identical duplicate predictions of varying degree.\n
SPU_026886	SPU_026886	none	There may be an extra exon(s) in the prediction.\n
SPU_017940	SPU_017940	none	Partial sequence.\n
SPU_007363	SPU_007363	none	Partial sequence.\n
SPU_014401	SPU_014401	none	The following exon prediction is probably incorrect as it codes for another peptide sequence found in a different class of proteins. \n>SPU_014401|Scaffold442|42154|42721| DNA_SRC: Scaffold442 START: 42154 STOP: 42721 STRAND: +  \nATGGAAGAGCAAATCACCGCAAATCTTTTTAATTGCTCTATCATGAATGCTATGCCCTACGACGATGATA \nACTTTGAGTCGCCATCGACATCACCACCTACATACGCCGAGCTCACACCCGCTGTCAATCACACTTTCAA \nTCACGGCAACATCAATTTTGATCACAACACCAGTTACGACGACGGCAACATAAGATATGAACACGACAAC \nAGCAACCATAACTTTGACGAACAAGTACCCTTGAGCACCGCGCATCTTCTTGACATCTTATCGACGACGG \nATGTCGACATCAACAATATCGCAAATGACGGGGAGGAAGAGGGAAGCGACGAGGGGAGCGAACTCGCAGC \nGTATCTCTTTCAGAATTCGGAATGGATTACGAATAACGCGACTTTAGACGATTCTCAATATTCAACTGCA \nGTTAACGGTGACCCGCAACACTTTCAGAGTTGCTACACGAATAAGTCCATGGGCTATGGCAACACTTCGT \nTCAACAGCAGCTATCATGAGGCTCACACCTTGCCACAAGTACCTTATTTTGGACATACTGATACTCAACA \nTGCTCAAG \n
SPU_010793	SPU_010793	none	missing N-terminus residues, partial\n
SPU_002993	SPU_002993	none	SPU_002993 has the first 1/3rd of the gene. SPU_025005 has the last 1/3rd. Middle part appears to be missing.\n
SPU_025005	SPU_025005	none	SPU_002993 has the first 1/3rd of the gene. SPU_025005 has the last 1/3rd. Middle part appears to be missing.\n
SPU_023532	SPU_023532	none	Domains: DEATH, NACHT, LRRs \n
SPU_025680	SPU_025680	none	#\nDomains: DEATH, NACHT, LRRs \n
SPU_028681	SPU_028681	none	Domains: DEATH, NACHT, LRRs\n
SPU_003200	SPU_003200	none	Domains: DEATH, NACHT, LRRs \n
SPU_009017	SPU_009017	none	#\nDomains: DEATH, NACHT, LRRs\n
SPU_014128	SPU_014128	none	Domains: Signal peptide, DEATH, NACHT, LRRs.\n
SPU_004053	SPU_004053	none	Domains: DEATH, NACHT, LRRs\n
SPU_002641	SPU_002641	none	Domains: DEATH, NACHT, LRRs\n
SPU_026921	SPU_026921	none	Domains: DEATH, NACHT, LRRs. \n
SPU_015340	SPU_015340	none	Domains: DEATH, NACHT, LRRs. \n
SPU_017054	SPU_017054	none	#\nDomains: DEATH, NACHT, LRRs.\n
SPU_017993	SPU_017993	none	Domains: DEATH, NACHT, LRRs.\n
SPU_023628	SPU_023628	none	Domains: DEATH, NACHT, LRRs.\n
SPU_009111	SPU_009111	none	Domains: DEATH, NACHT, LRRs.\n
SPU_022780	SPU_022780	none	Domains: DEATH, NACHT, LRRs. \n
SPU_002868	SPU_002868	none	Domains: DEATH, NACHT, LRRs.\n
SPU_020380	SPU_020380	none	Domains: DEATH, NACHT, LRRs.\n
SPU_027858	SPU_027858	none	Domains: DEATH, NACHT, LRRs.\n
SPU_015972	SPU_015972	none	Domains: DEATH, NACHT, LRRs.\n
SPU_002436	SPU_002436	none	Domains: DEATH, NACHT, LRRs.\n
SPU_003715	SPU_003715	none	Domains: DEATH, NACHT, LRRs.\n
SPU_021243	SPU_021243	none	Domains: DEATH, NACHT, LRRs.\n
Sp-VEGF-3	SPU_030148	none	This model was created based on RACE sequence (5'end) and completed based on a Fgenesh++ model (S.P_Scaffold78.seq.N000007).\n
SPU_021148	SPU_021148	none	Found by Tandem Mass spectrometry of S. purpuratus sperm membranes\n
SPU_006122	SPU_006122	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nThis and all other Sp-SARM-related genes cluster "tightly" with Sp-Sarm (SPU_011042) and other Sarm genes in MSA trees built based on the sequence of their TIR domains. In addition, most of Sp-Sarm-related genes have similar domain compositions to that of Sarm genes (TIR, Armadillo/b-catenin and SAM domains), and those that do not are often located at the end of their respective scaffold and might be incomplete models. It should be noted, however, that the subfamily of Sarm-related genes shows a different domain organization than that of Sarms (including Sp-Sarm).\n
SPU_020008	SPU_020008	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nThis and all other Sp-SARM-related genes cluster "tightly" with Sp-Sarm (SPU_011042) and other Sarm genes in MSA trees built based on the sequence of their TIR domains. In addition, most of Sp-Sarm-related genes have similar domain compositions to that of Sarm genes (TIR, Armadillo/b-catenin and SAM domains), and those that do not are often located at the end of their respective scaffold and might be incomplete models. It should be noted, however, that the subfamily of Sarm-related genes shows a different domain organization than that of Sarms (including Sp-Sarm). \n \nNB: Identical models were generated by other gene prediction protocols.\n
SPU_007020	SPU_007020	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nThis and all other Sp-SARM-related genes cluster "tightly" with Sp-Sarm (SPU_011042) and other Sarm genes in MSA trees built based on the sequence of their TIR domains. In addition, most of Sp-Sarm-related genes have similar domain compositions to that of Sarm genes (TIR, Armadillo/b-catenin and SAM domains), and those that do not are often located at the end of their respective scaffold and might be incomplete models. It should be noted, however, that the subfamily of Sarm-related genes shows a different domain organization than that of Sarms (including Sp-Sarm).\n
SPU_000764	SPU_000764	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nThis and all other Sp-SARM-related genes cluster "tightly" with Sp-Sarm (SPU_011042) and other Sarm genes in MSA trees built based on the sequence of their TIR domains. In addition, most of Sp-Sarm-related genes have similar domain compositions to that of Sarm genes (TIR, Armadillo/b-catenin and SAM domains), and those that do not are often located at the end of their respective scaffold and might be incomplete models. It should be noted, however, that the subfamily of Sarm-related genes shows a different domain organization than that of Sarms (including Sp-Sarm). \n \nNB: The exon structure and embryonic expression of this gene are supported by the genome-wide tiling array hybridization data.\n
SPU_015127	SPU_015127	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nThis and all other Sp-SARM-related genes cluster "tightly" with Sp-Sarm (SPU_011042) and other Sarm genes in MSA trees built based on the sequence of their TIR domains. In addition, most of Sp-Sarm-related genes have similar domain compositions to that of Sarm genes (TIR, Armadillo/b-catenin and SAM domains), and those that do not are often located at the end of their respective scaffold and might be incomplete models. It should be noted, however, that the subfamily of Sarm-related genes shows a different domain organization than that of Sarms (including Sp-Sarm). \n \nNB: This model lacks the SAM domains found in other members of the Sp-Sarm-related subfamily of genes (Sp-Sarm 1-4). Accordingly, it shows a weaker clustering with these models. The analysis of the scaffold in which this model is located does not reveal any obvious missing sequence, which would suggest a true loss of this domain.\n
SPU_021841	SPU_021841	none	This gene was modified and annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nThis and all other Sp-SARM-related genes cluster "tightly" with Sp-Sarm (SPU_011042) and other Sarm genes in MSA trees built based on the sequence of their TIR domains. In addition, most of Sp-Sarm-related genes have similar domain compositions to that of Sarm genes (TIR, Armadillo/b-catenin and SAM domains), and those that do not are often located at the end of their respective scaffold and might be incomplete models. It should be noted, however, that the subfamily of Sarm-related genes shows a different domain organization than that of Sarms (including Sp-Sarm). \n \nThis model was modified based on otherwise identical Fgenesh++/AB predictions that incorporate an additional C-term exon. This exon is supported by the tiling array data, and it codes for a (sub-optimal) SAM domain, which is present in other members of this subfamily of Sp-Sarm related genes. \n \nNB: The structure and embryonic expression of this gene are supported by the embryonic tiling array data.\n
SPU_018859	SPU_018859	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nThis and all other Sp-SARM-related genes cluster "tightly" with Sp-Sarm (SPU_011042) and other Sarm genes in MSA trees built based on the sequence of their TIR domains. In addition, most of Sp-Sarm-related genes have similar domain compositions to that of Sarm genes (TIR, Armadillo/b-catenin and SAM domains), and those that do not are often located at the end of their respective scaffold and might be incomplete models (NB: this model is one example, it is located at the end of the scaffold and there might be missing N-ter sequence). It should be noted, however, that the subfamily of Sarm-related genes shows a different domain organization than that of Sarms (including Sp-Sarm).\n
SPU_027640	SPU_027640	none	This gene was modified and annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nThis and all other Sp-SARM-related genes cluster "tightly" with Sp-Sarm (SPU_011042) and other Sarm genes in MSA trees built based on the sequence of their TIR domains. In addition, most of Sp-Sarm-related genes have similar domain compositions to that of Sarm genes (TIR, Armadillo/b-catenin and SAM domains), and those that do not are often located at the end of their respective scaffold and might be incomplete models. It should be noted, however, that the subfamily of Sarm-related genes shows a different domain organization than that of Sarms (including Sp-Sarm). \n \nThis model was modified based on an otherwise identical BCM:Gene prediction that incorporates an additional N-term exon. This additional exon codes for sequence that includes a sub-optimal prediction for an Armadillo/b-catenin domain, which is present in other Sp-Sarm-related genes. The additional exon corresponds to an adjacent glean model (SPU_027639) and thus the modified version of this model fuses both glean models.\n
SPU_003495	SPU_003495	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nThis and all other Sp-SARM-related genes cluster "tightly" with Sp-Sarm (SPU_011042) and other Sarm genes in MSA trees built based on the sequence of their TIR domains. In addition, most of Sp-Sarm-related genes have similar domain compositions to that of Sarm genes (TIR, Armadillo/b-catenin and SAM domains), and those that do not are often located at the end of their respective scaffold and might be incomplete models. It should be noted, however, that the subfamily of Sarm-related genes shows a different domain organization than that of Sarms (including Sp-Sarm). \n \nNB: The exon structure and embryonic expression of this model are strongly supported by the tiling-array data.\n
SPU_004557	SPU_004557	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nThis and all other Sp-SARM-related genes cluster "tightly" with Sp-Sarm (SPU_011042) and other Sarm genes in MSA trees built based on the sequence of their TIR domains. In addition, most of Sp-Sarm-related genes have similar domain compositions to that of Sarm genes (TIR, Armadillo/b-catenin and SAM domains), and those that do not are often located at the end of their respective scaffold and might be incomplete models. It should be noted, however, that the subfamily of Sarm-related genes shows a different domain organization than that of Sarms (including Sp-Sarm). \n \nNB: The exon structure and embryonic expression of this model are strongly supported by the tiling array data.\n
SPU_004107	SPU_004107	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nThis and all other Sp-SARM-related genes cluster "tightly" with Sp-Sarm (SPU_011042) and other Sarm genes in MSA trees built based on the sequence of their TIR domains. In addition, most of Sp-Sarm-related genes have similar domain compositions to that of Sarm genes (TIR, Armadillo/b-catenin and SAM domains), and those that do not are often located at the end of their respective scaffold and might be incomplete models. It should be noted, however, that the subfamily of Sarm-related genes shows a different domain organization than that of Sarms (including Sp-Sarm).\n
SPU_008302	SPU_008302	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nThis and all other Sp-SARM-related genes cluster "tightly" with Sp-Sarm (SPU_011042) and other Sarm genes in MSA trees built based on the sequence of their TIR domains. In addition, most of Sp-Sarm-related genes have similar domain compositions to that of Sarm genes (TIR, Armadillo/b-catenin and SAM domains), and those that do not are often located at the end of their respective scaffold and might be incomplete models (this model is one such example; there may be some missing N-ter sequence). It should be noted, however, that the subfamily of Sarm-related genes shows a different domain organization than that of Sarms (including Sp-Sarm). \n \nNB: The exon structure and embryonic expression of this model are strongly supported by tiling array data.\n
SPU_007088	SPU_007088	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nThis and all other Sp-SARM-related genes cluster "tightly" with Sp-Sarm (SPU_011042) and other Sarm genes in MSA trees built based on the sequence of their TIR domains. In addition, most of Sp-Sarm-related genes have similar domain compositions to that of Sarm genes (TIR, Armadillo/b-catenin and SAM domains), and those that do not are often located at the end of their respective scaffold and might be incomplete models. It should be noted, however, that the subfamily of Sarm-related genes shows a different domain organization than that of Sarms (including Sp-Sarm). \n \nThis model also includes a predicted transmembrane domain, a feature not seen in other members of the subfamily of Sp-Sarm-related models. \n \nNB: The exon structure and embryonic expression of this model are strongly supported by tiling array data. Moreover, some tiling array signal falls into introns of this prediction, and there could therefore exist more exonic sequence that was not called for by the prediction protocols.\n
SPU_016014	SPU_016014	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nSp-TIR-containing models do not show any characteristic domain composition, nor do they cluster significantly with other TIR-containing genes (in S.purpuratus or any other species). \n \nNB: This model is located in a small scaffold. Therefore, there could be missing sequence towards both the N-ter and C-ter ends of the model.\n
SPU_007952	SPU_007952	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nSp-TIR-containing models do not show any characteristic domain composition, nor do they cluster significantly with other TIR-containing genes (in S.purpuratus or any other species). The TIR domain in this model showed a weak co-segregation with TIR-domains from MyD88 genes in a MSA tree. However, its domain composition/structure does not resemble any of the MyD88 genes. \n \nNB: The exon structure and embryonic expression of this model are partly supported by tiling array data. Also note this model seems duplicated in GLEAN 3_13299 (99% seq identity at amino acid level).\n
SPU_012671	SPU_012671	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nSp-TIR-containing models do not show any characteristic domain composition, nor do they cluster significantly with other TIR-containing genes (in S.purpuratus or any other species). This model, however, co-clustered tightly with three other Sp-TIR-containing models (Sp-TIR-c5,6,7). Despite this strong co-segregation, Sp-TIR-c4-7 do not show any characteristic domain structure/composition.\n
SPU_014926	SPU_014926	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nSp-TIR-containing models do not show any characteristic domain composition, nor do they cluster significantly with other TIR-containing genes (in S.purpuratus or any other species).\n
SPU_003608	SPU_003608	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nSp-TIR-containing models do not show any characteristic domain composition, nor do they cluster significantly with other TIR-containing genes (in S.purpuratus or any other species). This model, however, co-clustered tightly with three other Sp-TIR-containing models (Sp-TIR-c4,6,7). Despite this strong co-segregation, Sp-TIR-c4-7 do not show any characteristic domain structure/composition. \n \nNB: The exon structure and embryonic expression of this model are strongly supported by tiling array data.\n
SPU_013352	SPU_013352	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nSp-TIR-containing models do not show any characteristic domain composition, nor do they cluster significantly with other TIR-containing genes (in S.purpuratus or any other species). This model, however, co-clustered tightly with three other Sp-TIR-containing models (Sp-TIR-c4,5,7). Despite this strong co-segregation, Sp-TIR-c4-7 do not show any characteristic domain structure/composition. \n \nNB: This model is located at the end of a scaffold and represents a partial model (there is no ATG at the N-ter of the sequence).\n
SPU_020131	SPU_020131	none	This gene was annotated based on a manual inspection of multiple protein sequence alignments and domain structures. \n \nSp-TIR-containing models do not show any characteristic domain composition, nor do they cluster significantly with other TIR-containing genes (in S.purpuratus or any other species). This model, however, co-clustered tightly with three other Sp-TIR-containing models (Sp-TIR-c4-6). Despite this strong co-segregation, Sp-TIR-c4-7 do not show any characteristic domain structure/composition. \n \nNB: Some exons of this model show very strong signal in the tiling array embryonic hybridization experiment, which supports the embryonic expression of this model.\n
SPU_012667	SPU_012667	none	This model was annotated based on a manual inspection of sequence alignments and domain composition. \n \nWe named this model with the suffix "a" (and not a number) to avoid the misleading assumption that this model would represent any specific ortholog of vertebrate SAA genes (e.g. SAA1). \n \nThis model is very similar in sequence to the adjacent SPU_012668 (Sp-Saa-b). However, there are enough differences between them both at the nt and aa level, and therefore they most likely represent two different genes. \n
SPU_012668	SPU_012668	none	This model was annotated based on a manual inspection of sequence alignments and domain composition. \n \nWe named this model with the suffix "b" (and not a number) to avoid the misleading assumption that this model would represent any specific ortholog of vertebrate SAA genes (e.g. SAA2). \n \nThis model is very similar in sequence to the adjacent SPU_012667 (Sp-Saa-a). However, there are enough differences between them both at the nt and aa level, and therefore they most likely represent two different genes. \n
SPU_000045	SPU_000045	none	Shows significant homology to human Ficolin-1.  Sequence overlap occurs in the fibrinogen domain.\n
SPU_020449	SPU_020449	none	 fragment\n
SPU_020501	SPU_020501	none	 fragment\n
SPU_020584	SPU_020584	none	 fragment\n
SPU_020736	SPU_020736	none	 fragment\n
SPU_020959	SPU_020959	none	 tiny fragment\n
SPU_021072	SPU_021072	none	 fragment\n
SPU_021329	SPU_021329	none	 tiny fragment\n
SPU_021353	SPU_021353	none	 fragment\n
SPU_021423	SPU_021423	none	 partial, missing C-terminus half\n
SPU_021534	SPU_021534	none	 fragment\n
SPU_021540	SPU_021540	none	 tiny fragment\n
SPU_021636	SPU_021636	none	 fragment\n
SPU_021657	SPU_021657	none	 missing N- and C-terminus\n
SPU_021711	SPU_021711	none	 fragment\n
SPU_021836	SPU_021836	none	 only N-terminal fragment\n
SPU_021984	SPU_021984	none	 tiny fragment\n
SPU_022014	SPU_022014	none	 fragment\n
SPU_022413	SPU_022413	none	 fragment\n
SPU_023181	SPU_023181	none	 fragment\n
SPU_023246	SPU_023246	none	 fragment\n
SPU_023276	SPU_023276	none	 fragment\n
SPU_023521	SPU_023521	none	 tiny fragment\n
SPU_023818	SPU_023818	none	 fragment, missing N-terminus half\n
SPU_023997	SPU_023997	none	 fragment\n
SPU_024225	SPU_024225	none	 fragment\n
SPU_024330	SPU_024330	none	 fragment, missing N-terminus region and a stretch in middle\n
SPU_024422	SPU_024422	none	 tiny fragment\n
SPU_024573	SPU_024573	none	 fragment\n
SPU_024701	SPU_024701	none	 tiny fragment\n
SPU_024730	SPU_024730	none	 tiny fragment\n
SPU_024872	SPU_024872	none	 small fragment\n
SPU_024892	SPU_024892	none	 fragment\n
SPU_024939	SPU_024939	none	 fragment\n
SPU_025048	SPU_025048	none	 fragment\n
SPU_025189	SPU_025189	none	 fragment\n
SPU_025694	SPU_025694	none	 fragment\n
SPU_025798	SPU_025798	none	 missing C-terminus residues\n
SPU_025847	SPU_025847	none	 small fragment\n
SPU_025877	SPU_025877	none	 small fragment\n
SPU_026128	SPU_026128	none	 fragment\n
SPU_026792	SPU_026792	none	 fragment\n
SPU_027323	SPU_027323	none	 missing N-terminus residues\n
SPU_027582	SPU_027582	none	 fragment\n
SPU_028841	SPU_028841	none	 fragment\n
SPU_011626	SPU_011626	none	 fragment\n
SPU_019005	SPU_019005	none	 fragment\n
SPU_003531	SPU_003531	none	 extra C-terminus residues\n
SPU_004263	SPU_004263	none	 fragment\n
SPU_009652	SPU_009652	none	 fragment\n
SPU_009845	SPU_009845	none	 fragment\n
SPU_015734	SPU_015734	none	 tiny fragment\n
SPU_017456	SPU_017456	none	 fragment\n
SPU_018249	SPU_018249	none	 fragment\n
SPU_024068	SPU_024068	none	 fragment\n
SPU_026707	SPU_026707	none	 fragment\n
SPU_000214	SPU_000214	none	 missing some N-terminus residues\n
SPU_001700	SPU_001700	none	 fragment\n
SPU_002107	SPU_002107	none	 fragment\n
SPU_002203	SPU_002203	none	 fragment\n
SPU_002274	SPU_002274	none	 fragment\n
SPU_002980	SPU_002980	none	 tiny fragment\n
SPU_003017	SPU_003017	none	 fragment\n
SPU_003265	SPU_003265	none	 fragment\n
SPU_003408	SPU_003408	none	 missing N-terminus region\n
SPU_003563	SPU_003563	none	 fragment\n
SPU_003804	SPU_003804	none	 fragment\n
SPU_004207	SPU_004207	none	 fragment\n
SPU_004637	SPU_004637	none	 fragment\n
SPU_005345	SPU_005345	none	 partial\n
SPU_006178	SPU_006178	none	 fragment\n
SPU_007650	SPU_007650	none	 fragment\n
SPU_007810	SPU_007810	none	 fragment\n
SPU_007990	SPU_007990	none	 small fragment\n
SPU_008078	SPU_008078	none	 fragment\n
SPU_008090	SPU_008090	none	 tiny fragment\n
SPU_008096	SPU_008096	none	 fragment\n
SPU_008341	SPU_008341	none	 partial\n
SPU_008681	SPU_008681	none	 fragment\n
SPU_009010	SPU_009010	none	 fragment\n
SPU_009101	SPU_009101	none	 fragment\n
SPU_009349	SPU_009349	none	 fragment\n
SPU_009393	SPU_009393	none	 fragment\n
SPU_009442	SPU_009442	none	 partial, missing N-terminus region\n
SPU_009533	SPU_009533	none	 partial\n
SPU_004300	SPU_004300	none	This is a partial sequence.  Does not form a clade with human Ppm1g in phylogenetic analysis of the PP2C subfamily of PPM phosphatases.   Most similar to SPU_014625.\n
SPU_005327	SPU_005327	none	Partial sequence.  This is N-terminal to SPU_004300. Identification is partially based on the EST sequence matching these two sequences.\n
SPU_006956	SPU_006956	none	Partial sequence containing PP2Ac domain.\n
SPU_019872	SPU_019872	none	Most of the exons encoded; exon 10  is identical to SPU_004123 exon 2 while the final exon (11) is also in SPU_004123 (bases 2099-2275). \n
SPU_000896	SPU_000896	none	Domains: DEATH, NACHT, LRRs.\n
SPU_004043	SPU_004043	none	Domains: DEATH, NACHT, LRRs.\n
SPU_017038	SPU_017038	none	Domains: DEATH, NACHT, LRRs.\n
SPU_017708	SPU_017708	none	Domains: DEATH, NACHT, LRRs.\n
SPU_022294	SPU_022294	none	Domains: DEATH, NACHT, LRRs.\n
SPU_001392	SPU_001392	none	NOTE: Sequence and alignment suggest this is the 5' end of a single eIF-5B gene, comprised of GLEANs 01392 (C-Term) and 01393 (N-term).\n
SPU_001393	SPU_001393	none	NOTE: Sequence and alignment suggest this is the 5' end of a single eIF5B gene, comprised of GLEANs 01392 (C-Term) and 01393 (N-term). \n
SPU_001394	SPU_001394	none	NOTE-PARTIALseq--identical to middle of eIF5B (glean 01392)\n
SPU_000506	SPU_000506	none	PARTIAL.  When aligned to human SELB, this contains the start site, but ends before Glean22870 begins --(22870) is the "REST" of SELB\n
SPU_022870	SPU_022870	none	PARTIAL.  When aligned to human SELB, this contains most of SELB, but without the 5'end.  Note--GLEAN 00506 appears to contain the start site and ends before this glean begins.  \n
SPU_016991	SPU_016991	none	NOTE partial sequence which is a duplication of the middle of GLEAN 22870 (the bulk of SELB).  \n
SPU_003646	SPU_003646	none	SPU_009292 is identical to 3' end of this sequence\n
SPU_009292	SPU_009292	none	partial--Duplication of 3' part of Glean 03646-eIF2alpha\n
SPU_006351	SPU_006351	none	Sp1200 Bacterially activated coelomocyte, arrayed \ncDNA clone Sp1200 5' similar to amassin, mRNA sequence.\n
SPU_023924	SPU_023924	none	This is the same as SPU_006351, with the addition of an extra exon, and some additional sequences in one of the other exons. Considered to be an allele of SPU_006351. mRNA obtained from coelomocytes exposed to bacteria.\n
SPU_023548	SPU_023548	none	Significant homology to human ficolin2 (FCN2).  Overlaps occur at fibrinogen C-terminal domain (Expasy annotation)\n
SPU_004912	SPU_004912	none	SPU_002231 is a duplicate prediction for SPU_004912.\n
SPU_021032	SPU_021032	none	SPU_021032 is a partial duplicate prediction for SPU_008786.\n
SPU_008786	SPU_008786	none	SPU_021032 is a partial duplicate prediction for SPU_008786.\n
SPU_010773	SPU_010773	none	First ~400 aa from the human protein have no homology in urchin GLEANs.\n
SPU_005511	SPU_005511	none	Identified from cDNA clone from gram negative bacterially activated coelomocyte of sea urchin.\n
SPU_013960	SPU_013960	none	Identified from cDNA clone from gram negative bacterially activated coelomocyte of sea urchin.\n
SPU_027883	SPU_027883	none	SPU_013201 has the first part of the DHX36 gene. SPU_027883 has the rest of the gene.\n
SPU_013201	SPU_013201	none	SPU_013201 has the first part of the DHX36 gene. SPU_027883 has the rest of the gene.\n
SPU_027440	SPU_027440	none	Partial duplicate prediction for SPU_027883.\n
SPU_014106	SPU_014106	none	SPU_014106 shows a significant partial overlap with SPU_026336 which codes for the first part of DHX34. First 170 aa from SPU_014106 show no significant homology to human proteins.\n
SPU_008946	SPU_008946	none	SPU_014505 has the first part of DDX56 and SPU_008946 has the rest.\n
SPU_014505	SPU_014505	none	SPU_014505 has the first part of DDX56 and SPU_008946 has the rest.\n
SPU_012487	SPU_012487	none	SPU_012487 is a partial duplicate prediction for SPU_008946.\n
SPU_016488	SPU_016488	none	SPU_016488 is a partial duplicate prediction for SPU_016467.\n
SPU_002662	SPU_002662	none	partial sequence containing TGF-beta domain; it has the identical C-terminal end as SPU_012786, but appears to be different towards its N-terminal end\n
SPU_022079	SPU_022079	none	contains TGFbeta domain\n
SPU_028397	SPU_028397	none	SPU_016552 has one part and SPU_028397 appears to have the last part of DDX42.\n
SPU_009443	SPU_009443	none	e val for NP_065867= 9e-54; kinesin family member 17 [Homo sapiens].  \nIn the alignment with the best human hit, there is a gap of 13 aa in SPU_009443 which corresponds to a gap of 374 aa in the NP_065867. \nAnnotation by RA Obar, RL Morris, LE Shorey, SA Tower, and B Rossetti.\n
SPU_026884	SPU_026884	none	This GLEAN3 sequence is apparently missing some of the beginning sequences of the other Sp-amassins. Note the sequences in the alignments. The data against which it was compared came from the coelomocytes of a bacterially activated sea urchin.\n
SPU_000409	SPU_000409	none	This is likely a duplication of SPU_013950 (98+% ID at the aa level). Please see SPU_013950 for details.\n
SPU_005998	SPU_005998	none	SPU_005998 and SPU_019123 code for DDX52. They have a partial identical overlap that may have a haplotype as well.\n
SPU_019123	SPU_019123	none	SPU_005998 and SPU_019123 code for DDX52. They have a partial identical overlap that may have a haplotype as well.\n
SPU_000362	SPU_000362	none	Potential MASP\n
SPU_003126	SPU_003126	none	This model was annotated on a manual inspection of sequence alignments and domain structure. \n \nThis model shows a very high degree of similarity to part of an adjacent model (SPU_003127) corresponding to one of the Sp-SRCR genes. The region of similarity includes both the extra and intracellular juxtamembrane regions, the transmembrane domain and a long cytoplasmic tail of low complexity, but excludes the SRCR domains, which are replaced by 2xEGF + 2xIG domains. \n \nThe position and orientation of both models in a single uninterrupted contig suggests that they did not originate as an assembly problem, but that they may represent a true gene duplication/divergence event. Of note, a very similar situation can be seen with SPU_022566/SPU_022567.\n
SPU_022566	SPU_022566	none	This model was annotated on a manual inspection of sequence alignments and domain structure. \n \nThis model shows a very high degree of similarity to part of three adjacent models (SPU_022567-9) corresponding to Sp-SRCR genes. The region of similarity includes both the extra and intracellular juxtamembrane regions, the transmembrane domain and a long cytoplasmic tail of low complexity, but excludes the SRCR domains, which are replaced by EGF+IG domains. \n \nThe position and orientation of these models in long, uninterrupted contigs suggests that they did not originate as an assembly problem, but that they may represent true gene duplication/divergence events. Of note, a very similar situation was seen with SPU_003126/SPU_003127.\n
SPU_009374	SPU_009374	none	This GLEAN3 model encodes a partial ORC6 sequence in which exon 3 is probably missing due to an inappropriate fusion of contigs. See GLEAN_05343 for complete ORC6.\n
SPU_006096	SPU_006096	none	An incomplete MCM2 gene sequence is also found in SPU_011491\n
SPU_012983	SPU_012983	none	This Glean3 model encodes the entire MCM3 gene sequence; however, the first exon probably does not belong to this protein, being artifactually fused to the MCM3 gene in the scaffold. \n
SPU_024816	SPU_024816	none	#\nThe scaffold assembly should be revised. \nAnother GLEAN encodes the CDC45 sequence: SPU_023032, in that case also exons are missing or artefactually assembled. \n
SPU_006237	SPU_006237	none	Segment of KRP95, annotated fully in SPU_026280. \nAnnotation by RL Morris, R.A.Obar, and B Rossetti.\n
SPU_009764	SPU_009764	none	Segment of KRP95, annotated fully in SPU_026280. \nAnnotation by RL Morris, R.A.Obar, and B Rossetti.\n
SPU_027513	SPU_027513	none	#\nDomains: DEATH, NACHT, LRRs.\n
SPU_001054	SPU_001054	none	Domains: DEATH, NACHT, LRRs. \n
SPU_016921	SPU_016921	none	Domains: DEATH, NACHT, LRRs. \nThis gene could be incomplete. The scaffold is quite small. \n
SPU_001548	SPU_001548	none	Domains: NACHT, LRRs. Genscan model has DEATH domain but no methionine. Likely incomplete.\n
SPU_021447	SPU_021447	none	Domains: DEATH, NACHT, LRRs.\n
SPU_002423	SPU_002423	none	Domains: DEATH, NACHT, LRRs. This model could be incomplete, the scaffold is very short.\n
SPU_026071	SPU_026071	none	Domains: DEATH, NACHT, LRRs.\n
SPU_021478	SPU_021478	none	Domains: CARD, DEATH, NACHT, LRRs. \nSome LRRs could be missing since the scaffold is short. \n
SPU_026020	SPU_026020	none	Domains: NACHT, LRRs. Fgenesh model has DEATH domain.\n
SPU_022394	SPU_022394	none	Domains: DEATH, NACHT, LRRs. Genscan model has additional exons. This gene model could be incomplete, this scaffold is small.\n
SPU_008431	SPU_008431	none	Domains: DEATH, NACHT, LRRs, TM. \nThe Genscan prediction does not have the TM.\n
SPU_024649	SPU_024649	none	Domains: NACHT, LRRs. \nGenscan model has different exon/intron structure and contains DEATH domain. Some LRRs could be missing since this model is at the end of a scaffold.\n
SPU_015105	SPU_015105	none	Domains: DEATH, NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete.\n
SPU_003934	SPU_003934	none	Domains: Signal peptide, DEATH, NACHT, LRRs. \nThe Fgenesh model has 2 additional exons: could have more LRRs.\n
SPU_005462	SPU_005462	none	Domains: DEATH, NACHT, LRRs. \nThis gene model is on a small scaffold. It could be incomplete.\n
SPU_002183	SPU_002183	none	Shows essentially identical sequences to the XP_786692 accession number for the ADEAMc (adenosine deaminase tRNA specific) gene for the Sea Urchin as predicted by bioinformatics data.\n
SPU_004872	SPU_004872	none	Domains: Signal peptide, DEATH, NACHT, LRRs.\n
SPU_002962	SPU_002962	none	Domains: DEATH, NACHT, LRRs. \n
SPU_023550	SPU_023550	none	#\nDomains: DEATH, NACHT, LRRs. \n
SPU_025138	SPU_025138	none	Domains: DEATH, NACHT. \nThe Fgenesh model has 4 additional exons and has LRRs.\n
SPU_028387	SPU_028387	none	Domains: DEATH, NACHT, LRRs.\n
SPU_002231	SPU_002231	none	SPU_002231 is a duplicate prediction for SPU_004912.\n
SPU_004163	SPU_004163	none	SPU_004163 is a duplicate prediction for SPU_021088.\n
SPU_004597	SPU_004597	none	SPU_004597 is a duplicate prediction for SPU_010556.\n
SPU_022130	SPU_022130	none	Domains: Signal peptide, DEATH, NACHT, LRRs. \n
SPU_005004	SPU_005004	none	#\nSPU_005004 is a partial duplicate prediction for SPU_003407\n
SPU_005099	SPU_005099	none	Partial sequence.\n
SPU_012661	SPU_012661	none	looks like C-terminal region of a myosin 10 \n \nPH-MyTH4-B41 - missing N-terminal region containing MySc and IQ repeats\n
SPU_018982	SPU_018982	none	looks like C-terminal region of a myosin 10 \n \nPHx2-MyTH4-B41 - missing N-terminal region containing MySc and IQ repeats\n
SPU_023549	SPU_023549	none	looks like a partial of myosin 10 \nMySc-IQx3-PHx2-MyTH4 - missing B41 domain at C-terminus\n
SPU_026077	SPU_026077	none	looks like a C-terminal fragment of myosin 7 or 15 \n \nSH3-MyTH4-B41 \n \nmissing N-terminal part with MySc-IQ-IQ-IQ-MyTH4-B41\n
SPU_028273	SPU_028273	none	looks like a fragment of a myosin 10 \n \nPH-PH-MyTH4 \n \nmissing both N-terminus - MySc plus IQ repeats and C-terminus - B41  \n \n \n
SPU_006732	SPU_006732	none	just a reelin domain - nothing else - could be part of a reelin\n
SPU_028149	SPU_028149	none	multiple ANK domains, one SAM and a PARP domain \n \nlooks like Tankyrase - ADP ribosylase reportedly involved in modifying telomere-associated proteins and regulating GLUT4 traffic in the Golgi\n
SPU_007551	SPU_007551	none	9 ankyrin repeats and a single SAM domain \n \nToo many repeats to be exactly SANS - unless that actually has more - but definitely a homolog\n
SPU_022872	SPU_022872	none	four ankyrin domains and a single SAM - looks like a pretty good match to SANS - has one more ankyrin repeat \n \nSANS is the USH1G_HUMAN Usher syndrome type 1G protein (Scaffold protein containing ankyrin repeats and SAM domain)\n
SPU_025058	SPU_025058	none	has 4 ankyrin repeats, two SAM domains and a PTB domain \n \nThis gene organization is found in vertebrates, usually with 5 or 6 ankyrin repeats (also in bees). \n \nNames assigned to those genes include cajalin (rat) and odin (human) \nAlso known as EB-1 and ANKS1B\n
SPU_014167	SPU_014167	none	large protein with N-terminal ANK/SH3-----SAM-SAM and then a long run of undistinguished sequence. \n \nThis structure exists in chordates - named caskin-1 (CASK-interacting protein) in humans. \n \nNote - Shank, a PSD-organizing protein involved in dendritic spine organization, is similar but has PDZ domain as well.\n
SPU_000935	SPU_000935	none	This is the 3' end of the gene.  The 5' end is SPU_013520\n
SPU_013520	SPU_013520	none	FARP matches approximately the first 250aa. This is the 5' end of the gene; the 3' end appears to be SPU_000935\n
SPU_007771	SPU_007771	none	Duplication of SPU_008684.  Looks like a splice variant\n
SPU_011527	SPU_011527	none	This is the 3' end--SPU_025443 contains the first half of the gene.\n
SPU_010861	SPU_010861	none	PARTIAL Sequence, almost identical to middle of GLEAN 14908 ECT2\n
SPU_019189	SPU_019189	none	Best Blast is a Kalirin protein (a highly related family member), but based on sequence alignment and domain structure this is a TRIO not KALRN.  The TRIO Like 1 gene is in three parts.  5'end is GLEAN 22793.  The middle is 02796, while this GLEAN is the rest.\n
SPU_022793	SPU_022793	none	This is the 5'end of this gene.  The middle is GLEAN 02796, while the end is GLEAN 19189.\n
SPU_003961	SPU_003961	none	This is the 5'end of this glean.  The rest is located in GLean 28457.\n
SPU_019587	SPU_019587	none	This gene is spread among 3 GLEANs.  This is the 5'end.  The middle is GLEAN 15117, and the 3' end is GLEAN 28316.\n
SPU_015117	SPU_015117	none	This gene is spread among 3 GLEANs.  This is the middle.  The 5'end is GLEAN 19587, and the 3' end is GLEAN 28316.\n
SPU_028316	SPU_028316	none	This gene is in three parts.  This is the 3'end.  The 5'end is GLEAN 19587, and the middle is GLEAN 15117.\n
SPU_014333	SPU_014333	none	Appears to be a splice variant of SPU_002673\n
SPU_002796	SPU_002796	none	This is the middle of this gene.  The 5' end is GLEAN 22793, while most (and the 3'end) is 19189.\n
SPU_013479	SPU_013479	none	This is the middle and 3'end of this gene.  The 5' end is SPU_018498\n
SPU_018498	SPU_018498	none	This is the 5'end of the gene.  The rest of the gene is found in GLEAN 013479\n
SPU_001797	SPU_001797	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix C.2 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees. \n \nNote that SPU_001794 (Sp-MACPF-C.1), a model adjacent to this gene, also contains the MACPF domain. A comparison of their protein sequences reveals high similarity but a fair number of differences as well. It is to be determined whether this fact reflects the erroneous assembly of different haplotypes (both genes are indeed located in an area of numerous contigs) or if it reflects a true gene duplication event.\n
SPU_002550	SPU_002550	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix B.1 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees. \n \nThe modification to this model was made based on an otherwise identical Fgenesh++ prediction that includes a prediction for a signal peptide, a feature of all other Sp-MACPF genes, and an additional exon supported by the tiling array hybridization data. \n \nThe embryonic expression and modified structure of this model are strongly supported by the tiling array data. \n \nNB: An adjacent model (SPU_002548/9) is very similar in sequence and exon structure. However, there are noticeable differences between the two. It is difficult at this point to determine whether this represents an assembly error or a true duplication/reversion event.\n
SPU_014984	SPU_014984	none	This model was annotated and modified based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nIn their original form, most predictions for this model forced two C-ter exons into the structure of this gene due to the presence of a large NNNNNNNN gap (150+ kb) that likely includes the last exon of this gene. We have modified this model by deleting the last two exons and noting the incompleteness of this model. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix A.3 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees.\n
SPU_015144	SPU_015144	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix B.2 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees. \n \nThe embryonic expression and structure of this model are strongly supported by the tiling array data. Furthermore, there is a significant amount of signal coming from one of the introns. However, no other gene prediction protocol called for an exon in the region. \n \nNote that this model includes a prediction for a transmembrane domain towards the C-ter of the protein, which is uncharacteristic of the other Sp-MACPF genes. However, since there are no alternative models for this region, we cannot rule out that this is a true feature of this gene. It should be noted, however, that this model is located at the end of a scaffold, and the exon call may have been forced.\n
SPU_022318	SPU_022318	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix D.2 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees.\n
SPU_028756	SPU_028756	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix E.2 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees.\n
SPU_022230	SPU_022230	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix E.3 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees.\n
SPU_014229	SPU_014229	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix F.1 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees. \n \nNote that most other Sp-MACPF genes include a signal peptide. However, none of the gene prediction protocols found one for this model. This is unlikely to be due to a lack of scaffold sequence.\n
SPU_027596	SPU_027596	none	Very similar but not identical to SPU_006866.\n
SPU_013704	SPU_013704	none	duplicate of SPU_001739\n
SPU_000386	SPU_000386	none	duplicate of SPU_012252\n
SPU_008485	SPU_008485	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix E.4 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees.\n
SPU_002549	SPU_002549	none	This model was annotated and modified based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThe modification to this model was made based on an NCBI model that spans and perfectly matches this and an adjacent glean model (SPU_002548). The modified nucleotide and protein sequences are provided for each of the fused glean models; but the gene features of only this model have been modified to reflect the fusion. \n \nThe predicted domain structure for the modified model includes only a signal peptide and the MACPF domain, a feature of all other Sp-MACPF genes. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix B.0 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees. \n \nThe embryonic expression and modified structure of this model are strongly supported by the tiling array data. \n \nNB: An adjacent model (SPU_002550) is very similar in sequence and exon structure. However, there are noticeable differences between the two. It is difficult at this point to determine whether this represents an assembly error or a true duplication/reversion event.\n
SPU_002548	SPU_002548	none	This model was annotated and modified based on a manual inspection of multiple protein sequence alignments and domain structure. \n \nThe modification to this model was made based on an NCBI model that spans and perfectly matches this and an adjacent glean model (SPU_002549). The modified nucleotide and protein sequences are provided for each of the fused glean models; but the gene features of only SPU_002549 have been modified to reflect the fusion. The gene features for this model have been accepted in their present form for simplicity. Please refer to SPU_002549 for the modified exon structure. \n \nThe predicted domain structure for the modified model includes only a signal peptide and the MACPF domain, a feature of all other Sp-MACPF genes. \n \nThis model belongs to a family of models that contain a Membrane attack/perforin domain (MACPF) and show similarity in domain structure to apextrin. The suffix B.0 is arbitrary and just reflects the clustering of this model with other Sp-MACPF genes in phylogenetic trees. \n \nThe embryonic expression and modified structure of this model are strongly supported by the tiling array data. \n \nNB: An adjacent model (SPU_002550) is very similar in sequence and exon structure. However, there are noticeable differences between the two. It is difficult at this point to determine whether this represents an assembly error or a true duplication/reversion event.\n
SPU_002246	SPU_002246	none	Tiling data suggests several exons are wrong.\n
SPU_002766	SPU_002766	none	no embryonic expression based on tiling\n
SPU_012277	SPU_012277	none	This is the PH domain that corresponds to the rest of the Sp-Tec protein (SPU_012278).\n
SPU_009691	SPU_009691	none	PLC eta is distributed over four GLEAN3 predictions (SPU_009688-09691).  Complete annotation is located on this model (SPU_009691).\n
SPU_009690	SPU_009690	none	Protein sequence continued on SPU_009691 SPU_009689 SPU_009688.  This annotation contains the PH domain of PLC eta. \n \nComplete annotation is located on SPU_009691\n
SPU_009689	SPU_009689	none	PLC eta is distributed over four GLEAN3 predictions (SPU_009688-09691).  Complete annotation is located on SPU_009691\n
SPU_009688	SPU_009688	none	PLC eta is distributed over four GLEAN3 predictions (SPU_009688-09691).  Complete annotation is located on SPU_009691\n
SPU_002630	SPU_002630	none	partial duplicate of SPU_021309.  Homeodomain and C-terminal are identical to Sp-Hox8. \n
SPU_016561	SPU_016561	none	High similarity with human CDKL5. Partial N-terminal sequence\n
SPU_022670	SPU_022670	none	High similarity with human CDKL5. Partial N-terminal sequence\n
SPU_003424	SPU_003424	none	serine-threonine, not tyrosine\n
SPU_003389	SPU_003389	none	contains an olfactomedin domain - no collagen repeats or other predicted domains.\n
SPU_024073	SPU_024073	none	LY x2-EGF x5-SEA-EGF - no obvious TM domain \n \nSEA/EGF COMBINATION IS CHARACTERISTIC OF MUCINS\n
SPU_025178	SPU_025178	none	LY-EGF x6-SEA-EGF - no obvious TM domain \n \nSEA/EGF COMBINATION IS CHARACTERISTIC OF MUCINS\n
SPU_010472	SPU_010472	none	CCP x4 - SEA - EGF - apparent transmembrane domain \n \nSEA/EGF COMBINATION IS CHARACTERISTIC OF MUCINS\n
SPU_003285	SPU_003285	none	SEA-EGF - apparent TM domain \n \nSEA/EGF COMBINATION IS CHARACTERISTIC OF MUCINS\n
SPU_027741	SPU_027741	none	large protein with EGF/SEA/EGF flanked on both sides by long low complexity sequence blocks \n \nSEA/EGF COMBINATION IS CHARACTERISTIC OF MUCINS\n
SPU_006884	SPU_006884	none	SEA/EGF - apparent TM domain \nSEA/EGF COMBINATION IS CHARACTERISTIC OF MUCINS\n
SPU_027666	SPU_027666	none	Likely orthologue of Aurora A; partially included in SPU_027833 \n
SPU_006082	SPU_006082	none	C-terminal part of SPU_000964\n
SPU_014001	SPU_014001	none	part of SPU_005312\n
SPU_000852	SPU_000852	none	Domains: Signal peptide, DEATH, NACHT, LRRs.\n
SPU_003186	SPU_003186	none	Domains: DEATH, NACHT, LRRs. \nFgenesh model has 6 additional small exons at the C-terminus, and contains additional LRRs.\n
SPU_017245	SPU_017245	none	Domains: Signal peptide, DEATH, NACHT, LRRs. \nFgenesh model has 6 additional c-terminal exons, the first 4 of which contain additional LRRs.\n
SPU_014503	SPU_014503	none	Domains: Signal peptide, DEATH, NACHT, LRRs. \n
SPU_028650	SPU_028650	none	#\nProbable homologue of human ficolin 2.\n
SPU_028805	SPU_028805	none	Domains: Signal peptide, DEATH, NACHT, LRRs.\n
SPU_008597	SPU_008597	none	Domains: Signal peptide, NACHT, LRRs. \nThe Genscan model has a DEATH domain and no signal peptide, and may be more accurate.\n
SPU_019700	SPU_019700	none	Domains: DEATH, NACHT, LRRs.\n
SPU_004096	SPU_004096	none	IDENTICAL TO SPU_004496 \nNOT Embryonically expressed.  May be pseudo-gene or adult-only gene.\n
SPU_001444	SPU_001444	none	Domains: Signal peptide, DEATH, NACHT, LRRs. \nThe Fgenesh model has 2 additional small exons, which may contain additional LRRs not detected by the SMART program.\n
SPU_023120	SPU_023120	none	Domains: NACHT, LRRs. \nThe Genscan model is probably incomplete since it doesn't start with a Met, but it has an additional 5' exon and contains a DEATH domain.\n
SPU_028483	SPU_028483	none	Domains: Signal peptide, DEATH, NACHT, LRRs. \nThe Fgenesh model has slightly different exon/intron structure at the 3' end and could contain additional LRRs not detected by the SMART program.\n
SPU_000816	SPU_000816	none	Domains: Signal peptide, DEATH, NACHT, LRRs. \n
SPU_019699	SPU_019699	none	Domains: Signal peptide, DEATH, NACHT, LRRs. \n
SPU_004165	SPU_004165	none	Domains: Signal peptide, DEATH, NACHT, LRRs.\n
SPU_015481	SPU_015481	none	Domains: DEATH, NACHT, LRRs. \nThe Fgenesh model has 7 additional small 3' exons that code for additional LRRs.\n
SPU_006733	SPU_006733	none	Domains: NACHT, LRRs. \nThis gene model is at the end of a short scaffold and could be incomplete, missing the DEATH domain.\n
SPU_008283	SPU_008283	none	Domains: DEATH, NACHT, LRRs. \nThe Genscan model has 3 additional small 3' exons that code for additional LRRs.\n
SPU_025204	SPU_025204	none	Domains: DEATH, NACHT, LRRs.\n
SPU_000457	SPU_000457	none	Domains: NACHT, LRRs. \nThe Genscan model has a different exon/intron structure and has DEATH domains (one at N-terminus and one c-terminal to the NACHT domain). \n
SPU_005993	SPU_005993	none	Domains: DEATH, NACHT, LRRs. \nThis gene model is at the end of a scaffold and could be missing LRRs.\n
SPU_006203	SPU_006203	none	Domains: DEATH, NACHT, LRRs. \nThe Genscan model has 7 additional small exons at the 3' end and has additional LRRs. \n
SPU_000015	SPU_000015	none	Domains: DEATH, NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing LRRs.\n
SPU_019696	SPU_019696	none	Domains: DEATH, NACHT, LRRs.\n
SPU_012523	SPU_012523	none	Domains: NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing the DEATH domain.\n
SPU_025948	SPU_025948	none	Domains: DEATH (2), NACHT, LRRs. \nThis gene model overlaps with SPU_025952. The Genscan model has a different terminal exon and does not overlap the other gene model. \n
SPU_007113	SPU_007113	none	Domains: NACHT, LRRs. \nSPU_007116 has low e-value DEATH domain.\n
SPU_022412	SPU_022412	none	Domains: Sulfatase, NACHT, LRRs. \nThe Fgenesh model also has a DEATH domain before the NACHT domain. This is a very unusual domain combination. \nAlso, there is a very large exon between the exons coding for the sulfatase domain and the NACHT and LRRs.\n
SPU_021844	SPU_021844	none	Domains: NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing the DEATH domain.\n
SPU_011441	SPU_011441	none	Domains: NACHT, LRRs. \nThe Fgenesh model combines SPU_011441 and SPU_011440. It has a DEATH domain.\n
SPU_010053	SPU_010053	none	Domains: NACHT, LRRs. \nThe Genscan model combines SPU_010053 and SPU_010052, which contains the DEATH domain.\n
SPU_011439	SPU_011439	none	Domains: NACHT, DEATH, LRRs. \n
SPU_006229	SPU_006229	none	#\nDomains: Signal peptide, DEATH, NACHT, LRRs. \nThis gene model is on a small scaffold and could be incomplete, missing LRRs.\n
SPU_010097	SPU_010097	none	Domains: DEATH, NACHT, LRRs.\n
SPU_028630	SPU_028630	none	Domains: DEATH, NACHT. \nSPU_028631 contains 4 exons coding for LRRs. The accurate gene model should probably be a fusion of these 2 gene models, such as the Genscan.\n
SPU_005301	SPU_005301	none	Domains: DEATH, NACHT, LRRs.\n
SPU_015052	SPU_015052	none	Domains: DEATH, NACHT, LRRs. \nThis gene model sits at the end of a scaffold and could be incomplete, missing LRRs.\n
SPU_024605	SPU_024605	none	9 armadillo repeats - matches best with plakoglobin/gamma catenin but also, less well, with beta catenin and poorly with p120/delta catenin\n
SPU_020240	SPU_020240	none	Domains: NACHT, LRRs. \nThis gene model is likely incomplete. There is an Fgenesh model just 5' of this glean that codes for 2 DEATH domains.\n
SPU_014761	SPU_014761	none	Domains: DEATH, DEATH, NACHT, LRRs. \nThe Fgenesh model has 2 additional exons that code for LRRs. This gene model is at the end of a scaffold and could be incomplete, missing LRRs.\n
SPU_008833	SPU_008833	none	Domains: Signal peptide, DEATH, NACHT, LRRs. \nThis gene model is on a small scaffold and could be incomplete, missing LRRs and/or a second DEATH domain.\n
SPU_001016	SPU_001016	none	Domains: NACHT, LRRs.  \nThe Genscan model has 2 additional exons (SPU_001015) that code for a DEATH domain. The gene features were modified to reflect this model. \nThe Fgenesh model has 2 additional 3' exons that code for additional LRRs.\n
SPU_020414	SPU_020414	none	The KRP170 gene spans SPU_020414 (Scaffold1612/Scaffoldi17703) and SPU_026503 (Scaffold56862/Scaffoldi4507).  The mRNA was published as Chui,K.K., Rogers,G.C., Kashina,A.M., Wedaman,K.P., Sharp,D.J., Nguyen,D.T., Wilt,F. and Scholey,J.M.  "Roles of two homotetrameric kinesins in sea urchin embryonic cell division."   J. Biol. Chem. 275 (48), 38005-38011 (2000).  The GenBank entry for this gene is gi|10697491|gb|AF292395.2|. \nAnnotation by RA Obar, RL Morris, AL Silverio, BJ Chick, EJ Jin.\n
SPU_016794	SPU_016794	none	Domains: NACHT, LRRs. \nThis gene model is at the end of scaffold and is likely incomplete, missing the DEATH domain.\n
SPU_003640	SPU_003640	none	Domains: DEATH, NACHT, LRRs. \nThe Fgenesh model has additional exons that could code for more LRRs.\n
SPU_003762	SPU_003762	none	Domains: DEATH, NACHT, LRRs.\n
SPU_003366	SPU_003366	none	Domains: DEATH, NACHT, LRRs. \nThe Fgenesh model just downstream has 5 exons that code for LRRs.\n
SPU_027511	SPU_027511	none	Domains: DEATH, NACHT, LRRs. \nThe Genscan model has 3 additional exons at the 3' end that code for additional LRRs.\n
SPU_018384	SPU_018384	none	Domains: DEATH, NACHT, LRRs. \nThe Fgenesh model is slightly different and codes for additional LRRs. \n
SPU_026189	SPU_026189	none	#\nDomains: NACHT, LRRs. \nThe Genscan model is slightly different and also encodes a DEATH domain. It is likely incomplete however, since this is a short scaffold.\n
SPU_028060	SPU_028060	none	Domains: Signal peptide, DEATH, NACHT, LRRs. \nThis gene model could be incomplete since it is on a small scaffold.\n
SPU_025179	SPU_025179	none	Domains: DEATH, NACHT, LRRs. \nThe Genscan model has additional small exons that code for additional LRRs.\n
SPU_009160	SPU_009160	none	Domains: DEATH. \nThe Genscan model combines SPU_009060 and SPU_009061. SPU_009061 has NACHT and LRRs. This gene model has been modified to reflect this gene model.\n
SPU_028631	SPU_028631	none	This gene model is combined with SPU_028630. Please refer to that gene model for details.\n
SPU_025166	SPU_025166	none	Domains: DEATH, NACHT, LRRs.\n
SPU_009659	SPU_009659	none	Domains: DEATH, NACHT, LRRs.\n
SPU_005609	SPU_005609	none	Domains: DEATH, NACHT, LRRs.\n
SPU_011855	SPU_011855	none	Domains: DEATH, NACHT, LRRs.\n
SPU_017505	SPU_017505	none	Domains: DEATH, NACHT, LRRs. \nSPU_017506 has additional LRRs that are probably part of this gene model.\n
SPU_017506	SPU_017506	none	This gene model contains LRRs only and probably belongs to SPU_017505. See that gene model for details.\n
SPU_017196	SPU_017196	none	Domains: DEATH, NACHT, LRRs.\n
SPU_003247	SPU_003247	none	Domains: DEATH, NACHT, LRRs. \nBoth the Genscan and the Fgenesh models have different exon/intron structures in the 3' end of the gene, which code for additional LRRs.\n
SPU_008498	SPU_008498	none	Domains: NACHT, LRRs. \nThis gene model is on a small scaffold and could be incomplete, missing the DEATH domain(s).\n
SPU_027610	SPU_027610	none	Domains: DEATH, NACHT, LRRs.\n
SPU_003797	SPU_003797	none	Domains: DEATH, NACHT, LRRs.\n
SPU_022001	SPU_022001	none	Domains: DEATH, NACHT, LRRs.\n
SPU_028820	SPU_028820	none	Domains: NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing the DEATH domain(s).\n
SPU_016810	SPU_016810	none	Domains: DEATH, NACHT, LRRs.\n
SPU_026622	SPU_026622	none	Domains: NACHT, LRRs. \nThis gene model is on a small scaffold and could be incomplete, missing DEATH domain(s). \n
SPU_005383	SPU_005383	none	Domains: DEATH, NACHT, LRRs.\n
SPU_000318	SPU_000318	none	Gene neighbour of the ParaHox cluster; in chordates.\n
SPU_023183	SPU_023183	none	Domains: NACHT, LRRs. \nThe Genscan model has a different exon/intron structure and also codes for a DEATH domain N-terminal of the NACHT domain. \n
SPU_016257	SPU_016257	none	#\nDomains: DEATH, NACHT, LRRs. \nThis gene model has a final intron of more than 150kb. This is very unusual. The Fgenesh model has a different final exon, which is much closer to the rest of the gene, and could be a more accurate model.\n
SPU_024419	SPU_024419	none	EST data:  \n \nBG784137 SEAUMC004094 Sea urchin primary mesenchyme cell cDNA library Strongylocentrotus purpuratus cDNA clone PC_0020_A1_G10_MR 5', mRNA sequence \n \n>gi|57955070|gb|CX692889.1|CX692889 yde99f06.y1 Sea urchin EST Lib1 Strongylocentrotus purpuratus cDNA clone yde99f06 5' similar to TR:Q9VK69 Q9VK69 CG5525 PROTEIN. ;, mRNA sequence\n
SPU_016926	SPU_016926	none	Domains: DEATH, NACHT, LRRs.\n
SPU_001210	SPU_001210	none	Domains: DEATH, NACHT, LRRs.\n
SPU_004343	SPU_004343	none	Domains: DEATH, NACHT, LRRs. \n
SPU_014122	SPU_014122	none	Domains: DEATH, NACHT, LRRs.\n
SPU_025600	SPU_025600	none	Domains: DEATH, NACHT, LRRs. \nThe Genscan model has additional small exons that code for more LRRs.\n
SPU_000523	SPU_000523	none	Domains: DEATH, NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing LRRs.\n
SPU_008707	SPU_008707	none	Domains: DEATH, NACHT, LRRs. \nThe Genscan model has additional small exons that code for additional LRRs.\n
SPU_007446	SPU_007446	none	Domains: DEATH, NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing LRRs.\n
SPU_018173	SPU_018173	none	7 ARMADILLO REPEATS - similar to karyopherin (importin) alpha 6  - clusters with human importins\n
SPU_013477	SPU_013477	none	8 ARMADILLO REPEATS - similar to karyopherin (importin) alpha 6  - clusters with human importins\n
SPU_024721	SPU_024721	none	This is likely to be split between this gene and glean 24930. Correct protein sequence is predicted to be: \nMAQAYVEDTLMVRHELKHAYGKVFLVRKVGNHNQGKLYAMKVLKKATIVQ \nKAKTAEHTMTERQVLEAVRSCPFLVTLHYAFQTDSKLNLILDYVNGGELF \nTHLYQREHFRESEVRIYIAEIIIALDCLHRILTSHPPMPNTFSKEVKDFI \nNKLLVKDPTKRLGCNGVKDIKSHSFFKGLNWDDVAAKRVSPPFRPHINGE \nLDTSNFAEEFTSLVPADSPADIPKTADARVFRVGYSFIAPSILYSDNAIT \nQDMLTQPSEHNRPSLASILSIHELKDSPFNKYYELDMKSAPIGDGSFSIC \nRRCTHRKTEKEYAVKIVSRRVACTQEITTLQLCQKHPNIVHLKEEFKDKL \nHTYIIMELCKGGELLGRIRKKKHFDELEASMIMRKLVSAVDYMHSRGIVH \nRDLKPENILFTDDSDDAELKIIDFGFARITNSNQPLKTPCFSLHFAAPEV \nLKRAYEQDGEYDASCDVWSLGVILYTMLSGRVPFQDPSISKSNSASDIMK \nRIKHGNFSFDGEEWNSVSTPAKDLIKGLLTVDPSRRLTTDDLLQNEWIQG \nQQLSTSTPLMTPDILNSCASIQKRVKATMRAFHTAQREGFLLTDVSNAPL \nAKRRKKKKDSSTETRSSSSESTHSQSSSSQESTTPTPTANPVLTIPVTTV \nSCAPRTTTATGAPSIPSVQPLPSLSKQTGARLDQYESLESLGFSPILPFS \nAGGSQELPPLLARQDSGYVGQMPSYAQVTPVPRTNVGSHGVTYAPILDPS \nMYPCGLQQPILDFSSSIPEYLSVQYASTEQPSIPMTVPRTLHQPHPHPLP \nLPHQHLSHLPTISEDPSTT \n
SPU_000512	SPU_000512	none	#\nPredicts the carboxy-terminal end of a dual oxidase with an exon structure similar to that of Sp-Udx1 (SPU_000513).  May link to SPU_025507, which represents a paralogous amino-terminal domain to Sp-Udx1 (SPU_000513).\n
SPU_025518	SPU_025518	none	Best hit to Xenopus Timeless, needs to be correlated with SPU_006230\n
SPU_026036	SPU_026036	none	Domains: DEATH, DEATH, LRRs, NACHT, LRRs, ZU5. \nThe Fgenesh model has a different exon/intron structure and is missing the last 2 exons, and therefore the ZU5 domain; it is likely more accurate, since this domain is not associated with this type of protein in any other organism.  \n \nThe presence of LRRs between the DEATH domains and the NACHT domain is also unusual. \n
SPU_000863	SPU_000863	none	Domains: DEATH, NACHT, LRRs. \nThe Genscan model fuses SPU_000863 with SPU_000864 (which is comprised of 4 small exons) that could code for additional LRRs.\n
SPU_020641	SPU_020641	none	Would prefer to use Arl13b (for clarity's sake), but ARL2L1 appears to be the preferred naming scheme. \nAlso, note that tiling array shows signal on only 6/10 exons (missing signal at 5' exons).  No EST to confirm.  Likely that 5' end is wrong.\n
SPU_023642	SPU_023642	none	Domains: DEATH, NACHT, LRRs.\n
SPU_027808	SPU_027808	none	Domains: DEATH, NACHT, LRRs.\n
SPU_026400	SPU_026400	none	Domains: NACHT, LRRs. \nThis gene model is at the end of a scaffold and could be incomplete, missing the DEATH domain(s).\n
SPU_025167	SPU_025167	none	Domains: NACHT, LRRs. \nThe Fgenesh model has additional exons in the 5' region that code for a DEATH domain. However, this model is truncated at the 3' end and is missing a large portion of the LRRs. A combination of both models is likely the most accurate version of this gene. \n
SPU_022070	SPU_022070	none	Domains: NACHT, LRRs. \nThis gene model is at the end of a scaffold and could be incomplete, missing the DEATH domain(s).\n
SPU_001884	SPU_001884	none	Domains: NACHT, LRRs. \nThe Genscan model has 2 additional exons that code for additional LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing the DEATH domain(s).\n
SPU_015206	SPU_015206	none	Domains: DEATH, EGF, NACHT, LRRs. \nThe presence of the EGF domain in this model is unusual. \n \nThe Genscan model combines SPU_015206 and SPU_015207. SPU_015207 has 4 exons that code for additional LRRs. The Genscan model is therefore likely to be more accurate.\n
SPU_015207	SPU_015207	none	This gene model contains LRRs only but is likely part of the SPU_015206 model. Please see that gene model for details.\n
SPU_011035	SPU_011035	none	Domains: DEATH, DEATH, NACHT, LRRs.\n
SPU_006456	SPU_006456	none	Domains: NACHT, LRRs. \nThis gene model is on a small scaffold and could be incomplete, missing the DEATH domain(s) and LRRs.\n
SPU_006587	SPU_006587	none	Domains: Signal peptide, NACHT, DEATH, LRRs. \nThe domain organization of this gene model is unusual, since the DEATH domain is usually N-terminal to the NACHT domain.\n
SPU_015205	SPU_015205	none	Domains: DEATH, EGF(2), NACHT, LRRs. \nThe presence of the EGF domains in this gene model is unusual but it is also seen in the gene model just downstream on the same scaffold: SPU_015206.\n
SPU_001630	SPU_001630	none	Domains: NACHT, LRRs. \nThis gene model is located on a short scaffold and is likely incomplete, missing the DEATH domain(s).\n
SPU_001549	SPU_001549	none	Domains: NACHT, LRRs. \nThis gene model is on a short scaffold and is likely incomplete, missing the DEATH domain(s).\n
SPU_005732	SPU_005732	none	Domains: DEATH, NACHT, LRRs. \nThe Genscan model has three additional small exons that code for additional LRRs.\n
SPU_027300	SPU_027300	none	Domains: DEATH, EGF, NACHT, LRRs. \nThe presence of the EGF domain in this gene model is unusual. \n \nThe Fgenesh model has 2 additional terminal exons that code for additional LRRs. \nThis gene is at the end of a scaffold and could be incomplete, missing LRRs. \n
SPU_020569	SPU_020569	none	#\nDomains: DEATH, EGF, NACHT, LRRs. \nThe presence of the EGF domain in this gene model is unusual. \n
SPU_002888	SPU_002888	none	Domains: NACHT, LRRs. \nThis gene model is on a short scaffold and is likely incomplete, missing DEATH domain(s) and LRRs.\n
SPU_024709	SPU_024709	none	Domains: DEATH, NACHT, LRRs.\n
SPU_010052	SPU_010052	none	Domains: DEATH. \nThis gene model is likely part of a larger model that includes SPU_010053. See this model for further details.\n
SPU_000738	SPU_000738	none	Domains: NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing the DEATH domain(s).\n
SPU_003303	SPU_003303	none	Domains: NACHT, LRRs. \nThe Genscan model has 3 additional exons at the 5' end that code for a DEATH domain. This gene model is likely to be more accurate than the glean. \n \nThis gene model is on a short scaffold and could be incomplete, missing a DEATH domain and/or LRRs.\n
SPU_010091	SPU_010091	none	Domains: DEATH, NACHT, LRRs. \nThe DEATH and NACHT domains overlap. The Genscan model has an additional small exon before the one coding for the NACHT domain and effectively "separates" the DEATH and NACHT domains. However, the last 2 exons of the Genscan model (not in the Glean model) code for a Zn finger domain and probably do not belong to this protein. The accurate gene model is likely part of the Genscan model and the Glean model.\n
SPU_014719	SPU_014719	none	Domains: NACHT, LRRs. \nThis gene model is located on a short scaffold and is likely incomplete, missing DEATH domain(s) and/or LRRs.\n
SPU_021630	SPU_021630	none	two ANATO domains and nothing else - a "chordate-specific" domain found in complement C3/4/5 and in fibulins.  This looks like a fragment - probably of a fibulin - could be haplotype pair for SPU_026629, which encodes a fibulin. \n
SPU_016532	SPU_016532	none	#\nDomains: LRRs. \nThe Fgenesh model has additional exons that code for Signal peptide, DEATH and NACHT domains. The gene features and sequences of this gene model were modified to reflect this.\n
SPU_027053	SPU_027053	none	Duplication of the C-term of SPU_002874.\n
SPU_015768	SPU_015768	none	Domains: NACHT, LRRs.  \nThis gene model is on a short scaffold and could be incomplete, missing the DEATH domain(s).\n
SPU_001608	SPU_001608	none	Domains: NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing the DEATH domain(s). \n
SPU_028485	SPU_028485	none	Domains: NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing the DEATH domain(s). \n
SPU_006019	SPU_006019	none	#\nDomains: NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing the DEATH domain(s). \n
SPU_005410	SPU_005410	none	Domains: NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing the DEATH domain(s) and/or LRRs.\n
SPU_019497	SPU_019497	none	Domains: NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing the DEATH domain(s) and/or LRRs.\n
SPU_005581	SPU_005581	none	Domains: DEATH, NACHT, LRRs, NACHT, LRRs. \nThis structure is very unusual, since most NLRs have a single NACHT domain. There is a large (~100kb) intron in this prediction. Therefore, this could be a fusion of 2 gene models.\n
SPU_021370	SPU_021370	none	Domains: NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing the DEATH domain(s) and/or LRRs.\n
SPU_007620	SPU_007620	none	Domains: NACHT, LRRs. \nThis gene model is on a small scaffold and could be incomplete, missing the DEATH domain(s) and/or LRRs.\n
SPU_023759	SPU_023759	none	Domains: LRRs. \nThis gene model contains part of a NACHT domain as well. SPU_023758 has the N-terminal part of the NACHT domain as well as a Signal peptide and DEATH domain. The accurate gene model is probably a fusion of the two, similar to the Genscan prediction. The Genscan model is missing a part of the NACHT domain.\n
SPU_023758	SPU_023758	none	Domains: Signal peptide, DEATH. \nThis gene model probably belongs as a fusion with SPU_023759. Please see this other model for details.\n
SPU_010667	SPU_010667	none	Domains: NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing the DEATH domain(s) and/or LRRs.\n
SPU_027207	SPU_027207	none	Domains: NACHT, LRRs. \nThis gene is located on a short scaffold and could be incomplete, missing the DEATH domain(s) and/or LRRs.\n
SPU_022564	SPU_022564	none	Domains: NACHT, LRRs. \nThe Fgenesh model has three additional exons at the 5' end which code for a DEATH domain. This model is probably a more accurate reflection of the actual gene.\n
SPU_016060	SPU_016060	none	Domains: NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing the DEATH domain(s) and/or LRRs.\n
SPU_013038	SPU_013038	none	Domains: NACHT, LRRs. \nThis gene model is at the end of a scaffold and could be incomplete, missing the DEATH domain(s).\n
SPU_027035	SPU_027035	none	Domains: NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing the DEATH domain(s) and LRRs.\n
SPU_015033	SPU_015033	none	Domains: NACHT, LRRs. \nThis gene model is on a short scaffold and could be incomplete, missing the DEATH domain(s).\n
SPU_011097	SPU_011097	none	Domains: NACHT, LRRs. \nThis gene model is located at the end of a scaffold and could be incomplete, missing the DEATH domain(s).\n
SPU_021930	SPU_021930	none	Domains: DEATH, NACHT, LRRs.\n
SPU_006219	SPU_006219	none	Domains: DEATH, LRRs. \nBoth the Genscan and the Fgenesh models have a different exon/intron structure and code for a NACHT domain. The Genscan NACHT domain is better (higher e-value). However, this model may not be the most accurate one, since it does not start with a Methionine. \nThe gene features and sequences were changed to reflect the 5' end of the Glean3 model and the NACHT domain region of the Genscan model.  \n(The gene feature edit function was not available, so an additional "exon" was created to make the existing one bigger.)\n
SPU_003619	SPU_003619	none	Domains: DEATH, 6x(EGF), NACHT. \nThe presence of the EGF domain in this model is unusual. \n \nThe LRRs are not present in this glean model but are included in the corresponding Genscan prediction. The genscan prediction is missing part of the NACHT domain, however. A hybrid of both gene predictions is probably the accurate model.\n
SPU_001423	SPU_001423	none	Domains: DEATH, NACHT, LRRs. \nThis gene model is located on a small scaffold and could be incomplete, missing LRRs. \nAlso, the DEATH and NACHT domains overlap in the SMART prediction. This overlap is less extensive in the Genscan model which has a larger 3rd exon. The gene features and sequences for this model reflect the Genscan model.\n
SPU_018582	SPU_018582	none	This GLEAN matches a GenBank entry, "gi|72180194|ref|XP_794270.1| similar to dynein 2 light intermediate chain" exactly.\n
SPU_022644	SPU_022644	none	No signal in tiling array or EST.  May be pseudogene or adult only expression.\n
SPU_009807	SPU_009807	none	NOTE: Based on sequence, alignment and domain analysis, this looks like the 5' end of a single OPA gene that is composed of GLEANS 09807 (5' end) and 06815 (3' end), with no gaps between.\n
SPU_009674	SPU_009674	none	NOTE: No signal in tiling array and no EST.  May be pseudogene or adult expressed.\n
SPU_016744	SPU_016744	none	NOTE: No signal in tiling array and no EST.  May be pseudogene or adult expressed.\n
SPU_010656	SPU_010656	none	high homology to chondroitin glucuronate 5-epimerase, dermatan sulfate epimerase, squamous cell carcinoma antigen recognized by T cells 2, SART2 \n \nNCAG1 may be an enzyme with dual epimerase and O-sulfotransferase activities involved in dermatan sulfate biosynthesis (Maccarana et al., in press, JBC 2006)\n
SPU_012840	SPU_012840	none	Better hit against Human Traf1 sequence than SPU_026479\n
SPU_002160	SPU_002160	none	likely an incomplete version of GLEAN10659. Refer to SPU_010659 for the complete sequence.\n
SPU_017675	SPU_017675	none	likely an incomplete prediction, the C-terminal region of the protein is not predicted.\n
SPU_018854	SPU_018854	none	This GLEAN represents the sea urchin Dynein Light Chain 2, as defined by the Anthocidaris crassispina cDNA (gi|2760161|dbj|BAA24184.1| outer arm dynein light chain 2 [Anthocidaris crassispina]). \n
SPU_008471	SPU_008471	none	This GLEAN represents the sea urchin Outer Arm Dynein Light Chain 2, as defined by the Anthocidaris crassispina cDNA (gi|3336986|dbj|BAA31751.1| outer arm dynein LC3 [Anthocidaris crassispina]), and is identical to the GenBank RefSeq entry "XP_783725.1 PREDICTED: similar to t-complex testis expressed 1."\n
SPU_004009	SPU_004009	none	This GLEAN represents the sea urchin Dynein Light Chain Type 6, as defined by the Anthocidaris crassispina cDNA (gi|2811014|sp|O02414|DYL1_ANTCR Dynein light chain LC6, flagellar outer arm), which has an identical amino acid sequence.   The S. purpuratus RefSeq ID is XP_785110.1. \nThere are several other GLEANs that are identical or nearly identical to this one: \nSPU_018607 (100% Amino Acid Identity) \nSPU_008799 (93% Amino acid Identity)                                            \nSPU_024498 (91% Amino acid Identity)                                            \nSPU_024497 (88% Amino acid Identity)                                            \nSPU_018567 (88% Amino acid Identity)                                            \nSPU_025272 (88% Amino acid Identity)                                            \nSPU_024499 (88% Amino acid Identity)                                            \nSPU_008800 (86% Amino acid Identity)                                            \nSPU_024500 (87% Amino acid Identity)                                            \nSPU_008801 (76% Amino acid Identity)                                           \nSPU_027937 (70% Amino acid Identity) \nSPU_027938 (68% Amino acid Identity)\n
SPU_018607	SPU_018607	none	This GLEAN represents the sea urchin Dynein Light Chain Type 6, as defined by the Anthocidaris crassispina cDNA (gi|2811014|sp|O02414|DYL1_ANTCR Dynein light chain LC6, flagellar outer arm), which has an identical amino acid sequence.   The S. purpuratus RefSeq ID is XP_785110.1. \nThis gene model is a member of a group of several GLEANs that are identical or nearly identical to SPU_004009: \nSPU_018607 (100% Amino Acid Identity) \nSPU_008799 (93% Amino acid Identity)                                            \nSPU_024498 (91% Amino acid Identity)                                            \nSPU_024497 (88% Amino acid Identity)                                            \nSPU_018567 (88% Amino acid Identity)                                            \nSPU_025272 (88% Amino acid Identity)                                            \nSPU_024499 (88% Amino acid Identity)                                            \nSPU_008800 (86% Amino acid Identity)                                            \nSPU_024500 (87% Amino acid Identity)                                            \nSPU_008801 (76% Amino acid Identity)                                           \nSPU_027937 (70% Amino acid Identity) \nSPU_027938 (68% Amino acid Identity)\n
SPU_015529	SPU_015529	none	homologous to DNA-PKcs, 5'end of gene is most likely missing, SPU_015528 also contains parts of this gene,  \nSPU_015484 contains 5'end / amino-terminus of gene\n
SPU_008799	SPU_008799	none	This GLEAN represents the sea urchin Dynein Light Chain Type 6, as defined by the Anthocidaris crassispina cDNA (gi|2811014|sp|O02414|DYL1_ANTCR Dynein light chain LC6, flagellar outer arm), which has an identical amino acid sequence.   The S. purpuratus RefSeq ID is XP_785110.1. \nThis gene model is a member of a group of several GLEANs that are identical or nearly identical to SPU_004009: \nSPU_018607 (100% Amino Acid Identity) \nSPU_008799 (93% Amino acid Identity)                                            \nSPU_024498 (91% Amino acid Identity)                                            \nSPU_024497 (88% Amino acid Identity)                                            \nSPU_018567 (88% Amino acid Identity)                                            \nSPU_025272 (88% Amino acid Identity)                                            \nSPU_024499 (88% Amino acid Identity)                                            \nSPU_008800 (86% Amino acid Identity)                                            \nSPU_024500 (87% Amino acid Identity)                                            \nSPU_008801 (76% Amino acid Identity)                                           \nSPU_027937 (70% Amino acid Identity) \nSPU_027938 (68% Amino acid Identity) \nThe predicted amino acid sequence of SPU_008799 is identical to those of SPU_008800 and SPU_008801. \nBecause there is a good chance that they do not represent three distinct gene products, they were named as follows: \nSPU_008799: Sp-Dynein Light Chain-1-3a \nSPU_008800: Sp-Dynein Light Chain-1-3b \nSPU_008801: Sp-Dynein Light Chain-1-3c\n
SPU_003242	SPU_003242	none	This GLEAN represents an Outer Arm Dynein Light Chain-like polypeptide (RefSeq ID: gi|72012233|ref|XP_782355.1| PREDICTED: similar to outer arm dynein light chain like (XJ558) [Strongylocentrotus purpuratus]).\n
SPU_004498	SPU_004498	none	This GLEAN represents an Outer Arm Dynein Light Chain-like polypeptide (RefSeq ID: gi|72070876|ref|XP_791620.1| PREDICTED: similar to outer arm dynein light chain like).\n
SPU_026533	SPU_026533	none	This gene model represents a segment of the axonemal Dynein Intermediate Chain 3.  Similarity to Chlamydomonas reinhardtii Inner_Dynein_Arm_1_Intermediate Chain IC140 (C_530081|166736|IA1-IC140|) and Homo sapiens axonemal dynein intermediate polypeptide 2.  Contains WD-40 motifs.  The protein is also essentially identical to the Anthocidaris crassispina protein (gi|2494216|sp|Q16960|DYI3_ANTCR Dynein intermediate chain 3, ciliary).\n
SPU_014075	SPU_014075	none	Annotated by RL Morris and B Rossetti.\n
SPU_006911	SPU_006911	none	This GLEAN corresponds to the first 292 amino acid residues (of a total of 686) of a well-characterized Microtubule-associated Protein known as the "77kDa-MAP."  SPU_006911 overlaps with SPU_005744 by 18 codons, followed by the remainder of the carboxyl-terminus of the protein (a total of 686 codons plus the stop codon) contained in SPU_005744.  The annotation for the full-length gene product is associated with SPU_006911.\n
SPU_005744	SPU_005744	none	This GLEAN corresponds to the last 402 amino acid residues (of a total of 686) of a well-characterized sea urchin Microtubule-associated Protein known as the "77kDa-MAP."  SPU_006911 overlaps with SPU_005744 by 18 codons, followed by the remainder of the carboxyl-terminus of the protein (a total of 686 codons plus the stop codon) contained in SPU_005744.  The annotation for the full-length gene product is associated with SPU_006911.\n
SPU_009444	SPU_009444	none	This GLEAN represents a homolog of the mammalian  Microtubule-associated proteins 1A/1B light chain 3B precursor (gi|72168709|ref|XP_783653.1| PREDICTED: similar to Microtubule-associated proteins 1A/1B light chain 3B precursor (MAP1A/MAP1B LC3) (MAP1A/1B light chain 3) [Strongylocentrotus purpuratus]).  There seem to be two MAP1A/1B_LC3-like proteins encoded in the S. purpuratus genome: SPU_009444 (~72% sequence identity) and SPU_008954 (~60% sequence identity).\n
SPU_022897	SPU_022897	none	This GLEAN is a relative of TPX2, microtubule-associated protein, RefSeq: "gi|72167299|ref|XP_796944.1| PREDICTED: similar to TPX2, microtubule-associated protein homolog [Strongylocentrotus purpuratus]."\n
SPU_008221	SPU_008221	none	The last 238 amino acid residues of this GLEAN are related to TPX2, microtubule-associated protein, RefSeq: "gi|72167299|ref|XP_796944.1| PREDICTED: similar to TPX2, microtubule-associated protein homolog [Strongylocentrotus purpuratus]."\n
SPU_021172	SPU_021172	none	The two first predicted exons likely belong to a different gene called calumenin. \n
SPU_010527	SPU_010527	none	sequence identical to SPU_012840\n
SPU_004366	SPU_004366	none	Glean has falsely predicted the following exon: 3\n
SPU_007226	SPU_007226	none	eIF3e spans 2 glean predictions, 07226 (N-ter) and 09248 (C-ter), probably missing exons in between.\n
SPU_009248	SPU_009248	none	eIF3e spans 2 glean predictions, 07226 (N-ter) and 09248 (C-ter), probably missing exons in between.\n
SPU_025856	SPU_025856	none	The prediction is likely incorrect regarding the first 160 amino acids. \n
SPU_000502	SPU_000502	none	homology with UPF3 from aa 1200 to 1588 AAG48511.1,  \naa 1-1200 similar to Retrovirus-related POL polyprotein (Endonuclease) AAH66867.1\n
SPU_005736	SPU_005736	none	This gene was annotated with A. pectinifera mRNA and peptide.  The complete gene is annotated in SPU_012078. \nAligns with ApIP3R AA sequence from 1670-1769.\n
SPU_027674	SPU_027674	none	Annotation of this gene was done with alignments using A. pectinifera mRNA and peptide.  The full annotation of this gene is on SPU_012078.  Of note, there were two Glean3 predicted exons in sequential order which were exactly the same.  One was erased.  This glean aligns with ApIP3R AAs 2058-2673.\n
SPU_021143	SPU_021143	none	NCBI is calling this a DSP4, but the best non-predicted BLAST hit is a DSP1\n
SPU_002908	SPU_002908	none	59% identity with the corresponding region of human VARS2 (P26640) whereas it has only 46% identity with the human VARS2-like (NP_065175.3) \n \nSp-VARS isoformB has 47% identity with the Sp-VARSisoformA (glean 3_08058) \n \nThe construction was made from \n7 exons from SPU_028547 plus 1 exon from SPU_003860 to build the N-ter region \nand 15 exons from SPU_002908 to build the C-ter region\n
SPU_028547	SPU_028547	none	fragments of Sp-VARS isoformB \ncomplete gene annotated in SPU_002908 \n \n55% identity with corresponding region in human VARS2 (P26640)\n
SPU_014014	SPU_014014	none	close similarity in the overlapping region with SPU_001009\n
SPU_001009	SPU_001009	none	close similarity in the overlapping region with SPU_014014\n
SPU_025860	SPU_025860	none	Overlaps with GLEAN_17572\n
SPU_017572	SPU_017572	none	Overlaps with SPU_025860.\n
SPU_022195	SPU_022195	none	A very similar protein is predicted in SPU_003052. \n
SPU_003052	SPU_003052	none	A very similar protein is predicted in SPU_022195.\n
SPU_025517	SPU_025517	none	Cyclic ADP-ribose is an important calcium mobilizing metabolite produced by the ADP-ribosyl cyclase (cyclases) family of enzymes. Three evolutionarily conserved ADP-ribosyl cyclase superfamily members have been identified, one from the invertebrate Aplysia californica and two from mammalian tissues, CD38 and CD157. \n \nThis annotation bears most homology to CD157.\n
SPU_018717	SPU_018717	none	matches human amino acid sequence from position 52-396 with 82% identity, BLAST score 1e-167\n
SPU_003985	SPU_003985	none	best vertebrate hit NM_204934\n
SPU_002461	SPU_002461	none	Cyclic ADP-ribose is an important calcium mobilizing metabolite produced by the ADP-ribosyl cyclase (cyclases) family of enzymes. Three evolutionarily conserved ADP-ribosyl cyclase superfamily members have been identified, one from the invertebrate Aplysia californica and two from mammalian tissues, CD38 and CD157. \n \nincomplete sequence that has most homology to the ADP-ribosyl cyclase family member from Aplysia californica \n
SPU_007538	SPU_007538	none	Three evolutionarily conserved ADP-ribosyl cyclase superfamily members have been identified, one from the invertebrate Aplysia californica and two from mammalian tissues, CD38 and CD157. \n \nThis annotation bears greatest homology to family member CD38\n
SPU_010746	SPU_010746	none	This model was modified and annotated based on a manual inspection of multiple protein sequence alignments and its predicted domain architecture. \n \nThe 5' features of the corresponding Fgenesh model were chosen over those of the original glean model because they generate a model that better corresponds in domain structure with the genes to which this model Blasts back (B7-1/CD80). The 3' features of the glean/genscan and Fgenesh models are also different. Since they do not give rise to significant differences in domain structure or Blasting to genes from other phyla, we have decided to accept the former.\n
SPU_019026	SPU_019026	none	Appears to be an additional exon of SPU_019024.\n
SPU_011076	SPU_011076	none	This model was annotated based on a manual inspection of multiple protein sequence alignments and its predicted domain architecture.\n
SPU_020633	SPU_020633	none	vertebrate homolog DM. NM 057490\n
SPU_020159	SPU_020159	none	SPU_020159 appears to be C-terminal portion of Chlamydomonas IFT140: AAT95430. eval=1.00E-145. explains aas 183-775 of IFT140. \nSPU_021918 appears to be N-terminal portion of IFT140. eval=2.00E-84. explains aas 677-1249 of IFT140. \nAnnotated by RL Morris. \n
SPU_021918	SPU_021918	none	#\nSPU_020159 appears to be C-terminal portion of Chlamydomonas IFT140: AAT95430. eval=1.00E-145. explains aas 183-775 of IFT140. \nSPU_021918 appears to be N-terminal portion of IFT140. eval=2.00E-84. explains aas 677-1249 of IFT140. \nAnnotated by RL Morris.\n
SPU_023605	SPU_023605	none	SPU_023605 eval=0.0 against "FAP80, IFT122A, Intraflagellar Transport Protein 122A [Chlamydomonas reinhardtii]".  \nSPU_023605 explains aas 83-795 of Chlamydomonas FAP80/IFT122A which is 1162 aas long. \nAnnotated by RL Morris.\n
SPU_025443	SPU_025443	none	This is the 5' end of TIAM2.  The 3' end is SPU_011527\n
SPU_007086	SPU_007086	none	This is most of the gene.  However, the 5' end is SPU_020692\n
SPU_004041	SPU_004041	none	This is a duplication of 020022, which looks like a splice variant\n
SPU_012009	SPU_012009	none	This is most of the gene for Dock180, the 5' end is SPU_017939\n
SPU_017939	SPU_017939	none	This is the 5' part of Sp-Dock180.  The rest of the gene is SPU_012009\n
SPU_027143	SPU_027143	none	This is the 5' end of SRGAP.  SRGAP is in three GLEANs, with the middle being 027715 and the bulk (3'end) being 022632\n
SPU_027715	SPU_027715	none	This is the middle part of SRGAP.  This 5' end is 027143, and the 3' end is 022632\n
SPU_026597	SPU_026597	none	This is the 5' end of FAM13A1.  The 3' end is SPU_000258\n
SPU_000258	SPU_000258	none	This is the 3' end of FAM13A1.  The 5' end is SPU_026597.\n
SPU_025352	SPU_025352	none	This is a duplication of most of SPU_000258, which itself is the 3'end of FAM13A1.\n
SPU_012388	SPU_012388	none	This gene was annotated based on a manual inspection of domain architecture and multiple protein sequence alignments. \n \nThis model blasts back and shows a similar domain structure (signal peptide/2xIG-v/TM) to that of diverse vertebrate IGSF genes, many of which have a well documented immune function.\n
SPU_026664	SPU_026664	none	C-term part of the protein seems too long\n
SPU_027781	SPU_027781	none	seems to be an artifactual duplication of SPU_023813 \n... or it's the inverse :)\n
SPU_013624	SPU_013624	none	This model was annotated based on a manual inspection of its predicted domain architecture and its Blasting to known genes. \n \nThis model codes for what seems a fairly novel protein, for it shows very weak blasting to known genes in genebank. Its sequence does not show any high confidence domain architecture (as predicted by SMART and Pfam); however, one particularly interesting predicted domain architecture for this model is sp+3xIG[low scores]+TM+ITAM[low score].\n
SPU_028030	SPU_028030	none	The N-terminal part of the protein seems inaccurate. The good part (the half of the peptidase S8 domain that is lacking here) is most likely in SPU_028031.  \n \nThe most C-terminal part seems inaccurate as well.\n
SPU_028031	SPU_028031	none	Seems to be the N-terminal part of the adjacent GLEAN_28030 (it seems to contain the half of the peptidase S8 domain that is lacking there).\n
scaffoldi3224	SPU_030153	none	sequence here is only the conserved homeobox domain.\n
SPU_017228	SPU_017228	none	This model was annotated based on a manual inspection of its predicted domain architecture and its similarity to known genes. \n \nIts sequence and domain structure are similar to those of various vertebrate immune IGSF genes (e.g. TCR, CD276, CD4). Furthermore, if the sequence of the glean model is fused to exons called by other predictions, there is an even better Blasting to these genes. Note neither this or any of the other corresponding predictions include a transmembrane domain, indicating that this might be a partial model.  \n
SPU_020457	SPU_020457	none	This model was annotated based on a manual inspection of its domain architecture and alignment to known sequences. \n \nThis model Blasts back to human CD276/B7-H3, and it has a partially similar predicted domain structure (sp+V-set+C2-set+TM). The alternative predictions are very similar to the glean model and they do not provide any additional information on this gene.\n
SPU_024439	SPU_024439	none	This model was annotated based on a manual inspection of sequence alignments and predicted domain architectures. \n \nThis glean model codes for a protein that Blasts back and has a similar domain structure to that of vertebrate SIRPB2 and SIRPG, although it has a slightly longer cytoplasmic portion. This model is represented in two separate Fgenesh predictions, the first of which has a domain structure more similar to that of SIRPs. However, there is equivalent signal from the tiling array data for all the exons, which would suggest that they all correspond to the same gene; we have therefore accepted the glean model in its original form.\n
SPU_024787	SPU_024787	none	This model codes for what seems a novel domain architecture: 2xIGv+2xCCP/Sushi. Even though there are gaps in the sequence between the N-ter IG domains (thus making it possible that this is a "forced" [artifactual] model), the 2nd IG domain and both Sushi domains are encoded in one uninterrupted contig. \n \nThe IG and Sushi domains Blast separately to sequences in Genbank, which further supports the idea that, if real, this gene would have a novel domain architecture. Of note, the IG domains blast to various vertebrate IGSF genes relevant for immunity. \n \n
SPU_028300	SPU_028300	none	This model was modified to incorporate an extra 5' exon based on an otherwise identical Fgenesh prediction. The modified model incorporates a signal peptide into the predicted sequence, which better resembles the structure of the vertebrate B7 family genes to which this gene Blasts back. It codes for sp+V-set+C2-set[low]. The sequence is most likely incomplete (transmembrane domain missing), which is expected since this model is located at a scaffold end.\n
SPU_021592	SPU_021592	none	SPU_021592 eval=1E-30 against "CPC1, Central Pair Complex 1 [Chlamydomonas reinhardtii]" \nAnnotated by RL Morris, B Rossetti, and A Shorette.\n
SPU_025252	SPU_025252	none	This is a duplication of part of SPU_000206, which itself appears to be a partial PIK3R1.\n
SPU_025017	SPU_025017	none	C-terminal C1q domain plus N-terminal extension of about 400 amino acids but without collagen repeats.\n
SPU_024653	SPU_024653	none	C-terminal C1q domain plus N-terminal extension of about 200 amino acids but without collagen repeats.\n
SPU_028732	SPU_028732	none	C-terminal C1q domain plus N-terminal extension of about 200 amino acids but without collagen repeats.\n
SPU_024033	SPU_024033	none	C-terminal C1q domain plus N-terminal extension of about 200 amino acids but without collagen repeats.\n
SPU_028510	SPU_028510	none	This model was modified based on a corresponding Genscan model whose domain structure better resembles the vertebrate B7 family of genes (to which this model blasts back). The added exons were indeed predicted by Glean3, but as part of the adjacent glean model (SPU_028511).\n
SPU_002608	SPU_002608	none	This model is located in a small scaffold, and is most likely incomplete (missing a transmembrane domain). \n \n
SPU_016836	SPU_016836	none	#\nN-terminal EMI domain followed by large number of EGF and EGFLam repeats.  There are homologs in vertebrates but they are a bit shorter and have TM domain.\n
SPU_012032	SPU_012032	none	LRRNT/EGF/FN3/TM - looks as if it's missing the N-terminus with the rest of an LRR unit - such structures exist in vertebrates\n
SPU_010102	SPU_010102	none	SERIES OF LRRtyp REPEATS FOLLOWED BY KR AND 4 FA58C \nA novel domain architecture.\n
SPU_019694	SPU_019694	none	GPS/7TM2 - EXTRACELLULAR DOMAIN HAS NO DOMAINS PREDICTED \nINTRACELLULAR DOMAIN HAS SEVERAL DOMAINS (SR/WSC/CCP) THAT LOOK SUSPICIOUS\n
SPU_016565	SPU_016565	none	previously cloned genomic DNA\n
SPU_017445	SPU_017445	none	This gene model has a CARD domain at the N-terminus and Blasts back to the human VISA/CARDIF/Ips-1/MAVS in the top 10 hits. However, it lacks the TM at the C-terminus.\n
SPU_013091	SPU_013091	none	centromere specific histone H3\n
SPU_010425	SPU_010425	none	This GLEAN appears to be an ortholog of the Chlamydomonas reinhardtii ODA-DC3 gene (C_240117|160952 ODA-DC3, Outer Dynein Arm Docking Complex 3, Mr 25,000)\n
SPU_014310	SPU_014310	none	This gene model clusters with the RIG-I family of helicases in phylogenetic tree based on a multiple sequence alignment of the DEXDc domains.  \nThis gene model has a DEATH domain instead of a CARD domain, located at the C-terminus instead of the N-terminus.\n
SPU_025885	SPU_025885	none	This gene model clusters with the RIG-I family of helicases in phylogenetic tree based on a multiple sequence alignment of the DEXDc domains.  \nThis gene model has an N-terminal DEATH domain instead of a CARD domain.\n
SPU_011866	SPU_011866	none	This gene model clusters with the RIG-I family of helicases in phylogenetic tree based on a multiple sequence alignment of the DEXDc domains.  \nThis gene's CARD domain is located at the C-terminus instead of the N-terminus.\n
SPU_014311	SPU_014311	none	This gene model clusters with the RIG-I family of helicases in phylogenetic tree based on a multiple sequence alignment of the DEXDc domains.  \nThis gene model's CARD domain overlaps the DEXDc domain by ~25 amino acids.\n
SPU_007126	SPU_007126	none	This gene model clusters with the RIG-I family of helicases in phylogenetic tree based on a multiple sequence alignment of the DEXDc domains.  \nThis gene model codes for an N-terminal DEATH domain instead of a CARD domain.\n
SPU_014119	SPU_014119	none	This gene model clusters with the RIG-I family of helicases in phylogenetic tree based on a multiple sequence alignment of the DEXDc domains.  \nThis gene combines with part of the sequence from SPU_014118 to make a complete gene model (Helicase_C domain is missing from SPU_014119). \nThis gene model codes for an N-terminal DEATH domain instead of a CARD domain.\n
SPU_003850	SPU_003850	none	 partial, missing N-terminus\n
SPU_004180	SPU_004180	none	 partial, missing N-terminus, may join with SPU_011497\n
SPU_011497	SPU_011497	none	 partial, missing C-terminus, may join with SPU_004180\n
SPU_005315	SPU_005315	none	 partial, missing some N-terminus residues\n
SPU_013104	SPU_013104	none	 fragment\n
SPU_016142	SPU_016142	none	 partial, missing N- and C-terminus\n
SPU_016259	SPU_016259	none	 partial, fragment\n
SPU_010535	SPU_010535	none	 partial\n
SPU_013319	SPU_013319	none	 partial\n
SPU_026353	SPU_026353	none	 partial\n
SPU_005476	SPU_005476	none	This gene model clusters with the RIG-I family of helicases in phylogenetic tree based on a multiple sequence alignment of the DEXDc domains.  \nThis gene model is at the end of a scaffold and is likely incomplete; it is missing part of the DEXDc domains and the effector domain (DEATH or CARD) at the N-terminus. \n
SPU_019617	SPU_019617	none	This gene model clusters with the RIG-I family of helicases in phylogenetic tree based on a multiple sequence alignment of the DEXDc domains.  \nThis gene model is at the end of a scaffold and is likely incomplete; it is missing the Helicase_c domain.\n
SPU_016718	SPU_016718	none	This gene model clusters with the RIG-I family of helicases in phylogenetic tree based on a multiple sequence alignment of the DEXDc domains.  \nThis gene is at the end of a scaffold and is likely incomplete; it is missing the Helicase_c domain.\n
SPU_020020	SPU_020020	none	This gene model clusters with the RIG-I family of helicases in phylogenetic tree based on a multiple sequence alignment of the DEXDc domains.  \nHowever the domain structure is unusual for this type of protein: It has an ERCC4 domain at the C-terminus (after a long stretch of low complexity sequence, which contains a poor hit to a DEATH domain). Both the Genscan and Fgenesh models are identical to the Glean3 prediction.\n
SPU_000006	SPU_000006	none	This gene model clusters with the RIG-I family of helicases in phylogenetic tree based on a multiple sequence alignment of the DEXDc domains.  \nThis gene model does not appear to encode an "effector" domain (N-terminal CARD domain in vertebrates) It is possible that it is incomplete, although the Genscan and Fgenesh predictions are identical to the Glean3 model and there is another ORF in opposite orientation just upstream.\n
SPU_012192	SPU_012192	none	potential novel kinase family member. Blasts reasonably well to Fumarate Dehydratase\n
SPU_026720	SPU_026720	none	This model is likely incomplete (sits on a small scaffold) and is almost identical to SPU_003486 annotated by Charlie Whittaker. For this reason we have followed Whittaker's annotation for this model.\n
SPU_027562	SPU_027562	none	MORC protein sequence revealed putative nuclear localization signals, two predicted coiled-coil structural motifs and limited homology to GHL (GyraseB, Hsp90, MutL) ATPase. Epitope-tagged MORC protein expressed in COS7 cells localized to the nucleus\n
SPU_023500	SPU_023500	none	May be a partial gene.  The best CDS match to meiosis defective 1 spans only the first 450 amino acid residues and matches the middle of the mei-1 protein.\n
SPU_028506	SPU_028506	none	Only one predicted exon is conserved: \n>SPU_028506|Scaffold82783|17550|17749| DNA_SRC: Scaffold82783 START: 17550 STOP: 17749 STRAND: +  \nAAACGAGCAACGCATCCATCGATTACAGCAAAGAGAATGAAACCGCCACTCTTACATTCCCCTCACCCCT \nTGCTGTCGGCAGCGGTGATCTGGCCCTGGAGTTCACAGGAGAGCTCAATGATAAGATGAAAGGGTTCTAT \nCGAAGCAAGTACACCACACCAGCTGGTGAAGAAAGATACTGTGCTGTTACTCAGTTTGAG \n
SPU_021117	SPU_021117	none	#\nOnly two predicted exons are conserved: \n>SPU_021117|Scaffold22466|7279|7378| DNA_SRC: Scaffold22466 START: 7279 STOP: 7378 STRAND: +  \nCTCAGGCAGAGGTAGACATGAAGGCCTGGTTTTGCCGTAGAGGGCGAAGAAGAAATCCACCACATCACAA \nCGTAACCTGGCATCGTGAGATGTACCTGAA \n>SPU_021117|Scaffold22466|7944|8065| DNA_SRC: Scaffold22466 START: 7944 STOP: 8065 STRAND: +  \nTTCATGTGGGCCCAGAGCCTCTCTACTAGTTCCTCTGTGCTAAGCTTCGACTGTTCCTTGTACTTAAACG \nGTGGGTTGGCTATCAGCAGCCGTAGTAGCTTGTGCCTGGGTTGGAGATACAT \n
SPU_026235	SPU_026235	none	fragment\n
SPU_012022	SPU_012022	none	The following exons are the only ones conserved with other CPA proteins. \n>SPU_012020|Scaffold48388|53670|53789| DNA_SRC: Scaffold48388 START: 53670 STOP: 53789 STRAND: +  \nCTTGATTTCTGGAAGCGCCCGTCGAAGGTTGGACGGCCCGTCGACGTGATGGTCTCCCCCGCCCAGCAGT \nTGAGCTTCGTTAGCTCCGCGAGCCGCCCTGGACTCTTTATCGAGACTTGG \n>SPU_012020|Scaffold48388|55561|55662| DNA_SRC: Scaffold48388 START: 55561 STOP: 55662 STRAND: +  \nATCGCGAAGTCACCATCTGCAACGAACGTGGCTTACATCCAGGGAGGCATCCACGCCCGCGAATGGGTCA \nGCCCAGCTACAGTCATCAACCTCATCAAAAAT \n>SPU_012020|Scaffold48388|56376|56483| DNA_SRC: Scaffold48388 START: 56376 STOP: 56483 STRAND: +  \nTACATAGATAACTACGGCAGTGATGATACGGTGACGAGCATGTTGGATAACTTTGTGTGGATCATTGTAC \nCCGTCTACAACATCGATGGATACAAGTTCAGCCACACC \n>SPU_012020|Scaffold48388|58257|58347| DNA_SRC: Scaffold48388 START: 58257 STOP: 58347 STRAND: +  \nGACGATCGTATGTGGCGCAAGAATCGCAACCCCAACGTAGGAGGCTGCGCTGGAGTCGATCTGAACCGCA \nACTATGACTTCGAGTGGGGAG \n>SPU_012020|Scaffold48388|61004|61206| DNA_SRC: Scaffold48388 START: 61004 STOP: 61206 STRAND: +  \nGTGCCAGCAAACAGAGGTGTACCCAGGATTATCAGGGCACAGAGCCGCTGAGTGAACCCGAGAACAGCGG \nCTCCAAGGCTTTCCTGCAAGGCTTTGGTTCAAACCTCAAACTCTTCATTGATTTCCACGCCTATGGCCAG \nTACTGGCTCTACCCATGGGGTTACACCAGGAGAACCCTTGCACAACCAGATAGAGACGATCAG \n>SPU_012020|Scaffold48388|62567|62757| DNA_SRC: Scaffold48388 START: 62567 STOP: 62757 STRAND: +  \nACCCCGCAACAGGTGCAAGCGAAGACTTTGGATACGGCTCCCTGGGTGTGAAGTACACCTACGTGGTGGA \nGCTGAGGGATGAGGGCACTTTCGGGTTCTCGCTCCCCGCCTACCAGATCCAGCCCACCGGTGAGGAGATC \nTTCGCCGGTATGAAGACACTCGGCAAGCAGCTCGTTGCCGAGTATGCTTAG \n\t\t\n
SPU_027565	SPU_027565	none	Comparison to the best BLAST hit suggests that the prediction may be missing N-terminal sequence.\n
SPU_010455	SPU_010455	none	potential novel kinase family member\n
SPU_011808	SPU_011808	none	potential novel kinase family member\n
SPU_012859	SPU_012859	none	potential novel kinase family member\n
SPU_013806	SPU_013806	none	potential novel kinase family member\n
SPU_016864	SPU_016864	none	potential novel kinase family member\n
SPU_017078	SPU_017078	none	potential novel kinase family member\n
SPU_018352	SPU_018352	none	potential novel kinase family member\n
SPU_020735	SPU_020735	none	potential novel kinase family member\n
SPU_022611	SPU_022611	none	potential novel kinase family member\n
SPU_028859	SPU_028859	none	potential novel kinase family member\n
SPU_028625	SPU_028625	none	probably TK family member \n
SPU_002076	SPU_002076	none	#\nprobable TKL family member\n
SPU_006622	SPU_006622	none	probable TKL family member \n
SPU_008015	SPU_008015	none	probable TKL family member \n
SPU_002493	SPU_002493	none	#\npotential novel kinase family member\n
SPU_005457	SPU_005457	none	potential novel kinase family member\n
SPU_022612	SPU_022612	none	potential novel kinase family member\n
SPU_012527	SPU_012527	none	Best alignment in the central region, where coiled coil domain is likely encoded.\n
SPU_021921	SPU_021921	none	partial sequence; based on NCBI predicted data set\n
SPU_008923	SPU_008923	none	distinct from 07733, 25829\n
SPU_025421	SPU_025421	none	Eval 8e-12 against cd00159, RhoGAP, GTPase-activator protein for Rho-like GTPases...\t \n181 aas are 100% identical to NCBI prediction XP_798334 "PREDICTED: similar to Kinesin-like motor protein C6orf102, partial [Strongylocentrotus purpuratus]" \nAnnotated by RL Morris \n
SPU_008766	SPU_008766	none	SPU_008766 is exactly identical to the C-terminal region of XP_789874.  Evalue =1e-88  for XP_789874 against "NP_055690.1|  kinesin family member 14 [Homo sapiens]".   \nAnnotated by RL Morris\n
SPU_001284	SPU_001284	none	Eval=2e-113 using SPU_001284 against "NP_055889.2|  kinesin family member 1B isoform b [Homo sapiens]" \nAnnotated by RL Morris\n
SPU_009400	SPU_009400	none	AAF04841 = Sp-kinesin-C cloned sequence.   \ne val against AAF04841 = 0. \nAnnotated by RL Morris.\n
SPU_015484	SPU_015484	none	Homologous to DNA-PKcs, 5'end of gene is most likely missing. This model shows similarity to the N-ter sequence of DNA-PKcs. SPU_015529 contains the 3'end / carboxy-terminus of the gene.\n
SPU_011294	SPU_011294	none	Middle third of this gene is SpILKAP b.\n
SPU_004669	SPU_004669	none	Exon 1 not confirmed by cDNA; \n5' end at edge of 16 kb contig;\t \nonly gene on contig \n
SPU_020182	SPU_020182	none	#\nAllele of SPU_00466, 3' end; \none exon in 13 kb contig\n
SPU_013544	SPU_013544	none	Gene model may be missing N-terminal 70-80 aa based on alignment to mammalian homologs.\n
SPU_027014	SPU_027014	none	Blasts to PTPRT, but forms part of a novel clade in phylogenetic analysis with PTPRFn1 and PTPRLec genes.  Partial sequence. See also SPU_008466, SPU_023162, and SPU_024820.\n
SPU_020604	SPU_020604	none	Blasts to PTPRA, but in phylogenetic analysis it forms part of a novel clade with PTPRLec1, PTPRLec2, PTPRLec4, PTPRLec5, PTPRLec6, PTPRFn1, and PTPRFn2.\n
SPU_023115	SPU_023115	none	This protein has structure typical of a PTPR, but does not clade with known PTPRs.  It forms a unique clade and has been renamed PTPRiz.\n
SPU_028575	SPU_028575	none	Blasts to Receptor-type tyrosine-protein phosphatase mu precursor, but does not clade with the PTPR K/M/T/U group in phylogenetic analysis.  Renamed PTPRY3.  Clades with SPU_015923 and SPU_020542 (PTPRY2).\n
SPU_019920	SPU_019920	none	Partial sequence.\n
SPU_000652	SPU_000652	none	See putative conserved domains at: \nhttp://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1137535304-18825-160717183845.BLASTQ4\n
SPU_022669	SPU_022669	none	Blasts to MTMR6, homologous to myotubularin related proteins 6, 7, and 8 in phylogenetic analysis.\n
SPU_011860	SPU_011860	none	This blasts to PPEF1, but phylogenetic analysis showed that it was more similar to PPEF2.  SPU_008844 is likely the identical protein.\n
SPU_024691	SPU_024691	none	SPU_005570 is a partial duplicate prediction.\n
SPU_005570	SPU_005570	none	Partial duplicate prediction for SPU_024691.\n
SPU_026828	SPU_026828	none	Protein sequence modified as per HMM prediction\n
SPU_007333	SPU_007333	none	Protein sequence modified as per HMM model\n
Nek1c	SPU_030165	none	Prediction as per HMM model\n
SPU_014818	SPU_014818	none	Protein sequence modified as per HMM prediction\n
SPU_016754	SPU_016754	none	prediction is probably wrong- way too long. Predicted HMM model sequence is: \n \nLTQAFGRSTAVLSMIGYIIGLGDRHLDNVLVNFVTGEVVHIDYNVCFEKG \nKNLRVPERVPFRMTQNVQAALGITGVE \n
SFK1b	SPU_030166	none	#\nprotein sequence (incomplete) as per HMM prediction. \n
SPU_024930	SPU_024930	none	see comments for glean 24721. This is 3' part of gene. Predicted protein sequence is \nMAQAYVEDTLMVRHELKHAYGKVFLVRKVGNHNQGKLYAMKVLKKATIVQ \nKAKTAEHTMTERQVLEAVRSCPFLVTLHYAFQTDSKLNLILDYVNGGELF \nTHLYQREHFRESEVRIYIAEIIIALDCLHRILTSHPPMPNTFSKEVKDFI \nNKLLVKDPTKRLGCNGVKDIKSHSFFKGLNWDDVAAKRVSPPFRPHINGE \nLDTSNFAEEFTSLVPADSPADIPKTADARVFRVGYSFIAPSILYSDNAIT \nQDMLTQPSEHNRPSLASILSIHELKDSPFNKYYELDMKSAPIGDGSFSIC \nRRCTHRKTEKEYAVKIVSRRVACTQEITTLQLCQKHPNIVHLKEEFKDKL \nHTYIIMELCKGGELLGRIRKKKHFDELEASMIMRKLVSAVDYMHSRGIVH \nRDLKPENILFTDDSDDAELKIIDFGFARITNSNQPLKTPCFSLHFAAPEV \nLKRAYEQDGEYDASCDVWSLGVILYTMLSGRVPFQDPSISKSNSASDIMK \nRIKHGNFSFDGEEWNSVSTPAKDLIKGLLTVDPSRRLTTDDLLQNEWIQG \nQQLSTSTPLMTPDILNSCASIQKRVKATMRAFHTAQREGFLLTDVSNAPL \nAKRRKKKKDSSTETRSSSSESTHSQSSSSQESTTPTPTANPVLTIPVTTV \nSCAPRTTTATGAPSIPSVQPLPSLSKQTGARLDQYESLESLGFSPILPFS \nAGGSQELPPLLARQDSGYVGQMPSYAQVTPVPRTNVGSHGVTYAPILDPS \nMYPCGLQQPILDFSSSIPEYLSVQYASTEQPSIPMTVPRTLHQPHPHPLP \nLPHQHLSHLPTISEDPSTT \n
SPU_025882	SPU_025882	none	This is likely to be the N terminus of 14404 (DAPK). The fused protein sequence is pasted below, but is probably still missing a few exons based on alignment to human DAPK1 (NP_004929). \n \nMAMFRTESVEEFYQIGEDIGSGQFSEVKKVTEKSTGKDYAGKFIRKRRST \nASRRGVKREDIVREVSILEELSHDNIISLHDAFELQKEVVLILELVTGGE \nLFHYLAEEDHVNEEVAAQFVKKILEALKHMHDRNICHLDLKPENIMLLNR \nNTQNIMLIDFGLSRRIKPGEDIRDIMGTAEFVAPEIINFEPLSLNTDMWP \nVFISSRLPNIQLIQLSRSQVIQLVCVFISPKMGHRREQLALRKMSKALRS \nDWEHGETALHLAAGYGHVDILEYLQAKGASIDVADKTGETPLHVAGRYGQ \nVEAVQYLCDQAVNSNLADEDGETPLHIAAWHGYTSIVQTLCKAGATLDLK \nNKDGETTLLCAAARGHLDIVKILVEAGALLNTIDKHGITPLHHAVRRQHY \nDIVKYLVDSNCDVNLQDKLGDTPLNVACKEGALDLVEMLHAVGAKRDILN \nRHKNSALHMAARGGHIEVVRYLCLAGALIHQRNQDGLTASQLASLEGHED \nVADVLTQVEGDKSKDLFINQLNSTSGPLHRIRIKVLGQSGVGKTALIDSL \nKCGYFRGLFRRSRSNISLIGSSSNGRSSPRSPRSPRSPLTPMFGNGKKMD \nGGRFFMESLKRKQLSSTSSSFDVDSEVTRGIEFTHGTIPGAGDFTFLEFS \nGEDTYHTAYPHFLSDEGAIHLVVFSLDDMFEEQLAQVTYWMNFLRSQLPA \nTEPVGYCGKYRQQPKIALVATHADHTQCPKQPTGELISGEGNIVLYQTKR \nLFGRLFDICDVLFVMDANSAQSKDVKMLRTHISSLRNSILKDKSKDLFIN \nQLNSTSGPLHRIRIKVLGQSGVGKTALIDSLKCGYFRGLFRRSRSNISLI \nGSSSNGRSSPRSPRSPRSPLTPMFGNGKKMDGGRFFMESLKRKQLSSTSS \nSFDADSEVTRGIEFTHGTIPGAGDFTFLEFSGEDTYHTAYPHFLSDEGAI \nHLVVFSLDDMFEEQLAQVTYWMNFLRSQLPATEPVGYCGKYRQQPKIALV \nATHADHTQCPKQPTGEMISGEGNIVLYQTKRLFGRLFDICDVLFVMDANS \nAQSKDVKMLRTHISSLRNSILKVEAPVSVLCEAVASALPAWRRTFVNFPV \nLTWQQFSEGIHASINPLAGQAHLREVGRQLHLMGEVQCFGSELLQEVIVI \nEPTWLCSGIIGRLLSHDATEQPEGQYSIHYIQSLFPDTDAMDISQLMEAM \nDICVHGTVCEIPAVMRCPAPEGIWEKEDENGNFRVYGGVRMQLSDCGSTL \nPSGLFSRIQMSLRRNFQQDMEDTTDNELVMWRNGAKCSSGSIEGLISMTN \nDECAIEIKVRGYNDTRQGCFIFLEDLVHLVKHVLVDSYPGLPLNMEVLSP \nIQLSSHEKTIMVYNACSLLRLQLRTERTVENPISNQEEDFVDIFCFGSES \nVESNLIAGVDLHLSEIPSLTRRQISMLLDPPDPMGKDWCLLAVGLGLTEK \nIPMLDTLNRRCGPDESDSPTERLLQEWGKEETNSVGVLLNKVKDLGREDV \nLRVLMQGSPLYKFVPDPRALEEGRQSGSGSNHSSGTVASR \n
SPU_014118	SPU_014118	none	#\nThis gene combines with part of the sequence from SPU_014119 to make a complete gene model (Helicase_C domain is missing from SPU_014119). \nSee other gene model for further details.\n
SPU_011306	SPU_011306	none	in progress\n
SPU_026949	SPU_026949	none	more similar to vertebrate transferrins than invertebrate \nappears to be novel form\n
Sp-Mafs	SPU_030167	none	Likely missing first exon (non-coding) and second exon encoding N-terminal sequence. \nFirst reported by Coolen, et al. (2005). Phylogenomic analysis and expression patterns of large Maf genes in Xenopus tropicalis provide new insights into the functional evolution of the gene family in osteichthyans. Dev Genes Evol 215, 327-39.\n
SPU_009476	SPU_009476	none	missing exon in middle\n
SPU_008954	SPU_008954	none	There seem to be two MAP1A/1B_LC3-like proteins encoded in the S. purpuratus genome: SPU_009444 (~72% sequence identity) and SPU_008954 (~60% sequence identity).\n
SPU_004008	SPU_004008	none	Contains FHA domain and Reverse transcriptase domain \nForkhead associated domain (FHA); found in eukaryotic and prokaryotic proteins. Putative nuclear signalling domain. FHA domains may bind phosphothreonine, phosphoserine and sometimes phosphotyrosine. In eukaryotes, many FHA domain-containing proteins localize to the nucleus, where they participate in establishing or maintaining cell cycle checkpoints, DNA repair, or transcriptional regulation. Members of the FHA family include: Dun1, Rad53, Cds1, Mek1, KAPP(kinase-associated protein phosphatase),and Ki-67 (a human nuclear protein related to cell proliferation).\n
SPU_001270	SPU_001270	none	 only N-terminus fragment\n
SPU_001271	SPU_001271	none	 only N-terminus fragment\n
SPU_003343	SPU_003343	none	 fragment\n
SPU_010265	SPU_010265	none	 partial, missing C-terminus half, should join with SPU_010266\n
SPU_010266	SPU_010266	none	 fragment, should join with SPU_010265\n
SPU_010602	SPU_010602	none	 fragment\n
SPU_010738	SPU_010738	none	 fragment\n
SPU_013126	SPU_013126	none	 fragment\n
SPU_013531	SPU_013531	none	 partial, missing N-terminus\n
SPU_014927	SPU_014927	none	 fragment\n
SPU_015609	SPU_015609	none	 fragment\n
SPU_016267	SPU_016267	none	 fragment\n
SPU_017253	SPU_017253	none	 fragment\n
SPU_018825	SPU_018825	none	 partial\n
SPU_019381	SPU_019381	none	 fragment\n
SPU_019849	SPU_019849	none	 partial, missing N- and C-terminus\n
SPU_020358	SPU_020358	none	 partial, missing N-terminus\n
SPU_022124	SPU_022124	none	 fragment, missing N-terminus region\n
SPU_001506	SPU_001506	none	 fragment\n
SPU_002449	SPU_002449	none	 fragment\n
SPU_001514	SPU_001514	none	 fragment, extra mismatched stretch on N-terminus\n
SPU_003613	SPU_003613	none	 partial\n
SPU_006198	SPU_006198	none	 extra mismatched long stretch on C-terminus\n
SPU_006971	SPU_006971	none	 fragment\n
SPU_008301	SPU_008301	none	 fragment\n
SPU_008303	SPU_008303	none	 partial\n
SPU_008612	SPU_008612	none	 fragment\n
SPU_009147	SPU_009147	none	 fragment, unmatched stretch on N-terminus\n
SPU_006296	SPU_006296	none	Hypothetical protein \nsimilarities with solute carrier \nATP binding cassette \nlactotransferrin\n
SPU_001715	SPU_001715	none	Hypothetical protein with no homologs in other species\n
SPU_005307	SPU_005307	none	probable ortholog of human Zinc-finger 318 \nthe naming is different from the Stefan Materna naming  \nbecause of some uncertainties in the homology \n
SPU_001945	SPU_001945	none	Hypothetical protein with BLAST hits to various proteins with similarities to Kazrin (among others) \nNo clear orthology \n  \nContains SMC and 3 SAM domains  \nSMC=nucleotide binding cassette \nSAM=Sterile alpha motif.; Widespread domain in signalling and nuclear proteins. In EPH-related tyrosine kinases, appears to mediate cell-cell initiated signal transduction via the binding of SH2-containing proteins to a conserved tyrosine that is phosphorylated. In many cases mediates homodimerization. \n
SPU_021165	SPU_021165	none	#\nPREDICTED: similar to Laminin-like protein K08C7.3 precursor  \n
SPU_006349	SPU_006349	none	transposon\n
SPU_022598	SPU_022598	none	PREDICTED: similar to flavin containing monooxygenase 5 \n \nContains a long Fibrinogen-related domain (FReD); C terminal globular domain of fibrinogen. Fibrinogen is involved in blood clotting, being activated by thrombin to assemble into fibrin clots. The N-termini of 2 times 3 chains come together to form a globular arrangement called the disulfide knot. The C termini of fibrinogen chains end in globular domains, which are not completely equivalent. C terminal globular domains of the gamma chains (C-gamma) dimerize and bind to the GPR motif of the N-terminal domain of the alpha chain, while the GHR motif of N-terminal domain of the beta chain binds to the C terminal globular domains of another beta chain (C-beta), which leads to lattice formation. \n
SPU_005714	SPU_005714	none	#\nHypothetical protein similar to CTD binding protein \n \nContains a RING and a PHD domain \n \nRING-finger (Really Interesting New Gene) domain, a specialized type of Zn-finger of 40 to 60 residues that binds two atoms of zinc; defined by the 'cross-brace' motif C-X2-C-X(9-39)-C-X(1-3)- H-X(2-3)-(N/C/H)-X2-C-X(4-48)C-X2-C; probably involved in mediating protein-protein interactions; identified in proteins with a wide range of functions such as viral replication, signal transduction, and development; has two variants, the C3HC4-type and a C3H2C3-type (RING-H2 finger), which have different cysteine/histidine patterns; a subset of RINGs are associated with B-Boxes (C-X2-H-X7-C-X7-C-X2-C-H-X2-H) \n \nPHD-finger. PHD folds into an interleaved type of Zn-finger chelating 2 Zn ions in a similar manner to that of the RING and FYVE domains.\n
SPU_022038	SPU_022038	none	contains two Cap-ED domains  \nPREDICTED: similar to cAMP-dependent protein kinase type I-alpha  regulatory subunit\n
SPU_007251	SPU_007251	none	PREDICTED: similar to sequestosome 1 isoform 1 \n \nContains PB1 domain and ZZ domain \nPB1 domain ; Phox and Bem1p domain, present in many eukaryotic cytoplasmic signalling proteins. The domain adopts a beta-grasp fold, similar to that found in ubiquitin and Ras-binding domains. A motif, variously termed OPR, PC and AID, represents the most conserved region of the majority of PB1 domains, and is necessary for PB1 domain function. This function is the formation of PB1 domain heterodimers, although not all PB1 domain pairs associate \n \nZinc finger, ZZ type. Zinc finger present in Drosophila ref(2)P, NBR1, Human sequestosome 1 and related proteins. The ZZ motif coordinates two zinc ions and most likely participates in ligand binding or molecular scaffolding. Drosophila ref(2)P appears to control the multiplication of sigma rhabdovirus. NBR1 (Next to BRCA1 gene 1 protein) interacts with fasciculation and elongation protein zeta-1 (FEZ1) and calcium and integrin binding protein (CIB), and may function in cell signalling pathways. Sequestosome 1 is a phosphotyrosine independent ligand for the Lck SH2 domain and binds noncovalently to ubiquitin via its UBA domain.\n
SPU_010285	SPU_010285	none	PREDICTED: similar to ubiquitously transcribed tetratricopeptide  repeat gene, X chromosome \n \nContains 1 TPR domain and 1 jmjC domain \n \nTetratricopeptide repeat domain; typically contains 34 amino acids [WLF]-X(2)-[LIM]-[GAS]-X(2)-[YLF]-X(8)-[ASE]-X(3)-[FYL]-X(2)-[ASL]-X(4)-[PKE] is the consensus sequence; found in a variety of organisms including bacteria, cyanobacteria, yeast, fungi, plants, and humans in various subcellular locations; involved in a variety of functions including protein-protein interactions, but common features in the interaction partners have not been defined; involved in chaperone, cell-cycle, transcription, and protein transport complexes; the number of TPR motifs varies among proteins (1,3-11,13 15,16,19); 5-6 tandem repeats generate a right-handed helical structure with an amphipathic channel that is thought to accommodate an alpha-helix of a target protein; it has been proposed that TPR proteins preferably interact with WD-40 repeat proteins, but in many instances several TPR-proteins seem to aggregate to multi-protein complexes; examples of TPR-proteins include, Cdc16p, Cdc23p and Cdc27p components of the cyclosome/APC, the Pex5p/Pas10p receptor for peroxisomal targeting signals, the Tom70p co-receptor for mitochondrial targeting signals, Ser/Thr phosphatase 5C and the p110 subunit of O-GlcNAc transferase; three copies of the repeat are present here \n \njmjC domain. The jmjC domain is thought to be involved in chromatin organisation by modulating heterochromatisation. \n
SPU_016786	SPU_016786	none	PREDICTED: similar to Retinal homeobox protein Rx3 \n
SPU_016787	SPU_016787	none	PREDICTED: similar to Alpha-1B adrenergic receptor \nContains a 7 transmembrane receptor (rhodopsin family)(8e-18)\n
SPU_019627	SPU_019627	none	Predicted: putative protein which \ncontains a Putative homoserine kinase type II (protein kinase fold) [General function prediction only] \nthe level of homology is weak \n
SPU_023530	SPU_023530	none	predicted protein with similarities to At rich interactive domain Swi1-like\n
SPU_024578	SPU_024578	none	PREDICTED: similar to zn-finger, CCHC type and RNA-directed DNA polymerase  and Integrase, catalytic domain containing protein  \nfamily member (1E419) \n \nHypothetical prot with reverse transcriptase and integrase domains\n
SPU_015699	SPU_015699	none	This gene encodes a signal peptide and a C1q domain.\n
SPU_015700	SPU_015700	none	this gene encodes a signal peptide and a C1q domain\n
SPU_025635	SPU_025635	none	SPU_003194 may represent the latter half of this gene. \n
SPU_012648	SPU_012648	none	The prediction appears to be a combination of two genes, one encoding a bzip transcription factor and the other a phosphatidylinositol glycan.  Only SPU_012648|Scaffold85774|13949|15052| is conserved with the bzip transcription factor gene, zhangfei.\n
SPU_023034	SPU_023034	none	SPU_014930 is a partial duplicate prediction.\n
SPU_014930	SPU_014930	none	Partial duplicate prediction of SPU_023034\n
SPU_011973	SPU_011973	none	SPU_011973 has part I. SPU_001984 has the middle part. Last part may be missing.\n
SPU_001984	SPU_001984	none	SPU_011973 has part I. SPU_001984 has the middle part. Last part may be missing.\n
SPU_000127	SPU_000127	none	SPU_009993 is a duplicate prediction.\n
SPU_000893	SPU_000893	none	SPU_000893 encodes the middle part of USP34. SPU_013835 encodes the last. SPU_013835 and SPU_008562 are overlapping duplicate predictions. First part of the gene homologous to the human USP34 is missing. SPU_028691 (homologous with ~12-312 human USP34) and SPU_005174 (homologous with ~ 698-910 aa of human protein) are only partially similar to the human USP34.\n
SPU_002882	SPU_002882	none	SPU_002882 is a partially incorrect prediction for USP46. It appears to have an extra exon at the beginning and ~300 aa at the end are non-homologous.\n
SPU_003944	SPU_003944	none	SPU_003944 has part I, SPU_003945 has part II and SPU_023070 has part III of USP32 gene. In addition, SPU_003945 and SPU_023070 share an overlap of ~100 AA.\n
SPU_003945	SPU_003945	none	SPU_003944 has part I, SPU_003945 has part II and SPU_023070 has part III of USP32 gene. In addition, SPU_003945 and SPU_023070 share an overlap of ~100 AA.\n
SPU_023070	SPU_023070	none	SPU_003944 has part I, SPU_003945 has part II and SPU_023070 has part III of USP32 gene. In addition, SPU_003945 and SPU_023070 share an overlap of ~100 AA.\n
SPU_008736	SPU_008736	none	Human USP48 gene isoforms are very different in size. Isoform a is 1035 aa long whereas isoform b is 485 aa long. SPU_008736 is homologous to isoform b but, along with SPU_001900, may also be considered homologous to human USP48 isoform a. There are ~170 aa missing from isoform a in urchin, if SPU_008736 and SPU_001900 are coding for USP48 isoform a.\n
SPU_001900	SPU_001900	none	Human USP48 gene isoforms are very different in size. Isoform a is 1035 aa long whereas isoform b is 485 aa long. SPU_008736 is homologous to isoform b but, along with SPU_001900, may also be considered homologous to human USP48 isoform a. There are ~170 aa missing from isoform a in urchin, if SPU_008736 and SPU_001900 are coding for USP48 isoform a.\n
SPU_006393	SPU_006393	none	SPU_006393 encodes first half of the gene. SPU_011524 has the rest.\n
SPU_011524	SPU_011524	none	SPU_006393 encodes first half of the gene. SPU_011524 has the rest.\n
SPU_011940	SPU_011940	none	Appears to be missing ~200 aa at the beginning of USP22.\n
SPU_021779	SPU_021779	none	SPU_021779 is a partial duplicate prediction for SPU_009955.\n
SPU_028265	SPU_028265	none	First ~400 aa (as compared to human USP10) are missing from GLEAN3 predictions.\n
SPU_013119	SPU_013119	none	this is the same as SPU_002448.\n
SPU_015421	SPU_015421	none	also sim to SPU_022485\n
Sp-Il17-9	SPU_030184	none	This gene was annotated based on FgeneshAB and Fgenesh++. \n
Sp-Il17-10	SPU_030185	none	#\nThis gene was annotated based on FgeneshAB and ++.\n
Sp-Il17-11	SPU_030186	none	This gene was annotated based on FgeneshAB and ++. \n
Sp-Il17-12	SPU_030187	none	This gene was annotated based on FgeneshAB and ++. It may be partial because unknown sequence is located behind the first exon.\n
Sp-Il17-13	SPU_030188	none	This gene model was annotated based on FgeneshAB and ++. \n
Sp-Il17-14	SPU_030190	none	This gene model was annotated based on FgeneshAB and ++. \n
Sp-Il17-15	SPU_030191	none	This gene model was annotated based on FgeneshAB and ++. \n
Sp-Il17-16	SPU_030192	none	#\nThis gene model was annotated based on FgeneshAB and ++. \n
Sp-Il17-17	SPU_030193	none	This gene model was annotated based on FgeneshAB and ++. \n
Sp-Il17-p2	SPU_030194	none	#\nThis partial gene model was annotated based on FgeneshAB. The model is  located in a short scaffold. \n
Sp-Il17-p3	SPU_030195	none	This partial gene model was annotated based on FgeneshAB, ++ and BlastN.  The model is located at the end of a contig. \n
Sp-Il17-18	SPU_030196	none	This gene was annotated based on FgeneshAB and ++. \n
Sp-Il17-19	SPU_030197	none	#\nThis gene was annotated based on FgeneshAB and ++. \n
Sp-Il17-20	SPU_030198	none	This gene model was annotated based on FgeneshAB and ++. \n
Sp-Il17-21	SPU_030199	none	#\nThis gene model was annotated based on FgeneshAB and ++. The model is partial and located at the end of a scaffold. \n
Sp-Il17-22	SPU_030200	none	This gene model was annotated based on a part of NCBI and FgeneshAB prediction. The model is partial and located at the end of a contig. \n \n
Sp-Il17-p4	SPU_030201	none	#\nThis gene model was annotated based on FgeneshAB. The model was partial in a short scaffold, so Il17 domain was not found. But the nucleotide sequence was 93% similar to that of another Sp-Il17.\n
Sp-Il17-p5	SPU_030202	none	This gene model was annotated based on FgeneshAB and ++. The model was located at the end of a scaffold and Il17 domain was not found. But the nucleotide sequence was 93% similar to another Sp-Il17. \n
Sp-Il17-23	SPU_030203	none	This gene model was annotated based on FgeneshAB and ++. It has a partial Il17 domain.\n
Sp-Il17-24	SPU_030204	none	This gene model was annotated based on FgeneshAB and ++. \n
SPU_014962	SPU_014962	none	It is possible that SPU_000634 may code for the first part of this protein.\n
SPU_017602	SPU_017602	none	SPU_016877 may represent the first half of this gene.\n
SPU_017871	SPU_017871	none	Duplicate prediction for SPU_007962.\n
SPU_016537	SPU_016537	none	SPU_001217 is a longer duplicate prediction.\n
SPU_001217	SPU_001217	none	SPU_001217 is a longer duplicate prediction for SPU_016537.\n
SPU_000228	SPU_000228	none	SPU_003049 has the first half of the HCFC2 gene. SPU_000228 likely codes the rest, but it may be incorrectly predicted.\n
SPU_003049	SPU_003049	none	SPU_003049 has the first half of the HCFC2 gene. SPU_000228 likely codes the rest, but it may be incorrectly predicted.\n
SPU_022607	SPU_022607	none	SPU_028866 is a partial duplicate prediction for SPU_022607.\n
SPU_023287	SPU_023287	none	SPU_008313 is a partial duplicate prediction.\n
SPU_024246	SPU_024246	none	SPU_024246 codes for part I of TSC2 and SPU_023402 codes the rest.\n
SPU_023402	SPU_023402	none	SPU_024246 codes for part I of TSC2 and SPU_023402 codes the rest.\n
SPU_028104	SPU_028104	none	SPU_028850 is a duplicate long prediction that is likely incorrect.\n
SPU_028850	SPU_028850	none	Incorrect longer prediction for RPL15. \n
SPU_023692	SPU_023692	none	Appears to be short by ~29 aa at beginning.\n
SPU_009084	SPU_009084	none	Missing last half.\n
SPU_024153	SPU_024153	none	N-terminus of APC1; the remaining parts of the gene are found in SPU_008018 and SPU_012580\n
SPU_008018	SPU_008018	none	see SPU_024153 for annotation\n
SPU_012580	SPU_012580	none	see SPU_024153 for annotation\n
SPU_021616	SPU_021616	none	This Glean contains the C-terminal sequence of Psf2 (aa 186-301). \n
SPU_017818	SPU_017818	none	SPU_017818 covers the C-terminal sequence of the GINS protein subunit Psf1 \nSPU_017817 covers the N-terminal sequence of the GINS protein subunit Psf1\n
SPU_017817	SPU_017817	none	 \nSPU_017817 covers the N-terminal sequence of the GINS protein subunit Psf1 \nSPU_017818 covers the C-terminal sequence of the GINS protein subunit Psf1 \n
SPU_018318	SPU_018318	none	Missing ~60aa at the end.\n
SPU_002606	SPU_002606	none	SPU_027769 is a partial duplicate prediction for SPU_002606.\n
Sp-DNAH4	SPU_030222	none	The coding sequence of this gene has been assembled from exons identified in the 3/2005 Version of the S. purpuratus genome by indexing to the homologous mammalian gene, accession number AAM12861 (Morris, RL, et al., Dev. Biol. in press). The exons span multiple scaffolds (see below) in the current version of the genome and their detailed coordinates will be added to the database at a later stage. In the meantime, exon details can be obtained by email from the annotators robar@scientist.com and igibbons@berkeley.edu. Merge of scaffold78480, scaffold102459,scaffold95646 and scaffold102781. Sp-DNAH4 is 54% identical to human axonemal dynein heavy chain 3  \n \n
Sp-DNAH2	SPU_030224	none	#\nThe coding sequence of this gene has been assembled from exons identified in the 3/2005 Version of the S. purpuratus genome by indexing to the homologous mammalian gene, accession number NP_065928 (Morris, RL, et al., Dev. Biol. in press). The exons span multiple scaffolds (see below) in the current version of the genome and their detailed coordinates will be added to the database at a later stage. In the meantime, exon details can be obtained by email from the annotators robar@scientist.com and igibbons@berkeley.edu. Merge of scaffold2032, scaffold62207, scaffold46527, scaffold22838.\n
Sp-DNAH3	SPU_030225	none	The coding sequence of this gene has been assembled from exons identified in the 3/2005 Version of the S. purpuratus genome by indexing to the homologous mammalian gene, accession number NP060009 (Morris, RL, et al., Dev. Biol. in press). The exons span multiple scaffolds (see below) in the current version of the genome and their detailed coordinates will be added to the database at a later stage. In the meantime, exon details can be obtained by email from the annotators robar@scientist.com and igibbons@berkeley.edu.  Merge of scaffold108582 and scaffold87263\n
Sp-DNAH8	SPU_030229	none	The coding sequence of this gene has been assembled from exons identified in the 3/2005 Version of the S. purpuratus genome by indexing to the homologous mammalian gene, accession number CAI42433(Morris, RL, et al., Dev. Biol. in press). The exons span multiple scaffolds (see below) in the current version of the genome and their detailed coordinates will be added to the database at a later stage. In the meantime, exon details can be obtained by email from the annotators robar@scientist.com and igibbons@berkeley.edu. Merge of scaffold289, scaffold56250, scaffoldi854 (3/2006 assembly), scaffold56250 \n
Sp-DNAH14	SPU_030233	none	The coding sequence of this gene has been assembled from exons identified in the 3/2005 Version of the S. purpuratus genome by indexing to the homologous mammalian gene, accession number XP_943287(Morris, RL, et al., Dev. Biol. in press). The exons span multiple scaffolds (see below) in the current version of the genome and their detailed coordinates will be added to the database at a later stage. In the meantime, exon details can be obtained by email from the annotators robar@scientist.com and igibbons@berkeley.edu. Merge of scaffold86991, scaffold33286, scaffold58811, scaffold65994,  \n
Sp-DYNC2H1	SPU_030235	none	The coding sequence of this gene has been assembled from exons identified in the 3/2005 Version of the S. purpuratus genome by indexing to the homologous sea urchin (Tripneustes gratilla) gene, accession number AAA63583 (Morris, RL, et al., Dev. Biol. in press). The exons span multiple scaffolds (see below) in the current version of the genome and their detailed coordinates will be added to the database at a later stage. In the meantime, exon details can be obtained by email from the annotators robar@scientist.com and igibbons@berkeley.edu. Merge of scaffold23743, scaffold_v2_46821, scaffold28672, scaffold_v2_2321, scaffold53559\n
SPU_020649	SPU_020649	none	This probably represents the C terminal exon of SPU_020648, and should be combined with that model.\n
SPU_014454	SPU_014454	none	SPU_015818 is a partial duplicate.\n
Sp-DNAH5	SPU_030226	none	The coding sequence of this gene has been assembled from exons identified in the 3/2005 Version of the S. purpuratus genome by indexing to the homologous mammalian gene, accession number CAI42433 (Morris, RL, et al., Dev. Biol. in press). The exons span multiple scaffolds (see below) in the current version of the genome and their detailed coordinates will be added to the database at a later stage. In the meantime, exon details can be obtained by email from the annotators robar@scientist.com and igibbons@berkeley.edu.  Merge of scaffold1668, scaffold59332, scaffold1157, scaffold11477\n
Sp-DNAH6	SPU_030227	none	The coding sequence of this gene has been assembled from exons identified in the 3/2005 Version of the S. purpuratus genome by indexing to the homologous mammalian gene, accession number XP_532984 (Morris, RL, et al., Dev. Biol. in press). The exons span multiple scaffolds (see below) in the current version of the genome and their detailed coordinates will be added to the database at a later stage. In the meantime, exon details can be obtained by email from the annotators robar@scientist.com and igibbons@berkeley.edu.  Merge of scaffold1866 and  scaffold105269 \n \n
Sp-DNAH7	SPU_030228	none	The coding sequence of this gene has been assembled from exons identified in the 3/2005 Version of the S. purpuratus genome by indexing to the homologous mammalian gene, accession number NP_061720 (Morris, RL, et al., Dev. Biol. in press). The exons span multiple scaffolds (see below) in the current version of the genome and their detailed coordinates will be added to the database at a later stage. In the meantime, exon details can be obtained by email from the annotators robar@scientist.com and igibbons@berkeley.edu. Merge of scaffold1775 and scaffold50668 \n
Sp-DNAH9	SPU_030230	none	The coding sequence of this gene has been assembled from exons identified in the 3/2005 Version of the S. purpuratus genome by indexing to the homologous sea urchin (Tripneustes gratilla) gene, accession number CAA42170. The exons span multiple scaffolds (see below) in the current version of the genome and their detailed coordinates will be added to the database at a later stage. In the meantime, exon details can be obtained by email from the annotators robar@scientist.com and igibbons@berkeley.edu. (Morris, RL, et al., Dev. Biol. in press). Merge of scaffold27, scaffold28695\n
Sp-DNAH10	SPU_030231	none	The coding sequence of this gene has been assembled from exons identified in the 3/2005 Version of the S. purpuratus genome by indexing to the homologous mammalian gene, accession number XP_543369 (Morris, RL, et al., Dev. Biol. in press). The exons span multiple scaffolds (see below) in the current version of the genome and their detailed coordinates will be added to the database at a later stage. In the meantime, exon details can be obtained by email from the annotators robar@scientist.com and igibbons@berkeley.edu. Merge of scaffold75058, scaffold105976, scaffold102255, scaffold52292, scaffold48159\n
Sp-DNAH12	SPU_030232	none	The coding sequence of this gene has been assembled from exons identified in the 3/2005 Version of the S. purpuratus genome by indexing to the homologous mammalian gene, accession number XP_541831 (Morris, RL, et al., Dev. Biol. in press). The exons span multiple scaffolds (see below) in the current version of the genome and their detailed coordinates will be added to the database at a later stage. In the meantime, exon details can be obtained by email from the annotators robar@scientist.com and igibbons@berkeley.edu. Merge of scaffold108117, scaffold694, scaffold133848\n
Sp-DNAH15	SPU_030234	none	The coding sequence of this gene has been assembled from exons identified in the 3/2005 Version of the S. purpuratus genome by indexing to the homologous mammalian gene, accession number NP_001360 (Morris, RL, et al., Dev. Biol. in press). The exons span multiple scaffolds (see below) in the current version of the genome and their detailed coordinates will be added to the database at a later stage. In the meantime, exon details can be obtained by email from the annotators robar@scientist.com and igibbons@berkeley.edu. Merge of scaffold1897, scaffold128501, scaffold55352, scaffold71847, scaffold26249\n
Sp-DYNC1H1	SPU_030236	none	The coding sequence of this gene has been assembled from exons identified in the 3/2005 Version of the S. purpuratus genome by indexing to the homologous mammalian gene, accession number Q14204 (Morris, RL, et al., Dev. Biol. in press). The exons span multiple scaffolds (see below) in the current version of the genome and their detailed coordinates will be added to the database at a later stage. In the meantime, exon details can be obtained by email from the annotators robar@scientist.com and igibbons@berkeley.edu.  Merge of scaffold9923, scaffold88369, scaffold88369, scaffold3805\n
SPU_014060	SPU_014060	none	First part of the gene model prediction is incorrect.\n
Sp-DNAH1	SPU_030223	none	The coding sequence of this gene has been assembled from exons identified in the 3/2005 Version of the S. purpuratus genome by indexing to the homologous mammalian gene, accession number BAA92648 (Morris, RL, et al., Dev. Biol. in press). The exons span multiple scaffolds (see below) in the current version of the genome and their detailed coordinates will be added to the database at a later stage. In the meantime, exon details can be obtained by email from the annotators robar@scientist.com and igibbons@berkeley.edu. \nMerge of scaffold60126, scaffold75782, scaffold50318, scaffold83954. Sp-DNAH1 is 63% identical to human axonemal dynein heavy chain 1\n
SPU_002458	SPU_002458	none	SPU_021702 is a duplicate prediction.\n
SPU_021702	SPU_021702	none	Duplicate prediction for SPU_002458\n
SPU_011152	SPU_011152	none	Partial duplicate of SPU_006459.\n
SPU_009946	SPU_009946	none	Prediction is short by ~20 AA.\n
SPU_007971	SPU_007971	none	SPU_011900 is a duplicate prediction.\n
SPU_011900	SPU_011900	none	SPU_007971 duplicate.\n
SPU_022682	SPU_022682	none	Likely has an extra ~100 aa predicted towards the end.\n
SPU_006487	SPU_006487	none	Incorrect gene model. First ~120 AA are completely unrelated.\n
SPU_011456	SPU_011456	none	SPU_019736 is a duplicate prediction.\n
SPU_019736	SPU_019736	none	SPU_011456 is a duplicate prediction.\n
SPU_004087	SPU_004087	none	Partial duplicate prediction for SPU_002438.\n
SPU_008650	SPU_008650	none	Incorrect gene model. Extra AA at beginning and end.\n
SPU_018079	SPU_018079	none	SPU_004721 is a duplicate prediction.\n
SPU_013834	SPU_013834	none	Longer than necessary prediction.\n
SPU_008394	SPU_008394	none	SPU_006605 has first half and SPU_008394 has the rest.\n
SPU_005825	SPU_005825	none	SPU_005825 is a duplicate prediction for SPU_010936.\n
SPU_018866	SPU_018866	none	Missing ~25 AA at end.\n
SPU_010692	SPU_010692	none	SPU_026671 is a partial duplicate prediction.\n
SPU_005700	SPU_005700	none	SPU_005700 is a partial duplicate of SPU_005699.\n
SPU_011352	SPU_011352	none	SPU_020308 and SPU_001585 are partial identical predictions.\n
SPU_020308	SPU_020308	none	Partial duplicate prediction for SPU_011352.\n
SPU_011303	SPU_011303	none	SPU_011303 appears to have first part and SPU_004643 the latter half.\n
SPU_004643	SPU_004643	none	SPU_011303 appears to have first part and SPU_004643 the latter half.\n
SPU_021496	SPU_021496	none	This gene was identified and published by Nemer et al. 1991.\n
SPU_024608	SPU_024608	none	SPU_025585 appears to have partI of CAND1 and SPU_024608 the other part. There is an overlap of ~50 AA.\n
SPU_025585	SPU_025585	none	SPU_025585 appears to have partI of CAND1 and SPU_024608 the other part. There is an overlap of ~50 AA.\n
SPU_015870	SPU_015870	none	SPU_027123 is a partial duplicate prediction.\n
SPU_027123	SPU_027123	none	Duplicate prediction for SPU_015870.\n
SPU_005722	SPU_005722	none	SPU_023814 is a partial duplicate prediction.\n
SPU_023814	SPU_023814	none	Partial duplicate prediction for SPU_005722.\n
SPU_024801	SPU_024801	none	Likely missing first exon.\n
SPU_025140	SPU_025140	none	SPU_018792 has first part and SPU_025140 has the latter half. There is some overlap in the two predictions.\n
SPU_018792	SPU_018792	none	SPU_018792 has first part and SPU_025140 has the latter half. There is some overlap in the two predictions.\n
SPU_023345	SPU_023345	none	SPU_003864 appears to have partI. SPU_023345 has partII. SPU_000408 appears to have the rest of the protein.\n
SPU_003864	SPU_003864	none	SPU_003864 appears to have partI. SPU_023345 has partII. SPU_000408 appears to have the rest of the protein.\n
SPU_000408	SPU_000408	none	SPU_003864 appears to have partI. SPU_023345 has partII. SPU_000408 appears to have the rest of the protein.\n
SPU_010706	SPU_010706	none	May be missing ~100 AA at the end.\n
SPU_010014	SPU_010014	none	SPU_010014 has first part and SPU_006607 has the latter half.\n
SPU_006607	SPU_006607	none	SPU_010014 has first part and SPU_006607 has the latter half.\n
SPU_018853	SPU_018853	none	Hit to the bestrophin homology is embedded within a longer prediction. \n
SPU_017216	SPU_017216	none	SPU_010982 is a partial duplicate prediction.\n
SPU_010982	SPU_010982	none	SPU_010982 is a partial duplicate prediction for SPU_017216.\n
SPU_017745	SPU_017745	none	Model is longer than necessary.\n
SPU_004715	SPU_004715	none	Model is likely missing a few AA at beginning.\n
SPU_001164	SPU_001164	none	Likely missing ~50 AA at beginning.\n
SPU_018367	SPU_018367	none	Prediction is short ~50 AA at beginning.\n
SPU_025476	SPU_025476	none	Partial duplicate prediction for SPU_017333.\n
SPU_002228	SPU_002228	none	SPU_022387 is a partial duplicate prediction.\n
SPU_022387	SPU_022387	none	SPU_002228 is a partial duplicate prediction.\n
SPU_021649	SPU_021649	none	SPU_018977 has part I and SPU_021649 appears to have the rest of the gene.\n
SPU_015415	SPU_015415	none	SPU_003876 is a partial duplicate prediction.\n
SPU_003876	SPU_003876	none	SPU_003876 is a partial duplicate prediction for SPU_015415.\n
SPU_004528	SPU_004528	none	May be missing ~50 AA at end.\n
SPU_024347	SPU_024347	none	This model may contain 2 genes. Latter half is the DNA cytosine methylase (DNMT3A).\n
SPU_009023	SPU_009023	none	SPU_012849 is a partial duplicate prediction for SPU_009023.\n
SPU_025761	SPU_025761	none	SPU_025761 has part I. SPU_020254 has part II. SPU_020255 has part III. Missing a few AA at the end.\n
SPU_020255	SPU_020255	none	SPU_025761 has part I. SPU_020254 has part II. SPU_020255 has part III. Missing a few AA at the end.\n
SPU_020254	SPU_020254	none	SPU_025761 has part I. SPU_020254 has part II. SPU_020255 has part III. Missing a few AA at the end.\n
SPU_019071	SPU_019071	none	SPU_019071 is a duplicate prediction for SPU_024656.\n
SPU_024656	SPU_024656	none	SPU_019071 is a duplicate prediction for SPU_024656.\n
SPU_009473	SPU_009473	none	Missing ~200 AA at the beginning.\n
SPU_014637	SPU_014637	none	SPU_011917 is a partial duplicate prediction for SPU_014637.\n
SPU_011917	SPU_011917	none	SPU_011917 is a partial duplicate prediction for SPU_014637.\n
SPU_011688	SPU_011688	none	SPU_011688 is a duplicate prediction for SPU_008690. It also may represent a longer incorrect model for this gene.\n
SPU_008690	SPU_008690	none	SPU_011688 is a duplicate prediction for SPU_008690. It also may represent a longer incorrect model for this gene.\n
SPU_009423	SPU_009423	none	SPU_009422 has the first part and SPU_009423 has the rest of the gene.\n
SPU_009422	SPU_009422	none	SPU_009422 has the first part and SPU_009423 has the rest of the gene.\n
SPU_025290	SPU_025290	none	SPU_025290 is a partial duplicate prediction for SPU_004522.\n
SPU_011094	SPU_011094	none	SPU_006390 may encode for first part of NOC3L gene. SPU_011094 has the rest. SPU_016413 is a partial duplicate prediction for SPU_011094.\n
SPU_006390	SPU_006390	none	SPU_006390 may encode for first part of NOC3L gene. SPU_011094 has the rest.\n
SPU_016413	SPU_016413	none	SPU_006390 may encode for first part of NOC3L gene. SPU_011094 has the rest. SPU_016413 is a partial duplicate prediction for SPU_011094.\n
SPU_003253	SPU_003253	none	SPU_012701 has part I, SPU_018955 has part II, SPU_003253 has part III and SPU_004821 has the last part of the gene. SPU_028421 is a partial duplicate prediction for SPU_004821. SPU_005616 is a partial duplicate prediction for SPU_018955.\n
SPU_004821	SPU_004821	none	SPU_012701 has part I, SPU_018955 has part II, SPU_003253 has part III and SPU_004821 has the last part of the gene. SPU_028421 is a partial duplicate prediction for SPU_004821. SPU_005616 is a partial duplicate prediction for SPU_018955.\n
SPU_012701	SPU_012701	none	SPU_012701 has part I, SPU_018955 has part II, SPU_003253 has part III and SPU_004821 has the last part of the gene. SPU_028421 is a partial duplicate prediction for SPU_004821. SPU_005616 is a partial duplicate prediction for SPU_018955.\n
SPU_018955	SPU_018955	none	SPU_012701 has part I, SPU_018955 has part II, SPU_003253 has part III and SPU_004821 has the last part of the gene. SPU_028421 is a partial duplicate prediction for SPU_004821. SPU_005616 is a partial duplicate prediction for SPU_018955.\n
SPU_028421	SPU_028421	none	SPU_012701 has part I, SPU_018955 has part II, SPU_003253 has part III and SPU_004821 has the last part of the gene. SPU_028421 is a partial duplicate prediction for SPU_004821. SPU_005616 is a partial duplicate prediction for SPU_018955.\n
SPU_005616	SPU_005616	none	SPU_012701 has part I, SPU_018955 has part II, SPU_003253 has part III and SPU_004821 has the last part of the gene. SPU_028421 is a partial duplicate prediction for SPU_004821. SPU_005616 is a partial duplicate prediction for SPU_018955.\n
SPU_028675	SPU_028675	none	SPU_028675 is a partial duplicate prediction for SPU_026740.\n
SPU_009849	SPU_009849	none	May be missing ~25 AA at beginning.\n
SPU_004974	SPU_004974	none	SPU_004974 is a duplicate prediction for SPU_006042.\n
SPU_014021	SPU_014021	none	SPU_014021 is a partial duplicate prediction for SPU_013399.\n
SPU_007111	SPU_007111	none	SPU_007455 is a duplicate longer prediction.\n
SPU_000285	SPU_000285	none	SPU_000285 has most of the gene except the last part which appears to be on SPU_009456. There are errors in the prediction at the end of SPU_000285 and at beginning of SPU_009456 which perfectly match with the COPZ1 gene.\n
SPU_009546	SPU_009546	none	SPU_000285 has most of the gene except the last part which appears to be on SPU_009456. There are errors in the prediction at the end of SPU_000285 and at beginning of SPU_009456 which perfectly match with the COPZ1 gene.\n
SPU_012559	SPU_012559	none	SPU_012559 and SPU_026314 both contain the AP2S1 gene but both are erroneous long predictions. Beginning of the predictions do not code for any significant proteins in DB.\n
SPU_026314	SPU_026314	none	SPU_012559 and SPU_026314 both contain the AP2S1 gene but both are erroneous long predictions. Beginning of the predictions do not code for any significant proteins in DB.\n
SPU_016841	SPU_016841	none	SPU_018163 and SPU_016841 are partially duplicate predictions.\n
SPU_002616	SPU_002616	none	SPU_018244 is a partial duplicate prediction for SPU_002616.\n
SPU_018244	SPU_018244	none	SPU_018244 is a partial duplicate prediction for SPU_002616.\n
SPU_001148	SPU_001148	none	SPU_001148 - SPU_003209 - SPU_009645 represent three parts of the CLPTM1 gene, perhaps in that order (and with overlap). It is difficult to be certain as only parts of SPU_003209 and SPU_009645 are homologous to CLPTM1.\n
SPU_003209	SPU_003209	none	SPU_001148 - SPU_003209 - SPU_009645 represent three parts of the CLPTM1 gene, perhaps in that order (and with overlap). It is difficult to be certain as only parts of SPU_003209 and SPU_009645 are homologous to CLPTM1.\n
SPU_009645	SPU_009645	none	SPU_001148 - SPU_003209 - SPU_009645 represent three parts of the CLPTM1 gene, perhaps in that order (and with overlap). It is difficult to be certain as only parts of SPU_003209 and SPU_009645 are homologous to CLPTM1.\n
SPU_000731	SPU_000731	none	SPU_000731 and SPU_007244 represent two parts of the CLN3 protein. There is perhaps ~50 AA missing from SPU_007244 model.\n
SPU_007244	SPU_007244	none	SPU_000731 and SPU_007244 represent two parts of the CLN3 protein. There is perhaps ~50 AA missing from SPU_007244 model.\n
SPU_001258	SPU_001258	none	SPU_010663 is a duplicate prediction for SPU_001258.\n
SPU_010663	SPU_010663	none	SPU_010663 is a duplicate prediction for SPU_001258.\n
SPU_010747	SPU_010747	none	SPU_025121 is a duplicate prediction for SPU_010747.\n
SPU_025121	SPU_025121	none	SPU_025121 is a duplicate prediction for SPU_010747.\n
SPU_021311	SPU_021311	none	This gene model is incomplete. Missing ~350 AA at the end. SPU_014811 MAY represent the missing half. It does show COPB2 as the best hit in searches against DB.\n
SPU_023769	SPU_023769	none	Missing ~100 AA at end.\n
SPU_009246	SPU_009246	none	Model is incorrect.\n
SPU_026357	SPU_026357	none	SPU_003524 is a partial duplicate prediction for SPU_026357.\n
SPU_003524	SPU_003524	none	SPU_003524 is a partial duplicate prediction for SPU_026357.\n
SPU_024920	SPU_024920	none	SPU_004667 is a partial duplicate prediction for SPU_024920.\n
SPU_002885	SPU_002885	none	Has an additional ~45 AA at beginning of protein.\n
SPU_007188	SPU_007188	none	SPU_007188 has the first half of DDB1 and SPU_007379 has the rest. A few AA may be missing at the beginning.\n
SPU_007379	SPU_007379	none	SPU_007188 has the first half of DDB1 and SPU_007379 has the rest. A few AA may be missing at the beginning.\n
SPU_009266	SPU_009266	none	SPU_009266 is a partial duplicate prediction for SPU_001826.\n
SPU_010339	SPU_010339	none	SPU_007300 is a partial duplicate prediction for SPU_010339.\n
SPU_007300	SPU_007300	none	SPU_007300 is a partial duplicate prediction for SPU_010339.\n
SPU_019786	SPU_019786	none	SPU_019786 is a partial duplicate prediction for SPU_013953.\n
SPU_004860	SPU_004860	none	SPU_024432 is a partial duplicate prediction for SPU_004860.\n
SPU_024432	SPU_024432	none	SPU_024432 is a partial duplicate prediction for SPU_004860.\n
SPU_025153	SPU_025153	none	SPU_003593 has the first half of FBXL10 gene. SPU_025153 MAY encode the other half. SPU_002864 is a partial duplicate prediction for SPU_003593.\n
SPU_002864	SPU_002864	none	SPU_003593 has the first half of FBXL10 gene. SPU_025153 MAY encode the other half. SPU_002864 is a partial duplicate prediction for SPU_003593.\n
SPU_015361	SPU_015361	none	Contains a leucine-rich repeat (LRR-R1) domain.\n
SPU_005729	SPU_005729	none	SPU_005729 and SPU_008183 are similar duplicate predictions.\n
SPU_008183	SPU_008183	none	SPU_005729 and SPU_008183 are similar duplicate predictions.\n
SPU_024772	SPU_024772	none	The gene model on the N-terminal end appears to be incorrect.\n
SPU_007722	SPU_007722	none	SPU_007722 encodes the complete CYFIP2 gene. SPU_007272 and SPU_011333 are partial duplicate predictions for SPU_007722.\n
SPU_007272	SPU_007272	none	SPU_007722 encodes the complete CYFIP2 gene. SPU_007272 and SPU_011333 are partial duplicate predictions for SPU_007722.\n
SPU_011333	SPU_011333	none	SPU_007722 encodes the complete CYFIP2 gene. SPU_007272 and SPU_011333 are partial duplicate predictions for SPU_007722.\n
SPU_017645	SPU_017645	none	SPU_012959 is a partial duplicate prediction for SPU_017645.\n
SPU_012959	SPU_012959	none	SPU_012959 is a partial duplicate prediction for SPU_017645.\n
SPU_027232	SPU_027232	none	SPU_000487 has the first half of the gene. SPU_027232 MAY have the rest.\n
SPU_003575	SPU_003575	none	SPU_003575 and SPU_011118 are partial duplicate predictions for SPU_027232.\n
SPU_011118	SPU_011118	none	SPU_003575 and SPU_011118 are partial duplicate predictions for SPU_027232.\n
SPU_004739	SPU_004739	none	SPU_004739 is a partial duplicate prediction for SPU_012301.\n
SPU_012301	SPU_012301	none	SPU_004739 is a partial duplicate prediction for SPU_012301.\n
SPU_017680	SPU_017680	none	Likely isoform of DHPS.\n
SPU_022098	SPU_022098	none	Likely isoform of DHPS.\n
SPU_025930	SPU_025930	none	Duplicate prediction for SPU_012301.\n
SPU_000231	SPU_000231	none	SPU_005915 has part II of this gene. SPU_027703 MAY encode the first part. SPU_000273 is a duplicate prediction for SPU_027703.\n
SPU_005915	SPU_005915	none	SPU_005915 has part II of this gene. SPU_027703 MAY encode the first part. SPU_000273 is a duplicate prediction for SPU_027703.\n
SPU_027703	SPU_027703	none	SPU_005915 has part II of this gene. SPU_027703 MAY encode the first part. SPU_000273 is a duplicate prediction for SPU_027703.\n
SPU_006485	SPU_006485	none	SPU_006485 and SPU_008534 are duplicate predictions.\n
SPU_008534	SPU_008534	none	SPU_006485 and SPU_008534 are duplicate predictions.\n
SPU_018067	SPU_018067	none	SPU_021911 is a duplicate prediction for SPU_018067.\n
SPU_021911	SPU_021911	none	SPU_021911 is a duplicate prediction for SPU_018067.\n
SPU_010109	SPU_010109	none	Similar to both ZDHHC2 and ZDHHC20.\n
SPU_017163	SPU_017163	none	SPU_017163 has part I and SPU_017164 has part II of ZDHHC7 gene.\n
SPU_017164	SPU_017164	none	SPU_017163 has part I and SPU_017164 has part II of ZDHHC7 gene.\n
SPU_021834	SPU_021834	none	SPU_021833 has part I and SPU_021834 has part II of ZDHHC17 gene. \n
SPU_021833	SPU_021833	none	SPU_021833 has part I and SPU_021834 has part II of ZDHHC17 gene. \n
SPU_012476	SPU_012476	none	#\nThis gene blasts back to the mannose receptor for a VERY large range of animals from Human to Danio.\n
SPU_006842	SPU_006842	none	SPU_027068 is a duplicate prediction for SPU_006842.\n
SPU_027068	SPU_027068	none	SPU_027068 is a duplicate prediction for SPU_006842.\n
SPU_012025	SPU_012025	none	SPU_012025 has the first half of the gene. SPU_022214 has the rest.\n
SPU_022214	SPU_022214	none	SPU_012025 has the first half of the gene. SPU_022214 has the rest.\n
SPU_009348	SPU_009348	none	Likely missing latter half of LIN9 gene. SPU_008517 MAY code for the missing half.\n
SPU_002194	SPU_002194	none	SPU_027247 is a partial duplicate prediction for SPU_002194.\n
SPU_027247	SPU_027247	none	SPU_027247 is a partial duplicate prediction for SPU_002194.\n
SPU_004401	SPU_004401	none	May be missing an exon at the beginning.\n
SPU_007535	SPU_007535	none	SPU_007535 contains first part of the gene and SPU_004388 has the latter half. There is some overlap between the two models.\n
SPU_004388	SPU_004388	none	SPU_007535 contains first part of the gene and SPU_004388 has the latter half. There is some overlap between the two models.\n
SPU_015460	SPU_015460	none	SPU_015460 is a duplicate prediction for SPU_009441.\n
SPU_009441	SPU_009441	none	SPU_015460 is a duplicate prediction for SPU_009441.\n
SPU_015152	SPU_015152	none	SPU_004341 is a partial duplicate prediction for SPU_015152.\n
SPU_004341	SPU_004341	none	SPU_004341 is a partial duplicate prediction for SPU_015152.\n
SPU_028370	SPU_028370	none	SPU_028370 codes for first part of DNAH1 protein and SPU_000013 codes for the rest. There may be an exon missing in between the two predictions.\n
SPU_000013	SPU_000013	none	SPU_028370 codes for first part of DNAH1 protein and SPU_000013 codes for the rest. There may be an exon missing in between the two predictions.\n
SPU_003564	SPU_003564	none	SPU_015049 has first part of the DNAH7 gene and SPU_003564 has the rest.\n
SPU_015049	SPU_015049	none	SPU_015049 has first part of the DNAH7 gene and SPU_003564 has the rest.\n
SPU_027326	SPU_027326	none	SPU_027326 may have the first part of the gene. Rest is coded by SPU_017271.\n
SPU_017271	SPU_017271	none	SPU_027326 may have the first part of the gene. Rest is coded by SPU_017271.\n
SPU_024529	SPU_024529	none	SPU_024529 has the first part of DNAH5 gene and SPU_003660 has the latter. There is overlap between the two GLEAN models (~2161-2763 AA).\n
SPU_003660	SPU_003660	none	SPU_024529 has the first part of DNAH5 gene and SPU_003660 has the latter. There is overlap between the two GLEAN models (~2161-2763 AA).\n
SPU_007564	SPU_007564	none	Missing the first half.\n
SPU_002335	SPU_002335	none	SPU_002335 has most of the first part. SPU_024805 codes for the rest. It is possible that these two GLEANs may be partial predictions of two different genes.\n
SPU_024805	SPU_024805	none	SPU_002335 has most of the first part. SPU_024805 codes for the rest. It is possible that these two GLEANs may be partial predictions of two different genes.\n
SPU_020931	SPU_020931	none	SPU_010157 and SPU_020427 are partial duplicate predictions for SPU_020931.\n
SPU_010157	SPU_010157	none	SPU_010157 and SPU_020427 are partial duplicate predictions for SPU_020931.\n
SPU_020427	SPU_020427	none	SPU_010157 and SPU_020427 are partial duplicate predictions for SPU_020931.\n
SPU_022833	SPU_022833	none	SPU_022833 is a duplicate prediction for SPU_017547.\n
SPU_008312	SPU_008312	none	SPU_028010 is a duplicate prediction for SPU_008312.\n
SPU_028010	SPU_028010	none	SPU_028010 is a duplicate prediction for SPU_008312.\n
SPU_022907	SPU_022907	none	Gene model incorrect. Too long.\n
SPU_027407	SPU_027407	none	SPU_027407 is a partial duplicate prediction for SPU_027406.\n
SPU_027406	SPU_027406	none	SPU_027407 is a partial duplicate prediction for SPU_027406.\n
SPU_004463	SPU_004463	none	SPU_025916 is a partial duplicate prediction for SPU_004463.\n
SPU_025916	SPU_025916	none	SPU_025916 is a partial duplicate prediction for SPU_004463.\n
SPU_026427	SPU_026427	none	SPU_013244 is a partial duplicate prediction for SPU_026427.\n
SPU_013244	SPU_013244	none	SPU_013244 is a partial duplicate prediction for SPU_026427.\n
SPU_010328	SPU_010328	none	SPU_026065 is a partial duplicate prediction for SPU_010328.\n
SPU_010620	SPU_010620	none	Larger than required prediction.\n
SPU_023970	SPU_023970	none	SPU_023970 is a partial duplicate prediction for SPU_019111.\n
SPU_019111	SPU_019111	none	SPU_023970 is a partial duplicate prediction for SPU_019111.\n
SPU_008304	SPU_008304	none	SPU_008304 may represent a partial prediction for MYST2 (which is largely encoded by SPU_017172, but is missing a piece that is present in SPU_008304).\n
SPU_017172	SPU_017172	none	SPU_008304 may represent a partial prediction for MYST2 (which is largely encoded by SPU_017172, but is missing a piece that is present in SPU_008304).\n
SPU_009734	SPU_009734	none	Missing ~70 AA at beginning.\n
SPU_014886	SPU_014886	none	SPU_014887 is a longer duplicate prediction for SPU_014886.\n
SPU_014887	SPU_014887	none	SPU_014887 is a longer duplicate prediction for SPU_014886.\n
SPU_011305	SPU_011305	none	Prediction likely short by ~50 AA at beginning.\n
SPU_013192	SPU_013192	none	SPU_013192 is a partial duplicate of SPU_019276.\n
SPU_013812	SPU_013812	none	SPU_013812 encodes Part I of XRN1 and SPU_013813 probably encodes the rest (though may be partially incorrect).\n
SPU_019276	SPU_019276	none	SPU_019276 is a partial prediction that may be missing ~180-200 AA at end\n
SPU_000461	SPU_000461	none	SPU_021007 may be missing a few AA at the end. SPU_024819 is a partial duplicate prediction for SPU_021007. SPU_000416 may be as well.\n
SPU_011694	SPU_011694	none	SPU_004023 is a partial duplicate prediction for SPU_011694.\n
SPU_004608	SPU_004608	none	SPU_004608 is a duplicate prediction for SPU_004445.\n
SPU_004445	SPU_004445	none	SPU_004608 is a duplicate prediction for SPU_004445.\n
SPU_014305	SPU_014305	none	SPU_027401 is a partial duplicate prediction for SPU_014305.\n
SPU_027401	SPU_027401	none	SPU_027401 is a partial duplicate prediction for SPU_014305.\n
SPU_023752	SPU_023752	none	Prediction is incorrect: much longer than required.\n
SPU_026862	SPU_026862	none	SPU_026862 has the first part of the FTSJ3 gene and SPU_021738 likely has the rest. A couple of exons are likely missing - one at the beginning and one between the two GLEANs.\n
SPU_021738	SPU_021738	none	SPU_026862 has the first part of the FTSJ3 gene and SPU_021738 likely has the rest. A couple of exons are likely missing - one at the beginning and one between the two GLEANs.\n
SPU_017569	SPU_017569	none	Likely missing last exon.\n
SPU_020191	SPU_020191	none	SPU_020191 is a partial duplicate prediction for SPU_001541.\n
SPU_001541	SPU_001541	none	SPU_020191 is a partial duplicate prediction for SPU_001541.\n
SPU_023204	SPU_023204	none	SPU_007955 is a duplicate prediction for SPU_023204.\n
SPU_007955	SPU_007955	none	SPU_007955 is a duplicate prediction for SPU_023204.\n
SPU_011958	SPU_011958	none	SPU_011958 has most of the gene except the last exon or two which are encoded by SPU_008969. There is a significant overlap between the two GLEANs.\n
SPU_008969	SPU_008969	none	SPU_011958 has most of the gene except the last exon or two which are encoded by SPU_008969. There is a significant overlap between the two GLEANs.\n
SPU_004785	SPU_004785	none	SPU_006645 and SPU_010748 are partial duplicate predictions for SPU_004785.\n
SPU_006645	SPU_006645	none	SPU_006645 and SPU_010748 are partial duplicate predictions for SPU_004785.\n
SPU_011454	SPU_011454	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.   \nThis is a member of sea urchin-specific Tlr Group IA.\n
SPU_012464	SPU_012464	none	The intron of this gene model included a long unknown sequence. So the model was modified as an intronless gene by comparison to Fgenesh++ and NCBI prediction. \nThis is a member of sea urchin-specific Tlr Group IB. \n
SPU_027226	SPU_027226	none	This model is incorrect. It encodes the MTO1 gene in the first part but the latter half appears to be similar to GLIPR1L1(NP_689992).\n
SPU_009702	SPU_009702	none	This GLEAN contains the mid-region of the full length midasin polypeptide, corresponding to approximately amino acids 2660-3943 of human midasin. The N-terminal region is SPU_013160 and the C-terminal region is SPU_022614\n
SPU_022908	SPU_022908	none	Likely missing an exon.\n
SPU_023058	SPU_023058	none	Likely missing an exon (~30 AA).\n
SPU_002179	SPU_002179	none	Missing ~100 AA.\n
SPU_024540	SPU_024540	none	SPU_013159 is a partial duplicate prediction for SPU_024540.\n
SPU_013159	SPU_013159	none	SPU_013159 is a partial duplicate prediction for SPU_024540.\n
SPU_017807	SPU_017807	none	SPU_017807 is a partial duplicate prediction for SPU_018692.\n
SPU_028576	SPU_028576	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR. \nThis is a member of sea urchin-specific Tlr Group I(orphan).\n
SPU_007429	SPU_007429	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.  \nThis is a member of sea urchin-specific Tlr Group IC. \n
SPU_011570	SPU_011570	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR. This is a member of sea urchin-specific Tlr Group IC. \n
SPU_026347	SPU_026347	none	This GLEAN forms part of the annotated full-length Sp-DNAH1 gene (SPU_030223)\n
SPU_011876	SPU_011876	none	This GLEAN forms part of the annotated full-length Sp-DNAH10 gene (SPU_030231)\n
SPU_015054	SPU_015054	none	This GLEAN forms part of the annotated full-length Sp-DNAH10 gene (SPU_030231)\n
SPU_028362	SPU_028362	none	This GLEAN forms part of the annotated full-length Sp-DNAH10 gene (SPU_030231)\n
SPU_026539	SPU_026539	none	This GLEAN forms part of the annotated full-length Sp-DNAH10 gene (SPU_030231)\n
SPU_028189	SPU_028189	none	This GLEAN forms part of the annotated full-length Sp-DNAH10 gene (SPU_030231)\n
SPU_002747	SPU_002747	none	This Glean forms part of the annotated full length Sp-DNAH12 gene (SPU_030232)\n
SPU_005317	SPU_005317	none	This Glean forms part of the annotated full length Sp-DNAH12 gene (SPU_030232)\n
SPU_020747	SPU_020747	none	This Glean forms part of the annotated full-length Sp-DNAH12 gene (SPU_030232)\n
SPU_002750	SPU_002750	none	This Glean forms part of the annotated full length Sp-DNAH14 gene (SPU_030233)\n
SPU_028434	SPU_028434	none	This glean forms part of the annotated full-length Sp-DNAH14 gene (SPU_030233)\n
SPU_012139	SPU_012139	none	This Glean forms part of the annotated full length Sp-DNAH14 gene (SPU_030233)\n
SPU_016536	SPU_016536	none	Intronless Toll-like receptor with predicted secretory signal peptide, LRR-NT, LRR(22), LRR-CT, TM and TIR.   \nThis is a member of sea urchin-specific Tlr Group ID.\n
SPU_026081	SPU_026081	none	This Glean is part of the annotated full-length Sp-DNAH15 gene (SPU_030234)\n
SPU_017935	SPU_017935	none	This glean is part of the annotated full-length Sp-DNAH15 gene (SPU_030234)\n
SPU_018822	SPU_018822	none	This Glean is part of the annotated full-length Sp-DNAH2 gene (SPU_030224)\n
SPU_010136	SPU_010136	none	This Glean is part of the annotated full length Sp-DNAH2 gene (SPU_030224)\n
SPU_018784	SPU_018784	none	This Glean is part of the annotated full-length Sp-DNAH2 gene (SPU_030224)\n
SPU_027159	SPU_027159	none	This Glean forms part of the annotated full-length Sp-DNAH3 gene (SPU_030236)\n
SPU_026238	SPU_026238	none	SPU_012397 is a partial duplicate prediction for SPU_026238.\n
SPU_012397	SPU_012397	none	SPU_012397 is a partial duplicate prediction for SPU_026238.\n
SPU_005987	SPU_005987	none	SPU_005987 is a composite prediction of two genes. Sp-NUC205 and the last ~350 AA code for Sp-PAQR5. \nSPU_015468 and SPU_021249 are partial duplicate predictions for SPU_005987.\n
SPU_015468	SPU_015468	none	SPU_015468 and SPU_021249 are partial duplicate predictions for SPU_005987.\n
SPU_021249	SPU_021249	none	SPU_015468 and SPU_021249 are partial duplicate predictions for SPU_005987.\n
SPU_010384	SPU_010384	none	Missing first exon.\n
SPU_012748	SPU_012748	none	SPU_012748 is a partial duplicate prediction for SPU_018535.\n
SPU_027020	SPU_027020	none	Incorrect gene model. Extra exons.\n
SPU_022235	SPU_022235	none	SPU_002618 and SPU_022235 are partial duplicate predictions for SPU_015720.\n
SPU_002618	SPU_002618	none	SPU_002618 and SPU_022235 are partial duplicate predictions for SPU_015720.\n
SPU_024386	SPU_024386	none	This gene model may represent a pseudogene or contain a sequence error. Intron sequence matches coding sequence of other Sp-Tlr genes and contains stop codons. Modified gene model has some frame shifts, but reflects best gene structure. \nThis is a member of sea urchin-specific Tlr Group IE. \n
SPU_017828	SPU_017828	none	A number of other GLEANs may be partially similar to Sacsin. These may include SPU_008522, 15037, 21335, 28108, 25360, 04170 and 00614. \nSPU_017828, _08522 and _25360 are very similar and form one group. \nSPU_004170, _28108, _15037, _21335, _00614, _06655, _16709 and _22759 form a second group of sacsin-like genes.\n
SPU_025360	SPU_025360	none	A number of other GLEANs may be partially similar to Sacsin. These may include SPU_008522, 15037, 21335, 28108, 25360, 04170 and 00614. \nSPU_017828, _08522 and _25360 are very similar and form one group. \nSPU_004170, _28108, _15037, _21335, _00614, _06655, _16709 and _22759 form a second group of sacsin-like genes.\n
SPU_008522	SPU_008522	none	A number of other GLEANs may be partially similar to Sacsin. These may include SPU_008522, 15037, 21335, 28108, 25360, 04170 and 00614. \nSPU_017828, _08522 and _25360 are very similar and form one group. \nSPU_004170, _28108, _15037, _21335, _00614, _06655, _16709 and _22759 form a second group of sacsin-like genes.\n
SPU_000614	SPU_000614	none	A number of other GLEANs may be partially similar to Sacsin. These may include SPU_008522, 15037, 21335, 28108, 25360, 04170 and 00614. \nSPU_017828, _08522 and _25360 are very similar and form one group. \nSPU_004170, _28108, _15037, _21335, _00614, _06655, _16709 and _22759 form a second group of sacsin-like genes.\n
SPU_004170	SPU_004170	none	A number of other GLEANs may be partially similar to Sacsin. These may include SPU_008522, 15037, 21335, 28108, 25360, 04170 and 00614. \nSPU_017828, _08522 and _25360 are very similar and form one group. \nSPU_004170, _28108, _15037, _21335, _00614, _06655, _16709 and _22759 form a second group of sacsin-like genes.\n
SPU_028108	SPU_028108	none	A number of other GLEANs may be partially similar to Sacsin. These may include SPU_008522, 15037, 21335, 28108, 25360, 04170 and 00614. \nSPU_017828, _08522 and _25360 are very similar and form one group. \nSPU_004170, _28108, _15037, _21335, _00614, _06655, _16709 and _22759 form a second group of sacsin-like genes.\n
SPU_015037	SPU_015037	none	A number of other GLEANs may be partially similar to Sacsin. These may include SPU_008522, 15037, 21335, 28108, 25360, 04170 and 00614. \nSPU_017828, _08522 and _25360 are very similar and form one group. \nSPU_004170, _28108, _15037, _21335, _00614, _06655, _16709 and _22759 form a second group of sacsin-like genes.\n
SPU_021335	SPU_021335	none	A number of other GLEANs may be partially similar to Sacsin. These may include SPU_008522, 15037, 21335, 28108, 25360, 04170 and 00614. \nSPU_017828, _08522 and _25360 are very similar and form one group. \nSPU_004170, _28108, _15037, _21335, _00614, _06655, _16709 and _22759 form a second group of sacsin-like genes.\n
SPU_006655	SPU_006655	none	A number of other GLEANs may be partially similar to Sacsin. These may include SPU_008522, 15037, 21335, 28108, 25360, 04170 and 00614. \nSPU_017828, _08522 and _25360 are very similar and form one group. \nSPU_004170, _28108, _15037, _21335, _00614, _06655, _16709 and _22759 form a second group of sacsin-like genes.\n
SPU_016709	SPU_016709	none	A number of other GLEANs may be partially similar to Sacsin. These may include SPU_008522, 15037, 21335, 28108, 25360, 04170 and 00614. \nSPU_017828, _08522 and _25360 are very similar and form one group. \nSPU_004170, _28108, _15037, _21335, _00614, _06655, _16709 and _22759 form a second group of sacsin-like genes.\n
SPU_022759	SPU_022759	none	A number of other GLEANs may be partially similar to Sacsin. These may include SPU_008522, 15037, 21335, 28108, 25360, 04170 and 00614. \nSPU_017828, _08522 and _25360 are very similar and form one group. \nSPU_004170, _28108, _15037, _21335, _00614, _06655, _16709 and _22759 form a second group of sacsin-like genes.\n
SPU_015550	SPU_015550	none	Likely incorrect. Longer than required.\n
SPU_019664	SPU_019664	none	SPU_015087 has the second half of TSR1 gene. SPU_019664 likely codes for the first half.\n
SPU_005961	SPU_005961	none	Missing one or more exons at beginning (~80 AA).\n
SPU_005268	SPU_005268	none	SPU_020117 is a duplicate prediction for SPU_005268.\n
SPU_020117	SPU_020117	none	SPU_020117 is a duplicate prediction for SPU_005268.\n
SPU_022232	SPU_022232	none	SPU_022232 is a partial duplicate prediction for SPU_013282.\n
SPU_002056	SPU_002056	none	SPU_020938 is a partial duplicate prediction for SPU_002056.\n
SPU_020938	SPU_020938	none	SPU_020938 is a partial duplicate prediction for SPU_002056.\n
SPU_018123	SPU_018123	none	Incorrect gene model. Has an extra exon (or more) in the middle and is missing the last exon (or more).\n
SPU_023316	SPU_023316	none	Incorrect gene model. This appears to be a conserved gene across species.\n
SPU_026256	SPU_026256	none	SPU_026256 is a longer duplicate prediction for SPU_023423.\n
SPU_022950	SPU_022950	none	Only second half of the WAPAL gene is encoded by SPU_022950. First part is missing. \n
SPU_021483	SPU_021483	none	Incorrect gene model. Likely has an extra exon predicted.\n
SPU_001415	SPU_001415	none	Inspection of the tiling array suggests that glean may have missed the following exons: KSKRAPREEDTAPKRRREEAAGSSKQSPTKKKISSGRQAAGSGGGTPTQDELAPDPRESAKPAAQKRAEGPIKSDQTVRVEEKQESDSESSGRSSSGKGAKLASLPELME\n
SPU_002008	SPU_002008	none	Inspection of the tiling array suggests that glean may have missed the following exons: DSYRREILLLYSLWQGLPNETSSYQPRPCAHGRETLRVRDMPQSFHRAGYSPKAQDHPFWPEALQVRDLRPSLCRQKRPQLSC,CELCPRKFVRKNFLNAHMKLHQGIKPKKPPERSFTCTICNKVLKTRASYQTHNRIHTGEKSFCCTLCGKAFPTKPRLINHVRVHTGEKPYECETCHKAFTEPGTLRRHKIIHSGLKPYKCETCDRAFADKSALNSHVKMHTGQKSHSCEFCGKMFWTATNMRQHAKTHRKKSMFECGVCSKEIFGQENLTAHLVEHEAEQR,LSNVPSAISGFRQKGQRTQHQKKVHKVKMEEGNEAEGEVVSSEDGPVIIPNVKRVFKCRVCQVEFEAKEELKEHKLTHKELENGDDEYVPISVKVSRPKKVVETFKCDICNNSFAQKAYLERHRRVHTGEKPFGCTLCEKKFSDMTSLRRHKSIHTGAKPF,QCEVCEKFFKTKKTLQKHGAIHDEEKRYECDVCQKRFSRKAYLVSHSTIHTGEKPYTCEDCGRQFRDRSSMKRHMNTHKGIKRYECNVCQKQFTDKSAANIHLRIHTGEKPYECYECK\n
SPU_002009	SPU_002009	none	Inspection of the tiling array suggests that glean may have missed the following exons: DGQRLKHMRQKHSLAKCDECGACFEDEALLQRHMKMHSQVTMFMCDVCGSTFTKKSYLTFHMVVHEKEDLSERMSMKEVDEGKVPALPSKGQKQLQIVDDDEDDDDVGDGGNDPDDSDWEEPLAAKKKGSKFDCDRCSRSFASLRGLKMHQRMLHITVEEEPQSESSEEEEEDAKTEEDAMIEEKGGNDKSKHCPVCKKSFVSVRQLTRHESTHASWDCTYCSKTFRTSWILKEHLNTHTGQRPYQCTECDKTFKSHGALRRHTIIHKGTKPYRCDLCDMRFSDGSSLKSHKKRRSCR,RATRNVDHAGDKQVECDVCLKKFYTGFQMRTHRLTHGGQDHKEENLLRCESCSKAFLSPSGLEKHKKSGKCGKKFTCPFCTDSFIYKYEREKHMETHAEFVNKADKEDEVTDGVKKSKVFKCPICEQKFPNLRTFTIHRKKHERGKVYACEVCNKVFSTPISLKYHRKLHTGQGPKCSVCDKTFYNLKSLRRHERIHTGQKPYNCGFC,SLSYLSSTHTESFVQIVGRHSSTFPDSSLPQAESTNLNLETMGEDAQPTNQDVDTVMLETELTNESADTMTPEAELTDENADPVILVSEMTNQSADTMTPEAGLTNENADPVILVSEMTNQSADTMTPEAGLTNENADPVILASEMTNQSVDTMIPEDQPQTNAEPRSSSLEVQGKL,LSNVPSAISGFRQKGQRTQHQKKVHKVKMEEGNEAEGEVVSSEDGPVIIPNVKRVFKCRVCQVEFEAKEELKEHKLTHKELENGDDEYVPISVKVSRPKKVVETFKCDICNNSFAQKAYLERHRRVHTGEKPFGCTLCEKKFSDMTSLRRHKSIHTGAKPF\n
SPU_007169	SPU_007169	none	Inspection of the tiling array suggests that glean may have missed the following exons: TRYKRICDRHICDDPVLAMLNLPLPNINSKSITNFISKQEKRSISLQPLNPKSSNIAKAFLLIPFRHLSSFQRCSRTQGMHRLSTVSYLMNLASAGVRRGTHSSMNQGLFHLQGMSQHLAHLQCLYLLLVTNQNLSYPLLPPR\n
SPU_011001	SPU_011001	none	Inspection of the tiling array suggests that glean may have missed the following exons: FYVNFLLLFFRFSLVSLYIWRYHHGLPVKRCGKLATYHPVNRSATVISYLKNSQPCQPLDMASTSNPSLLPQKNIKQEIIVVFEPPAREMASSSSHSSVPQPETSQEINVVPVSPVKDMPSTSNPSSGHETGQENDMVPDSTGIELSKDDLKAGGGKVGSSSKKKSGCEKDSDEYKRRRERNNEAVRKSRQKSRQKASETEVRVTELKKENADLEQRVTLLHKELELLKDLFLTHANELPDPSTTFGLFNANPRLGSSSPNPALSRRIVLKTESLTVSLTCRNVPESITTTT\n
SPU_014686	SPU_014686	none	Inspection of the tiling array suggests that glean may have missed the following exons: FSFNVSDAQKFQCSLCEDLFSSSKLILRHIRLEHTDGKPHDKLPLVTPKKREKKHVRLKISSKHLFKTKKKAREKEESDLKCATCGKVFLSSGRLKAHEIFHDYNQDHTCPICGKRQKNAQTWAKHMNLHKPASEARPHKCNECNGRYKSKAALRKHQHQVHGYPCRLCSERFSRMKDCKTHEQTHQAFIPPHAAVEYLVAELSPDEPKDMLGKRANYYQRRFKCRYCPKRYSDHYTVRNHEKENHTGEGTFKCSHCPKAYTSESRLKTHLLFHEQTHIYRCMLCPSSFASESALNSHQGEHTGLKPVKCDVCGKGFRTRKHSLAHRRRVHQERPKRFFCSYCNFGFAEKGDFNKHEQRHKGIRRYMCRECGMPFTSNTSLTAHIRALHTKERPFSCEICGKTFALNFKYTLHMVRHNVQVNGSSLQQQ,QNSLLMSRKTCLVNVPITTNEGSSAVIVQRDIVTITLSGTMRKRTIPVKEPSNAVIAQKLIPAKAVLKLTSCSMSRRTYTAACFVQAALHRKAPSTVTKENIPD,SQSNVMSVAKDLELGSIPLRIGVVFIRNVQSVSSAPTAILGLQKRAILTNMSSGTRALDDTCVGSVECPSPQIHLSQLTFEHCIPRKDHFHVKYAGKLLP,LGQSSIKSAPSLQSTDHRKQCLPSSKPFHQANASPPAPDVSAAEPPLFAALNLTKTISVMDLPLSLSLRVASQDKVEGVVAKDTVEKGVEFGPYTGTLLDEEQGSSKETTWEV,VSHLLNQLLHYKAPTIESSVYLRQSRFIKLMQVLRPQTSLLPNLLFLQRSISQRQSPSWIFHCLSHYELHRRTRSKEWLPRIQLRRGWSLDPTQEHCWMRSRDRLRRQPGRY,ANIIHQTPPQPPVTLPVHIRGHGVKSSTLSANYAPPINTTHGAVEEERNDLQIATHGSVEASKMLPLAFHKSVVERNFQPTTSYESGLHVESNALPIITCKSA\n
SPU_017847	SPU_017847	none	Inspection of the tiling array suggests that glean may have missed the following exons: EWTGGRHYVMIQDWTDLEGGHHVVTAHRDENPNRPKETQRNRHVKRPRSLHSMMDRKLPLDIADNLRPNQKKSWITTKDPLHHLTCDIHLGKWRRRLHGLVSGLKLLVRMLKLKLTL,PPGHEIVLADRVTGKRVALCEGLVESPVNPTELWKKGPIVARSVVKADEDVLPVRVFNPTREQGIIKAGTAIASLSALEEMGTEMCKQTMPTETSASKNMTSDKQRNRDGDARRATTHFFQCDSCTKSYEKKDSLQRHKREVHIRKYRCGQCDYRTGRKTEIERHQGAHAVAVVPVRKHSLTPVSSPRCKPTSTPITGTSEKADASWRHVRMDWRSPLRDDPRLDRPRGRSPRRYRSSRRKSKSPQRDPTKSSCQKTPESPQHDGSEASPRHSRQSTPEPEEELDNDERSPSPLDLRYSSRKMASETAWASEWTEIASKNAETQTDT,ESPVNPIELWKKGPIVARSVVKTDEDVLPVRVFNPTREQRNIKAGTAIASLSAVEEVGTEMCNQTMPTETSASRNMPPDKQGNKDVDARRTTTHFFQCDSCTKSYEKKDSLQRHKREVHILKYRCDQCDYRTGRKTEMERHQGAHEVAVVPVRKHSLTPVSSPRCKPTSTPITGTSEKADASWRHVRMDWRSPPRDDPRLDRPRGRSPRRYRSSRRKSKSPQRDLTKSSCQKTPESPQHDGSEAS,WCQKWDPVTMAQGRNRQNRPTPGEMMKHLERSMTEVDQMMEQLRASMSQVPVADPEPGRQEAFPPTLSGQTASGPTADSQMPAAQRVRFSSTPREQSRRLPSAGTGVHITPPLFSPP\n
SPU_017848	SPU_017848	none	Inspection of the tiling array suggests that glean may have missed the following exons: EWTGGRHYVMIQDWTDLEGGHHVVTAHRDENPNRPKETQRNRHVKRPRSLHSMMDRKLPLDIADNLRPNQKKSWITTKDPLHHLTCDIHLGKWRRRLHGLVSGLKLLVRMLKLKLTL,PPGHEIVLADRVTGKRVALCEGLVESPVNPTELWKKGPIVARSVVKADEDVLPVRVFNPTREQGIIKAGTAIASLSALEEMGTEMCKQTMPTETSASKNMTSDKQRNRDGDARRATTHFFQCDSCTKSYEKKDSLQRHKREVHIRKYRCGQCDYRTGRKTEIERHQGAHAVAVVPVRKHSLTPVSSPRCKPTSTPITGTSEKADASWRHVRMDWRSPLRDDPRLDRPRGRSPRRYRSSRRKSKSPQRDPTKSSCQKTPESPQHDGSEASPRHSRQSTPEPEEELDNDERSPSPLDLRYSSRKMASETAWASEWTEIASKNAETQTDT,GKGEVYLAELRQRCRQPKESLQELGQTIRELCTLSYPEFDEKGQDRLARGHFLDAVVTPEIREGLFRAQPRTLDDAVEAALNTEAFLRMEGQRNEVKRSTTYSRALEECEVSAIREQQPRNPTIDEIVKKVLDALDMRNGRNTIKPDVPDRRPEQTMPTKVSEREDNRCFNCNELGHWRNQCPYPRKVRGGTAPPAAEKANTNLQWATANGMTDEEEQARVGSSQDPNRKGLFLE,RCVNCLSGCSIQQGNMGSQRQALPLPHCQRWKKWAQKCAIRRCPLKPVQAETCHPTSRETRTLMPEVRLPTSFNVTLVPKVMRRKTL,DFLVQINAVLDCQKMELRTEWGIIPCLDSEGESFCRRIVAGEEYSIPPGHEMVLANRVTGEKIALCEGLVESPVNPTELWKKGPIVARSVVKADEDVLIACQGVQFNKGTWDHKGRHCHCLIVSAGRSGHRNVQSDDAH,LPVRVFNSTREHGITKAGTAIASLSALEEVGTEMCNQTMPIETSASRNMPPDKQGDKDVDARGTTTHFFQCDSCTKSYEKKDSL,LPVRVFNSTREQGITKAGTAIASLSALEEVGTEMCNQTMPIETSASRNMPPDKQEDKDVDARRTTTHFFQCDSCTKSYEKKDSLQRHKREVHILKYRCDQCDYRTGRKTEMERHQGAHDVAVVP,CLSGCSIQQGNRGSQRQALPLPHCQRWKKWAQKCAIRRCPLKPVQAETCHPTSRKTRTLMPDVRLPTSFNVTLVPKVMRRKTLCKDTSGKCIS,WCQKWDPVTMAQGRNRQNRPTPGEMMKHLERSMTEVDQMMEQLRASMSQVPVADPEPGRQEAFPPTLSGQTASGPTADSQMPAAQRVRFSSTPREQSRRLPSAGTGVHITPPLFSPP,SPRRYRSSRRKSKSLQRDLTKSSCQKTPESPQHDGSEAPPRHSRQSTPEPGEELDNDERSPSPLDLRYSSRKMASETAWASEWTEIASKNAETQTDTDSSR,RNRHVKRPRSLHNMMDRKLPLDIADNLRPNQEKSWITTKGPLHHLTCDIHLGKWRRRLPGLVSGLKLLVRMLKLKLTLIVLVK\n
SPU_017849	SPU_017849	none	Inspection of the tiling array suggests that glean may have missed the following exons: EWTGGRHYVMIQDWTDLEGGHHVVTAHRDENPNRPKETQRNRHVKRPRSLHSMMDRKLPLDIADNLRPNQKKSWITTKDPLHHLTCDIHLGKWRRRLHGLVSGLKLLVRMLKLKLTL,PPGHEIVLADRVTGKRVALCEGLVESPVNPTELWKKGPIVARSVVKADEDVLPVRVFNPTREQGIIKAGTAIASLSALEEMGTEMCKQTMPTETSASKNMTSDKQRNRDGDARRATTHFFQCDSCTKSYEKKDSLQRHKREVHIRKYRCGQCDYRTGRKTEIERHQGAHAVAVVPVRKHSLTPVSSPRCKPTSTPITGTSEKADASWRHVRMDWRSPLRDDPRLDRPRGRSPRRYRSSRRKSKSPQRDPTKSSCQKTPESPQHDGSEASPRHSRQSTPEPEEELDNDERSPSPLDLRYSSRKMASETAWASEWTEIASKNAETQTDT,ESPVNPIELWKKGPIVARSVVKTDEDVLPVRVFNPTREQRNIKAGTAIASLSAVEEVGTEMCNQTMPTETSASRNMPPDKQGNKDVDARRTTTHFFQCDSCTKSYEKKDSLQRHKREVHILKYRCDQCDYRTGRKTEMERHQGAHEVAVVPVRKHSLTPVSSPRCKPTSTPITGTSEKADASWRHVRMDWRSPPRDDPRLDRPRGRSPRRYRSSRRKSKSPQRDLTKSSCQKTPESPQHDGSEAS,GKGEVYLAELRQRCRQPKESLQELGQTIRELCTLSYPEFDEKGQDRLARGHFLDAVVTPEIREGLFRAQPRTLDDAVEAALNTEAFLRMEGQRNEVKRSTTYSRALEECEVSAIREQQPRNPTIDEIVKKVLDALDMRNGRNTIKPDVPDRRPEQTMPTKVSEREDNRCFNCNELGHWRNQCPYPRKVRGGTAPPAAEKANTNLQWATANGMTDEEEQARVGSSQDPNRKGLFLE,RCVNCLSGCSIQQGNMGSQRQALPLPHCQRWKKWAQKCAIRRCPLKPVQAETCHPTSRETRTLMPEVRLPTSFNVTLVPKVMRRKTL,DFLVQINAVLDCQKMELRTEWGIIPCLDSEGESFCRRIVAGEEYSIPPGHEMVLANRVTGEKIALCEGLVESPVNPTELWKKGPIVARSVVKADEDVLIACQGVQFNKGTWDHKGRHCHCLIVSAGRSGHRNVQSDDAH,LPVRVFNSTREHGITKAGTAIASLSALEEVGTEMCNQTMPIETSASRNMPPDKQGDKDVDARGTTTHFFQCDSCTKSYEKKDSL,LPVRVFNSTREQGITKAGTAIASLSALEEVGTEMCNQTMPIETSASRNMPPDKQEDKDVDARRTTTHFFQCDSCTKSYEKKDSLQRHKREVHILKYRCDQCDYRTGRKTEMERHQGAHDVAVVP,CLSGCSIQQGNRGSQRQALPLPHCQRWKKWAQKCAIRRCPLKPVQAETCHPTSRKTRTLMPDVRLPTSFNVTLVPKVMRRKTLCKDTSGKCIS,SPRRYRSSRRKSKSLQRDLTKSSCQKTPESPQHDGSEAPPRHSRQSTPEPGEELDNDERSPSPLDLRYSSRKMASETAWASEWTEIASKNAETQTDTDSSR,RNRHVKRPRSLHNMMDRKLPLDIADNLRPNQEKSWITTKGPLHHLTCDIHLGKWRRRLPGLVSGLKLLVRMLKLKLTLIVLVK\n
SPU_019384	SPU_019384	none	Inspection of the tiling array suggests that glean may have missed the following exons: TWIVTFGVRFHINADIVDIMKDEIAAAVVFLTRIVKRNTSLTAEQMSKFSEKLALTLIEKFRNHWYEDKPSKGQAYRCIRVSRNEPRDSVISKTAKDCGIHYNHLNLPAELCLWVDPLEVSCR,NQICCRFVNRHAPFSFICRFGERGTVCEVATFNQQTLTDNRPSTPISSSNNAFNDNNANQLSPTPSPPSSPPRQLVINDNLRPNSGRKMVNSSQYVFNRNATNTPRVIQRPQNKVWVRQPNPEQYRWVNKSIAGRA\n
SPU_021758	SPU_021758	none	Inspection of the tiling array suggests that glean may have missed the following exons: EGLTPEALAQAGLTQNYVNAFTQQTLESLANSQGDITAENQISLQTQQIQQQLEAITGQSASLFSHAVNIQPVQEELPPPPPQETNRLPTTNTSSVFATNEGEKKTYSCHFCEKTFKKSSHLKQHIRSHTGEKPCKCMQCGRSFVSASTLRNHMRTHSGIKSFKCNLCNTSFTTNGSLVRHMNIHTDQRPHTCQQCGEAFRKQLDLKRHLKDHATESDDGEQLEDGKRPRNVIRFNEEEAQQIAKKPLARNATTSERILVQMVNDKNRVSEVLTQEDVLKSRPNFPNRCIHCSKSFKKPCDLVRHVRTHTGEKPFKCTECERTFAVKSTLVCHVRTHKGGKQEVCHICKTT,QVVVQNSGIQNIIPSLQNQSFIPSTSGMTTQVVAPPNSVAPFLQGSDVQMTIHDTLAQNVSHGEALMTNKMYTIHTTDRGTHLVQTDTTPSQADTSNLGQSFQLSLSGDQLSLQPNILLQQPNIQNPVEYTPSSSIAHSQEQITISTNGALGDSESSGTVTVNVVDFANLASHENVQTTASSSAAVPLQEEEESGEEDEDDDDVDDEDEEELEEESEDELQYVTEGGNLPSTGPMIAGHHVMREKPAMSAKDAAAAIGSIYPCT,EKPYKCPHCDKAFNQNGALHVHLTKHTGTKPHSCEFCGQKFAQRGNLRAHIVRVHEINSELEQRFECPQCCCNFRKMSSLNAHISRFHNSEEASSPLLSLKAALSAIQEGADWLDSTVDTPALAGALEQLINMLEENAGEGDPLDRIQEIMSGNTINTDILQQALDNSGVTNNVVEGQTETPAPAAPVAPVAPDAPDPAPEIPTASTTAPTVSTETTVAPANNQLNQLFSFVAE,KCLICDCLFTTNGSLKRHMSTHTDVRPFMCPYCQKTFKTSVNCKKHMKMHKRELALQAKQQENIEQDPNHVQEIQSIAEQASNLAEHLAIEQASNSILALAQGQLTVGQDGLHGANALNETS\n
SPU_023140	SPU_023140	none	Inspection of the tiling array suggests that glean may have missed the following exons: RAFPPAGLMNPRSSDCVGVTRPEGHLFGVGFVGLGGSEVVSVAAKQIFLTCSCDSAKLIPGTALHNLQYGFRNVVPFSSVTKFW,DTMARVLLIDTVKLSRAPRVLLSSKARQVVMAKTQVHKKCETTDREVQSSGTDVPSPLIICRSCARKSTTHNTMMSLLLTQRRM\n
SPU_023510	SPU_023510	none	Inspection of the tiling array suggests that glean may have missed the following exons: GFCYEKSLLLHMKSHIGEMPHKGLVCKRGLSSNSFLLRHIRSHTGEKPYQCLVCGKSFAHNSTLKLHTLRTHPGELRASFLEGISGAPTRENPFHLFWGACLVPSGLCIRSGCRVT,SKLDISSCSLCSGGNSRMQSSHQSMSGDRSHGGERRPGLPEDFPLETGRGTRSNSSERESCPSEYDSCLIPSQVAYFHEKRAADRETADATTSYHKPGGMHKDRNADITMEYDEMINQSSQLRAAGSDQNEPSCSLTEEKLFLCC,THTGEKPYQCKFCDKKYSRSSSLGVHIRTHTGEKPYQCKHCDVSFSRVETLSRHIGTHTGKEPFECSFCKKTFSHNGHLSRHLKIHTGERPFKCSVCNKTFSERGYLKDHQVIH,EGSHFKSKIHISSHSLCSGGDSSLQSSYQSMRNTQEMSGDRSCDGARHLPEDFPLQDYLGTQSDSSERESCTSENDSYLIPSQVACFHEKQAADRETLDATTSNHKPEGIDID\n
SPU_024746	SPU_024746	none	Inspection of the tiling array suggests that glean may have missed the following exons: NLTEHMGTHIGASPHHCPLCKSTFTRLSSLKKHVRNHHGKCPFQCVFCNKTFNEKDDLLVHVKIHTKKDLYHCLLCNKPWPTIRGLSFHIQLTHKDKSLASVLPETKHSPQGISSQEIPKPTQENIIPVALFSTTNPKDHSVNP\n
SPU_024902	SPU_024902	none	Inspection of the tiling array suggests that glean may have missed the following exons: FFLSLFRFLHMKKSMCVDQSTQTMPDIYSCSHCGSSLLAKPMPPLARPQPSGPQQLPGLESSQARCINKKDNGDDYTYDDGIVTLDNEKELLGPDKWVSEAPDHPVAIFVKEEVMSDLEQSNSMHSDKEESFFDPRQDQEFRSDSKLEHSETGWYEEEEEEEEEEEEDEDEMEDDEEIDYEALERLDPTYDPFIGSRPHQLVSIVNF,GVSDDEFQDKQDRVKKCLTGNHSSVRSECPGVCPLCSTIWNSLADRTSHLQTHIPKDQGRQAFKEACEAIEKKIGKEYTQRMGWCEVCDKFSSSLHTHMSNV\n
SPU_025849	SPU_025849	none	Inspection of the tiling array suggests that glean may have missed the following exons: SFVFSVSIHFGDEQHIETEHGVAQPVQTKDTVPSRGKLVTPATVSQHQDSMEVQIDYDDDEDNSYAAWMTEPDDGNDSDTQPSDTAKAVESDCPSAIQQPREEHPCPDCHVILHSTWDLDLHKLHDCTESNNSIQYVYNCDQCRKSFHRWAFTVAHLRKVHNCNMSHSEIIKKLDELKSTKSKKSGNKDEKCLPSEKKAPKRPGLIEAKGQTTRKKVARRKKRDNGDMQEVKKQTKSDRRNAPRLSKRPCIRLNKDLIKIFESERDDENNLTTNNRSIKPERTAGVSNVEKAQQRHTSDEHQNLNFKTEANNSSGGELATGRLNDERQGQGSTAEQVEEKGEDDLILMKEVELVIKDGERETSATIAMPESEISKAELEEETGNTLISQRDEGQGQAAEGVEENQDASANPIMEKKRKIHTADLKVHESQEGKSEPRDMIEHKASHETAERETKEASNLDADDDENNFTTIQKSTENAIVCSLCDLQFETKHDRSKHMPSHKEHRLQYKCSTCGKTFNRKVIYRTHVETHQDKTKRQKYH,RRECLKPTMRKSQEEMVSLINLLPHHLHPQTQGLTTLTMTVQTSKEAMLLKLLRPVMGKWLETGASPKRGCAGCNTPQPVRILSLKLHCRRVKSEKGERKKKR,LTAHTDVYKYSCDVCGKKFKRTSLRNSHMKVHSNDPANKPFKCELCSKAFAAQGKLKVHMDWHYNIRSYTCDVCGKSFLTKGNLDKHQFLHKGTKPHECKICSRGFMDLPGLRKHLDLVHKITLKKVVTQRVLEANDAKESGGDGVPHKPAATPSSPSNSGSDDADNDSPDFKRGNVAQAPEASDGQVVRDRGVSKEGVCRVQHPPTCQDIVAKIALPSGKKRKRGEEKKEE,NEIFFILFFNLLCSIPLTLRFIVQSFPTGIIGFSFEVKVLMFIRSVPLLCFCYVGNPCSSLRFNNHVVRGELIFIISFTFENLYPIFIQTDTKSLGKQRCLSSVSFCLLFHFLCISYIYFLPPCHLLPHLPSCFSEAWTSLNYCLRWKAFFIRIAIF,RSRPGAFLVLGGFCLGAGIGLLGAGTGLCCTVLSASLCNNPPSDMSCICSVSVSAWTPTPSSTRSSVCDIPLSSSDWSTC,TLYNHAICADLSPMCCVRSHGNPGMGVPLSADELHSGSHFLPLLRCLMFGCHYHLHHQVLSSMQHKNYPRHHRNRSVPPWNLGVVKR\n
SPU_027708	SPU_027708	none	Inspection of the tiling array suggests that glean may have missed the following exons: SRCHVHASFVGRHVLSCSESLAAEVALVGPFTPVFWKVVLERILRREGAWALLALEGPFQRVFPPVHDKPGLFIECQWTELTFVSTTIRMHSHFVLVHLVVSGELLWAVLTFEDLWFLAGDKYSLSKHVLW,TWRSTIITLAVAWLTLWQSSISWVECGKCRLLSRHNTLVPMRDSKCESFDQQSGDISTRLKLVHDDTNRMSKLSNNSSPEWLRL\n
SPU_001633	SPU_001633	none	motor domain\n
SPU_007595	SPU_007595	none	SPU_016421 is a partial duplicate prediction for SPU_007595.\n
SPU_016421	SPU_016421	none	SPU_016421 is a partial duplicate prediction for SPU_007595.\n
SPU_005795	SPU_005795	none	Missing first ~80-100 AA.\n
SPU_002037	SPU_002037	none	SPU_002037 is a partial duplicate prediction for SPU_015843.\n
SPU_026535	SPU_026535	none	SPU_026535 has part II of the gene. SPU_019403 may code for part I. There is still ~100 AA missing between the two GLEANs.\n
SPU_019403	SPU_019403	none	SPU_026535 has part II of the gene. SPU_019403 may code for part I. There is still ~100 AA missing between the two GLEANs.\n
SPU_021620	SPU_021620	none	SPU_005597 has most of the gene, but a small part is better coded by SPU_021620.\n
SPU_005597	SPU_005597	none	SPU_005597 has most of the gene, but a small part is better coded by SPU_021620.\n
SPU_013688	SPU_013688	none	Incorrect gene prediction. SPU_013687 is a partial duplicate prediction for UNC50 gene.\n
SPU_013687	SPU_013687	none	Incorrect gene prediction. SPU_013687 is a partial duplicate prediction for UNC50 gene.\n
SPU_003464	SPU_003464	none	May have an extra exon at beginning.\n
SPU_001579	SPU_001579	none	SPU_025102 is a partial duplicate prediction for SPU_001579.\n
SPU_025102	SPU_025102	none	SPU_025102 is a partial duplicate prediction for SPU_001579.\n
SPU_010312	SPU_010312	none	Missing first ~20 AA.\n
SPU_014712	SPU_014712	none	SPU_020985 is a partial duplicate prediction for SPU_014712.\n
SPU_020985	SPU_020985	none	SPU_020985 is a partial duplicate prediction for SPU_014712.\n
SPU_028809	SPU_028809	none	Incorrect gene model. Extra exon(s).\n
SPU_008110	SPU_008110	none	e val = 0 against NP_036442. \nExact match to XP_795366: PREDICTED: similar to Chromosome-associated kinesin KIF4A(Chromokinesin) [Strongylocentrotus purpuratus]. \n3447 nts spread over 25 exons. \nThis same sequence is found on Scaffoldi3148 from sp_20060316_asm. \nAnnotated by RA Obar, RL Morris, AL Silverio, BJ Chick,  AM Musante, AS Shorette.\n
SPU_006544	SPU_006544	none	Incorrect gene model. Excellent homology to xylulokinase only in the latter half of the gene model. Annotated as such. \n
SPU_019081	SPU_019081	none	SPU_004122 is a partial duplicate prediction for SPU_019081. SPU_019081 is likely missing a few amino acids in middle.\n
SPU_004122	SPU_004122	none	SPU_004122 is a partial duplicate prediction for SPU_019081. SPU_019081 is likely missing a few amino acids in middle.\n
SPU_004700	SPU_004700	none	SPU_004741 is a partial duplicate prediction for SPU_004700.\n
SPU_004741	SPU_004741	none	SPU_004741 is a partial duplicate prediction for SPU_004700.\n
SPU_006047	SPU_006047	none	SPU_006047 is missing a few AA at beginning and end. SPU_015592 is a partial duplicate prediction that goes to the end of the CCT8 protein. There is a significant overlap between the two predictions.\n
SPU_015592	SPU_015592	none	SPU_006047 is missing a few AA at beginning and end. SPU_015592 is a partial duplicate prediction that goes to the end of the CCT8 protein. There is a significant overlap between the two predictions.\n
SPU_009300	SPU_009300	none	Likely missing ~50 AA in middle (one or more exons).\n
SPU_022270	SPU_022270	none	SPU_022269 is a partial duplicate prediction for SPU_022270.\n
SPU_022269	SPU_022269	none	SPU_022269 is a partial duplicate prediction for SPU_022270.\n
SPU_024103	SPU_024103	none	Incorrect gene model. Longer than necessary.\n
SPU_028735	SPU_028735	none	Missing ~100 AA.\n
SPU_001512	SPU_001512	none	SPU_017897 is a partial duplicate prediction for SPU_001512.\n
SPU_013005	SPU_013005	none	SPU_002924 is a partial duplicate prediction for SPU_013005.\n
SPU_002924	SPU_002924	none	SPU_002924 is a partial duplicate prediction for SPU_013005.\n
SPU_017669	SPU_017669	none	SPU_025669 is a partial duplicate prediction for SPU_017669.\n
SPU_025669	SPU_025669	none	SPU_025669 is a partial duplicate prediction for SPU_017669.\n
SPU_004893	SPU_004893	none	SPU_001686 is a partial duplicate prediction for SPU_004893.\n
SPU_001686	SPU_001686	none	SPU_001686 is a partial duplicate prediction for SPU_004893.\n
SPU_014697	SPU_014697	none	SPU_011718 is a partial duplicate prediction for SPU_014697.\n
SPU_011718	SPU_011718	none	SPU_011718 is a partial duplicate prediction for SPU_014697.\n
SPU_020801	SPU_020801	none	May be missing ~100 AA.\n
SPU_012946	SPU_012946	none	SPU_005913 is a partial duplicate prediction for SPU_012946.\n
SPU_005913	SPU_005913	none	SPU_005913 is a partial duplicate prediction for SPU_012946.\n
SPU_021795	SPU_021795	none	Missing first ~80 AA.\n
SPU_013396	SPU_013396	none	SPU_000788 has first part of the TIMM44 gene. SPU_013396 has the rest.\n
SPU_000778	SPU_000778	none	SPU_000788 has first part of the TIMM44 gene. SPU_013396 has the rest.\n
SPU_008672	SPU_008672	none	SPU_006174 is a partial duplicate prediction for SPU_008672.\n
SPU_006174	SPU_006174	none	SPU_006174 is a partial duplicate prediction for SPU_008672.\n
SPU_010671	SPU_010671	none	Likely missing the last exon.\n
SPU_017059	SPU_017059	none	SPU_019019 has the first part and SPU_017059 the rest. There is a significant overlap between the two.\n
SPU_019091	SPU_019091	none	SPU_019019 has the first part and SPU_017059 the rest. There is a significant overlap between the two.\n
SPU_020010	SPU_020010	none	Only the first half of TOP3A is predicted by this GLEAN, rest is missing from the predictions.\n
SPU_017723	SPU_017723	none	Missing the first half of the gene.\n
SPU_005321	SPU_005321	none	Likely has an extra exon predicted. \n
SPU_021928	SPU_021928	none	Missing first ~40 AA.\n
SPU_011091	SPU_011091	none	Model is missing ~50 AA.\n
SPU_027066	SPU_027066	none	Missing first half.\n
SPU_003970	SPU_003970	none	SPU_009819 is a partial duplicate prediction for SPU_003970.\n
SPU_009819	SPU_009819	none	SPU_009819 is a partial duplicate prediction for SPU_003970.\n
SPU_015304	SPU_015304	none	Incomplete model. First half is not correctly predicted.\n
SPU_013174	SPU_013174	none	SPU_020504 is a partial duplicate prediction for SPU_013174.\n
SPU_020504	SPU_020504	none	SPU_020504 is a partial duplicate prediction for SPU_013174.\n
SPU_015044	SPU_015044	none	SPU_011230 is a partial duplicate prediction for SPU_015044.\n
SPU_011230	SPU_011230	none	SPU_011230 is a partial duplicate prediction for SPU_015044.\n
SPU_011567	SPU_011567	none	SPU_011567 has first part of the gene and SPU_025450 has the rest.\n
SPU_025450	SPU_025450	none	SPU_011567 has first part of the gene and SPU_025450 has the rest.\n
SPU_027693	SPU_027693	none	SPU_022721 is a partial duplicate prediction for SPU_027693.\n
SPU_022721	SPU_022721	none	SPU_022721 is a partial duplicate prediction for SPU_027693.\n
SPU_008040	SPU_008040	none	SPU_027793 is a partial duplicate prediction for SPU_008040.\n
SPU_027793	SPU_027793	none	SPU_027793 is a partial duplicate prediction for SPU_008040.\n
SPU_015114	SPU_015114	none	SPU_015114 is a partial duplicate prediction for SPU_015351.\n
SPU_001978	SPU_001978	none	Incorrect gene model. Longer than required. \nSPU_003188 is a partial duplicate prediction for SPU_001978.\n
SPU_003183	SPU_003183	none	SPU_003188 is a partial duplicate prediction for SPU_001978.\n
SPU_012816	SPU_012816	none	SPU_012816 has first part of the gene and SPU_025309 has the latter. \n
SPU_025309	SPU_025309	none	SPU_012816 has first part of the gene and SPU_025309 has the latter. \n
SPU_028866	SPU_028866	none	SPU_028866 is a partial duplicate prediction for SPU_022607.\n
SPU_026082	SPU_026082	none	Transcriptome data and alignment with the best BLAST hit suggest that the prediction may lack an N-terminal exon.\n
SPU_019582	SPU_019582	none	SPU_019260 is a partial duplicate prediction for SPU_019582.\n
SPU_019260	SPU_019260	none	SPU_019260 is a partial duplicate prediction for SPU_019582.\n
SPU_021510	SPU_021510	none	SPU_001446 has the first part of the HECTD1 gene. SPU_021510 has the rest. There is an overlap between the two GLEANs.\n
SPU_005518	SPU_005518	none	Missing an exon at beginning.\n
SPU_019431	SPU_019431	none	Missing first half of the gene. SPU_009004 is a partial duplicate prediction for SPU_019431.\n
SPU_009004	SPU_009004	none	Missing first half of the gene. SPU_009004 is a partial duplicate prediction for SPU_019431.\n
SPU_003914	SPU_003914	none	SPU_005017 is a partial duplicate prediction for SPU_003914.\n
SPU_005017	SPU_005017	none	SPU_005017 is a partial duplicate prediction for SPU_003914.\n
SPU_007280	SPU_007280	none	SPU_007280, SPU_013615, SPU_021285 and SPU_027378 are all "phospholipid scramblase like" sequences.\n
SPU_013615	SPU_013615	none	SPU_007280, SPU_013615, SPU_021285 and SPU_027378 are all "phospholipid scramblase like" sequences.\n
SPU_021285	SPU_021285	none	SPU_007280, SPU_013615, SPU_021285 and SPU_027378 are all "phospholipid scramblase like" sequences.\n
SPU_005123	SPU_005123	none	SPU_005123 has the first part of SCFD1 gene and SPU_005492 has the other half.\n
SPU_005492	SPU_005492	none	SPU_005123 has the first part of SCFD1 gene and SPU_005492 has the other half.\n
SPU_017144	SPU_017144	none	SPU_017144 appears to have the first half of the VPS33B gene, while SPU_002218 may have the remaining part.\n
SPU_002218	SPU_002218	none	SPU_017144 appears to have the first half of the VPS33B gene, while SPU_002218 may have the remaining part.\n
SPU_024158	SPU_024158	none	May be missing the last exon.\n
SPU_005862	SPU_005862	none	SPU_001745 is a duplicate prediction for SPU_005862.\n
SPU_001745	SPU_001745	none	SPU_001745 is a duplicate prediction for SPU_005862.\n
SPU_002387	SPU_002387	none	Incorrect gene model. Likely extra exon(s) predicted for the longer model SPU_002387. SPU_016361 has better sequence homology with GBF1 but is incomplete.\n
SPU_016361	SPU_016361	none	Incorrect gene model. Likely extra exon(s) predicted for the longer model SPU_002387. SPU_016361 has better sequence homology with GBF1 but is incomplete.\n
SPU_003406	SPU_003406	none	SPU_025852 is a partial duplicate prediction for SPU_003406. \nSPU_003406 is likely missing the last exon(s).\n
SPU_025852	SPU_025852	none	SPU_025852 is a partial duplicate prediction for SPU_003406.\n
SPU_019999	SPU_019999	none	Missing first ~80 AA.\n
SPU_006680	SPU_006680	none	Incorrect gene model. Longer than is necessary.\n
SPU_008487	SPU_008487	none	SPU_008487 is a duplicate prediction for SPU_013194.\n
SPU_010670	SPU_010670	none	Incorrect gene model. Prediction longer than necessary.\n
SPU_003805	SPU_003805	none	SPU_025072 is a similar prediction.\n
SPU_025072	SPU_025072	none	SPU_003805 is a similar prediction. \n
SPU_000723	SPU_000723	none	Missing first ~25 AA.\n
SPU_026892	SPU_026892	none	SPU_026892 has most of the TUBGCP3 gene. Last 150 aa are likely encoded by SPU_010158. SPU_021617 is a partial duplicate prediction for SPU_026892.\n
SPU_021617	SPU_021617	none	SPU_026892 has most of the TUBGCP3 gene. Last 150 aa are likely encoded by SPU_010158. SPU_021617 is a partial duplicate prediction for SPU_026892.\n
SPU_010158	SPU_010158	none	SPU_026892 has most of the TUBGCP3 gene. Last 150 aa are likely encoded by SPU_010158. SPU_021617 is a partial duplicate prediction for SPU_026892.\n
SPU_011574	SPU_011574	none	Exons 24, 14, 31 and 46 are not conserved or expressed according to the transcriptome data and are therefore questionable.\n
SPU_009648	SPU_009648	none	Missing first ~100 AA.\n
SPU_002510	SPU_002510	none	SPU_028365 is a partial duplicate prediction for SPU_002510.\n
SPU_028635	SPU_028635	none	SPU_028365 is a partial duplicate prediction for SPU_002510.\n
SPU_001119	SPU_001119	none	SPU_001119 and SPU_023988 are overlapping incomplete predictions for SSRP1. SPU_001119 is missing ~100 AA at the beginning and end of SSRP1. SPU_023988 is missing ~250 AA at the beginning but goes all the way to the end of SSRP1.\n
SPU_023988	SPU_023988	none	SPU_001119 and SPU_023988 are overlapping incomplete predictions for SSRP1. SPU_001119 is missing ~100 AA at the beginning and end of SSRP1. SPU_023988 is missing ~250 AA at the beginning but goes all the way to the end of SSRP1.\n
SPU_022853	SPU_022853	none	SPU_022853 is a partial duplicate prediction for SPU_019373.\n
SPU_011902	SPU_011902	none	SPU_007972 is a partial duplicate prediction for SPU_011902.\n
SPU_007972	SPU_007972	none	SPU_007972 is a partial duplicate prediction for SPU_011902.\n
SPU_003845	SPU_003845	none	SPU_022665 encodes the first part of the RABEP1 gene and SPU_003845 appears to encode the latter half. There is overlap between the two predictions.\n
SPU_022665	SPU_022665	none	SPU_022665 encodes the first part of the RABEP1 gene and SPU_003845 appears to encode the latter half. There is overlap between the two predictions.\n
SPU_011194	SPU_011194	none	PREDICTED: similar to fertility related protein WMP1 [Strongylocentrotus purpuratus],spermatogenesis-associated protein 7 [Rattus norvegicus]\n
SPU_011195	SPU_011195	none	PREDICTED: hypothetical protein [Mus musculus] \n
SPU_011196	SPU_011196	none	#\nTryptophanyl-tRNA synthetase, cytoplasmic (Tryptophan--tRNA ligase) (TrpRS)\n
SPU_011198	SPU_011198	none	PREDICTED: similar to polymerase (DNA directed), alpha 2 (70kD subunit) [Strongylocentrotus purpuratus] \n
SPU_003899	SPU_003899	none	#\nhypothetical protein LOC423694 [Gallus gallus], \n
SPU_003902	SPU_003902	none	myosin V [Strongylocentrotus purpuratus]\n
SPU_003903	SPU_003903	none	PREDICTED: hypothetical protein\n
SPU_023288	SPU_023288	none	PREDICTED: hypothetical protein \n
SPU_023289	SPU_023289	none	PREDICTED: similar to Ribonuclease Oy (RNase Oy)\n
SPU_008157	SPU_008157	none	hypothetical protein DDBDRAFT_0206412 [Dictyostelium discoideum AX4]\n
SPU_008158	SPU_008158	none	PREDICTED: similar to peroxisomal biogenesis factor 6-like protein [Strongylocentrotus purpuratus]\n
SPU_014604	SPU_014604	none	PREDICTED: similar to smad nuclear interacting protein [Strongylocentrotus purpuratus] \n
SPU_015924	SPU_015924	none	#\nPREDICTED: similar to acid-sensing ion channel 1, partial [Strongylocentrotus purpuratus] \n
SPU_025080	SPU_025080	none	PREDICTED: similar to Leucine rich repeat containing 15 [Strongylocentrotus purpuratus] \n
SPU_025081	SPU_025081	none	#\nPREDICTED: similar to Transmembrane and coiled-coil domains 2 [Strongylocentrotus purpuratus] \n
SPU_023431	SPU_023431	none	PREDICTED: similar to selectin-like protein\n
SPU_007137	SPU_007137	none	PREDICTED: similar to LDL receptor adaptor protein (ARH) [Bos taurus] \n
SPU_007138	SPU_007138	none	#\nPREDICTED: hypothetical protein [Strongylocentrotus purpuratus] \n
SPU_007139	SPU_007139	none	PREDICTED: hypothetical protein \nPREDICTED: similar to heparan sulfate sulfotransferase [Strongylocentrotus purpuratus] \n
SPU_007140	SPU_007140	none	#\nPREDICTED: similar to WD repeat domain 66 [Strongylocentrotus purpuratus]\n
SPU_008948	SPU_008948	none	PREDICTED: similar to xanthine dehydrogenase [Strongylocentrotus purpuratus]\n
SPU_008949	SPU_008949	none	PREDICTED: similar to codanin 1 [Gallus gallus] \n
SPU_008951	SPU_008951	none	PREDICTED: KIAA0586 isoform 7 [Pan troglodytes]\n
SPU_008952	SPU_008952	none	hypothetical protein LOC324723 [Danio rerio]\n
SPU_008955	SPU_008955	none	PREDICTED: similar to small zinc finger-like protein [Strongylocentrotus purpuratus], also similar to G protein-coupled receptor 1 [Strongylocentrotus purpuratus] \n \n
SPU_008956	SPU_008956	none	#\nPREDICTED: similar to neurexin iv [Strongylocentrotus purpuratus]\n
SPU_008957	SPU_008957	none	PREDICTED: similar to ligand Delta-1, partial [Strongylocentrotus purpuratus] \n
SPU_008958	SPU_008958	none	PREDICTED: hypothetical protein [Strongylocentrotus purpuratus] \n
SPU_008960	SPU_008960	none	#\nPREDICTED: similar to CG4089-PA [Tribolium castaneum]\n
SPU_027146	SPU_027146	none	PREDICTED: hypothetical protein isoform 1 [Strongylocentrotus purpuratus]\n
SPU_023395	SPU_023395	none	PREDICTED: similar to putative urease accessory protein F [Strongylocentrotus purpuratus] \n
SPU_023396	SPU_023396	none	similar to fibrosurfin, partial [Strongylocentrotus purpuratus] \n
SPU_023399	SPU_023399	none	#\nPREDICTED: similar to hyalin [Strongylocentrotus purpuratus]\n
SPU_000996	SPU_000996	none	PREDICTED: hypothetical protein [Strongylocentrotus purpuratus], similar to tumor protein p53 inducible protein 11, isoform CRA_b [Homo sapiens] \n
SPU_002116	SPU_002116	none	PREDICTED: hypothetical protein [Strongylocentrotus purpuratus]. also similar to Rap2-binding protein 9 isoform 4 [Macaca mulatta]. \n
Sp-185/333-07	SPU_030265	none	#\nThis gene goes out of frame in element 2, and therefore may be either a pseudogene or a problem in the assembly.\n
SPU_000141	SPU_000141	none	lec, EGF-like (epidermal-growth-factor like domain), laminin-EGF, fibronectin 3, TM\n
SPU_000192	SPU_000192	none	lec\n
SPU_000271	SPU_000271	none	EGF(epidermal growth factor like domain), lec\n
SPU_000289	SPU_000289	none	Clec X 2\n
SPU_000290	SPU_000290	none	Clec X 3 (or 2.5)\n
SPU_000542	SPU_000542	none	lec, PAN\n
SPU_000543	SPU_000543	none	lec, EGF (epidermal growth factor like domain), EGF(epidermal growth factor like domain), TM, cyt; macrophage mannose binding lec?\n
SPU_000837	SPU_000837	none	SP, lec\n
SPU_001027	SPU_001027	none	low complexity, Clec X 2, low complexity, EGF (epidermal growth factor like domain)-like X 3, TM, low complexity\n
SPU_001200	SPU_001200	none	EGF, EGF, EGF, EGF, EGF, EGF, EGF, EGF, lec, EGF \n*EGF = Epidermal Growth Factor like domain\n
SPU_001527	SPU_001527	none	SP, lec, hyalin, lec, hyalin, hyalin, apple domain (plasminogen)\n
SPU_001878	SPU_001878	none	gal_lec, transmembrane\n
SPU_001887	SPU_001887	none	EGF (Epidermal growth factor like domain), lec\n
SPU_001898	SPU_001898	none	lec apple domain\n
SPU_002005	SPU_002005	none	Gal_lec\n
SPU_002383	SPU_002383	none	SP, lec\n
SPU_002420	SPU_002420	none	Clec X 2, low complexity, LY(Low-density lipoprotein-receptor YWTD domain) X 2\n
SPU_002720	SPU_002720	none	SP, gal_lec, coiled coil, TM, TM, TM, TM, TM, TM, TM, low complexity region\n
SPU_002861	SPU_002861	none	EGF(epidermal growth factor like domain), hyalin, hyalin, lec\n
SPU_003246	SPU_003246	none	SP, gal_lec, transmembrane, low complexity\n
SPU_003284	SPU_003284	none	Kringle, lec, EGF(epidermal growth factor like domain)-CA(cadherin repeats)\n
SPU_003610	SPU_003610	none	Sp, Clec X 3, TM\n
SPU_004014	SPU_004014	none	lec\n
SPU_004476	SPU_004476	none	hyalin, hyalin, lec\n
SPU_004818	SPU_004818	none	Clec X 2\n
SPU_004831	SPU_004831	none	low complexity, gal_lec\n
SPU_005013	SPU_005013	none	SP, CCP, lec, CCP, CCP, low complexity, TM \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_005028	SPU_005028	none	lec, LDLA, LDLA \n \n*LDLA = Low-density lipoprotein receptor domain class A\n
SPU_005248	SPU_005248	none	lec, CCP (Low-density lipoprotein receptor domain class A), TM, cyt\n
SPU_005596	SPU_005596	none	EGF (epidermal growth factor like domain), lec, lec\n
SPU_005706	SPU_005706	none	SP, EGF (Epidermal growth factor like domain), hyalin, lec\n
SPU_005725	SPU_005725	none	CUB, coag factor 5/8 C terminal domain, lec, LDLa, LDLa, LDLa, space,  LDLa \n*CUB = Domain first found in C1r, C1s, uEGF, and bone morphogenetic protein. \n*LDLA = Low-density lipoprotein receptor domain class A\n
SPU_005888	SPU_005888	none	Clec X 2\n
SPU_005909	SPU_005909	none	CCP, CCP, lec, CCP, CCP, EGF \n \n* CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_005910	SPU_005910	none	lec C-type, CCP, CCP, TM, cyt \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_006083	SPU_006083	none	lec, hyalin, hyalin, hyalin, hyalin, apple domain\n
SPU_006306	SPU_006306	none	gal_lec, gal_lec\n
SPU_006310	SPU_006310	none	EGF(epidermal growth factor like domain), lec\n
SPU_006620	SPU_006620	none	apple, lec\n
SPU_006648	SPU_006648	none	Clec, CCP, CCP, low complexity, TM \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_007118	SPU_007118	none	CCP, lec, CCP TM \n \n* CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_007565	SPU_007565	none	gal_lec\n
SPU_007566	SPU_007566	none	low complex, lec\n
SPU_008001	SPU_008001	none	lec\n
SPU_008065	SPU_008065	none	SP,  lec, EGF,EGF, protein tyr phosphatase or fibronectin3 \n \n*EGF = Epidermal Growth Factor like Domain\n
SPU_008976	SPU_008976	none	CCP, lec \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_009174	SPU_009174	none	SP, EGF (epidermal growth factor like domain), hyalin, hyalin,  lec\n
SPU_009222	SPU_009222	none	CCP, lec, lec, CCP, CCP, EGF, EGF \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR) \n \n*EGF = epidermal growth factor like domain\n
SPU_009615	SPU_009615	none	lec\n
SPU_009886	SPU_009886	none	CLECT X 4\n
SPU_010019	SPU_010019	none	SP, gal_lec, transmembrane\n
SPU_010099	SPU_010099	none	Sp, lec\n
SPU_010313	SPU_010313	none	gal_lec, gal_lec, gal_lec\n
SPU_010317	SPU_010317	none	Clec X 2, low complexity\n
SPU_010470	SPU_010470	none	Clec X 2, PAN_1\n
SPU_010615	SPU_010615	none	SP, lec, PAN?\n
SPU_010639	SPU_010639	none	PAN apple, lec, PAN apple, TM\n
SPU_011014	SPU_011014	none	F5F8 type C, C_lec, gal_lec, C_lec, FA58C(Coagulation factor 5/8 C-terminal domain, discoidin domain), Protein has FTP (eel-Fucolectin Tachylectin-4 Pentraxin-1 Domain) at relatively low probability \n
SPU_011167	SPU_011167	none	lec, CCP, CCP, EGF, EGF, TM \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR) \n*EGF = epidermal growth factor like domain\n
SPU_011176	SPU_011176	none	SCOP or Clectin, low complexity, GPS (G-protein-coupled receptor proteolytic site domain), 7TM-2\n
SPU_011192	SPU_011192	none	EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, lec \n \n*EGF(epidermal growth factor like domain)-Ca(Cadherin repeats)\n
SPU_011632	SPU_011632	none	lec\n
SPU_011829	SPU_011829	none	Sp, Clec X 3, PAN 1\n
SPU_011867	SPU_011867	none	EGF (epidermal growth factor like domain) X 2, FTP(eel-Fucolectin Tachylectin-4 Pentraxin-1 Domain)\n
SPU_011971	SPU_011971	none	SP, CUB, CUB, CUB, lec, LDLa, LDLa, LDLa, LDLa, LDLa, LDLa \n \n*CUB = Domain first found in C1r, C1s, uEGF, and bone morphogenetic protein. \n \n*LDLA = Low-density lipoprotein receptor domain class A\n
SPU_012121	SPU_012121	none	CCP, CCP, lec, CCP CCO, EGF-Ca, EGF-Ca, EGF-Ca, TM \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR) \n \n*EGF-Ca = Epidermal Growth factor like domain - Cadherin repeats. \n
SPU_012302	SPU_012302	none	#\nlow complexity, gal_lec\n
SPU_012477	SPU_012477	none	CLec X 5, PAN_AP (divergent subfamily of APPLE domains), CLec X 3\n
SPU_012478	SPU_012478	none	Clec, PAN_AP(divergent subfamily of APPLE domains), Clec, Clec, Clec\n
SPU_012479	SPU_012479	none	Sp, Clec X 4, TM\n
SPU_012585	SPU_012585	none	lec, CCP, CCP, TM \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_012742	SPU_012742	none	Clec X 2, CCP, EGF(epidermal growth factor like domain)-like X 2, TM\n
SPU_012869	SPU_012869	none	CUB, CUB, CUB, CUB, CUB or lec, LDLa, LDLa, LDLa, LDLa, TM, TM, TM, TM  TM, TM, TM  \n \n*CUB = Domain first found in C1r, C1s, uEGF, and bone morphogenetic protein. \n \n*LDLA = Low-density lipoprotein receptor domain class A\n
SPU_012951	SPU_012951	none	PAN, FA5/8C, FA5/8C, FA5/8C, lec, CUB, CUB, CUB, lec \n \n*FA58C = Coagulation factor 5/8 C-terminal domain, discoidin domain \n*CUB = Domain first found in C1r, C1s, uEGF, and bone morphogenetic protein.\n
SPU_013186	SPU_013186	none	SP,  CCP, CCP, leclec \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_013388	SPU_013388	none	lec\n
SPU_013825	SPU_013825	none	lec, low complex\n
SPU_013860	SPU_013860	none	Sp, lec, lec\n
SPU_013872	SPU_013872	none	Sp, lec,lec \n
SPU_014004	SPU_014004	none	SP, low complexity, gal_lec\n
SPU_014097	SPU_014097	none	SRCR , lec, EGF (Epidermal Growth Factor Like Domain) \n
SPU_014218	SPU_014218	none	#\ngal_lec, EGF_like, EGF_like, transmembrane \n \n*EGF = epidermal growth factor like domain\n
SPU_014219	SPU_014219	none	lec, low complex, TM, long cyt\n
SPU_014220	SPU_014220	none	Low complexity, TM \nAlternative: Clec below threshold match \n
SPU_014585	SPU_014585	none	(FTP(eel-Fucolectin Tachylectin-4 Pentraxin-1 Domain) domain) F5F8_type_C(Coagulation factor 5/8 C-terminal domain, discoidin domain), gal_lec, gal_lec, gal_lec,\n
SPU_014623	SPU_014623	none	lec, PAN\n
SPU_014672	SPU_014672	none	CUB, lec, low complex\n
SPU_015065	SPU_015065	none	Clec X 2, PAN_1\n
SPU_015487	SPU_015487	none	Sp, Lec\n
SPU_015961	SPU_015961	none	lec, pan\n
SPU_015962	SPU_015962	none	Clec X 2, PAN 1\n
SPU_016088	SPU_016088	none	DUF1339,lec,TM\n
SPU_016103	SPU_016103	none	SP,  CCP, CCP, lec \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_016529	SPU_016529	none	CCP, CCP, lec, low complexity region \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_016629	SPU_016629	none	Clec, CCP X 3, EGF (Epidermal growth factor like domain)-like, CCP, TM \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_016681	SPU_016681	none	CCP, lec, CCP, CCP, EGF(Epidermal Growth Factor like domain)/Ca X34, TM, short cyt \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR) \n \n
SPU_016772	SPU_016772	none	lec\n
SPU_017006	SPU_017006	none	CCP, CCP, CCP, lec, CCP, CCP, CCP, TM cyt \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR) \n \n
SPU_017007	SPU_017007	none	#\nSP,  CCP, CCP, CCP, lec, CCP, TM cyt \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_017109	SPU_017109	none	Gal_lec\n
SPU_017519	SPU_017519	none	CCP(Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)), lec\n
SPU_017520	SPU_017520	none	lec, CCP, CCP \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_017842	SPU_017842	none	lec overlaps with VWF type C\n
SPU_017887	SPU_017887	none	Sp, Clec, PAN_AP(divergent subfamily of APPLE domains), Clec X3, SCOP\n
SPU_018258	SPU_018258	none	lec, CCP, CCP, EGF(epidermal growth factor like domain) \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_018501	SPU_018501	none	low complexity, Clec X 2\n
SPU_018550	SPU_018550	none	SP, gal_lec, gal_lec, low complexity\n
SPU_018601	SPU_018601	none	lec, EGF(epidermal growth factor domain)\n
SPU_018682	SPU_018682	none	low complexity, gal_lec, gal_lec\n
SPU_019060	SPU_019060	none	lec, low complexity\n
SPU_019088	SPU_019088	none	SP, lec\n
SPU_019150	SPU_019150	none	lec, LDLa, LDLa \n \n*LDLA = Low-density lipoprotein receptor domain class A\n
SPU_019438	SPU_019438	none	gal_lec, gal_lec, PAN\n
SPU_019576	SPU_019576	none	SP, lec, EGF, EGF \n \n*EGF = epidermal growth factor like domain\n
SPU_019805	SPU_019805	none	Clec, Clec, low complexity\n
SPU_019986	SPU_019986	none	SP,  lec, EGF, EGF \n \n*EGF = Epidermal growth factor like domain\n
SPU_020377	SPU_020377	none	lec, PAN\n
SPU_020424	SPU_020424	none	 low complexity, lec, lec\n
SPU_020513	SPU_020513	none	CCP(Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)), lec\n
SPU_020760	SPU_020760	none	hyalin, hyalin, lec\n
SPU_021221	SPU_021221	none	low complexity, Clec X 2, low complexity\n
SPU_021505	SPU_021505	none	Clec X 2, low complexity regions\n
SPU_021675	SPU_021675	none	low complexity, gal_lec\n
SPU_021709	SPU_021709	none	SP, lec, space, PAN\n
SPU_022197	SPU_022197	none	SP, CUB, CUB, CUB, lec, TM \n \n*CUB = Domain first found in C1r, C1s, uEGF, and bone morphogenetic protein.\n
SPU_022470	SPU_022470	none	SP, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, Clec, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca, EGF-Ca \n \n* EGF = epidermal growth factor like domain \n*CA = Cadherin repeats.\n
SPU_022595	SPU_022595	none	low complexity, gal_lec\n
SPU_022718	SPU_022718	none	SP,F_raikovi_mat, gal_lec, EGF_like, transmembrane\n
SPU_022719	SPU_022719	none	SP, gal_lec\n
SPU_023114	SPU_023114	none	Clec X 2, several low complexity regions\n
SPU_023130	SPU_023130	none	lec, long low complexity\n
SPU_023175	SPU_023175	none	lec overlap with LDLa, lec overlap with 2 \n LDLa, LDLa \n \n*LDLA = Low-density lipoprotein receptor domain class A\n
SPU_023607	SPU_023607	none	EGF-Ca, lec, EGF-Ca, EGF-Ca, lec, EGF-Ca, lec, EGF-Ca, lec, EGF-Ca, lec, EGF-Ca, lec \n \n*EGF = epidermal growth factor like domain  \n*CA = Cadherin repeats.\n
SPU_023714	SPU_023714	none	Sp, Clec X 2\n
SPU_024218	SPU_024218	none	CCP X 2, CLec X 2, CCP \n \n* CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_024494	SPU_024494	none	lec, low complexity\n
SPU_024638	SPU_024638	none	Clec X 3\n
SPU_024786	SPU_024786	none	FA5/8C, FA5/8C, lec, EGF-Ca, lec C-type, CUB, CUB, CUB, TM \n \n*FA5/8C = Coagulation factor 5/8 C-terminal domain, discoidin domain \n \n*EGF-Ca = Epidermal growth factor like domain - Cadherin repeats. \n \n*CUB = Domain first found in C1r, C1s, uEGF, and bone morphogenetic protein.\n
SPU_025051	SPU_025051	none	Clec X 3, TM, low complexity\n
SPU_025074	SPU_025074	none	lec, lec\n
SPU_025097	SPU_025097	none	F5/F8 type C (Coagulation factor 5/8 C-terminal domain, discoidin domain), lec, lec C-type, CUB (Domain first found in C1r, C1s, uEGF, and bone morphogenetic protein.)\n
SPU_025194	SPU_025194	none	SP, CCP, CCP, CCP, TM, CCP, CCP, CCP, lec, EGF overlaps with histone deacetylase interacting domain \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR) \n*EGF = Epidermal Growth Factor like domain\n
SPU_025414	SPU_025414	none	SP, gal_lec, gal_lec, gal_lec, gal_lec\n
SPU_026103	SPU_026103	none	CCP, CCP, lec \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_026208	SPU_026208	none	lec, CCP, CCP \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_027084	SPU_027084	none	CUB (Domain first found in C1r, C1s, uEGF, and bone morphogenetic protein.) overlaps with lec, LDLa, LDLa, LDLa, LDLa, LDLa \n \n*LDLA = Low-density lipoprotein receptor domain class A\n
SPU_027332	SPU_027332	none	SP, lec, long space, hormone receptor domain with many Cys, TM, TM, TM, TM, TM, TM, TM, long cyt\n
SPU_028067	SPU_028067	none	SP,  CCP, lec, TM, cyt\n
SPU_028298	SPU_028298	none	SP,  CCP, CCP, lec \n \n*CCP = Domain abundant in complement control proteins; SUSHI repeat; short complement-like repeat (SCR)\n
SPU_028326	SPU_028326	none	CUB (Domain first found in C1r, C1s, uEGF, and bone morphogenetic protein.), lec, LDLa (Low-density lipoprotein receptor domain class A) \n
SPU_028539	SPU_028539	none	Sp, Clec, Kring X 3, Clec X 6, HDAC_interact(Histone deacetylase (HDAC) interacting), SCOP\n
SPU_028565	SPU_028565	none	Sp, lec\n
SPU_028712	SPU_028712	none	CLec, Fibrinogen, PAN_AP(divergent subfamily of APPLE domains), CLec\n
SPU_019008	SPU_019008	none	dystrophin-like protein [Strongylocentrotus purpuratus] \n,3908 aa \n
SPU_019010	SPU_019010	none	#\nhypothetical protein isoform 1 [Strongylocentrotus purpuratus]\n
SPU_010580	SPU_010580	none	similar to ubiquitin-conjugating enzyme E2D 3 [Strongylocentrotus purpuratus]\n
SPU_010581	SPU_010581	none	similar to cathepsin l [Strongylocentrotus purpuratus], 821 aa\n
SPU_010582	SPU_010582	none	similar to transcriptional intermediary factor 1 alpha [Strongylocentrotus purpuratus], 220 aa\n
SPU_009812	SPU_009812	none	2OG-Fe(II) oxygenase domain\n
SPU_016862	SPU_016862	none	frizzled domain, CCP X2, CLECT, CCP\n
SPU_023863	SPU_023863	none	SPU_023863 is a partial duplicate prediction for SPU_028180\n
SPU_025316	SPU_025316	none	Domains: low complexity, Clec X 2, low complexity, Clec X 2, low complexity\n
SPU_000239	SPU_000239	none	Many eukaryotic proteins that are known or supposed to bind single-stranded RNA contain one or more copies of a putative RNA-binding domain of about 90 amino acids. This is known as the eukaryotic putative RNA-binding region RNP-1 signature or RNA recognition motif (RRM). RRMs are found in a variety of RNA binding proteins, including heterogeneous nuclear ribonucleoproteins (hnRNPs), proteins implicated in regulation of alternative splicing, and protein components of small nuclear ribonucleoproteins (snRNPs). The motif also appears in a few single stranded DNA binding proteins. The RRM structure consists of four strands and two helices arranged in an alpha/beta sandwich, with a third helix present during RNA binding in some cases. Two individual models were built which identify subtypes of this domain, but there is no functional difference between the subtypes. \n
SPU_027751	SPU_027751	none	Among the different families of transporter only two occur ubiquitously in all classifications of organisms. These are the ATP-Binding Cassette (ABC) superfamily and the Major Facilitator Superfamily (MFS). The MFS transporters are single-polypeptide secondary carriers capable only of transporting small solutes in response to chemiosmotic ion gradients\n
SPU_023347	SPU_023347	none	Ras proteins are membrane-associated molecular switches that bind GTP and GDP and slowly hydrolyze GTP to GDP. The balance between the GTP bound (active) and GDP bound (inactive) states is regulated by the opposite action of proteins activating the GTPase activity and that of proteins which promote the loss of bound GDP and the uptake of fresh GTP. The latter proteins are known as guanine-nucleotide dissociation stimulators (GDSs) (or also as guanine-nucleotide releasing (or exchange) factors (GRFs)). Proteins that act as GDS can be classified into at least two families, on the basis of sequence similarities, the CDC24 family (see IPR001331) and the CDC25 family. \nThe size of the proteins of the CDC25 family ranges from 309 residues (LTE1) to 1596 residues (sos). The sequence similarity shared by all these proteins is limited to a region of about 250 amino acids generally located in their C-terminal section (currently the only exceptions are sos and ralGDS where this domain makes up the central part of the protein). This domain has been shown, in CDC25 and SCD25, to be essential for the activity of these proteins. \n
SPU_018998	SPU_018998	none	Hormone receptor (7 transmembrane receptor) \n
SPU_018997	SPU_018997	none	Hormone receptor (7 transmembrane protein)\n
SPU_010716	SPU_010716	none	ATP-binding cassette (ABC) transporters are multidomain membrane proteins, responsible for the controlled efflux and influx of substances (allocrites) across cellular membranes. They are minimally composed of four domains, with two transmembrane domains (TMDs) responsible for allocrite binding and transport and two nucleotide-binding domains (NBDs) responsible for coupling the energy of ATP hydrolysis to conformational changes in the TMDs. Both NBDs are capable of ATP hydrolysis, and inhibition of hydrolysis at one NBD effectively abrogates hydrolysis at the other. Hydrolysis at the two NBDs may occur in an alternative fashion although they appear substantially functionally symmetrical in terms of their binding to diverse nucleotides\n
SPU_010968	SPU_010968	none	Epidermal growth factors and transforming growth factors belong to a general class of proteins that share a repeat pattern involving a number of conserved Cys residues. Growth factors are involved in cell recognition and division. The repeating pattern, especially of cysteines (the so-called EGF repeat), is thought to be important to the 3D structure of the proteins, and hence its recognition by receptors and other molecules. The type 1 EGF signature includes six conserved cysteines believed to be involved in disulphide bond formation. The EGF motif is found frequently in nature, particularly in extracellular proteins.\n
SPU_005786	SPU_005786	none	Zinc finger domains are nucleic acid-binding protein structures first identified in the Xenopus laevis transcription factor TFIIIA. These domains have since been found in numerous nucleic acid-binding proteins. A zinc finger domain is composed of 25 to 30 amino-acid residues including 2 conserved Cys and 2 conserved His residues in a C-2-C-12-H-3-H type motif. The 12 residues separating the second Cys and the first His are mainly polar and basic, implicating this region in particular in nucleic acid binding. The zinc finger motif is an unusually small, self-folding domain in which Zn is a crucial component of its tertiary structure. All bind 1 atom of Zn in a tetrahedral array to yield a finger-like projection, which interacts with nucleotides in the major groove of the nucleic acid. The Zn binds to the conserved Cys and His residues. Fingers have been found to bind to about 5 base pairs of nucleic acid containing short runs of guanine residues. They have the ability to bind to both RNA and DNA, a versatility not demonstrated by the helix-turn-helix motif. The zinc finger may thus represent the original nucleic acid binding protein. It has also been suggested that a Zn-centred domain could be used in a protein interaction, e.g. in protein kinase C. Many classes of zinc fingers are characterized according to the number and positions of the histidine and cysteine residues involved in the zinc atom coordination. In the first class to be characterized, called C2H2, the first pair of zinc coordinating residues are cysteines, while the second pair are histidines. \nThis THAP domain is a putative DNA-binding domain with a C2CH architecture that probably binds a zinc ion. The domain is widespread in Drosophila species, Mus musculus, Homo sapiens and has been reported in Caenorhabditis elegans. \n
SPU_010967	SPU_010967	none	#\nThe CUB domain is an extracellular domain of approximately 110 residues which is found in functionally diverse, mostly developmentally regulated proteins and in peptidases belonging to MEROPS peptidase families M12A (astacin) and S1A (chymotrypsin). Almost all CUB domains contain four conserved cysteines which probably form two disulphide bridges (C1-C2, C3-C4). The structure of the CUB domain has been predicted to be a beta-barrel similar to that of immunoglobulins. Proteins that have been found to contain the CUB domain include mammalian complement subcomponents C1s/C1r, which form the calcium-dependent complex C1, the first component of the classical pathway of the complement system; hamster serine protease Casp, which degrades type I and IV collagen and fibronectin in the presence of calcium; mammalian complement-activating component of Ra-reactive factor (RARF), a protease that cleaves the C4 component of complement; vertebrate enteropeptidase (EC 3.4.21.9), a type II membrane protein of the intestinal brush border, which activates trypsinogen; vertebrate bone morphogenic protein 1 (BMP-1), a protein which induces cartilage and bone formation and expresses metalloendopeptidase activity; sea urchins blastula proteins BP10 and SpAN; Caenorhabditis elegans hypothetical proteins F42A10.8 and R151.5; neuropilin (A5 antigen), a calcium-independent cell adhesion molecule that functions during the formation of certain neuronal circuits; fibropellins I and III from sea urchin; mammalian hyaluronate-binding protein TSG-6 (or PS4), a serum and growth factor induced protein; mammalian spermadhesins; and Xenopus embryonic protein UVS.2, which is expressed during dorsoanterior development.\n
SPU_007497	SPU_007497	none	This gene has an allele in SPU_028871\n
SPU_019286	SPU_019286	none	Two non-overlapping peptides were pulled from an IP coupled MS/MS from unfertilized eggs.\n
SPU_010845	SPU_010845	none	Looks to be an allele of SPU_022112\n
SPU_023905	SPU_023905	none	C term of the prediction seems to be a piece of the Fringe protein\n
SPU_003256	SPU_003256	none	The GLEAN3 model shows little confidence and may thus be incorrect.  However, the sequence of the GLEAN model CDS is nearly identical to the probe used by Rast et al. (2002) Dev. Biol. and is thus likely the "Kakapo" discussed in that paper.  \n
SPU_014715	SPU_014715	none	May be a fragment of SPU_000296 or SPU_003985 (or another gene altogether), as this model is significantly shorter than either of these gelsolin-like genes.  However, this GLEAN model is nearly identical to the probe for Gelsolin used in Rast et al. (2002) Dev. Biol., which suggests that this gene is expressed in the sea urchin in an area around the blastopore at > 24 hours, along with Kakapo and apobec.
SPU_008512	SPU_008512	none	Alignment with the best BLAST sequence suggests that the model may lack N- and C-terminal sequences.\nThere are a large number of nearly identical sequences on many scaffolds that are not on the GLEAN3 list.  There are also multiple copies on the GLEAN3 list.
SPU_006056	SPU_006056	none	This GLEAN result contains the C-terminal region of the PLCg sequence.  The N-terminal sequence is held on scaffold 53431 and SPU_027462.
SPU_008936	SPU_008936	none	Matches SPU_027487.
SPU_022557	SPU_022557	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When reviewing the Excel data, it appears that neither scaffold has any gaps or repeats present and, if the two were combined, the entire sequence would have an orderly and continuous arrangement. There was EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appear to be strong.	none
SPU_027090	SPU_027090	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 3 different scaffolds. When comparing the Excel data with the BLAST results, it appears that if the 3 scaffolds were to be combined, the sequence would have an orderly and continuous arrangement. There was some EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong, with all of the values being greater than 10.	none
SPU_006176	SPU_006176	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When reviewing the Excel data, it appears that there are 2 overlaps between subject gb|DS002008| and gb|DS007834|, from 297-369 and from 219-301. Besides the overlaps between the 2 scaffolds, the sequence has an orderly arrangement without any gaps. There was no EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak (all the values were less than 6).	none
SPU_016907	SPU_016907	After reviewing the data and performing a BLAST search, it appears that the data is distributed onto 2 different scaffolds. When examining both scaffolds individually, it is evident that they both have an orderly arrangement without any gaps or repeats present. It appears that if the two scaffolds were to be combined, the overall sequence would have an orderly and continuous arrangement. There was no EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong, with several values that were greater than 10.	none
SPU_013699	SPU_013699	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 3 different scaffolds. When examining the excel data, it appears that there are several internal repeats present within one of the scaffolds (v2.1_scaffold34023) but the sequence has an orderly arrangement without any gaps present. There was Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with the majority of the values ranging <5	none
SPU_009528	SPU_009528	From the BLAST results and the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. It appears that if both scaffolds were to be combined, the entire sequence would have an orderly and continuous arrangement without any gaps or repeats present. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be  somewhat strong with most of the values being greater than 10.	none
SPU_018705	SPU_018705	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. When examining the excel data it is apparent that there is a gap within the second scaffold that spans from 1026-1149. However, there were no internal repeats present in either scaffold. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with the majority of the values ranging greater than 10.	none
SPU_019803	SPU_019803	From the BLAST results and the excel data, it was evident that the sequence is distributed onto two different scaffolds. When examining the excel data in comparison with the BLAST results, it was apparent that the sequence had an orderly arrangement without any internal repeats or large gaps present.  There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with the majority of the values being greater than 10.	none
SPU_001128	SPU_001128	After reviewing the data and performing a BLAST search, it appears that the data is distributed onto 2 different scaffolds. When examining the excel data and comparing it to the BLAST results, it is evident that if these 2 scaffolds were combined the sequence would have an orderly and continuous arrangement. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5.	none
SPU_002250	SPU_002250	After reviewing the excel data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. There appears to be one sequence overlap within >v2.1_scaffold84012; however, if this overlap is discarded and the two scaffolds were combined, the sequence would have an orderly, continuous arrangement. There was no Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be widely distributed, but most of the values are >5. 	none
SPU_005672	SPU_005672	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When comparing the BLAST results with the excel data, it is apparent that both scaffolds have an orderly continuous arrangement without any repeats or gaps present and if the two scaffolds were combined, the sequence would have good coverage. There was some Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appear to be somewhat weak, with most of the values being less than 5.	none
SPU_006308	SPU_006308	From the BLAST results and the excel data it is evident that this sequence is distributed onto 2 different scaffolds. After reviewing the excel data, it was determined that both scaffolds had an orderly arrangement without any gaps present. However, there is a sequence overlap within the first scaffold that occurs from 512-573 that does appear to be a part of the sequence. There was Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10. 	none
SPU_006772	SPU_006772	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. When reviewing the excel data, it was apparent that there were numerous repeats and gaps present within both scaffolds, resulting in poor overall coverage. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak	none
SPU_012804	SPU_012804	After reviewing the data and performing a BLAST search it appears that the sequence is distributed onto 3 different scaffolds. However, if the 2 scaffolds were combined there would be several sequence overlaps between v2.1_scaffold68484 and v2.1_scaffold100953 due to similar base pairing between the two. There was also an internal repeat within v2.1_scaffold100953, but this may have been a part of the sequence since there was only one repeat of the same base (563-609). There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the scores being above 10.	none
SPU_019206	SPU_019206	The BLAST results showed that only 213 bases were mapped out of a total of 375 indicating that the sequence did not map very well. Sea urchin GBrowse (assembly 2.1) was used to examine the scaffold, which revealed that the sequence had gaps and wasn't covered very well. This is an un-annotated gene so no additional comments were available on the Baylor page. Some EST information was available on the GBrowse V0.5 assembly as well. The transcriptome scores appeared to be a little weak with intensities averaging a little below 10. 	none
SPU_026226	SPU_026226	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. It appears that both scaffolds contain an orderly arrangement and if the two scaffolds were combined the sequence would have good overall coverage (excluding a small overlap from about 571-686.) There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5.	none
SPU_023325	SPU_023325	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto two different scaffolds. When examining the excel data, it is apparent that both scaffolds contain an orderly arrangement without any internal repeats or gaps present. If the two scaffolds were to be combined, the overall sequence would have good coverage. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10.	none
SPU_011962	SPU_011962	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 3 different scaffolds. If the 3 scaffolds were combined, the overall sequence would have an orderly and continuous arrangement without any gaps or repeats present. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appear to be somewhat strong with most of the values being <5.	none
SPU_021377	SPU_021377	After reviewing the data and performing the BLAST search it appears that there is no good GLEAN model fit for SPU_021377. There are numerous repeats and poor sequence coverage. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values above 5 and several that were above 10.	none
SPU_023589	SPU_023589	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When examining the excel data, it is apparent that there are several gaps within each scaffold. However, the entire sequence is covered between the two scaffolds. There was some Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10. 	none
SPU_020564	SPU_020564	After reviewing the data and performing a BLAST search, it was determined that this is the best match for SPU_020564. Before a BLAST search was done, it was initially thought that the best match was subject gb|DS006697| due to the orderly arrangement of the sequence and the coverage. However, the BLAST search indicated that the best match was subject AAGJ02027540 due to a high bit score and a low E value even though the coverage only spanned a total of 291/960. 	none
SPU_021315	SPU_021315	From the BLAST results and the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. There is a large gap that occurs between the two scaffolds that spans from 1032-1188. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values being less than 10 (excluding one outlier value at 72).	none
SPU_020954	SPU_020954	After reviewing the data from the excel file and performing a BLAST search it appears that there is no sufficient GLEAN model that fits. The sequence is distributed onto 3 different scaffolds. There are also numerous gaps within the sequence and poor sequence coverage. There are also several repeats apparent. There is no Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appear to be strong with values ranging from about 5-54. This is an un-annotated gene so no additional information was available from Baylor annotations (gene comments).	none
SPU_013123	SPU_013123	From the BLAST results as well as the excel data it is clear that this is the best fit for this particular GLEAN model. When examining the excel data, in conjunction with the BLAST results, it is apparent that there is a large gap present that spans from 1065-1126. The entire scaffold ends at 3183 but is truncated at 1126 resulting in a significant loss of sequence information. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat weak with most of the values being about 5 (excluding 2 outliers at 18 and 30). 	none
SPU_022770	SPU_022770	After reviewing the data and performing a BLAST search, it appears that this is the best match for this particular GLEAN model. When comparing the BLAST results with the excel data it is apparent that the sequence has an orderly arrangement without any gaps or internal repeats present. However the sequence is truncated at 756 on this scaffold. The other BLAST results displayed part of the rest of this sequence, but there were numerous gaps and repeats present. There was Est support available from GBrowse V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being <5	none
SPU_026633	SPU_026633	After reviewing the data it appears that the sequence is distributed onto 2 different scaffolds. There doesn't appear to be very good sequence coverage and there are several internal repeats and sequence overlaps on both scaffolds. There was Est information available on GBrowse assembly V0.5 and the transcriptome score intensities appear to be strong. This was an un-annotated gene so no additional information was available on the Baylor (comments). 	none
SPU_019686	SPU_019686	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. If these two scaffolds were combined, the sequence would have an orderly and continuous arrangement without any gaps or repeats present. There was Est information available and the transcriptome intensity scores appeared to be weak with most of the values being less than 5, excluding 2 outliers at about 45. 	none
SPU_020374	SPU_020374	For this particular GLEAN model it appears that this is the best match. There are several gaps within the sequence and poor coverage (591/1077). There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values being less than 5. 	none
SPU_024362	SPU_024362	After reviewing the data and performing a BLAST search it appears that the sequence is distributed onto 2 different scaffolds. The overall sequence coverage is poor and the first scaffold doesn't begin until about 318. There was Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appear to be somewhat strong with most of the values being greater than 5.	none
SPU_007471	SPU_007471	From the BLAST results as well as the excel data, it appears that this is the best result for this particular GLEAN model. When examining the excel data, it is apparent that there are several sequence overlaps present as well as internal repeats. There was no Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with most of the values being less than 10 (excluding one outlier at about 130).	none
SPU_006172	SPU_006172	For this particular GLEAN model there was no Cds information available from Baylor annotations or from SpBase. However, there was mRNA information available from SpBase. When examining the excel data it appears that this is a very short sequence (489 base pairs in length) and there is a small gap between 91-124. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be widely distributed, with several values that were high (110-120) and some that were low (<5.)	none
SPU_014885	SPU_014885	After reviewing the data and performing a BLAST search, it appears that there is no good fit for this particular GLEAN model due to numerous gaps within the sequence and poor coverage. There was Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with the majority of scores being greater than 10.	none
SPU_027889	SPU_027889	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. Both scaffolds have an orderly and continuous arrangement and if they were to be combined, the overall sequence would have good coverage. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be widely dispersed but overall strong, with most of the values being greater than 5. 	none
SPU_026573	SPU_026573	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto two different scaffolds. When reviewing the excel data, it was apparent that there were several internal repeats present as well as gaps within both scaffolds. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10.	none
SPU_017972	SPU_017972	For this particular GLEAN model, there was no Cds information available from either Baylor annotations or SpBase. There was, however, mRNA information available from SpBase. When examining the excel data there are several repeats apparent within one of the scaffolds that are a part of the sequence. Other than these repeats, if the 3 scaffolds were combined, the sequence would have an orderly arrangement. There was Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10. 	none
SPU_011617	SPU_011617	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto two different scaffolds. When examining the excel data, it was apparent that there were no large gaps present or internal repeats. Both scaffolds had an orderly arrangement and if the two were combined, the overall sequence would have good coverage. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10. 	none
SPU_015473	SPU_015473	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto two different scaffolds. When examining the excel data, it was apparent that there were no large gaps present or internal repeats. Both scaffolds had an orderly arrangement and if the two were combined, the overall sequence would have good coverage. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5. 	none
SPU_005519	SPU_005519	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. The first scaffold (v2.1_scaffold56386) doesn't begin until the 4th base pair and ends at 152, where the rest of the sequence is continued on the second scaffold (v2.1_scaffold57984). If the 2 scaffolds were combined the sequence would have an orderly and continuous arrangement. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5. 	none
SPU_013337	SPU_013337	From the BLAST results and the excel data, it appears that the sequence is distributed onto two different scaffolds. There is a sequence overlap between the two scaffolds that occurs between 373-505 and the second scaffold appears to be cut short at the end of the sequence. 	none
SPU_028205	SPU_028205	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When examining the excel data, it appears that if the 2 scaffolds were to be combined the sequence would have an orderly and continuous arrangement without any gaps or repeats present. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong as well. 	none
SPU_005172	SPU_005172	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When examining the excel data, it appears that there are numerous repeats and gaps present within the sequence resulting in poor sequence coverage overall. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values being less than 10.	none
SPU_025679	SPU_025679	From the BLAST results and the excel data, it is evident that the sequence is distributed onto 2 different scaffolds. There is a large gap present that separates the two scaffolds that spans from 314-454. Other than this missing sequence information that separates the two scaffolds, the BLAST results indicate that both scaffolds contain an orderly arrangement without any internal repeats present.  There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat weak with most of the values being less than 8.	none
SPU_008587	SPU_008587	From the BLAST results and the excel data, it is evident that this sequence is distributed onto 2 different scaffolds for this particular GLEAN model. When reviewing the excel data it appears that both scaffolds have an orderly arrangement. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being about 5 and greater.	none
SPU_011498	SPU_011498	After reviewing the data and performing a BLAST search, it is apparent that there is no sufficient fit for this GLEAN model. The sequence is dispersed onto 3 different scaffolds, however, there are numerous repeats and gaps present within each scaffold and poor coverage overall. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak (less than 5.)	none
SPU_023019	SPU_023019	From the BLAST results and the excel data, it is evident that the sequence is distributed onto two different scaffolds. The first scaffold contains a small gap that occurs from 92-147. However, there are no internal repeats or other gaps present within either of the two scaffolds. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10.	none
SPU_001244	SPU_001244	After reviewing the data and performing a BLAST search, it appears that there is no good fit for this GLEAN model. The best fit was v2.1_scaffold35442 which has a low bit score and a high e-value. When reviewing the excel data it appears that there are numerous repeats and gaps present within the sequence. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appear to be strong with all of the values being greater than 10.	none
SPU_012337	SPU_012337	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 3 different scaffolds. When examining the excel data, it is evident that there are several gaps within the first scaffold as well as internal repeats. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with the majority of the values ranging <5	none
SPU_005061	SPU_005061	After reviewing the data and performing a BLAST search it appears that the sequence is distributed onto 2 different scaffolds. The BLAST search revealed an additional set of information that was not available on the excel spreadsheet (query 287-312). This additional query overlaps with some of the sequence provided from the excel data. It was also initially thought that the best results would have been from subject gb|DS002183| instead of subject gb|DS003004| as the BLAST search indicated. Subject gb|DS003004| sequence only covered 490/1182 base pairs while subject gb|DS002183| sequence covered 704/1182. It appears that the BLAST search indicated that subject gb|DS003004| was the best match based on high bit score and low e-value. There was also one internal repeat, but it was only repeated twice so the repeat may have been a part of the sequence. There was Est information available from GBrowse assembly V0.5 and the transcriptome information appeared to be somewhat strong (values <5). This is an un-annotated gene so no additional comments were available from Baylor annotation (gene information).	none
SPU_003070	SPU_003070	After reviewing the data and performing a BLAST search it appears that there is no good GLEAN model fit for SPU_003070. The sequence is distributed on to 3 different scaffolds. There are several gaps within v2.1_scaffold9764 and an overlap in the sequence if scaffolds >v2.1_scaffold30607 and v2.1_scaffold9764 were combined. If the three scaffolds were combined, there would still be several gaps and repeats within the sequence. There was Est information available from GBrowse V0.5 and the transcriptome intensity scores appeared to be strong. This is an un-annotated gene so no additional gene information was available from Baylor annotations (comments).	none
SPU_008258	SPU_008258	From the BLAST results and the excel data, it is evident that the sequence is distributed onto 2 scaffolds. The first scaffold doesn't begin until the 31st base pair. There also is a large gap between the two scaffolds that spans from 357-627. There was Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10.	none
SPU_001333	SPU_001333	From the excel data and the BLAST results it is evident that the sequence is distributed onto two different scaffolds. There is a gap present between the two scaffolds that occurs between 288-314. Other than this missing part of the scaffold, the overall sequence has an orderly arrangement without any repeats present. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with several values that were greater than 5. 	none
SPU_014839	SPU_014839	For this particular GLEAN model there wasn't any Cds information available from either Baylor annotations or SpBase. However, there was mRNA information available from SpBase. There was no Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores were very low with values lower than 5.	none
SPU_004718	SPU_004718	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. The first portion of the sequence (v2.1_scaffold46046) has an orderly and continuous arrangement without any gaps or repeats present until 823 is reached. From there, the rest of the sequence is continued on v2.1_scaffold26351 however, this scaffold contains several internal repeats present but the sequence remains orderly. There was no Est. support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5.	none
SPU_019253	SPU_019253	From the BLAST results and the excel data it appears that the sequence is distributed onto 2 different scaffolds. There is a small sequence overlap between the 2 scaffolds that occurs from 2485-2610. When reviewing the excel data, it is apparent that there are no internal repeats or gaps present within the two scaffolds. There was Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with all of the values being less than 6.	none
SPU_019224	SPU_019224	After reviewing the data and performing a BLAST search it appears that SPU_019224 is on two different scaffolds. The sequence would be continuous and orderly if the two scaffolds were combined. The BLAST search revealed its adjacency and there is an internal overlap in the second scaffold. There was also no EST support found on GBrowse V0.5, and the transcriptome intensity was low as well.\nAdditional information from blastp:\ngb|AAS01046.1|  Src family kinase [Asterina miniata]     380      5e-104	none
SPU_024311	SPU_024311	After reviewing the data and performing a BLAST search it appears there is no sufficient GLEAN model fit for SPU_024311. The sequence appears to be distributed onto 2 different scaffolds. On the first scaffold the sequence doesn't start until about ... However, there appears to be an internal overlap between two of the sequences within the query. If one of the overlaps is discarded (805-902), there would be an orderly arrangement of the sequence. There was also a long string of repeats that were on different scaffolds that were discarded. \nAdditional information found:\n                                                          Score     E\nSequences producing significant alignments:               (Bits)  Value\nref|XP_001189237.1|PREDICTED: similar to DEP domain containi.645    0.0 	none
SPU_009283	SPU_009283	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When examining the excel data and comparing it to the BLAST results it is evident that if the 2 scaffolds were combined, the sequence would have an orderly continuous arrangement without any gaps or repeats present. There was some Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being <5.	none
SPU_006276	SPU_006276	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto 3 different scaffolds. After reviewing the excel data in comparison with the BLAST results it is clear that there are numerous gaps and internal repeats present within the 3 different scaffolds, resulting in poor overall coverage. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak with most of the values ranging >6. 	none
SPU_007861	SPU_007861	After reviewing the data and performing a BLAST search, it appears that this is the best match for this particular GLEAN model. When reviewing the excel data, it appears that the sequence is distributed onto 2 different scaffolds, between subject gb|DS010076 and gb|DS001479|. The first part of the sequence doesn't begin until the 37th base pair and continues until the end of the scaffold, 592. The rest of the sequence begins on 635 resulting in a small gap between the two scaffolds. There was no Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10. 	none
SPU_012848	SPU_012848	After reviewing the excel data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. When comparing the BLAST results to the excel data, it is evident that if these 2 scaffolds were combined the sequence would have an orderly continuous arrangement without any repeats or gaps present. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat weak with most of the values being <5, but there were several values that were <5 as well	none
SPU_016150	SPU_016150	From the BLAST results and the excel data, it appears that the sequence is distributed onto 2 different scaffolds. However, there is a large gap between the two scaffolds from 995-1095. There are also several internal repeats within v2.1_scaffold84508. There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores were somewhat strong with most of the values being greater than 5.	none
SPU_000964	SPU_000964	From the excel data and the BLAST results, it is evident that the sequence is distributed onto 2 different scaffolds. When reviewing the excel data, it appears that the two scaffolds have an orderly arrangement without any gaps or repeats present. If the two scaffolds were to be combined, the entire sequence would have an orderly and continuous arrangement. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be clustered in two groups on opposite sides of the graph. Both clusters appeared to have somewhat strong intensities with most of the values being greater than 5. 	none
SPU_024035	SPU_024035	After reviewing the data and performing a BLAST search, it appears that there is no good fit for this particular GLEAN model. There are numerous gaps and repeats within the sequence resulting in poor coverage. There was some Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appear to be strong with most of the values being greater than 10.	none
SPU_009508	SPU_009508	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 3 different scaffolds. There are several gaps within each scaffold and several internal repeats as well. Scaffold >v2.1_scaffold73429 was unique in that the sequence covered the first 100 base pairs. There was Est information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong as well with values ranging from 2-22. This is an un-annotated gene so no additional information was available from Baylor annotations (gene informational comments).	none
SPU_018777	SPU_018777	From the BLAST results as well as the excel data, it is evident that the sequence is distributed onto two different scaffolds. When examining the excel data in comparison with the BLAST results it was determined that both scaffolds contain an orderly arrangement, without any gaps or repeats present. If the two scaffolds were to be combined, the overall sequence would have good coverage.  There was no Est support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak overall, with most of the values being less than 5. 	none
SPU_002961	SPU_002961	From the BLAST results as well as the Excel data, it is evident that the sequence is distributed onto 4 different scaffolds. When examining the Excel data, it is apparent that between the 4 different scaffolds there are numerous gaps and repeats present. The overall sequence ends at 1869 but the fourth scaffold (v2.1_scaffold56607) is truncated at 1471. There was EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong.	none
SPU_027308	SPU_027308	When a BLAST search was done, this was the only scaffold that was found. The sequence coverage isn't very good since only 383/849 base pairs match. 	none
SPU_024299	SPU_024299	For this particular GLEAN model there was no CDS information or gene features available on either the SpBase search engine or the Baylor annotations page. Only the mRNA sequence was available on the Baylor page. When the Excel data was analyzed, subject gb|DS006122| appeared to have an orderly arrangement and no internal repeats.  	none
SPU_017515	SPU_017515	For this particular GLEAN model, there was no CDS information available from either SpBase or Baylor annotations. There was, however, mRNA information provided by SpBase. When examining the Excel data, it is believed that this sequence may be distributed onto two different scaffolds. The first portion of the sequence appears to be distributed onto AAGJ02126498 and the remaining portion of the sequence onto gb|DS009869|. Both scaffolds contain an orderly arrangement without any internal repeats present. There was EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10.	none
SPU_026961	SPU_026961	After reviewing the data and performing a BLAST search, it appears that the data is distributed onto 2 different scaffolds. The BLAST results did not display these 2 scaffolds as the best fit, probably due to a slightly higher bit score and lowered e-value when compared to the other scaffold results. When examining the Excel results, it is evident that if the 2 scaffolds were combined, the sequence would have an orderly continuous arrangement. There was EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak overall.	none
SPU_027616	SPU_027616	After reviewing the data from the Excel file and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. The sequence doesn't begin until about 25, but from there if the two scaffolds were combined the sequence would have an orderly continuous arrangement, without any repeats or gaps present. There was EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10.	none
SPU_028120	SPU_028120	From the BLAST results and the Excel data, it is evident that the sequence is distributed onto 2 different scaffolds. When examining the Excel data, it was apparent that both scaffolds contained an orderly arrangement without any gaps or repeats present. There was no EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with the majority of values being <10.	none
SPU_008121	SPU_008121	After reviewing the data for model_SPU_008121 it appears that this is the best fit for this particular GLEAN model based on bit score and e-value. The sequence is distributed onto 2 different scaffolds but there is a wide gap from 540-670. The arrangement of the sequence is orderly; however, there are several internal repeats within the sequence. There is EST information available on GBrowse assembly V0.5 and the transcriptome intensity information appears to be widely distributed and somewhat strong.	none
SPU_008728	SPU_008728	After reviewing the data and performing a BLAST search it appears that the data is distributed onto 2 different scaffolds. If the two scaffolds were combined, it appears that the sequence would have a continuous, orderly arrangement without any repeats or gaps present. There was some EST information available and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5.	none
SPU_009558	SPU_009558	After reviewing the data and performing a BLAST search, it appears that there is no good GLEAN model fit for model_SPU_009558. There are several internal repeats within the scaffolds, several gaps within the sequence and poor sequence coverage overall. >v2.1_scaffold57986 was the best result indicated by the BLAST search based on a low e-value and high bit score. There was no EST information available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values being greater than 5.	none
SPU_015126	SPU_015126	After reviewing the data it appears that there is no sufficient GLEAN model that fits SPU_015126 due to the poor sequence coverage, gaps present and internal repeats within the sequence. The sequence appears to be distributed across 4 different scaffolds. There was EST information available from GBrowse assembly V0.5 and the transcriptome intensity scores appear to be clumped together and strong. This is an un-annotated gene so no additional comments were available from Baylor annotations (comments).	none
SPU_023339	SPU_023339	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. If the two scaffolds were combined, the sequence would have an orderly, continuous arrangement without any gaps or repeats present. This is an un-annotated gene so no additional gene information (comments) was available from the Baylor webpage. There was EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be weak overall with most of the values being <8.	none
SPU_009523	SPU_009523	For this particular GLEAN model it appears that this is the best fit based on the BLAST results in comparison with the Excel data. From the Excel data it was evident that there were no large gaps present and the sequence had overall good coverage. There was some EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being <10.	none
SPU_001585	SPU_001585	The BLAST results along with the Excel data indicate that this sequence is distributed onto 2 different scaffolds. Both scaffolds contain an orderly arrangement without any gaps or repeats present. If the two scaffolds were to be combined, the entire sequence would have a continuous arrangement with good overall coverage. There was some EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being greater than 10.	none
SPU_020693	SPU_020693	For this particular GLEAN model there was no CDS information available or gene features in either the SpBase search engine or the Baylor annotations page. There was also no mRNA information available and the gene was not annotated.  After examining the excel data, it appears that the sequence did not have an orderly arrangement due to several gaps present. Several sequences between different subjects were the same as well and if the data could have been mapped onto V2.1 it would have been difficult to distinguish between the different scaffolds. 	none
SPU_005743	SPU_005743	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. Within the first scaffold, there is one internal repeat present that is evident from both the BLAST results and the Excel data. Other than this repeat, if the two scaffolds were combined the sequence would have an orderly and continuous arrangement. There was EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with many of the values being greater than 10.	none
SPU_027730	SPU_027730	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. There is an overlap between the 2 scaffolds from 536-745. Besides this sequence overlap, it appears that if the 2 scaffolds were combined the sequence would have an orderly and continuous arrangement. There was EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be strong with most of the values being <10.	none
SPU_027766	SPU_027766	After reviewing the data and performing a BLAST search, it appears that the sequence is distributed onto 2 different scaffolds. Based on the BLAST results and the Excel data, it appears that if these 2 scaffolds were combined the sequence would have a continuous and orderly arrangement. There was EST support from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat weak with most of the scores being below 8. This is an un-annotated gene so no additional gene information (comments) was available from Baylor annotations.	none
SPU_015701	SPU_015701	After reviewing the data and performing a BLAST search, it appears that this is the best match for this GLEAN model. There is poor sequence coverage as well as several repeats present. There was EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be widely dispersed, with most of the values being less than 10, excluding 2 outliers at 34 and 26. This is an un-annotated gene so no additional gene information was available.	none
SPU_003369	SPU_003369	share strong homologies with mammalian counterparts	none
SPU_025042	SPU_025042	numerous paralogs and orthologs	none
SPU_025041	SPU_025041	numerous paralogs and orthologs	none
SPU_013932	SPU_013932	Very similar to Strongylocentrotus franciscanus epidermal growth factor II precursor	none
SPU_022099	SPU_022099	ArgRS core-containing	none
SPU_000014	SPU_000014	RAD6 homolog in Pan troglodytes	none
SPU_000021	SPU_000021	contains both DnaJ heat shock N-terminus motif and BSCB_C superfamily domain	none
SPU_000029	SPU_000029	contains RT_nLTR like motif	none
SPU_000032	SPU_000032	contains Rieske ISP(Iron-Sulfur, Fe-S) protein domain	none
SPU_000034	SPU_000034	contains COG3264 motif	none
SPU_000037	SPU_000037	PH-like domain near the C-terminus	none
SPU_000038	SPU_000038	Major Facilitator superfamily	none
SPU_000042	SPU_000042	also contains COG3055 conserved motif.	none
SPU_000050	SPU_000050	ankyrin repeat protein	none
SPU_000059	SPU_000059	DDX55 DEAD box polypeptide 55	none
SPU_000088	SPU_000088	homozygously deleted in human neuroblastoma	none
SPU_000110	SPU_000110	contains PRK08262 domain, hypothetical protein domain	none
SPU_000123	SPU_000123	contains SH3BP5 domain	none
SPU_000126	SPU_000126	contains IG CAM domain	none
SPU_000135	SPU_000135	3 ANK domains	none
SPU_000136	SPU_000136	contains Hf1C domain, also similar to stomatin isoforms 2,3 and band 7.2b stomatin and other forms of stomatin	none
SPU_000137	SPU_000137	also similar to a wide variety of lysozymes	none
SPU_000138	SPU_000138	RING superfamily motif towards C-terminus	none
SPU_000144	SPU_000144	contains NOP5 and MOSIC domains and NOP5NT towards N-terminus	none
SPU_000145	SPU_000145	contains Pyr_r and Ndh domains towards N-terminus	none
SPU_000157	SPU_000157	homologous to Drosophila dumpy gene CG33196-PB	none
SPU_000168	SPU_000168	contains LRR_RI superfamily motif in the N-terminal half, and Ran_GAP1_C superfamily motif in the C-terminal half	none
SPU_000169	SPU_000169	contains TLD superfamily motif near the C-terminus	none
SPU_000194	SPU_000194	contains RVP superfamily motif and RT_LTR domain motif	none
SPU_000195	SPU_000195	contains 2 C2 superfamily motifs, one in N-terminal half, the other towards C-terminus	none
SPU_000197	SPU_000197	also contains SpoVK domain	none
SPU_000207	SPU_000207	also similar to peroxisome membrane protein 3 (PXMP3), and contains 2 Pex2_Pex2 motifs	none
SPU_000212	SPU_000212	contains ATP-binding site and helicase motif near the N-terminus	none
SPU_000220	SPU_000220	homologous to various subtypes of annexin	none
SPU_000221	SPU_000221	homologous to various isoforms	none
SPU_000233	SPU_000233	also homologous to endonuclease/reverse transcriptase and many hypothetical proteins	none
SPU_000243	SPU_000243	contains a WD40 motif at the C-terminus, and a COG2319 domain towards C-terminus	none
SPU_000251	SPU_000251	contains DedA domain	none
SPU_000252	SPU_000252	similar to peroxisome proliferator-activated receptor binding protein (PBP) (PPAR binding protein) (thyroid hormone receptor associated protein complex 220 kDa component) (TRAP220) (thyroid receptor interacting protein 2) (TRIP2) (p52 regulatory protein ...)	none
SPU_000254	SPU_000254	contains a RING superfamily motif at the N-terminus, and a PRK02224 domain	none
SPU_000260	SPU_000260	homologous to numerous hypothetical proteins in a variety of species	none
SPU_000262	SPU_000262	contains Mab-21 superfamily motif at the N-terminus. also homologous to several hypothetical proteins.	none
SPU_000263	SPU_000263	also homologous to several hypothetical proteins in many species	none
SPU_000273	SPU_000273	also homologous to many different subtypes of annexin	none
SPU_000284	SPU_000284	contains Ins145_P3_rec domain. also homologous to various subtypes of ryanodine receptor (1, 2, 3) and isoforms.	none
SPU_000291	SPU_000291	also called PDGFA-associated protein 2, Fus-1 protein, or Fusion 1 protein.	none
SPU_000293	SPU_000293	contains TPR superfamily motif in N-terminal half, and Asp_Arg_Hydroxylase superfamily motif in C-terminal half.	none
SPU_000295	SPU_000295	contains two SH3 motifs and one WD40 motif in between. also orthologous to numerous hypothetical proteins.	none
SPU_000304	SPU_000304	orthologous to numerous hypothetical proteins	none
SPU_000311	SPU_000311	contains 3 WD40 motifs in the N-terminal half	none
SPU_000312	SPU_000312	contains F-box motif near the N-terminus	none
SPU_000316	SPU_000316	contains a PH-like superfamily motif at the N-terminus. also orthologous to many hypothetical proteins.	none
SPU_000319	SPU_000319	contains p23_NUDCD2-like motif	none
SPU_000321	SPU_000321	contains 2 UMP1 superfamily motifs	none
SPU_000313	SPU_000313	Mod_r superfamily motif at C-terminus	none
SPU_003362	SPU_003362	a large number of homologous gene models from Strongylocentrotus purpuratus indicating species-specific gene family	none
SPU_025044	SPU_025044	typical member of rhodopsin superfamily	none
SPU_000234	SPU_000234	also homologous to ankyrin-2 and ankyrin-3. contains 5 ANK motifs and 2 Arp domains.	none
SPU_013937	SPU_013937	large group of homologs in Strongylocentrotus purpuratus	none
SPU_000175	SPU_000175	contains two FA58C superfamily motifs: one near the N-terminus and the other in the C-terminal half	none
SPU_000177	SPU_000177	contains a Subtilisin_N superfamily motif near the N-terminus, and two PA superfamily motifs	none
SPU_000269	SPU_000269	also homologous to plexin A1, B1, A4, and other subtypes and isoforms	none
SPU_017137	SPU_017137	homologs are found in fungi, fish, and mouse, but not in plants	none
SPU_003367	SPU_003367	reversion-inducing-cysteine-rich protein in Gallus gallus	none
SPU_025047	SPU_025047	telomere-associated protein	none
SPU_026532	SPU_026532	TBC domain containing putative GTPase activating protein	none
SPU_012048	SPU_012048	superfamily of hypothetical proteins with unknown functions	none
SPU_012044	SPU_012044	most orthologous proteins are classified as hypothetical gene products	none
SPU_012047	SPU_012047	superfamily of proteins with known function	none
SPU_000180	SPU_000180	HEC1 domain containing; most homologs/orthologs are classified as hypothetical	none
SPU_000082	SPU_000082	contains Herpes_, OTUs, P-loop, and Exo_c domains	none
SPU_000324	SPU_000324	contains Asn_Synthase_B_C motif	none
SPU_000326	SPU_000326	contains 3 regions with WD40 motifs	none
SPU_000417	SPU_000417	contains PRK13388 domain. also orthologous to numerous hypothetical proteins.	none
SPU_000430	SPU_000430	also orthologous to numerous hypothetical proteins	none
SPU_000446	SPU_000446	also orthologous to numerous hypothetical proteins	none
SPU_000450	SPU_000450	also orthologous to numerous hypothetical proteins	none
SPU_000478	SPU_000478	also orthologous to numerous putative proteins	none
SPU_000510	SPU_000510	contains Nop53 domain in the N-terminal half, and small zf-C4 motif in the middle	none
SPU_000545	SPU_000545	contains BTB and BACK superfamily motifs at N-terminus. also orthologous to putative and miscellaneous proteins.	none
SPU_000548	SPU_000548	also orthologous to numerous putative and miscellaneous proteins	none
SPU_000560	SPU_000560	hypothetical protein. similar to SPU_000625.	none
SPU_000602	SPU_000602	also orthologous to numerous putative and miscellaneous proteins	none
SPU_000648	SPU_000648	nucleolar preribosomal-associated protein 1	none
SPU_000650	SPU_000650	contains FHA superfamily motif near C-terminus	none
SPU_000760	SPU_000760	also orthologous to exonuclease/endonuclease/phosphatase family proteins and other miscellaneous putative proteins	none
SPU_000790	SPU_000790	contains RPA2_OBF_family motif at the N-terminus, tRNA-synthetase 2 motif, and AsxRS_core domain	none
SPU_000821	SPU_000821	contains 2 AdoHCyase superfamily motifs, and a ProC domain	none
SPU_000847	SPU_000847	contains MFS_1 domain in addition to Sugar_tr superfamily motif	none
SPU_000886	SPU_000886	also orthologous to mitochondrial carrier homolog 1	none
SPU_000900	SPU_000900	contains Occludin_ELL motif at C-terminal region	none
SPU_000912	SPU_000912	contains prefoldin superfamily motif at N-terminus	none
SPU_000952	SPU_000952	contains CCL_1 domain at the C-terminus	none
SPU_001165	SPU_001165	contains 2 Arrestin_N superfamily motifs. also orthologous to numerous hypothetical proteins and other gene products.	none
SPU_001185	SPU_001185	also orthologous to numerous hypothetical proteins and other gene products.	none
SPU_001192	SPU_001192	contains S_TKc domain	none
SPU_001204	SPU_001204	also orthologous to numerous hypothetical proteins	none
SPU_001235	SPU_001235	contains a DUF1162 motif at N-terminus, and 3 MRS6 domains	none
SPU_001281	SPU_001281	contains ZnF_TTF motif towards N-terminus, and hATC motif towards C-terminus. also orthologous to putative hAT family dimerization domain proteins, and transposases.	none
SPU_001303	SPU_001303	also orthologous to hexuronate transport proteins in MFS superfamily	none
SPU_001377	SPU_001377	contains 2 Autophagy_C motifs	none
SPU_001404	SPU_001404	contains 2 AdoMet_MTase superfamily motifs	none
SPU_001441	SPU_001441	also orthologous to pyridoxin phosphatase, phosphoglycolate phosphatase, and several other unnamed hypothetical proteins	none
SPU_001450	SPU_001450	also orthologous to goliath E3 ubiquitin ligase, ring finger 130, and numerous unnamed hypothetical proteins	none
SPU_001496	SPU_001496	contains DnaJ superfamily motif near N-terminus, and SANT superfamily motif near C-terminus	none
SPU_001542	SPU_001542	also orthologous to numerous hypothetical proteins	none
SPU_001595	SPU_001595	contains SUL1 domain. also orthologous to member 6 (SLC26A6) or member 4 (SLC26A4) depending on species.	none
SPU_001657	SPU_001657	contains bZIP_1 superfamily motif at C-terminus	none
SPU_001681	SPU_001681	contains 2 DUF926 motifs in the C-terminal half	none
SPU_001696	SPU_001696	contains FYVE motif at C-terminus	none
SPU_001718	SPU_001718	contains BCNT motif towards the C-terminus	none
SPU_001722	SPU_001722	contains Prefoldin superfamily motif towards N-terminus	none
SPU_001771	SPU_001771	contains Sun domain	none
SPU_001786	SPU_001786	contains 2 Nup133_N superfamily motifs	none
SPU_001799	SPU_001799	contains 3 C2 superfamily motifs and a FerI superfamily motif	none
SPU_001935	SPU_001935	contains vinculin domain	none
SPU_002002	SPU_002002	also similar to C18orf55 protein	none
SPU_002013	SPU_002013	similar to NADPH-dependent FMN reductase	none
SPU_002054	SPU_002054	contains 2 PRK0415 (replication factor C large subunit) domains	none
SPU_002069	SPU_002069	also similar to sarcoglycan delta in some species	none
SPU_002074	SPU_002074	also similar to mandelate racemase	none
SPU_002122	SPU_002122	contains SH3 superfamily motif at C-terminus	none
SPU_002132	SPU_002132	contains RRM superfamily motif at N-terminus	none
SPU_002150	SPU_002150	contains PRP38 superfamily motif at N-terminus	none
SPU_002189	SPU_002189	also similar to IP4/PIP3 binding-protein-like protein	none
SPU_002260	SPU_002260	also similar to membrane associated protein gex-3	none
SPU_002284	SPU_002284	also orthologous to 3-hydroxybutyrate dehydrogenase, 2-hydroxy-3-oxopropionate reductase, and several hypothetical proteins	none
SPU_002294	SPU_002294	contains PRK09198 domain. also orthologous to pre-B-cell colony enhancing factor 1	none
SPU_002302	SPU_002302	contains tRNA_Me_trans domain	none
SPU_002339	SPU_002339	similar to bacterial acetyltransferases	none
SPU_002374	SPU_002374	contains 2 FERM_M motifs and B41 domain	none
SPU_002455	SPU_002455	contains 4 repeats of SRCR superfamily domain	none
SPU_002514	SPU_002514	similar to bacterial proteins. also homologous to endonuclease/exonuclease/phosphatase, and aspartyl-tRNA synthetase, and several other bacterial proteins.	none
SPU_002572	SPU_002572	contains PLA2_bee_venom_like domain	none
SPU_002648	SPU_002648	contains dozens of PTB (protein binding) sites	none
SPU_002938	SPU_002938	also similar to uteroferrin, and other protein phosphatases	none
SPU_002968	SPU_002968	contains 2 EFh superfamily motifs	none
SPU_002981	SPU_002981	also orthologous to alpha tectorin	none
SPU_002984	SPU_002984	contains 2 WD40 superfamily motifs, each at N- and C-terminus	none
SPU_003009	SPU_003009	contains Pyr_redox_2 domain	none
SPU_003078	SPU_003078	contains 2 Glyco_hydro_39 superfamily motifs	none
SPU_003087	SPU_003087	contains YSH1 domain	none
SPU_003099	SPU_003099	contains G-patch superfamily motif at C-terminus	none
SPU_003123	SPU_003123	contains 2 WD40 superfamily motifs and COG2319 domain	none
SPU_003153	SPU_003153	contains MFS_1 domain	none
SPU_003181	SPU_003181	contains 2 PH_like superfamily motifs	none
SPU_003190	SPU_003190	contains UAA domain	none
SPU_003219	SPU_003219	also orthologous to many endonuclease reverse transcriptases in Bos taurus	none
SPU_003234	SPU_003234	contains 2 ZnF_CSH1 superfamily motifs	none
SPU_001914	SPU_001914	contains 2 7tm_1 superfamily motifs	none
SPU_002033	SPU_002033	also similar to oncoprotein-induced transcript 3 (OIT3)	none
SPU_002114	SPU_002114	contains two clusters of tandem repeats (5x and 11x, respectively) of EGF_CA, calcium-binding EGF-like domains. also similar to fibropellin I isoforms.	none
SPU_001830	SPU_001830	contains COG1041 domain	none
SPU_017031	SPU_017031	strong DNA polymerase motifs shared by diverse groups of insects and intestinal parasites	none
SPU_027935	SPU_027935	After reviewing the data and performing the BLAST search, it appears that there is no good GLEAN model that fits SPU_027935 sufficiently. According to the BLAST results and the Excel data, there are large sequence gaps resulting in poor sequence coverage and a disorderly arrangement. There was EST support available from GBrowse assembly V0.5 and the transcriptome intensity scores appeared to be somewhat strong with most of the values slightly greater than 5.	none
