Nucleic Acids Research (2005): Comparative gene finding in chicken indicates that we are closing in on the set of multi-exonic widely expressed human genes
R. Castelo*, A. Reymond, C. Wyss, F. Câmara, G. Parra, S.E. Antonarakis, R. Guigó and E. Eyras
Nucleic Acids Research, 33(6):1935-1939, 2005 [full text]
*To whom correspondence should be addressed.
Contents
On this site you can find the set of 311 putative novel human genes found using the comparative gene predictor SGP2 and the chicken genome sequence. You will also find the subset of the 50 most promising predictions, which were tested by RT-PCR, as well as the identifiers and GenBank accessions of the six positives.
Abstract
The recent availability of the chicken genome sequence poses the question of whether there are human protein-coding genes conserved in chicken that are currently not included in the human gene catalog. Here we show, using comparative gene finding followed by experimental verification of exon pairs by RT-PCR, that the addition to the multi-exonic subset of this catalog could be as little as 0.2%, suggesting that we may be closing in on the human gene set. Our protocol, however, has two shortcomings: (1) the bioinformatic screening of the predicted genes, applied to filter out false positives, cannot handle intronless genes; and (2) the experimental verification could fail to identify expression at a specific developmental time. This highlights the importance of developing methods that could provide a reliable estimate of the number of these two types of genes.
The data
The following files contain the amino acid sequence, DNA coding sequence and genomic coordinates of the 311 putative novel human genes:
hg16.311.putative.aa.fa (52K) | amino acid sequences in FASTA format
hg16.311.putative.cds.fa (148K) | DNA coding sequences in FASTA format
hg16.311.putative.gff (72K) | genomic coordinates in GFF format
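As a rough illustration of how these files might be read, the following Python sketch parses FASTA records and individual GFF feature lines using only the standard library. It assumes the files have been downloaded locally and that the GFF file is tab-separated with the eight standard leading fields. That layout has not been verified against the actual file, and the helper names `read_fasta`, `parse_gff_line` and `count_fasta_records` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_gff_line(line: str) -> Dict[str, object]:
    """Split one GFF feature line into named fields.

    Assumption: fields are tab-separated and the first eight are
    seqname, source, feature, start, end, score, strand, frame.
    """
    fields = line.rstrip("\n").split("\t")
    seqname, source, feature, start, end, score, strand, frame = fields[:8]
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        # Anything after the eighth field is kept verbatim.
        "attributes": "\t".join(fields[8:]),
    }


def count_fasta_records(path: str) -> int:
    """Count the records in a FASTA file stored at a local path."""
    with open(path) as handle:
        return sum(1 for _ in read_fasta(handle))
```

If the files match the description above, `count_fasta_records("hg16.311.putative.aa.fa")` and the same call on the `.cds.fa` file should each return 311, which gives a quick sanity check on a download.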
The file hg16.50.mostpromisingexonjunctions.tbl contains the identifiers and tested exon junctions of the 50 most promising genes, chosen according to the criteria described in the main article. The file consists of five columns: identifier, intron position, tested exon-exon junction position, upstream exon and downstream exon (the two exons forming the tested exon-exon junction). Of the 50 exon-exon junctions tested by RT-PCR, the following six were positive:
identifier | exon-exon junction position | GenBank accession
chr18_515 | 8 |
chr4_55 | 2 |
chr4_1746 | 2 |
chr5_400 | 3 |
chr15_51 | 1 |
chr22_143 | 2 |
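
The five-column layout of hg16.50.mostpromisingexonjunctions.tbl described above can be read with a few lines of Python. This is only a sketch: it assumes the columns are whitespace-separated and contain no embedded spaces, which has not been checked against the actual file, and it keeps every field as a string because the exact content of the position and exon columns is not specified here. The names `JunctionRow`, `parse_junction_table` and `positive_rows` are hypothetical; the six identifiers are the RT-PCR positives listed above.

```python
from typing import Iterable, List, NamedTuple


class JunctionRow(NamedTuple):
    """One row of the table, in the documented column order."""
    identifier: str
    intron_position: str
    junction_position: str
    upstream_exon: str
    downstream_exon: str


# Identifiers of the six junctions reported as RT-PCR positive.
POSITIVE_IDS = {
    "chr18_515", "chr4_55", "chr4_1746",
    "chr5_400", "chr15_51", "chr22_143",
}


def parse_junction_table(lines: Iterable[str]) -> List[JunctionRow]:
    """Parse the table; assumes whitespace-separated columns."""
    rows = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 5:
            raise ValueError(
                f"line {number}: expected 5 columns, got {len(fields)}"
            )
        rows.append(JunctionRow(*fields))
    return rows


def positive_rows(rows: Iterable[JunctionRow]) -> List[JunctionRow]:
    """Keep only the rows whose identifier is an RT-PCR positive."""
    return [row for row in rows if row.identifier in POSITIVE_IDS]
```

Applied to the real file, `parse_junction_table(open(path))` would be expected to return 50 rows and `positive_rows` six of them, a verification rate of 6/50 = 12% among the tested junctions.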