The Journal of Neuroscience, October 24, 2007 • 27(43):11700–11711
Neurobiology of Disease
Cannabinoids Elicit Antidepressant-Like Behavior and Activate Serotonergic Neurons through the Medial Prefrontal Cortex
Francis Rodriguez Bambico,1 Noam Katz,1,2 Guy Debonnel,1† and Gabriella Gobbi1,2
1Neurobiological Psychiatry Unit, Department of Psychiatry, McGill University, Montréal, Quebec, Canada H3A 1A1, and 2Department of Psychiatry, Centre de Recherche Fernand Seguin, Hôpital L.H. Lafontaine, Université de Montréal, Quebec, Canada H1N 3V2
Genome-wide analysis of mouse transcripts using exon microarrays and factor graphs
Brendan J Frey1,2,6, Naveed Mohammad2,6, Quaid D Morris1,2,6, Wen Zhang2,3,6, Mark D Robinson1,2, Sanie Mnaimneh2, Richard Chang2, Qun Pan2, Eric Sat4, Janet Rossant3,4, Benoit G Bruneau3,5, Jane E Aubin3, Benjamin J Blencowe2,3 & Timothy R Hughes2,3
1Electrical and Computer Engineering, University of Toronto, 10 King's College Rd., Toronto, Ontario M5S 3G4, Canada. 2Banting and Best Department of Medical Research, University of Toronto, 112 College St., Toronto, Ontario M5G 1L6, Canada. 3Medical Genetics and Microbiology, University of Toronto, 1 King's College Ct., Toronto, Ontario M5S 3G4, Canada. 4Samuel Lunenfeld Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, Ontario M5G 1X5, Canada. 5The Hospital for Sick Children, 555 University Ave., Toronto, Ontario M5G 1X8, Canada. 6These authors contributed equally to this work. Correspondence should be addressed to T.R.H. ([email protected]) or B.J.F. ([email protected]).
NATURE GENETICS ADVANCE ONLINE PUBLICATION. Published online 28 August 2005; doi:10.1038/ng1630
Recent mammalian microarray experiments detected widespread transcription and indicated that there may be many undiscovered multiple-exon protein-coding genes. To explore this possibility, we hybridized labeled cDNA from unamplified, polyadenylation-selected RNA samples from 37 mouse tissues to microarrays encompassing 1.14 million exon probes. We analyzed these data using GenRate, a Bayesian algorithm that uses a genome-wide scoring function in a factor graph to infer genes. At a stringent exon false detection rate of 2.7%, GenRate detected 12,145 gene-length transcripts and confirmed 81% of the 10,000 most highly expressed known genes. Notably, our analysis showed that most of the 155,839 exons detected by GenRate were associated with known genes. This provides microarray-based evidence that most multiple-exon genes have already been identified. GenRate also detected tens of thousands of potential new exons. It reconciled discrepancies in current cDNA databases by 'stitching' new transcribed regions into previously annotated genes.
Mammalian genome and transcript sequencing efforts indicate that most protein-coding genes have already been identified1. But microarray-based analyses suggest that polyadenylated transcripts are produced from a considerably larger proportion of the genome. This includes regions that are conserved and seem to be noncoding, as well as regions that contain potential coding exons2.
To reconcile this discrepancy, we reasoned that much of the functional mammalian transcriptome could be rapidly identified and characterized by surveying exon expression across multiple normal tissues. The reasoning is that most known genes consist of exons and are expressed at different levels across tissues and developmental states.
We designed microarrays3 containing 1,140,421 sequences selected from the combined outputs of five exon-finding and gene-like sequence detection algorithms (GenScan4, HMMGene5, GrailEXP6, BlastX and BlastN) applied to the mouse genome7. We expected the resulting set of putative exons to have broad coverage, because we used low-stringency settings for the search algorithms (Supplementary Methods online). The resulting set of putative exons was more than five times larger than the set of exons in known genes. We analyzed data from a previous study8 to select twelve tissue pools, encompassing 37 different tissue samples (Table 1). The pools were chosen to maximize both differential expression between pools and global activity in every pool (Supplementary Methods online). We analyzed wild-type mouse tissues, rather than cell lines, to ensure that genes contained in the pools were expressed under normal physiological conditions. To achieve high fidelity, we hybridized unamplified first-strand, fluorescently labeled cDNA obtained from polyadenylation-purified mRNA primed with oligo-dT and random nonamers (Supplementary Methods online). This technique generated a data matrix of 1,140,421 exon expression profiles across the 12 tissue pools. Figure 1a shows a subset of the expression data. The data are available from our project website. The site also provides an interface (linked to the University of California Santa Cruz genome browser9) for browsing microarray data, ab initio exon predictions, mappings of known genes and genes predicted by our analysis.
Table 1 Compositions of the 12 mRNA pools analyzed
Composition (mRNA per array hybridization)
Heart (2 μg), skeletal muscle (2 μg)
Whole brain (1.5 μg), cerebellum (0.48 μg), olfactory bulb (0.15 μg)
Colon (0.96 μg), intestine (1.04 μg)
Testis (3 μg), epididymis (0.4 μg)
Femur (0.9 μg), knee (0.4 μg), calvaria (0.06 μg), teeth and mandible (1.3 μg), teeth (0.4 μg)
15-d embryo (1.3 μg), 12.5-d embryo (12.5 μg), 9.5-d embryo (0.3 μg), 14.5-d embryo head (0.25 μg), embryonic stem cells (0.24 μg)
Digit (1.3 μg), tongue (0.6 μg), trachea (0.15 μg)
Pancreas (1 μg), mammary gland (0.9 μg), adrenal gland (0.25 μg), prostate gland (0.25 μg)
Salivary gland (1.26 μg), lymph node (0.74 μg)
12.5-d placenta (1.15 μg), 9.5-d placenta (0.5 μg), 15-d placenta (0.35 μg)
Lung (1 μg), kidney (1 μg), adipose tissue (1 μg), bladder (0.05 μg)
Detection of transcripts in exon and genome tiling data is influenced by cross-hybridization, probe sensitivity, probe noise and experimental procedures, among other variables. Previous analyses detected transcripts by applying thresholds to several quantities: individual signal intensities, correlations of coregulation patterns in multiple samples, the number of consecutive probes that constitute a 'hit' and genomic distances between probes10–15. Substantial increases in detection sensitivity can be gained by analyzing multiple samples jointly. Most multiple-exon protein-coding mammalian genes vary in expression to some extent across tissues8. As a result, a subset of similar expression profiles from probes derived from putative exons that are close to each other in the genome can be taken as evidence of a functional transcript10. A disadvantage of previous applications of this approach is that decisions to link putative exons are irreversible. In particular, a decision to assign a probe to a particular transcript removes the probe from further consideration, even if another transcript that is better suited to the probe emerges later in the analysis.
[Figure 1 (image not reproduced). (a) Labeled accessions: NM_027491, NM_019586. (b) Factor-graph variables: transcription start/stop indicator; relative index of CoReg prototype; exon versus nonexon indicator; probe sensitivity and noise; probe signals (each with 12 values); probes ordered according to position in genome.]
Figure 1 Example of results and illustration of analysis method. (a) Sample of exon-resolution data and GenRate output from the positive strand of chromosome 4, map positions 32833512–33300999 (build 33), from left to right. Colored rows indicate the origin of the exon prediction, sequence matches to cDNAs in six databases, the expression data (scaled from minimum to maximum), the maximum log-probe intensity and the GenRate-predicted genes at 2.7% exon FDER. A change in color of a cDNA database match indicates the beginning of a different transcript. Similar views for the entire data set are available at our project website. This example shows two things. First, GenRate can correctly connect together erroneously disjoined sequences in gene databases. Second, coregulation across tissues can be more useful than probe intensity (purple or black track) for detecting genes. HS, human; MM, mouse. (b) The factor graph that GenRate uses to find CoRegs in microarray data. Each black box corresponds to a local scoring function that depends on nearby hidden and observed variables, as indicated.
To carry out a global analysis of our microarray data, we derived a genome-wide scoring function that describes relationships between hidden variables and expression profiles. Figure 1b provides a graphical depiction of the relationships between n microarray probe signals, each containing 12 expression levels, and hidden
variables. The hidden variables include transcription start and stop sites; relative transcript length; a true or false flag for each putative exon; and the sensitivity, cross-hybridization level and additive noise level for each probe. This network is formally called a factor graph16. The global score is the sum of a large number of local scores; equivalently, the probability distribution is the product of a large number of local probability functions. Each local score reflects how well neighboring observations and hidden variables match. Our technique is called GenRate (generative model for finding and rating transcripts). It uses the max-product algorithm to efficiently find the globally optimum score (B.J.F., Q.D.M. & T.R.H., unpublished results). It then identifies sets of probe signals, called CoRegs, that represent coregulated transcription. By maximizing a global scoring function, GenRate achieves higher sensitivities than standard clustering techniques (Supplementary Methods online).
[Figure 2 (image not reproduced). Legend and axis labels: known RefSeq/Ensembl exons in mouse+human; probes detected by GenRate; non-RefSeq probes; GenRate (5–10); GenRate (11–30); probes not detected by GenRate; log (signal intensity).]
Figure 2 GenRate detects exons with high sensitivity and high specificity. (a) A comparison of exons detected by GenRate with mouse and human exons in RefSeq and Ensembl, in addition to exons in the FANTOM2 and Unigene databases. (b) Distributions of maximum probe signal intensities for RefSeq exons, probes detected by GenRate, non-RefSeq exons and probes not detected by GenRate. GenRate detects many probes that have low intensity but correspond to known exons. It rejects many probes that have high intensity but do not correspond to known exons. (c) Accuracy versus recall of GenRate for various gene size categories (number of exons), and of the method of thresholding the probe intensity. The panel also compares GenRate with a previously reported system15 (Bertone), using closely corresponding regions of the mouse and human genomes (Chromosome X); GenRate achieves higher accuracy and recall. The portion of each recall level expected to be due to false detections is indicated for several points on the plots.
To validate the reliability of the predictions made by GenRate, we used a permutation test (i.e., randomly reordering the probes). This test estimates exon and CoReg false detection rates (FDERs), the fraction of detections that are expected to be false. To limit effects of cross-hybridization noise, we applied GenRate to the 837,251 probes that map uniquely to build 33 of the mouse genome. By varying GenRate's sensitivity, we obtained exon FDERs from 0.13% to 32% and CoReg FDERs from 0.2% to 37%.
To test GenRate's ability to recover previously annotated exons, we compared our predictions with exons mapped from human and mouse genes in six cDNA databases. We also compared them with transcripts detected in a recent human liver microarray analysis15. At a stringent exon FDER of 2.7%, GenRate detected 155,839 exons (4,186 expected false detections) comprising 12,145 CoRegs. GenRate detected 64% of the exons in the 17,577 RefSeq Golden Path mouse genes and identified
70,913 putative new exons. We next expanded our comparison set to include all mouse and human genes in the RefSeq and Ensembl cDNA databases (Fig. 2a). GenRate detected 116,118 (52%) of the exons in these databases and identified 39,721 putative new exons. We also expanded our comparison set to include all previous annotations, including poorly characterized expressed-sequence tags (ESTs) and cDNAs in the FANTOM2 and Unigene databases (Fig. 2).
Notably, GenRate detected known exons whose probe signal intensities were below the median intensity. It also rejected putative exons whose probe signals had high intensity but were not coregulated (Fig. 2b). We compared the sensitivity and specificity of GenRate with results from a recent study of expression in human liver15. At an FDER of 2.9%, their system identified 13,889 exon-size transcripts, 4,931 of which corresponded to previously annotated exons. At an FDER of 2.7%, GenRate detected ~11 times more exons (155,839) and confirmed ~24 times more previously annotated exons (116,118).
We also estimated accuracy using the fraction of detected exons that map to the reference set of RefSeq genes. Figure 2c shows the accuracy of GenRate as a function of the fraction of RefSeq exons that are detected (recall), for various sizes of genes. As expected, the recall of GenRate for exons in short genes (<5 exons) was low, because there is less evidence of coregulation. Figure 2c also shows the accuracy versus recall obtained by applying a threshold to the maximum intensity for each probe. At high levels of recall, false detections are expected to dominate predictions for all methods. At all other levels of recall, GenRate had substantially higher accuracy than intensity thresholding. We compared the accuracy versus recall of our system with the previously reported system15 on the X chromosome. GenRate achieved higher accuracy over a much wider range of recall levels (Fig. 2c). It also achieved higher recall levels with a much lower fraction of expected false positives. Intensity thresholding in our data also achieved substantially higher accuracies than the previously reported system15, partly because we used a wider selection of tissues.
We next studied all CoRegs detected by GenRate and how they compared with RefSeq genes. At an exon FDER of 2.7%, GenRate detected 12,145 CoRegs, of which 412 (3.4%) were expected to be false detections. Figure 1a shows a sample of the output at this FDER and illustrates two general trends. (i) Long transcripts, which are the most difficult to clone, could be identified by this approach. (ii) Coregulation of expression among adjacent probes yielded substantially different predicted transcripts than would be identified by probe intensity level alone. The mean and median numbers of exons per CoReg were 12.8 and 10, respectively. The mean and median genomic lengths were 67,592 bp and 29,483 bp, respectively. GenRate detected 11,395 (51%) RefSeq genes, including 8,121 (81%) of the 10,000 RefSeq genes most highly expressed in our data.
[Figure 3 (image not reproduced). (a) Detected CoRegs and new detected CoRegs; axes: number of detected CoRegs versus number of false detections. (b) Axes: false negative calls (%) versus number of false detections.]
Figure 3 Performance of GenRate on detecting genes. (a) False detection analysis. The number of CoRegs identified in the original probe order (vertical axis) is plotted against the number identified in the randomized probe data (horizontal axis; average over ten repetitions). (b) False negative analysis. For each false detection level (horizontal axis), the fraction of false negative calls among RefSeq genes (vertical axis) is plotted.
[Figure 4 (image not reproduced). Panels: all detected exons; highly expressed detected exons (90th percentile).]
Figure 4 New exons detected by GenRate and associated with RefSeq Golden Path genes are categorized as follows: 3′ or 5′ extensions of known genes; bridges that join together known genes; new exons that map to an EST or cDNA in the FANTOM2 or Unigene database; new exons that can be stitched together with the known gene by a previously detected EST or cDNA; and new exons that do not map to any previously detected sequences. The expected number of false detections is 4,186. This analysis was repeated for new exons that were detected among the probes with maximum signal intensity above the 90th percentile. Among these exons, the fraction of completely new exons decreased. The fraction of new exons confirmed by ESTs or cDNAs that overlap with known genes increased.
Despite the high sensitivity of our system, we did not detect a substantial number of CoRegs consisting entirely of exons not included in any of the databases (Fig. 3a). At an exon FDER of 2.7%, only 332 of the CoRegs were entirely new. Only 96 of these did not overlap substantially with RepeatMasker sequence (i.e., fewer than 10% of the exons in these CoRegs map to RepeatMasker sequence). On average, 83 CoRegs detected in the randomly permuted data consisted entirely of new exons. This suggests that most, if not all, of the 96 new CoRegs not found in RepeatMasker are false detections.
To confirm this prediction, we tested 35 of them by RT-PCR, using primers that bridge putative exons and distinguish spliced transcripts (Supplementary Table 1 online). For 18 of these, we obtained products of some form after repeated attempts. However, sequencing confirmed in all cases that the product was aberrant amplification of either genomic sequence or nontargeted highly expressed
mRNAs. In contrast, we obtained correct RT-PCR products for >75% of known genes from the same samples in the first attempt (ref. 8 and data not shown), indicating that our RT-PCR technique was reliable.
How comprehensive is the set of CoRegs predicted by GenRate? Figure 3b shows the fraction of RefSeq genes not detected by GenRate (i.e., the rate of false negative calls) among the 10,000 RefSeq genes that were most highly expressed in our data. These rates are shown for the same numbers of false detections as in Figure 3a. The false negative rates are low, and GenRate detected no significant number of new CoRegs. Together, these results provide compelling evidence that almost all the multiple-exon genes expressed in the 37 tissues (12 pools) we studied are already known.
We next examined the relationship between exons detected by GenRate and 17,577 well-characterized RefSeq Golden Path genes. Figure 4 shows the number of detected exons in each of several categories:
extensions of RefSeq genes;
bridges joining RefSeq genes;
new exons in RefSeq genes that map to cDNA or EST databases (FANTOM2, Unigene and Ensembl mouse and human);
completely new exons in RefSeq genes.
To be especially stringent in making predictions, we repeated this analysis for exons detected by GenRate whose maximum probe intensities were above the 90th percentile.
Two lines of evidence indicate that most of the new exons that we identified are valid. First, there is a bias against new exons internal to RefSeq genes (Fig. 4), where errors or omissions are least likely. There is a corresponding bias toward new exons flanking cDNAs in Unigene or FANTOM2, which are most likely to be incomplete. Second, we verified new exons by RT-PCR experiments. For example, CoReg BF_C4_2262 (Fig. 1a) is fragmented in current mouse cDNA and EST databases. Virtually all the exons in this CoReg are contained in a single transcript of >11 kb (Supplementary Table 2 and Supplementary Fig. 1 online). This transcript seems to be the mouse homolog of Midasin, the largest gene in yeast17. It has not been completely identified by comparison to its human counterpart (Fig. 1a). More than half of the CoRegs included at least one exon that did not map to the best-matching cDNA, so our analysis provides a revised view of the potential exon composition of mammalian genes.
Many of the new exons detected by GenRate reconcile discrepancies in current gene databases. For example, GenRate detects 3,266 non-RefSeq exons that are internal to RefSeq genes and are confirmed by sequences in the FANTOM2 or Unigene databases. In other cases, GenRate detects a CoReg that bridges together distinct, neighboring cDNAs. Such a CoReg may correspond to two separate but coregulated genes. We therefore used RT-PCR to confirm that the longest such example in our data is expressed as a single transcript (Supplementary Table 2 online).
The International Human Genome Sequencing Consortium recently estimated that the human genome contains ~25,000 protein-coding genes (presumably, this is similar for mouse), most of which have already been identified by transcript sequencing1. In contrast, previous tiling microarray analyses10–15 focused on discovering thousands of new transcripts in intergenic regions, in introns and antisense to known genes. Some of these have been confirmed by RT-PCR13–15. In some cases, distinct molecular species have been identified by northern blotting, rapid amplification of cDNA ends or cDNA cloning14,18,19. Even so, the function and origin of these transcripts are largely unknown. Thousands of putative new transcripts probably evolve at a neutral rate20, suggesting that their function (if any) is independent of sequence. These transcripts might be 'cryptic', potentially resulting from incomplete quality control in Pol II transcription21. They might also be simply undegraded fragments of heterogeneous nuclear RNA, as more than half of the genome seems to be transcribed as pre-mRNA22. Our primary data also include strong signals from many isolated probes (Fig. 1). Our results support the view that most multiple-exon genes expressed in diverse tissues are already identified, although there are probably thousands of additional exons that are not currently annotated. Our study therefore reconciles a discrepancy between previous approaches to gene identification. It also extensively revises our knowledge of the exon composition of the mammalian genome.
METHODS
Array design. To achieve broad coverage of putative exons, we used liberal detection criteria. The numbers of putative exons and of unique putative exons detected by each program were as follows:
GenScan, 374,540 and 117,849;
HMMGene, 385,759 and 159,523;
GrailEXP, 307,911 and 139,906;
BlastX, 327,746 and 32,869;
BlastN, 642,401 and 272,152.
These yielded a total of 1,140,421 unique putative exons. Details of exon detection and probe selection are given in Supplementary Methods online. We selected a single Tm-balanced oligonucleotide to represent each exon, using a scoring system that favors unique sequences without secondary structure and with a minimum of simple repeats and homopolymeric runs. Agilent Technologies manufactured six copies of each of 52 array designs, each containing 21,929 60-mer probes. Sequences of probes are available from our project website, together with mappings to build 33 of the mouse genome.
Tissue pools. We combined the mRNA samples from tissues listed in Table 1 and reverse-transcribed them for each of the 52 array designs. Typical cDNA yields were 25–50% of the amount of mRNA input. Full details of pool selection, tissue sources and RNA preparation are given in Supplementary Methods online.
Varying the sensitivity of GenRate and permutation-based estimates of the exon and CoReg FDERs. We estimated exon and CoReg FDERs as a function of the parameters used in the GenRate analysis. GenRate is a deterministic algorithm with three parameters:
the probability y that a probe is at the start of a CoReg;
the probability that a probe in a CoReg corresponds to an exon;
the average number of probes encompassed by a CoReg.
The analysis depends most on the first parameter, y. This parameter determines the number of CoRegs that are detected (i.e., the sensitivity of the system) and the FDER. We report results as a function of FDER. The analysis is much less sensitive to the other two parameters. We set these to 0.7 and 20, respectively, using estimates obtained by mapping known human and mouse genes to our probe set. To estimate the FDER, we applied GenRate to a version of our data in which the probes were randomly reordered on a chip-by-chip basis, disrupting their order on the chromosome. We repeated this process ten times to obtain an accurate estimate. By varying y, we obtained exon FDERs from 0.13% to 32% and CoReg FDERs from 0.2% to 37%.
Mapping known human and mouse genes to our probe set. We compared the chromosomal locations of exons in mouse RefSeq Golden Path genes (build 33) directly with our probe locations, which were mapped to build 33. To include cDNA sequences not on the Golden Path, we used BLAT23 to map cDNA sequences in RefSeq24, Ensembl25, FANTOM2 and Unigene26 to the chromosomes of build 33 of the mouse genome. To minimize false discovery of genes by cross-hybridization3, we counted a probe as matching a cDNA if it had 19 matching nucleotides in any 20 contiguous positions. With the exception of Unigene, more than 90% of genes in these five databases were represented by probes on our arrays. We also mapped all 33,930 (28,374 known and 5,556 'novel') of the Ensembl25 human genes in a similar fashion, using E < 10^-4 (BLAST) as a cut-off for identity to the array probe.
Comparison with previous results15 on the X chromosome. We mapped all mouse exons for which we have probes to the human-mouse homologous regions of the X chromosome. For this mapping we used the two-way human-mouse BlastZ alignments downloaded from the University of California Santa Cruz in March 2005. All probes that were previously designed for the human X chromosome15 were lined up with the corresponding matched human coordinates in the homologous regions. We constructed an evaluation set of 6,699 putative exons from aligned sequences that include at least one probe from our system and one probe from the previously reported system15. We found that 3,447 (52%) of these map to exons in the reference set of mouse RefSeq genes. We normalized both microarray data sets on a chip-by-chip basis by applying an affine transformation to the probe signals on each chip. After this transformation, the median probe signal and the difference between the 75th and 25th percentiles were the same across all chips. To compare the accuracy and recall of GenRate against those of the previously reported system15, we varied the probe intensity threshold in their method from the 20th percentile to the 90th percentile. This gave multiple analyses with different sensitivities. The sensitivity of GenRate was varied as described above.
Comparing GenRate CoRegs with RefSeq genes. A RefSeq gene was considered to be detected if GenRate detected at least one half, or at least five, of the exons in the gene. To determine the set of most highly expressed RefSeq genes, we computed a total expression level for every RefSeq gene. To limit the effects of probe sensitivity, this total was the number of the gene's exons whose maximum probe signal (over the 12 tissue pools) exceeded 20. In a previous study8, this threshold was useful in distinguishing positive signals from negative controls.
URL. Our project website is available at http://www.psi.toronto.edu/genrate/.
Accession codes. GEO, GSE3047.
Note: Supplementary information is available on the Nature Genetics website.
ACKNOWLEDGMENTS
We thank G.E. Hinton for conversations and C. Boone and B. Andrews for their support. This work was supported by grants from the Canadian Institutes of Health Research, the Natural Sciences and Engineering Research Council of Canada and the Canadian Foundation for Innovation (to T.R.H., B.J.F. and B.J.B.). It was also supported by a PREA award (to B.J.F.) and by a Natural Sciences and Engineering Research Council of Canada postdoctoral fellowship (to Q.D.M.).
COMPETING INTERESTS STATEMENT
The authors declare that they have no competing financial interests.
Received 17 June; accepted 28 July 2005
Published online at http://www.nature.com/naturegenetics/
1. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945 (2004).
2. Schadt, E.E. et al. A comprehensive transcript index of the human genome generated using microarrays and computational approaches. Genome Biol. 5, R73 (2004).
3. Hughes, T.R. et al. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat. Biotechnol. 19, 342–347 (2001).
4. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
5. Krogh, A. Two methods for improving performance of an HMM and their application for gene finding. Proc. Int. Conf. Intell. Syst. Mol. Biol. 5, 179–186 (1997).
6. Xu, Y., Mural, R.J. & Uberbacher, E.C. Inferring gene structures in genomic sequences using pattern recognition and expressed sequence tags. Proc. Int. Conf. Intell. Syst. Mol. Biol. 5, 344–353 (1997).
7. Waterston, R.H. et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
8. Zhang, W. et al. The functional landscape of mouse gene expression. J. Biol. 3, 21.
9. Karolchik, D. et al. The UCSC genome browser database. Nucleic Acids Res. 31, 51–54 (2003).
10. Shoemaker, D.D. et al. Experimental annotation of the human genome using microarray technology. Nature 409, 922–927 (2001).
11. Stolc, V. et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242–2246 (2004).
12. Yamada, K. et al. Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 302, 842–846 (2003).
13. Kapranov, P. et al. Large-scale transcriptional activity in Chromosomes 21 and 22. Science 296, 916–919 (2002).
14. Rinn, J.L. et al. The transcriptional activity of human Chromosome 22. Genes Dev. 17, 529–540 (2003).
15. Bertone, P. et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242–2246 (2004).
16. Kschischang, F.R., Frey, B.J. & Loeliger, H.A. Factor graphs and the sum-product algorithm. IEEE Trans. Inf. Theory 47, 498–519 (2001).
17. Garbarino, J.E. & Gibbons, I.R. Expression and genomic analysis of midasin, a novel and highly conserved AAA protein distantly related to dynein. BMC Genomics 3, 18 (2002).
18. Okazaki, Y. et al. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420, 563–573 (2002).
19. Cheng, J. et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308, 1149–1154 (2005).
20. Wang, J. et al. Mouse transcriptome: Neutral evolution of 'non-coding' complementary DNAs (reply). Nature 431, 757 (2004).
21. Wyers, F. et al. Cryptic Pol II transcripts are degraded by a nuclear quality control pathway involving a new poly(A) polymerase. Cell 121, 725–737 (2005).
22. Wong, G.K., Passey, D.A. & Yu, J. Most of the human genome is transcribed. Genome Res. 11, 1975–1977 (2001).
23. Kent, W.J. BLAT - The BLAST-like alignment tool. Genome Res. 12, 656–664.
24. Pruitt, K.D., Tatusova, T. & Maglott, D.R. NCBI Reference Sequence Project: update and current status. Nucleic Acids Res. 31, 34–37 (2003).
25. Hubbard, T. et al. Ensembl 2005. Nucleic Acids Res. 33, D447–D453 (2005).
26. Pontius, J.U., Wagner, L. & Schuler, G.D. Unigene: A unified view of the transcriptome. in The NCBI Handbook (National Center for Biotechnology Information, Bethesda,
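The Methods subsection "Comparison with previous results15 on the X chromosome" normalizes both data sets chip by chip with an affine transformation. After the transformation, every chip shares the same median probe signal and the same 75th-minus-25th percentile spread. Below is a minimal Python/NumPy sketch of that step, under stated assumptions. The function name `normalize_chips` and the chips-by-probes array layout are hypothetical. The common target values (the average median and average spread over chips) are also an assumption, because the text does not say which shared values were used.

```python
import numpy as np


def normalize_chips(signals: np.ndarray) -> np.ndarray:
    """Affine per-chip normalization (sketch, not the authors' code).

    signals: hypothetical array of shape (n_chips, n_probes), one row per chip.
    Each row is mapped x -> a*x + b so that every chip ends up with the same
    median and the same interquartile range (75th minus 25th percentile).
    Assumption: the shared targets are the averages of the per-chip values.
    """
    signals = np.asarray(signals, dtype=float)
    q25, med, q75 = np.percentile(signals, [25, 50, 75], axis=1)
    iqr = q75 - q25
    if np.any(iqr <= 0):
        raise ValueError("every chip needs a positive interquartile range")
    target_med, target_iqr = med.mean(), iqr.mean()
    scale = target_iqr / iqr           # multiplicative part (a) per chip
    offset = target_med - scale * med  # additive part (b) per chip
    return signals * scale[:, None] + offset[:, None]


# Usage with made-up data: three chips that differ in overall brightness.
rng = np.random.default_rng(0)
raw = rng.lognormal(size=(3, 1000)) * np.array([[1.0], [2.5], [0.4]])
normalized = normalize_chips(raw)
```

Because each chip's scale factor is positive, the transformation preserves the rank order of probes within a chip. Only the location and spread of each chip's signal distribution change.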
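The permutation-based FDER used throughout the paper divides two counts: the number of detections expected to be false, estimated from probe-shuffled data, and the number of detections in the real data. At the reported operating point this is 4,186 / 155,839 ≈ 2.7% for exons. The Python/NumPy sketch below illustrates only that estimate. GenRate itself is not reproduced here. `toy_detector` is a hypothetical stand-in that counts runs of adjacent probes with correlated expression profiles, and its thresholds (`r_min=0.8`, `min_run=3`) are arbitrary. The within-chip shuffle driven by a hypothetical `chip_ids` array is one reading of "randomly reordered on a chip-by-chip basis"; it is an assumption, not the authors' procedure.

```python
import numpy as np


def toy_detector(profiles: np.ndarray, r_min: float = 0.8, min_run: int = 3) -> int:
    """Hypothetical stand-in for GenRate (NOT the paper's algorithm).

    profiles: (n_probes, n_tissues), rows in genomic order.
    Counts runs of >= min_run consecutive probes in which every adjacent pair
    of expression profiles has Pearson correlation >= r_min.
    """
    x = profiles - profiles.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # flat profiles get correlation 0 with every probe
    x = x / norms
    linked = np.sum(x[:-1] * x[1:], axis=1) >= r_min  # corr(probe i, probe i+1)
    runs, length = 0, 1
    for is_linked in linked:
        if is_linked:
            length += 1
        else:
            if length >= min_run:
                runs += 1
            length = 1
    if length >= min_run:
        runs += 1
    return runs


def permutation_fder(profiles, chip_ids, detector=toy_detector, n_perm=10, seed=0):
    """Return (FDER, observed detections, expected false detections).

    Probes are shuffled among positions on the same chip, which destroys
    genomic adjacency; the mean detection count over n_perm shuffles
    (ten repetitions, as in the Methods) estimates the false detections.
    """
    profiles = np.asarray(profiles, dtype=float)
    chip_ids = np.asarray(chip_ids)
    rng = np.random.default_rng(seed)
    observed = detector(profiles)
    null_counts = []
    for _ in range(n_perm):
        order = np.arange(len(profiles))
        for chip in np.unique(chip_ids):
            idx = np.flatnonzero(chip_ids == chip)
            order[idx] = rng.permutation(idx)
        null_counts.append(detector(profiles[order]))
    expected_false = float(np.mean(null_counts))
    fder = expected_false / observed if observed else float("nan")
    return fder, observed, expected_false
```

In this sketch, loosening the detector (a lower `r_min` or a shorter `min_run`) plays the role that the start probability y plays for GenRate. It trades more detections for a different false detection ratio, which is how a range of FDERs can be traced out.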
Analysis on the Use of Synonymous Adverbs: Maybe, Perhaps, Possibly, Probably, and Likely
The main objective of the current paper is to provide fuller definitions of five synonymous adverbs that express uncertainty: maybe, perhaps, possibly, probably, and likely. To achieve this goal, 178 examples are collected from both spoken and written corpora and closely examined from semantic, stylistic, pragmatic, and syntactic points of view. The major findings are as follows. Maybe is used frequently in casual contexts. Perhaps is salient in its pragmatic use in speech, such as hedging. Possibly conveys a lesser degree of likelihood owing to its theoretical property. Probably frequently occurs with non-human propositions. Likely often accompanies good evidence and ranks highest in the likelihood hierarchy. Observations from these different perspectives are combined to provide a clearer grasp of each adverb.