Genome-wide analysis of mouse transcripts using exon microarrays and factor graphs
Brendan J Frey1,2,6, Naveed Mohammad2,6, Quaid D Morris1,2,6, Wen Zhang2,3,6, Mark D Robinson1,2, Sanie Mnaimneh2, Richard Chang2, Qun Pan2, Eric Sat4, Janet Rossant3,4, Benoit G Bruneau3,5, Jane E Aubin3, Benjamin J Blencowe2,3 & Timothy R Hughes2,3
Recent mammalian microarray experiments detected widespread transcription and indicated that there may be many undiscovered multiple-exon protein-coding genes. To explore this possibility, we labeled cDNA from unamplified, polyadenylation-selected RNA samples from 37 mouse tissues to microarrays encompassing 1.14 million exon probes. We analyzed these data using GenRate, a Bayesian algorithm that uses a genome-wide scoring function in a factor graph to infer genes. At a stringent exon false detection rate of 2.7%, GenRate detected 12,145 gene-length transcripts and confirmed 81% of the 10,000 most highly expressed known genes. Notably, our analysis showed that most of the 155,839 exons detected by GenRate were associated with known genes, providing microarray-based evidence that most multiple-exon genes have already been identified. GenRate also detected tens of thousands of potential new exons and reconciled discrepancies in current cDNA databases by ‘stitching’ new transcribed regions into previously annotated genes.

Mammalian genome and transcript sequencing efforts indicate that most protein-coding genes have already been identified1. But microarray-based analyses suggest that polyadenylated transcripts are produced from a considerably larger proportion of the genome, including regions that are conserved and seem to be noncoding, as well as regions that contain potential coding exons2.

To reconcile this discrepancy, we reasoned that much of the functional mammalian transcriptome could be rapidly identified and characterized by surveying exon expression across multiple normal tissues, because most known genes consist of exons and are expressed at different levels across tissues and developmental states.

Table 1 Compositions of the 12 mRNA pools analyzed
Composition (mRNA per array hybridization)
Heart (2 mg), skeletal muscle (2 mg)
Whole brain (1.5 mg), cerebellum (0.48 mg), olfactory bulb (0.15 mg)
Colon (0.96 mg), intestine (1.04 mg)
Testis (3 mg), epididymis (0.4 mg)
Femur (0.9 mg), knee (0.4 mg), calvaria (0.06 mg), teeth and mandible (1.3 mg), teeth (0.4 mg)
15-d embryo (1.3 mg), 12.5-d embryo (12.5 mg), 9.5-d embryo (0.3 mg), 14.5-d embryo head (0.25 mg), embryonic stem cells (0.24 mg)
Digit (1.3 mg), tongue (0.6 mg), trachea (0.15 mg)
Pancreas (1 mg), mammary gland (0.9 mg), adrenal gland (0.25 mg), prostate gland (0.25 mg)
Salivary gland (1.26 mg), lymph node (0.74 mg)
12.5-d placenta (1.15 mg), 9.5-d placenta (0.5 mg), 15-d placenta (0.35 mg)
Lung (1 mg), kidney (1 mg), adipose tissue (1 mg), bladder (0.05 mg)

We designed microarrays3 containing 1,140,421 sequences selected from the combined outputs of five exon-finding and gene-like sequence detection algorithms (GenScan4, HMMGene5, GrailEXP6, BlastX and BlastN) applied to the mouse genome7. We expected the resulting set of putative exons to have broad coverage because we used low stringency settings for the search algorithms (Supplementary Methods online). The resulting set of putative exons was more than five times larger than the set of exons in known genes. We analyzed data from a previous study8 to select twelve tissue pools, encompassing 37 different tissue samples (Table 1), in a way that maximizes both differential expression between pools and global activity in every pool (Supplementary Methods online). We analyzed wild-type mouse tissues, rather than cell lines, to ensure that genes contained in the pools were expressed under normal physiological conditions. To achieve high fidelity, we hybridized unamplified first-strand fluor-labeled cDNA obtained from polyadenylation-purified mRNA primed with oligo-dT and random nonamers (Supplementary Methods
1Electrical and Computer Engineering, University of Toronto, 10 King's College Rd., Toronto, Ontario M5S 3G4, Canada. 2Banting and Best Department of Medical Research, University of Toronto, 112 College St., Toronto, Ontario M5G 1L6, Canada. 3Medical Genetics and Microbiology, University of Toronto, 1 King's College Ct., Toronto, Ontario M5S 3G4, Canada. 4Samuel Lunenfeld Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, Ontario M5G 1X5, Canada. 5The Hospital for Sick Children, 555 University Ave., Toronto, Ontario M5G 1X8, Canada. 6These authors contributed equally to this work. Correspondence should be addressed to T.R.H. ([email protected]) or B.J.F. ([email protected]).
Published online 28 August 2005; doi:10.1038/ng1630
NATURE GENETICS ADVANCE ONLINE PUBLICATION
online). This technique generated a data matrix of 1,140,421 exon expression profiles across the 12 tissue pools. Figure 1a shows a subset of the expression data. The data are available from our project website, along with an interface (linked to the University of California Santa Cruz genome browser9) that enables browsing of microarray data, ab initio exon predictions, mappings of known genes and genes predicted by our analysis.

Detection of transcripts in exon and genome tiling data is influenced by cross-hybridization, probe sensitivity, probe noise and experimental procedures, among other variables. Previous analyses detected transcripts by applying thresholds to individual signal intensities, correlations of coregulation patterns in multiple samples, the number of consecutive probes that constitute a ‘hit’ and genomic distances between probes10–15. Substantial increases in detection sensitivity can be gained by analyzing multiple samples jointly. In particular, because most multiple-exon protein-coding mammalian genes vary in expression to some extent across tissues8, a subset of similar expression profiles from probes derived from putative exons that are close to each other in the genome can be taken as evidence of a functional transcript10. A disadvantage of previous applications of this approach is that decisions to link putative exons are irreversible. In particular, a decision to assign a probe to a particular transcript removes the probe from further consideration, even if another transcript that is better suited to the probe emerges later in the analysis.

To carry out a global analysis of our microarray data, we derived a genome-wide scoring function that describes relationships between hidden variables and expression profiles. Figure 1b provides a graphical depiction of the relationships between n microarray probe signals, each containing 12 expression levels, and hidden
[Figure 1 labels. (a) Transcripts: NM_027491, NM_019586. (b) Factor graph layers: transcription start/stop indicator; relative index of CoReg prototype; exon versus nonexon indicator; probe sensitivity and noise; probe signals (each with 12 values); probes ordered according to position in genome.]
Figure 1 Example of results and illustration of analysis method. (a) Sample of exon-resolution data and GenRate output from the positive strand of chromosome 4, map positions 32833512–33300999 (build 33) from left to right. Colored rows indicate the origin of the exon prediction, sequence matches to cDNAs in six databases, the expression data (scaled from minimum to maximum), the maximum log-probe intensity and the GenRate-predicted genes at 2.7% exon FDER. A change in color of a cDNA database match indicates the beginning of a different transcript. Similar views for the entire data set are available at our project website. This example shows that GenRate can correctly connect together erroneously disjoined sequences in gene databases and that coregulation across tissues can be more useful than probe intensity (purple or black track) for detecting genes. HS, human; MM, mouse. (b) The factor graph that GenRate uses to find CoRegs in microarray data. Each black box corresponds to a local scoring function that depends on nearby hidden and observed variables, as indicated.
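As a rough illustration of how a max-product search over a chain of local scoring functions can find a globally best labeling, the following Python sketch runs the log-domain (max-sum) form on a toy chain in which each probe has a single hidden binary state. The state names, the local scores and the function `max_sum_chain` are assumptions made for this example only; GenRate's actual factor graph has more hidden variables per probe and different local functions.

```python
from typing import List, Sequence, Tuple

STATES = ("nonexon", "exon")  # hypothetical hidden states, one per probe


def max_sum_chain(
    unary: Sequence[Sequence[float]],
    pairwise: Sequence[Sequence[float]],
) -> Tuple[float, List[str]]:
    """Find the highest-scoring state sequence on a chain.

    unary[i][s]    : local log-score of probe i being in state s
    pairwise[s][t] : local log-score of state s followed by state t
    The global score is the sum of all local log-scores, i.e. the log
    of the product that max-product maximizes.
    """
    n, k = len(unary), len(STATES)
    best = [list(unary[0])]      # best[i][s]: best score of a path ending in s
    back: List[List[int]] = []   # back-pointers for recovering the argmax
    for i in range(1, n):
        row, ptr = [], []
        for t in range(k):
            cands = [best[-1][s] + pairwise[s][t] for s in range(k)]
            s_best = max(range(k), key=cands.__getitem__)
            row.append(cands[s_best] + unary[i][t])
            ptr.append(s_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(k), key=best[-1].__getitem__)
    score = best[-1][state]
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return score, [STATES[s] for s in path]


# Toy data (invented): five probes; the middle three look exon-like.
unary = [[0.0, -2.0], [-1.0, 0.5], [-1.0, 1.0], [-1.0, 0.5], [0.0, -2.0]]
pairwise = [[0.0, -0.5], [-0.5, 0.0]]  # switching states costs 0.5
score, labels = max_sum_chain(unary, pairwise)
```

Because every probe's label is chosen jointly, a weakly scoring probe can still be labeled as an exon when its neighbors support it, which is the behavior that per-probe thresholding cannot reproduce.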
[Figure 2 labels: known RefSeq/Ensembl exons in mouse+human; probes detected by GenRate; non-RefSeq probes; GenRate (5–10); GenRate (11–30); probes not detected by GenRate; log (signal intensity).]
Figure 2 GenRate detects exons with high sensitivity and high specificity. (a) A comparison of exons detected by GenRate with mouse and human exons in RefSeq and Ensembl, in addition to exons in the FANTOM2 and Unigene databases. (b) Distributions of maximum probe signal intensities for RefSeq exons, probes detected by GenRate, non-RefSeq exons and probes not detected by GenRate. GenRate detects many probes that have low intensity but correspond to known exons and rejects many probes that have high intensity but do not correspond to known exons. (c) Accuracy versus recall of GenRate for various gene size categories (number of exons), and the method of thresholding the probe intensity. A comparison with a previously reported system15 (Bertone) using closely corresponding regions of the mouse and human genomes (Chromosome X) shows that GenRate achieves higher accuracy and recall. The portion of each recall level expected to be due to false detections is indicated for several points on the plots.
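The accuracy-versus-recall curve for the intensity-thresholding baseline in panel c can be computed roughly as follows. This is a minimal Python sketch under stated assumptions: accuracy is taken as the fraction of detected probes that map to reference exons, and recall as the fraction of reference exons that are detected. The function name `accuracy_recall_curve` and the toy values are illustrative and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def accuracy_recall_curve(
    max_intensity: Sequence[float],
    is_reference_exon: Sequence[bool],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, accuracy, recall) for each intensity threshold.

    A probe counts as detected when its maximum intensity across the
    tissue pools is at or above the threshold.
    """
    n_reference = sum(is_reference_exon)
    curve = []
    for thr in thresholds:
        detected = [ref for x, ref in zip(max_intensity, is_reference_exon) if x >= thr]
        n_detected = len(detected)
        n_true = sum(detected)
        accuracy = n_true / n_detected if n_detected else float("nan")
        recall = n_true / n_reference if n_reference else float("nan")
        curve.append((thr, accuracy, recall))
    return curve


# Toy example (invented): six probes, four of which map to reference exons.
intensity = [9.1, 8.4, 7.7, 6.0, 5.2, 3.3]
reference = [True, True, False, True, False, True]
curve = accuracy_recall_curve(intensity, reference, thresholds=[8.0, 6.0, 3.0])
```

Lowering the threshold raises recall but admits more probes that do not map to reference exons, which is the trade-off the plotted curves summarize.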
variables, including transcription start and stop sites; relative transcript length; a true or false flag for each putative exon; and the sensitivity, cross-hybridization level and additive noise level for each probe. This network is formally called a factor graph16, and the global score (probability distribution) is equal to the sum (product) of a large number of local scores (probability functions). Each local score reflects how well neighboring observations and hidden variables match. Our technique, called GenRate (generative model for finding and rating transcripts), uses the max-product algorithm to efficiently find the globally optimum score (B.J.F., Q.D.M. & T.R.H., unpublished results) and identifies sets of probe signals, called CoRegs, that represent coregulated transcription. By maximizing a global scoring function, GenRate achieves higher sensitivities than standard clustering techniques (Supplementary Methods online).

To validate the reliability of the predictions made by GenRate, we used a permutation test (i.e., randomly reordering the probes) to estimate exon and CoReg false detection rates (FDERs), the fraction of detections that are expected to be false. To limit effects of cross-hybridization noise, we applied GenRate to the 837,251 probes that map uniquely to build 33 of the mouse genome. By varying GenRate's sensitivity, we obtained exon and CoReg FDERs varying from 0.13% to 32% and from 0.2% to 37%, respectively.

To test GenRate's ability to recover previously annotated exons, we compared our predictions with exons mapped from human and mouse genes in six cDNA databases, as well as transcripts detected in a recent human liver microarray analysis15. At a stringent exon FDER of 2.7%, GenRate detected 155,839 exons (4,186 expected false detections) comprising 12,145 CoRegs. GenRate detected 64% of the exons in the 17,577 RefSeq Golden Path mouse genes and identified
70,913 putative new exons. We next expanded our comparison set to include all mouse and human genes in the RefSeq and Ensembl cDNA databases (Fig. 2a). GenRate detected 116,118 (52%) of the exons in these databases and identified 39,721 putative new exons. We also expanded our comparison set to include all previous annotations, including poorly characterized expressed-sequence tags (ESTs) and cDNAs in the FANTOM2 and Unigene databases (Fig. 2).

Notably, GenRate detected known exons whose probe signal intensities were below the median intensity and rejected putative exons whose probe signals had high intensity but were not coregulated (Fig. 2b). We compared the sensitivity and specificity of GenRate with results from a recent study of expression in human liver15. At an FDER of 2.9%, their system identified 13,889 exon-size transcripts, 4,931 of which corresponded to previously annotated exons. At an FDER of 2.7%, GenRate detected ~11 times more exons (155,839) and confirmed ~24 times more previously annotated exons (116,118).

We also estimated accuracy using the fraction of detected exons that map to the reference set of RefSeq genes. Figure 2c shows the accuracy of GenRate as a function of the fraction of RefSeq exons that are detected (recall), for various sizes of genes. As expected, the recall of GenRate for exons in short genes (<5 exons) was low, because there is less evidence of coregulation. Figure 2c also shows the accuracy versus recall obtained by applying a threshold to the maximum intensity for each probe. For all but high levels of recall, where false detections are expected to dominate predictions for all methods, GenRate had substantially higher accuracy than intensity thresholding. We compared the accuracy versus recall of our system with the previously reported system15 on the X chromosome. GenRate achieved higher accuracy over a much wider range of recall levels (Fig. 2c) and achieved higher recall levels with a much lower fraction of expected false positives. Intensity thresholding in our data also achieved substantially higher accuracies than the previously reported system15, partly because we used a wider selection of tissues.

We next studied all CoRegs detected by GenRate and how they compared with RefSeq genes. At an exon FDER of 2.7%, GenRate detected 12,145 CoRegs, of which 412 (3.4%) were expected to be false detections. Figure 1a shows a sample of the output at this FDER and shows two general trends: (i) long transcripts, which are the most difficult to clone, could be identified by this approach; and (ii) coregulation of expression among adjacent probes yielded substantially different predicted transcripts than would be identified by probe intensity level alone. The mean and median number of exons per CoReg were 12.8 and 10, respectively, and the mean and median genomic lengths were 67,592 bp and 29,483 bp, respectively. GenRate detected 11,395 (51%) RefSeq genes, including 8,121 (81%) of the 10,000 RefSeq genes most highly expressed in our data.

[Figure 3 axis labels: detected CoRegs; new detected CoRegs; false negative calls (%); number of detected CoRegs; number of false detections.]
Figure 3 Performance of GenRate on detecting genes. (a) False detection analysis. The number of CoRegs identified in the randomized probe data (horizontal axis; average over ten repetitions) and the original probe order (vertical axis) is plotted. (b) False negative analysis. For each false detection level (horizontal axis), the fraction of false negative calls among RefSeq genes (vertical axis) is plotted.

[Figure 4 chart labels as extracted (fragmentary). Panel ‘All detected exons’: Within RefSeq, mapped; with RefSeq 30,118; to EST/cDNA 2,547; 3′/5′ RefSeq; extensions 16,691; with RefSeq 40,791. Panel ‘Highly expressed detected exons (90th percentile)’: with RefSeq 2,562; Within RefSeq, mapped; 3′/5′ RefSeq; Golden Path 24,575; with RefSeq 5,314.]
Figure 4 New exons detected by GenRate and associated with RefSeq Golden Path genes are categorized by 3′ or 5′ extensions of known genes, bridges that join together known genes, new exons that map to an EST or cDNA in the FANTOM2 or Unigene database, new exons that can be stitched together with the known gene by a previously detected EST or cDNA, and new exons that do not map to any previously detected sequences. The expected number of false detections is 4,186. This analysis was repeated for new exons that were detected among the probes with maximum signal intensity above the 90th percentile. Among these exons, the fraction of completely new exons decreased and the fraction of new exons that are confirmed by ESTs or cDNAs that overlap with known genes increased.

Despite the high sensitivity of our system, we did not detect a substantial number of CoRegs consisting entirely of exons not included in any of the databases (Fig. 3a). At an exon FDER of 2.7%, only 332 of the CoRegs were entirely new and only 96 of these did not overlap substantially with RepeatMasker sequence (i.e., these CoRegs contain less than 10% of exons that map to RepeatMasker sequence). On average, 83 CoRegs detected in the randomly permuted data consisted entirely of new exons, suggesting that most, if not all, of the 96 new CoRegs not found in RepeatMasker are false detections. To confirm this prediction, we tested 35 of them by RT-PCR, using primers that bridge putative exons and distinguish spliced transcripts (Supplementary Table 1 online). For 18 of these, we obtained products of some form after repeated attempts, but sequencing confirmed in all cases that the product was aberrant amplification of either genomic sequence or nontargeted highly expressed
mRNAs. In contrast, we obtained correct RT-PCR products for >75% of known genes from the same samples in the first attempt (ref. 8 and data not shown), indicating that our RT-PCR technique was reliable.

How comprehensive is the set of CoRegs predicted by GenRate? Figure 3b shows the fraction of RefSeq genes not detected by GenRate (i.e., the rate of false negative calls) for the same numbers of false detections shown in Figure 3a, among the 10,000 RefSeq genes that were most highly expressed in our data. The low false negative rates in combination with the lack of a significant number of new CoRegs detected by GenRate provides compelling evidence that almost all the multiple-exon genes with expression in the 37 tissue pools we studied are already known.

We next examined the relationship between exons detected by GenRate and 17,577 well-characterized RefSeq Golden Path genes. Figure 4 shows the number of detected exons in each of several categories, including extensions of RefSeq genes, bridges joining RefSeq genes, new exons in RefSeq genes that map to cDNA or EST databases (FANTOM2, Unigene and Ensembl mouse and human) and completely new exons in RefSeq genes. To be especially stringent in making predictions, we repeated this analysis for exons detected by GenRate whose maximum probe intensities were above the 90th percentile.

Two lines of evidence indicate that most of the new exons that we identified are valid. First, there is a bias against new exons internal to RefSeq genes (Fig. 4), where errors or omissions are least likely, and a corresponding bias toward new exons flanking cDNAs in Unigene or FANTOM2, which are most likely to be incomplete. Second, we verified new exons by RT-PCR experiments. For example, CoReg BF_C4_2262 (Fig. 1a) is fragmented in current mouse cDNA and EST databases; virtually all the exons in this CoReg are contained in a single transcript of >11 kb (Supplementary Table 2 and Supplementary Fig. 1 online), which seems to be the mouse homolog of Midasin, the largest gene in yeast17, and which has not been completely identified by comparison to its human counterpart (Fig. 1a). More than half of the CoRegs included at least one exon that did not map to the best-matching cDNA, and so our analysis provides a revised view of the potential exon composition of mammalian genes.

Many of the new exons detected by GenRate reconcile discrepancies in current gene databases. For example, GenRate detects 3,266 non-RefSeq exons that are internal to RefSeq genes and are confirmed by sequences in the FANTOM2 or Unigene databases. There are also examples where GenRate detects a CoReg that bridges together distinct, neighboring cDNAs. Because such a CoReg may correspond to two separate but coregulated genes, we used RT-PCR to confirm that the longest such example in our data is expressed as a single transcript (Supplementary Table 2 online).

The International Human Genome Sequencing Consortium recently estimated that the human genome contains ~25,000 protein-coding genes (presumably, this is similar for mouse), of which most have already been identified by transcript sequencing1. In contrast, previous tiling microarray analyses10–15 focused on the discovery of thousands of new transcripts in intergenic regions, in introns and antisense to known genes. Although some of these have been confirmed by RT-PCR13–15 and, in some cases, distinct molecular species have been identified by northern blotting, rapid amplification of cDNA ends or cDNA cloning14,18,19, the function and origin of these transcripts is largely unknown. Thousands of putative new transcripts probably evolve at a neutral rate20, suggesting that their function (if any) is independent of sequence. These transcripts might be ‘cryptic’, potentially resulting from incomplete quality control in Pol II transcription21, or simply undegraded fragments of heterogeneous nuclear RNA, as more than half of the genome seems to be transcribed as pre-mRNA22. Our primary data also include strong signals from many isolated probes (Fig. 1). Our results support the view that most multiple-exon genes expressed in diverse tissues are already identified, although there are probably thousands of additional exons that are not currently annotated. Our study therefore reconciles a discrepancy between previous approaches to gene identification and, furthermore, extensively revises our knowledge of the exon composition of the mammalian genome.

Array design. To achieve broad coverage of putative exons, we used liberal detection criteria. The numbers of putative exons and of unique putative exons detected by each program were as follows: GenScan, 374,540 and 117,849; HMMGene, 385,759 and 159,523; GrailEXP, 307,911 and 139,906; BlastX, 327,746 and 32,869; and BlastN, 642,401 and 272,152. These yielded a total of 1,140,421 unique putative exons. Details of exon detection and probe selection are given in Supplementary Methods online. We selected a single Tm-balanced oligonucleotide to represent each exon on the basis of a scoring system that favors unique sequences without secondary structure and a minimum of simple repeats and homopolymeric runs. Six copies of each of 52 array designs, each containing 21,929 60-mer probes, were manufactured by Agilent Technologies. Sequences of probes are available from our project website, together with mappings to build 33 of the mouse genome.

Tissue pools. We combined the mRNA samples from tissues listed in Table 1 and reverse-transcribed them for each of the 52 array designs. Typical cDNA yields were 25–50% of the amount of mRNA input. Full details of pool selection, tissue sources and RNA preparation are given in Supplementary Methods online.

Varying the sensitivity of GenRate and permutation-based estimates of the exon and CoReg FDERs. We estimated exon and CoReg FDERs as a function of the parameters used in the GenRate analysis. GenRate is a deterministic algorithm with three parameters: the probability y that a probe is at the start of a CoReg; the probability that a probe in a CoReg corresponds to an exon; and the average number of probes encompassed by a CoReg. The analysis is most dependent on the first parameter, y, which determines the number of CoRegs that are detected (i.e., the sensitivity of the system) and the FDER. We report results as a function of FDER. The analysis is much less sensitive to the other two parameters, which we set to 0.7 and 20, respectively, using estimates obtained by mapping known human and mouse genes to our probe set. To estimate the FDER, we applied GenRate to a version of our data in which the probes were randomly reordered on a chip-by-chip basis (disrupting their order on the chromosome) and repeated this process ten times to obtain an accurate estimate. By varying y, we obtained exon and CoReg FDERs varying from 0.13% to 32% and from 0.2% to 37%, respectively.

Mapping known human and mouse genes to our probe set. We compared the chromosomal locations of exons in mouse RefSeq Golden Path genes (build 33) directly with our probe locations, which were mapped to build 33. To include cDNA sequences not on the Golden Path, we used BLAT23 to map cDNA sequences in RefSeq24, Ensembl25, FANTOM2 and Unigene26 to the chromosomes of build 33 of the mouse genome. To minimize false discovery of genes by cross-hybridization3, we allowed all probes with 19 in 20 contiguous nucleotide matches to a cDNA to be considered a match. With the exception of Unigene, more than 90% of genes in these five databases were represented by probes on our arrays. We also mapped all 33,930 (28,374 known and 5,556 ‘novel’) of the Ensembl25 human genes in a similar fashion, using E < 10^-4 (BLAST) as a cut-off for identity to the array probe.

Comparison with previous results15 on the X chromosome. We mapped all mouse exons for which we have probes to the human-mouse homologous regions of the X chromosome, using the two-way human-mouse BlastZ alignments downloaded from the University of California Santa Cruz in March 2005. All probes that were previously designed for the human X chromosome15
were lined up with the corresponding matched human coordinates in the homologous regions. We constructed an evaluation set of 6,699 putative exons from aligned sequences that include at least one probe from our system and one probe from the previously reported system15. We found that 3,447 (52%) of these map to exons in the reference set of mouse RefSeq genes.

We normalized both microarray data sets on a chip-by-chip basis by applying an affine transformation to the probe signals on each chip, so that the median probe signal and the difference between the 75th percentile and 25th percentile were the same across all chips. To compare the accuracy and recall of GenRate against those of the previously reported system15, we varied the probe intensity threshold in their method from the 20th percentile to the 90th percentile, obtaining multiple analyses with different sensitivities. The sensitivity of GenRate was varied as described previously.

Comparing GenRate CoRegs with RefSeq genes. A RefSeq gene was considered to be detected if at least one half or at least five of the exons in the gene were detected by GenRate. To determine the set of most highly expressed RefSeq genes, we computed a total expression level for every RefSeq gene. To limit the effects of probe sensitivity, we determined the total expression of a RefSeq gene by counting the number of exons with maximum probe signal (over the 12 tissue pools) in excess of 20. In a previous study8, this threshold was useful in distinguishing positive signals from negative controls.

URL. Our project website is available at http://www.psi.toronto.edu/genrate/.

Accession codes. GEO, GSE3047.

Note: Supplementary information is available on the Nature Genetics website.

We thank G.E. Hinton for conversations and C. Boone and B. Andrews for their support. This work was supported by grants from the Canadian Institutes of Health Research, the Natural Sciences and Engineering Research Council of Canada and the Canadian Foundation for Innovation (to T.R.H., B.J.F. and B.J.B.), by a PREA award (to B.J.F.) and by a Natural Sciences and Engineering Research Council of Canada postdoctoral fellowship (to Q.D.M.).

COMPETING INTERESTS STATEMENT
The authors declare that they have no competing financial interests.

Received 17 June; accepted 28 July 2005
Published online at http://www.nature.com/naturegenetics/

1. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945 (2004).
2. Schadt, E.E. et al. A comprehensive transcript index of the human genome generated using microarrays and computational approaches. Genome Biol. 5, R73 (2004).
3. Hughes, T.R. et al. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat. Biotechnol. 19, 342–347 (2001).
4. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
5. Krogh, A. Two methods for improving performance of an HMM and their application for gene finding. Proc. Int. Conf. Intell. Syst. Mol. Biol. 5, 179–186 (1997).
6. Xu, Y., Mural, R.J. & Uberbacher, E.C. Inferring gene structures in genomic sequences using pattern recognition and expressed sequence tags. Proc. Int. Conf. Intell. Syst. Mol. Biol. 5, 344–353 (1997).
7. Waterston, R.H. et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
8. Zhang, W. et al. The functional landscape of mouse gene expression. J. Biol. 3, 21
9. Karolchik, D. et al. The UCSC genome browser database. Nucleic Acids Res. 31, 51–54 (2003).
10. Shoemaker, D.D. et al. Experimental annotation of the human genome using microarray technology. Nature 409, 922–927 (2001).
11. Stolc, V. et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242–2246 (2004).
12. Yamada, K. et al. Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 302, 842–846 (2003).
13. Kapranov, P. et al. Large-scale transcriptional activity in Chromosomes 21 and 22. Science 296, 916–919 (2002).
14. Rinn, J.L. et al. The transcriptional activity of human Chromosome 22. Genes Dev. 17, 529–540 (2003).
15. Bertone, P. et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242–2246 (2004).
16. Kschischang, F.R., Frey, B.J. & Loeliger, H.A. Factor graphs and the sum-product algorithm. IEEE Trans. Inf. Theory 47, 498–519 (2001).
17. Garbarino, J.E. & Gibbons, I.R. Expression and genomic analysis of midasin, a novel and highly conserved AAA protein distantly related to dynein. BMC Genomics 3, 18 (2002).
18. Okazaki, Y. et al. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420, 563–573 (2002).
19. Cheng, J. et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308, 1149–1154 (2005).
20. Wang, J. et al. Mouse transcriptome: Neutral evolution of ‘non-coding’ complementary DNAs (reply). Nature 431, 757 (2004).
21. Wyers, F. et al. Cryptic Pol II transcripts are degraded by a nuclear quality control pathway involving a new poly(A) polymerase. Cell 121, 725–737 (2005).
22. Wong, G.K., Passey, D.A. & Yu, J. Most of the human genome is transcribed. Genome Res. 11, 1975–1977 (2001).
23. Kent, W.J. BLAT - The BLAST-like alignment tool. Genome Res. 12, 656–664
24. Pruitt, K.D., Tatusova, T. & Maglott, D.R. NCBI Reference Sequence Project: update and current status. Nucleic Acids Res. 31, 34–37 (2003).
25. Hubbard, T. et al. Ensembl 2005. Nucleic Acids Res. 33, D447–D453 (2005).
26. Pontius, J.U., Wagner, L. & Schuler, G.D. Unigene: A unified view of the transcriptome. in The NCBI Handbook (National Center for Biotechnology Information, Bethesda,
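The permutation-based FDER estimate described in the Methods (applying the detector to randomly reordered probes, repeated ten times) can be sketched in Python as follows. This is only a sketch under stated assumptions: `detect` is a stand-in callable rather than GenRate, the toy detector `count_runs` and its cutoff are invented for the example, and the probes are shuffled as one flat list rather than chip by chip.

```python
import random
from typing import Callable, Sequence


def estimate_fder(
    probes: Sequence[float],
    detect: Callable[[Sequence[float]], int],
    n_permutations: int = 10,
    seed: int = 0,
) -> float:
    """Estimate the false detection rate by permuting probe order.

    detect(probes) must return the number of detections made on the
    probes in the given order. Detections made after shuffling cannot
    reflect genomic adjacency, so their average count estimates the
    number of false detections in the original order.
    """
    rng = random.Random(seed)
    n_original = detect(probes)
    if n_original == 0:
        return float("nan")
    total_false = 0
    for _ in range(n_permutations):
        shuffled = list(probes)
        rng.shuffle(shuffled)
        total_false += detect(shuffled)
    expected_false = total_false / n_permutations
    return expected_false / n_original


def count_runs(probes: Sequence[float], cutoff: float = 1.0, min_len: int = 3) -> int:
    """Toy detector (hypothetical): count runs of >= min_len adjacent probes above cutoff."""
    runs, length = 0, 0
    for x in probes:
        length = length + 1 if x > cutoff else 0
        if length == min_len:  # count each run once, when it first reaches min_len
            runs += 1
    return runs


# Toy signal (invented): two blocks of high probes separated by background.
signal = [0.1] * 20 + [2.0] * 4 + [0.1] * 20 + [2.0] * 4 + [0.1] * 20
fder = estimate_fder(signal, count_runs)
```

The same ratio underlies the figures reported in the text: 412 expected false CoRegs out of 12,145 detected is about 3.4%, and 4,186 expected false exons out of 155,839 detected is about 2.7%.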
Source: http://genes.toronto.edu/genrate/FreyNG2005.pdf