Doi:10.1016/j.jphysparis.2005.12.084
Journal of Physiology - Paris 99 (2006) 232–244
Development and virtual screening of target libraries
Bioinformatics of the Drug, CNRS, UMR 7175, 74 route du Rhin, F-67400 Illkirch, France
The concomitant development of in silico screening technologies and of three-dimensional information on therapeutically relevant
macromolecular targets makes it possible to navigate in the structural proteome and to identify targets fulfilling user-defined queries.
This review illustrates some recent in-house advances in the development of target libraries and how they can be browsed to unravel chemogenomic information.
© 2005 Elsevier Ltd. All rights reserved.
Keywords: Virtual screening; Docking; Chemogenomics
Fax: +33 3 90 24 42 35.
Virtual screening of compound libraries has recently gained considerable importance in early hit finding programs, notably when technological or economic hurdles disfavor experimental screening. Numerous successful applications of either ligand-based or structure-based in silico screening have been reported in the literature. Quite unexpectedly, the inverse paradigm still has not been deeply investigated. Given a set of ligands, is it possible to prioritize their most likely targets for experimental validation? Answering this question first requires the development of a library covering the most reliable target space. By target library, we mean here a collection of macromolecules for which the amino acid sequence and/or three-dimensional (3-D) coordinates are available and can be browsed using simple queries. Then, an appropriate screening method has to be set up which is able to select a panel of targets fulfilling requirements imposed by a ligand structure, a specific fingerprint or an evolutionary trace. Once a target library has been developed, several applications can be foreseen: (1) simply compare targets and whenever possible relevant ligand binding sites, (2) predict the most likely target(s) of a given ligand, (3) predict a selectivity profile for either a target or a ligand, (4) predict the ‘druggability’ of a given target from a structural point of view. All these issues require early answers in the evaluation of drug discovery programs. We will try to review each of these applications in the coming sections.
2. Setting up target libraries
When developing a target library, a first compromise between available information (notably at the structural level) and the therapeutical relevance of selected targets has to be made. Many proteins for which fine structural details are known (e.g. toxins, antibodies) are not ‘druggable’. Conversely, some important protein families for the pharmaceutical industry (e.g. G-protein-coupled receptors) are poorly understood at the 3-D level. Next, a scope has to be assigned to the library. Which target space has to be covered? Last, which kind of data (amino acid sequences, 3-D atomic coordinates) is browsed for defining a target list?
2.1. sc-PDB: a collection of active sites from the Protein Data Bank
2.1.1. Setting up the database
To establish the proof-of-concept that a protein library might be of screening interest, we have chosen the Protein
D. Rognan / Journal of Physiology - Paris 99 (2006) 232–244
Fig. 1. Flowchart for developing the sc-PDB databank.
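Two steps of this flowchart, described in Section 2.1.1, can be illustrated in a few lines: the distance-based definition of a binding site and the choice of the most buried ligand/site pair for a PDB entry. The Python sketch below is only an illustration under stated assumptions, not the sc-PDB code. The data layout, the function names and the dictionary keys are hypothetical, and the distance cutoff is left as a free parameter.

```python
from math import dist


def binding_site(protein_atoms, ligand_atoms, cutoff):
    """Residues having at least one heavy atom closer than `cutoff` to any ligand atom.

    protein_atoms: iterable of (residue_id, (x, y, z)) for protein heavy atoms
    ligand_atoms:  iterable of (x, y, z) for ligand heavy atoms
    cutoff:        distance threshold, in the same unit as the coordinates
    """
    ligand_atoms = list(ligand_atoms)
    site = set()
    for residue_id, xyz in protein_atoms:
        if residue_id in site:
            continue  # residue already selected through another atom
        if any(dist(xyz, lig) < cutoff for lig in ligand_atoms):
            site.add(residue_id)
    return site


def most_buried_pair(pairs):
    """Select the ligand/site pair with the highest percentage of burial.

    pairs: list of dicts with the (hypothetical) keys
           'ligand', 'buried_area' and 'total_area'.
    """
    return max(pairs, key=lambda p: 100.0 * p["buried_area"] / p["total_area"])
```

Keeping the cutoff as an argument makes the binding-site definition easy to tighten or relax without touching the rest of the pipeline.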
Data Bank (PDB) as it is the major 3-D protein database for which experimentally determined protein coordinates are available. Several protein–ligand databases derived from the PDB have been recently [...] easily allows retrieving protein–ligand complexes from a user-defined query focusing on specific molecular interactions. MSDsite is a database search and retrieval system for listing PDB entries fulfilling user-defined queries based on ligand information. The LPDB stores 195 high-resolution protein–ligand complexes and related physicochemical descriptors as well as binding constants. Its main purpose, as well as that of related protein–ligand datasets, is to provide reliable 3-D information for calibrating docking algorithms and scoring functions. The ProLINT database contains about 20,000 interaction data for two protein families (kinases, proteases) with attached information about the ligand, the protein, experimental binding constants and published literature. It has been used to derive structure–activity relationships and predict binding constants. LigBase is a database of ligand binding sites aligned with related protein structures and sequences containing 50,000 binding sites for heterogeneous ligands (ions, solvent, co-factors, inhibitors, etc.).
However, none of the above-mentioned databases are directly usable to generate a collection of ‘druggable’ protein active sites customized to accommodate small molecular-weight ‘drug-like’ ligands. Generally, no differences between solvent, detergent, co-factors and ligands (in the pharmaceutical sense) are made in the above-mentioned databases. To fill this gap, we recently developed a relational database (sc-PDB) specifically customized for screening purposes.
Starting from 27,000 PDB entries, a series of hierarchical filters has been applied to constitute the database as follows:
• removal of undesirable entries: low resolution (>2.5 Å) X-ray structures, NMR structures;
• on-the-fly detection of the molecule to which each referenced PDB atom belongs (target, organic ligand, peptide ligand, co-factor, ion, solvent, detergent) thanks to knowledge-based rules and preexisting lists of ‘HET’ codes defined in the PDBsum database;
• removal of undesirable small molecular-weight ligands (solvent, detergents, ions and co-factors exhibiting atom types not recognized by classical docking algorithms);
• definition of putative ligands (organic or peptidic molecules, co-factor if present alone);
• definition of the binding site for each ligand (collection of amino acids for which any heavy atom is closer than [...] Å from any ligand atom);
• prioritization of a single ligand/active site for each PDB entry by calculating the buried surface area of the ligand and of the site, and selecting the ligand/site pair for which the percentage of burial is the highest;
• storage, for each selected PDB entry, of 3-D atomic coordinates in readable PDB format (target, active site) and SD/MOL2 formats (ligand, co-factors, ions).
2.1.2. Annotating the database
The current version of the sc-PDB database contains 5947 ligand-binding sites for 2626 small molecules; in total, the database refers to 5947 PDB entries. We assigned a unique UniProt accession number to each protein, thereby identifying 1628 different proteins in the database. Additional information was collected from both UniProt and PDB databanks to obtain the source organism and the biological function of each protein. A functional classification of the database entries is shown in [...]. Entries were separated into two superfamilies, namely enzymatic and non-enzymatic proteins. Out of the 5947 different entries of the database, ca. 85% are enzymes with a well-referenced EC (Enzyme Commission) number. The distribution of enzyme families displayed in [...] reveals that the most populated family is that of hydrolases (35% of the enzymes). This is correlated to the high number of proteases in the sc-PDB database. [...]B gives an overview of the redundancy of current database entries. In most cases, less than 10 copies of an active site corresponding to a given protein are available in the database. The uneven distribution of protein entries, which reflects the intrinsic PDB redundancy, is of great interest for applications like virtual screening. Indeed, conformational differences between several copies of an active site reflect the local protein flexibility.
Fig. 2. sc-PDB content (release 3, March 2005): (A) distribution of enzymes and non-enzymes; (B) observed redundancy.
2.2. hGPCRDb: a collection of human non-olfactory GPCRs
2.2.1. Setting up the database
G Protein-Coupled Receptors (GPCRs) constitute a superfamily of membrane receptors of utmost importance in pharmaceutical research. Hence, GPCRs are the macromolecular targets of ca. 30% of marketed drugs. The first draft of the human genome suggests that over 800 genes encode a GPCR, out of which only a few (ca. 30) are currently addressed by marketed drugs. If one excludes the family of sensory receptors, about 400 GPCRs are potentially ‘druggable’ with ca. 120 proteins being still considered as orphan targets. Traditionally, the first stage in the design of GPCR ligands has focused on the potency of the ligands for the selected receptor target. Selectivity towards the host receptor is usually considered once potency has already been reached. It would however be highly desirable to consider selectivity as soon as possible in the design process. Ideally, one would like to consider the GPCR universe for designing a ligand with the desired selectivity profile. As addressing this issue by high-throughput screening is currently impossible, ‘in silico’ screening could provide a reasonable start. Indeed the recently described 2.8 Å-resolution X-ray structure of bovine rhodopsin provides a possible 3-D template for modeling other GPCRs. Recent reports unambiguously demonstrated that rhodopsin-based GPCR homology models are accurate enough to propose reliable 3-D models of receptors very different from bovine rhodopsin and to identify new ligands by structure-based virtual screening. Of course, using classical homology modeling to establish a 3-D target library including ca. 400 reliable 3-D models is not possible. We therefore designed a chemoinformatic tool (GPCRMod) specifically dedicated to high-throughput GPCR modeling. From the very beginning, several considerations were taken in the design of the code: (i) the target library should cover all human non-olfactory GPCRs, (ii) a reliable multiple alignment of all investigated GPCRs should be obtained at the seven-transmembrane (7-TM) domain only, acknowledging that high-throughput modeling of intra- and extra-cellular loops is not feasible, (iii) the 7-TM binding cavity of every 3-D model should not be biased by the X-ray structure of bovine rhodopsin.
In a first step, 372 human GPCR amino acid sequences were aligned at the 7-TM by browsing the target sequence for family-specific fingerprints and motifs. Then, alignments were converted into 3-D
Fig. 3. Multiple alignment flowchart in GPCRMod.
model using a comparative modeling tool that uses a set of ligand-biased GPCR models as main chain templates, and two rotamer libraries for side chain positioning. A key point of the modeling procedure is that 7-TM cavities are modeled starting from templates which prove useful to discriminate known ligands from decoys. Resulting 3-D models are qualitatively quite similar to those obtained by ligand-assisted comparative modeling but are obtained at a throughput allowing the fast generation of hundreds of targets.
2.2.2. Annotation of the hGPCRdb
Assuming that similar targets recognize similar ligands, an accurate annotation of all entries should consider similarities/differences at their binding cavity. As most small molecular-weight ligands probably bind to the 7-TM core, all GPCR entries have been annotated using a chemogenomic procedure considering a fingerprint characterizing their 7-TM binding cavity. Thirty positions lining the retinal binding site in bovine rhodopsin were extracted from all entries and concatenated into ungapped sequences out of which a phylogenetic tree could be derived using the standard UPGMA clustering method.
Twenty-two clusters could be unambiguously detected from the present analysis of 30 amino acid positions. These clusters were defined in order to encompass the maximum number of related entries within a branch characterized by the highest possible statistical bootstrap value. Thirty-four out of 372 entries could not be assigned to one of the existing 22 clusters and are defined as singletons. The herein presented tree is very similar to the most complete phylogenetic tree (GRAFS classification) known to date although the latter has been obtained from full TM sequences. In both classifications, GPCRs of the Frizzled, Glutamate, Secretin and Adhesion families cluster in well-separated groups whereas the large Rhodopsin family can be classified into 18
Fig. 4. 3-D model generation flowchart in GPCRMod.
Fig. 5. Two-step protocol to generate a TM cavity-driven phylogenetic tree: (1) selection of 30 critical positions, (2) definition of ungapped sequences describing the 7-TM cavity, (3) TM cavity-derived phylogenetic tree for 372 human GPCRs. The consensus tree was derived from 1000 replicas using amino acid identity within a set of 30 discontinuous positions to measure protein distances. Numbers in commas indicate the number of entries in each cluster. Numbers in italic represent bootstrap values to assess the statistical significance of the tree. Receptors classified as singletons (see text) are not displayed here for the sake of clarity. Glutamate, Rhodopsin, Adhesion, Frizzled and Secretin subfamilies are colored in green, cyan, yellow, pink and orange, respectively.
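The protocol summarized in this caption (selection of cavity positions, concatenation into ungapped sequences, identity-based distances, UPGMA clustering) can be sketched in Python as follows. This is a minimal illustration with hypothetical function names and toy inputs, not the in-house implementation: it uses a naive UPGMA and leaves out the bootstrap replicas used to build the consensus tree.

```python
from itertools import combinations


def cavity_fingerprint(aligned_seq, positions):
    """Concatenate the residues found at the selected alignment columns (0-based)."""
    return "".join(aligned_seq[i] for i in positions)


def identity_distance(fp_a, fp_b):
    """1 minus the fraction of identical residues between two equal-length fingerprints."""
    matches = sum(a == b for a, b in zip(fp_a, fp_b))
    return 1.0 - matches / len(fp_a)


def upgma(names, fingerprints):
    """Naive UPGMA: repeatedly merge the two clusters with the smallest
    average pairwise distance. Returns the tree as nested tuples of names."""
    # Each cluster is (subtree, list of member indices).
    clusters = [(name, [i]) for i, name in enumerate(names)]
    d = {(i, j): identity_distance(fingerprints[i], fingerprints[j])
         for i, j in combinations(range(len(names)), 2)}

    def average(members_a, members_b):
        total = sum(d[min(i, j), max(i, j)] for i in members_a for j in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: average(clusters[ab[0]][1], clusters[ab[1]][1]))
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]
```

In practice the fingerprints would hold 30 residues per receptor; the same functions apply unchanged to any fixed set of alignment columns.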
different clusters. Remarkably, all known GPCR subfamilies (e.g. receptors for biogenic amines, purines, and chemokines) are reproduced with high bootstrap support. The five main families (Glutamate, Rhodopsin, Adhesion, Frizzled, Secretin) reported in the GRAFS classification are recovered with no overlaps between the corresponding clusters with the single exception of Q9GZN0 (GPR88), a rhodopsin-like GPCR clustered with class III GPCRs. Interestingly, receptors for which the orthosteric binding site is not located in the TM domain (Adhesion, Secretin and Glutamate families) are nevertheless grouped into homogeneous clusters. Relating cluster members to precise molecular features is here greatly facilitated by the analysis of a small subset of amino acids. For each of the 22 clusters, there is often a clear relationship between known ligand chemotypes (e.g. amines, carboxylic acids, phosphates, peptides, eicosanoids, and lipids) and the cognate TM cavities. For example, receptors for bulky ligands (e.g. phospholipids, prostanoids) have a TM cavity significantly larger than that for smaller compounds (e.g. biogenic amines, nucleotides). Receptors for charged ligands (cationic amines, phosphates, mono and di-carboxylic acids) always present among the 30 critical residues one or more conserved amino acids exhibiting the opposite charge (e.g. Asp3.32 for biogenic amines; Asp4.60/Glu7.39 for chemokines; Arg3.29/Lys6.55/Arg7.35 for nucleotides).
Our clustering approach implies two assumptions: (i) the overall fold of the 7-TM domain around the binding cavity has been conserved along evolution; (ii) critical hotspots spread over the 7-TM domain repeatedly account for ligand binding. Although solid biostructural data for the three most important GPCR classes (class I, class II, class III) are missing, numerous experimental data do provide evidence in favor of strong similarities among many GPCRs: (i) residues known to affect small molecular-weight ligand binding to unrelated GPCRs are mostly spread among the herein selected 30 residues, suggesting a common architecture of the TM pocket, (ii) many known ligands are promiscuous for even unrelated GPCRs and are usually anchored through so-called privileged structures to common subpockets of different GPCRs. Of course, we are aware that class II and class III GPCRs exhibit an additional orthosteric site located outside the 7-TM bundle. Therefore, conclusions drawn herein only apply to the 7-TM binding site.
3. Screening target libraries
Provided that a target library has been set up, two screening methods are possible. In a 1-D screening, a query enclosing amino acid sequence information (e.g. fingerprint) is used to parse family-specific alignments in order to retrieve interesting targets. In a 3-D screening, the 3-D structure of either a ligand or a known active site is used to browse 3-D structures or homology models. Both applications will be detailed in the following section.
Fig. 6. Target library screening flowchart.
3.1. 1-D screening
Simple 1-D screening is less precise than 3-D screening but also less sensitive to errors. When applied to entire target families (e.g. GPCRs, kinases), its accuracy only depends on the quality of the sequence alignment, which is generally much higher than that of 3-D structural models. Assuming that similar ligands should bind to similar cavities, browsing a database of sequence alignments can easily provide access to reliable information if specific fingerprints are already known. Three possible applications of 1-D screening of a GPCR target library are presented here.
3.1.1. Searching for orthologs/paralogs
The amino acid sequence of GPCRs is extremely variable in length (from 290 to 6300 residues for human GPCRs), notably at extra- and intra-cellular loops. Basing receptor comparisons on full sequence alignments may thus be quite misleading. Comparing the above-defined TM cavity-lining residues is much more appropriate. For any GPCR target of interest, these 30 residues can be identified quite unambiguously, at least for rhodopsin-like GPCRs, as several class-specific TM fingerprints previously identified in this family of receptors can guide the sequence alignment.
As an example, we have been looking for the human ortholog(s) of a gene product from C. elegans (Y22D7AR_13) in order to predict the functional role of this presumed GPCR. Blasting its full amino acid sequence against human GPCRs leads to ambiguous conclusions because the level of sequence identity with the closest human GPCRs is low (usually in the 15–30% range) and several candidates are possible. Looking at local sequence identity within a set of 30 TM cavity-lining residues provides an answer that is easier to interpret because the sequence identities with the input query are much higher (above 70% for the first three 5-HT receptors). Since 7 out of the top 10 ranked candidates were 5-HT receptors, the C. elegans gene product was predicted to be a receptor for serotonin, which was further experimentally validated (Segalat, personal communication). The proposed approach has the merit of being extremely fast (a few ms) but requires the a priori identification of the 7-TMs and a good sequence alignment of the latter domain. Therefore, the presence of TM fingerprints (usually
Table 1. Searching for the 10 closest human orthologs of the C. elegans Y22D7AR_13 gene product
Full sequence blast
Sequence identity, %
Sequence identity, %
a Sequence comparison achieved using standard settings of the BLASTP program.
b Sequence comparison achieved using our in-house GPCRfind program.
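The cavity-restricted comparison described in Section 3.1.1 amounts to ranking library entries by sequence identity over a fixed set of aligned positions. The Python sketch below illustrates that idea only; it is not the GPCRfind program, and the function names and the dictionary-based library layout are assumptions made for the example.

```python
def percent_identity(query_fp, target_fp):
    """Percentage of identical residues between two equal-length cavity fingerprints."""
    if len(query_fp) != len(target_fp):
        raise ValueError("fingerprints must have the same length")
    same = sum(q == t for q, t in zip(query_fp, target_fp))
    return 100.0 * same / len(query_fp)


def closest_targets(query_fp, library, top_n=10):
    """Rank library entries by decreasing identity to the query fingerprint.

    library: dict mapping a receptor name to its cavity fingerprint
    returns: list of (name, percent identity) for the top_n entries
    """
    scored = [(name, percent_identity(query_fp, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]
```

Because each comparison only touches a few dozen characters, scanning a few hundred receptors is essentially instantaneous, in keeping with the speed argument made in the text.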
present in nearly all entries) in the input query is a prerequisite.
3.1.2. Computer-guided target deorphanization
A TM-cavity biased phylogenetic tree offers the opportunity to navigate in target space without the necessity to rely on questionable 3-D structures. Receptors close in target space can be expected to bind ligands close in chemical space. Known GPCR ligands are thus a good starting point for deorphanizing receptors predicted to be close enough to liganded receptors. For example, focusing our cavity-based tree on two related orphan receptors (GPR19, GPR83) predicts a sig- [...] (NK1R, NK2R, NK3R). Likewise, GPR54 is predicted to be close to three galanin receptor subtypes (GALR, GALS, and GALT). Therefore, a rational start to find ligands for these three orphans would be first to test known ligands for neurokinin and galanin receptors, respectively. An experimental validation of this approach has been recently reported by scientists at 7TM-Pharma who identified ligands for the CRTh2 (GPR44) receptor by evaluating angiotensin 2 receptor (AG2R, AG2S) ligands, the corresponding targets being close when considering the 7-TM cavity.
3.1.3. Matching target space with ligand space
GPCR ligands sharing a common privileged structure and exhibiting promiscuous binding to unrelated GPCRs are currently an important source for GPCR library design. Assuming that conserved moieties of the ligands are likely to bind to conserved subsites of the targets, matching privileged structures with TM hotspots can be achieved very easily without biasing the match by a manual or automated 3-D docking.
As an example, biphenyltetrazoles and biphenylcarboxylic acids are known to bind to at least six GPCRs (AG22, AG2R, AG2S, GHSR, L4R1, L4R2) [...] details of 3-D recognition of this privileged substructure by GPCR hotspots have been recently proposed by a thorough mutagenesis-guided manual docking of several
Table 2. Possible ligand source for some orphan GPCRs
Orphan receptor(s) a
GABA-B allosteric ligands
CaSR allosteric ligands
LH/FSH nonpeptide ligands
Cannabinoid receptors ligands
Tachykinin receptors ligands
Galanin receptor ligands
Oxytocin/vasopressin receptor ligands
O14804, GP57, GP58
Biogenic amine receptors ligands
Brain-gut peptides
Neuromedin U receptors ligands
Chemokine receptor ligands
Somatostatin receptor ligands
GP15, GP25, GP44, GPR1
Angiotensin II receptor ligands
LPC/SPC receptor ligands
Purinergic nucleotide receptor ligands
GP17, GP34, FK79, P2YA
Cysteinyl Leukotriene receptor ligands
a Receptors are labeled according to their UniProt entry name.
b For cluster definition, see .
Fig. 7. Close-up of the peptide receptors cluster.
Fig. 8. Matching privileged structures of known GPCR ligands to TM hotspots. An in-house GPCR ligands database is searched to retrieve privileged structures common to multiple GPCRs and to find conserved residues within the 7-TM cavity of selected entries. Browsing the in-house GPCR cavity database (sequence of 30 critical positions lining the 7-TM cavity of 372 human GPCRs) allows retrieval of new GPCR entries satisfying the query and likely to accommodate the privileged structure.
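Browsing a cavity database with such a query can be pictured as a residue filter over labeled positions. The Python sketch below encodes the biphenyl-tetrazole/carboxylate example of Section 3.1.3 (aromatic residues at 6.44, 6.48, 6.51 and 7.43, His/Gln at 6.52, and a basic residue at 5.42, 6.55 or 7.35). The residue sets follow the text, but the way they are combined into a query, the data layout and the function names are assumptions made for illustration.

```python
# Positions that must all carry one of the listed residues (one-letter codes).
REQUIRED = {
    "6.44": {"F"},
    "6.48": {"W"},
    "6.51": {"F", "Y", "H"},
    "7.43": {"F", "Y"},
    "6.52": {"H", "Q"},
}
# At least one of these basic anchoring residues must be present.
BASIC_ANCHORS = {"5.42": {"K"}, "6.55": {"R"}, "7.35": {"R"}}


def matches_query(cavity, required=REQUIRED, any_of=BASIC_ANCHORS):
    """cavity: dict mapping a position label (e.g. '6.48') to a one-letter residue."""
    for position, allowed in required.items():
        if cavity.get(position) not in allowed:
            return False
    return any(cavity.get(position) in allowed for position, allowed in any_of.items())


def search_cavity_database(database):
    """Return the sorted names of all entries whose cavity satisfies the query."""
    return sorted(name for name, cavity in database.items() if matches_query(cavity))
```

Expressing the query as plain data keeps it easy to swap in a different privileged structure without changing the search code.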
GPCR ligands. We propose here a much simpler approach leading to the same outcome; looking at the 30 residues lining the TM cavity of the latter six GPCRs allows us to clearly identify putative TM residues able to interact with this substructure. Conserved aromatic residues that are likely to interact with the biaryl moiety cluster between TMs 6 and 7 (Phe6.44, Trp6.48, Phe/Tyr/His6.51, Phe/Tyr7.43). A positively charged residue that probably interacts with the bioisosteric tetrazole and carboxylate groups should be located near the aromatic cluster. Hence, three basic residues (Lys5.42, Arg6.55, and Arg7.35) fulfill this requirement. Last, a polar side chain at position 6.52 (His/Gln) is conserved for the six investigated GPCRs and might H-bond to the acidic moiety of the privileged structure. We have then clearly identified the same important anchoring residues as [...] by simply looking at sequence alignments of TM cavity-lining amino acids, without relying on any 3-D docking data. Searching our TM cavity database for additional GPCRs fulfilling the above-described [...] Tyr7.43 and Lys5.42 or Arg6.55 or Arg7.35 and His/Gln6.52) permits us to extract 17 new GPCRs that might accommodate biphenyl-tetrazoles and biphenyl-carboxylic acids. Among putative targets are 10 chemoattractant receptors (APJ, C3AR, C5AR, C5L2, CML1, FML1, FMLR, GP15, GP44, and GPR1), three brain-gut peptide receptors (MTLR, NTR1, and Q9GZQ4), two cationic phospholipid receptors (G2A, SPR1) and two peptide receptors (GALR, GALS). This target list encompasses
receptors recently identified by [...] (e.g. APJ, NTR1). It also suggests totally new putative targets for the investigated privileged structure that might serve as a common scaffold for small-sized combinatorial libraries targeting the new receptors list.
3.2. 3-D screening
High-throughput docking of large chemical libraries has established itself as a promising tool for identifying new hits from protein 3-D structures coming mostly from X-ray diffraction data but also from homology modeling. Finding out of a large library which ligands are likely to bind to a protein of interest is slowly turning to routine computational chemistry. Surprisingly, the opposite question is still an issue. Given a known ligand, is it possible to recover its most likely target(s)? Answering this question using the above-mentioned docking approach implies first the development of a collection of protein active sites (see Section [...]) and then use of an inverse docking tool able to dock a single ligand to multiple macromolecules. Although inverse screening uses the same paradigm as ligand screening (predicting the most likely ligand-target interactions from molecular docking), docking a single ligand to a target library is more difficult to set up than classical docking of a ligand library to a single target. One should automate the generation of input files (3-D coordinates of the target or/and of the cognate binding site; docking configuration file) for a large array of heterogeneous targets, which is much more difficult than setting up a reliable set of coordinates for a ligand library. Notably, protein and binding site 3-D coordinates should be prepared automatically and should be rendered suitable for docking by removal of any additional molecule (solvent, ion, and co-factor) not essential for ligand binding. We have chosen the GOLD docking software for two main reasons: (i) it is one of the most robust and accurate docking tools in our hands; (ii) it only requires a single configuration file whose distribution over a target library is easy to process.
Fig. 9. Inverse screening of the sc-PDB database for finding the target of four small molecular weight ligands: top panel, biotin; bottom panel: 4-hydroxy tamoxifen. Filled stars indicate the different sc-PDB copies of the true target (top: streptavidin, bottom: estrogen receptor α). Filled triangles and squares indicate known secondary targets of 4-hydroxy tamoxifen (3α-hydroxysteroid dehydrogenase and NADP[H] quinone oxidoreductase, respectively). Targets are ranked by decreasing GOLD fitness scores averaged over 10 independent docking runs.
3.2.1. 3-D screening of the PDB: proof of concept
The first validation of inverse screening was to recover among 2 150 entries of the sc-PDB (release 1, February 2004) the true target(s) of either selective (e.g. biotin, 6-hydroxyl-1,6-dihydropurine ribonucleoside) or promiscuous ligands (e.g. 4-hydroxytamoxifen, methotrexate). Screening the sc-PDB database allowed us to unambiguously recover the true targets of the four investigated ligands. When screening our database for potential targets of biotin, 7 out of the 10 streptavidin entries present in the sc-PDB were ranked at the top eight positions with very good averaged fitness scores. Interestingly, the three streptavidin copies with lower rankings (90th, 195th, 315th) correspond to either an active site for which a key amino acid (Asp128) has been mutated (1swt) or alternative binding sites (peptide binding sites for 1vwr and 1rsu). Altogether, the proposed inverse screening protocol is able to unambiguously rank streptavidin as the most likely target for biotin with a percentage of coverage of 70% (7 out of 10) among the top 10 (0.5%) scorers.
Likewise, the two sc-PDB entries of the estrogen receptor α were ranked at the top two positions when screening for the target of 4-hydroxy tamoxifen. Interestingly, two other targets (NADP[H] quinone oxidoreductase, 3α-hydroxysteroid dehydrogenase), each ranked at least twice among the top 25 scorers, are known minor targets of this ligand. Therefore, inverse screening of target databases could also be viewed as a computational filter to roughly predict the selectivity profile of a given ligand and thus
Fig. 10. Percentage of recovery of known targets as a function of the top scoring fraction found by inverse screening (green line) and random picking (red line). The percentage of coverage of known targets is the ratio in percentage between the number of true target entries recovered by inverse screening at a defined top scoring fraction and the total number of true target entries in the sc-PDB dataset.
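The percentage of coverage defined in this caption, together with an enrichment factor over random picking, can be computed from a ranked list of entries as sketched below in Python. The coverage follows the caption's definition; the enrichment formula (hit rate in the top fraction divided by the hit rate in the whole dataset) is a common convention assumed here, since the text does not spell it out, and all function names are hypothetical. Under this convention, the methotrexate figures quoted in Section 3.2.1 (40% coverage within the top 2.6%) give 40/2.6 ≈ 15, in line with the reported 15-fold enrichment.

```python
from statistics import mean


def rank_by_mean_score(runs):
    """Rank entries by decreasing fitness score averaged over repeated docking runs.

    runs: dict mapping an entry id to the list of scores from independent runs
    """
    return sorted(runs, key=lambda entry: mean(runs[entry]), reverse=True)


def coverage(ranked, true_entries, top_fraction):
    """Percentage of true target entries recovered in the top scoring fraction."""
    n_top = max(1, round(len(ranked) * top_fraction))
    found = set(ranked[:n_top]) & set(true_entries)
    return 100.0 * len(found) / len(set(true_entries))


def enrichment(ranked, true_entries, top_fraction):
    """Hit rate in the top fraction relative to the hit rate of random picking."""
    n_top = max(1, round(len(ranked) * top_fraction))
    hits = len(set(ranked[:n_top]) & set(true_entries))
    return (hits / n_top) / (len(set(true_entries)) / len(ranked))
```

Plotting `coverage` against a range of `top_fraction` values reproduces the kind of recovery curve described in the caption, with the diagonal as the random-picking baseline.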
putative side effects. When compared to random screening, a significant enrichment in the true target is observed among the top scorers. Analyzing both the enrichment factor and the percentage of coverage of known targets indicates that the best compromise can be reached by selecting a very small fraction (0.5%) of the sc-PDB database. Even for the rather difficult case of methotrexate, selecting the top 2.6% scorers would allow selection of 40% of all dihydrofolate reductase entries with a 15-fold enrichment with respect to random screening.
3.2.2. 3-D screening of the PDB: test case
Having validated the inverse screening approach for four unrelated ligands, a prospective screening was applied to the identification of putative targets for representative compounds of a scaffold-focused combinatorial library. Release 1 of the sc-PDB (2148 entries) was screened to prioritize targets likely to accommodate five representative compounds from the library. In the sc-PDB, a target is defined either as an enzyme from the PDB with a unique EC number, or a non-enzymatic protein with a unique name according to our previous
Fig. 11. The 1,3,5-triazepane-2,6-dione scaffold with five diversity points.
Table 3. Predicted targets for five compounds from a triazepanedione library
a Enzyme commission number.
b Number of copies in the sc-PDB (release 1, February 2004).
Target rate: Percentage of targets ranked in the top 2% scoring entries.
h Methionine aminopeptidase.
i Phospholipase A2.
j Purine nucleoside phosphorylase.
k Thymidine kinase.
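The target rate reported in Table 3 and the three selection criteria of Section 3.2.2 (half of a target's entries in the top 2%, an average fitness score above 50, or two entries in the top 2%) can be combined as in the Python sketch below. The thresholds come from the text; reading them as "at least 50%" and "at least two entries", as well as the tuple layout and the function name, are assumptions made for illustration.

```python
from statistics import mean


def select_targets(entries, top_fraction=0.02, rate_cutoff=50.0,
                   score_cutoff=50.0, min_hits=2):
    """Apply the three selection criteria to a screened library.

    entries: list of (entry_id, target_name, mean_fitness_score)
    returns: dict target_name -> summary, for targets meeting any criterion
    """
    ranked = sorted(entries, key=lambda e: e[2], reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    top_ids = {entry_id for entry_id, _, _ in ranked[:n_top]}

    by_target = {}
    for entry_id, target, score in entries:
        by_target.setdefault(target, []).append((entry_id, score))

    selected = {}
    for target, members in by_target.items():
        hits = sum(entry_id in top_ids for entry_id, _ in members)
        rate = 100.0 * hits / len(members)          # target rate, in percent
        average = mean(score for _, score in members)
        if rate >= rate_cutoff or average > score_cutoff or hits >= min_hits:
            selected[target] = {"target_rate": rate,
                                "mean_fitness": average,
                                "hits_in_top": hits}
    return selected
```

Grouping entries by target before applying the criteria mirrors the definition of a target used in the text, where several PDB copies of the same protein count as one target.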
annotation of the database. Differences related to species, isoforms or mutations are thus not considered in our classification scheme. For each of the five investigated compounds, a target was selected if it fulfills any of the three following criteria: (i) 50% of target entries present in the sc-PDB were scored, according to the average GOLD fitness score, among the top 2% scoring entries, (ii) the average fitness score for all entries of the corresponding target was above 50; (iii) two entries of the same target were scored in the top 2% scoring entries.
Out of the nine targets fulfilling this selection procedure, five were finally selected for biological evaluation (ES, MA, PLA2, PNP, TK). About 24 compounds including the five representatives used for inverse screening were tested for inhibition of the above-described five enzymes. Micromolar inhibitors from this small library could be found for three out of the five predicted entries (MA, PNP, PLA2). A detailed description of corresponding structures and inhibitory constants will be reported [...]
3.2.3. 3-D screening of the hGPCR library:
Screening the collection of human GPCRs for identifying the receptors of known ligands is a quite demanding task regarding the current limited accuracy of GPCR models. We however tried to recover, from the GPCR target database, either the known receptor of a selective purinergic P2Y1 antagonist (MRS-2179) or the known receptors of a promiscuous antagonist (NAN-190) previously shown to bind to several monoamine receptors with nanomolar affinities (α1A, D2, D3, 5-HT1A, 5-HT1D, 5-HT1F, 5-HT2A, 5-HT2C, 5-HT7). When screening the protein library for putative receptors of MRS-2179, the P2Y1 receptor is indeed ranked among the top scorers (7th). Five out of the nine known targets of NAN-190, the second ligand investigated herein, are ranked in the top 25 positions, and seven out of nine in the top 31 positions. The worst-ranked true receptor (5-HT1A) is ranked 68th. For both ligands, ca. 80% of GPCRs closely related to the true target(s) (P2Y receptors for MRS-2179; 5-HT receptors for NAN-190) usually clustered in the top 20% scorers. Thus, the current inverse screening procedure is more aimed at identifying the likely receptor subfamily (dopamine, serotonin, adenosine, etc.) than precisely mapping the individual preference for highly related GPCR subtypes. It could thus be used as a computational filter to study the most likely targets when addressing the selectivity profile of a given compound or trying to identify the yet unknown receptor of a molecule showing promising in vivo biological effects. Although the hGPCR database enclosed ground-state models suitable for docking antagonists and inverse agonists, we checked whether the same protocol could be applied to identify the
receptor of endogenous ligands. The hGPCR database was therefore screened to recover the receptor of succinic acid, a recently identified ligand for the previously orphan GPR91 receptor. Although ground-state 3-D models were screened, the native receptor was surprisingly ranked among the top-scoring receptors (11th) in our inverse screening. Again, the true receptor was not ranked first, but high enough to appear in a shortlist that could be evaluated experimentally.
Fig. 12. Ranking of the true receptor(s) of a selective ligand (A: MRS-2179, P2Y1 receptor antagonist) and of a promiscuous ligand (B: NAN-190, antagonist of the dopamine D2 and D3 receptors, serotonin 5-HT1A, 5-HT1D, 5-HT1F, 5-HT2A, 5-HT2C, 5-HT7 receptors, and adrenergic α1A receptor). Known receptor(s) are indicated by filled stars. Targets are ranked by decreasing GOLD fitness scores averaged over 10 independent docking runs.
Fig. 13. Ranking of the true receptor (GPR91, filled star) of an endogenous ligand (succinic acid) by an inverse screening of a GPCR 3-D library. Targets are ranked by decreasing GOLD fitness scores averaged over 10 independent docking runs.
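Both inverse screens described in this section reduce to the same bookkeeping: average the fitness scores, rank by decreasing average, then apply simple cutoffs. The Python sketch below illustrates that bookkeeping only. It does not run GOLD and is not the code used in the study. All function names, parameter names and toy values are hypothetical, and the thresholds are read under explicit assumptions: "50%" and "two entries" are taken as "at least", and "average fitness above 50" is applied to the mean over all entries of a target.

```python
from collections import defaultdict
from math import ceil
from statistics import mean


def rank_by_mean_score(runs):
    """Rank items by decreasing fitness score averaged over docking runs.

    runs maps an item name to its list of fitness scores (one per run).
    Returns a list of (name, mean_score) tuples, best first.
    """
    averaged = {name: mean(scores) for name, scores in runs.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


def rank_of(ranking, name):
    """1-based position of `name` in a ranking produced above."""
    return [n for n, _ in ranking].index(name) + 1


def select_targets(entry_scores, entry_target, top_fraction=0.02,
                   min_rate=0.5, min_mean=50.0, min_hits=2):
    """Keep a target if ANY of the three criteria holds (assumed reading):

    (i)   at least `min_rate` of its entries lie in the top fraction,
    (ii)  the mean score over all its entries exceeds `min_mean`,
    (iii) at least `min_hits` of its entries lie in the top fraction.

    entry_scores: entry id -> average fitness score
    entry_target: entry id -> target name
    """
    ordered = sorted(entry_scores, key=entry_scores.get, reverse=True)
    n_top = max(1, ceil(top_fraction * len(ordered)))
    top = set(ordered[:n_top])

    by_target = defaultdict(list)
    for entry, target in entry_target.items():
        by_target[target].append(entry)

    selected = []
    for target, entries in by_target.items():
        hits = sum(1 for e in entries if e in top)
        rate = hits / len(entries)
        avg = mean(entry_scores[e] for e in entries)
        if rate >= min_rate or avg > min_mean or hits >= min_hits:
            selected.append(target)
    return sorted(selected)


# Toy data (invented numbers, two docking runs per receptor).
toy_runs = {"P2Y1": [60, 62], "D2": [70, 68], "GPR91": [40, 44]}
toy_ranking = rank_by_mean_score(toy_runs)
toy_rank = rank_of(toy_ranking, "P2Y1")
```

With real data, `runs` would hold ten scores per receptor model, and `rank_of` would give the position of a known receptor, as reported for P2Y1 or GPR91. Keeping the three criteria as separate named thresholds makes it easy to check how sensitive a target shortlist is to each cutoff.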
Virtual screening of target libraries offers new opportunities to prioritize a few targets for experimental evaluation by applying simple ligand-based or target-based queries. There is no reason why docking a single ligand to a wide array of targets should be less useful than the classical docking of ligand libraries to a single protein, assuming input data of comparable accuracy. The increasing coverage of target space by the Protein Data Bank, as well as the development of accurate comparative models describing entire protein families, is likely to favor target screening in the near future. Pharmacophore-based and protein-based computational filters are nowadays used sequentially in virtual screening. One could imagine very similar scenarios for target screening, where interesting cavities would first be filtered by similarity measurements to a binding site of interest, and then selected by ligand docking. Furthermore, orthogonal clustering of target families and of their ligands should soon provide precise chemogenomic information for selecting the most interesting compounds/scaffolds according to a predefined selectivity profile. Addressing potency and selectivity simultaneously in hit evaluation will undoubtedly afford added-value molecules in early drug discovery processes.
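The sequential scenario imagined above (first a binding-site similarity filter, then ligand docking) can be sketched as a two-stage cascade. This is a hypothetical illustration, not a described implementation. The text does not specify the similarity measure, so a Tanimoto coefficient over sets of cavity features is assumed here as a stand-in. The docking step is represented by a user-supplied callable, and all names, feature labels and scores are invented.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of binding-site features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cascade_screen(query_site, cavities, dock, min_similarity=0.5, keep=10):
    """Two-stage target screen (assumed workflow).

    query_site: set of features describing the binding site of interest
    cavities:   dict mapping target name -> set of cavity features
    dock:       callable(target name) -> docking score, higher is better
    Stage 1 keeps cavities similar enough to the query site.
    Stage 2 docks the ligand only into those and ranks them by score.
    """
    similar = [name for name, feats in cavities.items()
               if tanimoto(query_site, feats) >= min_similarity]
    scored = [(name, dock(name)) for name in similar]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:keep]


# Toy example: feature labels and scores are placeholders.
toy_cavities = {
    "T1": {"don", "acc", "aro"},
    "T2": {"don", "neg"},
    "T3": {"acc", "aro", "hyd"},
}
toy_scores = {"T1": 48.0, "T2": 70.0, "T3": 55.0}
toy_hits = cascade_screen({"don", "acc", "aro"}, toy_cavities, toy_scores.get)
```

Placing the cheap similarity filter first means the expensive docking step is run only on the surviving cavities. Here T2 is never docked although its placeholder score is the highest.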
I would like to thank several former and current collaborators of the Bioinformatics group (C. Bissantz, G. Bret, E. Kellenberger, A. Logean, P. Muller, N. Paul, and C. Schalon) for their invaluable work in the development of target libraries. Financial support from the French Ministry of Research and Technology and from the Alsace-Lorraine Genopole is acknowledged, as is the allocation of computing resources at the Centre Informatique National de l'Enseignement Supérieur (CINES, Montpellier, France).
Source: http://cheminfo.u-strasbg.fr/labwebsite/publications/paper87.pdf