Multilingual semantic annotations of
Electronic Health Records and Pharmacogenomics data with ontologies.
Type: Postdoc position
Employer: University of Montpellier
Context: PratikPharma project (Practice-based evidences
for actioning Knowledge in Pharmacogenomics)
– ANR project
When: September 2016 – for 24 months (12 months renewable)
Where: LIRMM, Montpellier
(requires collaboration with Stanford University, USA.))
semantic annotation, biomedical ontologies
& terminologies, multilingual context (French & English), text mining, semantic
web, natural language processing, medical informatics, knowledge extraction.
The goal of the PractiKPharma
project (http://practikpharma.loria.fr) is to validate or moderate
Pharmacogenomics state-of-the-art knowledge on the basis of practice-based evidences,
i.e., knowledge extracted from Electronic Health Records. During this project,
we will extract state-of-the-art knowledge from (English) structured and
unstructured descriptions in reference databases (e.g., PharmGKB)
and literature (e.g., PubMed) as well as extract observational knowledge from (French)
EHRs. Part of this multilingual knowledge extraction process will be based on
semantic annotation (using relevant biomedical ontologies) of plain-text data.
We plan to reuse and enhance tools developed in the context of the NCBO (www.bioontology.org) and SIFR projects (www.lirmm.fr/sifr).
We are seeking a motivated, curious
and interested postdoc candidate to design and develop the semantic annotation
workflows and help the project to annotate HER and Pharmacogenomics data. The
postdoc develop new methodologies to capture the context of French clinical
narrative, such as negations, specific sections, modality and word sense
disambiguation. To support this French-English context we will also investigate
the generation and use of multilingual ontology mappings (mostly reusing
LIRMM’s YAM++ approach).
Pharmacogenomics (PGx) studies how individual gene variations cause
variability in drug responses and constitutes a basis for implementing
personalized medicine i.e., a medicine tailored to each patient by considering her/his
genomic context [8]. PGx
data is often not yet validated because most of it results from studies that do
not fulfill statistics validation standards and are difficult to reproduce
because of the rarity of gene variations studied. The goal of the PractiKPharma project is to validate PGx
state-of-the-art knowledge publicly available (in English) on the basis of
practice-based evidences, i.e., knowledge extracted from EHRs (in French). The
project is mainly funded by the French ANR and leaded by A. Coulet
(INRIA Nancy) in collaboration with HEGP (Georges Pompidou European Hospital, Paris),
CHU Saint-Etienne and the LIRMM.
Units of knowledge in PGx typically have the form of ternary relationships gene
variant–drug–adverse event, and can be formalized to different extents using
biomedical ontologies. To achieve our goal, we will (1) extract
state-of-the-art knowledge from PGx databases, (such
as PharmGKB [3]) and literature (such as PubMed);
(2) extract observational knowledge (i.e., knowledge extracted from
observational data) from French EHRs, (3) to compare knowledge units extracted
from these two origins, to confirm or moderate state-of-the-art knowledge, with
the goal of enabling personalized medicine. (4) Finally, we will emphasize
newly confirmed knowledge by investigating omics databases for molecular
mechanisms that underlie and explain drug adverse events use biomedical Linked
Open Data [1].
The clarification of the validity of PGx data will have an important impact on clinical care.
This would permit the definition of guidelines that enable clinicians to
implement personalized medicine by choosing and dosing drugs more precisely.
Concretely, this will reduce the toxicity and increasing the efficacy of
prescribed drugs, consequently reducing cost and improving quality of care.
Within this project, we (task leaded
by LIRMM) will design and implement a semantic annotation workflow that will
use the relevant reference ontologies to facilitate the knowledge extraction
process. We plan to reuse and improve the outcomes of the National Center for
Biomedical Ontology (www.bioontology.org
- NCBO) and the Semantic Indexing of French Biomedical Data Resources (www.lirmm.fr/sifr - SIFR) projects that
have developed tools for semantic annotations respectively for English and
French data [5] [4].
The biggest improvements will come
from introducing more natural language processing (NLP) mechanisms in the
annotation process. We will specifically work on better capturing the context
of French clinical or patient narrative. The new workflow improvements will
include:
- to
identify polysemic terms during the annotation
process and choose the proper concept (e.g., cell, the biological component or
telephone).
- to detect negation (e.g., the
patient does not have the symptom) for instance using state-of-the-art results
in the domain such as NegEx [2] already tested by NCBO team, and
inventing others.
- to
detect, to some extent, elements of context, by detecting modulators words
(hypothetically, strongly) to associate annotations with different levels of
importance and detect time information.
- the
publication and link of the annotations produced by our workflow in the Web of
data using standard semantic Web technologies & the new W3C Web Annotation
Model standard.
- to improve
the annotation scoring, which allows to rank annotations by importance, related
to the context in which the annotation has been done and to the frequency of
the matches [6].
- to make
the workflow handles multilingual data and offers annotations with multilingual
ontologies by leveraging multilingual mappings. For this purpose, multilingual
mappings will have to be generated between English ontologies and their French
counterparts.
The annotation of EHRs data will have
to be run in-house at the HEGP in collaboration with the team there. The work done
on text mining and semantic annotation in French, will be generalized to
English and improvements of the SIFR annotator will be incorporated into the
NCBO annotator when possible in collaboration with Stanford University. Multiple
visits are planned. The work on ontology alignment will capitalize on the work
of the OpenData team at LIRMM (e.g., on YAM++ [7]) and involve Zohra Bellahsene & Konstantin Todorov.
The ontologies used will include: SNOMED-CT,
NCIt, HDO, HPO, MESH, LOINC, RXNORM, WHO-ATC, MEDRA
For reference, SIFR Annotator code
and API are available here:
https://github.com/sifrproject & http://data.bioportal.lirmm.fr/documentation
We are seeking a motivated, curious
and interested postdoc candidate to design and develop the semantic annotation
workflow and bring interesting new ideas to the project team. A computer
science or bioinformatics PhD degree is required. Besides an important
motivation for the research questions, we are also looking for someone with some
good technical skills and motivation for concrete outcomes. The best candidate will
demonstrate good programming skills in addition of a good track record of
publications. The supervision will be done mostly remotely as Clement Jonquet is currently visiting scholar at Stanford
University. The candidate will demonstrate aptitudes or matches with some of
the following aspects:
- Research experience that match
with the proposed subject.
- Experience with semantic some of
the technologies involved: Java/JEE, Ruby/Rails, RESTful web services, XML/JSON.
- Experience with the semantic web vision
(& technologies OWL, RDF, SPARQL)
- Experience in biomedical
informatics (knowledge extraction, use of ontologies, BioPortal)
- Good track records in terms of
publications and communication of his/her work.
- Excellent remote working
capabilities (emails, trackers, collaborative tools, etc.)
- Perfect English oral and writing
skills.
- Few knowledge in French language
with objective to learn the language during the contract.
- Multiple project meetings are
planned in France (Paris or Nancy),
- International trips accepted (collaboration
with Stanford) and the being eligible for a visa for the USA.
- Excellent writing skills as reports, publications,
and technical notes will always be necessary.
- Autonomy and initiative, take on technical
decisions within the project and justify choices.
- Friendly person to join a small research team
in Montpellier.
For more information about this
position, please contact Clement Jonquet (jonquet@lirmm.fr). To apply, please send an
email including links to (NO ATTACHED DOCUMENTS) the following:
- a motivation letter describing an explanation of YOUR
interest for the position;
- a curriculum vitae describing your experience and the
matches with the expected profile;
- copies of diplomas (PhD) and other relevant certificates
(MSc grades, PhD jury evaluation);
- names and contact details of referees.
- The postdoc
will be hired by the University of Montpellier (social security, etc. included).
- Salary
will be around 2000€ net per month depending on experience.
- Contract
should start September 1st 2016 or October 1st 2016.
- The
contract will be for 1 year and renewable another year.
[1] C. Bizer, T. Heath, and T. Berners-Lee. Linked Data
- The Story So Far. Semantic Web and
Information Systems, 5(3):1–22, 2009.
[2] W. Chapman,
W. Bridewell, P. Hanbury,
G. F. Cooper, and B. G. Buchanan. A simple algorithm for identifying
negated findings and diseases in discharge summaries. Biomedical Informatics, 34(5):301–310, October 2001.
[3] M. Hewett,
D. E. Oliver, D. L. Rubin, K. L. Easton, J. M. Stuart,
R. B. Altman, and T. E. Klein. PharmGKB:
the pharmacogenetics knowledge base. Nucleic
acids research, 30(1):163–165, 2002.
[4] C. Jonquet, A. Annane, K. Bouarech, V. Emonet, and S. Melzi. SIFR BioPortal : Un portail ouvert et générique d’ontologies et de terminologies biomédicales françaises au service de l’annotation sémantique. In 16th Journées Francophones d’Informatique Médicale, JFIM’16, Genève, July 2016.
[5] C. Jonquet,
N. H. Shah, and M. A. Musen. The Open
Biomedical Annotator. In American Medical
Informatics Association Symposium on Translational BioInformatics,
AMIA-TBI’09, pages 56–60, San Francisco, CA, USA, March 2009.
[6] S. Melzi and C. Jonquet.
Scoring semantic annotations returned by the NCBO Annotator. In A. Paschke, A. Burger, P. Romano, M. Marshall,
and A. Splendiani, editors, 7th International Semantic Web Applications and Tools for Life
Sciences, SWAT4LS’14, Berlin, Germany, December 2014.
[7] D. Ngo
and Z. Bellahsene. YAM++ :
A Multi-strategy Based Approach for Ontology Matching Task. In A. ten Teije, J. Völker, S. Handschuh, H. Stuckenschmidt,
M. d’Acquin, A. Nikolov,
N. Aussenac-Gilles, and N. Hernandez,
editors, 18th International Conference on
Knowledge Engineering and Knowledge Management,EKAW’12, p. 421–425, Galway,
Irland, 2012. Springer.
[8] H.-G. Xie and F. W. Frueh.
Pharmacogenomics steps toward personalized medicine. Personalized Medicine, 2(4):325–337, 2005.