Research engineer position: Web architect – Semantic web technologies – Text and data mining
Employer: University of Montpellier
Context: ANR PractiKPharma project and VisaTM project
When: November 2018
Duration: 12-14 months minimum (extendable by up to 2 additional years on
related projects)
Where: LIRMM, Montpellier, France
Collaboration: NCBO (Stanford, USA), LORIA & INIST (Nancy, France), HEGP (Paris, France), INRA (Jouy-en-Josas, France), EMA (Alès, France)
Keywords: Web development, ontologies & terminologies, semantic web, ontology repository,
knowledge engineering, semantic interoperability, annotation, natural language processing,
text and data mining.
Technologies: Web development, Java/JEE, Ruby/Rails,
RESTful web services, XML/JSON, Web technologies (HTML5, Bootstrap, JavaScript),
Semantic web technologies (OWL, RDF, SPARQL, triplestore, Linked data), NCBO
technology (BioPortal), OpenMinTeD technology.
A key aspect in addressing semantic
interoperability is the use of terminologies, vocabularies and ontologies as a
common denominator to structure data and make them interoperable. In
partnership with Stanford University, LIRMM designs, develops and maintains two
vocabulary and ontology repositories: (i) the
SIFR BioPortal (http://bioportal.lirmm.fr)
which targets the French biomedical community and offers an ontology-based
annotation workflow to semantically index biomedical and clinical text data;
(ii) AgroPortal (http://agroportal.lirmm.fr), a
reference repository for semantic resources in agronomy, agriculture, food,
plant sciences and biodiversity. These platforms allow us to tackle scientific
problems in natural language processing, semantic annotation and ontology
engineering, while being driven by concrete use cases with impacts in
biomedicine and agronomy.
In the context of the ANR
PractiKPharma and VisaTM projects, we are seeking a motivated, curious and
interested research engineer and web developer to take charge of the platforms
and prototypes developed. Your role will be both to support the current
platforms and to investigate technical options that enable the development of new
features related to text and data mining. Within PractiKPharma, your role
will consist of enhancing the SIFR Annotator to make it easier to use for annotating
clinical data (in collaboration with HEGP and LORIA). Within the VisaTM
project, your role will be to enable interoperation of both AgroPortal
and BioPortal with the text and data mining infrastructure developed in the
OpenMinTeD project (in collaboration with INRA and INIST). You will work with a
small team (4 people) at LIRMM in both a national and an international context.
Further extensions of the proposed 12-month contract are possible.
Pharmacogenomics (PGx) studies how individual gene variations cause
variability in drug responses and constitutes a basis for implementing
personalized medicine, i.e., medicine tailored to each patient by considering
his or her genomic context. The goal of the PractiKPharma project (http://practikpharma.loria.fr)
is to validate or moderate state-of-the-art pharmacogenomics knowledge on the
basis of practice-based evidence, i.e., knowledge extracted from Electronic
Health Records (EHRs). During this project, we extract state-of-the-art knowledge from
(English) structured and unstructured descriptions in reference databases
(e.g., PharmGKB) and literature (e.g., PubMed), and
observational knowledge from (French) EHRs. Part of this
multilingual knowledge extraction process is based on semantic annotation of plain-text
data. We use and enhance tools developed in the context of the NCBO (www.bioontology.org) [1, 2]
and SIFR (www.lirmm.fr/sifr) projects,
in particular the SIFR Annotator, a web service that lets scientists use
available biomedical ontologies to annotate their French biomedical or
clinical text automatically [3]. The SIFR Annotator service processes
raw textual descriptions, tags them with relevant biomedical ontology concepts
and returns the annotations to the users in several formats such as JSON-LD,
RDF or BRAT. We have started to develop specific features to process clinical
text (e.g., detecting negation, temporality and experiencer) [4]; however, they need to be improved
and complemented by a general disambiguation module that will increase
the precision of the annotations. Another technical challenge is to enable the
use of the system in data-sensitive environments (e.g., hospitals); a
Docker packaging is available for this purpose. The use cases will be discussed with our
collaborators at HEGP hospital and in Nancy (LORIA and CHU).
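As an illustration of the annotation service described above, here is a minimal Python sketch of a client for an NCBO-style Annotator endpoint. It is a sketch under assumptions, not the project's reference client: the base URL, the `text`, `ontologies` and `apikey` parameters, the example ontology acronyms and the response fields (`annotatedClass`, `annotations`, `from`, `to`, `text`) follow the public NCBO REST API conventions and should be checked against the SIFR BioPortal documentation. The sample response and its IRI are made up for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the SIFR BioPortal REST API (not given in this posting).
ANNOTATOR_URL = "http://data.bioportal.lirmm.fr/annotator"


def build_annotator_url(text, ontologies=(), api_key="YOUR_API_KEY",
                        base_url=ANNOTATOR_URL):
    """Build a GET URL for an NCBO-style Annotator endpoint."""
    params = {"text": text, "apikey": api_key}
    if ontologies:
        # NCBO-style APIs take a comma-separated list of ontology acronyms.
        params["ontologies"] = ",".join(ontologies)
    return base_url + "?" + urlencode(params)


def extract_matches(response):
    """Flatten a response into (class IRI, matched text, from, to) tuples."""
    matches = []
    for item in response:
        class_iri = item["annotatedClass"]["@id"]
        for ann in item.get("annotations", []):
            matches.append((class_iri, ann["text"], ann["from"], ann["to"]))
    return matches


def annotate(text, **kwargs):
    """Call the service over HTTP (requires network access and a valid API key)."""
    with urlopen(build_annotator_url(text, **kwargs), timeout=30) as resp:
        return extract_matches(json.load(resp))


# Hypothetical response, shaped like NCBO Annotator JSON output.
SAMPLE_RESPONSE = [
    {
        "annotatedClass": {"@id": "http://example.org/onto#Fever"},
        "annotations": [
            {"from": 13, "to": 18, "matchType": "PREF", "text": "FIEVRE"}
        ],
    }
]

if __name__ == "__main__":
    print(build_annotator_url("Le patient a de la fièvre",
                              ontologies=["MSHFRE", "MDRFRE"]))
    print(extract_matches(SAMPLE_RESPONSE))
```

Keeping URL construction and response parsing separate from the HTTP call makes both testable offline, which matters when the service is deployed inside a hospital network with no outside access.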
The Semantic
Web relies on the construction of standard vocabularies and ontologies to
formally capture the knowledge of a domain into semantic resources that computers
use to index, search or reason over data [5]. In recent years, we
have seen an explosion in the number of knowledge resources (thesauri,
terminologies, vocabularies and ontologies) being developed in life sciences,
agronomy and biodiversity. However, those resources are spread out, in
different formats, of different sizes, with different structures and from
overlapping domains. Therefore, there is a need for common reference platforms to
receive and host them, align them, and enable their use in external applications.
More generally, ontologies and vocabularies are a key element to make data FAIR
(Findable, Accessible, Interoperable and Reusable). For example, ontologies and
terminologies are highly valuable in text and data mining workflows. To
facilitate the development of such workflows, the EU OpenMinTeD project (http://openminted.eu)
has developed a platform in which different text and data mining components can
be used to process text corpora, possibly using ontologies and terminologies.
Within the VisaTM project, we develop the interoperation components that allow
OpenMinTeD and AgroPortal (and, by extension, any other ontology repository
based on the NCBO technology) to rely on one another: OpenMinTeD consumes
AgroPortal’s semantic resources [6], and AgroPortal
consumes simplified/customized OpenMinTeD workflows.
More specifically, we are interested in using (and maybe contributing to) the
OpenMinTeD technology to enable easy external access, via web service, to advanced
text and data mining workflows.
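To make the idea of a text mining component consuming an ontology repository's semantic resources more concrete, here is a small Python sketch that turns a list of ontology classes into a term dictionary and uses it for naive dictionary tagging. It does not reflect the actual VisaTM or OpenMinTeD component interfaces: the class fields (`@id`, `prefLabel`, `synonym`) are assumed to follow NCBO-style JSON, and the sample classes and IRIs are hypothetical.

```python
import re


def build_dictionary(classes):
    """Map each lowercased label (preferred label or synonym) to its class IRI."""
    lookup = {}
    for cls in classes:
        labels = [cls.get("prefLabel")] + list(cls.get("synonym") or [])
        for label in labels:
            if label:
                # Keep the first IRI seen for a label; ambiguity is ignored here.
                lookup.setdefault(label.lower(), cls["@id"])
    return lookup


def tag(text, lookup):
    """Return (start, end, matched text, IRI) for each whole-word dictionary hit."""
    hits = []
    for label, iri in lookup.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.start(), m.end(), m.group(0), iri))
    return sorted(hits)


# Hypothetical classes, shaped like NCBO-style class JSON.
SAMPLE_CLASSES = [
    {"@id": "http://example.org/agro#Wheat", "prefLabel": "wheat",
     "synonym": ["Triticum aestivum"]},
    {"@id": "http://example.org/agro#Drought", "prefLabel": "drought"},
]

if __name__ == "__main__":
    lookup = build_dictionary(SAMPLE_CLASSES)
    for hit in tag("Drought reduces wheat yield; Triticum aestivum is sensitive.", lookup):
        print(hit)
```

A real workflow would add lemmatization and disambiguation; the point is only that labels and synonyms exported by the repository are enough to bootstrap a tagger.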
We are reusing the technologies
developed by the National Center for Biomedical Ontology at Stanford
University: the BioPortal web application (http://bioportal.bioontology.org),
made available via its virtual appliance (http://www.bioontology.org/wiki/index.php/Category:NCBO_Virtual_Appliance).
Please refer to our GitHub repositories for more details:
- https://github.com/sifrproject
- https://github.com/agroportal
The developer will have to:
- manage, administer and modify
the SIFR BioPortal and AgroPortal ontology repositories;
- modify the SIFR Annotator according to
the needs of the PractiKPharma project;
- modify AgroPortal (and the
generic technology) according to VisaTM’s roadmap
for interoperation with the OpenMinTeD platform.
We are seeking a motivated, curious and
interested research engineer with training in computer science or
bioinformatics (master/engineer or PhD). Besides strong
motivation for the technical challenges and excellent software development
expertise, we are also looking for someone with some interest in the research
topics presented. The candidate should match some
of the following criteria:
- Web developer with good knowledge of JEE technologies, Ruby/Ruby on Rails, Bootstrap.
- Experience with semantic web technologies, especially JSON/RDF/SPARQL.
- Experience with text and data
mining software (knowledge extraction, use of ontologies, etc.)
- Between 1 and 5 years of experience, including
experience in private companies.
- Excellent technical skills to push prototypes
into production environment.
- Excellent remote working capabilities
(emails, trackers, collaborative tools, etc.)
- Excellent spoken and written English.
- Some knowledge of French,
with the objective of learning the language during the contract.
- Excellent writing skills as reports,
documentation, and technical notes will always be necessary.
- Willingness to travel internationally (collaboration
with Stanford) and ability to obtain a visa for the USA.
- Autonomy and initiative: ability to make technical
decisions within the project and justify them.
- Friendly person to join a small research team
in Montpellier and to listen to their needs.
- Possibly interested in the research valorization of the outcomes
(everything will be published).
- Open source developer.
For more information about this
position, please contact Clement Jonquet (jonquet@lirmm.fr). To apply, please send
an email including links to (NO ATTACHED DOCUMENTS) the following:
- a curriculum vitae describing your
training and experience;
- a cover letter describing YOUR interest in the position and how you match
the expected profile;
- references to web applications or projects you have already developed (URL, GitHub,
technical documentation), clarifying your role;
- copies of diplomas and possibly other relevant certificates;
- names and contact details of referees.
- The engineer will be hired under the “Ingénieur de recherche” or “Ingénieur d’étude”
status, depending on qualification.
- Social security and benefits are included.
- Salary will be between 1600 and 2000 € net per month, depending on qualification and experience.
1. Noy, N.F., Shah, N.H., Whetzel, P.L., Dai, B., Dorf, M.,
Griffith, N.B., Jonquet, C., Rubin, D.L., Storey, M.-A., Chute, C.G., Musen,
M.A.: BioPortal: ontologies and integrated data resources at the click of a
mouse. Nucleic Acids Res. 37, 170–173 (2009).
2. Whetzel, P.L., NCBO Team: NCBO Technology: Powering semantically aware applications.
Biomed. Semant. 4S1, 49 (2013).
3. Jonquet, C., Annane, A., Bouarech, K., Emonet, V., Melzi, S.: SIFR BioPortal : Un
portail ouvert et générique d’ontologies et de terminologies biomédicales
françaises au service de l’annotation sémantique. In: 16th Journées
Francophones d’Informatique Médicale, JFIM’16. p. 16, Genève, Suisse (2016).
4. Tchechmedjiev, A., Abdaoui, A., Emonet, V., Melzi, S., Jonnagaddala, J., Jonquet, C.: Enhanced
functionalities for annotating and indexing clinical text with the NCBO
Annotator+. Bioinformatics. 34, (2018).
5. Jonquet, C., Toulet, A., Arnaud, E., Aubin, S., Dzalé Yeumo, E., Emonet, V., Graybeal,
J., Laporte, M.-A., Musen, M.A., Pesce, V., Larmande, P.: AgroPortal: A
vocabulary and ontology repository for agronomy. Comput. Electron. Agric. 144,
(2018).
6. Kettani, F., Schneider, S., Aubin, S., Bossy, R., François, C., Jonquet, C.,
Tchechmedjiev, A., Toulet, A., Nédellec, C.: Projet VisaTM : l’interconnexion
OpenMinTeD – AgroPortal – ISTEX, un exemple de service de Text et Data Mining
pour les scientifiques français. In: Ranwez, S. (ed.) 29èmes Journées
Francophones d’Ingénierie des Connaissances, IC’18, Poster Session. pp.
247–249, Nancy, France (2018).