Experience

Positions after PhD

 
 
 
 
 

Research director

LIRMM, CNRS

Jan 2016 – Present Montpellier, France

Responsibilities include:

  • Head of Institute of Computational Biology (2015-2019)
  • Head of French Molecular Bioinformatics network (2010-2016)
  • Head of computer science dpt (2007-2010)
 
 
 
 
 

Reasearcher

LIRMM, CNRS

Jan 2008 – Oct 1999 Montpellier, France
Bioinformatics and algorithmics research.
 
 
 
 
 

Postdoctoral fellow

German Cancer Research Center (Deutsches Krebsforschung Zentrum - DKFZ)

Sep 1996 – Oct 1999 Heidelberg, Germany
Algorithms for transcriptomics.

News

News about software, results, publications, or collaborations

A collaborative work on the impact of molecular conflicts in the treatment of Multiple Myeloma

Paper accepted at 50th international SOFSEM conference

Pengfei earned his PhD on December 13, 2024

Projects

ALL THINGS ARE DIFFICULT BEFORE THEY ARE EASY

*

ALPACA

EU funded Int’al Training Network on Computational Pan-Genomics

Superstring

Algorithms for shortest superstring questions

Fast_place

Software for efficient metagenomics and applications to virus metagenomics

GEM

Information fuelled biophysical models for the control of gene expression

Recent Publications

Consider words of length $n$. The set of all periods of a word of length $n$ is a subset of $0,1,2,łdots,n-1$. However, not every subset of $0,1,2,łdots,n-1$ can be a valid set of periods. In a seminal paper in 1981, Guibas and Odlyzko proposed encoding the set of periods of a word into a binary string of length $n$, called an autocorrelation, where a $1$ at position $i$ denotes the period $i$. They considered the question of recognizing a valid period set, and also studied the number $ąppa_n$ of valid period sets for strings of length $n$. They conjectured that $łn p̨pa_n$ asymptotically converges to a constant times $(łn n)^2$. Although improved lower bounds for $łn kp̨a_n/(łn n)^2$ were proved in 2001, the question of a tight upper bound has remained open since Guibas and Odlyzko’s paper. Here, we exhibit an upper bound for this fraction, which implies its convergence and closes this longstanding conjecture. Moreover, we extend our result to find similar bounds for the number of correlations: a generalization of autocorrelations that encodes the overlaps between two strings.

Overlaps between words are crucial in many areas of computer science, such as code design, stringology, and bioinformatics. A self overlapping word is characterized by its periods and borders. A period of a word u is the starting position of a suffix of u that is also a prefix u, and such a suffix is called a border. Each word of length, say n>0, has a set of periods, but not all combinations of integers are sets of periods. Computing the period set of a word u takes linear time in the length of u. We address the question of computing, the set, denoted $Γ_n$, of all period sets of words of length n. Although period sets have been characterized, there is no formula to compute the cardinality of $Γ_n$ (which is exponential in n), and the known dynamic programming algorithm to enumerate $Γ_n$ suffers from its space complexity. We present an incremental approach to compute $Γ_n$ from $Γ_n-1$, which reduces the space complexity, and then a constructive certification algorithm useful for verification purposes. The incremental approach defines a parental relation between sets in $Γ_n-1$ and $Γ_n$, enabling one to investigate the dynamics of period sets, and their intriguing statistical properties. Moreover, the period set of a word u is key for computing the absence probability of u in random texts. Thus, knowing $Γ_n$ is useful to assess the significance of word statistics, such as the number of missing words in a random text.

Replication stress exerts an important role in fueling genomic instability characterizing multiple myeloma (MM) evolution and is a leading cause of drug resistance. Normal and malignant plasma cells (PCs) are associated with a high transcriptional stress due to the huge production of immunoglobulins. Transcription-replication conflicts (TRCs), arising from collisions between replication and transcription machineries, can promote tumor progression and represent an Achilles’ heel to cancer cells. We reported a gene signature related to TRCs management (TRC score), overexpressed in malignant vs normal PCs. High TRC score identified patients with MM with a poor prognosis who could benefit from a TRC-enhancing therapy, in independent cohorts of patients with MM treated with high-dose melphalan chemotherapy or anti-CD38 immunotherapy. Here, we investigated the therapeutic interest of increasing TRCs to target specifically malignant PCs using the G-quadruplex (G4) stabilizer pyridostatin (PDS). PDS exerted significant toxicity in MM cell lines and primary MM cells, inducing DNA damage, cell cycle arrest, and apoptosis. Importantly, primary myeloma cells are significantly more sensitive to PDS treatment than normal bone marrow cells. Moreover, PDS improved the efficacy of MM treatments such as melphalan and histone deacetylase (HDAC) or bromodomain (BRD) inhibitors. Thus, our study shows that G4 stabilizers could be used to specifically target MM cells that exhibit concomitant replication stress and a high level of transcription, through the increase of TRCs. These molecules could be used to increase the efficacy of other treatments including melphalan, HDAC inhibitors, and BRD inhibitors.

Ancestral sequence reconstruction is an important task in bioinformatics, with applications ranging from protein engineering to the study of genome evolution. When sequences can only undergo substitutions, optimal reconstructions can be efficiently computed using well-known algorithms. However, accounting for indels in ancestral reconstructions is much harder. First, for biologically-relevant problem formulations, no polynomial-time exact algorithms are available. Second, multiple reconstructions are often equally parsimonious or likely, making it crucial to correctly display uncertainty in the results. Here, we consider a parsimony approach where any indel event has the same cost, irrespective of its size or the branch where it occurs. We thoroughly examine the case where only deletions are allowed, while addressing the aforementioned limitations. First, we describe an exact algorithm to obtain all the optimal solutions. The algorithm runs in polynomial time if only one solution is sought. Second, we show that all possible optimal reconstructions for a fixed node can be represented using a graph computable in polynomial time. While previous studies have proposed graph-based representations of ancestral reconstructions, this result is the first to offer a solid mathematical justification for this approach. Finally we discuss the relevance of the deletion-only case for the general case.Competing Interest StatementThe authors have declared no competing interest.

All routine clinical treatments for colorectal cancer include 5-fluorouracil (5-FU), which cannot counteract recurrence and metastases formation. As the pyrimidine analog 5-FU can impact multiple pathways including both DNA and RNA metabolism, studying its mode of actions could lead to improved therapies. Using a dedicated reporter system for lineage-tracing and deep translatome profiling we demonstrate that 5-FU causes some colorectal cancer cells to tolerate the drug, due to a durable translational reprogramming that sustains cell plasticity. This period of drug tolerance coincides with specific translational activation of genes coding for proteins with major pro-tumoral functions. We unravel a major unexpected translational overexpression of the pro-inflammatory and pro-tumoral IL-8 cytokine, alongside other anti-apoptotic, senescence-associated secretory phenotype and cancer-related senescence phenotype genes. Given the adverse prognostic implications of elevated IL-8 levels across various cancers, our findings suggest IL-8 targeting could counteract 5-FU resistance.

Popular Topics

68W32 adaptation Aho-Corasik algebraic technique algorithm algorithms alignment alignment score ALPACA alphabet size Anchor-based strategy ancient DNA Approximability approximate match approximate pattern matching approximate repeats approximation Approximation algorithm approximation algorithms APX assembly autocorrelation award bacteria Bacterial genomes Basic Period binary alphabet Binary Vector binding binding site bioinformatics Biological Physics (physics.bio-ph) biophysics BLAST bounds Burrows-Wheeler cancer cancer stem cell cDNA character Characterisation chemical mark chromatin chromosome circular permutation cluster analysis clustering clustering algorithms coding coiled coil Collinear fragment chaining common word Comparative genomics complexity compressed data structures compression compression algorithms compression gain computer science Concat-Cycles concensus string conference conformation Connectivity cross-over cyclic cover Cyclic string cyclic strings Cytoplasmic Male Sterility Data compression data structure Data structures Data Structures and Algorithms (cs.DS) database DCJ de Bruijn graph defense diagnostic discrete line DNA dominance order double cut and join duplication dynamic programming edit distance encoding enumeration epitranscriptome equality EST Eulerian tour evaluation evolution Exact Match exponential F.2.2 filtration FOS: Computer and information sciences FOS: Physical sciences Gapped seed genetics genome genome rearrangement genome sequencing genomics Golomb ruler graph greedy greedy algorithm Greedy conjecture Haemophilus influenzae Hamiltonian path heuristic algorithms Hi-C homologous sequences human hybrid zone Hypergraph incomplete lineage sorting indexing information content information theory input string INS insulin integer sequence internal duplication intragenic recombination irreducible factor kinship Kolmogorov complexity lattice LCS Levenshtein distance linear superstring linear time linear time algorithm Longest common subsequence machine learning malaria mapping tool matroid maximal chain Maximum coverage Maximum independent set Maximum stable set medecine memory metagenome metagenomics microorganisms microsatellite evolution Minimum assignment minisatellite minisatellite locus minisatellites MIS modification monkey test motif motif size mouse mRNA MS-Align multiple alignment multiple read mutation MVR N-gram NGS NP-complete NP-hard On-line algorithms optimal coding oryza line overlap overlap graph Pairwise alignment parameterized complexity path pattern Pattern recognition pattern search perfect detection periods Permutation phylogenetic profile Polynomial Time Approximation Scheme price proportional length protein domain proteome publication PWM python radish genome random-access memory random text Read mapping rearrangement Recognition recombinant regular expression regularity detection regulation relative compression repeats replication reverse complementary sequence RLE RNA search algorithm seed segment tree seminar sequence sequence alignment sequence classification sequence comparison Sequence graph short tandem repeats Shortest Common Superstring Problem Shortest cyclic cover of strings shortest DNA cyclic cover problem Shortest Superstring Problem similarity similarity metrics single cell soft software spaced-seed Statistical Mechanics (cond-mat.stat-mech) string String matching stringology Stringology Text Algorithms Indexing Data Structures De Bruijn Graph Assembly Space Complexity Dynamic Update strings student sturgeon phylogeny subset system suffix array suffix tree superstring sweep line tandem duplication tandem repeat tandem repeat alignment tandem repeats team text text compression Text indexing Tiling time complexity tool training transcription transcription factor transcriptome transcriptomics translation treatment tree tree alignment validation score virus VNTR W[1]-hard web resource web server Whole genome alignment word word enumeration word RAM model workshop Yakuts zebra fish

Contact

Connect with me

  • (33) 04 67 41 86 64
  • LIRMM - UMR 5506 & University Montpellier (CC 05016) 860 rue de St Priest - 34095 Montpellier cedex 5 FRANCE