A new method for the detection of gene-environment interactions in cancer studies

The grant is funded by the Ligue Nationale Contre le Cancer

Principal curve based on the three principal components.

In recent years, the detection of heterogeneity in epidemiology has attracted increasing interest. One of the main reasons is that taking heterogeneity into account in data analysis makes it possible to gain statistical power. In addition, the detection of gene-environment interactions is of major interest in epidemiology because it makes it possible to identify high-risk subgroups in the population. Although several methods have already been proposed for this problem, the detection of gene-environment effects remains difficult, in particular because the causal effect is generally not directly observed (as for some treatments, for example) and only proxy variables (such as BMI) are accessible.

The aim of this project is to develop a new statistical method for detecting gene-environment effects associated with the occurrence of cancer. The proposed approach will detect groups of individuals, characterized by their environmental factors, that have different cancer risks.

The method works as follows. Each patient is positioned in a proximity space using multiple covariates (personal, clinical, environmental, etc.). The approach then exploits the fact that two neighboring patients in this proximity space are more likely to be exposed to common latent (and therefore not necessarily observed) factors. Next, a Principal Component Analysis (PCA) is applied to the proximity space and a smooth curve, called a "principal curve", is fitted to the result. Each individual can then be projected onto this curve, which gives an ordering of the individuals. Individuals that are close on this curve thus share similar exposure profiles. An example of a principal curve constructed on the three principal components of a dataset is shown in the figure above.
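The pipeline described above (PCA, curve fitting, projection, ordering) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the project's implementation: the function names `pca_scores`, `project_on_polyline` and `principal_curve` are hypothetical, the covariates are assumed to be numeric and already scaled, and a low-degree polynomial fit stands in for the scatterplot smoother normally used when fitting principal curves.

```python
import numpy as np


def pca_scores(X, n_components=3):
    """Scores of the rows of X on the first principal components."""
    Xc = X - X.mean(axis=0)                        # centre each covariate
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def project_on_polyline(points, curve):
    """Arc-length position of the closest point of a polyline to each point."""
    a, b = curve[:-1], curve[1:]                   # segment end points
    seg = b - a
    seg_len2 = np.maximum((seg ** 2).sum(axis=1), 1e-12)
    seg_len = np.sqrt(seg_len2)
    cum = np.concatenate([[0.0], np.cumsum(seg_len)])
    # Relative position of the orthogonal projection on every segment,
    # clipped to [0, 1] so that the foot stays inside the segment.
    t = np.einsum("nsd,sd->ns", points[:, None, :] - a[None], seg) / seg_len2
    t = np.clip(t, 0.0, 1.0)
    foot = a[None] + t[..., None] * seg[None]
    dist2 = ((points[:, None, :] - foot) ** 2).sum(axis=2)
    best = dist2.argmin(axis=1)                    # closest segment per point
    rows = np.arange(len(points))
    return cum[best] + t[rows, best] * seg_len[best]


def principal_curve(Z, n_iter=10, degree=3, n_knots=100):
    """Alternate a smoothing step and a projection step (assumed scheme)."""
    lam = Z[:, 0].copy()                           # start from the first PC
    grid = np.linspace(0.0, 1.0, n_knots)
    for _ in range(n_iter):
        u = (lam - lam.min()) / (lam.max() - lam.min())
        # Smoothing step: each coordinate is fitted as a polynomial in u.
        curve = np.column_stack([
            np.polyval(np.polyfit(u, Z[:, j], degree), grid)
            for j in range(Z.shape[1])
        ])
        # Projection step: new position of each individual along the curve.
        lam = project_on_polyline(Z, curve)
    return lam, curve
```

With a covariate matrix `X` (one row per individual), `lam, curve = principal_curve(pca_scores(X))` returns a position along the curve for each individual, and `np.argsort(lam)` gives the induced ordering. The direction of the ordering is arbitrary, since the sign of a principal component is not identified.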

The grant (140k euros) is funded by the French National League Against Cancer (LNCC).


  • Developing the method by combining principal curves and the breakpoint methods from
  • Developing new test statistics for heterogeneity that make it possible to test whether gene-environment interactions exist in the data.
  • Investigating other approaches not based on principal curves, such as minimum spanning trees.
  • Applying the method to EPIC and UK Biobank datasets.
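To give an idea of how a breakpoint approach can be combined with an ordering of individuals, the sketch below scans for a single change in the case rate along that ordering and assesses it by permutation. It is a generic single change-point scan written for illustration only: the breakpoint methods and test statistics of the project are not specified here, and the names `best_breakpoint`, `permutation_pvalue` and the `min_size` parameter are hypothetical.

```python
import numpy as np


def _xlogy(x, y):
    """x * log(y), with the convention that the result is 0 where x == 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.zeros_like(x)
    mask = x > 0
    out[mask] = x[mask] * np.log(y[mask])
    return out


def _loglik(k, n):
    """Maximised Bernoulli log-likelihood for k cases among n individuals."""
    p = k / n
    return _xlogy(k, p) + _xlogy(n - k, 1.0 - p)


def best_breakpoint(y, min_size=20):
    """Best single split of an ordered 0/1 case indicator.

    Returns (cut, lr): the left group is y[:cut], and lr is the
    likelihood-ratio statistic of "two rates" against "one rate".
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    cuts = np.arange(min_size, n - min_size + 1)
    k_left = np.cumsum(y)[cuts - 1]                # cases in y[:cut]
    k_tot = y.sum()
    ll_split = _loglik(k_left, cuts) + _loglik(k_tot - k_left, n - cuts)
    ll_null = _loglik(np.array([k_tot]), np.array([n]))[0]
    lr = 2.0 * (ll_split - ll_null)
    i = int(lr.argmax())
    return int(cuts[i]), float(lr[i])


def permutation_pvalue(y, n_perm=200, min_size=20, seed=0):
    """Permutation p-value of the maximal statistic (ordering shuffled)."""
    rng = np.random.default_rng(seed)
    _, observed = best_breakpoint(y, min_size)
    null = np.array([
        best_breakpoint(rng.permutation(y), min_size)[1]
        for _ in range(n_perm)
    ])
    return (1 + (null >= observed).sum()) / (1 + n_perm)
```

Here `y` would be the case/control indicator sorted by position along the curve. Because the statistic is maximised over all candidate cuts, its null distribution is not a standard chi-square, which is why a permutation reference is used in this sketch.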

The data:

  • EPIC is a multi-centre European cohort with more than 500,000 individuals, recruited in the 1990s. This dataset contains 7,491 genotyped women (3,831 cases of breast cancer and 3,623 controls) with a range of clinical and environmental information on patients: for example, socio-economic status, height, weight, BMI, smoking status, alcohol consumption, eating habits (obtained from a questionnaire), menopausal status, the use of hormonal treatment (contraception or for menopause), etc.
  • UK Biobank is a prospective cohort of 488,377 British individuals, all genotyped. Recruitment took place from 2006 to 2010, for individuals aged 40 to 69 years. As of May 2018, 79,000 cases of cancer had been diagnosed. The main ones are melanoma, breast cancer, uterine cancer, prostate cancer and colon cancer. These data also contain a range of environmental and clinical information about these individuals: lifestyle, biological measurements, biomarkers in the blood and urine, images of the brain and heart, as well as repeated measures of physical activity.


  • Olivier Bouaziz, associate professor (maître de conférences), Université de Paris, MAP5 laboratory. Principal investigator of the project.
  • Grégory Nuel, senior CNRS researcher of the Institute of Mathematics (INSMI), Laboratory of Probability, Statistics and Modeling (LPSM), Sorbonne Université.
  • Vivian Viallon, associate professor (maître de conférences), Université Claude Bernard, Lyon, currently on leave at the International Agency for Research on Cancer (IARC).
  • Marie Chion, postdoc, Université de Paris, MAP5 laboratory. Marie started her postdoc in October 2021 and will work on this project until the end of September 2022.
  • Modibo Diabaté, now a researcher at ISEN Yncréa, Université Grenoble Alpes, worked on this project as a postdoc from October 2020 until August 2021.