STrengthening the Reporting of OBservational studies in Epidemiology – Molecular Epidemiology (STROBE-ME): an extension of the STROBE Statement.
Article abstract
Advances in laboratory techniques have led to the increasing use of biomarkers in epidemiological studies, but the quality of reporting of such studies varies.
The STROBE (STrengthening Reporting of Observational studies in Epidemiology) initiative, established in 2004, provides guidance on reporting observational epidemiology studies.
Here, the STROBE-ME (Strengthening the Reporting of OBservational studies in Epidemiology – Molecular Epidemiology) initiative builds on STROBE and provides additional guidance on reporting biomarker studies.
Specific additions relate to the collection, handling and storage of biological samples; laboratory methods, validity and reliability of biomarkers; specificities of study design; and ethical considerations.
A checklist to help authors in reporting biomarker studies is published as supporting information (Table S1).
Article content
Introduction
In recent years, advances in laboratory techniques have led to a rapidly increasing use of biomarkers in epidemiological studies, a field known as molecular epidemiology [1]–[5]. Biomarkers are any substance, structure or process that can be measured in biospecimens and may be associated with health-related outcomes. Biomarkers of internal dose, of early biological change and of susceptibility (see Figure 1 and Box 1 for definitions) are used as proxies for investigating the interplay between external and/or endogenous agents and the body. Biomarkers may provide valuable scientific tools because of their ability to inform biological mechanisms through the examination of early, intermediate and late molecular and cellular events. Moreover, a biomarker may capture several external exposure variables in a single biologically relevant quantity, provide quantitative measurements, increase statistical power or be used as an efficient and informative intermediate outcome. Finally, biomarkers can be used to identify susceptible individuals and to improve diagnosis and early detection of disease as well as prediction of major clinical outcomes in patients with a given disease. Figure 1 describes the whole spectrum of applications of biomarkers; the scheme uses cancer as an example because this is the field in which the conceptual framework of molecular epidemiology has had the greatest development and numerous postulated potential applications; however, similar concepts apply to many other fields.
Biomarker-based measurements are not, however, problem free. As in classical biomedical and epidemiological research, considering methodological issues concerning the design, conduct, analysis and interpretation of the results is essential to adequately address a research question [6]. In addition to the usual problems of bias and confounding that affect all clinical and epidemiological studies, particular issues when using biomarkers include (i) validity and reliability of biomarker measurements, (ii) special sources of bias, (iii) reverse causality and (iv) false positives as a result of multiple testing or selective reporting. To conceive relevant and valid studies in biomarker-based research, we need an in-depth understanding and integration of methodological and substantive (i.e. biological, clinical and environmental) knowledge. Complete, accurate and transparent reporting of study design, methods, conduct and findings is required to allow the study to be fairly and adequately evaluated and summarized, including avoidance of selective reporting of positive results [7]–[10]. Empirical evidence suggests that the most highly cited biomarker studies across medicine almost consistently report larger effect estimates than those reported in subsequent meta-analyses [11]. Suboptimal reporting may also lead to inflated expectations of the translational potential and clinical utility of findings [12]. At the other end of the spectrum, false negatives are also a common problem [9], and they may result from limited sample size, poor study design or inappropriate laboratory assays [13].
The need for improved reporting of scientific research in general led to influential statements of recommendations such as CONSORT for randomized controlled trials [14],[15] and the STrengthening Reporting of OBservational studies in Epidemiology (STROBE) statement [16]. The STROBE initiative was established in 2004 with the aim of providing guidance on how to report observational research. The resultant STROBE statement was simultaneously published in several medical journals in 2007 [16],[17]. Its guidelines provide a user-friendly checklist of 22 items to be reported in epidemiological studies, with items specific to the three main study designs: cohort studies, case–control studies and cross-sectional studies. The STROBE statement has had an important impact. Its recommendations were adopted by several journals, and there is evidence that they have affected the style of result reporting [18]. However, there is also evidence of misuse of the STROBE statement [19].
Recent advances in molecular biology and the vast amount of data generated by high-throughput techniques (and consequent changes and improvements in epidemiology, statistical analysis and study design) warrant implementing the STROBE recommendations specifically for molecular epidemiology studies. For a review of the state of the art of molecular epidemiology and the ensuing methodological problems, see [1]. Molecular tools (biomarkers) are also increasingly applied because epidemiology now addresses new and difficult issues, such as the effects of chronic low-level exposures. While important discoveries of the past – such as the role of cholesterol or tobacco smoking – originated from studies with strong associations identified based on single measurements, there is now a challenge to identify weaker associations, and these require more accurate and sensitive tools. This increases the importance of a meticulous, comprehensive and transparent description of studies involving biomarkers.
Herein, we propose an extension of STROBE, i.e. STROBE for molecular epidemiology, STROBE-ME. The guidelines aim to provide an easy-to-use checklist of items that authors may use for reporting molecular epidemiology studies other than genetic association studies.
Recommendations already exist for genetic association studies, a field that has specific characteristics and requirements of reporting which have been included in a separate recent statement (STREGA, an extension of STROBE) [20]. There is some necessary overlap between the current guidelines and STREGA, insofar as ‘susceptibility biomarkers’ are included in the present recommendations. Communication of results of molecular epidemiology studies is still an underdeveloped field. This paper refers only to scientific communication of study results and does not address the ethical problem of communicating results to individual participants (see [21],[22]).
Aims and Use of the STROBE-ME Statement
The expected outcome of the present recommendations is an improvement in the reporting of results, such that editors, reviewers and readers better understand what was actually done by the authors. STROBE-ME is expected to lead to more organized and transparent papers and to a better understanding of both the strengths and weaknesses of studies in molecular epidemiology. Our recommendations do not dictate how studies should be performed, nor do they serve as a basis to evaluate the quality of observational studies; they only try to help improve the reporting of research. The adoption of improved reporting standards may nevertheless also have an indirect benefit on the quality of study design.
The parent STROBE statement is a checklist of 22 items to be addressed when observational epidemiological studies are reported. The STROBE items cover different aspects of reporting a study: the title (one item), introduction (two items), methods (nine items), results (five items), discussion (four items) and funding of research (one item) [16]. The explanation and elaboration document of STROBE [17] explains these items in detail and provides good real-life examples in published works for their application.
The statement proposed here is intended to be an extension of the STROBE statement for molecular epidemiology studies. The present recommendations are intended only for those studies in which biomarkers are used as an explanatory variable; these include biomarkers of exposure/internal dose, biomarkers of early biological change and biomarkers of susceptibility (Box 1 and Figure 1). This set of biomarkers is used as a measurable proxy for the process of the interaction between an external/endogenous agent and the body at different biological levels. Other study designs involving biomarkers are not covered by the present recommendations, including transitional studies of validation and reliability of measurement.
Some items belonging to the original STROBE checklist have been implemented for molecular epidemiology studies; other items have been added de novo to the original checklist. The 10 implemented items include issues on study design specificities in molecular epidemiology studies; description of relevant participant conditions at the time of sample collection; and particular statistical aspects if the biomarker measurements are introduced into statistical models. The seven new specific items added to the original STROBE checklist include biological sample collection, storage and processing; and the laboratory methods used for the analyses. The present checklist was developed as an extension of the STROBE checklist (Table 1). The recommendations are intended to complement the existing STROBE guidelines, not to replace them; therefore, all previously described items concerning observational studies such as cohort, case–control and cross-sectional studies apply to molecular epidemiological studies (when appropriate).
The present statement contains a checklist of items for reporting molecular epidemiology studies (Table 1); some explanatory text referring to single item description; and some Boxes in which specific aspects of molecular epidemiology are briefly addressed for readers' reference. Although the current recommendations could apply also to biomarkers used for the prediction of clinical course and outcomes of disease, for tumour marker prognostic studies the reader should refer to the REMARK guidelines [7].
Concerning the uses of the present statement, additional details on how the parent STROBE statement was used can be found on the website (http://www.strobe-statement.org/). It is expected that the statement will be adopted and referred to by journals that publish molecular epidemiology papers, as well as by journals that publish clinical research in which biomarkers have an important role [23].
Development of the STROBE-ME Statement
A multidisciplinary group of epidemiologists, biostatisticians and laboratory scientists (overall approximately 15 scientists) developed the current recommendations. Also, editors of several specialist journals were involved from the outset. The group met twice in London (UK) in 2008 and 2009, once in Turin (Italy) in 2009 and once in Łódź (Poland) in 2010; it sought external opinions from partners of the Environmental Cancer Risk, Nutrition, and Individual Susceptibility (ECNIS) European Network of Excellence – which was the initiator of the STROBE-ME initiative. Overall, the process lasted 3 years. While no formal process such as a Delphi consultation was used for development, consensus was built by circulating several versions of the statement within the group of developers and an external circle of potential users. In all, over 30 scientists were involved in the process.
Checklist of Items
The items that should be considered when reporting molecular epidemiology studies are shown in Table 1 and available as supporting information. These items are similar to those that were originally recommended in STROBE, but with modifications that are specific to molecular epidemiology. Below, we give a detailed description of each item. The purpose is not to suggest how to set up a research project but how to improve reporting of the research to allow readers (and reviewers) to better understand what was actually done by the researchers.
ME-1 – State the use of biomarker(s) in the title and/or in the abstract if they contribute substantially to the findings
When one or more biomarkers are measured in an epidemiological study, it may be more informative to report this in the title or at least in the abstract of the article. This helps the reader to immediately identify molecular epidemiology studies and ensures correct indexing in electronic databases.
ME-2 – Explain in the scientific background of the paper how/why the specific biomarker(s) have been chosen, potentially among many others
The process leading to the choice of one or more specific biomarkers for inclusion in a paper should be made clear in the Introduction. Background information and rationale for the choice of the specific biomarker(s) should be explicitly stated; also, how the biomarker is introduced in the study design should be made explicit (biomarker of exposure, internal dose, early biological change and susceptibility). It should also be clarified whether the biomarker is used as a proxy, and if so, what it is intended to be a proxy for.
ME-3 – A priori hypothesis: if one or more biomarkers are used as proxy measures, state the a priori hypothesis on the expected values of the biomarker(s)
When stating the objective(s) of a study according to the STROBE guidelines [16], it might be helpful to state explicitly the a priori hypothesis on the expected values of the biomarker(s).
ME-4 – Describe the special study designs for molecular epidemiology (in particular nested case–control and case–cohort) and how they were implemented
Study design details should be reported in the Methods section. For traditional designs such as case–control, cohort and cross-sectional studies, the STROBE recommendations can be followed, with extra care in reporting how biological sample collection was integrated into the study design; for nested case–control and case–cohort studies, selection criteria for cases and controls, sampling frame and matching criteria should be reported with extra care, as they represent a main potential source of bias in these study designs (see Box 2). In addition to matching criteria for individuals, all methods used for selecting or matching biological samples (i.e. by storage time and by batch) should be reported. Also, it is recommended to describe briefly the cohort in which nested studies were implemented, in terms of description of the population, sampling, outcome ascertainment, follow-up period, number of subjects lost to follow-up and primary objective for which the cohort was established.
ME-4•1 – Report on the setting of the biological sample collection; amount of sample; nature of collecting procedures; participant conditions; time between sample collection and relevant clinical or physiological endpoints
An accurate description of the sample collection and shipment is necessary to enable the reader to evaluate potential sources of bias or errors in the biomarker measurement and to ensure appropriate reproducibility of the scientific experiment (see Box 3). The following items should be reported: (i) the setting of the biological sample collection (place, time of the day, time of the year, laboratories involved, personnel involved, etc.); (ii) amount/volume/size of sample(s); (iii) nature of the collecting procedure (anticoagulant involved, e.g. heparin, EDTA); (iv) if the participant is healthy, participant condition at the sample collection (fasting status, position, etc.) when appropriate; (v) if participants are not healthy individuals in stable physiological conditions, then report the relevant aspects of the health status and clinical conditions of the participants [24],[25]; (vi) in all instances, consider reporting the time between sample collection and relevant clinical or physiological endpoints that might have affected the characteristics or concentrations of the biomarker [26]. In particular, report any relevant characteristic of the participants that might influence the biomarker levels in any known or unknown way. For example, the position of the study subjects matters: orthostatism decreases plasma volume, so that protein and cholesterol levels can be lowered by 5–15% relative to the supine position.
Detailed information on all critical steps that might have altered the biological samples or influenced the final biomarker measurement should be identified and reported accordingly in the Methods section.
ME-4•2 – Describe sample processing (centrifugation, timing, additives, etc.)
A comprehensive description of all steps of sample processing is needed in the Methods section to assess experimental reproducibility. This description ranges from manual handling of samples to specific machinery used for laboratory processing (see Box 3). When a well-established technique is used, the main process can be referred to by quoting the article where the technique is described and any variation from the initially described laboratory technique should be explicitly stated.
ME-4•3 – Describe sample storage until biomarker analysis (storage, thawing, manipulation, etc.)
Particularly in nested case–control and case–cohort studies, biomarkers can be measured in biological samples stored for extended durations; sometimes, samples may have already undergone freeze–thaw cycles. As these processes can partially alter the biomarker values under examination, it is important to report in the Methods section any manipulation that the biological samples may have undergone, together with a detailed description of how the samples were stored.
ME-4•4 – Report the half-life of the biomarker and chemical and physical characteristics (e.g. solubility)
For new biomarker(s) only, some basic biochemical information relevant to the interpretation of the measured values should be reported in the Methods section. This includes biochemical and biophysical characteristics that might be relevant when interpreting the results, such as half-life, solubility or lipophilicity.
ME-6 – Report any habit, clinical condition, physiological factor, or working or living condition that might affect the characteristics or concentrations of the biomarker
Report any relevant characteristic of the participants that might influence the biomarker levels in any known or unknown way [24]. For example, exposure to air pollution [27] or seasonality [28] might influence DNA adduct levels in healthy subjects; similarly, type of diet [29],[30] or amount of sunlight exposure [28],[31] might influence DNA damage biomarkers in healthy subjects.
ME-8 – Laboratory methods: report type of assay used, detection limit, quantity of biological sample used, outliers, timing in the assay procedures (when applicable) and calibration procedures or any standard used
The methods used in the laboratory for biomarker analyses should be described in detail in a dedicated section of the Methods. Particular care should be taken to describe new or modified techniques, while for a well-established technique, the main process can be referred to by quoting the article where the technique is described, and any variation from the initially described laboratory technique should be explicitly stated. Any calibration procedures or external standards used in the laboratory (or for comparing data coming from different laboratories) should also be described. The definition of ‘outlier’ should be clearly given (for example, whether it is based on pathophysiological, technical or statistical grounds).
ME-12 – Describe how biomarkers were introduced into statistical models
Usually, statistical methods that apply to biomarkers do not differ from those used in other branches of epidemiology and clinical research. Here, we mainly refer to specificities of biomarker research. When continuous variables are used (a very common occurrence for biomarkers), testing for linearity may be useful when the marker is used as a covariate, in addition to checking other statistical model assumptions when it is used as an outcome. Statistical manipulation of a variable derived from biomarker measurement values should be described in detail as for other variables included in the statistical models. Whether the variable is introduced as a continuous or categorical variable (and if categorical what criterion has been used for identifying cut-off points); whether extreme values have been excluded, and with which criteria; whether the original variable has been log transformed or manipulated in any other way; whether crude measurements or corrected/adjusted values (e.g. ratios to binding hormones and creatinine-adjusted values) were analysed; and how samples with nondetectable biomarker levels were dealt with (e.g. considered as zero, as the detection limit, as half of that level or imputed) should be clearly stated.
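As a minimal sketch of the choices that ME-12 asks authors to state, the following Python fragment makes each one explicit: the substitution rule for nondetectable values, the log transformation and the criterion for cut-off points. The data, the detection limit and the function names are hypothetical and serve only as illustration; they are not part of the STROBE-ME statement.

```python
import math
import statistics

def substitute_nondetects(values, lod, rule="half"):
    """Replace nondetects (encoded as None) according to a stated rule.

    rule is "zero", "lod" (the detection limit) or "half" (LOD / 2).
    """
    fill = {"zero": 0.0, "lod": lod, "half": lod / 2.0}[rule]
    return [fill if v is None else v for v in values]

def tertile_cutoffs(values):
    """Two cut-off points that split the data into three groups."""
    return statistics.quantiles(values, n=3)

def categorize(value, cutoffs):
    """Index of the category (0 = lowest) into which the value falls."""
    return sum(value > c for c in cutoffs)

# Hypothetical biomarker concentrations; None marks a nondetect.
raw = [None, 0.8, 1.5, None, 2.4, 3.1, 0.6, 4.0, 1.1]
LOD = 0.5  # assumed detection limit, in the same units as the data

filled = substitute_nondetects(raw, LOD, rule="half")
logged = [math.log(v) for v in filled]  # natural-log transformation
cutoffs = tertile_cutoffs(filled)
groups = [categorize(v, cutoffs) for v in filled]
```

Because each choice is a named argument or a separate step, the Methods text can mirror it directly. Substituting zero would make the subsequent log transformation undefined, which illustrates why the rule for nondetects needs to be reported alongside any transformation.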
ME-12•1 – Report on the validity and reliability of measurement of the biomarker(s) coming from the literature and any internal or external validation used in the study
Validity and reliability of biomarker(s) measurement should be reported when every specific biomarker is introduced (see Box 4). Measurement error has several components, and there is ambiguity on the use of the term, because ‘error’ encompasses both true ‘variations’ and ‘mistakes’. ‘Analytical’ measurement errors originate from the laboratory technique(s), including between-batch variation, while other sources of ‘pre-analytical error’ include variations in the individuals or the samples that are investigated [1]. Ideally, the inter-individual, intra-individual and inter-laboratory variations should be reported for each biomarker to enable the reader to understand the potential source of error for each specific biomarker. Literature-based reliability estimates should be properly referenced. When these figures are not available from the literature, this should also be stated. If aspects of the validity and reliability have been determined as part of the current study, the methods and process should be briefly stated. When a specific laboratory procedure or method for biomarker measurement has been standardized across laboratories for facilitating the comparability, this should be clearly stated [32],[33].
Besides validity and reliability of biomarker measurements, it is increasingly recognized that the study results are likely to be more credible when they have been reproduced by some additional validation process, either internally (e.g. by cross-validation) or preferably with external independent validation in samples that are totally different from those where the biomarker was first tested [35]. All attempts at internal and external validation carried out by the authors should be reported in detail in the Methods section, and the respective results should be shown in the Results section.
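The intra-individual and inter-individual variation mentioned under ME-12•1 are often summarized by a coefficient of variation for repeated measurements and by an intraclass correlation coefficient (ICC). The sketch below computes both with the Python standard library, using the one-way ANOVA form of the ICC and assuming the same number of repeated measurements per subject. The data and function names are hypothetical, and other ICC variants may suit a given study better.

```python
import statistics

def cv_percent(values):
    """Coefficient of variation (%) of repeated measurements."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)

def one_way_icc(replicates):
    """One-way random-effects ICC.

    replicates: one inner list of repeated measurements per subject;
    every subject must have the same number of repeats (k >= 2).
    """
    n = len(replicates)
    k = len(replicates[0])
    if k < 2 or any(len(r) != k for r in replicates):
        raise ValueError("each subject needs the same number (>= 2) of repeats")
    grand_mean = statistics.fmean([v for r in replicates for v in r])
    subject_means = [statistics.fmean(r) for r in replicates]
    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for r, m in zip(replicates, subject_means) for v in r
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical duplicate measurements for three subjects.
duplicates = [[1.0, 1.2], [2.0, 2.1], [3.1, 2.9]]
icc = one_way_icc(duplicates)
```

An ICC close to 1 indicates that most of the observed variance lies between subjects rather than within them. Reporting which ICC form was used, and on how many subjects and repeats it was based, lets readers compare it with literature values.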
ME-13 – Give reasons for the loss of biological samples at each stage
Loss of specimens, non-evaluable samples (because of poor quality) or assay failures are common occurrences. When some samples are not included in the final analysis because of problems in sample quality, quantity, availability, timing of sample collection or technical failure, give detailed reasons. This will help in tracking the final sample size and the reasons for sample exclusions.
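One way to keep the information required by ME-13 is to log every lost sample with the stage and the reason, and to derive the remaining count at each stage from that log. The following Python sketch illustrates the idea; the stage names, reasons and counts are hypothetical.

```python
from collections import Counter

# Hypothetical processing stages, in chronological order.
STAGES = ["collected", "stored", "assayed", "analysed"]

def attrition(n_collected, losses):
    """Tabulate samples remaining after each stage.

    losses: list of (stage, reason) tuples, one per lost sample.
    Returns a list of (stage, n_remaining, Counter of reasons).
    """
    by_stage = {stage: Counter() for stage in STAGES[1:]}
    for stage, reason in losses:
        by_stage[stage][reason] += 1
    remaining = n_collected
    rows = [(STAGES[0], remaining, Counter())]
    for stage in STAGES[1:]:
        remaining -= sum(by_stage[stage].values())
        rows.append((stage, remaining, by_stage[stage]))
    return rows

# Hypothetical loss log for 100 collected samples.
loss_log = [
    ("stored", "insufficient volume"),
    ("stored", "haemolysed"),
    ("assayed", "assay failure"),
    ("assayed", "assay failure"),
    ("analysed", "missing covariates"),
]
table = attrition(100, loss_log)
for stage, n_left, reasons in table:
    print(stage, n_left, dict(reasons))
```

The resulting rows map directly onto a flow diagram or a sentence in the Results section giving the number of samples at each stage and the reasons for exclusion.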
ME-14•1 – Give the distribution of the biomarker measurement (including mean, median, range and variance)
An appropriate description of the biomarker measurement distribution is of help for interpreting results and for comparing similar biomarker measurements by other scientists. It also often facilitates the biological interpretation of the results. A graph of the full distribution may be useful (when relevant, also by exposure status or case/control status).
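The summary measures named in ME-14•1 can be produced per group with a few lines of Python. This is only an illustrative sketch: the measurements, the group labels and the `describe` helper are hypothetical, and the variance shown is the sample variance.

```python
import statistics

def describe(values):
    """Mean, median, range and sample variance of biomarker measurements."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "variance": statistics.variance(values),
    }

# Hypothetical measurements by case/control status.
measurements = {
    "cases": [1.0, 2.0, 3.0, 4.0, 10.0],
    "controls": [1.0, 1.5, 2.0, 2.5, 3.0],
}
summary = {group: describe(vals) for group, vals in measurements.items()}
```

Comparing mean and median in such a table gives a first indication of skewness: in the hypothetical cases above, the mean exceeds the median because of one high value, which is the kind of feature a graph of the full distribution would also reveal.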
ME-19 – Describe main limitations in laboratory procedures
Potential and actual limitations met in laboratory procedures should be described in detail in the Discussion. It may be helpful also to report whether the limitation would likely have introduced a random or systematic error and, if systematic, to suggest in which direction this might have biased the results. Validation of results of biomarker studies is of major importance, and the discussion should address whether any validation procedure was used in the study [36].
ME-20 – Give an interpretation of results in terms of a priori biological plausibility
Results should be interpreted in the light of the mechanism(s) of action of the biomarker(s) and of the a priori hypothesis, thus offering a biologically plausible interpretation. It may be useful to stress the added value of the biomarker(s) in explicating the biological mechanism underlying the association reported.
ME-22•1 – Describe informed consent and approval from ethical committee(s). Specify whether samples were anonymous, anonymized or identifiable
Molecular epidemiology poses special ethical issues that are summarized in Box 5.
Box 5. Ethical Considerations
Legal issues related to the use of stored human biological material are contained in a European guideline issued by the Council of Europe (http://www.coe.int). In the United States, a useful website is http://nih.gov/sigs/bioethics. When incorporating biospecimen-derived measurements, the following requirements should be met: follow respectful protocols in eliciting information; avoid harm to participants; secure proper informed consent; manage anonymization of interlinking databases; establish confidentiality and security safeguards; develop proper responses to requests for personal data by various parties; devise sound data access, ownership and intellectual property policies; be clear about whether and how individuals will be informed of findings that might be medically helpful for them; and arrange supervision by research ethics and privacy protection bodies [53]. Clearly, each of these requirements would need extensive comments. In particular, how ‘broad’ should the consent be? On the one hand, a broad consent (e.g. ‘the biological samples will be used for the identification of gene variants that may predispose to chronic diseases’) implies a greater freedom of the researcher, who is not obliged to collect further consent forms each time a new gene is investigated. On the other hand, such a generic informed consent form explains very little to the recruitees. The concept of informed consent was initially formulated in the Declaration of Helsinki in 1964, with the latest revision in 2000 (http://www.wma.net). Recent developments in molecular epidemiology tend to overcome the conflict between ‘broad’ and ‘narrow’ consent forms, introducing the idea of a ‘two-level consent’, i.e. a relatively broad procedure at first, followed by a more specific and detailed approach when studies on single genes/biomarkers are conducted.
For example, there is a broad agreement that low-penetrant variants that are common in the general population and are associated with a slight increase in the risk (interacting with environmental exposures) should not be subject to strict rules as far as ethical implications are concerned. In fact, knowledge of presence or absence of a single allele involved in metabolic pathways neither allows the carrier to modify her/his risk profile substantially nor allows the researcher to identify other members of the family, which would violate confidentiality. The case of highly penetrant gene variants is different: e.g., the identification of the carrier of a rare mutation allows the researchers to identify other family members possibly affected, with potential detrimental effects (e.g. on insurance policies). The same reasoning applies to biomarkers. The majority of biomarkers used in observational epidemiological research are of little utility to the subjects participating in the research, when taken alone. This is particularly true for the biomarkers of exposure, but also some biomarkers of early biological change/effect may not be meaningful when extrapolated from the research context; for example, DNA adduct level is difficult to interpret at a personal level. Researchers should have a clear view of the practical implications of testing for the study subjects, and in particular what to do in each of these situations: when no effective treatment is possible; when treatment is available with a close balance of favourable/unfavourable effects; and when effective treatment is available with scarce unfavourable effects. Similar considerations apply to biomarkers, which can be weakly or strongly associated with diseases and less or more associated with family history. Anonymization of information is another difficult issue.
First, there is a problem of definitions: ‘identifiable’ is a sample with name or social security number on it; ‘coded’ is a sample with a code that allows relatively easy identification of the person; ‘encrypted’ is a sample with a code that does not allow easy identification of the person, but this is possible with extra effort; finally, ‘anonymous’ is a sample for which there is no possibility of linking to a person. Clearly, a really anonymous collection of samples is of very little use for epidemiological research, which is based on follow-up and linkage of laboratory data and health-related data.
Discussion
Transparent reporting is essential in epidemiology as in science in general, and in molecular epidemiology in particular. Given that the use of biomarkers has raised great expectations in terms of potential elucidation of disease aetiology and pathogenesis, it is important to raise awareness of the intrinsic limitations of biomarker measurements. In particular, measurement error is a common problem and can cause both false-negative and false-positive results [9]. Also, the lack of a formal study design may substantially impair the interpretation of the results, and selective reporting of results can be detrimental.
The present STROBE-ME checklist should strengthen primarily the reporting and interpretability of molecular epidemiology studies, if used widely and systematically. It has been developed based on two strong foundations: (i) the well-established STROBE collaboration and the related statement and (ii) an ECNIS working group formed by epidemiologists, biostatisticians and laboratory scientists with extensive experience in the field of molecular epidemiology and biomarker analyses.
We hope that these guidelines will improve the quality of reporting of molecular epidemiology and other biomarker-based research, including studies conducted within the growing number of biobanks and of biomonitoring projects.
The ethical duty of researchers includes reporting findings with accuracy, completeness and transparency, and in sufficient detail to allow the scientific community to consider them adequately, assess their strengths and weaknesses and make fair comparisons. Well-reported published studies can contribute to and be summarized with an evidence-based approach in an appropriate manner (i.e. on sound scientific grounds) to arrive at unbiased conclusions that lead to better knowledge and the advancement of citizens' health [37],[38].
Finally, we would like to stress that these recommendations, like the original STROBE statement and other guidelines on reporting research [7],[14],[16],[20], are evolving documents requiring continuous feedback, reassessment and refinement. The STROBE-ME guidelines will be published on the STROBE website (http://www.strobe-statement.org) where a forum for discussion and improvement of the checklist and related material will be available.
Guidance documents should also be appraised for their eventual impact. The EQUATOR initiative [39]–[41] has found that only 17% of the surveyed guideline developers performed a formal evaluation of the impact. We will engage journal editors in attempts to evaluate the impact of the present statement in the long run.
Supporting Information
Table S1
The Strengthening the Reporting of Observational studies in Epidemiology – Molecular Epidemiology (STROBE-ME) Reporting Recommendations: Extended from STROBE statement.