
Data Quality Checks Process

The EUROCARE-6 database
Data on all invasive, primary, malignant cancers (excluding skin cancers other than melanoma), benign tumours of the central nervous system and non-invasive tumours of the urinary bladder, diagnosed from 1978 to 2013 and followed up to December 31, 2014, were submitted by 117 cancer registries in 31 European countries for the EUROCARE-6 study.

Data were first uploaded to the ENCR-JRC portal and then sent to the EUROCARE Data Analysis Team. All incidence files (including information on the last ascertainment of life status), life tables and resident population files were checked and loaded into the EUROCARE-6 database.

Data standardization 
Before being imported into the management system, data files were standardised so that all registries shared the same record structure.

Conversion and standardisation procedures covered:

  • Imputation of missing date components: days and months of the birth, incidence and follow-up dates, using specific algorithms (see Appendix 1 for details)
  • Imputation of the third digit of the ICD-O-3 topography code in cases where the sub-site code is unique (e.g. Prostate C61 → C61.9, Kidney C64 → C64.9)
  • Stage standardisation: TNM information was standardised with specific trans-coding tables based on the different editions of the TNM classification. Stage grouping and Ann Arbor information were made uniform
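
The sub-site imputation described above can be illustrated with a minimal sketch. The lookup table and the function name are hypothetical and contain only the two examples given in the text; the actual EUROCARE-6 conversion tables are not reproduced here.

```python
# Hypothetical lookup: 3-character ICD-O-3 topography codes with a single
# sub-site, so the missing digit can only be 9. Only the two examples
# mentioned in the text are listed; a real table would be longer.
UNIQUE_SUBSITE = {
    "C61": "C61.9",  # Prostate
    "C64": "C64.9",  # Kidney
}


def impute_topography(code: str) -> str:
    """Return the completed topography code when the sub-site is unambiguous.

    Codes that are not in the lookup (including already complete codes)
    are returned unchanged, apart from trimming and upper-casing.
    """
    cleaned = code.strip().upper()
    return UNIQUE_SUBSITE.get(cleaned, cleaned)


if __name__ == "__main__":
    for raw in ["C61", "c64", "C50", "C61.9"]:
        print(raw, "->", impute_topography(raw))
```

Codes with more than one possible sub-site are deliberately left untouched in this sketch, since the text only describes imputation where the sub-site is unique.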

Data checks 
Data quality checks were applied to the raw individual incidence records to verify compliance with the study protocol requirements and to identify systematic or sporadic errors.

Quality controls were harmonized with the ENCR-JRC data quality checks (Martos C, 2018. A proposal on cancer data quality checks: one common procedure for European cancer registries, version 1.1) and were supplemented with the specific criteria developed by EUROCARE for the analysis of survival data. Every individual incidence record (comprising 56 distinct variables) was processed with automated checking procedures. Data quality was evaluated in three phases:

  • Phase I: checks evaluating formal adherence to the data collection protocol and assessing the global validity of the incidence data files. These checks verify the valid range of values and the format of each variable (vertical checks)
  • Phase II: checks assessing the consistency between different variables in the same patient's record in order to evaluate the validity of the individual record. These checks make it possible to identify errors and anomalies in either compulsory or optional variables (horizontal checks)

The consistency of combinations of two or more fields was also checked, covering:

  • Consistency between dates of birth, diagnosis and follow-up
  • Consistency of site-morphology combinations. The standard IARC criteria, as described in IARC Technical report n. 42, were applied first, followed by additional EUROCARE criteria
  • Consistency of age-site, age-morphology, sex-site, and sex-morphology combinations. Unlikely combinations were checked against IARC criteria  
  • Consistency of morphology-behaviour combinations. Combinations not listed in ICD-O-3 classification were flagged as unlikely  
  • Consistency of stage information (EOD and TNM, EOD and condensed TNM, TNM and condensed TNM, site and EOD, behaviour and stage, number of metastatic nodes and stage)

Most inconsistencies among field combinations are classified as warnings, meaning that an error is possible but not certain. Table 1 shows the full list of errors and warnings generated by the EUROCARE-6 check procedures.
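
The distinction between vertical checks, horizontal checks, errors and warnings can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the field names, the sex coding ("1" male, "2" female), the single male-only site, and above all the severity assigned to each finding. The authoritative list of errors and warnings is the one in Table 1.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Record:
    sex: str          # assumed coding: "1" = male, "2" = female
    birth: date
    diagnosis: date
    follow_up: date
    topography: str   # ICD-O-3 topography, e.g. "C61.9"


@dataclass
class Finding:
    level: str        # "error" or "warning"
    message: str


VALID_SEX = {"1", "2"}     # assumed valid range for the sex variable
MALE_ONLY_SITES = {"C61"}  # prostate; illustrative, not the IARC list


def check_record(rec: Record) -> List[Finding]:
    """Run one vertical check and three horizontal checks on a record."""
    findings: List[Finding] = []

    # Vertical check (Phase I): the value must lie in the valid range.
    if rec.sex not in VALID_SEX:
        findings.append(Finding("error", "sex outside valid range"))

    # Horizontal checks (Phase II): consistency between dates.
    if rec.diagnosis < rec.birth:
        findings.append(Finding("error", "diagnosis before birth"))
    if rec.follow_up < rec.diagnosis:
        findings.append(Finding("error", "follow-up before diagnosis"))

    # Horizontal check: unlikely sex-site combination.
    # Treating it as a warning here is an assumption of this sketch.
    if rec.sex == "2" and rec.topography[:3] in MALE_ONLY_SITES:
        findings.append(Finding("warning", "unlikely sex-site combination"))

    return findings


if __name__ == "__main__":
    rec = Record("2", date(1950, 1, 1), date(2010, 5, 1),
                 date(2009, 1, 1), "C61.9")
    for f in check_record(rec):
        print(f.level, "-", f.message)
```

Returning a list of findings rather than a single pass/fail flag keeps errors and warnings separate, so that warnings can be reviewed without invalidating the record.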

  • Phase III: preliminary statistical analyses of the registries' overall datasets to derive standard indicators of the accuracy and completeness of the information provided, with a special focus on the quality of follow-up data, which is crucial for survival and prevalence estimation

Registries’ revision of errors and inconsistencies
Systematic errors in one or more variables required revision by the cancer registries (CRs) and re-submission of the entire dataset. Some Phase I issues (formal adherence to the study protocol) could be solved by simple recoding to comply with the data collection protocol requirements. Data standardization was always done in agreement with the CRs. Major errors in specific patients' records (inconsistent or invalid values in the compulsory core variables) were also reported to the registries for revision and possible correction of the affected data subset, which implied further exchanges.

At the end of the quality check and data standardization process, a validity code was assigned to each patient's record to create a cleaned dataset.

