EUROCARE-5 DATA QUALITY CHECK PROCESS

The EUROCARE-5 database

All incidence and follow-up data files received by June 2011 were checked and uploaded to the EUROCARE-5 database. Data files were transmitted through the IARC gateway, by direct upload to the EUROCARE web portal, or by both routes. The database includes information on more than 20 million cancer cases diagnosed from 1978 to 2007 in 30 European countries.

Data standardization

Before being imported into the management system, data files were standardised so that all registries shared the same record structure. For example, the life status field had to be derived from three fields included in the IARC protocol (basis of diagnosis, life status, autopsy). The day component of dates was imputed for registries that did not supply it, and was computed for those providing follow-up duration in days (see Appendix 1 for details). Topography and morphology had to be converted to the ICD-O-3 coding system for the very few registries providing data coded according to ICD-9, ICD-10 or ICD-O-2.
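The date handling described above can be illustrated with a minimal Python sketch. The actual imputation rule is given in Appendix 1 and is not reproduced here, so the mid-month value ASSUMED_DAY, the function names, and the reading of "computed" as diagnosis date plus follow-up duration are all assumptions made for illustration only.

```python
from datetime import date, timedelta

# Hypothetical placeholder: the real imputation rule is defined in Appendix 1.
ASSUMED_DAY = 15


def impute_day(year, month, day=None):
    """Build a full date, filling in an assumed day when the registry gave none."""
    return date(year, month, day if day is not None else ASSUMED_DAY)


def follow_up_date(diagnosis, duration_days):
    """Derive the end of follow-up from the diagnosis date and a duration in days."""
    return diagnosis + timedelta(days=duration_days)
```

For instance, a registry supplying only June 2003 as the diagnosis month would get a full date from impute_day(2003, 6), and a follow-up duration in days could then be added to it with follow_up_date.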

Data checks

Automated procedures checked the data fields in each case record, including benign and in-situ cases and cases diagnosed before the current study period (2000-2007). Table 1 shows the full list of errors and warnings generated by the EUROCARE-5 check procedures. A specific code is associated with each type of error or warning.

Each field was checked against the range of validity stated in the EUROCARE-5 protocol. Topography and morphology codes were checked against the ICD-O-3 lists. Out-of-range values were treated as errors.
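A single-field range check of this kind might look like the following Python sketch. The field names and the ranges in VALID_RANGES are hypothetical stand-ins (only the 1978 to 2007 diagnosis period comes from the text above); the real ranges are those stated in the EUROCARE-5 protocol.

```python
# Hypothetical validity ranges for illustration; the authoritative ranges
# are those stated in the EUROCARE-5 protocol.
VALID_RANGES = {
    "sex": (1, 2),                     # assumed coding
    "year_of_diagnosis": (1978, 2007),  # diagnosis period covered by the database
    "age": (0, 120),                   # assumed plausible bounds
}


def range_errors(record):
    """Return the names of fields that are missing or outside their valid range."""
    errors = []
    for field, (low, high) in VALID_RANGES.items():
        value = record.get(field)
        if value is None or not (low <= value <= high):
            errors.append(field)
    return errors
```

A record that passes returns an empty list; any field name returned would correspond to an error rather than a warning, in line with the rule above.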

Combinations of two or more fields were also checked for consistency. These checks covered:

  • consistency between dates of birth, diagnosis and follow-up
  • consistency of site-morphology combinations. The standard IARC criteria, as described in IARC Technical Report No. 42, were applied first, followed by additional EUROCARE criteria
  • consistency of age-site, age-morphology, sex-site, and sex-morphology combinations. Unlikely combinations were checked against IARC criteria
  • consistency of morphology-behaviour combinations. Combinations not listed in the ICD-O-3 classification were flagged as unlikely
  • consistency of stage information (EOD and TNM, EOD and condensed TNM, TNM and condensed TNM, site and EOD, behaviour and stage, number of metastatic nodes and stage)

Most inconsistencies among field combinations were classified as warnings, meaning a possible but not certain error. Inconsistencies in stage information were generally classified as errors (see Table 1).
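As an illustration of a multi-field check, the consistency between dates of birth, diagnosis and follow-up can be sketched in Python as below. The function name and the issue labels are hypothetical and do not correspond to the codes in Table 1; the sketch also makes no claim about whether each issue is classified as an error or a warning.

```python
from datetime import date


def date_sequence_issues(birth, diagnosis, follow_up):
    """Flag dates that break the expected order: birth <= diagnosis <= follow-up."""
    issues = []
    if diagnosis < birth:
        issues.append("diagnosis_before_birth")      # hypothetical label
    if follow_up < diagnosis:
        issues.append("follow_up_before_diagnosis")  # hypothetical label
    return issues
```

Equal dates are accepted in this sketch, so a case whose follow-up ends on the day of diagnosis is not flagged.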

Registries’ revision of errors and inconsistencies

The records flagged by the data checking process were sent back to the registries for revision and correction. Not every flagged record or warning required individual revision. Selection depended on several criteria, such as microscopic verification (cases without microscopic verification were sent more often), behaviour (non-malignant cases were generally not sent), age (a small number of childhood cancers were also accepted in adolescents and young adults), and how frequently a given problem occurred in a registry file. Furthermore, whenever possible, corrections were recovered from the revisions already provided for the EUROCARE-4 run.

