Data editing conference: new methods and standards

Author: Masja de Ree
From 24 to 26 April, the United Nations (UN) Work Session on Statistical Data Editing was held at the premises of Statistics Netherlands (CBS) in The Hague. Over 60 representatives from 25 countries discussed new methodology and international standards in statistics production.

Raw data

Data editing is what statisticians do in order to prepare raw data for statistical use. CBS methodologist Sander Scholtus explains: ‘We obtain huge quantities of raw data on a very regular basis, both from surveys and from registers. These cannot be used directly in the production of our statistics. First, we detect and remove errors and add missing data. For example: very often, businesses reporting their turnover forget that they should do so in units of 1,000 as prescribed. These major errors are immediately visible in our figures, so we need to correct them.’
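The turnover example can be illustrated with a small sketch. It is not the method CBS uses. It assumes that each reported value can be compared with a reference value for the same business (for instance, last period's figure), and the function name, field names and ratio bounds are all hypothetical choices made for illustration.

```python
def correct_unit_error(reported, reference, low=300.0, high=3000.0):
    """Return (value, was_corrected) for one reported turnover figure.

    Hypothetical rule: if the reported value is roughly 1,000 times the
    reference value, assume it was given in single units instead of
    units of 1,000 and divide it by 1,000. The bounds are assumptions.
    """
    if reference <= 0:
        return reported, False  # no usable reference, leave untouched
    ratio = reported / reference
    if low <= ratio <= high:
        return reported / 1000.0, True
    return reported, False


# Illustrative records, not real data.
records = [
    {"id": "A", "turnover": 1250.0, "reference": 1200.0},
    {"id": "B", "turnover": 980000.0, "reference": 1010.0},
]
corrected = [correct_unit_error(r["turnover"], r["reference"]) for r in records]
```

In this sketch, business A is left as reported, while business B is treated as a unit error and scaled down by a factor of 1,000.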

Quality of statistics

While hosting the conference, CBS also delivered presentations on promising new data editing methods. One such method is the correction of data from combined sources. Scholtus: ‘For instance, data from surveys filled out by enterprises combined with register data from the Dutch tax authorities. Although both sources supply information on the same subject, we often see discrepancies. How do we deal with those and how can we utilise the different sources to remove such data errors? In addition, some speakers discussed the influence of data editing on the quality of statistics.’
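A first step in handling such discrepancies could be simply finding them. The sketch below is an illustration only, not one of the methods presented at the conference. It assumes both sources are available as mappings from a business identifier to a value, and the 10% tolerance is an arbitrary, hypothetical choice.

```python
def flag_discrepancies(survey, register, tolerance=0.10):
    """Return ids present in both sources whose values differ by more
    than `tolerance`, relative to the larger of the two values."""
    flagged = []
    for unit_id in sorted(survey.keys() & register.keys()):
        s, r = survey[unit_id], register[unit_id]
        base = max(abs(s), abs(r))
        if base == 0:
            continue  # both values are zero, so they agree
        if abs(s - r) / base > tolerance:
            flagged.append(unit_id)
    return flagged


# Illustrative values, not real data.
survey = {"A": 500.0, "B": 120.0, "C": 75.0}
register = {"A": 510.0, "B": 200.0, "D": 40.0}
flagged = flag_discrepancies(survey, register)
```

Here only business B is flagged: A agrees within the tolerance, and C and D each appear in just one source, so they cannot be compared.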

Exchange of ICT resources

Another important topic on the agenda was standardisation with a view to exchanging ICT resources. ‘Several countries have developed their own software for correcting raw data,’ says Scholtus. ‘International exchange of this software requires the combined effort of ICT experts. That is why, over the past couple of years, data editing conferences have increasingly involved ICT experts, and with noticeable impact: the number of tools we can exchange internationally is growing. For instance, there were software demonstrations by Canada, Slovenia and Spain in a dedicated session on tools.’

International collaboration

The conference is organised by the United Nations Economic Commission for Europe (UNECE). ‘Europe’ is interpreted quite broadly here: there were also participants from Canada, Israel, Kazakhstan, Mexico and the United States. UNECE is committed to fostering global cooperation among statistical agencies and to ensuring that, as far as possible, they apply the same definitions and methods. A few decades ago, all raw data were corrected and supplemented manually. Moreover, statisticians would continue until the data contained no more errors at all. ‘That was costly and very time consuming,’ Scholtus says. ‘Therefore, different correction methods have been developed over the past few decades. One method predicts which errors are important enough to be filtered out of the results; another corrects errors automatically. The UNECE conference contributes to the deployment of these methods at almost all statistical institutes across the developed world.’
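The idea of predicting which errors matter can be sketched as a simple score per record. This is a hedged illustration, not the method of any particular institute. It assumes each record carries a sampling weight and an expected value (for example, from a previous period), and the score formula, field names and threshold are hypothetical.

```python
def split_for_editing(units, estimated_total, threshold=0.01):
    """Split record ids into (manual, automatic) lists.

    Hypothetical score: the weighted deviation from the expected value
    as a share of the estimated total. Records with a high score could
    change the published figure noticeably, so they go to manual review.
    """
    manual, automatic = [], []
    for u in units:
        score = u["weight"] * abs(u["reported"] - u["expected"]) / estimated_total
        if score > threshold:
            manual.append(u["id"])
        else:
            automatic.append(u["id"])
    return manual, automatic


# Illustrative records, not real data.
units = [
    {"id": "A", "weight": 1.0, "reported": 9000.0, "expected": 900.0},
    {"id": "B", "weight": 10.0, "reported": 105.0, "expected": 100.0},
    {"id": "C", "weight": 5.0, "reported": 400.0, "expected": 380.0},
]
manual, automatic = split_for_editing(units, estimated_total=100000.0)
```

In this sketch only record A is routed to manual review. B and C deviate too little to affect the total, so they would be left to automatic correction.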

Intensive talks

The work session on data editing has been taking place every eighteen months since the early 1990s, with the various statistical institutes as rotating hosts. This time the organising host was CBS. ‘It was good to meet so many peers from different countries,’ says Scholtus. ‘The nice thing about this series of conferences is that it focuses entirely on data editing, a topic that gets little attention at other meetings. Data editing at CBS is highly advanced, but there are always new things to learn.’ A new feature on day 3 of the event was the so-called ‘mini sprints’. ‘There, we were able to conduct intensive talks in small groups about closer international cooperation. This has provided a wealth of practical ideas on further exchange of experiences, methodology and standardised software.’

More details about the results of the UNECE conference are available at: http://www1.unece.org/stat/platform/display/WSSDE/Work+Session+on+Statistical+Data+Editing+2017