Authors: Sofie De Broe, Olav ten Bosch, Piet Daas, Gert Buiten, Ben Laevens, Bert Kroese

The need for timely official statistics

The pandemic as a driver for innovation

About this publication

This paper discusses how Statistics Netherlands managed to respond quickly with a range of new outputs to the outbreak of the covid-19 pandemic and draws lessons for the future.

Introduction

This paper discusses how Statistics Netherlands managed to respond quickly with a range of new outputs to the sudden increase in the need for statistical information following the outbreak of the covid-19 pandemic. It describes the innovation process already in place, as well as the innovations in response to the pandemic. This is followed by a discussion of what made speedy innovation and implementation possible, after which lessons are drawn in order to maintain the ability to react quickly to future policy questions. One important success factor is the combination of new data sources with already existing statistics for calibration.

1. New impetus for statistical innovation

1.1. Innovation in times of a pandemic

In 2020 the world was struck by a pandemic, and people's lives were dominated by health concerns, broken routines, reduced social contacts and crisis management. National Statistical Institutes (NSIs) exist to provide statistical output and information that allow policy makers and governments to develop policy guidelines and decide quickly, for example, on health interventions. As for many governmental organizations, this goal became very prominent at Statistics Netherlands in view of the covid-19 crisis. Working from home became the new normal and all activities were set up digitally. Output based on surveys had to be delivered without Computer Assisted Personal Interviewing (CAPI), relying solely on Computer Assisted Telephone Interviewing (CATI) and Computer Assisted Web Interviewing (CAWI). New output in the form of dashboards was requested to address the urgent need for timely information. Mortality and morbidity statistics were delivered faster and more frequently because new procedures were put in place, and more recent statistical information (using register data and data sources such as public transport data) on mobility, the labour market, the economy and social consequences was produced to address the need for rapid information and trends in 2020. Access to new data sources such as mobile phone data and transport data (OV chip card data) was prioritized in almost all EU countries in order to make new outputs possible on contact tracing or the spread of the virus. These developments, accompanied by intensified collaboration with other governmental and private institutes and by brainstorming on the additional information that could be available in our in-house data sources, gave a new impetus for innovation, not only in timeliness but also in new and detailed statistical output.

1.2. Pre-pandemic statistical innovation

During the 1990s and 2000s, major innovations were implemented at Statistics Netherlands. Government registers replaced surveys as the primary input in many areas of official statistics. Data collection, editing, processing and dissemination were digitalized and automated to a great extent. Important drivers behind this process were the availability of massive digital governmental data sources, new technological opportunities, political pressure to decrease the administrative burden on businesses, and budget cuts and retrenchments.

In recent years, Statistics Netherlands has put a lot of effort into further stimulating innovation, with new (big) data sources, data science and methods such as machine learning and artificial intelligence as main drivers. A lot was learned by developing innovation as part of a pipeline process, from Proof of Concept (PoC) and experimental (beta) statistics to the implementation of an official statistical product. However, the final step, turning an experimental product into part of the official statistical process, proved challenging. The corona crisis brought an urgency for output, which meant that many of the previously perceived barriers to implementation and publication were at least partially overcome. A lot of innovative output was developed and published within months, sometimes even weeks. What made this possible, and which lessons can be drawn for future innovation and for improving the response to policy issues, is the focus of this paper. Chapter 2 describes the innovation process that has been developed at Statistics Netherlands; chapter 3 provides some examples of innovative output at Statistics Netherlands following the start of the corona crisis. Chapters 4 and 5 discuss these developments by looking into what made speedy innovation possible and which lessons can be drawn for future innovation. Finally, chapter 6 draws some final conclusions.

2. Innovation at CBS

2.1. Innovation in general at CBS

Statistics Netherlands supports decision-making by providing the public and private sector with reliable, transparent and coherent statistics of undisputed quality. These statistics are also used in scientific research. The information (official statistics) published by Statistics Netherlands covers topics that are relevant to society and government, such as economic activity and consumer confidence, safety, health and leisure. Within the Dutch public sector, Statistics Netherlands is the data expert, having amassed 120 years of experience. Its employees have broad expertise in data sources, ranging from survey and administrative data to, for quite some time now, alternative (big) data sources. Individual big data sources usually tell only part of the story, are often owned by private parties, and vary widely in quality. There is a need for an independent and trustworthy party that combines these big data sources with other sources such as surveys and administrative data, sets quality standards and adheres to transparency, while safeguarding the privacy of individual citizens and companies at all times. To create and maintain this trust as an NSI, simply complying with privacy regulations is not enough. Compliance must be accompanied by open and transparent communication and a dialogue with privacy organizations and society as a whole.
Statistics Netherlands has a wide range of innovation activities at the input, throughput and output levels. These range from improving (big) data access with data holders, automating and digitalizing processing and tooling, a new data infrastructure, a data strategy and an agile way of working, to improving access to systems, all in order to make processes more efficient and output more relevant for policy makers. Statistics Netherlands has also set up an Observation Innovation Network, where experiments are carried out with apps and with sensor data coupled to surveys (smart surveys), alongside several initiatives to make maximal use of administrative data sources.

2.2. Innovation and Big Data at CBS

The launch of the Centre for Big Data Statistics (CBDS) in September 2016 was one of Statistics Netherlands' innovation initiatives to support evidence-based policy with new, detailed or real-time information. The CBDS offers opportunities for developing data science methods and techniques through partnerships with the academic world and the exploitation of new data sources (data scouting). Knowledge, infrastructure and data are brought together by Statistics Netherlands and its partners in order to meet the current information needs of society. The CBDS works on socially relevant themes such as economic growth, the energy transition, mobility, the labour market, health, the housing market, safety and cross-border statistics. The innovation consists of developing experimental statistics through product development: identifying data sources and new methods and techniques to improve existing official statistics or to develop new statistics, in order to address policy questions in a more timely or detailed manner. These activities lead to the publication of beta products, and of working papers on the methodologies used, on the innovation website.

One example of a new experimental statistic is an improved model to determine the solar energy yield from photovoltaic (PV) systems on a regional (municipality) and daily basis (Laevens et al., 2020). Currently Statistics Netherlands produces yearly, national estimates using a register containing most PV systems in the Netherlands. A growing need for higher-resolution data led to an improved method in which new, alternative data sources were identified: high-resolution solar irradiance data derived from satellite images, and yield data from PV systems available on an online portal. Combining these data in a new model led to new insights into the production of solar energy at the local level. This helps local authorities to better understand how much energy is generated in their municipality.
Another example is detecting small innovative companies using text data from their websites (Daas and van der Doef, 2020). Statistics Netherlands sends out surveys to collect information on innovation in companies, but these do not include the smaller innovative companies. With this new approach, Statistics Netherlands and other NSIs have been able to detect innovation in smaller companies and startups.

2.3. The innovation process

The aim was to implement these experimental statistics in the official statistical output so that policy makers could make use of the validated information. As a rule, we start new innovation projects with the development of a Proof of Concept (PoC) to demonstrate the capabilities of a new method or data source. Successful PoCs can be further developed into experimental statistics called beta products. Beta product development looks at the stability of the data source, validates the method, and tests the requirements for further implementation. An innovation is completed when an experimental statistic has been converted into a full-fledged one-time publication or official statistic. However, many barriers were encountered that made implementation rather challenging, ranging from methodological to technical and cultural challenges. An overview of these can be found in De Broe et al. (2021a). In effect, until now Statistics Netherlands has not been able to implement these new outputs, except for two already existing statistics (traffic intensity using traffic loop data and the Consumer Price Index using scanner data). To enable implementation, the division of research and development designed an innovation pipeline model to facilitate and coordinate the process from Proof of Concept to beta publication to official statistic. A short description of the innovation pipeline follows.

An innovation begins in the idea phase. Ideas can be intended to replace regular processes in the long run, but also to create new products or statistics. Important criteria for deciding whether an exploration is worthwhile are whether there are sponsors (although this is not a precondition), whether the idea fits within the Statistics Netherlands Act (Article 3), whether there is adequate staffing from the different statistical divisions, and whether the new output addresses an information need among users and policy makers. In the exploration phase, the possibilities of obtaining grants for the ideas (PoC) and the possibilities for new output with the data and/or the methods are investigated. The end result of this phase should be a working prototype and an insight into what still needs to be done to bring the product to a final result. At the end of the exploration phase it is determined whether these preconditions have been met and whether there is sufficient reason to proceed to the product development phase.

In the product development phase the prototype is further developed, methodological issues are addressed, the quality and stability of the data are examined, and the results are validated. At the end of this phase there should not be any fundamental issues that stand in the way of implementation. An important part of the implementation phase is to ensure that employees are adequately trained and that IT infrastructure is at hand to deal with the production of the (new) output. In order to emphasize the importance of innovation for Statistics Netherlands, CBDS was positioned as an incubator for the exploration and product development phases.
An important aspect during the entire innovation process is transparency: at any time it must be clear which users could benefit from the planned innovation. Benefits may be new information for policy makers and the population, but also efficiency gains for NSIs and Eurostat, and a lower burden for data suppliers and respondents through shortening or abolishing questionnaires. Especially because innovations may also fail, it is important to make the hypothetical value and business case explicit. This also requires continuous contact between the researchers and the beneficiaries involved, and an assessment of the viability of the innovation at regular intervals.

3. The covid-19 effect: reducing uncertainty with dedicated output

After weeks of increasing covid-19 infections, the Dutch government announced on 12 March 2020 that the Netherlands would go into a lockdown. A real sense of urgency arose when frightening images emerged of Italian hospitals at breaking point due to a high number of covid patients. This unprecedented situation caused an immediate feeling of great uncertainty about what this meant for Dutch society and the economy. Statistics Netherlands tried to help reduce the uncertainty by introducing new types of output dedicated to the covid-19 crisis, mainly disseminated via its own website in the form of dashboards that described important aspects of the crisis. In part, these were based on combining output from existing statistics based on register data around themes that had become relevant because of the crisis (see the website). Other parts of these dashboards were produced by process or product innovation, such as the introduction of new breakdowns and aggregates, speeding up production and release, increasing the frequency, accessing new data sources and the development of new kinds of output.

3.1. Dashboards

Dashboards were an important way to disseminate the data. A first important example described the medical consequences of the epidemic in the Netherlands, including the number of deaths, sickness leave at work, the effect on life expectancy and the pressure on medical care. Another one showed the social consequences of the crisis, including data on criminality, overnight stays in hotels and other lodging facilities, and the development of the number of asylum requests by refugees. Other dashboards focused on the economic effects, changes in mobility patterns, consequences for employment and income, the development of government finances and regional differences.

3.2. Increased timeliness and frequency

Some existing indicators that appeared in the dashboard were published in a more timely manner, such as the monthly figures on retail sales, which were published two weeks earlier. In other cases, both the timeliness and the frequency of statistics were increased. The mortality figures, previously monthly, were published on a weekly basis after a quick adaptation of the production process. Mortality figures were also linked to other administrative data, such as benefits for long-term medical care. Furthermore, by comparing mortality figures with those of previous years, estimates of excess mortality, broken down by age group, were published as an indication of the effect of the corona pandemic. For these figures Statistics Netherlands worked closely with the National Institute for Public Health and the Environment (RIVM). Another example is the introduction of weekly instead of monthly figures on firms' bankruptcies. This was made possible by an adaptation of the statistical production process.
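
The excess mortality comparison mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the expected number of deaths in a given week is the mean of the same week in earlier years; the method actually used by Statistics Netherlands and RIVM is not described in this paper. The function name, age groups and all figures are hypothetical.

```python
from statistics import mean


def excess_mortality(observed, history):
    """Observed deaths minus a simple baseline.

    Assumption: the baseline is the mean number of deaths in the
    same calendar week of previous years (not the official method).
    """
    baseline = mean(history)
    return observed - baseline


# Hypothetical deaths in one calendar week, per age group, for three earlier years
history_by_age = {
    "0-64": [350, 360, 340],
    "65-79": [900, 950, 910],
    "80+": [1700, 1750, 1650],
}

# Hypothetical deaths in the same calendar week of the pandemic year
observed_by_age = {"0-64": 380, "65-79": 1300, "80+": 2600}

# Excess mortality broken down by age group
excess_by_age = {
    age: excess_mortality(observed_by_age[age], history_by_age[age])
    for age in observed_by_age
}

for age, excess in excess_by_age.items():
    print(f"{age}: excess deaths = {excess}")
```

A production method would normally also account for seasonality, population ageing and uncertainty margins, none of which this sketch attempts.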

3.3. New outputs

In some cases, new outputs were made possible by access to new data sources. One example is the publication of weekly data on various types of payment transactions, together with an external party (the Dutch payments association 'Betaalvereniging Nederland'). A second example consists of weekly figures on check-ins in public transportation, together with Translink. The same data sources also allowed analysis of changes in the use of transportation during the day. These innovations were possible by using data from the digital production systems of other companies and institutions. These production systems basically operate in real time and allow high-frequency and timely statistical results. A third example is the help Statistics Netherlands could offer, together with RIVM, in sewage analysis during the covid crisis. Because of covid-19, sewage treatment plants in the Netherlands test for RNA traces of the virus at least once a week. By combining the results with demographic data, virus spread based on sewage data could be published at a more local level.
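
One way to picture the combination of sewage measurements with demographic data is to normalise the measured viral load by the number of inhabitants served by each treatment plant. The sketch below is an assumption made for illustration only; the paper does not specify how the two sources were combined, and the plant names, loads and population counts are invented.

```python
def load_per_100k(viral_load, population):
    """Viral load per 100,000 inhabitants in a plant's catchment area.

    Assumption: a simple per-capita normalisation; the actual
    Statistics Netherlands / RIVM method is not given in the source.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return viral_load / population * 100_000


# Hypothetical weekly measurements: (RNA particles per day, inhabitants served)
plants = {
    "plant_A": (5.0e12, 250_000),
    "plant_B": (1.2e12, 40_000),
}

normalised = {
    name: load_per_100k(load, pop) for name, (load, pop) in plants.items()
}

for name, value in normalised.items():
    print(f"{name}: {value:.2e} particles per 100,000 inhabitants")
```

Normalising by population makes areas of different sizes comparable, which is what allows figures to be presented at a local level.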

A number of potentially new outputs were also investigated, using for example social media data. This resulted in (online) brainstorm sessions of statistical and big data experts, which aimed to identify data sources with timely available data that had the potential to fill the need for indicators on new phenomena. After identifying these sources, which often included social media and web data, short exploratory studies were performed to determine to what extent these sources were able to cover the information demand. Successful examples of these studies are the use of social media to detect users with corona-related symptoms (De Broe et al., 2021b) and the use of social media and web posts to detect changes in attitudes towards vaccination (work in progress). However, not every idea succeeded. One example was a study that aimed to predict the potential effect of the corona crisis on the birth rate in the Netherlands. In this study, social media and discussions on web forums were scraped to determine if and how often people (usually women) posted that they were pregnant, including the number of weeks and the estimated date of birth. It was found that only limited data were available and that the tendency to post this information online had decreased over the years for which data were available (from 2013 onwards). Because of this, no reliable indicator could be developed for this phenomenon. One idea that remains to be investigated is the sale of folic acid in scanner data as an indicator of pregnancy and an early indicator of births.

4. What made speedy innovation and implementation possible?

Now that we are a year into the crisis, it is a good time to look back and reflect on the circumstances that accelerated the development and publication of new or partly new indicators. What made this possible? Without claiming to be exhaustive, we present some of the causes.

Sense of urgency: A crisis creates a sense of urgency at all levels. There is no doubt that, within the statistical office and governmental bodies, people felt a common need to do everything possible to help combat the pandemic with everything available in our statistical toolkit. Management quickly prioritized resources in favour of covid-related projects; statisticians worked hard (from home) to speed up their normal work and proved creative in thinking up new, helpful products from their domain expertise; and, not to forget, IT quickly improved the (already existing) home working and virtual meeting facilities to let it all happen. The spirit, attitude and desire to help were without any doubt among the enablers of the speed-up.

Existing regular and close contacts with users and policy makers: contacts with e.g. Ministries allowed us to quickly gain insight in what kind of information was required during the crisis. In other words: a swift articulation of the information demand as a trigger for innovation.

Timeliness versus quality: Traditionally, official statistics focus strongly on delivering good-quality indicators. Traditional indicators form the basis for long-running policy decisions and guidelines. In a pandemic the focus is obviously more on timeliness. Results are to be taken as 'the best possible image of the current situation' instead of the 'final truth'. The acceptance of choices in the trade-off between timeliness and quality or precision was, in our view, an enabling factor. This was evident in the publication of the mortality figures: at the beginning of the pandemic, testing was limited and establishing the cause of death was difficult. Statistics Netherlands receives cause-of-death information with a delay of six months. However, total mortality figures were published within one week, and because excess mortality appeared to be entirely explained by covid-19, it was a reliable figure for covid-19 related mortality.

Time to market: New ideas are tested in a Proof of Concept (PoC). Since time to market is crucial in a pandemic, valuable PoC results are to be disseminated as soon as possible. Where in normal times a prototype would be developed and brought into production before anything was published, it was now possible to publish a quickly developed first PoC (draft) result, followed by the production-ready publication later. Of course, the actual decision to do so depends on the specifics of the situation at hand.

Data partners: The sense of urgency was not only present within the statistical office; some of the data partners were also more willing to deliver data. Long-running negotiations to explore big data sources such as transaction data resulted in an agreement in which all partners could show their added value by quickly delivering crisis-related economic indicators.

Previous capacity building: Besides all of the above factors, the flexibility to quickly create new output would not have existed had the organization not explored new data sources and new techniques during the past few years. The new way of working, the data scouting workforce and the data science knowledge built up in the pre-pandemic years proved to be a valuable and flexible factor in the crisis work.

Human resources: Statistics Netherlands has been able to attract and keep the right people for the task it is set out to do: providing reliable, transparent and timely information for policy makers. The recruitment of new people with the right skills and the retention of competent staff have contributed greatly to the success in dealing with the pandemic.

Europe: International co-operation is a long-running process. Statistics Netherlands traditionally participates in many international projects, task forces and working groups. This network of international statisticians can be used to quickly share knowledge and experience on successes and challenges in the development of new, pandemic-supporting, indicators. It is also a way to reflect on bottlenecks that might be experienced differently in different countries, such as for example using mobile phone data for indicators on mobility or the use of certain types of microdata.

5. Lessons for future response to policy questions?

We can definitely draw some conclusions for the future from what NSIs have learned during the covid-19 crisis.

Urgency: even though not all policy questions are as urgent as covid-19, some urgency also arises with policy questions such as the energy transition, climate and biodiversity, labour market skills to avoid unemployment, and elderly care in an ageing population. The crisis has shown that NSIs can react quickly and act on urgent policy questions. It is therefore crucial to keep in close communication with local and national policy makers to address their urgent information needs.

Close interaction with users and policy makers: the need for close interaction with users and policy makers has been highlighted. The crisis also showed a great need for transparency and clarity around the figures, but also a need for the protection of privacy when it came to tracing the virus or contact behaviour. The Netherlands was one of the countries where access to mobile phone data was deemed too privacy-sensitive. A close dialogue with the public has proven crucial for NSIs to remain a trusted partner; the discussions around mobile phone data have shown that this dialogue needs to be continuous.

PoC results: official statistics adhere, via the Code of Practice (European Statistical System Committee, 2017), to high quality standards, which are necessary for trustworthy policy making. High standards often (but not always) seem to be incompatible with quick (and often, by association, dirty) output. However, if circumstances call for up-to-date preliminary output before final results are released, and transparency is given in terms of the process and analysis of the data, NSIs should not hesitate to publish even first (PoC) results.

Information extraction: the crisis has also shown that more relevant information to address complex issues is obtained through the linkage of data sources already in-house, by publishing at different aggregate levels, different frequencies and for different categories and localities. 

Close contact with data holders: the urgency of the crisis made data holders more willing to share data and partners more open to collaboration. It is therefore of utmost importance to sustain these good working collaborations, both for data access and for future publishing on complex policy issues. Such issues often revolve around mapping, in a timely manner and at the lowest possible (privacy-protected) statistical or regional level, the supply of and demand for skills on the labour market, economic activities, elderly care and energy.

Ecosystems: In order to address complex policy questions, NSIs will need to work even more with other public and private partners and ministries who have other data sources, knowledge and expertise; expert knowledge is necessary to provide contextual information. NSIs can learn from one another: Belgium, for example, has submitted a request to change the law in order to publish mortality figures more rapidly. As non-commercial partners, NSIs should be able to profile themselves as trusted partners better than they do now.

Question output and processes: the crisis has also shown that existing processes which were never questioned could be made more efficient by skipping steps or even changing laws. Similarly, different one-off official output was suddenly possible. A sustained flexibility in output would be a valuable attitude for NSIs to have. However, NSIs have an obligation to produce statistical output pre-determined by the European Commission. It should certainly be a role of the NSIs to address the need to adjust the statistical programme in terms of the changed and more complex information needs of the society. 

The digital workplace: not all organizations are able to work predominantly digitally, but Statistics Netherlands has proven very effective in continuing to produce statistics while working almost entirely from home. Statistics Netherlands has also acquired a lot of expertise in setting up online conferences. This offers new opportunities and a new work-life balance for the future, with a lower ecological footprint for future generations.

Train quickly: obtaining the required skills for innovation remains a priority. Statistics Netherlands has organized online courses through the Academia and has very successfully trained data scientists from the Ministry of the Interior. Similarly, it has continued to train its own personnel in short online courses, a very efficient way of offering training.

6. Conclusion

If NSIs want to continue playing an important role in providing policy-relevant information for society, they will have to be flexible, remain transparent about their output, address policy questions in a timely manner, collaborate with an ecosystem of public and private, national and international partners to create added value from data, and use state-of-the-art technologies. Private parties such as Google clearly dominate in the information they provide on mobility and transport, and NSIs have not been able to deliver added value in terms of information on these themes.

The crisis has also highlighted continued challenges in terms of data access, data processing and IT infrastructures. NSIs still struggle with data access; a lot more output would have been possible if high-value data sets from privately owned data sources had been available. Similarly, IT processes are not yet in place to automate statistical output using not only in-house data but also newly available data. Continuous investment in data acquisition and processing, the public image of the trustworthiness of an NSI, state-of-the-art IT infrastructures and a more flexible statistical output programme are some of the challenges for the future. Finally, NSIs should look for different approaches to collecting data. Many more sensors are currently on the market that can measure environmental or personal health information or economic activities more objectively than surveys do. Under the General Data Protection Regulation (GDPR), citizens are the owners of their data, which offers a lot of potential for data altruism (sharing one's own data, collected by companies, with public bodies such as an NSI for the public good).

When an NSI wants to produce a completely new statistic, the covid-crisis has taught us that the most successful approach is to use a new, readily available, data source that provides the necessary information and calibrate this with an already existing (traditional) statistic that measures the same or a very similar concept. The latter is needed as we found that creating a new statistic from scratch is nearly impossible in a limited time-frame. For new data, it just takes a lot of effort and time to understand the way the data is generated and the kind of errors it contains.  
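
The calibration idea can be made concrete with a minimal sketch. It assumes the simplest possible approach, a single ratio factor estimated over a period in which both the new source and the existing statistic are available; the paper does not state which calibration method was used, and the function names and all numbers are hypothetical.

```python
def fit_ratio(new_source, benchmark):
    """Ratio calibration factor over an overlap period.

    Assumption: the existing statistic is proportional to the new
    source, so one factor (benchmark total / new-source total) suffices.
    """
    if len(new_source) != len(benchmark):
        raise ValueError("series must cover the same periods")
    total_new = sum(new_source)
    if total_new == 0:
        raise ValueError("new source sums to zero; cannot calibrate")
    return sum(benchmark) / total_new


def calibrate(values, factor):
    """Scale fresh values from the new source to the level of the benchmark."""
    return [v * factor for v in values]


# Hypothetical overlap period: new source versus existing official statistic
overlap_new = [100.0, 120.0, 80.0]
overlap_official = [210.0, 240.0, 150.0]

factor = fit_ratio(overlap_new, overlap_official)

# Hypothetical recent periods for which only the new source is available yet
timely_estimates = calibrate([90.0, 110.0], factor)

print(factor, timely_estimates)
```

The design choice here is that the traditional statistic anchors the level while the new source supplies the timeliness. In practice a regression or time series model would often replace the single ratio, and the stability of the factor over time would need to be monitored.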

References

Daas, P.J.H., van der Doef, S. (2020). Detecting Innovative Companies via their Website. Statistical Journal of the IAOS 36(4), pp. 1239-1251, doi:10.3233/SJI-200627.

De Broe, S., Struijs, P., Daas, P., van Delden, A., Burger, J., van den Brakel, J., ten Bosch, O., Zeelenberg, K., Ypma, W. (2021a). Updating the Paradigm of Official Statistics: New Quality Criteria for Integrating New Data and Methods in Official Statistics. Statistical Journal of the IAOS, accepted for publication, doi:10.3233/SJI-200711.

De Broe, S., ten Bosch, O., Puts, M., Koren, W., Henning, F., Bakker, J. (2021b). VEO: how official statistics can help preventing emerging infectious diseases. NTTS conference paper.

European Statistical System Committee (2017). European Statistics Code of Practice for the National Statistical Authorities and Eurostat.

Laevens, B.P.M., ten Bosch, O., Pijpers, F., van Sark, W.G.J.H.M. (2020). Observational daily and regional photovoltaic solar energy production for the Netherlands. Pre-print available.