Data protocol for common data assembly and standardisation

A data protocol for common data assembly and standardisation was established to prepare the data for analysis. This was necessary to allow data from different sources to be compared. All data sources were filtered for outliers and duplicates.

Infection and hospitalisation data were provided as daily values per municipality. These raw data were used without further processing.

WWTP data were sampled on different days of the week, with intervals of 2 to 14 days between two measurements. At the start of the WWTP measurement campaign, the sampling frequency was much lower than later on. Therefore, linear interpolation was applied to valid measurements up to 10 days prior to or after a given date. In this way, daily hospitalisation and infection data could be compared directly with daily interpolated WWTP data.
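The interpolation step can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the function name `interpolate_daily`, the sample values and the reading of the 10-day rule (a day receives a value only if there is a valid measurement at most 10 days before it and at most 10 days after it) are assumptions made for illustration.

```python
from datetime import date, timedelta

MAX_GAP_DAYS = 10  # window named in the text; how it is applied here is an assumption


def interpolate_daily(measurements, max_gap_days=MAX_GAP_DAYS):
    """Return {date: value or None} for every day from first to last sample.

    measurements: dict mapping date -> valid measured value.
    A day without a sample is linearly interpolated only if the previous
    and the next sample are each at most max_gap_days away; otherwise None.
    """
    days = sorted(measurements)
    daily = {}
    for left, right in zip(days, days[1:]):
        span = (right - left).days
        for offset in range(span):
            day = left + timedelta(days=offset)
            if offset == 0:
                daily[day] = measurements[left]  # measured value, kept as is
            elif offset <= max_gap_days and span - offset <= max_gap_days:
                frac = offset / span
                daily[day] = measurements[left] + frac * (
                    measurements[right] - measurements[left]
                )
            else:
                daily[day] = None  # too far from a valid measurement
    if days:
        daily[days[-1]] = measurements[days[-1]]
    return daily


# Hypothetical samples: a 4-day gap followed by a 25-day gap.
samples = {
    date(2020, 9, 1): 10.0,
    date(2020, 9, 5): 30.0,
    date(2020, 9, 30): 5.0,
}
daily = interpolate_daily(samples)
```

Under this reading, days inside the 4-day gap are filled, while days in the 25-day gap stay empty. A looser reading, in which one nearby measurement is enough, would only change the condition in the `elif` branch.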

Next, WWTP data were translated to municipality level using population-weighted averages of all WWTPs that serve part of the inhabitants of a municipality. This step is needed because a municipality can cover several sewer catchment areas, and a WWTP sewer catchment area can cover multiple municipalities.
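A population-weighted average of this kind could look as follows. This is a sketch under assumptions: the function name, the dictionary layout and the example numbers are hypothetical, and WWTPs without a value for the day are simply left out of the average.

```python
def municipality_average(wwtp_values, served_population):
    """Population-weighted mean of WWTP values for one municipality and one day.

    wwtp_values: {wwtp_id: value or None} (measured or interpolated).
    served_population: {wwtp_id: inhabitants of this municipality
                        served by that WWTP}.
    Returns None when no WWTP serving the municipality has a value.
    """
    weighted_sum = 0.0
    total_weight = 0
    for wwtp_id, inhabitants in served_population.items():
        value = wwtp_values.get(wwtp_id)
        if value is None:
            continue  # no measurement or interpolated value for this WWTP
        weighted_sum += value * inhabitants
        total_weight += inhabitants
    return weighted_sum / total_weight if total_weight else None


# Hypothetical municipality served by two WWTPs.
served = {"wwtp_a": 30_000, "wwtp_b": 10_000}
average = municipality_average({"wwtp_a": 100.0, "wwtp_b": 40.0}, served)
# (100 * 30000 + 40 * 10000) / 40000 = 85.0
```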

The number of WWTP sampling sites was also much smaller at the start; now almost all WWTPs are sampled. A correction was therefore added so that WWTP data are shown only for municipalities in which at least 50% of inhabitants are represented by a WWTP measurement or a linearly interpolated value.
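The 50% coverage rule can be sketched as a simple check per municipality and day. The names `covered_fraction` and `is_reportable`, the data layout and the example figures are assumptions for illustration only.

```python
MIN_COVERAGE = 0.5  # at least 50% of inhabitants, as stated in the text


def covered_fraction(wwtp_values, served_population, municipality_population):
    """Share of a municipality's inhabitants covered by a WWTP value.

    A WWTP counts as covered when it has a measured or interpolated
    value (anything other than None) for the day in question.
    """
    covered = sum(
        inhabitants
        for wwtp_id, inhabitants in served_population.items()
        if wwtp_values.get(wwtp_id) is not None
    )
    return covered / municipality_population


def is_reportable(wwtp_values, served_population, municipality_population):
    """True if WWTP data may be shown for this municipality on this day."""
    fraction = covered_fraction(
        wwtp_values, served_population, municipality_population
    )
    return fraction >= MIN_COVERAGE


# Hypothetical municipality of 50,000 inhabitants, 40,000 of them on two WWTPs.
served = {"wwtp_a": 30_000, "wwtp_b": 10_000}
shown = is_reportable({"wwtp_a": 1.0, "wwtp_b": None}, served, 50_000)   # 60% covered
hidden = is_reportable({"wwtp_a": None, "wwtp_b": 2.0}, served, 50_000)  # 20% covered
```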

WWTP measurements are difficult to relate to the number of inhabitants they represent and to correct for dilution by rainwater and industrial wastewater. In the nationwide WWTP data from RIVM, flow is used to correct for rainwater, but KWR has tested another approach (as per Activity 2.4 - Analytical method to assess faecal loads in urban water streams) at its two sample sites for KWR WWTP data. Recent research has indicated that CrAssphage is a very relevant marker for faecal contamination in water, more so than the previously suggested coprostanol[1]. In situations where flows cannot be measured, for instance in potentially contaminated receiving waters, determining trends in SARS-CoV-2 levels becomes a complex task, as variability due to dilution cannot be accounted for. In such cases, a parameter that indicates the dilution of faecal matter could facilitate comparisons, as gene copies could be normalised by the concentration of CrAssphage in the samples. KWR normalised its SARS-CoV-2 data in the Netherlands using CrAssphage (KWR WWTP data only, not the nationwide WWTP data), and South Africa adopted the CrAssphage normalisation method towards the latter part of the study.

KWR uses the ratio N2/CrAss for the sewage data. N2 is one of the genes used to detect the coronavirus SARS-CoV-2. CrAss (CrAssphage) is a common gut virus that is estimated to be carried by half of the world's population. Using the ratio normalises for, among other things, the number of people in an area.
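The effect of the N2/CrAss ratio can be shown with a small sketch. The function name, the units (gene copies per mL) and the concentrations are hypothetical; the sketch only illustrates that diluting a sample changes both concentrations by the same factor and so leaves the ratio unchanged.

```python
def normalise_n2(n2_conc, crass_conc):
    """Return the N2/CrAss ratio, or None if CrAssphage is missing or zero.

    Both concentrations are assumed to be in the same unit,
    for example gene copies per mL.
    """
    if crass_conc is None or crass_conc <= 0:
        return None
    return n2_conc / crass_conc


# Hypothetical sample in dry weather.
dry = normalise_n2(n2_conc=500.0, crass_conc=1_000_000.0)

# The same sewage diluted twofold by rainwater: both markers halve.
wet = normalise_n2(n2_conc=250.0, crass_conc=500_000.0)

# dry and wet give the same ratio, although the raw N2 values differ by a factor 2.
```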

Note: South Africa uses various other normalisers, e.g. extracted RNA concentrations and the pellet size of a fixed volume of wastewater. The nationwide WWTP data in the Netherlands also use other methodologies, based on flow.

[1] Takada and Eganhouse, 1998