The need for data standardisation

This section sets out the need for data standardisation and the current information gaps.

Data standardisation is the process of bringing data into a common format that allows for collaborative research, large-scale analytics, and the sharing of sophisticated tools and methodologies. We feel that this area is currently lacking. This project highlighted the problem, as it was difficult to compare the information captured between regions, in this case the Netherlands and South Africa. Many factors contribute, most notably the difference between developed and developing environments and the different methods used for standardisation in the data capture.

In the context of this project, we found that using the cycle threshold (Ct) values of the measurements was the best way to compare datasets across regions, although these comparisons will have very low accuracy because the methodologies differ between the countries. In South Africa, sampling was often done in rivers and other water bodies, while in the Netherlands all sampling was done within a sewered network. The methods used in South Africa also did not capture the same level of detail as the Dutch study. South Africa often made use of grab samples at sites where composite samplers were not available, for example at remote wastewater treatment works and for river and stream samples. CrAssphage analyses in South Africa were only initiated toward the latter part of the study due to supply delivery delays, whereas KWR used weekly continuous measurements spread over a 24-hour sampling period, including flow and CrAssphage data.

Setting some basic data standards for all similar projects, while still allowing additional data to be included, would go a long way towards making the data more accessible and comparable. Beyond standardising the data itself, it would also help to standardise the process for data capture and the locations at which this kind of capture can occur.
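As an illustration only, a minimal shared record with room for project-specific additions could be sketched as follows. The field names, units, and the set of sample types are assumptions made for this sketch and do not reflect an agreed standard or the format used by either study.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical controlled vocabulary, based on the sampling approaches
# mentioned in this section.
SAMPLE_TYPES = {"grab", "composite", "passive"}


@dataclass
class SampleRecord:
    """One wastewater sample in a hypothetical common format."""

    # Core fields every project would report (assumed for this sketch).
    country: str
    site_id: str
    sample_date: date
    sample_type: str
    ct_value: float

    # Optional fields that only some projects can provide.
    flow_m3_per_day: Optional[float] = None
    crassphage_ct: Optional[float] = None

    # Free-form project-specific additions.
    extra: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject sample types outside the shared vocabulary.
        if self.sample_type not in SAMPLE_TYPES:
            raise ValueError(f"unknown sample type: {self.sample_type!r}")
        # A Ct value is a cycle count, so it must be positive.
        if self.ct_value <= 0:
            raise ValueError("ct_value must be positive")
```

Keeping the core fields small and pushing everything else into optional fields or `extra` is one way to let regions with fewer resources still contribute comparable records.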

The expected result is also very important to consider: what is the end goal of each specific project? The goal affects the sampling frequency, the location of sampling sites and the required sensitivity of the laboratory testing methods. For example, if our goal were to capture information for the whole of South Africa to get a general view of the infection trend in each province, we could hypothetically define three representative sites in each province with specific demographics (for example, a high concentration of elderly residents). We would use passive sampling running from Monday to Friday and collect the samples weekly on Saturday at 10am. We would also include flow testing to understand how much additional (storm) water infiltrates the system at the time. Finally, the analysis methods would focus on Ct values and leave out more expensive tests such as CrAssphage. This information could be used as a general indicator of the trend in infections.

Even a broad standard with allowances built in would be a huge improvement going forward. We do understand that this is a major undertaking that would require a large network of willing participants to create even a basic process.

However, the process is promising, since Dutch data has shown that wastewater treatment plant (WWTP) data is a reliable source for monitoring the number of infections in an area, comparable to a free and accessible COVID testing program (GGD test site data). The Dutch nationwide WWTP measurement program, run by RIVM, encompasses all Dutch WWTPs. A sample is taken at each site every 2-10 days, and both the sampling frequency and the number of sample sites have increased over time since the start of the outbreak in March 2020. A comparison of the nationwide program's WWTP measurements against reported infections for Amsterdam shows that the two have followed a similar trend since the summer of 2020 (see figure below).

RIVM's Nationwide WWTP measurements for Amsterdam compared to Daily new reported cases (infections)

WWTP service areas do not coincide with municipality borders: a municipality may use several WWTPs, and a single WWTP can be connected to the sewage systems of several municipalities. The RIVM WWTP data was therefore recalculated to match municipality borders by taking the population-weighted sum of all WWTPs connected to inhabitants of a municipality.
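The recalculation to municipality borders can be sketched roughly as below. This is a minimal illustration, not the procedure used in the project: it assumes the weights are the numbers of a municipality's inhabitants connected to each WWTP, normalised by the municipality's total connected population, and the function name, input layout, and example numbers are all hypothetical.

```python
from collections import defaultdict


def municipality_values(wwtp_values, connections):
    """Population-weighted WWTP measurement per municipality.

    wwtp_values: {wwtp_id: measurement}
    connections: iterable of (municipality, wwtp_id, inhabitants), where
        inhabitants is the number of residents of that municipality
        connected to that WWTP (assumed input layout).
    """
    weighted_sum = defaultdict(float)
    connected_pop = defaultdict(float)
    for municipality, wwtp_id, inhabitants in connections:
        if wwtp_id not in wwtp_values:
            continue  # no measurement available for this WWTP
        weighted_sum[municipality] += inhabitants * wwtp_values[wwtp_id]
        connected_pop[municipality] += inhabitants
    # Normalise by connected population so the weights sum to one.
    return {
        m: weighted_sum[m] / connected_pop[m]
        for m in weighted_sum
        if connected_pop[m] > 0
    }


# Made-up numbers: municipality M1 is served by two WWTPs, M2 by one.
example = municipality_values(
    {"A": 100.0, "B": 200.0},
    [("M1", "A", 3000), ("M1", "B", 1000), ("M2", "B", 5000)],
)
```

With these made-up inputs, M1 is weighted three to one towards WWTP A, while M2 simply takes the value of WWTP B.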