Optimisation of storage space and connections
All separate Dutch data sources where provided in the form of flat files. These flat files were filtered, grouped and cleaned. Next, cleaned data was loaded into a single data model, to which a STAR schema was applied to split data into facts and dimensions. In this way, all date columns, municipality names and WWTP names were mentioned only once in a dimension table and subsequently linked with foreign keys to relevant fact tables. This reduced the storage size and enabled simultaneous data filtering of WWTP measurements, infections and hospitalisations.
Since the zoning and naming of municipalities changes over time, the following occurs: WWTPs are connected to new neighbourhoods and new WWTPs are constructed, the overlap in WWTP servicing areas, and municipality areas change over time. In order to show the relevant WWTP plants in a certain municipality or vice versa for a specific date, a bridge table was made with valid combinations of WWTP, municipality and date. This bridge table linked all data together.
Next, the valid connections within the bridge table were used to resample WWTP data from WWTP areas to municipality areas with Data Analysis Expressions (DAX). DAX is a library of functions and operators that can be combined to build formulas and expressions in Power BI, Analysis Services, and Power Pivot in Excel data models.
The data model allows for regular data updates from the original file lay-outs. The model is uploaded to the cloud as part of a PowerBI dataset.
The interactive content below might not be fully accessible.