Sparse Data Problem

The sparse data problem, as we call it, affects a large number of databases and historian products. The main exceptions are those built specifically for the industrial and process market.

The problem appears when a database or other recording system captures sparse data: data points that are spread out over time, either because the sampling resolution is low or simply because the values rarely change.

Many sources of data can’t ‘read between the lines’ to tell you what the values were at a specific time that falls between two recorded points.

An Example

Let’s take a look at an example.

We’re recording a number of measurements from a piece of equipment. These measurements are being captured every 60 seconds.

If we have a recording at 10:00 and one at 10:01, what happens when we ask for data at 10:00:30?

Historical sources that have solved the sparse data problem will give you values, whether by interpolating, forward-filling or back-filling the data as required.
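The three strategies can be sketched in a few lines of Python. This is a minimal illustration only, not how any particular historian implements it. The function name value_at, the sample values and the dates are all made up for the example.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical recordings, one per minute (values are invented for illustration).
samples = [
    (datetime(2024, 1, 1, 10, 0, 0), 20.0),
    (datetime(2024, 1, 1, 10, 1, 0), 30.0),
]

def value_at(samples, when, method="interpolate"):
    """Return a value for `when` from time-sorted (timestamp, value) pairs."""
    times = [t for t, _ in samples]
    i = bisect_right(times, when)  # count of samples at or before `when`
    before = samples[i - 1] if i > 0 else None
    after = samples[i] if i < len(samples) else None

    if before is not None and before[0] == when:
        return before[1]  # a recording exists at exactly this time
    if method == "forward":
        return before[1] if before else None  # carry the last known value forward
    if method == "back":
        return after[1] if after else None  # use the next known value
    # Linear interpolation needs a recording on both sides of `when`.
    if before is None or after is None:
        return None
    fraction = (when - before[0]) / (after[0] - before[0])
    return before[1] + fraction * (after[1] - before[1])

query = datetime(2024, 1, 1, 10, 0, 30)
print(value_at(samples, query, "forward"))      # 20.0
print(value_at(samples, query, "back"))         # 30.0
print(value_at(samples, query, "interpolate"))  # 25.0
```

Which strategy is appropriate depends on the signal: forward-filling usually suits discrete states and setpoints, while interpolation usually suits continuously varying measurements.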

Many others will quite literally give you nothing. This means ARDI’s drivers have to make special, customised and sometimes inefficient queries to fill in the gaps, resulting in slower-than-ideal query times.
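To show the kind of extra work involved, here is a rough sketch using an in-memory SQLite database through Python's standard sqlite3 module. It is not the query any ARDI driver actually runs. The readings table, its columns and the values are assumptions made for the example.

```python
import sqlite3

# Hypothetical table of one-minute recordings (names and values are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("2024-01-01 10:00:00", 20.0), ("2024-01-01 10:01:00", 30.0)],
)

wanted = "2024-01-01 10:00:30"

# Naive query: no row carries this exact timestamp, so nothing comes back.
exact = conn.execute(
    "SELECT value FROM readings WHERE ts = ?", (wanted,)
).fetchone()

# Gap-filling query: fetch the most recent row at or before the requested time.
# ISO-formatted timestamps sort correctly as plain text.
filled = conn.execute(
    "SELECT ts, value FROM readings WHERE ts <= ? ORDER BY ts DESC LIMIT 1",
    (wanted,),
).fetchone()

print(exact)   # None
print(filled)  # ('2024-01-01 10:00:00', 20.0)
```

The second query is a forward-fill done by hand. It has to be repeated for every requested time and every measurement, which is where the extra query time comes from.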

Products

Most relational databases, such as MSSQL, MySQL and PostgreSQL, suffer from this problem. They also handle high-resolution time-series data very poorly and should be avoided where possible.

Industrial historians such as Aveva PI, IBA, IP21 and eDNA all work well with sparse data.

The super-fast open-source time-series historian InfluxDB does not do well with sparse data, even though better support for it is a long-standing feature request. The alternative Prometheus database, however, works well.