The BETTER project will identify up to 36 big data challenges over the course of three years, starting with the nine identified for 2018. Viewed strictly from a data ‘volume’ perspective, some of the identified challenges do not at first glance appear to handle ‘big’ data. On closer inspection, however, they nevertheless pose a big data problem. In this blog post we explain why.
Generally, satellite imagery and other geo-spatial data pose a veritable big data problem. Various satellite missions, both commercial and non-commercial, provide terabytes of streaming data consisting of high-resolution imagery. These images, some of which are made freely accessible through programs like the EU/ESA's Copernicus, provide the grounds for various change detection methods, a difficulty in itself given the "volume" and the "velocity" of the images being streamed from space. In addition, the raw images and the results of their analysis attain ever higher value when combined with additional data, especially of a geo-spatial nature. The joint analysis of such a "variety" of data offers an opportunity for all societal challenges (SCs): food and agriculture (SC2), transport (SC4), climate (SC5) and security (SC7) are the obvious primary beneficiaries, although the other areas can benefit indirectly, e.g., health (SC1) through improved food production, energy (SC3) optimization through improved climate monitoring, and healthier, more equal and more prosperous societies (SC6).
The above trio of generic big data problems (volume, velocity and variety) is often complemented by veracity, a fourth ‘V’ that emerges as a direct result of the huge increase in data throughput and a frequent inability to properly monitor the data and ensure that its quality is maintained to a high standard.
However, as explained in the introduction and perhaps a bit counter-intuitively, 'Big Data' does not necessarily have to involve equally high levels of volume, velocity, variety and veracity simultaneously. In reality, any problem whereby the value creation from raw data (through pre-processing, integration, processing or analysis) faces a difficulty with any number of these four Vs is classified as a big data problem. Therefore, one could work with a series of data points amounting to just a few megabytes (low volume), yet continuously matching and extracting patterns in a high-velocity stream of such 'small' data volumes is still problematic. Similarly, the data might be historic and finite with no new input (zero velocity), but its size requires a big data solution that can simultaneously process all the data to yield any useful results. Variety can also be considered a dark horse in the race among the most offending big data issues, since the joint processing of medium-sized and medium-speed data streams is severely slowed down (or rendered impossible) by an inability to effectively make sense of different kinds of data together. In fact, variety has been identified as one of the most difficult problems hampering the effective generation of value from geo-spatial data applications. Results from a series of surveys and other stakeholder studies in the BigDataEurope project confirm that variety is one of the issues faced by analysts working with data from several satellites (optical and radar sensors with different spatial resolutions) and with other aerial, in situ and collateral data that need to be jointly analyzed to provide new insights and solve societal issues.
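The classification rule described above (a difficulty with any one of the four Vs is enough) can be made concrete with a small sketch. This is purely illustrative: the `Workload` class, its field names and the example workloads are hypothetical and do not describe any BETTER or BigDataEurope tooling.

```python
from __future__ import annotations

from dataclasses import dataclass

# The four Vs discussed in the text.
FOUR_VS = ("volume", "velocity", "variety", "veracity")


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a data problem.

    Each flag says whether that V poses a difficulty when creating
    value from the raw data (an assumption made for illustration).
    """

    name: str
    volume: bool = False
    velocity: bool = False
    variety: bool = False
    veracity: bool = False

    def challenging_vs(self) -> list[str]:
        # Names of the Vs that are flagged as difficult for this workload.
        return [v for v in FOUR_VS if getattr(self, v)]

    def is_big_data_problem(self) -> bool:
        # One difficult V is sufficient; they need not all be high at once.
        return len(self.challenging_vs()) > 0


# Examples mirroring the cases in the text (names are made up).
small_stream = Workload("fast stream of a few megabytes", velocity=True)
static_archive = Workload("large historic archive, no new input", volume=True)
multi_sensor = Workload("optical, radar and in situ data combined", variety=True)
small_table = Workload("small, static, homogeneous table")

for w in (small_stream, static_archive, multi_sensor, small_table):
    print(w.name, w.is_big_data_problem(), w.challenging_vs())
```

In practice each V is a matter of degree rather than a yes/no flag; booleans are used here only to keep the rule itself visible.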
Each of the identified BETTER challenges has distinct requirements, and the impact of each of the four Vs above varies. Thus, each of the implemented pipelines will be distinct, targeting the relevant Vs in the required proportions (some focused more on volume, others on velocity, variety or veracity) and based on a carefully designed architectural solution. Each of the challenges identified, and the pipelines designed, will be described on our project channels in time. Stay tuned!