The term Big Data is increasingly present in the development of software applications and services in application areas such as health and the digital economy. The term is usually associated with technological concerns, that is, with solutions that manage and physically store large volumes of data. This interpretation has caused a proliferation of isolated Big Data technological solutions, generating considerable data chaos. However, a high-quality technological infrastructure is not enough if it lacks suitable mechanisms to organize and extract value from the stored data. This is not just a Big Data problem. The data that come from these “data lakes” and that are really relevant to understanding and managing a given domain make up the Smart/Master Data, which are much smaller in volume than the original data. Solutions to move from the Big Data perspective to the pragmatic Smart/Master Data view must be provided in order to interpret and exploit data correctly, while inferring new knowledge to better understand the complex domain where those data come from.
In this context, we have been involved in analysing, formalising and solving the conceptual and methodological challenges that arise when developing applications and services based on Big Data in industrial environments. Starting from a foundational ontology that describes a complex domain without ambiguity, and applying conceptual model-driven software development (MDSD) principles, we propose a conceptual model-driven method for developing Big Data applications. The goal is to define precise and rigorous conceptual models that drive the development of Big Data applications and services in order to provide business value, precisely identifying the “Smart” data that are really relevant to understanding and managing the complex domain where the Big Data are generated.
We have been engaged in pioneering work on the design and development of a Big Data application for the management of genomic data generated by several organizations. Identifying the Smart Data in this case means accumulating increasingly distilled genome knowledge while the amount of data to be analysed grows continuously owing to advances in DNA sequencing technologies. As more and more data become available, identifying the right conceptual patterns to guide the process of understanding how the genome works will open possibilities never before available to humanity. Understanding the genome represents a disruptive civilizational change: for the first time in history we can explore answers never before within human reach, with potentially profound impact on applications in the health domain (precision medicine).