Dataspace
A dataspace is an abstraction in data management that aims to overcome some of the problems encountered in data integration systems. A dataspace is defined as a set of "participants", or data sources, and the relationships between them: for example, that dataset A is a duplicate of dataset B.[1] It can contain all data sources of an organization regardless of their format, physical location, or data model.[1] The dataspace then provides a unified interface for querying the data regardless of format, sometimes in a "best-effort" fashion, and ways to further integrate the data when necessary.[1] This differs sharply from a traditional relational database, which requires all data to conform to a single predefined schema.[1] The aim of the concept is to reduce the effort required to set up a data integration system by relying on existing schema matching and mapping generation techniques, and to improve the system in a "pay-as-you-go" fashion as it is used.[2][3] Labor-intensive aspects of data integration are postponed until they are absolutely needed.[4]
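The definition above (participants plus the relationships between them) can be illustrated with a minimal Python sketch. The class names `Participant`, `Relationship`, and `Dataspace` are hypothetical and not taken from any dataspace system; the sketch assumes that participants are only described (name, data model, location) and never converted, and that the dataspace records typed relationships such as "A is a duplicate of B".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Participant:
    """A data source registered in the dataspace (hypothetical model)."""
    name: str
    data_model: str  # e.g. "relational", "csv", "json"; left as-is, not converted
    location: str    # where the data physically lives; the data is not moved


@dataclass(frozen=True)
class Relationship:
    """A typed link between two participants, e.g. 'duplicate_of'."""
    source: str
    kind: str
    target: str


@dataclass
class Dataspace:
    participants: dict[str, Participant] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)

    def add_participant(self, p: Participant) -> None:
        # Registering a source requires no schema mapping or conversion.
        self.participants[p.name] = p

    def relate(self, source: str, kind: str, target: str) -> None:
        # Only relationships between registered participants are recorded.
        if source not in self.participants or target not in self.participants:
            raise KeyError(f"unknown participant in {source!r} -> {target!r}")
        self.relationships.append(Relationship(source, kind, target))

    def related(self, name: str, kind: str) -> list[str]:
        # Assumption: relationships are looked up in both directions, which
        # suits symmetric kinds such as 'duplicate_of'.
        out: list[str] = []
        for r in self.relationships:
            if r.kind != kind:
                continue
            if r.source == name:
                out.append(r.target)
            elif r.target == name:
                out.append(r.source)
        return out


# Usage: two sources in different formats, one recorded as a duplicate of the other.
# The locations are made-up placeholders.
ds = Dataspace()
ds.add_participant(Participant("A", "relational", "postgres://hr-db/employees"))
ds.add_participant(Participant("B", "csv", "file:///exports/employees.csv"))
ds.relate("A", "duplicate_of", "B")
print(ds.related("B", "duplicate_of"))  # ['A']
```

Treating relationships as undirected is a simplification; a directed relationship such as "A is a subset of B" would need a lookup that respects direction.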
Traditionally, data integration and data exchange systems have aimed to offer many of the purported services of dataspace systems. Dataspaces can be viewed as a next step in the evolution of data integration architectures, but they differ from current data integration systems, which require semantic integration before any services can be provided. Hence, although in such systems there is no single schema to which all the data conform and the data reside in a multitude of host systems, the data integration system knows the precise relationships between the terms used in each schema. As a result, significant up-front effort is required to set up a data integration system.[5]
Dataspaces shift the emphasis to a data co-existence approach providing base functionality over all data sources, regardless of how integrated they are. For example, a DataSpace Support Platform (DSSP) can provide keyword search over all of its data sources, similar to that provided by existing desktop search systems. When more sophisticated operations are required, such as relational-style queries, data mining, or monitoring over certain sources, then additional effort can be applied to more closely integrate those sources in an incremental fashion. Similarly, in terms of traditional database guarantees, initially a dataspace system can only provide weaker guarantees of consistency and durability. As stronger guarantees are desired, more effort can be put into making agreements among the various owners of data sources, and opening up certain interfaces (e.g., for commit protocols).[6][7]
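The pay-as-you-go behavior described above can be illustrated with a toy Python sketch. The `DSSP` class and its methods are hypothetical, not the API of any real DataSpace Support Platform, and sources are assumed to be in-memory lists of Python dicts. Keyword search works over every source from the moment it is registered, while a structured selection answers only from sources for which a mapping from a shared attribute name to the source's local field has been added, and reports the remaining sources instead of failing.

```python
from typing import Any

Record = dict[str, Any]


class DSSP:
    """Toy DataSpace Support Platform (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.sources: dict[str, list[Record]] = {}
        # mappings[source][shared_attr] = field name used locally by that source
        self.mappings: dict[str, dict[str, str]] = {}

    def add_source(self, name: str, records: list[Record]) -> None:
        # A source is usable for keyword search as soon as it is registered.
        self.sources[name] = records

    def keyword_search(self, term: str) -> list[tuple[str, Record]]:
        # Base functionality: case-insensitive substring match against every
        # value of every record in every source, ignoring field names.
        needle = term.lower()
        hits: list[tuple[str, Record]] = []
        for name, records in self.sources.items():
            for rec in records:
                if any(needle in str(v).lower() for v in rec.values()):
                    hits.append((name, rec))
        return hits

    def add_mapping(self, source: str, shared_attr: str, local_field: str) -> None:
        # Pay-as-you-go step: integrate one source a bit more closely.
        self.mappings.setdefault(source, {})[shared_attr] = local_field

    def select(self, shared_attr: str, value: Any) -> tuple[list[tuple[str, Record]], list[str]]:
        # Best-effort structured query: answer from the sources that have a
        # mapping for shared_attr and report the ones that do not.
        results: list[tuple[str, Record]] = []
        unmapped: list[str] = []
        for name, records in self.sources.items():
            local = self.mappings.get(name, {}).get(shared_attr)
            if local is None:
                unmapped.append(name)
                continue
            results.extend((name, r) for r in records if r.get(local) == value)
        return results, unmapped


# Usage: two sources that name the same attribute differently.
dssp = DSSP()
dssp.add_source("crm", [{"cust_name": "Ada Lovelace", "city": "London"}])
dssp.add_source("billing", [{"client": "Ada Lovelace", "town": "London"}])

hits = dssp.keyword_search("lovelace")                  # works with no mappings at all
before, missing_before = dssp.select("city", "London")  # nothing mapped yet

dssp.add_mapping("crm", "city", "city")                 # integrate one source
partial, missing_partial = dssp.select("city", "London")
dssp.add_mapping("billing", "city", "town")             # later, integrate the other
full, missing_full = dssp.select("city", "London")
print(len(hits), len(partial), missing_partial, len(full), missing_full)
# 2 1 ['billing'] 2 []
```

Returning the unmapped sources alongside the results is one way to make a best-effort answer explicit: the caller can see that the answer is incomplete and decide whether a given source is worth the effort of a mapping.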
History
According to a cyclic model of technology development, new technologies progress by first going through a phase of design competition, in which the technology is explored and experimented with, until the industry settles on a dominant design and iterates less.[1] As of 2019, Curry describes dataspaces as having already undergone a "first wave" of adoption, made up of exploratory and proof-of-concept projects, and as having begun a "second wave" in which they are being adapted to more general and less idealized use cases.[1]
The European Commission has been developing shared dataspaces for various industries, called "Common European Data Spaces", since February 2020.[8] Dataspaces are planned for the agriculture, energy, finance, health, media, manufacturing, mobility, and tourism sectors, as well as for other domains: the European Green Deal, languages, public administration, research and innovation, and skills.[8][9] The first concrete steps were a number of research and innovation initiatives funded as part of the European Public-Private Partnership on Big Data Value (Big Data Value PPP).[10]
See also
- Data integration
- Data mapping
- Information integration
- Linked data
- Semantic integration
- Semantic query
References
- [1] Curry, Edward (2020), Curry, Edward (ed.), "Dataspaces: Fundamentals, Principles, and Techniques", Real-time Linked Dataspaces: Enabling Data Ecosystems for Intelligent Systems, Cham: Springer International Publishing, pp. 45–62, doi:10.1007/978-3-030-29665-0_3, ISBN 978-3-030-29665-0
- [2] Belhajjame, K.; Paton, N. W.; Embury, S. M.; Fernandes, A. A. A.; Hedeler, C. (2013). "Incrementally improving dataspaces based on user feedback". Information Systems. 38 (5): 656. CiteSeerX 10.1.1.303.1957. doi:10.1016/j.is.2013.01.006.
- [3] Belhajjame, K.; Paton, N. W.; Embury, S. M.; Fernandes, A. A. A.; Hedeler, C. (2010). "Feedback-based annotation, selection and refinement of schema mappings for dataspaces". Proceedings of the 13th International Conference on Extending Database Technology - EDBT '10. p. 573. CiteSeerX 10.1.1.298.3519. doi:10.1145/1739041.1739110. ISBN 9781605589459.
- [4] Dong, X.; Halevy, A. (2007). "Indexing dataspaces". Proceedings of the 2007 ACM SIGMOD international conference on Management of data - SIGMOD '07. p. 43. doi:10.1145/1247480.1247487. ISBN 9781595936868. S2CID 1184444.
- [5] Howe, B.; Maier, D.; Rayner, N.; Rucker, J. (2008). "Quarrying dataspaces: Schemaless profiling of unfamiliar information sources". 2008 IEEE 24th International Conference on Data Engineering Workshop. p. 270. doi:10.1109/ICDEW.2008.4498331. ISBN 978-1-4244-2161-9. S2CID 14039616.
- [6] Sarma, A. D.; Dong, X. L.; Halevy, A. Y. (2009). "Data Modeling in Dataspace Support Platforms". Conceptual Modeling: Foundations and Applications. Lecture Notes in Computer Science. Vol. 5600. pp. 122–138. doi:10.1007/978-3-642-02463-4_8. ISBN 978-3-642-02462-7.
- [7] Franklin, M.; Halevy, A.; Maier, D. (2005). "From databases to dataspaces". ACM SIGMOD Record. 34 (4): 27. doi:10.1145/1107499.1107502. S2CID 14092111.
- [8] "Shaping Europe's digital future: Common European Data Spaces". European Commission. Retrieved 2024-08-24.
- [9] "A view from Brussels: European strategy for data takes shape". International Association of Privacy Professionals. 11 January 2024. Retrieved 2024-08-24.
- [10] Scerri, Simon; Tuikka, Tuomo; de Vallejo, Irene Lopez; Curry, Edward (2022), Curry, Edward; Scerri, Simon; Tuikka, Tuomo (eds.), "Common European Data Spaces: Challenges and Opportunities", Data Spaces: Design, Deployment and Future Directions, Cham: Springer International Publishing, pp. 337–357, doi:10.1007/978-3-030-98636-0_16, ISBN 978-3-030-98636-0
Further reading
- Partha Pratim Talukdar, Marie Jacob, Muhammad Salman Mehmood, Koby Crammer, Zachary G. Ives, Fernando Pereira, Sudipto Guha: Learning to create data-integrating queries. PVLDB 1(1): 785-796 (2008)
- Michael J. Franklin, Alon Y. Halevy, David Maier: A first tutorial on dataspaces. PVLDB 1(2): 1516-1517 (2008)
- Jens-Peter Dittrich, Marcos Antonio Vaz Salles: iDM: A Unified and Versatile Data Model for Personal Dataspace Management. VLDB 2006: 367-378