Global serializability
In concurrency control of databases, transaction processing (transaction management), and other transactional distributed applications, global serializability (or modular serializability) is a property of a global schedule of transactions. A global schedule is the unified schedule of all the individual database (and other transactional object) schedules in a multidatabase environment (e.g., federated database). Complying with global serializability means that the global schedule is serializable, has the serializability property, while each component database (module) has a serializable schedule as well. In other words, a collection of serializable components provides overall system serializability, which is usually incorrect. A need in correctness across databases in multidatabase systems makes global serializability a major goal for global concurrency control (or modular concurrency control). With the proliferation of the Internet, Cloud computing, Grid computing, and small, portable, powerful computing devices (e.g., smartphones), as well as increase in systems management sophistication, the need for atomic distributed transactions and thus effective global serializability techniques, to ensure correctness in and among distributed transactional applications, seems to increase.
In a federated database system or any other more loosely defined multidatabase system, which are typically distributed in a communication network, transactions span multiple (and possibly distributed) databases. Enforcing global serializability in such system, where different databases may use different types of concurrency control, is problematic. Even if every local schedule of a single database is serializable, the global schedule of a whole system is not necessarily serializable. The massive communication exchanges of conflict information needed between databases to reach conflict serializability globally would lead to unacceptable performance, primarily due to computer and communication latency. Achieving global serializability effectively over different types of concurrency control has been open for several years.
The global serializability problem
Problem statement
The difficulties described above translate into the following problem:
- Find an efficient (high-performance and fault tolerant) method to enforce Global serializability (global conflict serializability) in a heterogeneous distributed environment of multiple autonomous database systems. The database systems may employ different concurrency control methods. No limitation should be imposed on the operations of either local transactions (confined to a single database system) or global transactions (span two or more database systems).
Quotations
Lack of an appropriate solution for the global serializability problem has driven researchers to look for alternatives to serializability as a correctness criterion in a multidatabase environment (e.g., see Relaxing global serializability below), and the problem has been characterized as difficult and open. The following two quotations demonstrate the mindset about it by the end of the year 1991, with similar quotations in numerous other articles:
- "Without knowledge about local as well as global transactions, it is highly unlikely that efficient global concurrency control can be provided... Additional complications occur when different component DBMSs [Database Management Systems] and the FDBMSs [Federated Database Management Systems] support different concurrency mechanisms... It is unlikely that a theoretically elegant solution that provides conflict serializability without sacrificing performance (i.e., concurrency and/or response time) and availability exists."[1]
Proposed solutions
Several solutions, some partial, have been proposed for the global serializability problem. Among them:
- Global conflict graph (serializability graph, precedence graph) checking
- Distributed Two phase locking (Distributed 2PL)
- Distributed Timestamp ordering
- Tickets (local logical timestamps which define local total orders, and are propagated to determine global partial order of transactions)
Relaxing global serializability
Some techniques have been developed for relaxed global serializability (i.e., they do not guarantee global serializability; see also Relaxing serializability). Among them (with several publications each):
Another common reason nowadays for Global serializability relaxation is the requirement of availability of internet products and services. This requirement is typically answered by large scale data replication. The straightforward solution for synchronizing replicas' updates of a same database object is including all these updates in a single atomic distributed transaction. However, with many replicas such a transaction is very large, and may span several computers and networks that some of them are likely to be unavailable. Thus such a transaction is likely to end with abort and miss its purpose.[4] Consequently, Optimistic replication (Lazy replication) is often utilized (e.g., in many products and services by Google, Amazon, Yahoo, and alike), while global serializability is relaxed and compromised for eventual consistency. In this case relaxation is done only for applications that are not expected to be harmed by it.
Classes of schedules defined by relaxed global serializability properties either contain the global serializability class, or are incomparable with it. What differentiates techniques for relaxed global conflict serializability (RGCSR) properties from those of relaxed conflict serializability (RCSR) properties that are not RGCSR is typically the different way global cycles (span two or more databases) in the global conflict graph are handled. No distinction between global and local cycles exists for RCSR properties that are not RGCSR. RCSR contains RGCSR. Typically RGCSR techniques eliminate local cycles, i.e., provide local serializability (which can be achieved effectively by regular, known concurrency control methods); however, obviously they do not eliminate all global cycles (which would achieve global serializability).
References
- ^ Amit Sheth, James Larson (1990): "Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases" Archived 2007-10-19 at the Wayback Machine, ACM Computing Surveys, Vol. 22, No 3, pp. 183-236, September 1990 (quotation from page 227)
- ^ Weimin Du and Ahmed K. Elmagarmid (1989): "Quasi Serializability: a Correctness Criterion for Global Concurrency Control in InterBase" Archived 2010-12-21 at the Wayback Machine, Proceedings of the Fifteenth International Conference on Very Large Data Bases (VLDB), August 22–25, 1989, Amsterdam, the Netherlands, pp. 347-355, Morgan Kaufmann, ISBN 1-55860-101-5
- ^ Sharad Mehrotra, Rajeev Rastogi, Henry Korth, Abraham Silberschatz (1998): "Ensuring Consistency in Multidatabases by Preserving Two-Level Serializability", ACM Transactions on Database Systems (TODS), Vol. 23, No. 2, pp. 199-230, June 1998 (quotation from the article's Abstract)
- ^ Gray, J.; Helland, P.; O’Neil, P.; Shasha, D. (1996). The dangers of replication and a solution. Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data. pp. 173–182. CiteSeerX 10.1.1.43.6498. doi:10.1145/233269.233330.