Data Modeling Object Oriented Data Model Encyclopaedia of Information Systems, Academic Press Vaz
Data Modeling: Object-Oriented Data Model

Michalis Vazirgiannis
Athens University of Economics and Business

Encyclopedia of Information Systems, Volume One. Copyright 2002, Elsevier Science (USA). All rights reserved.

I. INTRODUCTION
II. MOTIVATION AND THEORY
III. INDUSTRIAL SYSTEMS STANDARDS
IV. CONCLUSION: RESEARCH ISSUES AND PERSPECTIVES
V. CASE STUDY: OBJECT RELATIONAL SOLUTIONS FOR MULTIMEDIA DATABASES

GLOSSARY

data model: In data modeling we try to organize data so that they represent a real-world situation as closely as possible, yet their representation in computers is still feasible. A data model encapsulates three elements: object structure, behavior, and integrity constraints.

encapsulation: A property of the object-oriented model that promotes data and operation independence. This is achieved by hiding the internal structure and implementation details from the external world, thus simplifying the maintenance and usage of a multitude of object classes in an application.

inheritance: The ability of one class to inherit the structure and behavior of its ancestor. Inheritance allows an object to inherit a certain set of attributes from another object while allowing the addition of specific features.

object-oriented modeling: An abstraction of the real world that represents objects' structural content and behavior in terms of class hierarchies. The structural content is defined as a set of attributes and attached values, and the behavior as a set of methods (functions that implement an object's behavior).

polymorphism: The ability of different objects in a class hierarchy to have different behaviors in response to the same message. Polymorphism derives its meaning from the Greek for "many forms." A single behavior can generate entirely different responses from objects in the same group. Within the framework of the program, an internal mechanism determines which specific behavior is invoked; using the same name for different purposes is known as function overloading.

relational model: A data modeling approach that has found very successful industrial implementations in DBMSs. The fundamental modeling constructs are relations consisting of tuples of values, each value taking its semantics from an appropriate attribute. The relations represent entities of the real world and relationships among entities.

I. INTRODUCTION

A. Need for Data Modeling

The word datum comes from Latin and, literally interpreted, means a fact. However, data do not always correspond to concrete or actual facts. They may be imprecise or may describe things that have never happened (e.g., an idea). Data will be of interest to us if they are worth not only thinking about, but also recording in a precise manner.

Many different ways of organizing data exist. For data to be useful in providing information, they need to be organized so that they can be processed effectively. In data modeling we try to organize data so that they represent a real-world situation as closely as possible, yet their representation in computers is still feasible. These two requirements frequently conflict. The optimal way to organize data for a given application can be determined by understanding the characteristics of data that are important for capturing their meaning. These characteristics allow us to make general statements about how data are organized and processed.

It is evident that an interpretation of the world is needed, sufficiently abstract to allow minor perturbations, yet sufficiently powerful to give some understanding of how data about the world are related. An intellectual tool that provides such an interpretation will be referred to as a data model. It is a model about data by which a reasonable interpretation of the data can be obtained. A data model is an abstraction device that allows us to focus on the information content of the data as opposed to individual values of the data.

B. Historical Overview: First and Second Database Model Generations

Information systems demand more and more services from information stored in computing systems. Gradually, the focus of computing shifted from process-oriented to data-oriented systems, where data play an important role for software engineers. Today, many design problems center on data modeling and structuring.

After the initial file systems in the 1960s and early 1970s, the first generation of database products was born. Database systems can be considered intermediaries between the physical devices where data are stored and the (human) users of the data. Database management systems (DBMSs) are the software tools that enable the management (definition, creation, maintenance, and use) of large amounts of interrelated data stored in computer-accessible media. The early DBMSs, which were based on hierarchical and network (Codasyl) models, provided logical organization of data in trees and graphs. IBM's IMS, General Electric's IDS, Univac's DMS 1100, Cincom's Total, MRI's System 2000, and Cullinet's (now Computer Associates) IDMS are some of the well-known representatives of this generation. Although efficient, these systems used procedural languages and did not offer physical or logical independence, which limited their flexibility. In spite of that, DBMSs were an important advance compared to file systems.

IBM's addition of data communication facilities to its IMS software gave rise to the first large-scale database/data communication (DB/DC) system, in which many users access the DB through a communication network. Since then, access to DBs through communication networks has been offered by commercially available DBMSs.

C. W. Bachman played a pioneering role in the development of network DB systems (the IDS product and the Codasyl DataBase Task Group, or DBTG, proposals). The DBTG model is based on data structure diagrams, which are also known as Bachman's diagrams. In the model, the links between record types, called Codasyl sets, always go from one occurrence of one record type to many, that is, they are functional links. In its 1978 specifications, Codasyl also proposed a data definition language (DDL) at three levels (schema DDL, subschema DDL, and internal DDL) and a procedural (prescriptive) data manipulation language (DML).

In 1969-1970, Dr. E. F. Codd proposed the relational model, which was considered an elegant mathematical theory without many possibilities of efficient implementation in commercial products. In 1970, few people imagined that, in the 1980s, the relational model would become mandatory (a "decoy") for the promotion of DBMSs. Relational products like Oracle, DB2, Ingres, Informix, Sybase, etc., are considered the second generation of DBs. These products have more physical and logical independence, greater flexibility, and declarative query languages (users indicate what they want without describing how to get it) that deal with sets of records and can be automatically optimized, although their DML and host language are not integrated. With relational DBMSs (RDBMSs), organizations have more facilities for data distribution. RDBMSs provide not only better usability but also a more solid theoretical foundation.

Unlike network models, the relational model is value-oriented and does not support object identity. Needless to say, there is an important trade-off between object identity and declarative features. Because Codasyl DBTG and IMS support object identity, some authors have placed them in the object-oriented DB class.

The initial relational systems suffered from performance problems. While these products have now achieved wide acceptance, it must be recognized that they are not exempt from difficulties. Perhaps one of the greatest demands on RDBMSs is the support of complex data types; null values, recursive queries, and scarce support for integrity rules and for domains (or abstract data types) are other weaknesses of relational systems. Some of those problems are solved in the recent version of Structured Query Language (SQL), SQL:1999 (previously SQL3).

In the 1970s, a debate emerged on the relative merits of Codasyl and relational models. It resulted in efforts to compare both classes of models and to obtain a better understanding of their strengths and weaknesses.

During the late 1970s and in the 1980s, research work (and, later, industrial applications) focused on query optimization, high-level languages, normalization theory, physical structures for stored relations, buffer and memory management algorithms, indexing techniques (variations of B-trees), distributed systems, data dictionaries, transaction management, and so on. That work allowed efficient and secure on-line transactional processing (OLTP) environments (in the first DB generation, DBMSs were oriented toward batch processing). In the 1980s, the SQL language was also standardized (SQL/ANS 86 was approved by the American National Standards Institute, ANSI, and the International Organization for Standardization, ISO, in 1986), and today every RDBMS offers SQL.

Many of the DB technology advances at that time were founded on two elements: reference models and data models. ISO and ANSI proposals on reference models have positively influenced not only theoretical research but also practical applications, especially in DB development methodologies. In most of those reference models, two main concepts can be found: the well-known three-level architecture (external, logical, and internal layers), also proposed by Codasyl in 1978, and the recursive data description. The separation between the logical description of data and the physical implementation devices (data-application independence) was always an important objective in DB evolution, and the three-level architecture, together with the relational data model, was a major step in that direction.

In terms of data models, the relational model has influenced research agendas for many years and is supported by most of the current products. Recently, other DBMSs have appeared that implement other models, most of which are based on object-oriented principles.

Three key factors can be identified in the evolution of DBs: theoretical basis (resulting from researchers' work), products (developed by vendors), and practical applications (requested by users). These three factors have been present throughout the history of DBs, but the equilibrium among them has changed. DB technology began as a product technology demanded by users. Users' needs have always influenced its evolution, but especially so in the last decade.

Today, we are witnessing an extraordinary development of DB technology. Areas that were once exclusive to research laboratories and centers are appearing in the latest DBMS releases: World Wide Web, multimedia, active, object-oriented, secure, temporal, parallel, and multidimensional DBs. The need to exploit the object-oriented model for such complex systems is apparent.

II. MOTIVATION AND THEORY

A. Motivation

Although one might think that DB technology has reached its maturity, the new DB generation has demonstrated that we still do not know the solutions to some of the problems of the new millennium. In spite of the success of this technology, several warning signs must be taken into account. We identify the following architectural issues that need to be solved in the light of new application domains:

• Current DBMSs are monolithic; they offer all kinds of services and functionalities in a single package, regardless of the users' needs, at a very high cost, and with a loss of efficiency
• About half of the production data are in legacy systems
• Workflow management (WFM) systems are not based on DB technology; they simply access DBs through application programming interfaces (APIs)
• Replication services do not scale well over 10,000 nodes
• Integration of strictly structured data with loosely structured data (e.g., data from a relational DB with data from electronic mail)

On the other hand, there is a wealth of new application domains that produce huge amounts of data and therefore call for database support. Such domains are computer-aided design (CAD), computer-aided software engineering (CASE), office automation, multimedia databases, geographic information systems (GIS), scientific experiments, telecommunications, etc.

These application domains share some important characteristics that make their support by traditional relational systems problematic. Such features include:

• Hierarchical data structures (complex objects)
• New data types for storing images or large textual items
• No general-purpose data structure available
• Nonstandard application-specific operations
• Dynamic changes
• Cooperative design process among designers
• Large number of types
• Small number of instances
• Longer duration transactions

Database technology has to respond to these challenges in such a way that the above requirements are addressed as design features. In what follows we identify the shortcomings of current database technology in the context of the new applications:

• Poor representation of real-world entities; objects need to be decomposed over relations
• Fixed built-in types; no set-valued attributes are supported, thus complex and highly nested objects cannot be represented efficiently
• Semantic overloading
• Poor support for integrity and enterprise constraints
• No data abstraction such as aggregation and generalization, thus inheritance and specialization cannot be addressed
• Limited operations
• Difficulty handling recursive queries
• No adequate version control is supported

B. Object-Oriented Model

1. Historical Preview of Object-Oriented Databases

Before we proceed with our discussion of data modeling, it is necessary to define, even if only approximately, the elementary objects that will be modeled (i.e., what a datum is). Suppose that we accept as a working definition of an atomic piece of data the tuple

an object-oriented database management system (OODBMS). Furthermore, such a system must support inheritance (single or multiple) and handle object identity issues. OO languages providing persistency (persistence by class, creation, marking, reference) are then necessary so that users of an OODBMS are able to define and manipulate database objects.

2. Object-Oriented Modeling and Programming Concepts

An overview of object-oriented concepts is presented here. Object orientation has its origins in object-oriented programming languages (OOPLs). The class concept was introduced by SIMULA, whereas abstract data types, encapsulation, message passing, and inheritance were further introduced by the pioneering SMALLTALK. Another language of this family is C++, which integrates the strengths of C with object-oriented concepts. The newest OO language is Java, which is inherently object-oriented and provides a wide selection of classes for different tasks (e.g., visualization, network and task management, persistence features). Its portability across platforms and operating systems has made it a very attractive development environment, widely used and with an important impact on programming large-scale applications.

An object has an inherent state accompanied by its behavior, which defines the way the object treats its state as well as the communication protocol between the object and the external world. We have to differentiate between the transient objects in OOPLs and the persistent objects in object-oriented databases (OODBs). In the first case the objects are eliminated from main memory as soon as they are no longer needed, whereas in OODBMSs objects are persistently stored and other mechanisms such as indexing, concurrency control, and recovery are available. OODBMSs usually offer interfaces with one or more OOPLs.