9902.ch55 2/5/02 2:28 PM Page 1 PROOF

Data Modeling
Object-Oriented Data Model

Michalis Vazirgiannis
Athens University of Economics and Business

I. INTRODUCTION
II. MOTIVATION AND THEORY
III. INDUSTRIAL SYSTEMS STANDARDS
IV. CONCLUSION RESEARCH ISSUES AND PERSPECTIVES
V. CASE STUDY: OBJECT RELATIONAL SOLUTIONS FOR MULTIMEDIA DATABASES

GLOSSARY

data model In data modeling we try to organize data so that they represent as closely as possible a real world situation, yet their representation in computers is still feasible. A data model encapsulates three elements: object structure, behavior, and integrity constraints.

encapsulation A property of the object-oriented model that promotes data and operation independence. This is achieved by hiding the internal structure and implementation details from the external world, thus simplifying the maintenance and use of a multitude of object classes in an application.

inheritance The ability of one class to inherit the structure and behavior of its ancestor. Inheritance allows an object to inherit a certain set of attributes from another object while allowing the addition of specific features.

object-oriented modeling An abstraction of the real world that represents the structural content and behavior of objects in terms of class hierarchies. The structural content is defined as a set of attributes and attached values, and the behavior as a set of methods (functions that implement the object's behavior).

polymorphism The ability of different objects in a class hierarchy to have different behaviors in response to the same message. Polymorphism derives its meaning from the Greek for "many forms." A single behavior can generate entirely different responses from objects in the same group. Within the framework of the program, an internal mechanism determines which specific behavior is executed. The use of the same name for different purposes is known as function overloading.

relational model A data modeling approach that has found very successful industrial implementations in DBMSs. The fundamental modeling constructs are relations consisting of tuples of values, each one taking its semantics from an appropriate attribute. The relations represent entities of the real world and relationships among entities.

I. INTRODUCTION

A. Need for Data Modeling

The word datum comes from Latin and, literally interpreted, means a fact. However, data do not always correspond to concrete or actual facts. They may be imprecise or may describe things that have never happened (e.g., an idea). Data will be of interest to us if they are worth not only thinking about, but also recording in a precise manner.

Many different ways of organizing data exist. For data to be useful in providing information, they need to be organized so that they can be processed effectively. In data modeling we try to organize data so that they represent as closely as possible a real world situation, yet their representation in computers is still feasible. These two requirements are frequently conflicting. The optimal way to organize data for a given application can be determined by understanding the characteristics of data that are important for capturing their meaning. These characteristics allow us to make general statements about [...]

Encyclopedia of Information Systems, Volume One. Copyright 2002, Elsevier Science (USA). All rights reserved.

IBM's addition of data communication facilities to its IMS software gave rise to the first large-scale database/data communication (DB/DC) system, in which many users access the DB through a communication network. [...] To many, that is, a functional link.
In its 1978 specifications, Codasyl also proposed a data definition language (DDL) at three levels (schema DDL, subschema DDL, and internal DDL) and a procedural (prescriptive) data manipulation language (DML).

In 1969-1970, Dr. E. F. Codd proposed the relational model, which was considered an elegant mathematical theory with few possibilities of efficient implementation in commercial products. In 1970, few people imagined that, in the 1980s, the relational model would become a mandatory selling point (a "decoy") in the promotion of DBMSs. Relational products like Oracle, DB2, Ingres, Informix, Sybase, etc., are considered the second generation of DBs. These products offer more physical and logical independence, greater flexibility, and declarative query languages (users indicate what they want without describing how to get it) that deal with sets of records and can be optimized automatically, although their DML and host language are not integrated. With relational DBMSs (RDBMSs), organizations have more facilities for data distribution. RDBMSs provide not only better usability but also a more solid theoretical foundation. Unlike network models, the relational model is value-oriented and does not support object identity. Needless to say, there is an important trade-off between object identity and declarative features. Because Codasyl DBTG and IMS support object identity, some authors have placed them in the class of object-oriented DBs.
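The difference between a value-oriented model and one with object identity can be made concrete with a short Python analogy. It assumes that a relation behaves like a mathematical set of value tuples and that a Python object reference plays the role of an object identifier (OID); the class `Employee` and its attributes are hypothetical.

```python
# Value orientation (analogy): a relation as a set of value tuples.
# Two tuples with identical values cannot be told apart.
relation = {("Smith", "Sales"), ("Smith", "Sales")}
print(len(relation))  # 1: the duplicate collapses into a single tuple


class Employee:
    """Hypothetical class; every instance has its own identity."""

    def __init__(self, name, dept):
        self.name = name
        self.dept = dept


# Object identity (analogy): equal state, distinct objects.
a = Employee("Smith", "Sales")
b = Employee("Smith", "Sales")
print(a is b)           # False: same values, different identities
print(id(a) != id(b))   # True: id() stands in for an OID here

# Sharing by reference: an update made through one reference
# is visible through every other reference to the same object.
alias = a
alias.dept = "Marketing"
print(a.dept)           # Marketing
print(b.dept)           # Sales
```

In a value-oriented schema, the same distinction has to be simulated with an explicit key attribute (e.g., an employee number) that users must generate and maintain themselves.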

The initial relational systems suffered from performance problems. While these products have now achieved wide acceptance, it must be recognized that they are not exempt from difficulties. Perhaps one of the greatest demands on RDBMSs is support for complex data types; null values, recursive queries, and scarce support for integrity rules and for domains (or abstract data types) are other weaknesses of relational systems. Some of those problems are solved in the recent version of the Structured Query Language (SQL), SQL:1999 (previously known as SQL3).
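One of the SQL:1999 additions, recursive queries, is expressed with the `WITH RECURSIVE` clause. The sketch below runs such a query through Python's standard `sqlite3` module, assuming an SQLite build that supports recursive common table expressions (SQLite implements only part of SQL:1999). The `parts` table and its rows are hypothetical: a small part/subpart hierarchy of the kind found in CAD applications.

```python
import sqlite3

# In-memory database holding a hypothetical part/subpart hierarchy.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (part TEXT, subpart TEXT)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?)",
    [("car", "engine"), ("car", "wheel"),
     ("engine", "piston"), ("piston", "ring")],
)

# Recursive common table expression: all direct and indirect
# components of 'car' (the transitive closure of 'parts').
rows = conn.execute(
    """
    WITH RECURSIVE components(name) AS (
        SELECT subpart FROM parts WHERE part = 'car'
        UNION
        SELECT p.subpart
        FROM parts AS p
        JOIN components AS c ON p.part = c.name
    )
    SELECT name FROM components ORDER BY name
    """
).fetchall()
conn.close()

print([name for (name,) in rows])  # ['engine', 'piston', 'ring', 'wheel']
```

Without recursive queries, the same closure has to be computed in the application program, typically by issuing one query per level of the hierarchy.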

In the 1970s, a debate emerged on the relative merits of the Codasyl and relational models. It helped to compare both classes of models and to obtain a better understanding of their strengths and weaknesses.

During the late 1970s and in the 1980s, research work (and, later, industrial applications) focused on query optimization, high-level languages, normalization theory, physical structures for stored relations, buffer and memory management algorithms, indexing techniques (variations of B-trees), distributed systems, data dictionaries, transaction management, and so on. That work made efficient and secure on-line transaction processing (OLTP) environments possible (in the first DB generation, DBMSs were oriented toward batch processing). In the 1980s, the SQL language was also standardized (SQL/ANS 86 was approved by the American National Standards Institute (ANSI) in 1986 and by the International Organization for Standardization (ISO) in 1987), and today every RDBMS offers SQL.

Many of the DB technology advances at that time were founded on two elements: reference models and data models. ISO and ANSI proposals on reference models have positively influenced not only theoretical research but also practical applications, especially DB development methodologies. In most of those reference models, two main concepts can be found: the well-known three-level architecture (external, logical, and internal layers), also proposed by Codasyl in 1978, and recursive data description. The separation between the logical description of data and the physical implementation devices (data/application independence) was always an important objective in DB evolution, and the three-level architecture, together with the relational data model, was a major step in that direction. In terms of data models, the relational model has influenced research agendas for many years and is supported by most current products. Recently, other DBMSs have appeared that implement other models, most of which are based on object-oriented principles.

Three key factors can be identified in the evolution of DBs: theoretical basis (resulting from researchers' work), products (developed by vendors), and practical applications (requested by users). These three factors have been present throughout the history of DBs, but the equilibrium among them has changed. What began as a product technology demanded by users [...] Users' needs have always influenced the evolution of DB technology, but especially so in the last decade.

Today, we are witnessing an extraordinary development of DB technology. Areas that were once exclusive to research laboratories and centers are appearing in the latest releases of DBMSs: World Wide Web, multimedia, active, object-oriented, secure, temporal, parallel, and multidimensional DBs. The need to exploit the object-oriented model for such complex systems is apparent.

II. MOTIVATION AND THEORY

A. Motivation

Although one might think that DB technology has reached maturity, the new DB generation has demonstrated that we still do not know the solutions to some of the problems of the new millennium. In spite of the success of this technology, several warning signals must be taken into account. We identify the following architectural issues that need to be solved in the light of new application domains:

• Current DBMSs are monolithic; they offer all kinds of services and functionalities in a single package, regardless of the users' needs, at a very high cost, and with a loss of efficiency
• About half of the production data are in legacy systems
• Workflow management (WFM) systems are not based on DB technology; they simply access DBs through application programming interfaces (APIs)
• Replication services do not scale well over 10,000 nodes
• Integration of strictly structured data with loosely structured data (e.g., data from a relational DB with data from electronic mail)

On the other hand, there is a wealth of new application domains that produce huge amounts of data and therefore call for database support. Such domains include computer-aided design (CAD), computer-aided software engineering (CASE), office automation, multimedia databases, geographic information systems (GIS), scientific experiments, telecommunications, etc.

These application domains share some important characteristics that make their support by traditional relational database systems problematic.

Such features include:

• Hierarchical data structures (complex objects)
• New data types for storing images or large textual items
• No general-purpose data structure available
• Nonstandard application-specific operations
• Dynamic changes
• Cooperative design process among designers
• Large number of types
• Small number of instances
• Longer duration transactions
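The first feature in the list, hierarchical complex objects, and the decomposition that a flat relational representation imposes on them can be sketched in Python. The sketch is hypothetical: the class `Part` and the helpers `flatten` and `rebuild` are invented for illustration, and a list of `(parent, child)` tuples stands in for a relation without collection-valued attributes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Part:
    """Hypothetical complex object: a part whose collection-valued
    attribute `subparts` holds further Part objects (a hierarchy)."""
    name: str
    subparts: list[Part] = field(default_factory=list)


def flatten(part, parent=None, rows=None):
    """Decompose the nested object into flat (parent, child) tuples,
    as a relation without collection-valued attributes requires."""
    if rows is None:
        rows = []
    if parent is not None:
        rows.append((parent, part.name))
    for sub in part.subparts:
        flatten(sub, part.name, rows)
    return rows


def rebuild(name, rows):
    """Reassemble the object from the flat tuples by looking up the
    children of each part, one lookup per level of nesting."""
    children = [child for (parent, child) in rows if parent == name]
    return Part(name, [rebuild(child, rows) for child in children])


car = Part("car", [Part("engine", [Part("piston")]), Part("wheel")])
rows = flatten(car)
print(rows)                         # [('car', 'engine'), ('engine', 'piston'), ('car', 'wheel')]
print(rebuild("car", rows) == car)  # True
```

The object view keeps the hierarchy in one structure; the flat view scatters it over tuples, and every level that `rebuild` walks corresponds roughly to one more join in a relational query.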

Database technology has to respond to these challenges by addressing the above requirements as design features. In what follows, we identify the shortcomings of current database technology in the context of the new applications:

• Poor representation of real-world entities; objects need to be decomposed over relations
• Fixed built-in types; no set-valued attributes are supported, so complex and highly nested objects cannot be represented efficiently
• Semantic overloading
• Poor support for integrity and enterprise constraints
• No data abstraction such as aggregation and generalization, so inheritance and specialization cannot be addressed
• Limited operations
• Difficulty handling recursive queries
• No adequate version control

B. Object-Oriented Model

1. Historical Preview of Object-Oriented Databases

Before we proceed with our discussion of data modeling, it is necessary to define, even if only approximately, the elementary objects that will be modeled (i.e., what a datum is). Suppose that we accept as a working definition of an atomic piece of data the tuple [...]

[...] an object-oriented database management system (OODBMS). Furthermore, such a system must support inheritance (single and multiple) and handle object identity issues. In addition, OO languages providing persistence (persistence by class, by creation, by marking, or by reference) are necessary so that users of an OODBMS are able to define and manipulate database objects.

2. Object-Oriented Modeling and Programming Concepts

Hereafter, an overview of object-oriented concepts is presented. Object orientation has its origins in object-oriented programming languages (OOPLs). The class concept was introduced by SIMULA, whereas abstract data types, encapsulation, message passing, and inheritance were further developed by the pioneering SMALLTALK. Another language of this family is C++, which integrates the strengths of C with object-oriented concepts. The newest OO language is Java, which is inherently object-oriented and provides a wide selection of classes for different tasks (e.g., visualization, network and task management, persistence). Its portability across platforms and operating systems has made it a very attractive development environment, widely used and with an important impact on programming large-scale applications.

An object has an inherent state as well as a behavior, which defines the way the object treats its state and the communication protocol between the object and the external world. We have to differentiate between the transient objects in OOPLs and the persistent objects in object-oriented databases (OODBs). In the first case, objects are eliminated from main memory as soon as they are no longer needed, whereas in OODBMSs objects are stored persistently and other mechanisms such as indexing, concurrency control, and recovery are available. OODBMSs usually offer interfaces with one or more OOPLs.
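The object concepts discussed here (state and behavior, encapsulation, inheritance, polymorphic responses to the same message) and the difference between transient and persistent objects can be sketched in Python. This is an illustration under stated assumptions, not an OODBMS: the classes `Shape`, `Rectangle` and `Triangle` are hypothetical, and the standard-library `shelve` module, which only pickles objects into a key-value file, stands in for persistent storage. It provides none of the indexing, concurrency control, or recovery mechanisms an OODBMS adds.

```python
import os
import shelve
import tempfile


class Shape:
    """Hypothetical base class. State sits in underscore-prefixed
    attributes that, by Python convention, are reached only through
    methods: a weak, convention-based form of encapsulation."""

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name

    def area(self):
        raise NotImplementedError  # each subclass supplies its behavior


class Rectangle(Shape):
    """Inherits name handling from Shape and adds width and height."""

    def __init__(self, name, width, height):
        super().__init__(name)
        self._width = width
        self._height = height

    def area(self):
        return self._width * self._height


class Triangle(Shape):
    """A second subclass answering the same message differently."""

    def __init__(self, name, base, height):
        super().__init__(name)
        self._base = base
        self._height = height

    def area(self):
        return self._base * self._height / 2


# Transient objects: they exist only in main memory for this run.
shapes = [Rectangle("r1", 2, 3), Triangle("t1", 4, 3)]

# Persistence (sketch): pickle the objects into a shelve file, then
# reopen the file and load them back as new in-memory objects.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shapes_db")
    with shelve.open(path) as db:
        db["shapes"] = shapes
    with shelve.open(path) as db:
        restored = db["shapes"]

# Polymorphism: the same message, area(), runs class-specific code.
print([(s.name(), s.area()) for s in restored])  # [('r1', 6), ('t1', 6.0)]

# Reloaded objects are copies with new identities: shelve keeps no
# object identifiers across sessions, unlike an OODBMS.
print(restored[0] is shapes[0])  # False
```

Storing the list under the key "shapes" also stores every object reachable from it, which loosely resembles persistence by reference (reachability) from a named root.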