Data Administration at the National Library of Canada

by Pierre Dorion

Network Notes #22
ISSN 1201-4338
Information Technology Services
National Library of Canada

December 29, 1995


"Underlying any database is an organized structure of entity types and relationships, which also defines and explains the enterprise in a fundamental way. Capturing this structure in a model (entity data-model, or E/R diagram) is a crucial step in understanding the corresponding data in its most basic, stable and nonredundant form."

(Ronald G. Ross, Editor/Publisher of the Database Research Group, Inc.)

Introduction

Data Administration was established at the National Library of Canada (NLC) in 1991 with the introduction of the AMICUS project. Data Administration defines the standards for data modelling, the contents of the data models, and the techniques involved in the creation and maintenance of these models. Data modelling is a technique used to represent the nature of the data required for the organization to meet its objectives. The role of Data Administration is to inventory and classify the data of the business and to provide a uniform model for the integration of systems.

The challenges of Data Administration at the Library include: introducing end users and ITS staff to the data modelling techniques used in the development of relational databases; introducing techniques for developing data models with Computer Aided Software Engineering (CASE) tools; and developing the Data Administration Standards and Procedures Guide.

Data Administration staff, ITS staff and end users are jointly responsible for developing a collection of data models that reflects the Library's business requirements. One of the primary functions of Data Administration is to participate in the development, approval and maintenance of these models. Data Administration uses a Computer Aided Software Engineering (CASE) tool to define, document and store all data required to develop data models.

One of the biggest implementation challenges was the introduction of the Data Administration standards to maintain consistency in the naming, definition, attributes and contents of entities. It would be impossible to share data if an entity had more than one name. It is the responsibility of Data Administration staff to oversee standards, policies, and procedures to minimize inconsistencies in data.

Mission

The primary goal of Data Administration is to participate in the development of conceptual, logical, and physical data models. Data Administration manages NLC data by ensuring that all metadata is up to date, consistent, integrated and easily accessible. NLC data comprises all objects relevant to NLC business about which information is retained. As well, Data Administration staff support the maintenance and sharing of NLC information through the establishment of standards, procedures and guidelines.

The Data Administration Standards and Procedures Guide (DA Guide) provides a single source of reference for the standards, procedures and guidelines required for the development of data models at the National Library of Canada. The DA Guide provides a coherent set of naming conventions for model objects (entities, diagrams, data elements, data relationships, code tables, etc.), which is crucial to the documentation and central management of corporate data. Naming conventions provide greater efficiency in data handling, reduce data redundancy and inconsistency, and minimize confusion among staff, management and the system integrator.
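
As a minimal sketch only (the coded name and alias shown are hypothetical and are not taken from the DA Guide), the following Python fragment illustrates the idea behind such a convention: each entity carries one agreed full name, coded name and alias, and every model resolves names through the same dictionary entry.

    # A minimal sketch of a naming-convention entry; the names are hypothetical.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class EntityName:
        full_name: str   # name shown to end users on E/R diagrams
        coded_name: str  # short name used by programmers and the DBMS
        alias: str       # alternate name accepted in documentation

    # One entry per entity keeps naming consistent across all data models.
    DICTIONARY = {
        "bibliographic item": EntityName("Bibliographic Item", "BIB_ITEM", "Bib Item"),
    }

    def coded_name(full_name: str) -> str:
        """Resolve a full entity name to its single agreed coded name."""
        return DICTIONARY[full_name.lower()].coded_name

    print(coded_name("Bibliographic Item"))  # prints: BIB_ITEM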

Data Modelling

The Data Administration group is the custodian of all data models (entity/relationship) at the National Library. A data model is a graphical representation of the data used by the organization to meet its objectives. An accompanying list of definitions provides, in textual form, the same information that is shown graphically in the entity/relationship (E/R) diagram.

The evolution of a data model addresses several levels: the conceptual model, the logical model and the physical model. The participation of end users, Information Technology Services (ITS) staff and management is mandatory in the development of a data model, to ensure that all business requirements for the system are supported.
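
As an illustration only, the sketch below shows how a single entity might be expressed at each of the three levels; the attributes, identifiers and data types are assumptions made for the example and are not drawn from the AMICUS models.

    # Conceptual level: the entity and its attributes, with no physical detail.
    conceptual = {
        "entity": "Bibliographic Item",
        "attributes": ["title", "publication date", "language"],
    }

    # Logical level: attributes become fully defined, normalized data elements,
    # and keys are identified.
    logical = {
        "entity": "Bibliographic Item",
        "primary key": ["bib_item_id"],
        "data elements": ["bib_item_id", "title", "publication_date", "language_code"],
    }

    # Physical level: the logical model is mapped to tables, columns and data
    # types; storage and performance considerations may alter the structure.
    physical = {
        "table": "BIB_ITEM",
        "columns": {
            "BIB_ITEM_ID": "NUMBER (primary key)",
            "TITLE": "VARCHAR(500)",
            "PUB_DATE": "DATE",
            "LANG_CD": "CHAR(3), validated against a language code table",
        },
    }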

Different data models were developed for Phase 1 of the AMICUS project. Here are the primary models:

  • Manage Bibliographic data

  • Manage Client Services data

  • Manage Billing data

  • Manage Product data

During the initial stage of the AMICUS project, a separate project team was responsible for the development of each model, and each team delivered its model with the assistance of the Data Administration group.

The subject of data modelling at the National Library will be explored in detail in a future issue of the Network Notes.

CASE Tools

Computer-Aided Software Engineering (CASE) tools support Data Administration activities by providing an integrated set of analysis and design tools that automates the development of specifications for software systems. A CASE tool helps data analysts define, verify and document the design before coding begins.

All data required to develop data models are managed, defined, organized, stored and maintained with the use of a CASE Tool/Data Dictionary/Repository. The CASE technology is characterized by components such as diagramming tools, prototyping, re-engineering tools, import/export, and error-checking. Although all these components are important, the "central repository" component is the keystone.

A central repository is more than just a dictionary: it is the single place where everything known about the system is kept, including its data definitions, diagrams and rules. Central control over the logical data model and data element definitions allows applications to share data structures and field validations. There is no need for each application to incur the cost of recreating the Bibliographic Item table layout (e.g., rewriting, retesting and maintaining code that validates each column in the table).
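
The following is a minimal sketch, under assumed names and rules, of the benefit described above: column definitions and their validation rules (including a code table of permitted values) are recorded once in a central structure, and any application can check its data against them instead of re-coding and re-testing the same logic.

    # Hypothetical repository entries; the table, columns and code table values
    # are assumptions made for the example, not the actual AMICUS definitions.
    LANGUAGE_CODE_TABLE = {"eng", "fre"}  # a code table lists the permitted values

    REPOSITORY = {
        "BIB_ITEM": {
            "TITLE":   {"type": str, "required": True},
            "LANG_CD": {"type": str, "required": True, "code_table": LANGUAGE_CODE_TABLE},
        },
    }

    def validate(table, row):
        """Check one row against the column rules held in the central repository."""
        errors = []
        for column, rule in REPOSITORY[table].items():
            value = row.get(column)
            if value is None:
                if rule["required"]:
                    errors.append(column + ": required value is missing")
            elif not isinstance(value, rule["type"]):
                errors.append(column + ": wrong data type")
            elif "code_table" in rule and value not in rule["code_table"]:
                errors.append(column + ": value not found in the code table")
        return errors

    # Every application validates the same way, without re-creating the rules.
    print(validate("BIB_ITEM", {"TITLE": "Network Notes", "LANG_CD": "xxx"}))
    # prints: ['LANG_CD: value not found in the code table']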

Since November 1995, SILVERRUN has been the NLC CASE tool. SILVERRUN is a single-user application that runs on a PC connected to the LAN. The SILVERRUN relational data model (RDM) tool allows for:

  • the consolidation of multiple project encyclopedias into an integrated enterprise-wide repository to achieve a shared data environment;

  • the documentation of user-defined entities;

  • the documentation and selection of alternate names for displaying entities: full name, coded name or alias. Models can be generated with any of these names; for example, diagrams can be produced with the full dictionary name for end users, and the same model can be produced with the coded names for programmers;

  • the creation of user-defined reports by selecting the data object-types to be included in the report (see the sketch after this list). For example, the report may include the table definition, column names and definitions, characteristics for each column, and the associated primary and foreign keys. Report definitions can be saved for reuse;

  • the development of Conceptual, Logical and Physical relational data models.
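
As an illustration of the user-defined report idea mentioned above, and not of SILVERRUN itself, the sketch below selects object-types from a small, hypothetical metadata store and prints only those parts; all names and structures are assumptions made for the example.

    # Hypothetical metadata for one table; not taken from the AMICUS repository.
    METADATA = {
        "BIB_ITEM": {
            "definition": "An item described by a bibliographic record.",
            "columns": {
                "BIB_ITEM_ID": {"definition": "Unique identifier", "type": "NUMBER"},
                "TITLE": {"definition": "Title proper", "type": "VARCHAR(500)"},
            },
            "primary_key": ["BIB_ITEM_ID"],
            "foreign_keys": [],
        },
    }

    # A saved "report definition": the object-types the user chose to include.
    REPORT_DEFINITION = ["definition", "columns", "primary_key"]

    def run_report(table, object_types):
        """Print only the object-types selected in the report definition."""
        entry = METADATA[table]
        print("Report for", table)
        for object_type in object_types:
            print(" ", object_type + ":", entry[object_type])

    run_report("BIB_ITEM", REPORT_DEFINITION)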

Data Administration Group -- Ongoing Tasks

  • Reviews data requirements

  • Acts as a leader or participates in the development of data models

  • Enforces Data Administration Standards and Procedures

  • Actions data-entity changes

  • Actions data-element changes

  • Actions code-table changes

  • Actions error-message changes

  • Conducts repository audits

  • Maintains project E/R data models

  • Prepares migration object lists

  • Manages and controls all changes to the corporate data or data models with a change control procedure

Statistics (AMICUS)

Item                    #
Data-Entities         200
Data Elements        1600
Data Relationships    220
Code Tables           395

Glossary Of Terms

ATTRIBUTE The lowest-level piece of data that describes an entity. This term is used during the development of the Conceptual data model. Attributes become data elements in the Logical data model.
COMPUTER-AIDED SOFTWARE ENGINEERING (CASE) A group of tools designed to work together to integrate the process of data modelling and data management.
CODE TABLE A code table is used to provide rules for automated editing.
CONCEPTUAL DATA MODEL A high-level model to describe data that is maintained in the database.
CORPORATE MODEL A model that applies across the organization.
DATA ELEMENT See ATTRIBUTE. Also known as a field or a column.
DATA ADMINISTRATION The body that oversees the management of data across all functions of the organization.
DATA DICTIONARY A central, integrated and active control facility that provides the basis for shared and consistent system resource management. A data dictionary is used to: manage definitions and syntax, enforce naming standards, support data protection, create data relationships, and act as a central point of communication.
DATA-ENTITY A person, place or thing about which data is collected and kept.
DATA MODEL A model that diagrams data used by the business or organization.
ENTITY/RELATIONSHIP MODEL (E/R) A technique for modelling an organization's data, involving the description of entities and data relationships between them.
LOGICAL DATA MODEL Derives from the Conceptual data model. Data is fully attributed and normalized.
METADATA Data about data; the names and attributes of data-entities as stored in the data dictionary.
PHYSICAL DATA MODEL Derives from the Logical data model. Storage and performance may dictate changes to the model.
RECORD/TABLE A Record is a grouping of elements in a certain order required by a program. Also known as a Table.
USER An entity representing a person, department, or functional group that is involved with an information system. Users can be identified in terms of the functions they perform and their responsibilities with respect to a given system.


Copyright. The National Library of Canada. (Revised: 1997-07-30).