A Glossary of Digital Library Standards, Protocols and Formats

by Susan Haigh
Network Notes #54
ISSN 1201-4338
Information Technology Services
National Library of Canada

May 6, 1998


Introduction

This document provides a structured overview of over 90 selected technology-based standards, protocols, and document formats that are pertinent to digital library activities. Each entry provides the acronym, full name and a succinct description/commentary as appropriate.

While most library staff, even those directly involved in digital library initiatives, do not need to know the details about all these standards, one hears or reads about them frequently. This primer is intended as a quick reference tool for library staff who have little time—and perhaps even less interest—to learn the technical details of technology-based standards and protocols pertinent to the library world.

Some Initial Definitions

Standards are a set of rules or specifications for the design or operation of a computing device. There are proprietary standards, which are those developed and promulgated by companies in the hope of assuring or increasing their market share, and open standards, which are published and available for use by anyone. Either type may become a de facto standard, a set of rules or specifications that comes into such widespread use in the marketplace that it becomes normative; or a de jure standard, a standard given the endorsement of an official standards body such as the International Organization for Standardization (ISO).

Protocols are standard sets of rules that govern network communications functions by describing both the format that a message must take and the way in which messages are exchanged between computers.

Formats are the various ways in which information is stored. A file format is a specification for encoding data, together with any information about the data (e.g. structure, layout, compression algorithm). Hundreds of different file formats exist, but only a few are essential to digital library activities.

Some Caveats

  • This document is a broad sweep by a non-expert. While every effort has been made to identify and include the most pertinent standards, protocols and formats, the list is not intended to be comprehensive. There are whole areas of standards activity—colour information interchange, digital commerce, authentication, security, and the more technical areas of telecommunications, as examples—that have not been covered. Also, emerging formats for compound documents and maintaining relationships among digital objects are not listed.
  • Structuring the entries by function and describing them accurately and succinctly have been challenging objectives. Neither the structure nor the definitions should be considered authoritative. Selected sources that have been consulted are listed at the end.
  • A few of the entries are not standards so much as new technologies or programming languages. They have been included for their pertinence to a particular function or type of information.

ENABLING INTERNET & COMMUNICATION STANDARDS

  • Data transmission between computers

    OSI: Open Systems Interconnection

    Has been gradually eclipsed by TCP/IP, a comparable suite of protocols. OSI-based standards (e.g. X.400 mail, X.500 directories, Z39.50, ILL protocol) now run over TCP/IP networks. The OSI architecture is split into seven layers, each of which uses the layer immediately below it and provides a service to the layer above. Outgoing data goes down the stack, across the network, then up the stack at its destination. The layers are, from lowest to highest:

    1. physical layer - physical cables, etc. through which the signals travel

    2. datalink layer - handles the framing of data and the detection of transmission errors between nodes on the same physical link.

    3. network layer - addresses the routing of data packets across interconnected networks and their multiplexing/de-multiplexing (streaming with other data packets)

    4. transport layer - manages creation of packets from data, and the reassembly of data from packets

    5. session layer - more important in LANs, manages sessions among network users

    6. presentation layer - deals with data representation (e.g. character encoding), compression and encryption (if applicable)

    7. application layer - the user programs which are transmitting/receiving data, and the protocols governing these.
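    The passage of data down the stack and back up at the destination can be pictured as encapsulation: on the sending side each layer wraps what it receives from the layer above in its own header, and the peer layer at the destination removes it. This Python sketch illustrates only that idea; the bracketed text labels are hypothetical stand-ins, not real protocol headers.

```python
# Simplified model of OSI-style encapsulation. The "[layer]" labels are
# illustrative text, not real protocol header formats.
LAYERS = ["application", "presentation", "session",
          "transport", "network", "datalink", "physical"]


def send(data: str) -> str:
    """Pass data down the stack: each layer prepends its own header."""
    for layer in LAYERS:  # highest layer first
        data = f"[{layer}]" + data
    return data


def receive(frame: str) -> str:
    """Pass data up the stack: each layer strips the header its peer added."""
    for layer in reversed(LAYERS):  # lowest layer first
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


wire = send("GET catalogue record")
print(wire)           # the outermost header belongs to the physical layer
print(receive(wire))  # the original data is recovered at the destination
```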

    TCP/IP: Transmission Control Protocol over Internet Protocol

    Specifically denotes the transport-layer (TCP) and network-layer (IP) protocols within the Internet’s 5-layer protocol stack, although the term is often used to refer to the entire stack. The Internet protocol suite generally corresponds to the physical, datalink, network, transport and application layers described above; session and presentation functions are handled by the applications themselves. The Internet’s non-proprietary suite of protocols allows communications among more hosts and more types of data formats than any other protocol suite.

    IPv6: Internet Protocol, version 6 (also IPng, "I-Ping", for Internet Protocol, next generation)

    IPv6 is a new version of the Internet Protocol, currently under development, that will change the way Internet Protocol addresses are assigned in the future. A consensus of the Internet Engineering Task Force (IETF) has determined that IPv6 will be the next-generation system for IP addressing. IPv6 addresses are 128 bits long, written as eight sections of 16 bits each; they will eventually replace the current addressing scheme, IPv4, whose addresses are 32 bits long, written as four sections of 8 bits each.
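    The size difference between the two address schemes can be checked with Python’s standard ipaddress module. The two addresses used here come from the ranges reserved for documentation examples (192.0.2.0/24 and 2001:db8::/32); they are illustrative, not real hosts.

```python
import ipaddress

v4 = ipaddress.ip_address("192.0.2.1")    # documentation example address
v6 = ipaddress.ip_address("2001:db8::1")  # documentation example address

# max_prefixlen is the total number of bits in an address of that version.
print(v4.version, v4.max_prefixlen, len(v4.packed))  # 4 32 4
print(v6.version, v6.max_prefixlen, len(v6.packed))  # 6 128 16

# The fully written-out IPv6 form shows eight groups of 16 bits each.
print(v6.exploded)  # 2001:0db8:0000:0000:0000:0000:0000:0001
groups = v6.exploded.split(":")
print(len(groups), len(groups) * 16)  # 8 128
```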

  • E-mail

    SMTP: Simple Mail Transfer Protocol

    A TCP/IP application protocol that is the most widely employed e-mail standard. Does not support transfer of non-text messages or message parts such as images, audio or video, nor word processing or spreadsheet files, because their proprietary format coding is not plain text. Supports only the 7-bit ASCII character set (i.e. no diacritics).

    MIME: Multipurpose Internet Mail Extensions

    Adds multi-part (e.g. WWW hypertext documents, word processing file attachments, etc.), multi-media (non-text such as audio, graphics) messaging support to SMTP. Also supports message encryption.
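    How MIME carries non-text content over a text-only transport can be seen with Python’s standard email package: a binary attachment turns the message into a multipart message and is base64-encoded, so the result is still plain ASCII. The addresses, subject and "image" bytes below are hypothetical placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "library@example.org"   # hypothetical addresses
msg["To"] = "patron@example.org"
msg["Subject"] = "Requested image"
msg.set_content("The scanned image you requested is attached.")

# Hypothetical stand-in for a GIF file: arbitrary binary bytes that plain
# SMTP could not carry. MIME base64-encodes them in a separate part.
fake_gif = b"GIF89a" + bytes(range(256))
msg.add_attachment(fake_gif, maintype="image", subtype="gif",
                   filename="scan.gif")

print(msg.get_content_type())  # multipart/mixed
wire_text = msg.as_string()
print(wire_text.isascii())     # True: safe for 7-bit SMTP transport
```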

    X.400: Message Oriented Text Interchange System (ISO 10021/CCITT X.400 1988)

    Supports encryption, so more secure than SMTP without MIME, and supports non-text message parts. One of the few OSI protocols in common use on OSI networks, mainly in governments.

  • News

    NNTP: Network News Transfer Protocol

    A TCP/IP protocol that enables newsgroup articles to move smoothly through the Internet. Most newsgroups exist on the USENET network.

  • File Transfer

    FTP: File Transfer Protocol

    A TCP/IP application protocol used for on-demand transfer of text, non-text (e.g. audio) and software files from one system to another. Anonymous FTP allows users who have no account on a remote system to retrieve selected files from it, typically by logging in with the user name "anonymous".

    FTAM: File Transfer, Access and Management

    OSI-based file transfer protocol, which has been largely eclipsed by FTP.

  • Document Transfer

    HTTP: Hypertext Transfer Protocol (Version 1.1)

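    Supports the hypertext linkages among multimedia documents that characterize the World Wide Web, a collection of HTTP servers. HTTP 1.1 improves performance over the initial version, 1.0, chiefly through persistent connections, which let several requests and responses share one TCP connection instead of opening a new connection for each.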
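    An HTTP request is itself plain text: a request line, header lines and a blank line. The sketch below builds a minimal HTTP/1.1 GET request; the host and path are hypothetical. The Host header is required in HTTP/1.1 (it was not required in 1.0), which lets one server host several Web sites, and an HTTP/1.1 connection stays open by default so further requests can follow on it.

```python
def build_get_request(host: str, path: str) -> bytes:
    """Return the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, version
        f"Host: {host}",         # mandatory in HTTP/1.1
        "Accept: text/html",
        "",                      # an empty line ends the header section
        "",
    ]
    # HTTP lines are terminated by carriage return + line feed.
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("www.example.org", "/catalogue/index.html")
print(request.decode("ascii"))
```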

    Gopher

    Uses hierarchical menu structures (no hypertext linkages) and a character-based (non-graphical) interface to access multimedia documents. Has been rapidly eclipsed by HTTP/World Wide Web.

  • Terminal Access (Remote login)

    Telnet

    A TCP/IP application that allows a remote user to log in to applications such as library catalogues. Being superseded by Web-based search interfaces that employ a CGI (Common Gateway Interface) program to query a database and return results.
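    Under CGI, the Web server runs a program, passes the query part of the URL to it in the QUERY_STRING environment variable, and sends the program’s output (a header block, a blank line, then the body) back to the browser. This is a minimal sketch of such a program; the in-memory CATALOGUE dictionary and the "author" parameter are hypothetical stand-ins for a real catalogue database and search form.

```python
import os
from urllib.parse import parse_qs

# Hypothetical in-memory "catalogue" standing in for a real database.
CATALOGUE = {
    "haigh": ["A Glossary of Digital Library Standards, Protocols and Formats"],
}


def main() -> str:
    """Build a CGI response for a GET query such as ?author=haigh."""
    # The Web server passes the query portion of the URL in QUERY_STRING.
    params = parse_qs(os.environ.get("QUERY_STRING", ""))
    author = params.get("author", [""])[0].lower()
    titles = CATALOGUE.get(author, [])
    body = "\n".join(titles) if titles else "No records found."
    # A CGI program writes a header block, a blank line, then the body.
    return "Content-Type: text/plain\r\n\r\n" + body


if __name__ == "__main__":
    print(main(), end="")
```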

DATA INTERCHANGE STANDARDS AND FORMATS

  • Character sets

    ASCII (American Standard Code for Information Interchange) (ISO 646)

    The most widely used character set encoding. 7 bits per character; limited to 128 mainly English-language characters.

    ISO Latin 1 (ISO 8859-1:1987 part 1)

    A common extension of and replacement for ASCII, "Latin 1" is the most commonly used character set of ISO 8859-n (n=1-9), a series of nine 8-bit, 256-character alphabets, for mainly European languages. Windows uses a closely related superset (Windows code page 1252).

    UNICODE (ISO 10646-1 Universal Character Set)

    16-bit character code set intended to cover all the world’s writing systems. Not widely implemented at present, although XML (see below) will support it.
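    The practical difference between these three character sets shows up when a word with a diacritic is encoded. In this Python sketch the French word "Bibliothèque" cannot be represented in ASCII, takes one byte per character in Latin 1, and two bytes per character in UTF-16, used here as the Unicode encoding closest to the 16-bit code described above.

```python
text = "Bibliothèque"  # 12 characters, including one outside ASCII

for codec in ("ascii", "latin-1", "utf-16-be"):
    try:
        encoded = text.encode(codec)
        print(f"{codec:10} {len(encoded):2} bytes  {encoded.hex(' ')}")
    except UnicodeEncodeError:
        # 7-bit ASCII has no code for "è".
        print(f"{codec:10} cannot represent 'è'")
```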

  • Documents: Plain text formats

    ASCII (ISO 646)

    Lowest common denominator: unstructured ASCII text code (see above). Fast becoming obsolete as some of the more sophisticated document formats get more widely adopted.

  • Documents: Proprietary formatted-text formats

    MSWord, WordPerfect, etc. (word processing applications)

    WYSIWYG approach to format coding (coding is hidden). Format coding is proprietary, so there can be problems of portability among packages or even between different versions of the same package. Commonly used for revisable texts. Can incorporate multimedia file parts.

    PageMaker, Ventura, etc. (desktop publishing applications)

    Proprietary format coding is input as specific commands, and effected only upon output.

    RTF: Rich text format

    Developed by Microsoft, RTF is a portable input/output format supported by many word processing packages.

  • Documents: Page description formats

    These describe shapes on the page, i.e. the layout of a document but not the content/structure. Used for read-only presentation of formatted, final page images for output on any printer or other output device. Increasing use for electronic publications.

    PostScript (".ps")

    Adobe-developed programming language of 420 format command operators (Level 2) which control printing (but not screen display). Allows formatted printing on any PostScript printer from any platform (Windows, UNIX, etc.). Encapsulated PostScript (".eps") files are self-contained PostScript descriptions, usually of a single image, designed to be embedded in other documents; they are often used for images produced with a non-PostScript package.

    PDF: Portable Document Format (".pdf")

    A further development is Adobe’s proprietary PDF format; PDF files are created, edited and viewed with Adobe’s Acrobat suite of software products. PDF is device-independent, and supports e-publishing using sophisticated formatting and graphics including embedded links, annotations, thumbnails of pages, and chapter outlines for direct access. Adobe has indicated the intent to incorporate structure as well as layout into PDF by extending it to encompass SGML.

  • Documents: Structured information formats

    Compared to page description languages, structured information mark-up languages describe the information (content) and structure, not the layout. They are device and processing platform independent and facilitate automatic indexing by describing headings, chapters, paragraphs, footnotes, etc.

    SGML: Standard Generalized Mark-up Language (ISO 8879:1986)

    A standard meta-language, or syntax, for the specification of an unlimited number of mark-up languages. An SGML document has three parts: the SGML Declaration (describes the processing environment needed); the Document Type Definition (DTD) (a defined tag set that forms a template for describing the structure and content of a specific type of document); and the document instance itself. SGML is independent of any system, device, language or application, and, because it separates document content definition from presentation, it allows information to be accessed or presented in ways not predicted at the time of mark-up. SGML viewing software (e.g. Panorama) parses/interprets the SGML document content according to its DTD instructions. SGML is anticipated to be a key standard in digital library development.

    Some relevant DTDs are:

    EAD (Encoded Archival Description) - a DTD for archival material.

    US MARC DTD (see under Metadata-Description)

    TEI (Text Encoding Initiative) - a DTD for a wide range of scholarly resources, initially developed for the humanities.

    XML (see below)

    HTML is the most common DTD (see below).

    DSSSL: Document Style Semantics and Specification Language (ISO 10179)

    A standard associated with SGML that specifies the rules for a non-proprietary language to govern the appearance and style for the logical components (e.g. chapter headings) defined by SGML.

    XML: Extensible Mark-up Language (".xml")

    A simple, reduced subset of SGML designed (in 1996) for ease of implementation and interoperability with both full SGML and HTML. XML 1.0 became a W3C Recommendation in February 1998; it is much simpler than SGML (reducing a 500-page reference to 26 pages). Unlike HTML, XML supports (optionally) user-defined tags and attributes, allows nesting within documents to any degree of complexity, and can contain an optional description of its grammar for use by applications that need to perform structural validation. Every valid XML document will be a conformant SGML document. Not backward compatible with HTML documents, although those conforming to HTML 3.2 can easily be converted. Not intended to supplant HTML but to complement it. The XML character set is Unicode. XML is being widely discussed currently, and future releases of the Microsoft Internet Explorer and Netscape browsers may be XML-enabled.
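    Because XML is simple and well-defined, general-purpose tools can read documents that use tags they have never seen before. This sketch parses a small record with user-defined, nested tags using Python’s standard xml.etree.ElementTree module; the record and its tag names are hypothetical. Note that ElementTree checks only that the document is well-formed; it does not validate it against a DTD.

```python
import xml.etree.ElementTree as ET

# A hypothetical record using user-defined tags nested to several levels.
record = """<?xml version="1.0"?>
<glossaryEntry id="xml">
  <acronym>XML</acronym>
  <fullName>Extensible Mark-up Language</fullName>
  <related>
    <standard>SGML</standard>
    <standard>HTML</standard>
  </related>
</glossaryEntry>"""

root = ET.fromstring(record)
print(root.tag, root.get("id"))                 # glossaryEntry xml
print(root.findtext("fullName"))                # Extensible Mark-up Language
print([s.text for s in root.iter("standard")])  # ['SGML', 'HTML']
```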

    HTML: Hypertext Mark-up Language (".htm" ".html")

    A reduced tag set version of an SGML DTD that provides a set of platform-independent styles (defined by tags) used to define the components of a Web document. HTML 2.0 is an IETF standard; 3.0 was an IETF draft (IETF drafts have six-month lifespans); HTML 3.2 was announced in May 1996 to supplant 2.0 as the lowest common denominator. Version 3.2 incorporates all of 2.0 and popular features of 3.0 such as tables, but not frames. Version 4.0 was released as a draft in July 1997 and became a W3C Recommendation in December 1997. While HTML tags are primarily structure-related, there are increasingly accepted tags for specifying presentation and layout.

    DHTML: Dynamic HTML

    Denotes recent developments by both Netscape and Microsoft that use a combination of Cascading Style Sheets (see below) and a scripting language such as VBScript or JavaScript to merge the HTML document with the style sheet. Supports greater creative control over the visual presentation of an HTML page and allows the page to respond dynamically, without a call to the server, to user-generated events.

    Cascading Style Sheets (".css")

    A new approach to increasing control over the visual formatting of HTML documents (e.g. spacing, colours, backgrounds, choice of fonts, drop shadows, layering, relative and absolute positioning, on/off visibility of options, choice of media such as print, display, braille, aural). CSS rules are kept in a separate document or in a separate part of the document (rather than being embedded in the text as with traditional HTML), so they can be changed and updated across multiple documents quickly. Style sheets can be cached locally and reused, so their deployment can result in bandwidth and response-time gains.

  • Documents: Container document formats

    OpenDoc

    Aims to enable embedding of features from different application programs into a single working document.

    OLE: Object Linking and Embedding

    Microsoft’s proprietary distributed object system that allows an application to manage part of its contents in another application. For example, an Excel spreadsheet of changing data could be invoked in its up-to-date version from a word processing document.

  • Still images: Bitmapped (or Raster) formats

    These store information about individual pixels or dots. Generally storage-intensive, so tend to be used for single images.

    GIF: Graphics Interchange Format (".gif")

    Widely used image format that displays well on most computer systems, but is limited to 256 colours. Uses a lossless compression technique. Results in relatively small files available for immediate display alongside text in Web documents, so commonly used for toolbars, icons and inline images. Can be "interlaced" (whole image displays with sharpening clarity rather than sequential line-by-line clear display). One colour can be transparent (good for floating images/icons on backgrounds). Better than JPEG for sharp line, black-and-white, and gray-scale images.
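    GIF’s 256-colour limit follows from its file structure. The header is a 6-byte signature followed by a "logical screen descriptor" holding the image width and height (16-bit little-endian integers) and a packed byte whose low three bits give the colour-table size as 2 to the power (n + 1), so the largest possible palette is 2 to the power 8, or 256 colours. This sketch reads those fields from a hand-built header for a hypothetical 320 x 200 image.

```python
import struct


def gif_info(data: bytes) -> dict:
    """Read the fixed-size GIF header and logical screen descriptor."""
    signature = data[:6]
    if signature not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    # Width and height are unsigned 16-bit little-endian integers.
    width, height, packed = struct.unpack("<HHB", data[6:11])
    has_global_table = bool(packed & 0x80)
    # The low three bits n give a colour table of 2 ** (n + 1) entries.
    table_size = 2 ** ((packed & 0x07) + 1) if has_global_table else 0
    return {"version": signature.decode("ascii"), "width": width,
            "height": height, "palette_colours": table_size}


# Hand-built header: signature, 320 x 200, packed byte 0xF7 (global colour
# table present, n = 7), then background-colour and aspect-ratio bytes.
header = b"GIF89a" + struct.pack("<HHB", 320, 200, 0xF7) + b"\x00\x00"
print(gif_info(header))
```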

    JPEG: Joint Photographic Experts Group (".jpg")

    A lossy compression format. Over 16 million colour hues available. Better than GIF for real-world images such as colour photographs.

    TIFF: Tagged Image File Format (".tif" or ".tiff")

    Can store a very large amount of information about an image. Supports different types of compression (lossy and lossless). Widely used, but mostly as an intermediary format between scanners and desktop publishing programs.

    PNG: Portable Network Graphics (".png"; pronounced "ping")

    Intended to replace GIF, with improvements in error detection and interlacing speed and greater compression rates. An emerging format, but is not yet widely used.

    Photo CD

    PhotoCD is Kodak’s proprietary format for the digital storage of high resolution images on CD. The images can be viewed at a range of resolutions and manipulated using image processing software.

    CCITT Group 4 Fax

    CCITT have developed a series of compression mechanisms for transmitting black and white images across telephone lines by fax machines. The standards are officially known as CCITT Recommendations T.4 and T.6 but are more commonly known as Group 3 and Group 4 compression respectively. Group 4 Fax is in common use.
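    Fax compression exploits the fact that a black-and-white scan line consists of long runs of white pixels broken by short runs of black ones. This sketch shows only the first step, turning a scan line into alternating white/black run lengths (starting with a white run, as fax coding does); the actual Group 3 and Group 4 schemes then replace each run with a codeword from fixed Huffman tables, and Group 4 also codes each line relative to the line above, neither of which is shown here.

```python
from itertools import groupby


def run_lengths(scan_line: str) -> list:
    """Return alternating white/black run lengths for one scan line.

    "0" is a white pixel and "1" a black pixel. Coding starts with a
    white run, so a line beginning with black gets a white run of 0.
    """
    runs = [len(list(group)) for _, group in groupby(scan_line)]
    if scan_line.startswith("1"):
        runs.insert(0, 0)
    return runs


line = "0" * 10 + "1" * 4 + "0" * 18 + "1" * 2
print(run_lengths(line))  # [10, 4, 18, 2]: 34 pixels become 4 numbers
```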

  • Still images: Vector formats

    These store information (mathematical descriptions) about the lines and curves making up the image. Used for compound documents (e.g. documents combining complex formatted text, images, etc.). Scale well to display at various degrees of magnification. PostScript and PDF use vector imaging (see Page Description Formats above).

    CGM: Computer Graphics Metafile

    Standard for the storage and exchange of 2D graphical data. Initially a vector format, but has recently been extended to include raster storage capabilities. Four international standardized profiles have been developed which specify how CGM will be used within MIME-compliant e-mail and on the Web.

  • Audio formats

    There is a recent proliferation of proprietary Internet audio products/formats. These are the most widely employed.

    AIFF: Audio Interchange File Format (".aif" or ".aiff")

    Macintosh audio file format.

    RIFF WAVE (".wav")

    Originally Microsoft Windows’ audio file format, now extended to other platforms. Stereophonic.
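    A WAVE file is a RIFF container: the file starts with the tag "RIFF", a 4-byte length, and the form type "WAVE", followed by chunks describing the format and holding the samples. This sketch writes one second of stereo silence to memory with Python’s standard wave module and checks that structure; the 16-bit, 44,100 Hz settings are an arbitrary choice for illustration.

```python
import io
import struct
import wave

buffer = io.BytesIO()
with wave.open(buffer, "wb") as w:
    w.setnchannels(2)      # stereo
    w.setsampwidth(2)      # 16-bit samples
    w.setframerate(44100)  # samples per second, per channel
    # One second of silence: 44,100 frames x 2 channels x 2 bytes.
    w.writeframes(b"\x00" * 44100 * 2 * 2)

data = buffer.getvalue()
print(data[:4], data[8:12])  # b'RIFF' b'WAVE'
riff_size = struct.unpack("<I", data[4:8])[0]
print(riff_size == len(data) - 8)  # True: the RIFF length excludes the first 8 bytes
```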

    μ-law (".au")

    Another common Internet audio file format, from Sun Microsystems. Works on all platforms, but of lower quality. Stereophonic.
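    The μ-law encoding often used in .au files is a companding scheme: it compresses the sample range logarithmically so that quiet sounds get proportionally more of the 256 levels available in an 8-bit sample. This sketch implements the continuous μ-law curve, F(x) = sgn(x) ln(1 + μ|x|) / ln(1 + μ) with μ = 255, and its inverse; the telephony standard (ITU-T G.711) uses a segmented approximation of this curve followed by 8-bit quantization, which is not shown.

```python
import math

MU = 255  # the mu value used by North American and Japanese telephony


def mu_law_compress(x: float) -> float:
    """Continuous mu-law compression of a sample x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress: |x| = ((1 + mu) ** |y| - 1) / mu."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


# Small inputs are stretched: a sample at 1% of full scale maps to about 0.23.
for x in (0.001, 0.01, 0.1, 1.0):
    print(f"{x:6}  ->  {mu_law_compress(x):.3f}")
```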

    RealAudio (".ra" or ".raf")

    Progressive Networks’ very popular proprietary audio product. Uses "stream" delivery, which means the audio starts to play as soon as the first data reach the user’s computer. The sound file is not saved on the client. Stereophonic with version 3.0 (previous versions were monophonic only).

  • Moving image file formats

    There is likewise a recent proliferation of proprietary Internet video products/formats. The following are the most widely employed.

    QuickTime Movies (".mov")
    Apple’s proprietary multimedia file format, widely used for compressed video files; now works on other platforms.

    AVI: Audio Video Interleave (".avi")
    Video for Windows file format.

    MPEG: Moving Picture Experts Group (".mpg")
    A compression standard for video that transmits complete information only for periodic reference frames (for example, every tenth frame), with the intervening frames transmitted only as significant changes relative to the reference frames. Works on all platforms, but of lower quality.
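    The idea of sending full reference frames occasionally and only changes in between can be shown with a toy encoder. This sketch is not MPEG: real MPEG encoders use motion-compensated prediction, DCT-based compression and bidirectionally predicted frames. Frames here are simply lists of pixel values, and key_interval is a hypothetical parameter; real encoders choose their own spacing.

```python
def encode(frames, key_interval=10):
    """Store a full key frame every key_interval frames; otherwise store
    only the pixels that changed since the previous frame."""
    encoded, previous = [], None
    for i, frame in enumerate(frames):
        if i % key_interval == 0:
            encoded.append(("key", list(frame)))
        else:
            changes = {pos: new for pos, (old, new)
                       in enumerate(zip(previous, frame)) if old != new}
            encoded.append(("delta", changes))
        previous = frame
    return encoded


def decode(encoded):
    """Rebuild every frame by applying changes to the last frame."""
    frames, current = [], None
    for kind, payload in encoded:
        if kind == "key":
            current = list(payload)
        else:
            current = list(current)
            for pos, value in payload.items():
                current[pos] = value
        frames.append(current)
    return frames


# A single bright pixel moving across an 8-pixel "image" for 20 frames.
frames = [[1 if p == i % 8 else 0 for p in range(8)] for i in range(20)]
encoded = encode(frames)
print([kind for kind, _ in encoded].count("key"))  # 2 full frames stored
print(encoded[1])  # ('delta', {0: 0, 1: 1}): only two pixels changed
```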

    RealVideo
    Progressive Networks’ video product that uses stream delivery.

    GIF 89a: Graphic Interchange Format 89a ("animated GIF")
    A simple, popular and ubiquitous approach to animation that stores a sequence of images in a single GIF file.

  • Multimedia / Interactive formats

    Shockwave
    Macromedia’s multimedia product supports games, animated interfaces, interactive ads and demos, and CD-quality audio streaming. Included in Netscape Navigator and Microsoft Internet Explorer.

    Java, ActiveX
    One use of Sun Microsystems’ Java programming language, or Microsoft’s ActiveX, is to support multi-window, multimedia data streaming. Microsoft and Netscape are integrating both technologies into their browsers.

  • 3-dimensional modeling formats

    QTVR: QuickTime Virtual Reality
    Apple’s proprietary product that allows a user to view objects and locations through 360 degrees by accessing a small file and playing it like a QuickTime movie. The user’s viewing options (or movement) are actually limited to the number of fixed points from which the real object was photographed. Built into Netscape 3.0.

    VRML: Virtual Reality Modeling Language (".wrl")
    The current public domain specification (Version 2.0) provides for the design and implementation of a platform-independent, ASCII-based, language for virtual reality scene description. The object must be modeled (i.e. unlike QuickTime VR, it is not a real object that has been photographed). Its appearance will be like computer animation, and is often slow and jerky. Built into Netscape 3.0.

  • Data streaming protocols

    RTSP: Real Time Streaming Protocol
    A proposed industry standard announced by Netscape, Progressive Networks and 40 other companies, this protocol addresses issues like reliability, quality, fidelity, packet loss, and start/stop commands for audio and video real-time data streaming.

    NetShow Standard
    Microsoft’s competing specification for data streaming. Goes farther than RTSP insofar as it attempts to specify a compression and decompression standard in addition to start/stop commands.

METADATA

    The term means "data about data", or specifically in the Web context, machine-understandable information to identify, locate, and/or describe Web resources. Equivalent traditional library standards include ISBN and ISSN (identification), shelf mark/call number (location), ISBD and AACR2 (bibliographic description), LC and DDC (subject classification), LCSH (subject headings), and MARC (machine-readable communication format).

  • Metadata exchange

    RDF: Resource Description Framework
    A technical framework being developed by the W3C to support interoperability of metadata describing any item that can have a URI (see below). Schemes such as PICS and Dublin Core (see below) are standard, predefined vocabularies within the framework. XML is the encoding syntax.

  • Unique identification

    URI: Uniform Resource Identifier
    The inclusive term for the set of technologies -- currently including URLs, URCs and URNs but extendable -- that have been developed under the auspices of the Internet Engineering Task Force (IETF) for naming, addressing, and to some extent describing Web resources.

    URN: Uniform Resource Name
    Developed to address the need for a global, persistent, and unique identifier for an electronic resource (as opposed to a URL, which although currently serving an identification function, is tied to the resource’s location, which can be multiple and can change at any time). Requires URN registries (to ensure no duplication) and resolution systems (to map to the location(s) of the resource). An example syntax is urn:hdl:cnri.dlib/august95. No implementations to date.
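    The two pieces a URN system needs, a fixed name syntax and a resolution service, can be sketched as follows. The name is split as "urn:" + namespace + namespace-specific string, and a lookup table stands in for the registry and resolution system; the table and its mirror URLs are hypothetical, since a real resolver would be a network service.

```python
# Hypothetical resolver table: in practice this role is played by a URN
# registry and resolution service, not a local dictionary.
RESOLVER = {
    "urn:hdl:cnri.dlib/august95": [
        "http://www.example.org/dlib/august95/",      # hypothetical copies
        "http://mirror.example.net/dlib/august95/",
    ],
}


def parse_urn(urn: str):
    """Split 'urn:<namespace>:<namespace-specific string>' into its parts."""
    scheme, namespace, specific = urn.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError("not a URN")
    return namespace.lower(), specific


def resolve(urn: str):
    """Return the current location(s) recorded for a persistent name."""
    parse_urn(urn)  # reject malformed names before looking them up
    return RESOLVER.get(urn, [])


print(parse_urn("urn:hdl:cnri.dlib/august95"))  # ('hdl', 'cnri.dlib/august95')
print(resolve("urn:hdl:cnri.dlib/august95"))
```

    When a resource moves, only the resolver entry changes; the URN itself, and every citation of it, stays the same.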

    ISBN, ISSN, ISMN
    Standard bibliographic identifiers for print material that can be applied to electronic books, serials, and music.
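    ISBNs and ISSNs both end in a check digit that lets software catch most typing errors. Each digit is multiplied by a descending weight (10 down to 2 for the nine data digits of an ISBN, 8 down to 2 for the seven data digits of an ISSN), and the check digit brings the weighted sum to a multiple of 11, with a value of 10 written as "X". This sketch computes both; the ISSN test uses this Network Notes series’ own ISSN, 1201-4338.

```python
def mod11_check_digit(digits: str, weights) -> str:
    """Weighted modulus-11 check digit; a value of 10 is written as 'X'."""
    values = [int(c) for c in digits if c.isdigit()]  # ignore hyphens
    if len(values) != len(weights):
        raise ValueError("wrong number of digits")
    total = sum(v * w for v, w in zip(values, weights))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def issn_check_digit(first_seven: str) -> str:
    return mod11_check_digit(first_seven, range(8, 1, -1))  # weights 8..2


def isbn10_check_digit(first_nine: str) -> str:
    return mod11_check_digit(first_nine, range(10, 1, -1))  # weights 10..2


print(issn_check_digit("1201-433"))       # 8
print(isbn10_check_digit("0-306-40615"))  # 2
```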

    SICI: Serial Item and Contribution Identifier (ANSI/NISO Z39.56-1996 Vers. 2)
    Being developed to identify serial issues and articles uniquely regardless of distribution medium (paper, electronic, microform). Not yet implemented.

    DOI: Digital Object Identifier
    Being jointly developed by the Association of American Publishers and the Corporation for National Research Initiatives (CNRI) to identify digital objects—which could be books, chapters, articles, images, recordings, videos or other creative works—primarily for the purposes of effective rights management and digital commerce. Not unlike ISBNs in formulation: a component to the left of the slash denotes the registrant’s prefix, and the component to the right of the slash is the object’s unique identifier, as assigned by the registrant (e.g. 10.65478/45920). Some publishers have begun implementing a DOI prototype.

  • Location

    URL: Uniform Resource Locator
    Electronic address that specifies (in order): communication protocol, host domain/server, directory path, file name and file type. Whenever any of these (location, access method, or name) is changed, link will be broken unless there is a page linking user to new location, or the URL is actually a PURL.
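    Each of the URL components listed above can be pulled out with Python’s standard urllib.parse and posixpath modules, as in this sketch. The URL is hypothetical. Changing any one of these parts produces a different address, which is why moving or renaming a file breaks existing links.

```python
from posixpath import basename, dirname, splitext
from urllib.parse import urlsplit

url = "http://www.example.org/pubs/netnotes/notes54.htm"  # hypothetical URL

parts = urlsplit(url)
path = parts.path
name, file_type = splitext(basename(path))

print("protocol: ", parts.scheme)   # http
print("host:     ", parts.netloc)   # www.example.org
print("directory:", dirname(path))  # /pubs/netnotes
print("file name:", name)           # notes54
print("file type:", file_type)      # .htm
```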

    PURL: Persistent Uniform Resource Locator
    An approach to the URL permanence problem proposed by OCLC. A PURL is a public alias for a document. A PURL remains stable, while the document’s underlying URL may change as it is managed (e.g. moved) over time. A PURL is created by a Web administrator who is registered as a PURL "owner" and who maintains a mapping of the PURL to a current and functioning URL. Although syntactically a URL, a PURL functions as a URN.

  • Description

    URC: Uniform Resource Citation, or Uniform Resource Characteristics
    Developed in conjunction with URNs as a means of describing Internet-accessible resources. URCs are a set of values which may include authorship, publisher, datatype, date, copyright status, etc. as well as URIs of various kinds (i.e. its URN, and applicable URLs). The standard also specifies the structure for storing metadata, and the operations for building and querying that structure. It encourages development of URC Subtypes to define data elements appropriate for different types or classes of resources. Not yet implemented.

    Dublin Core
    The Dublin Core Metadata Element Set consists of 15 descriptive data elements relating to content, intellectual property and instantiation. The elements are title, creator, publisher, subject, description, source, language, relation, coverage, date, type, format, identifier, contributor and rights. They are intended to be supplied by the producer of the resource. The Warwick Framework set out a conceptual approach to implementing the Dublin Core; one implementation method is embedding the data in an HTML document using the META tag. DC is being widely discussed and there is a growing corpus of implementation projects in over 10 countries. There is a Dublin Core-USMARC mapping.
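    Embedding Dublin Core in HTML amounts to writing one META tag per element. This sketch generates such tags for this glossary, using values taken from its own title page; the "DC.element" naming shown is a common convention for these tags, assumed here rather than quoted from the Dublin Core documents, and html.escape guards against characters such as "&" or quotation marks in the values.

```python
from html import escape

# Record for this glossary; element names follow the "DC.element"
# convention for embedding Dublin Core in HTML META tags (an assumption).
record = {
    "title": "A Glossary of Digital Library Standards, Protocols and Formats",
    "creator": "Haigh, Susan",
    "publisher": "National Library of Canada",
    "date": "1998-05-06",
    "type": "Text",
    "language": "en",
}

meta_tags = "\n".join(
    f'<meta name="DC.{element}" content="{escape(value)}">'
    for element, value in record.items()
)
print(meta_tags)  # paste into the <head> of the HTML document
```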

    GILS: Government Information Locator Service
    Developed in the U.S. and now being adopted in other countries, GILS is a decentralized collection of systems containing databases of GILS records describing location and access information for publicly-available government information resources. Z39.50 is the access mechanism that has been specified for searching these systems, but they can also be searched through the Web. There is a GILS-USMARC mapping, and an SGML profile has been developed for GILS records.

    TEI Headers: Text Encoding Initiative headers
    Headers for TEI (an SGML DTD) documents, usually scholarly resources such as prose, verse, drama, dictionaries, etc. describing the file (title statement, publication statement, source description), its encoding, its revision history, etc. Becoming more widely implemented.

    EAD: Encoded Archival Description
    An SGML DTD for archival finding aids. Not widely implemented.

    ISBD, AACR2, LC, DDC, LCSH, MARC, etc.
    Traditional library standards for describing traditional library resources that can also be used for electronic resources. The recently added USMARC field 856 (Electronic Location and Access) supports provision of URLs in bibliographic records.

    MCF: Meta Content Framework
    An open format for representing information about content. The content targeted includes web pages, gopher and ftp files, desktop files, email and structured databases. The corresponding meta-content includes indices such as Yahoo!, gopher and ftp directory structures, email headers, data dictionaries, etc. Currently version 0.95.

    SOIF: Summary Object Interchange Format
    A structured indexing format that permits structured queries (e.g., matching keywords only against author or title lines in documents). SOIF's support for arbitrary data means it can be used for more complex search applications, such as image and audio searching.

    PICS: Platform for Internet Content Selection
    A W3C (World Wide Web Consortium) working group has proposed a set of metadata labels to support both self-description and third-party description and rating of electronic document/resource content. The goal is to give Internet users the ability to select resources effectively based on that metadata description, which will be carried with resources.

INFORMATION SEARCH AND RETRIEVAL

  • Web Search & Retrieval

    Web browsers
    Netscape Navigator and Microsoft Internet Explorer are the de facto standard Web browsers: software that acts as an interface between the user and the World Wide Web. Browsers are also referred to as Web clients because in the client/server model, the browser functions as the client program.

    Web search engines
    No standards govern Web search engines. Most are databases of indexed keywords from Web resources, and feature boolean search capability, phrase or word searches, and relevancy ranking. Many provide browsable directory (structured) access in addition to keyword indexing.
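    The "database of indexed keywords" behind most search engines is usually an inverted index: a table from each word to the set of documents containing it, so that Boolean queries become set operations. This sketch builds one over three hypothetical pages; it omits relevancy ranking, phrase search and everything else a real engine adds.

```python
import re
from collections import defaultdict

# Hypothetical documents standing in for crawled Web pages.
pages = {
    "page1": "Dublin Core metadata for Web resources",
    "page2": "Z39.50 searching of library catalogues",
    "page3": "Embedding Dublin Core in HTML META tags",
}

# Inverted index: word -> set of page identifiers containing it.
index = defaultdict(set)
for page_id, text in pages.items():
    for word in re.findall(r"[a-z0-9.]+", text.lower()):
        index[word].add(page_id)


def search_and(*terms):
    """Boolean AND: pages containing every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(*terms):
    """Boolean OR: pages containing at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


print(sorted(search_and("dublin", "core")))   # ['page1', 'page3']
print(sorted(search_or("Z39.50", "html")))    # ['page2', 'page3']
```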

  • Database Search & Retrieval

    Z39.50: ANSI/NISO Information Retrieval Standard Z39.50-1995

    As of late 1996, also adopted as ISO 23950.

    Specifies the rules and procedures of two systems communicating for the purposes of database searching and information retrieval. There are two parts to the standard: the "origin" portion supports the querying of remote systems; the "target" portion translates queries to the logic of the target database system and returns records or result sets. From a searcher’s perspective, the standard enables the searching of different systems through use of one familiar user interface.

    SQL: ISO/IEC 9075:1992 Information Technology --- Database Languages --- SQL, also ANSI X3.135-1992 Database Language SQL (Structured Query Language)

    SQL is a popular standard interactive and programming language for getting information from and updating a relational database. It allows DBMS products from different vendors to interoperate. SQL defines common data structures (tables, columns, views) and provides a data manipulation language to update and query those structures.
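    The basic SQL operations (defining a table, adding rows, updating them and querying them) look like this. The sketch uses SQLite through Python’s standard sqlite3 module, in a throwaway in-memory database; SQLite implements much, though not all, of the SQL standard, and the table of glossary entries is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE standards (
        acronym  TEXT PRIMARY KEY,
        category TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO standards (acronym, category) VALUES (?, ?)",
    [("SMTP", "E-mail"), ("MIME", "E-mail"), ("FTP", "File Transfer"),
     ("Z39.50", "Database Search & Retrieval")],
)
# Updating existing data uses the same language as querying it.
conn.execute("UPDATE standards SET category = ? WHERE acronym = ?",
             ("Database search", "Z39.50"))

rows = conn.execute(
    "SELECT acronym FROM standards WHERE category = ? ORDER BY acronym",
    ("E-mail",),
).fetchall()
print(rows)  # [('MIME',), ('SMTP',)]
```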

  • Interfaces

    Z39.59: ANSI/NISO Common Command Language Standard
    Prescribes interface commands for standard database search operations.

  • Document Request

    ILL Protocol: ISO 10160 and 10161
    OSI-based protocol that specifies rules to permit the automated exchange of ILL messages between diverse ILL systems and supports the management of ILL transactions.

  • Directories

    Directories manage distributed collections of information about people or resources. A directory is typically used to hold addressing information, but it can also be used to hold information on capabilities, accounting, or other attributes of the object being described.

    X.500: CCITT X.500/ISO 9594 Directory Standard
    An OSI protocol for managing online directories of users or resources which provides a hierarchical structure reflecting geographic and organizational divisions: countries, states, cities, streets, houses, families, etc. The goal is to have a directory that can be used globally. An X.500 directory is distributed in nature (i.e. directory information may be distributed among several open systems) but provides a single view to its users.

    LDAPv3: Lightweight Directory Access Protocol, Version 3
    A client/server protocol for accessing a directory service, LDAP is a simplified version of the directory access protocol portion of X.500, designed to run directly over TCP/IP. LDAPv3 is an update developed in the IETF that addresses the limitations found during deployment of the previous version of LDAP (RFC 1777). It also adds new features, improves compatibility with X.500 (1993), and better specifies how LDAP can be used with non-X.500 and standalone directories.
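    LDAPv3 clients express searches as text filters (defined in RFC 2254), such as (&(objectClass=person)(cn=Smith)). Characters with special meaning in a filter (*, parentheses, backslash, and NUL) must be escaped in assertion values as a backslash followed by two hexadecimal digits. The helpers below are a minimal sketch of building such a filter; they do not contact a server, and the function names are hypothetical. In practice the resulting string would be passed to an LDAP client library's search operation together with a base DN and scope.

```python
# Characters that must be escaped inside an RFC 2254 filter value.
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\0": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Escape a literal assertion value, character by character."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def and_filter(assertions: list[tuple[str, str]]) -> str:
    """Combine attribute=value equality assertions with the & (AND) operator."""
    parts = "".join(f"({attr}={escape_filter_value(val)})" for attr, val in assertions)
    return f"(&{parts})"


# A user-supplied name containing special characters is matched literally.
search_filter = and_filter([("objectClass", "person"), ("cn", "Smith (J*)")])
print(search_filter)  # (&(objectClass=person)(cn=Smith \28J\2a\29))
```

    Escaping every value this way means an asterisk typed by a user is treated as a literal character; a deliberate substring search would insert the unescaped * separately.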

    WHOIS++
    WHOIS++ is another lightweight client/server distributed directory search mechanism. It was initially widely adopted as a means of locating Internet users but is gradually being replaced by X.500 or LDAP services.

    INFORMATION STORAGE

  • Optical storage media

    CD-DA: Compact Disc-Digital Audio, or CD-Audio
    1980 standard, known as the Red Book, that defined music compact discs in terms of sampling rate (44,100 samples per second), range of values (65,536, i.e. 16-bit samples), and physical format: lead-in (contains the table of contents and each track’s location), program area (tracks), and lead-out (silence).
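    The Red Book figures translate directly into a data rate. A range of 65,536 values corresponds to 16-bit samples (2^16 = 65,536); with two stereo channels, one second of audio occupies 44,100 × 2 × 2 = 176,400 bytes. The sketch below does the arithmetic; the 74-minute playing time is an assumption (a common disc length), and the totals count audio samples only, not the subcode and error-correction data also recorded on the disc.

```python
SAMPLE_RATE = 44_100     # samples per second, per channel (Red Book)
BITS_PER_SAMPLE = 16     # 2**16 = 65,536 possible values
CHANNELS = 2             # stereo
MINUTES = 74             # assumed playing time; actual discs vary

levels = 2 ** BITS_PER_SAMPLE
bytes_per_second = SAMPLE_RATE * CHANNELS * BITS_PER_SAMPLE // 8
audio_bytes = bytes_per_second * MINUTES * 60

print(levels)            # 65536
print(bytes_per_second)  # 176400
print(audio_bytes)       # 783216000 (about 747 MiB of audio samples)
```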

    CD-ROM: Compact Disc-Read Only Memory
    1983 Yellow Book standard that retained the physical format of the Red Book, added more error correction, and allowed two data structures: Mode 1, commonly used with the ISO 9660 file system and best for data that cannot tolerate errors, such as computer programs or databases; and Mode 2, for data more tolerant of errors, such as audio, video and graphics.
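    The Yellow Book keeps the Red Book's 2,352-byte sector, which holds 1/75 of a second of audio (588 stereo 16-bit samples × 4 bytes), and divides it differently in each mode. In Mode 1, 288 of those bytes go to error detection and correction, leaving 2,048 bytes of user data; Mode 2 gives up that extra protection to carry 2,336 bytes. The sketch below checks the byte layouts and computes the resulting data rates; the field sizes are the commonly published Yellow Book values, and the 74-minute disc length is an assumption.

```python
SECTOR_BYTES = 2352      # raw sector: 1/75 s of Red Book audio (588 samples x 4 bytes)
SECTORS_PER_SECOND = 75
MINUTES = 74             # assumed disc length

# Mode 1: sync, header, user data, error detection code (EDC),
# reserved bytes, and two layers of error-correction parity (P and Q).
MODE1 = {"sync": 12, "header": 4, "user_data": 2048, "edc": 4,
         "reserved": 8, "ecc_p": 172, "ecc_q": 104}
# Mode 2: the bytes Mode 1 spends on EDC/ECC carry user data instead.
MODE2 = {"sync": 12, "header": 4, "user_data": 2336}

for name, layout in (("Mode 1", MODE1), ("Mode 2", MODE2)):
    assert sum(layout.values()) == SECTOR_BYTES, name
    rate = layout["user_data"] * SECTORS_PER_SECOND
    print(f"{name}: {layout['user_data']} user bytes/sector, {rate} bytes/s")

sectors = MINUTES * 60 * SECTORS_PER_SECOND
mode1_capacity = sectors * MODE1["user_data"]
print(sectors, mode1_capacity)  # 333000 681984000 (about 650 MiB)
```

    The trade-off is visible in the numbers: Mode 2 delivers about 14% more data per second, at the cost of the extra error correction that programs and databases need.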

    CD-i: Compact Disc Interactive
    Green Book standard, based on Yellow Book Mode 2, that defines the disc, its contents, special compression methods for audio and visual data, an interleaving method for audio, video and textual data, and a hardware and software system designed to interact with television and stereo systems and, more recently, with the Web.

    CD-R: Compact Disc-Recordable
    Orange Book, Part II, 1988 specification that supports one-time recording of all types of random-access data onto a disc, using a laser-sensitive dye layer in the disc make-up.

    CD-RW: Compact Disc-Rewritable
    Orange Book, Part III, 1994 specification that supports erasable disc recording based on a phase-change film in the disc make-up. Not backward compatible with existing CD players or CD-ROM drives, because its media are less reflective than those of pressed CDs.

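    Photo CD, Video CD
    Video CD is the White Book specification for storing MPEG-1 compressed video on compact disc. Photo CD is a format developed by Kodak for storing digitized photographs on compact disc.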

    DVD Audio, DVD-ROM, DVD-R, DVD-RAM, DVD Video
    The next-generation optical disc formats, with 8 to 15 times the storage capacity of their CD equivalents. "DVD" originally stood for Digital Video Disc, then Digital Versatile Disc; now it is simply DVD.

  • Magnetic storage media

    Magnetic storage media such as magnetic tape, diskettes, and cartridges are numerous and largely proprietary, and were therefore excluded from this paper.

    Selected Sources

    Alschuler, Liora. ABCD...SGML: A user’s guide to structured information. International Thomson Computer Press, 1995.

    Cleveland, Gary. Electronic Document Delivery: Converging standards and technologies. IFLA UDT series on data communication technologies and standards for libraries, 1991.

    Dempsey, Lorcan, et al. eLib Standards Guidelines. Version 1.0, February 26, 1996. http://ukoln.bath.ac.uk/elib/wk_papers/stand2.html

    Dictionary of PC Hardware and Data Communications Terms. http://www.ora.com/reference/dictionary/

    EWOS Guide to Open Systems Specifications (GOSS). http://www.ewos.be/dir/gtop.htm

    Free On-line Dictionary of Computing. http://wombat.doc.ic.ac.uk/foldoc/index.html

    Guenette, David R. and Dana J. Parker. "CD, CD-ROM, CD-R, CD-RW, DVD, DVD-R, DVD-RAM: The Family Album." E-media Professional. Vol. 10, no. 4, April 1997, pp. 31-52.

    Hodges, Jeff, et al. An LDAP Roadmap & FAQ. http://www.kingsmountain.com/ldapRoadmap.shtml

    Info2000 Directory Services. http://www2.echo.lu/oii/en/directory.html

    Internet Users' Glossary. http://ds.internic.net/rfc/rfc1983.txt

    InterNIC. 15-minute series. http://rs.internic.net/nic-support/15min/

    National Institute of Standards and Technology. http://www.nist.gov/

    NetLingo: A dictionary of the Internet Language. http://www.netlingo.com/

    Network Notes. National Library of Canada. 1995- . http://www.nlc-bnc.ca/pubs/netnotes/netnotes.htm

    The Open Information Interchange Initiative. http://www2.echo.lu/oii/en/oiistand.html

    Pfaffenberger, Bryan. Internet in Plain English. MIS Press, 1994.

    TechWeb Tech Encyclopedia. http://www.techweb.com/encyclopedia/defineterm.cgi

    U-Geek Glossary. http://www.ugeek.com/glossary/glossary_search.htm

    UKOLN Directory Services. http://www.bath.ac.uk/~ccsap/Directory/

    Weibel, Stuart and Juha Hakala. "DC-5: The Helsinki Metadata Workshop." D-Lib Magazine, February 1998. http://www.dlib.org/dlib/february98/02weibel.html

    Welz, Gary. "Multimedia comes of age." Internet World. Vol. 8, no. 2, pp. 44-49.

    W3C. Naming and Addressing: URI's. http://www.w3.org/pub/WWW/Addressing/Addressing.html

    W3C. Resource Description Framework (RDF). http://www.w3.org/RDF/

    Whatis.com, Inc. http://whatis.com/

    Acknowledgements

    I would like to thank my colleagues in Information Analysis and Standards at the National Library of Canada (Gary Cleveland, Terry Kuny, Chris Robertson, Barbara Shuh, Leigh Swain, Fay Turner, and Michael Williamson) for reviewing and suggesting revisions to sections of this paper.


    Canada Copyright. The National Library of Canada. (Revised: 1998-06-23).