Globally Unique Identifiers

By Claude Ostyn
Copyright © 2006 Claude Ostyn

Overview

SCORM packages, Reusable Competency Definitions and various digital objects intended for reuse require globally unique identifiers. Globally unique identifiers are required to guarantee that when you refer to something you are not accidentally referring to something else even when content or other digital objects from different sources are mixed and matched in the same application or repository. There are two ways to generate a globally unique identifier. One is to get one from a registry, and the other is to generate it by using an algorithm that guarantees uniqueness to an extremely high degree of probability. Since registries are often not practical, most globally unique identifiers in use today are generated algorithmically.

GUID or UUID

The acronyms GUID (globally unique identifier) and UUID (universally unique identifier) basically mean the same thing, although GUID most often refers to a particular form of universally unique identifier that is built according to an algorithm that uses various techniques to achieve uniqueness. A popular algorithm is specified in IETF RFC 4122.
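
As a minimal sketch, assuming Python and its standard uuid module (neither is mentioned in this document), identifiers laid out according to RFC 4122 might be generated as follows. The variable names are illustrative only.

```python
import uuid

# Version 4: built almost entirely from random bits.
random_id = uuid.uuid4()

# Version 1: built from the current time, a clock sequence and a node ID.
time_based_id = uuid.uuid1()

# Both print in the canonical form: 8-4-4-4-12 hexadecimal digits.
print(random_id)
print(time_based_id)
```

Either version yields a 128-bit value. Which one is appropriate depends on the application, for example whether revealing the creation time or the node ID is acceptable.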

GUIDs are used in all kinds of applications, from cryptography to email. For example, since different email messages may accidentally have the same sender and recipient, the same subject line and even the same timestamp, a GUID is usually created for each message to help distinguish that message from every other message. It is much easier to just look at the GUID to determine that two messages are different than to look at all the other information in the message header.

Working with GUIDs

Globally unique identifiers are typically not human readable, and they are not intended to be read or interpreted by humans. In fact, when you are sending or receiving email or working with many productivity applications, you are using GUIDs all the time but they are wisely hidden from you by the applications. A GUID looks like a meaningless sequence of characters, but that sequence of characters is, for all practical purposes, globally unique: no other identifier in the world is expected to have the same sequence of characters. Since the identifiers carry no meaning, they are intended for machine processing and manipulation. When working with GUIDs in manual processes, always use copy and paste to avoid transcription errors.

Registries

The Handle System is a registry-based system to administer globally unique identifiers. Often, registry-based unique identifiers combine two parts. The first part is the registration number of the entity that created the second part. For example, the ISBNs that you find on any book in a bookstore are guaranteed unique for every edition of every book. The first digits of the ISBN are always the same for a publisher or some other entity, which then assigns the following digits for individual books. The main registrar only registers "naming authorities" and assigns each of them a unique identifier, which becomes part of the final identifier for any item. The "naming authority" then assigns the rest of the identifier. The Handle System identifiers are also based on this two-part model. See http://www.handle.net for more information.

Registry-based identifier schemes like the Handle System support additional functionality that is not available with algorithmically generated GUIDs. Since the IDs are registered, they can be used to look up additional information. For example, the Handle System allows you to register a unique identifier for a web page, and the identifier can be used with a resolution service to get the actual URL of the web page, wherever it may be. This is why a Handle System globally unique identifier is called a "handle". GUIDs, on the other hand, are just meaningless identifiers. They are used to guarantee that when you refer to something you are not accidentally referring to something else.

Algorithms

There are a number of web sites with useful information about the creation of GUIDs. By searching the Web with the keywords "generate GUID", for example, you will find several sites that generate a GUID at the click of a button. The better sites include links and explanations to confirm that they are using a good algorithm. Generally the GUID is calculated in part by using the current date and time; however, since that is not sufficient to guarantee uniqueness, various forms of entropy must be added to help ensure uniqueness. Note that it is easy to write an algorithm to generate something that appears to be a GUID, but rather difficult to do so in a way that generates a GUID that you can use confidently without fear of a collision with some other identifier. So, if you are using such a web site to generate your GUIDs, do verify that it is credible and documents how it generates the GUID.
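
To illustrate how date and time are combined with other values, here is a rough sketch, again assuming Python's standard uuid module, that splits a time-based (version 1) identifier into its parts. The helper name describe is hypothetical. The field layout follows RFC 4122, which counts time in 100-nanosecond intervals since 15 October 1582.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the Gregorian calendar, the epoch used by time-based UUIDs.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def describe(identifier: uuid.UUID) -> dict:
    """Split a version 1 UUID into its time part and its other parts."""
    # The time field counts 100-nanosecond intervals since the epoch.
    created = GREGORIAN_EPOCH + timedelta(microseconds=identifier.time // 10)
    return {
        "version": identifier.version,
        "created": created,
        "clock_seq": identifier.clock_seq,   # guards against clock resets
        "node": f"{identifier.node:012x}",   # 48-bit node ID
    }

info = describe(uuid.uuid1())
print(info)
```

The time alone would repeat whenever two machines generate an identifier in the same interval. The clock sequence and node ID are what keep such identifiers apart.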

For example, this Web UUID Generator is very easy to use casually and seems legitimate.

If you use Windows and you have a Windows .NET development environment, a GUID generator is already on your system. Otherwise, Microsoft has a free downloadable utility to generate GUIDs named GuidGen that creates one GUID at a time and puts it on the clipboard so you can paste it into your application. The download page contains instructions on how to install and use this utility, as well as the actual download link. If the above link is broken, go to Microsoft Downloads and search for "GuidGen".

Identifier string formats

You will sometimes see a GUID in the form "{xxxxxxxx}". For example:
   {F07CDF2A-2E1E-11DB-8AF6-B622A1EF5492}.
However, most specifications, including XML, require that the ID values be a "name production". What this means is that typically the value must be a string of alphanumeric characters. Some additional characters might be allowed, such as the underline character, but the hyphen may be problematic. Usually, the name must begin with a character that is not numeric. To play it safe and provide the maximum compatibility in future applications, it may be a good idea to format or reformat your identifiers to conform to this before you use them. For example, this would mean removing the opening and closing braces in a GUID, replacing the hyphens with underline characters, and adding an arbitrary alphabetic prefix if the first character of the resulting string is not alphabetic. The example above would become
   F07CDF2A_2E1E_11DB_8AF6_B622A1EF5492
This is also a legal object or variable name in most programming languages as well as an indexable key value in most database systems.
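
The reformatting described above could be automated along the following lines. This is only a sketch in Python: the function name to_name_token and the default prefix "id_" are arbitrary choices made for illustration, not part of any specification.

```python
import re

def to_name_token(guid: str, prefix: str = "id_") -> str:
    """Reformat a GUID string so it can serve as a name-style identifier."""
    # Drop surrounding whitespace and the optional braces.
    token = guid.strip().strip("{}")
    # Replace hyphens (and any other non-alphanumeric character) with "_".
    token = re.sub(r"[^0-9A-Za-z_]", "_", token)
    # Names usually may not start with a digit, so add a prefix if needed.
    if not token[:1].isalpha():
        token = prefix + token
    return token

print(to_name_token("{F07CDF2A-2E1E-11DB-8AF6-B622A1EF5492}"))
# F07CDF2A_2E1E_11DB_8AF6_B622A1EF5492
print(to_name_token("{3F2504E0-4F89-11D3-9A0C-0305E82C3301}"))
# id_3F2504E0_4F89_11D3_9A0C_0305E82C3301
```

The second GUID begins with a digit, so the prefix is added. The first already begins with a letter and is left without one.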

In some contexts, you may have to massage the identifier in other ways. The Handle System handles do begin with a numeric prefix and include at least one "/". When used in a context other than the Handle System that requires a name production, for example as a value for the XML Schema type "ID", it may be necessary to add a prefix such as "handle_" or some other format, and otherwise escape the rest of the string to sanitize the "/" and any other offending character. If a URN is required, the handle must be encapsulated in a URN. The format to do this is not entirely clear, but the format URN:hdl: naming authority / name has been proposed in the past in at least one Library of Congress document.
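
One possible way to do this is sketched below in Python. Everything here is an assumption made for illustration: the handle "1234.5/example-object" is made up, the function names are hypothetical, and the escaping scheme (an underline followed by the four-digit hexadecimal character code) is just one arbitrary convention. The URN form simply follows the proposed, unconfirmed format mentioned above.

```python
import re

def handle_to_name_token(handle: str, prefix: str = "handle_") -> str:
    """Prefix a handle and escape every character that is not a letter or digit."""
    # Each offending character becomes "_" plus its 4-digit hex code,
    # so "/" turns into "_002F" and "." into "_002E".
    escaped = re.sub(r"[^0-9A-Za-z]",
                     lambda m: f"_{ord(m.group()):04X}",
                     handle)
    return prefix + escaped

def handle_to_urn(handle: str) -> str:
    """Wrap a handle using the proposed URN:hdl: form (not a confirmed standard)."""
    return "URN:hdl:" + handle

example = "1234.5/example-object"   # hypothetical handle
print(handle_to_name_token(example))
# handle_1234_002E5_002Fexample_002Dobject
print(handle_to_urn(example))
# URN:hdl:1234.5/example-object
```

Escaping each character with its code, rather than replacing everything with a plain underline, keeps two handles that differ only in punctuation from collapsing into the same token.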

Terms of use for this document

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.