
Glossary

Several publicly available websites were consulted to create the final definitions you see here. Principal among them is Wikipedia.org.

Byte Order Mark (BOM)

The byte order mark (BOM) is the Unicode character U+FEFF BYTE ORDER MARK (BOM). When it appears as a magic number at the start of a text stream, it can signal several things to a program reading the text:

  • The byte order, or endianness, of the text stream
  • The fact that the text stream's encoding is Unicode, to a high level of confidence
  • Which Unicode encoding the text stream is encoded as

Warning

A BOM at the start of a UTF-8 stream can break software that does not expect non-ASCII bytes at the start of a file but could otherwise handle the text stream.
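The signalling described above can be sketched in Python by comparing the first bytes of a stream against the BOM byte sequences exposed by the standard `codecs` module. This is a minimal sketch, assuming the whole input is already in memory as `bytes`; the helper names `sniff_bom` and `decode_text` are hypothetical, and falling back to UTF-8 when no BOM is present is an assumption, not a rule.

```python
import codecs
from typing import Optional, Tuple

# Longer BOMs are checked first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the UTF-16 LE BOM (FF FE). UTF-16 LE text starting with U+0000 is therefore
# indistinguishable from UTF-32 LE by the BOM alone.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding, BOM length) if data starts with a BOM, else (None, 0)."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0

def decode_text(data: bytes, default: str = "utf-8") -> str:
    """Decode data using its BOM if present (dropping the BOM), else `default`."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or default)

print(sniff_bom(codecs.BOM_UTF8 + b"hello"))                       # ('utf-8', 3)
print(decode_text(codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")))  # hi
```

For the UTF-8 case in the warning, Python's built-in `utf-8-sig` codec also drops a leading BOM when one is present and otherwise decodes as plain UTF-8.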

Media Type

A media type (formerly known as MIME type) is a two-part identifier for file formats and format contents transmitted on the Internet. A media type consists of a type and a subtype, which is further structured into a tree. A media type can optionally define a suffix and parameters: type "/" [tree "."] subtype ["+" suffix] *[";" parameter].

Common Examples

application/javascript
application/json
text/html; charset=UTF-8
text/plain
text/xml
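The grammar above can be illustrated with a small parser. This is a minimal sketch under stated assumptions: only the `vnd`, `prs` and `x` prefixes (the registration trees defined in RFC 6838) are treated as trees, the suffix is whatever follows the last `+`, type and subtype are lower-cased because they are case-insensitive, and parameters are split naively on `;`, so quoted parameter values containing `;` are not handled. `MediaType` and `parse_media_type` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Registration-tree prefixes assumed here: vendor, personal, unregistered.
_TREES = ("vnd", "prs", "x")

@dataclass
class MediaType:
    type: str
    subtype: str
    tree: Optional[str] = None
    suffix: Optional[str] = None
    parameters: Dict[str, str] = field(default_factory=dict)

def parse_media_type(value: str) -> MediaType:
    """Split 'type/[tree.]subtype[+suffix][;name=value]...' into its parts."""
    head, *raw_params = value.split(";")
    type_, slash, subtype = head.strip().partition("/")
    if not slash or not type_ or not subtype:
        raise ValueError(f"not a media type: {value!r}")

    tree = None
    prefix, dot, rest = subtype.partition(".")
    if dot and prefix.lower() in _TREES:
        tree, subtype = prefix.lower(), rest

    suffix = None
    if "+" in subtype:
        subtype, _, suffix = subtype.rpartition("+")

    params: Dict[str, str] = {}
    for raw in raw_params:
        name, eq, val = raw.strip().partition("=")
        if eq:  # parameter names are case-insensitive; values are kept as written
            params[name.strip().lower()] = val.strip().strip('"')

    return MediaType(type_.lower(), subtype.lower(), tree,
                     suffix.lower() if suffix else None, params)

print(parse_media_type("text/html; charset=UTF-8"))
print(parse_media_type("application/vnd.api+json"))
```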

Named Entity

In addition to native character encodings, characters can also be encoded as character entity references. Character entity references are also sometimes referred to as named entities, or HTML entities for HTML.

Named entities have the format &name;, where name is a case-sensitive alphanumeric string. For example, "λ" can also be encoded as &lambda; in an HTML document. The character entity references &lt;, &gt;, &quot; and &amp; are predefined in HTML and SGML, because <, >, " and & are already used to delimit markup. XML additionally predefines the &apos; (') entity.

For the purposes of this CLOD definition all SGML, HTML and XML named entities are allowed.
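In Python, the standard `html` module decodes named entities through `html.unescape`, which uses the HTML5 entity table (this includes &lambda; and &apos;, but it is not guaranteed to cover every SGML entity set) and also decodes numeric references. `html.escape` writes ' as &#x27; rather than &apos;, so the escaping side below is a minimal hand-written sketch that emits only the five XML-predefined entities; `escape_markup` and `unescape_entities` are hypothetical helper names.

```python
import html

# The five entities predefined by XML; HTML and SGML predefine the first four.
_PREDEFINED = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}

def escape_markup(text: str) -> str:
    """Replace markup-delimiting characters with named entities, one char at a time
    (so an '&' produced by a replacement is never escaped a second time)."""
    return "".join(_PREDEFINED.get(ch, ch) for ch in text)

def unescape_entities(text: str) -> str:
    """Decode named (and numeric) references using Python's HTML5 entity table."""
    return html.unescape(text)

print(escape_markup('a < b & "c"'))          # a &lt; b &amp; &quot;c&quot;
print(unescape_entities("&lambda; &Lambda;"))  # λ Λ  (names are case-sensitive)
```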

Newline

Newline (frequently called line ending, end of line (EOL), line feed, or line break) is a control character or sequence of control characters in a character encoding specification (e.g. ASCII or EBCDIC) that signifies the end of a line of text and the start of a new one. Text editors insert this character when the user presses the Enter key. When a text file is displayed (or printed), this control character causes the characters that follow it to appear on a new line.

The Unicode standard defines several characters that conforming applications should recognize as line terminators:

  • LF: Line Feed, U+000A
  • VT: Vertical Tab, U+000B
  • FF: Form Feed, U+000C
  • CR: Carriage Return, U+000D
  • CR+LF: CR (U+000D) followed by LF (U+000A)
  • NEL: Next Line, U+0085
  • LS: Line Separator, U+2028
  • PS: Paragraph Separator, U+2029
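A splitter that honours exactly the terminators listed above can be written with a regular expression. This is a minimal sketch; `split_lines` is a hypothetical name. CR+LF is matched before a lone CR or LF so that it counts as one break. Python's built-in `str.splitlines()` is close but not identical: it also breaks on U+001C to U+001E, and it does not return a trailing empty string after a final terminator.

```python
import re
from typing import List

# LF, VT, FF, CR, NEL, LS and PS as single characters; CR+LF is listed first
# so that it is consumed as one terminator instead of two.
_LINE_TERMINATOR = re.compile(r"\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")

def split_lines(text: str) -> List[str]:
    """Split text on the Unicode line terminators; a final terminator yields a trailing ''."""
    return _LINE_TERMINATOR.split(text)

print(split_lines("one\r\ntwo\nthree\u2028four"))  # ['one', 'two', 'three', 'four']
```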

Numeric Character Reference

In addition to native character encodings, characters can also be encoded as numeric character references (decimal or hexadecimal).

A numeric character reference refers to a character by its Universal Character Set/Unicode code point and uses the format &#nnnn; or &#xhhhh;, where nnnn is the code point in decimal form and hhhh is the code point in hexadecimal form (the x marks the hexadecimal variant).
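Both forms can be produced from Python's `ord()` and decoded back with `chr()`. This is a minimal sketch; the function names are hypothetical, and the decoder accepts only the lowercase `&#x` form shown above.

```python
import re

def to_decimal_ref(ch: str) -> str:
    """Encode one character as &#nnnn; (decimal code point)."""
    return f"&#{ord(ch)};"

def to_hex_ref(ch: str) -> str:
    """Encode one character as &#xhhhh; (hexadecimal code point)."""
    return f"&#x{ord(ch):X};"

# Group 1 captures a decimal code point, group 2 a hexadecimal one.
_NCR = re.compile(r"&#(?:([0-9]+)|x([0-9A-Fa-f]+));")

def decode_ncrs(text: str) -> str:
    """Replace every numeric character reference with its character.
    chr() rejects code points above U+10FFFF, so such references raise."""
    def replace(match: "re.Match[str]") -> str:
        if match.group(1) is not None:
            return chr(int(match.group(1)))
        return chr(int(match.group(2), 16))
    return _NCR.sub(replace, text)

print(to_decimal_ref("λ"), to_hex_ref("λ"))  # &#955; &#x3BB;
print(decode_ncrs("&#955; = &#x3bb;"))       # λ = λ
```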

Schema

A Schema is an underlying organizational pattern or structure.

A CLOD Schema is a document or documentation that describes a CLOD Set, typically expressed in terms of constraints on the structure and content of that CLOD Set, above and beyond the basic syntactical constraints imposed by the CLOD definition itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints.

Tip

A CLOD Schema is to a CLOD document what an XML Schema is to an XML document.
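To make the constraint kinds listed above concrete, here is a sketch that checks plain Python records against one rule of each kind: a fixed field order (a grammatical rule), data types, a Boolean predicate, uniqueness, and referential integrity. The field names `id`, `parent` and `qty` and the function `validate` are hypothetical; this does not model actual CLOD syntax or any real CLOD Schema.

```python
from typing import Any, Dict, List

# Hypothetical schema: these names are illustrative only, not CLOD syntax.
FIELD_ORDER = ["id", "parent", "qty"]                                # grammatical rule
FIELD_TYPES = {"id": str, "parent": (str, type(None)), "qty": int}  # data types

def validate(records: List[Dict[str, Any]]) -> List[str]:
    """Return a list of constraint violations (empty if the records conform)."""
    errors: List[str] = []
    seen_ids = set()
    for i, rec in enumerate(records):
        if list(rec) != FIELD_ORDER:
            errors.append(f"record {i}: fields must appear as {FIELD_ORDER}")
            continue
        for name, expected in FIELD_TYPES.items():
            if not isinstance(rec[name], expected):
                errors.append(f"record {i}: {name} has the wrong type")
        if isinstance(rec["qty"], int) and rec["qty"] < 0:           # Boolean predicate
            errors.append(f"record {i}: qty must be >= 0")
        if isinstance(rec["id"], str):
            if rec["id"] in seen_ids:                                # uniqueness
                errors.append(f"record {i}: duplicate id {rec['id']!r}")
            seen_ids.add(rec["id"])
    for i, rec in enumerate(records):                                # referential integrity
        parent = rec.get("parent")
        if isinstance(parent, str) and parent not in seen_ids:
            errors.append(f"record {i}: unknown parent {parent!r}")
    return errors

print(validate([{"id": "a", "parent": None, "qty": 1},
                {"id": "b", "parent": "a", "qty": 2}]))  # []
```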

URL (Uniform Resource Locator)

A URL is a compact string representation for a resource available via the Internet. URLs are used to "locate" resources, by providing an abstract identification of the resource location.

The URL for This Page

https://clod.omegatower.net/glossary/
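As an illustration of how a URL identifies a resource's location, Python's standard `urllib.parse.urlparse` splits the address above into its scheme (how to retrieve it), network location (which host) and path (where on that host). This is a short sketch using only the standard library.

```python
from urllib.parse import urlparse

parts = urlparse("https://clod.omegatower.net/glossary/")

print(parts.scheme)    # https
print(parts.netloc)    # clod.omegatower.net
print(parts.path)      # /glossary/
print(parts.geturl())  # https://clod.omegatower.net/glossary/
```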

UTF-8

UTF-8 is a variable-width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes. The encoding is defined by the Unicode standard, and was originally designed by Ken Thompson and Rob Pike. The name is derived from Unicode (or Universal Coded Character Set) Transformation Format – 8-bit.
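The one-to-four-byte range and the 1,112,064 figure can both be checked in Python. The sketch below encodes one character from each length class, and derives the count as the full code space U+0000 to U+10FFFF minus the 2,048 surrogate code points (U+D800 to U+DFFF), which UTF-8 does not encode.

```python
# One sample character per UTF-8 length class, with its expected bytes in hex.
samples = {"A": "41", "é": "c3a9", "€": "e282ac", "😀": "f09f9880"}

for ch, expected_hex in samples.items():
    encoded = ch.encode("utf-8")
    assert encoded.hex() == expected_hex
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")

# Valid code points: 0x110000 total minus the surrogate range U+D800..U+DFFF.
valid_code_points = 0x110000 - (0xDFFF - 0xD800 + 1)
print(valid_code_points)  # 1112064

# A lone surrogate is not a valid code point for UTF-8, so strict encoding fails.
try:
    "\ud800".encode("utf-8")
except UnicodeEncodeError:
    print("U+D800 rejected")
```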

UUID

A UUID is a 128-bit number used to identify information in computer systems. When generated according to the standard methods, UUIDs are, for practical purposes, unique. Their uniqueness does not depend on a central registration authority or coordination between the parties generating them, unlike most other numbering schemes. While the probability that a UUID will be duplicated is not zero, it is close enough to zero to be negligible.

Example

9e1defb2-d94a-4ae9-b78d-157eb0f19a3c
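Python's standard `uuid` module can parse the example above and generate new identifiers locally, without any central registry. The example happens to be a version 4 (random) UUID, which shows in the "4" that opens its third group.

```python
import uuid

# Parse the canonical 8-4-4-4-12 hexadecimal form.
example = uuid.UUID("9e1defb2-d94a-4ae9-b78d-157eb0f19a3c")

print(example.version)                   # 4
print(example.variant == uuid.RFC_4122)  # True
print(example.int.bit_length() <= 128)   # True: a UUID is a 128-bit number

# Generate a fresh random (version 4) UUID.
fresh = uuid.uuid4()
print(fresh)
```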