A Formal Public Identifier (FPI) is a short piece of text with a particular structure that may be used to uniquely identify a product, specification or document. FPIs were introduced as part of Standard Generalized Markup Language (SGML), and serve particular purposes in formats historically derived from SGML (HTML and XML). Some of their most common uses are as part of document type declarations (DOCTYPEs) and document type definitions (DTDs) in SGML, XML and historically HTML, but they are also used in the vCard and iCalendar file formats to identify the software product which generated the file.
More recently, Uniform Resource Identifiers (URIs) and universally unique identifiers (UUIDs) are usually used to uniquely identify objects. FPIs have become a legacy system.
An FPI consists of an owner identifier, followed by a double slash, followed by a text identifier.[1] For example, the identifier "-//W3C//DTD HTML 4.01//EN" can be broken down into two parts: the owner identifier, which indicates the issuer of the FPI, and the text identifier, which indicates the particular document or object the FPI identifies. In the example, the owner identifier is "-//W3C" and the text identifier is "DTD HTML 4.01//EN".
The text identifier itself consists of multiple constituent parts.[1] Sequences of whitespace are treated as equivalent to a single space.[1]
There are three types of owner identifier, distinguished by their first three characters, which are "ISO" for an ISO owner identifier, "-//" for an unregistered owner identifier or "+//" for a registered owner identifier.[1]
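The three-way distinction can be sketched as a small classifier. This is a minimal illustration rather than a conforming implementation: it assumes the leading characters are "ISO", "-//" and "+//" respectively, and the function name `owner_type` and the example.net FPI are hypothetical.

```python
def owner_type(fpi: str) -> str:
    """Classify an FPI's owner identifier by its first three characters."""
    if fpi.startswith("+//"):
        return "registered"
    if fpi.startswith("-//"):
        return "unregistered"
    if fpi.startswith("ISO"):
        # ISO owner identifiers carry no "+//" or "-//" prefix.
        return "ISO"
    raise ValueError(f"unrecognised owner identifier in {fpi!r}")


print(owner_type("-//W3C//DTD HTML 4.01//EN"))
print(owner_type("ISO 8879:1986//ENTITIES Added Latin 1//EN"))
# Hypothetical FPI, used only to exercise the registered branch.
print(owner_type("+//IDN example.net//DTD Example//EN"))
```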
An ISO owner identifier is either an ISO publication number such as "ISO 8879:1986", or an ISO-IR registration number given as e.g. "ISO Registration Number 111" for ISO-IR-111. The latter type is only permitted for "CHARSET" FPIs (see below). In either case, it is distinguished by beginning with the characters "ISO", and does not require any prefix before those characters.[1]
The year was formerly separated from the standard number by a hyphen (e.g. "ISO 8879-1986"),[2] a usage which is now deprecated.[3] The hyphen is now used instead to separate the part number from the standard number (replacing the earlier use of a single slash for that purpose);[4] the year follows any part number if present, and is separated from it by a colon.
An unregistered owner identifier begins with "-//".[1] Owners which use unregistered identifiers include the W3C, the Internet Engineering Task Force, the United States Department of Defense,[5] the European Parliament and others. Since it is not registered, an owner identifier of this kind is not guaranteed to be unique (another owner may choose the same one). This weakens the uniqueness guarantee of the FPI as a whole, although the FPI is still guaranteed to be distinct both from all other FPIs with the same owner and from all FPIs with registered owners.[1]
A registered owner identifier begins with the characters "+//". It refers to a registered identifier as stipulated by ISO 9070.[1] The portion which is actually registered is the registered owner prefix, which follows the "+//" and may optionally be followed by one or more owner-assigned portions which might identify, for example, departments within an organisation.[1] If owner name components additional to the registered prefix are used, they are separated from the prefix by a "::" pair.[6]
A registered owner prefix conforming to ISO 9070 may be one of the following:[6]
The owner of the domain name example.net, for instance, could issue FPIs using the owner identifier "+//IDN example.net".
Text identifiers can be broken down into the class, description and language. In the example "-//W3C//DTD HTML 4.01//EN", the class is "DTD", indicating that the FPI represents a document type definition; the description is "HTML 4.01"; and the language is "EN", which suggests that the document type definition is written in English (though documents conforming to the DTD do not need to be in English). The class is separated from the description using a space character; the description is separated from the language using a double slash. The text identifier may optionally contain a version indicator after the language, also separated by a double slash.
The text identifier immediately follows the "//" pair after the owner identifier, and must begin with one of the following block-capital words followed by a space, specifying the public text class:[1]
"TEXT", "DOCUMENT" and "SUBDOC" refer to SGML documents or fragments of SGML documents.[1] Those of the "TEXT" class are intended to be referenced using a text entity (without an entity-type keyword, i.e. inserted directly into the document), while those of the "SUBDOC" class are intended to be referenced using a subdocument entity (with the "SUBDOC" keyword in the entity declaration, i.e. interpreted with their own individual schemas, namespaces, and so forth).[1] Those of the "DOCUMENT" class are not intended to be referenced as an entity from an enclosing document.
"CAPACITY" and "SYNTAX" refer to portions of an SGML declaration. "SD" (for an entire SGML declaration) was added to this list by a later extension added to the standard as an annex, which also specifies certain extensions required by XML. "LPD" refers to an SGML link process definition (defining a transformation from one SGML format to another). "ELEMENTS", "ENTITIES" and "SHORTREF" refer to portions of a document type definition (DTD) consisting of specific types of markup declaration. "DTD" refers to an entire DTD.
The remaining three refer to concepts from outside of SGML: "CHARSET" refers to a coded character set, "NOTATION" to a format such as a file format (either for references to entities from external files, or for interpreting a textual format contained within an element),[1] and "NONSGML" to an asset in a non-SGML format.
The space after the text class name is followed by the sequence "-//" if the FPI refers to unavailable public text,[1] i.e. a document, file or specification which is not available for access or purchase by the general public.[1] The public text description follows this marker; for an available public text, the description immediately follows the space after the text class name.[1] For an ISO publication, the description is taken from the final element of the title of the publication, not counting any part number; otherwise, it can be any suitably unique string of permitted characters.[1] The description is terminated by another "//" pair.[1]
The part of the FPI following the description depends on the text class. For "CHARSET" FPIs, it is a public text designating sequence,[1] giving a textual representation of an ISO/IEC 2022 designation escape sequence in column/line notation (e.g. "ESC 2/5 2/15 4/6"); registered designation escapes are expected to match the ISO owner identifier given, while private-use designation escapes are namespaced by the FPI owner identifier.[1] As an example of this type of FPI, the FPI "ISO Registration Number 177//CHARSET ISO/IEC 10646-1:1993 UCS-4 with implementation level 3//ESC 2/5 2/15 4/6" is used in HTML 4's SGML declaration to identify Unicode.[7]
For all other FPIs (i.e. those where the class is not "CHARSET"), the part following the description is a public text language, which is a sequence of uppercase letters, strongly encouraged (but not mandated) to be an ISO 639-1 code.[1] Stopping short of mandating the use of an ISO 639-1 code avoids requiring validating software to check whether the language is an ISO 639-1 code, and also allows for extensibility:[1] for example, a small number of FPIs used in practice use ISO 639-3 codes (such as "NDS" for Low German)[8] or IETF language tags with hyphens removed (such as "SRLATN" for Serbian written in Gajica)[9] for cases where ISO 639-1 codes prove insufficient for distinguishing a resource from versions in other languages or language varieties. In accordance with recommendations made by ISO 9070, Steven DeRose and David G. Durand suggest using if no ISO 639 code is applicable.
The specification notes that while the language of the resource might affect the data and names defined and the language of any source-code comments, the language affects the usability of some text classes more than others.[1] For example, the language given in the FPI in an HTML 4 or XHTML 1 DOCTYPE declaration should not be changed, regardless of the language of the web page itself; by contrast, the DSSSL stylesheets for DocBook internally use FPIs with different languages to identify string-table entity sets for particular localisations.[10]
Additionally, except for "CAPACITY", "CHARSET", "NOTATION" and "SYNTAX" FPIs, for which the designating sequence or language must be the final part,[1] the language code may be followed by another "//" pair,[1] followed by a public text display version, which specifies a particular platform that the implementation of SGML entities should target.[1] For example, the base entity set "ISO 8879:1986//ENTITIES Added Latin 1//EN" defines the Latin-1 named entities using tautological entities,[1] [11] while "ISO 8879:1986//ENTITIES Added Latin 1//EN//XML" implements them using Unicode code point references for use in XML.[12] Similarly, the common entity set for HTML 5 and MathML uses the FPI "-//W3C//ENTITIES HTML MathML Set//EN//XML".[13]
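The structure described so far (owner identifier, class, description, language and optional display version) can be illustrated with a short parser. This is a minimal sketch under stated assumptions: the names `ParsedFPI` and `parse_fpi` are hypothetical, the marker for unavailable public text is not handled, and for character-set FPIs the `language` field simply holds the designating sequence.

```python
from typing import NamedTuple, Optional


class ParsedFPI(NamedTuple):
    owner: str
    text_class: str
    description: str
    language: str
    version: Optional[str]


def parse_fpi(fpi: str) -> ParsedFPI:
    # Sequences of whitespace are equivalent to a single space.
    fpi = " ".join(fpi.split())
    # Keep a leading "+//" or "-//" attached to the owner identifier.
    prefix = fpi[:3] if fpi[:3] in ("+//", "-//") else ""
    owner_name, sep, text = fpi[len(prefix):].partition("//")
    if not sep:
        raise ValueError("missing '//' between owner and text identifier")
    parts = text.split("//")
    if len(parts) not in (2, 3):
        raise ValueError("expected description, language and optional version")
    text_class, space, description = parts[0].partition(" ")
    if not space:
        raise ValueError("missing space between class and description")
    version = parts[2] if len(parts) == 3 else None
    return ParsedFPI(prefix + owner_name, text_class, description, parts[1], version)


print(parse_fpi("-//W3C//DTD HTML 4.01//EN"))
```

Splitting on the double slash only after the owner prefix has been removed keeps single slashes, such as those inside an ISO publication number, within their field.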
| FPI | Meaning |
|---|---|
| +//ISBN 0-7923-9432-1::Graphic Notation//NOTATION CompuServe Graphic Interchange Format//EN [14] | The GIF image format (equivalent to MIME type image/gif); an example of an ISBN FPI owner, and of a NOTATION FPI used as a file-format identifier. |
| | The Système International base unit of kilogram |
| ISO Registration Number 177//CHARSET ISO/IEC 10646-1:1993 UCS-4 with implementation level 3//ESC 2/5 2/15 4/6 | The Unicode (or UCS) coded character set, as referenced by numeric character references in HTML; an example of a CHARSET FPI |
| -//W3C//DTD HTML 4.01 Transitional//EN | Document type definition of HTML 4.01 Transitional; corresponds to the system identifier URL http://www.w3.org/TR/html4/loose.dtd [15] |
| [16] [17] | Facebook Events as an iCalendar file generator; an example of a NONSGML FPI used to identify a software package |
| [18] | Results of the European Parliament roll-call vote on Support for EU Strategic Framework on health and safety at work 2014-2020 (A8-0312/2015); an example of a NONSGML FPI used to identify a document in a non-SGML format (in this case, PDF) |
| -//W3C//ENTITIES HTML MathML Set//EN//XML | Character entity set of HTML 5 and MathML; corresponds to the system identifier URL http://www.w3.org/2003/entities/2007/htmlmathml-f.ent; an example of an ENTITIES FPI |
| [19] | Document type definition of a variant of DocBook 3.1 which can simultaneously constitute a DSSSL stylesheet; an example of a domain name used as an FPI owner |
| [20] | The LaTeX format, as used for formulae in academic works; an example of an FPI using a language code other than EN to refer to non-English reference material |
| | Respectively English, French and Greek string tables used by DSSSL stylesheets for DocBook (for example, the entity is defined as "Chapitre" in French[24] and in English)[25]; an example of the use of an FPI language code to differentiate between localisations |
The FPI has been described as the least well-understood part of the document type declaration (DOCTYPE), an integral component of valid HTML, XML and Standard Generalized Markup Language (SGML) documents.[26] The Formal Public Identifier's effect upon its host document is unusual in that it can depend not only upon its own syntactical correctness and the behaviour of the program parsing it, but also upon the ISO-registration status of the organisation responsible for the schema referenced by the FPI.[27]
SGML uses two forms of identifier for resources: system identifiers are unique and meaningful only within a particular system, while public identifiers are unique and meaningful within a wider scope.[1] The term "public" here does not necessarily mean that the resource is available to the general public—it may only be available within a single organisation, for example (in which case, it is an unavailable public text)—but only that it exists outside of the context of the particular system environment or document which it is referenced in.[1] An FPI is a "formal" public identifier in the sense that it follows the formal structure laid down by the SGML standard (ISO 8879);[1] public identifiers which do not follow the formal structure, and thus are not FPIs, are sometimes referred to as "informal" public identifiers.[1]
The constraints of formal (as opposed to informal) public identifiers are an optional feature, because the specification for FPIs was introduced late in the development of ISO 8879. The use of FPIs for public identifiers is nevertheless strongly recommended: the FPI structure ensures that the FPIs assigned by one owner do not collide with FPIs assigned by other owners (except in the case of unregistered owners with colliding names), whereas informal public identifiers have no uniqueness guarantee, so those assigned by one owner may collide with formal or informal public identifiers assigned by another.[1] Interpretation of public identifiers using the formal structure, which requires public identifiers to be FPIs, can be enabled within the SGML declaration using the feature name "FORMAL".[1]
System identifiers, by contrast, have no structure defined by SGML itself—they might be filenames, database keys or even addresses for indexable storage—but are interpreted by the SGML system's entity manager component to identify the location of the entity.[1] As such, ISO/IEC 8879 itself does not use the term formal system identifier (FSI), which is instead defined in an amendment to ISO/IEC 10744 (HyTime).
An SGML external identifier consists either of the keyword "PUBLIC" followed by a literal for the public identifier and an optional literal for the system identifier, or the keyword "SYSTEM" followed by an optional literal for the system identifier.[1] The literals are prefixed and suffixed with either the literal delimiter or the alternative literal delimiter,[1] usually set by the SGML declaration to the double and single ASCII quotation marks, as they are in the reference concrete syntax for SGML,[1] and also in XML.[28] The use of the keyword "SYSTEM" in an SGML entity definition without a following system identifier is permitted, if the entity manager is able to resolve the entity from its name alone.[1] External identifiers are used in document type declarations (DOCTYPEs) referencing document type definitions (DTDs),[1] in external entity specifications[1] and notation declarations[1] within DTDs, and in link type declarations referencing link process definitions (LPDs).[1]
External identifiers in XML are more constrained than they are in general SGML, with the changes shifting the focus away from public identifiers such as FPIs and towards standardising the form taken by system identifiers. The system identifier is to be treated as an (absolute or relative) URI, but must not contain a URI fragment identifier (the portion beginning with "#"). The system identifier is also generally required: the keyword "SYSTEM" must be followed by a system identifier literal, and the keyword "PUBLIC" must, in the syntax for general external identifiers, be followed by literals for both the public and system identifiers.[29] As an exception to this, however, notation declarations may use a public identifier without a system identifier.[30]
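The XML constraints on external identifiers can be sketched as a small checker. It is an illustration only, assuming the keywords are "PUBLIC" and "SYSTEM" and that the fragment delimiter is "#"; the function name `parse_xml_external_id` and the `in_notation` flag are hypothetical, and the checks on which characters a public identifier may contain are omitted.

```python
import re

# Either: PUBLIC "public id" ["system id"]   or: SYSTEM "system id".
# Each literal may be delimited by double or single quotation marks.
_EXTERNAL_ID = re.compile(
    r'''
    \s*
    (?:
        PUBLIC \s+ (?P<q1>["']) (?P<public>(?:(?!(?P=q1)).)*) (?P=q1)
        (?: \s+ (?P<q2>["']) (?P<system1>(?:(?!(?P=q2)).)*) (?P=q2) )?
      |
        SYSTEM \s+ (?P<q3>["']) (?P<system2>(?:(?!(?P=q3)).)*) (?P=q3)
    )
    \s*
    ''',
    re.VERBOSE | re.DOTALL,
)


def parse_xml_external_id(text, in_notation=False):
    """Return (public_id, system_id) or raise ValueError."""
    match = _EXTERNAL_ID.fullmatch(text)
    if match is None:
        raise ValueError("not an external identifier")
    public = match.group("public")
    system = match.group("system1") if public is not None else match.group("system2")
    if system is None and not in_notation:
        raise ValueError("a system identifier is required here")
    if system is not None and "#" in system:
        raise ValueError("system identifier must not contain a fragment identifier")
    return public, system


print(parse_xml_external_id(
    'PUBLIC "-//W3C//DTD SVG 1.1//EN" '
    '"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"'
))
```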
In contrast to the requirement that the system identifier be a URI (classified for purposes of HyTime as a type of formal system identifier or FSI, or more narrowly as a storage object identifier),[31] the SGML "FORMAL" feature is disabled in XML,[32] [33] since the format of public identifiers is not specified by XML (i.e. they are not explicitly required to be FPIs, although they may be). The only details which the XML specification stipulates about the public identifier are that it may be given alongside the system identifier, and may be used by an XML processor along with other information to determine an alternative URI (failing which, it is required to use the URI given in the system identifier).[29]
Identifying strings for XML namespaces are required to be non-empty URIs (such as an absolute URL; use of relative URLs is deprecated),[34] although they are not required to be resolvable URLs and may, for example, be URNs.
Additionally, alternative schema formats such as XML Schema (XSD) serve as a competitor to DTD in an XML context, overcoming some of the limitations of DTDs. XSD can (unlike DTDs) be validated using the same tools as any other XML document,[35] includes support for XML namespaces (which DTDs can only interpret as fixed portions of the element and attribute names in question), allows regular expression constraints to be placed on the format of text data such as telephone numbers, and is better able to express complex content-model structures.[35]
Thus, it is less common for XML formats to use a DTD (which might, for example, use FPIs for notations or external entities), and so less common for an XML document to contain a DOCTYPE referencing a DTD, whether by FPI or only by URI (although a DOCTYPE may still be used for entity definitions embedded within the XML file itself). For example, most versions of RSS (excepting RSS 0.91) do not have an official DTD. Similarly, the DocBook format, which initially used a document type declaration identifying a DTD by an FPI, switched its primary schema definition from DTD to RELAX NG in version 5.0, and ceased to use document type declarations at that time,[36] and Scalable Vector Graphics (SVG) did the same in version 1.2.[37]
If a system identifier (such as a path or URL) is not given for a resource identified by a public identifier such as an FPI, an SGML system's entity manager will generate one with reference to the public identifier. Although the SGML specification itself does not specify how the entity manager should do this, the intention was for it to use a table mapping public identifiers to system identifiers.[1] Accordingly, an SGML catalog format was created to contain mappings from public to system identifiers; the catalog file can also specify rules for overriding the given system identifier.[38] [39] [40]
Although XML mandates the use of system identifiers in more places than does SGML itself, catalogs may still be needed for remapping and overriding the given system identifier: a system identifier which is a local path may not be useful on other machines, while one which is a network URL will not be useful when a network connection is not available, for example.[41] Accordingly, an alternative XML-based catalog format exists for use by XML software, supporting rules for replacing or rewriting URIs, as well as for mapping FPIs to URIs.[41]
For example, an entry in an SGML catalog may give the local path (relative to the catalog file) to a copy of the Scalable Vector Graphics 1.1 DTD, and specify the SGML declaration (in this case, the declaration for the XML syntax) which an SGML processor should use for it:[42]
PUBLIC "-//W3C//DTD SVG 1.1//EN" svg11.dtd
DTDDECL "-//W3C//DTD SVG 1.1//EN" /usr/share/xml/declaration/xml.dcl
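How an entity manager might consult such entries can be sketched as a table lookup. This is a simplified illustration, not a full catalog parser: it assumes every entry has exactly three tokens (keyword, public identifier, target), ignores catalog comments and other entry types, and the names `CATALOG` and `load_catalog` are hypothetical.

```python
import shlex

CATALOG = '''
PUBLIC "-//W3C//DTD SVG 1.1//EN" svg11.dtd
DTDDECL "-//W3C//DTD SVG 1.1//EN" /usr/share/xml/declaration/xml.dcl
'''


def load_catalog(text):
    """Build {keyword: {public identifier: target}} from three-token entries."""
    tokens = shlex.split(text)  # honours the quoted public identifiers
    table = {}
    for keyword, public_id, target in zip(tokens[0::3], tokens[1::3], tokens[2::3]):
        # Runs of whitespace in a public identifier count as a single space.
        table.setdefault(keyword, {})[" ".join(public_id.split())] = target
    return table


catalog = load_catalog(CATALOG)
print(catalog["PUBLIC"]["-//W3C//DTD SVG 1.1//EN"])
print(catalog["DTDDECL"]["-//W3C//DTD SVG 1.1//EN"])
```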
The schema for the alternative XML catalog format is itself defined in a DTD, which is in turn identified by an FPI. The format similarly allows the mappings of FPIs to paths to be expressed. Since it is intended for use only with XML, it does not support specifying an alternative SGML declaration,[41] although extensions exist to express the remainder of the information expressible in an SGML catalog.[43] The above DTD FPI mapping is represented as follows:[44]
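A lookup against an XML catalog can be sketched with the standard library. The catalog text below is an assumed rendering of the SVG 1.1 mapping using the `public` element and the OASIS XML Catalogs namespace, and `resolve_public` is a hypothetical helper; a real resolver would also handle rewrite rules, delegation and nested catalogs.

```python
import xml.etree.ElementTree as ET

NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"

# Assumed catalog content, written for this illustration.
CATALOG_XML = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD SVG 1.1//EN" uri="svg11.dtd"/>
</catalog>"""


def resolve_public(catalog_xml, public_id):
    """Return the URI mapped to public_id, or None if there is no entry."""
    root = ET.fromstring(catalog_xml)
    for entry in root.iter(NS + "public"):
        if entry.get("publicId") == public_id:
            return entry.get("uri")
    return None


print(resolve_public(CATALOG_XML, "-//W3C//DTD SVG 1.1//EN"))
```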
HTML versions 2 through 4 (including the XML-based XHTML 1.x) were defined as profiles of SGML, and specified with an SGML declaration and a document type definition (DTD). The particular DTD version in use was specified in a document type declaration using an FPI, sometimes (especially in the later versions, and required in XML as mentioned above) in combination with a URL for the DTD file as a system identifier.[15] In contrast to the SGML declaration for XML,[32] the SGML declaration for HTML enabled the feature,[7] meaning that public identifiers used for and within HTML DTDs were required to be FPIs.
A document type declaration (for HTML 4.01 Strict)[45] containing an FPI:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
The FPI in the document type declaration above reads "-//W3C//DTD HTML 4.01//EN",[27] while the URL "http://www.w3.org/TR/html4/strict.dtd" is given as a system identifier. The FPI was, strictly speaking, optional: it was also possible (but uncommon) to define a custom HTML DTD and omit the FPI; in this case, the inclusion of a system identifier without an FPI is signified by the "SYSTEM" keyword.[15] One example of such a custom system identifier without an associated FPI is:
Since they were principally intended for use by SGML validators, document type declarations were initially ignored by browsers. However, older web pages were designed to display correctly in the browsers in use at the time when they were created, which did not necessarily comply with the specifications for, for example, CSS in how they rendered web pages. Since this meant that improving their standards-compliance would cause browsers to display existing web pages incorrectly, browsers used the document type declaration to switch between "modes" under which the page would be rendered.[46]
"Quirks mode" retained legacy behaviour from earlier browser versions to avoid breaking existing pages—for example, Internet Explorer versions 6 and 7 would render the page using the Internet Explorer 5.5 box model. "Standards mode" would conform more closely to the relevant specifications. What was at the time called "almost standards mode" and initially implemented by Firefox and Safari would use traditional behaviour when determining the height of table cells containing images, but otherwise behave like standards mode; this corresponded to the behaviour of the "standards mode" of Internet Explorer at the time it was introduced.[15] [46]
For example, a DOCTYPE using the HTML 4.01 Strict FPI would trigger standards mode in Internet Explorer 6, meaning that it would use a content-box box model, while a DOCTYPE using the HTML 4.01 Transitional FPI would trigger quirks mode, including the use of an Internet Explorer 5.5 (border-box) box model.[15] In addition to the FPI, browsers would consider the presence or absence of a system identifier when deciding between quirks mode and standards mode. The absence of a DOCTYPE declaration altogether (or, for Internet Explorer 6, the DOCTYPE declaration not being the first line in the file) would trigger quirks mode.[46]
HTML 5 is not defined as a profile of SGML, except in its XHTML representation. As such, it is not defined using a DTD.
Early drafts for HTML 5 used the -type FPI in the DOCTYPE in place of a DTD FPI, since it did not activate Internet Explorer 6's quirks mode.[47] This was ultimately done away with altogether, and the final HTML 5 DOCTYPE does not use an FPI. The preferred form is simply "<!DOCTYPE html>" (with neither a public nor system identifier), although a system identifier of "about:legacy-compat" (using the "about" URI scheme) is condoned.[48]
The XML representation (XHTML), by contrast, is permitted but not required to bear any DOCTYPE, but no validating DTD is provided for the HTML 5 schema.[49] However, various FPIs for XHTML 1.0, XHTML 1.1 and MathML DTDs are defined as instead pointing to a URI (so as to avoid requiring network access) containing the definitions for the character entities.[50]
The sole function of an FPI in HTML 5's HTML (as opposed to XHTML) representation is triggering legacy modes. The WHATWG HTML standard specifies a list of which FPIs should trigger quirks mode. These include the FPIs for various vendor-customised HTML DTDs. They also include the FPIs for the DTDs of the various HTML 2.0 "levels", as well as those for HTML 3.0, 3.2 and the Transitional and Frameset versions of HTML 4.0 and 4.01—except that when the HTML 4.01 (but not HTML 4.0) Transitional and Frameset FPIs are accompanied by a system identifier, they instead trigger almost‑standards mode (renamed to "limited‑quirks mode"). The XHTML 1.0 Transitional and Frameset FPIs trigger limited‑quirks mode unconditionally. Mostly, these are specified as prefixes including the owner, class and description (but matching any language part).[51]
Increasingly, specifications use URIs rather than FPIs to handle the task of unique identification. For example, XML namespace names are URIs.
A Uniform Resource Name (URN) namespace has been defined to allow any FPI to be rewritten as a URI, replacing double slashes with colons. The earlier example may be written as the following URI:
urn:publicid:-:W3C:DTD+HTML+4.01:EN
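The rewriting can be sketched in a few lines. Beyond the double-slash rule stated above, the remaining substitutions (a double colon becoming a semicolon, a space becoming a plus sign, and percent-encoding of a few reserved characters) follow the transcription rules of RFC 3151 as recalled here and should be checked against that document; the function name `fpi_to_urn` is hypothetical.

```python
import re

# Transcription table; two-character sequences must be tried first.
_RULES = {
    "//": ":", "::": ";", " ": "+",
    "+": "%2B", ":": "%3A", "/": "%2F", ";": "%3B",
    "'": "%27", "?": "%3F", "#": "%23", "%": "%25",
}
_PATTERN = re.compile(r"//|::|[ +:/;'?#%]")


def fpi_to_urn(fpi: str) -> str:
    """Rewrite an FPI as a URN in the publicid namespace."""
    normalised = " ".join(fpi.split())  # trim and collapse whitespace
    return "urn:publicid:" + _PATTERN.sub(lambda m: _RULES[m.group()], normalised)


print(fpi_to_urn("-//W3C//DTD HTML 4.01//EN"))
# urn:publicid:-:W3C:DTD+HTML+4.01:EN
```

Doing the substitution in a single left-to-right pass avoids re-encoding characters that an earlier rule has just produced.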