Definition of the CIDOC Conceptual Reference Model
This page is the introductory page of the Definition of the CIDOC object-oriented Conceptual Reference Model and Crossreference Manual.
This document is the formal definition of the
CIDOC Conceptual Reference Model (“CRM”), a formal ontology intended to
facilitate the integration, mediation and interchange of heterogeneous cultural
heritage information. The CRM is the culmination of more than a decade of
standards development work by the International Committee for Documentation
(CIDOC) of the International Council of Museums (ICOM). Work on the CRM itself
began in 1996 under the auspices of the ICOM-CIDOC Documentation Standards
Working Group. Since 2000, development of the CRM has been officially delegated
by ICOM-CIDOC to the CIDOC CRM Special Interest Group, which collaborates with
the ISO working group ISO/TC46/SC4/WG9 to bring the CRM to the form and status
of an International Standard.
The primary role of the CRM is to enable information exchange and integration between heterogeneous sources of cultural heritage information. It aims at providing the semantic definitions and clarifications needed to transform disparate, localised information sources into a coherent global resource, be it within a larger institution, in intranets or on the Internet.
Its perspective is supra-institutional and abstracted from any specific local context. This goal determines the constructs and level of detail of the CRM.
More specifically, it defines, in terms of a formal ontology, the underlying semantics of database schemata and document structures used in cultural heritage and museum documentation, and is restricted to them. It does not define any of the terminology typically appearing as data in the respective data structures; however, it foresees the characteristic relationships for its use. It does not aim to propose what cultural institutions should document. Rather, it explains the logic of what they actually currently document, and thereby enables semantic interoperability.
It intends to provide an optimal analysis of the intellectual structure of cultural documentation in logical terms. As such, it is not optimised for implementation-specific storage and processing aspects. Rather, it provides the means to understand the effects of such optimisations on the semantic accessibility of the respective contents.
The CRM aims to support the following specific
functionalities:
Users of the CRM should be aware that the definition of data entry systems requires support for community-specific terminology, guidance as to what should be documented and in which sequence, and application-specific consistency controls. The CRM does not provide such notions.
By its very structure and formalism, the CRM is extensible and users are
encouraged to create extensions for the needs of more specialized communities
and applications.
The overall scope of the CIDOC CRM can be
summarised in simple terms as the curated knowledge of museums.
However, a more detailed and useful definition can be articulated by defining both the Intended Scope, a broad and maximally inclusive definition of general application principles, and the Practical Scope, which is expressed by the overall scope of a reference set of specific, identifiable museum documentation standards and practices that the CRM aims to encompass, although restricted in its details to the limitations of the Intended Scope.
The Intended Scope of the CRM may be defined as
all information required for the exchange and integration of heterogeneous
scientific documentation of museum collections. This definition requires
further elaboration:
The Practical Scope[2] of the CRM is expressed in terms of the current reference standards for museum documentation that have been used to guide and validate the CRM’s development. The CRM covers the same domain of discourse as the union of these reference standards; this means that data correctly encoded according to any of these museum documentation standards can be expressed in a CRM-compatible form, without any loss of meaning.
Users intending to take advantage of the semantic interoperability offered by the CRM may want to make parts of their data structures compatible with the CRM. The respective parts should pertain either to the associations by which users would like their data to be accessible in an integrated environment, or to contents intended for transport to other environments, so that the meaning encoded by their structure is preserved in another target system.
In that sense, the CRM does not propose that user documentation structures should match the CRM completely, nor that a user should always implement all CRM concepts and associations; rather, it is intended to leave room both for all kinds of extensions that capture the richness of cultural information and for simplifications made for reasons of economy.
Further, the CRM is a means to interpret structured information in such a way that large amounts of data content can be transformed or mediated automatically. As a consequence, the CRM does not aim to resolve free-text information into a formal logical form. In other words, it does not intend to provide more structuring than the users have done before, and free-text information does not fall under the scope of compatibility considerations. The CRM does, however, foresee the associations needed to transport such information in relation to structured information.
The CRM is a formal ontology, expressible in terms of logic or a suitable knowledge representation language. Its concepts can be instantiated as sets of statements that form models of the assumed reality referred to in a structured document. Any encoding of CRM instances in a formal language that preserves the relations to the CRM classes, properties and inheritance rules among them is regarded as a “CRM-compatible form”.
A part of a documentation structure is compatible with the CRM if a deterministic logical algorithm can be found that transforms any data correctly encoded in this structure into a
CRM-compatible form without loss of meaning. No assumptions are made about the
nature of this algorithm. It may in particular draw on other formal ontologies
expressing background knowledge such as thesauri. The algorithm itself can only
be found and verified intellectually by understanding the meaning intended by
the designer of the data structure and the CRM concepts. By the term “correctly
encoded” we mean that the data are encoded so that the meaning intended by the
designer of the data structure is correctly applied to the intended meaning of
the data.
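As an illustration only, here is a minimal sketch of such a deterministic transformation for a toy local table, assuming Python 3.9+. The field names (`object_id`, `owner`), the identifier scheme and the sample row are hypothetical, and the statement labels reuse class and property names from the terminology section of this document rather than the CRM's numbered identifiers. Finding the mapping remains the intellectual step described above; the sketch only shows that, once found, it can be applied mechanically and repeatably.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One subject-predicate-object statement, loosely in the style of RDF."""
    subject: str
    predicate: str
    obj: str


def record_to_statements(record: dict) -> list[Statement]:
    """Map one row of a hypothetical local 'objects' table to statements.

    The mapping is deterministic: the same row always yields the same
    statements. An empty 'owner' field yields no ownership statement
    rather than a placeholder value.
    """
    object_id = f"object/{record['object_id']}"  # hypothetical identifier scheme
    statements = [Statement(object_id, "is instance of", "Physical Man Made Object")]
    owner = record.get("owner")
    if owner:
        owner_id = f"actor/{owner}"
        statements.append(Statement(owner_id, "is instance of", "Actor"))
        # Read from the owner's side, as in "The Louvre is current owner of ...".
        statements.append(Statement(owner_id, "is current owner of", object_id))
    return statements


# Hypothetical sample row.
for statement in record_to_statements({"object_id": "42", "owner": "Louvre"}):
    print(statement)
```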
Information system implementers may choose to provide export facilities of selected data into a CRM-compatible form. They may further choose to provide a service to access selected data by querying with CRM concepts. It is not regarded as a loss of compatibility if certain subclasses and subproperties of the CRM are not supported in such a service. In that case, it is regarded as essential that the service publishes the set of CRM concepts it supports.
The CRM is a domain ontology in the sense used in computer science. It has been expressed as an object-oriented semantic model, in the hope that this formulation will be comprehensible to both documentation experts and information scientists, while at the same time being readily converted to machine-readable formats such as RDF Schema, KIF, DAML+OIL, OWL, STEP, etc. It can be implemented in any relational or object-oriented schema. CRM instances can also be encoded in RDF, XML, DAML+OIL, OWL and others.
Although the definition of the CRM provided
here is complete, it is an intentionally compact and concise presentation of
the CRM’s 81 classes and 132 unique properties. It does not attempt to
articulate the inheritance of properties by subclasses throughout the class
hierarchy (this would require the declaration of several thousand properties,
as opposed to 132). However, this definition does contain all of the
information necessary to infer and automatically generate a full declaration of
all properties, including inherited properties.
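The inference mentioned above can be sketched in a few lines: given each class's immediate superclasses and the properties declared directly on it, the full declaration of a class is the union of its own properties and those of all its direct and indirect superclasses. The class fragment below is simplified and hypothetical (it uses only classes named in the terminology section, not the CRM's actual hierarchy), the property labels are illustrative, and Python 3.9+ is assumed.

```python
# Simplified, hypothetical fragment of a class hierarchy:
# each class maps to its immediate superclasses.
SUPERCLASSES: dict[str, list[str]] = {
    "CRM Entity": [],
    "Actor": ["CRM Entity"],
    "Physical Object": ["CRM Entity"],
    "Biological Object": ["Physical Object"],
    "Person": ["Biological Object", "Actor"],  # multiple inheritance
}

# Properties declared directly on each class (as domain); labels are illustrative.
DECLARED: dict[str, set[str]] = {
    "CRM Entity": {"is identified by"},
    "Actor": {"is member of"},
    "Physical Object": {"was moved by"},
    "Person": {"died in"},
}


def ancestors(cls: str) -> set[str]:
    """All direct and indirect superclasses of cls (IsA is transitive)."""
    seen: set[str] = set()
    stack = list(SUPERCLASSES[cls])
    while stack:
        sup = stack.pop()
        if sup not in seen:
            seen.add(sup)
            stack.extend(SUPERCLASSES[sup])
    return seen


def full_declaration(cls: str) -> set[str]:
    """Properties declared on cls plus all properties inherited from its superclasses."""
    props = set(DECLARED.get(cls, set()))
    for sup in ancestors(cls):
        props |= DECLARED.get(sup, set())
    return props


for cls in SUPERCLASSES:
    print(f"{cls}: {sorted(full_declaration(cls))}")
```

Because the CRM applies strict inheritance (see the terminology section), no subclass can cancel an inherited property, so a plain set union is sufficient here.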
The following definitions of key terminology used in this document are provided both as an aid to readers unfamiliar with object-oriented modelling terminology and to specify, for the purpose of this document, the precise usage of terms that are sometimes applied inconsistently across the object-oriented modelling community. Where applicable, the editors have tried to consistently use terminology that is compatible with that of the Resource Description Framework (RDF)[3], a recommendation of the World Wide Web Consortium. The editors have tried to find a language which is comprehensible to the non-computer expert and precise enough for the computer expert so that both understand the intended meaning.
Class |
A class is a category of items that share one
or more common traits serving as criteria to identify the items
belonging to the class. These traits need not be explicitly
formulated in logical terms, but may be described in a text (here called a scope
note) that refers to a common conceptualisation of domain experts. The
sum of these traits is called the intension of the class. A class may
be the domain or range of none, one or more properties formally
defined in a model. The formally defined properties need not be part of the
intension of their domains or ranges: such properties are optional. An item
that belongs to a class is called an instance of this class. A class
is associated with an open set of real life instances, known as the extension
of the class. Here “open” is used in the sense that it is generally beyond
our capabilities to know all instances of a class in the world and indeed
that the future may bring new instances about at any time (Open World).
Therefore a class cannot be defined by enumerating its instances. A class
plays a role analogous to a grammatical noun, and can be completely defined
without reference to any other construct (unlike properties, which
must have an unambiguously defined domain and range). In some contexts, the
terms individual class, entity or node are used synonymously with class. For example: Person is a class. To be a Person may
actually be determined by DNA characteristics, but we all know what a Person
is. A Person may have the property of being a member of a Group, but it is
not necessary to be a member of a Group in order to be a Person. We shall never
know all Persons of the past. There will be more Persons in the future. |
subclass |
A subclass is a class that is a
specialization of another class (its superclass). Specialization or
the IsA relationship means that:
A subclass can have more than one immediate
superclass and consequently inherits the properties of all of its
superclasses (multiple inheritance). The IsA relationship or
specialization between two or more classes gives rise to a structure known as
a class hierarchy. The IsA relationship is transitive and may not be cyclic.
In some contexts (e.g. the programming language C++) the term derived class
is used synonymously with subclass. For example: Every Person IsA Biological Object, or Person
is a subclass of Biological Object. Also, every Person IsA Actor. A Person may
die. However, other kinds of Actors, such as companies, don’t die (cf. 2). Every
Biological Object IsA Physical Object. A Physical Object can be moved. Hence
a Person can also be moved (cf. 3). |
superclass |
A superclass is a class that is a
generalization of one or more other classes (its subclasses), which
means that it subsumes all instances of its subclasses, and that it
can also have additional instances that do not belong to any of its
subclasses. The intension of the superclass is less restrictive than that of
any of its subclasses. This subsumption relationship or generalization is the
inverse of the IsA relationship or specialization. In some contexts (e.g. the programming
language C++) the term parent class is used synonymously with superclass. For example: “Biological Object subsumes Person” is
synonymous with “Biological Object is a superclass of Person”. It needs fewer
traits to identify an item as a Biological Object than to identify it as a
Person. |
intension |
The intension of a class or property
is its intended meaning. It consists of one or more common traits shared
by all instances of the class or property. These traits need not be
explicitly formulated in logical terms, but may just be described in a text
(here called a scope note) that refers to a conceptualisation common to
domain experts. In particular the so-called primitive concepts, which
make up most of the CRM, cannot be further reduced to other concepts in
logical terms. |
extension |
The extension of a class is the set of
all real life instances belonging to the class that fulfil the
criteria of its intension. This set is “open” in the sense that it is
generally beyond our capabilities to know all instances of a class in the
world and indeed that the future may bring new instances about at any time (Open
World). An information system may at any point in time refer to some
instances of a class, which form a subset of its extension. |
scope note |
A scope note is a textual description of the intension
of a class or property. Scope notes are not formal modelling
constructs, but are provided to help explain the intended meaning and
application of the CRM’s classes and properties. Basically, they refer to a
conceptualisation common to domain experts and disambiguate between different
possible interpretations. Illustrative example instances of classes
and properties are also regularly provided in the scope notes for explanatory
purposes. |
instance |
An instance of a class is an item that
has the traits that match the criteria of the intension of the class. For example: the painting known as “The Mona Lisa” is an instance of the class Physical Man Made Object. An instance of a property is a factual
relation between an instance of the domain and an instance of the range
of the property that matches the criteria of the intension of the
property. For example: “The Louvre is current owner of The Mona Lisa” is an
instance of the property “is current owner of”. |
property |
A property serves to define a relationship of
a specific kind between two classes. The property is characterized by
an intension, which is conveyed by a scope note. A property
plays a role analogous to a grammatical verb, in that it must be defined with
reference to both its domain and range, which are analogous to
the subject and object in grammar (unlike classes, which can be defined
independently). It is arbitrary which class is selected as the domain, just
as the choice between active and passive voice in grammar is arbitrary. In
other words, a property can be interpreted in both directions, with two
distinct but related interpretations. Properties may themselves have
properties that relate to other classes (this feature is used in this model
only in order to describe dynamic subtyping of properties). Properties can
also be specialized in the same manner as classes, resulting in IsA
relationships between subproperties and their superproperties. In some contexts, the terms attribute,
reference, link, role or slot are used synonymously with property. For example: “Physical Man-Made Stuff depicts CRM Entity” is equivalent
to “CRM Entity is depicted by Physical Man-Made Stuff”. |
subproperty |
A subproperty is a property that is a
specialization of another property (its superproperty). Specialization
or IsA relationship means that:
A subproperty can have more than one
immediate superproperty and consequently inherits the properties of all of
its superproperties (multiple inheritance). The IsA relationship or
specialization between two or more properties gives rise to the structure we
call a property hierarchy. The IsA relationship is transitive and may not be
cyclic. Some object-oriented languages, such as C++,
have no equivalent to the specialization of properties. |
superproperty |
A superproperty is a property that is
a generalization of one or more other properties (its subproperties),
which means that it subsumes all instances of its subproperties, and
that it can also have additional instances that do not belong to any of its
subproperties. The intension of the superproperty is less restrictive
than that of any of its subproperties. The subsumption relationship or generalization
is the inverse of the IsA relationship or specialization. |
domain |
The domain is the class for which a property
is formally defined. This means that instances of the property are
applicable to instances of its domain class. A property must have exactly one
domain, although the domain class may always contain instances for which the
property is not instantiated. The domain class is analogous to the
grammatical subject of the phrase for which the property is analogous to the
verb. It is arbitrary which class is selected as the domain and which as the
range, just as the choice between active and passive voice in grammar
is arbitrary. Property names in the CRM are designed to be semantically
meaningful and grammatically correct when read from domain to range. In
addition, the inverse property name, normally given in parentheses, is also
designed to be semantically meaningful and grammatically correct when read
from range to domain. |
range |
The range is the class that comprises
all potential values of a property. That means that instances
of the property can link only to instances of its range class. A property
must have exactly one range, although the range class may always contain
instances that are not the value of the property. The range class is
analogous to the grammatical object of a phrase for which the property is
analogous to the verb. It is arbitrary which class is selected as the domain
and which as the range, just as the choice between active and passive voice in
grammar is arbitrary. Property names in the CRM are designed to be
semantically meaningful and grammatically correct when read from domain to
range. In addition, the inverse property name, normally given in
parentheses, is also designed to be semantically meaningful and grammatically
correct when read from range to domain. |
inheritance |
Inheritance of properties from superclasses
to subclasses means that if an item x is an instance of a class
A, then
all optional properties that may hold for the instances of any of the
superclasses of A may also hold for item x. |
strict inheritance |
Strict inheritance means that there
are no exceptions to the inheritance of properties from superclasses
to subclasses. For instance, some systems may declare that elephants
are grey, and regard a white elephant as an exception. Under strict
inheritance it would hold that: if all elephants were grey, then a white
elephant could not be an elephant. Obviously not all elephants are grey. To
be grey is not part of the intension of the concept elephant but an
optional property. The CRM applies strict inheritance as a normalization
principle. |
multiple inheritance |
Multiple inheritance
means that a class A may have more than one immediate superclass.
The extension of a class with multiple immediate superclasses is a
subset of the intersection of all extensions of its superclasses. The intension
of a class with multiple immediate superclasses extends the intensions of all
its superclasses, i.e. its traits are more restrictive than those of any of its superclasses.
If multiple inheritance is used, the resulting “class hierarchy” is a
directed graph and not a tree structure. If it is represented as an indented
list, there are necessarily repetitions of the same class at different
positions in the list. For example,
Person is both an Actor and a Biological Object. |
instance |
An instance of
a class is a real world item that fulfils the criteria of the intension
of the class. Note that the number of instances declared for a class
in an information system is typically less than the total in the real world.
For example, you are an instance of Person, but you are not mentioned in all
information systems describing Persons. |
endurant, perdurant |
“The difference between enduring
and perduring entities (which we shall also call endurants and perdurants)
is related to their behaviour in time. Endurants are wholly present (i.e.,
all their proper parts are present) at any time they are present. Perdurants,
on the other hand, just extend in time by accumulating different temporal
parts, so that, at any time they are present, they are only partially
present, in the sense that some of their proper temporal parts (e.g., their
previous or future phases) may be not present. E.g., the piece of paper you
are reading now is wholly present, while some temporal parts of your reading
are not present any more. Philosophers say that endurants are entities that
are in time, while lacking however temporal parts (so to speak, all their
parts flow with them in time). Perdurants, on the other hand, are entities
that happen in time, and can have temporal parts (all their parts are fixed
in time).” |
shortcut |
A shortcut is a formally defined single property
that represents a deduction or join of a data path in the CRM. The scope
notes of all properties characterized as shortcuts describe in words the
equivalent deduction. Shortcuts are introduced for the cases where common
documentation practice refers only to the deduction rather than to the fully
developed path. For example, museums often only record the dimension of an
object without documenting the Measurement Event that observed it. The CRM
allows shortcuts as cases of less detailed knowledge, while preserving in its
schema the relationship to the full information. |
monotonic reasoning |
Monotonic
reasoning is a term from knowledge representation. A reasoning form is
monotonic if an addition to the set of propositions making up the knowledge
base never leads to a decrease in the set of conclusions that may be
derived from the knowledge base via inference rules. In practical terms, if
experts successively enter correct statements into an information system, the
system should not regard any results derived from earlier statements as invalid
when a new one is entered. The CRM is designed for monotonic reasoning and so
enables conflict-free merging of huge stores of knowledge. |
disjoint |
Classes are disjoint if the intersection of their extensions
is an empty set. In other words, they have no common instances in any
possible world. |
primitive |
The term primitive as used in knowledge
representation characterizes a concept that is declared and whose meaning is
agreed upon, but that is not defined by a logical deduction from other
concepts. For example, mother may be described as a female human with child.
Then mother is not a primitive concept. Event however is a primitive concept.
Most of the CRM is made up of primitive
concepts. |
Open World |
The “Open World Assumption” is a term from
knowledge base systems. It characterizes knowledge base systems that assume
the information stored is incomplete relative to the universe of discourse
they intend to describe. This incompleteness may be due to the inability of
the maintainer to provide sufficient information or due to more fundamental
problems of cognition in the system’s domain. Such problems are
characteristic of cultural information systems. Our records about the past
are necessarily incomplete. In addition, there may be items that cannot be
clearly assigned to a given class. In particular, absence of a certain property
for an item described in the system does not mean that this item does not
have this property. For example, if one item is described as Biological
Object and another as Physical Object, this does not imply that the latter
may not be a Biological Object as well. Therefore complements of a
class with respect to a superclass cannot be concluded in general from
an information system using the Open World Assumption. For example, one
cannot list “all Physical Objects known to the system that are not Biological
Objects in the real world”, but one may of course list “all items known to
the system as Physical Objects but that are not known to the system as
Biological Objects”. |
complement |
The complement of a class A with
respect to one of its superclasses B is the set of all instances
of B that are not instances of A. Formally, it is the set-theoretic
difference of the extension of B minus the extension of A. Compatible
extensions of the CRM should not declare any class with the intension
of them being the complement of one or more other classes. To do so will
normally violate the desire to describe an Open World. For example,
for all possible cases of human gender, male should not be declared as the
complement of female or vice versa. What if someone is both or even of
another kind? |
query containment |
Query
containment is a problem from database theory: a query X contains another
query Y if, for each possible population of a database, the answer set to
query X also contains the answer set to query Y. If queries X and Y were
classes, then X would be a superclass of Y. |
interoperability |
Interoperability means the capability of
different information systems to communicate some of their contents. In
particular, it may mean that
Generally, syntactic interoperability
is distinguished from semantic interoperability. Syntactic
interoperability means that the information encoding of the involved systems
and the access protocols are compatible, so that information can be processed
as described above without error. However, this does not mean that each
system processes the data in a manner consistent with the intended meaning.
For example, one system may use a table called “Actor” and another one called
“Agent”. With syntactic interoperability, data from both tables may only be
retrieved as distinct, even though they may have exactly the same meaning. To
overcome this situation, semantic interoperability has to be added. The CRM
relies on existing syntactic interoperability and is concerned only with
adding semantic interoperability. |
semantic interoperability |
Semantic interoperability means the
capability of different information systems to communicate information
consistent with the intended meaning. In more detail, the intended meaning
encompasses
Obviously communication about data structure
must be resolved first. In this case consistent communication means that data
can be transferred between data structure elements with the same intended
meaning or that data from elements with the same intended meaning can be
merged. In practice, the different levels of generalization in different
systems do not allow the achievement of this ideal. Therefore semantic
interoperability is regarded as achieved if elements can be found that
provide a reasonably close generalization for the transfer or merge. This
problem is being studied theoretically as the query containment problem.
The CRM is only concerned with semantic interoperability on the level of data
structure elements. |
property quantifiers |
We use the
term property quantifiers for the declaration of the allowed number of instances
of a certain property that an instance of its range or domain
may have. These declarations are ontological, i.e. they refer to the nature
of the real world described and not to our current knowledge. For example,
each person has exactly one father, but collected knowledge may refer to
none, one or many. |
universal |
The fundamental ontological distinction
between universals and particulars can be informally understood by
considering their relationship with instantiation: particulars are entities
that have no instances in any possible world; universals are entities
that do have instances. Classes and properties (corresponding
to predicates in a logical language) are usually considered to be universals.
(after Gangemi et al. 2002, pp. 166-181). |
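Several entries in the table above (subclass, IsA, multiple inheritance) describe structural rules that can be checked mechanically. The sketch below, assuming Python 3.9+ and a simplified, hypothetical fragment built only from classes named in the table, tests IsA transitively, detects the cycles the rules forbid, and renders the hierarchy as an indented list in which Person necessarily appears twice.

```python
from typing import Optional

# Simplified, hypothetical fragment: each class maps to its immediate superclasses.
SUPERCLASSES: dict[str, list[str]] = {
    "CRM Entity": [],
    "Actor": ["CRM Entity"],
    "Physical Object": ["CRM Entity"],
    "Biological Object": ["Physical Object"],
    "Person": ["Biological Object", "Actor"],  # multiple inheritance
}


def is_a(sub: str, sup: str, hierarchy: dict[str, list[str]] = SUPERCLASSES) -> bool:
    """True if sub IsA sup through one or more IsA steps (transitivity)."""
    stack, seen = list(hierarchy.get(sub, [])), set()
    while stack:
        cls = stack.pop()
        if cls == sup:
            return True
        if cls not in seen:
            seen.add(cls)
            stack.extend(hierarchy.get(cls, []))
    return False


def find_cycle(hierarchy: dict[str, list[str]] = SUPERCLASSES) -> Optional[list[str]]:
    """Return one IsA cycle (which the rules forbid), or None if there is none."""
    state: dict[str, str] = {}  # "active" while on the current DFS path, then "done"
    path: list[str] = []

    def visit(cls: str) -> Optional[list[str]]:
        state[cls] = "active"
        path.append(cls)
        for sup in hierarchy.get(cls, []):
            if state.get(sup) == "active":
                return path[path.index(sup):] + [sup]
            if sup not in state:
                found = visit(sup)
                if found:
                    return found
        path.pop()
        state[cls] = "done"
        return None

    for cls in hierarchy:
        if cls not in state:
            found = visit(cls)
            if found:
                return found
    return None


def indented_list(root: str, hierarchy: dict[str, list[str]] = SUPERCLASSES,
                  depth: int = 0) -> list[str]:
    """Render the classes below root as an indented list, one class per line."""
    lines = ["  " * depth + root]
    for cls, sups in hierarchy.items():
        if root in sups:
            lines.extend(indented_list(cls, hierarchy, depth + 1))
    return lines


print(is_a("Person", "Physical Object"))  # True
print(find_cycle())                        # None
print("\n".join(indented_list("CRM Entity")))
```

The repeated Person line is the repetition predicted by the multiple inheritance entry: the hierarchy is a directed graph, not a tree.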
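The shortcut entry can be illustrated by deriving a shortcut from the fully developed path: wherever an object was measured by a Measurement event that observed a dimension, the object can be said to have that dimension. The property labels and identifiers below are illustrative assumptions, not the CRM's numbered property names, and this Python 3.9+ sketch handles only two-step paths.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def derive_shortcut(triples: set[Triple], first: str, second: str,
                    shortcut: str) -> set[Triple]:
    """For every path x -first-> e -second-> y, derive the statement x -shortcut-> y."""
    # Index the second step of the path by its subject (the intermediate node e).
    targets: dict[str, list[str]] = {}
    for t in triples:
        if t.predicate == second:
            targets.setdefault(t.subject, []).append(t.obj)
    # Join each first step with the matching second steps.
    return {
        Triple(t.subject, shortcut, y)
        for t in triples
        if t.predicate == first
        for y in targets.get(t.obj, [])
    }


# Hypothetical full path: the vase was measured in measurement-9, which observed a height.
full_path = {
    Triple("vase-1", "was measured by", "measurement-9"),
    Triple("measurement-9", "observed dimension", "height 32 cm"),
}
print(derive_shortcut(full_path, "was measured by", "observed dimension", "has dimension"))
```

The derivation only runs one way: a record holding just the shortcut says nothing about which Measurement event observed the dimension, which is the less detailed knowledge the entry refers to.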
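The monotonic reasoning entry can be checked on a toy knowledge base whose only inference rule is the IsA rule: if an item is an instance of a class, it is also an instance of every superclass of that class. The hierarchy fragment and item names are hypothetical, and reasoning over the full CRM involves more than this one rule; the Python 3.9+ sketch only shows that adding a statement enlarges, and never shrinks, the set of conclusions.

```python
# Hypothetical hierarchy fragment: each class maps to its immediate superclasses.
SUPERCLASSES: dict[str, list[str]] = {
    "Person": ["Biological Object", "Actor"],
    "Biological Object": ["Physical Object"],
    "Physical Object": [],
    "Actor": [],
}


def conclusions(facts: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Close (item, class) statements under: x in A and A IsA B implies x in B."""
    derived = set(facts)
    frontier = list(facts)
    while frontier:
        item, cls = frontier.pop()
        for sup in SUPERCLASSES.get(cls, []):
            if (item, sup) not in derived:
                derived.add((item, sup))
                frontier.append((item, sup))
    return derived


before = conclusions({("item-1", "Person")})
after = conclusions({("item-1", "Person"), ("item-2", "Physical Object")})
print(sorted(before))
print(before <= after)  # True: no earlier conclusion is withdrawn
```

A rule that concluded something from the absence of a statement (for instance, "not recorded as a Person, therefore not a Person") would break this property, since a later statement could invalidate that conclusion.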
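The Open World entry distinguishes what a system can answer from what holds in the real world. The following is a minimal sketch, assuming Python 3.9+ and hypothetical item identifiers, of the only form of the "Physical Objects that are not Biological Objects" query that such a system can safely answer.

```python
# What a hypothetical information system knows: item -> classes recorded for it.
KNOWN_CLASSES: dict[str, set[str]] = {
    "item-1": {"Physical Object", "Biological Object"},
    "item-2": {"Physical Object"},
    "item-3": {"Actor"},
}


def known_as_but_not_known_as(cls: str, other: str) -> set[str]:
    """Items known to the system as `cls` but not known to the system as `other`.

    Under the Open World Assumption this is NOT the set of items that are
    `cls` and not `other` in the real world: item-2 may well be a
    Biological Object that nobody has recorded as such.
    """
    return {
        item
        for item, classes in KNOWN_CLASSES.items()
        if cls in classes and other not in classes
    }


print(known_as_but_not_known_as("Physical Object", "Biological Object"))  # {'item-2'}
```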
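The Actor/Agent example in the interoperability entry can be made concrete. In the Python 3.9+ sketch below, two hypothetical systems store records of the same kind in tables with different names. Merging by table name alone, which is all that syntactic interoperability provides, keeps them apart; merging through a mapping to a shared concept brings them together. The table contents and the mapping are assumptions for illustration, and the sketch also assumes that identical identifiers denote the same item, which real integration cannot take for granted.

```python
# Two hypothetical systems whose tables have the same meaning but different names.
SYSTEM_A: dict[str, set[str]] = {"Actor": {"actor-1", "actor-2"}}
SYSTEM_B: dict[str, set[str]] = {"Agent": {"actor-2", "actor-3"}}

# Hypothetical mapping of local table names to a shared concept.
TABLE_TO_CONCEPT = {"Actor": "Actor", "Agent": "Actor"}


def merge_by_table_name(*systems: dict[str, set[str]]) -> dict[str, set[str]]:
    """Syntactic merge: rows are grouped only by their local table name."""
    merged: dict[str, set[str]] = {}
    for system in systems:
        for table, rows in system.items():
            merged.setdefault(table, set()).update(rows)
    return merged


def merge_by_concept(*systems: dict[str, set[str]]) -> dict[str, set[str]]:
    """Semantic merge: rows are grouped by the shared concept their table maps to."""
    merged: dict[str, set[str]] = {}
    for system in systems:
        for table, rows in system.items():
            merged.setdefault(TABLE_TO_CONCEPT[table], set()).update(rows)
    return merged


print(merge_by_table_name(SYSTEM_A, SYSTEM_B))  # "Actor" and "Agent" stay separate
print(merge_by_concept(SYSTEM_A, SYSTEM_B))     # a single "Actor" group
```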
Quantifiers for
properties are provided for the purpose of semantic clarification only, and
should not be treated as implementation recommendations. The CRM has
been designed to accommodate alternative opinions and incomplete information,
and therefore all properties should be implemented as optional and
repeatable for their domain and range (“many to many (0,n:0,n)”). Therefore the
term “cardinality constraints” is avoided here, as it typically pertains to
implementations.
The following table lists all possible property quantifiers occurring in this document by their notation, together with an explanation in plain words. In order to provide optimal clarity, two widely accepted notations are used redundantly in this document, a verbal and a numeric one. The verbal notation uses phrases such as “one to many”, and the numeric one, expressions such as “(0,n:0,1)”. While the terms “one”, “many” and “necessary” are quite intuitive, the term “dependent” denotes a situation where a range instance cannot exist without an instance of the respective property. In other words, the property is “necessary” for its range.
many to many (0,n:0,n) |
Unconstrained: An individual domain instance and range instance of this property can have zero, one or more instances of this property. In other words, this property is optional and repeatable for its domain and range. |
one to many (0,n:0,1) |
An individual domain
instance of this property can have zero, one or more instances of this
property, but an individual range instance cannot be referenced by more than
one instance of this property. In other words, this property is optional for
its domain and range, but repeatable for its domain only. In some contexts
this situation is called a “fan-out”. |
many to one (0,1:0,n) |
An individual domain instance of this property can have zero or one instance of this property, but an individual range instance can be referenced by zero, one or more instances of this property. In other words, this property is optional for its domain and range, but repeatable for its range only. In some contexts this situation is called a “fan-in”. |
many to many, necessary (1,n:0,n) |
An individual domain instance of this property can have one or more instances of this property, but an individual range instance can have zero, one or more instances of this property. In other words, this property is necessary and repeatable for its domain, and optional and repeatable for its range. |
one to many, necessary (1,n:0,1) |
An individual domain instance of this
property can have one or more instances of this property, but an individual
range instance cannot be referenced by more than one instance of this property.
In other words, this property is necessary and repeatable for its domain, and
optional but not repeatable for its range. In some contexts this situation is
called a “fan-out”. |
many to one, necessary (1,1:0,n) |
An individual domain instance of this property
must have exactly one instance of this property, but an individual range
instance can be referenced by zero, one or more instances of this property.
In other words, this property is necessary and not repeatable for its domain,
and optional and repeatable for its range. In some contexts this situation is
called a “fan-in”. |
one to many, dependent (0,n:1,1) |
An individual domain instance of this
property can have zero, one or more instances of this property, but an
individual range instance must be referenced by exactly one instance of this
property. In other words, this property is optional and repeatable for its
domain, but necessary and not repeatable for its range. In some contexts this
situation is called a “fan-out”. |
one to many, necessary, dependent (1,n:1,1) |
An individual domain instance of this
property can have one or more instances of this property, but an individual
range instance must be referenced by exactly one instance of this property.
In other words, this property is necessary and repeatable for its domain, and
necessary but not repeatable for its range. In some contexts this situation
is called a “fan-out”. |
many to one, necessary, dependent (1,1:1,n) |
An individual domain instance of this
property must have exactly one instance of this property, but an individual
range instance can be referenced by one or more instances of this property.
In other words, this property is necessary and not repeatable for its domain,
and necessary and repeatable for its range. In some contexts this situation
is called a “fan-in”. |
one to one (1,1:1,1) |
An individual domain instance and range
instance of this property must have exactly one instance of this property. In
other words, this property is necessary and not repeatable for its domain and
for its range. |
The CRM defines
some properties as being necessary for their domain or as being dependent
on their range, following the definitions in the table above. Note
that if such a property is not specified for an instance of the respective
domain or range, it means that the property exists, but the value on one side
of the property is unknown. In the case of optional properties, the methodology
proposed by the CRM does not distinguish between a value being unknown and the
property not being applicable at all. For example, one may know that an object
has an owner, but the owner is unknown. In a CRM instance this case cannot be
distinguished from the fact that the object has no owner at all. Of course,
such details can always be specified by a textual note.
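The cardinality notation in the table above can be checked mechanically. The following Python sketch is illustrative only. The functions `parse_cardinality` and `cardinality_report` are hypothetical, and property instances are modelled as plain (domain instance, range instance) pairs. Consistent with the paragraph above, a count below the declared minimum is reported as an unknown value rather than as an error. Only a count above the maximum contradicts the declaration.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Bounds = Tuple[int, Optional[int]]  # (minimum, maximum); None stands for "n" (unbounded)


def parse_cardinality(spec: str) -> Tuple[Bounds, Bounds]:
    """Parse notation such as "(1,n:0,1)" into domain-side and range-side bounds."""
    domain_part, range_part = spec.strip("() ").split(":")

    def bounds(part: str) -> Bounds:
        low, high = part.split(",")
        return int(low), (None if high.strip() == "n" else int(high))

    return bounds(domain_part), bounds(range_part)


def cardinality_report(spec: str,
                       pairs: Iterable[Tuple[str, str]],
                       domain_instances: Iterable[str],
                       range_instances: Iterable[str]) -> List[Tuple[str, str, str, int]]:
    """Return (side, instance, finding, count) for every count outside the bounds."""
    (dmin, dmax), (rmin, rmax) = parse_cardinality(spec)
    pairs = list(pairs)
    per_domain = Counter(d for d, _ in pairs)  # property instances per domain instance
    per_range = Counter(r for _, r in pairs)   # property instances referencing each range instance
    report = []
    for side, instances, counts, low, high in (
        ("domain", domain_instances, per_domain, dmin, dmax),
        ("range", range_instances, per_range, rmin, rmax),
    ):
        for inst in sorted(instances):
            n = counts[inst]
            if n < low:
                # Necessary property not specified: it exists, but its value is unknown.
                report.append((side, inst, "value unknown", n))
            elif high is not None and n > high:
                report.append((side, inst, "exceeds maximum", n))
    return report
```

For "one to many, necessary (1,n:0,1)", a domain instance with no property instance is reported as "value unknown". A range instance referenced twice is reported as "exceeds maximum".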
The following naming conventions have been applied throughout the CRM:
· Classes are identified by numbers preceded by the letter “E” (historically classes were sometimes referred to as “Entities”), and are named with noun phrases (nominal groups) in title case (initial capitals). For example, E63 Beginning of Existence.
· Properties are identified by numbers preceded by the letter “P”, and are named in both directions using verbal phrases in lower case. Properties with the character of states are named in the present tense, such as “has type”, whereas properties related to events are named in the past tense, such as “carried out”. For example, P126 employed (was employed by).
· Property names should be read in their non-parenthetical form for the domain-to-range direction, and in their parenthetical form for the range-to-domain direction.
· Properties with a range that is a subclass of E59 Primitive Value (for example, E1 CRM Entity. P3 has note: E62 String) have no parenthetical name form, because reading the property name in the range-to-domain direction is not regarded as meaningful.
· Properties that have identical domain and range are either symmetric or transitive. Instantiating a symmetric property implies that the same relation holds for both the domain-to-range and the range-to-domain directions. An example of this is E53 Place. P122 borders with: E53 Place. The names of symmetric properties have no parenthetical form, because reading in the range-to-domain direction is the same as the domain-to-range reading. Transitive asymmetric properties, such as E4 Period. P9 consists of (forms part of): E4 Period, have a parenthetical form that relates to the meaning of the inverse direction.
· The choice of the domain of properties, and hence the order of their names, is established in accordance with the following priority list:
  · Temporal Entity and its subclasses
  · Stuff and its subclasses
  · Actor and its subclasses
  · Other
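The reading conventions above can be illustrated with a small parser for property labels. This is a sketch under stated assumptions. The label format "P<number> <name> (<inverse name>)" is taken from the examples in the text. The functions `parse_property_label` and `readings` and the `symmetric` flag are hypothetical. Symmetry is not encoded in the label, so the caller must supply that flag.

```python
import re
from typing import Optional, Tuple

# "P<number> <forward name>" optionally followed by "(<inverse name>)".
_LABEL = re.compile(r"^(P\d+)\s+([^()]+?)\s*(?:\(([^()]+)\))?$")


def parse_property_label(label: str) -> Tuple[str, str, Optional[str]]:
    """Split a property label into (identifier, forward name, inverse name or None)."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a CRM property label: {label!r}")
    pid, forward, inverse = match.groups()
    return pid, forward.strip(), (inverse.strip() if inverse else None)


def readings(domain: str, label: str, range_: str,
             symmetric: bool = False) -> Tuple[str, Optional[str]]:
    """Return the domain-to-range and range-to-domain readings of a statement."""
    _, forward, inverse = parse_property_label(label)
    domain_to_range = f"{domain} {forward} {range_}"
    if inverse is not None:
        # Parenthetical form: used for the range-to-domain direction.
        range_to_domain = f"{range_} {inverse} {domain}"
    elif symmetric:
        # Symmetric properties read the same way in both directions.
        range_to_domain = f"{range_} {forward} {domain}"
    else:
        # e.g. a range under E59 Primitive Value: no meaningful inverse reading.
        range_to_domain = None
    return domain_to_range, range_to_domain
```

For example, `readings("Austria", "P122 borders with", "Germany", symmetric=True)` yields the same relation in both directions. `readings("X", "P3 has note", "some text")` has no range-to-domain reading.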
The following
modelling principles have guided and informed the development of the CIDOC CRM.
Because the CRM’s primary role is the
meaningful integration of information in an Open World, it aims to be monotonic
in the sense of Domain Theory. That is, the existing CRM constructs and the
deductions made from them must always remain valid and well-formed, even as new
constructs are added by extensions to the CRM.
For example:
One may add a subclass of E7 Activity to
describe the practice of a group using a certain name for a place
over a certain time-span. This extension compromises no existing IsA relationships or
property inheritances.
In addition, the CRM aims to enable the formal
preservation of monotonicity when augmenting a particular CRM compatible
system. That is, existing CRM instances, their properties and deductions made
from them, should always remain valid and well-formed, even as new instances,
regarded as consistent by the domain expert, are added to the system.
For example:
If someone correctly describes an item as
an instance of E19 Physical Object, and it is later correctly characterized as
an instance of E20 Biological Object, the system should not stop treating it as
an instance of E19 Physical Object.
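The monotonic behaviour described in this example can be sketched as an add-only store of class assertions, where every class membership is expanded with its superclasses. The sketch is hypothetical. The `IS_A` table is a deliberately tiny fragment of the hierarchy, and `KnowledgeBase` is not part of any CRM implementation.

```python
from typing import Dict, Set, Tuple

# Hypothetical fragment of the IsA hierarchy (subclass -> direct superclasses);
# intermediate and sibling classes are omitted.
IS_A: Dict[str, Set[str]] = {
    "E20 Biological Object": {"E19 Physical Object"},
    "E19 Physical Object": {"E18 Physical Stuff"},
}


def with_superclasses(cls: str) -> Set[str]:
    """Return cls together with all of its direct and indirect superclasses."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, ()))
    return seen


class KnowledgeBase:
    """Class assertions are only ever added, never retracted."""

    def __init__(self) -> None:
        self._assertions: Set[Tuple[str, str]] = set()

    def assert_class(self, item: str, cls: str) -> None:
        self._assertions.add((item, cls))

    def classes_of(self, item: str) -> Set[str]:
        classes: Set[str] = set()
        for asserted_item, cls in self._assertions:
            if asserted_item == item:
                classes |= with_superclasses(cls)
        return classes
```

After an item is first asserted as E19 Physical Object and later as E20 Biological Object, its set of classes only grows. E19 Physical Object is never lost.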
In order to formally preserve monotonicity for
the frequent cases of alternative opinions, all formally defined properties
should be implemented as unconstrained (many:many) so that conflicting
instances of properties are merely accumulated. Thus knowledge integrated
following the CRM serves as a research base, accumulating relevant alternative
opinions around well-defined entities, whereas conclusions about the truth are
the task of open-ended scientific or scholarly hypothesis building.
For example:
El Greco, and even King Arthur, should always
remain instances of E21 Person and be treated as existing within the sense
of our discourse once they have been entered into our knowledge base. Alternative
opinions about properties, such as their birthplaces and their places of residence,
should be accumulated without validity decisions being made during data
compilation.
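Treating properties as unconstrained (many:many) means that conflicting statements are simply collected. The following Python sketch illustrates this under several assumptions. The `Statement` record, its `source` field and the `OpinionStore` class are all hypothetical. The property name "birthplace" is a simplification: the CRM would express a birthplace through an event rather than through a direct property. The place and source names are placeholders, not data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Set, Tuple


@dataclass(frozen=True)
class Statement:
    subject: str
    prop: str      # property name; no uniqueness constraint is attached to it
    value: str
    source: str    # hypothetical provenance field: whose opinion this is


class OpinionStore:
    """Accumulates statements as many:many; nothing is overwritten or rejected."""

    def __init__(self) -> None:
        self._by_key: DefaultDict[Tuple[str, str], Set[Statement]] = defaultdict(set)

    def add(self, statement: Statement) -> None:
        self._by_key[(statement.subject, statement.prop)].add(statement)

    def values(self, subject: str, prop: str) -> Set[str]:
        """All values asserted so far, whether or not they agree."""
        return {s.value for s in self._by_key.get((subject, prop), set())}

    def sources_for(self, subject: str, prop: str, value: str) -> Set[str]:
        """Who asserted a particular value."""
        return {s.source for s in self._by_key.get((subject, prop), set()) if s.value == value}
```

Deciding which value is true is left to later scholarly analysis. The store itself only records who said what.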
Although the scope of the CRM is very broad,
the model itself is constructed as economically as possible.
· A class is not
declared unless it is required as the domain or range of a property not
appropriate to its superclass, or it is a key concept in the practical scope.
· CRM classes and
properties that share a superclass are non-exclusive by default. For example,
an object may be both an instance of E20 Biological Object and E22 Man-made
Object.
· CRM classes and
properties are either primitive, or they are key concepts in the practical
scope.
· Complements of CRM
classes are not declared.
Some properties are declared as shortcuts of
longer, more comprehensively articulated paths that connect the same domain and
range classes as the shortcut property via one or more intermediate classes.
For example, the property E18 Physical Stuff. P52 has current owner (is
current owner of): E39 Actor, is a shortcut for a fully articulated path
from E18 Physical Stuff through E8 Acquisition to E39 Actor. An instance of the
fully-articulated path always implies an instance of the shortcut property.
However, the converse does not hold in general: an instance of the fully-articulated path
cannot always be inferred from an instance of the shortcut property.
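The one-way nature of a shortcut can be illustrated by deriving P52 has current owner from acquisition events. This is a sketch under stated assumptions. The path is taken to run through P24 transferred title of (to E18 Physical Stuff) and P22 transferred title to (to E39 Actor). Those two property names are not given in the text above. The sketch also assumes, as a simplification, that the current owner is the recipient of the most recent acquisition, ordered by a hypothetical `sort_date` field.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Acquisition:
    """Stands in for an instance of E8 Acquisition."""
    label: str
    transferred_title_of: str  # assumed P24 transferred title of: E18 Physical Stuff
    transferred_title_to: str  # assumed P22 transferred title to: E39 Actor
    sort_date: int             # hypothetical sortable date, not a CRM property


def infer_current_owner(thing: str, acquisitions: Iterable[Acquisition]) -> Optional[str]:
    """Derive the P52 has current owner shortcut from the most recent acquisition."""
    relevant = [a for a in acquisitions if a.transferred_title_of == thing]
    if not relevant:
        # No articulated path: nothing can be inferred, although P52 may still
        # have been asserted directly as a shortcut.
        return None
    latest = max(relevant, key=lambda a: a.sort_date)
    return latest.transferred_title_to
```

The path always yields the shortcut. Given only a P52 statement, however, there is no way to reconstruct which acquisition took place, or when.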
The class E13
Attribute Assignment allows for the documentation of how the assignment of any
property came about, and whose opinion it was, even in cases of properties not
explicitly characterized as “shortcuts”.
Classes are disjoint if they share no common instances in any possible world. There are many examples of disjoint classes in the CRM.
A comprehensive declaration of all possible disjoint class combinations afforded by the CRM has not been provided here; it would be of questionable practical utility, and would conflict with the goal of providing a concise definition. However, there are two key examples of disjoint class pairs that are fundamental to an effective understanding of the CRM:
· E2 Temporal Entity is
disjoint from E77 Persistent Item. Instances of the class E2 Temporal Entity are
perdurants, whereas instances of the class E77 Persistent Item are endurants.
Even though instances of E77 Persistent Item have a limited existence in time,
they are fundamentally different in nature from instances of E2 Temporal
Entity, because they preserve their identity between events. Declaring
endurants and perdurants as disjoint classes is consistent with the
distinctions made in data structures that fall within the CRM’s practical
scope.
·
E18 Physical Stuff is disjoint from E28 Conceptual Object. The distinction is
between material and immaterial items, the latter being exclusively man-made.
Instances of E18 Physical Stuff and E28 Conceptual Object differ in many
fundamental ways; for example, the production of instances of E18 Physical
Stuff implies the incorporation of physical material, whereas the production of
instances of E28 Conceptual Object does not. Similarly, instances of E18 Physical
Stuff cease to exist when destroyed, whereas an instance of E28 Conceptual
Object perishes when it is forgotten or its last physical carrier is destroyed.
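The two disjoint pairs above can be used to flag inconsistent class assertions. This Python sketch works with a simplified, hypothetical fragment of the IsA hierarchy in which intermediate classes are omitted. The class declarations remain authoritative. The function `disjointness_violations` is illustrative only.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Simplified fragment of the IsA hierarchy (subclass -> direct superclasses);
# intermediate classes are omitted, so check the class declarations before relying on it.
IS_A: Dict[str, Set[str]] = {
    "E5 Event": {"E4 Period"},
    "E4 Period": {"E2 Temporal Entity"},
    "E19 Physical Object": {"E18 Physical Stuff"},
    "E18 Physical Stuff": {"E70 Stuff"},
    "E55 Type": {"E28 Conceptual Object"},
    "E28 Conceptual Object": {"E70 Stuff"},
    "E70 Stuff": {"E77 Persistent Item"},
}

# The two key disjoint pairs named in the text above.
DISJOINT: List[FrozenSet[str]] = [
    frozenset({"E2 Temporal Entity", "E77 Persistent Item"}),
    frozenset({"E18 Physical Stuff", "E28 Conceptual Object"}),
]


def with_superclasses(classes: Iterable[str]) -> Set[str]:
    """Close a set of classes under the IsA relation."""
    closure: Set[str] = set()
    stack = list(classes)
    while stack:
        cls = stack.pop()
        if cls not in closure:
            closure.add(cls)
            stack.extend(IS_A.get(cls, ()))
    return closure


def disjointness_violations(asserted: Iterable[str]) -> List[FrozenSet[str]]:
    """Return every declared disjoint pair that the asserted classes would both entail."""
    entailed = with_superclasses(asserted)
    return [pair for pair in DISJOINT if pair <= entailed]
```

An item asserted as both E19 Physical Object and E5 Event entails both E77 Persistent Item and E2 Temporal Entity, so it is flagged.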
Virtually all structured descriptions of museum
objects begin with a unique object identifier and information about the “type”
of the object, often in a set of fields with names like “Object Type,” “Object
Name,” “Category,” “Classification,” etc. All these fields are used for terms
that declare that the object is a member of a particular class or category of
items, and are described by the CRM as instances of E55 Type. Since the
instances of this class are themselves classes, E55 Type is in fact a
metaclass.
The class E1 CRM Entity is the domain of the
property P2 has type (is type of), which has the range E55 Type.
Consequently, every class in the CRM, with the exception of E59 Primitive
Value, inherits the property P2 has type (is type of). This provides a
general mechanism for refining the classification of CRM instances to any level
of detail, by linking to external vocabulary sources, thesauri, classification
schemes or ontologies that function as extensions to the CRM class and
property hierarchies. The external vocabularies do not themselves fall within
the scope of the CRM.
The class E55 Type also serves as the range of
properties that relate to categorical knowledge commonly found in cultural
documentation. For example, the property P125 used object of type (was type
of object used in) enables the CRM to express statements such as “this
casting was produced using a mould”, meaning that there has been an unknown or
unmentioned instance of “mould” that was actually used. This enables the
specific instance of the casting to be associated with the entire class of
manufacturing devices known as moulds. Further, the objects of type “mould”
would be related via P2 has type (is type of) to this term. This
indirect relationship may actually help in detecting the unknown object in an
integrated environment. On the other hand, some castings may refer directly to a
known mould via P16 used specific object (was used for). A statistical question as to how many
objects in a certain collection were made with moulds could thus be answered
correctly by following both paths: P16 used specific object (was used
for) followed by P2 has type (is type of), and
P125 used object of type (was type of object used in). This consistent
treatment of categorical knowledge significantly enhances the CRM’s ability to
integrate cultural knowledge.
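The mould question can be sketched as a query over simple (subject, property, object) triples that unions the two paths. Several assumptions apply. P108 has produced is used to link a production to the object it produced, although that property is not mentioned in the text above. The plain-string triple encoding and all instance names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

P2 = "P2 has type"
P16 = "P16 used specific object"
P108 = "P108 has produced"        # assumed link from a production to its product
P125 = "P125 used object of type"


def objects_made_with(triples: Set[Triple], type_term: str) -> Set[str]:
    """Objects whose production used an object of `type_term`, via either path."""
    # Path 1: production -P16-> specific object -P2-> type term
    typed = {s for s, p, o in triples if p == P2 and o == type_term}
    productions = {s for s, p, o in triples if p == P16 and o in typed}
    # Path 2: production -P125-> type term (the specific object is unknown)
    productions |= {s for s, p, o in triples if p == P125 and o == type_term}
    # Follow P108 from each qualifying production to the object it produced.
    return {o for s, p, o in triples if p == P108 and s in productions}
```

Because the result is a set, an object reachable by both paths is counted only once.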
Some properties in the CRM are associated with an additional property.
These are numbered in the CRM documentation with a ".1" extension.
These do not appear in the property hierarchy list but are included as part of
the property declarations and referred to in the class declarations. For
example, P62.1 mode of depiction: E55 Type is associated with E24
Physical Man-made Stuff. P62 depicts (is depicted by): E1 CRM Entity. The
range of these properties of properties always falls within the type hierarchy
E55 Type. Their purpose is to allow dynamic extensions to their parent property
through the use of property subtypes declared as instances of E55 Type. This
function is analogous to that of the P2 has type (is type of) property, which all CRM classes inherit from
E1 CRM Entity. System implementations and schemas that do not support
properties of properties may use dynamic subtyping of the parent properties
instead.
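The two options described above can be contrasted in code. One option carries P62.1 mode of depiction as an attribute of the P62 depicts instance. The other folds the mode into a dynamically subtyped property name for systems that cannot attach attributes to properties. This is a hypothetical sketch. The `Depiction` record and the "P62 depicts/<mode>" naming scheme are illustrative inventions, not CRM conventions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Depiction:
    """One instance of P62 depicts, with its optional P62.1 mode of depiction."""
    depicting: str              # E24 Physical Man-made Stuff
    depicted: str               # E1 CRM Entity
    mode: Optional[str] = None  # P62.1 mode of depiction: an E55 Type term


def as_dynamic_subproperty(d: Depiction) -> Tuple[str, str, str]:
    """Fallback for stores without edge attributes: fold the mode into the property name.

    The "P62 depicts/<mode>" naming scheme is hypothetical.
    """
    prop = "P62 depicts" if d.mode is None else f"P62 depicts/{d.mode}"
    return (d.depicting, prop, d.depicted)
```

With the fallback, any query for P62 depicts must treat every "P62 depicts/<mode>" name as a subproperty of P62. Otherwise the subtyped statements are missed.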
Finally, types
play a central role in the history of human understanding; they are
intellectual products, and documentation of the history of types and of their justification by
physical evidence (particularly in disciplines such as archaeology and
natural history) falls squarely within the intended scope of the CRM. Therefore
types are modelled as “conceptual objects,” in parallel to their structural
role as metaclasses. This approach elegantly addresses the dual nature of types
in a manner consistent with material culture and natural history documentation.
Since the intended scope of the CRM is a subset of the “real” world and is therefore potentially infinite, the model has been designed to be extensible through the linkage of compatible external type hierarchies.
Compatibility of extensions with the CRM means
that data structured according to an extension must also remain valid as a CRM
instance. In practical terms, this implies query containment: any
queries based on CRM concepts should retrieve a result set that is correct
according to the CRM’s semantics, regardless of whether the knowledge base is
structured according to the CRM’s semantics alone, or according to the CRM plus
compatible extensions. For example, a query such as “list all events” should
recall 100% of the instances deemed to be events by the CRM, regardless of how
they are classified by the extension.
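Query containment can be illustrated with the earlier example of an extension class for place-naming practices, declared as a subclass of E7 Activity. A query for E5 Event that follows subclass links still recalls instances classified only by the extension. In this sketch, the class "X1 Place Naming Practice", the instance names and the function `instances_of` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Fragment of the CRM IsA hierarchy plus one hypothetical extension class.
IS_A: Dict[str, Set[str]] = {
    "E7 Activity": {"E5 Event"},
    "X1 Place Naming Practice": {"E7 Activity"},  # hypothetical extension class
}


def is_subclass_of(cls: str, ancestor: str) -> bool:
    """True if cls is ancestor or one of its direct or indirect subclasses."""
    if cls == ancestor:
        return True
    return any(is_subclass_of(parent, ancestor) for parent in IS_A.get(cls, ()))


def instances_of(target: str, typed: Iterable[Tuple[str, str]]) -> List[str]:
    """Answer 'list all <target>' over (instance, asserted class) pairs."""
    return sorted({inst for inst, cls in typed if is_subclass_of(cls, target)})
```

Because the extension class is subsumed by a CRM class, "list all events" still returns the naming practice alongside the CRM-classified events.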
A sufficient
condition for the compatibility of an extension with the CRM is that CRM
classes subsume all classes of the extension, and all properties of the
extension are either subsumed by CRM properties, or are part of a path for
which a CRM property is a shortcut. Obviously, such a condition can only be
tested intellectually.
Of necessity, some concepts covered by the CRM are less thoroughly elaborated than others: E39 Actor and E30 Right, for example. This is a natural consequence of staying within the CRM’s clearly articulated practical scope in an intrinsically unlimited domain of discourse. These ‘underdeveloped’ concepts can be considered as hooks for compatible extensions.
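The CRM provides a number of mechanisms to ensure that coverage of the
intended scope is complete:
1. Existing high-level classes can be extended, either structurally as subclasses or dynamically using the type hierarchy.
2. Existing high-level properties can be extended, either structurally as subproperties or, in some cases, dynamically, using properties of properties that allow subtyping.
3. Additional information that falls outside the semantics formally defined by the CRM can be recorded as unstructured data using E1 CRM Entity. P3 has note: E62 String.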
In mechanisms 1
and 2 the CRM concepts subsume and thereby cover the extensions.
In mechanism 3,
the information is accessible at the appropriate point in the respective
knowledge base. This approach is preferable when detailed, targeted queries are
not expected; in general, only those concepts used for formal querying need
to be explicitly modelled.
Fig. 1: Reasoning about spatial information
The diagram above shows a partial view of the
CRM, representing reasoning about spatial information. Five of the main hierarchy
branches are included in this view: E39 Actor, E51 Contact Point, E41
Appellation, E53 Place, and E70 Stuff. The relationships between these main
classes and their subclasses are shown as branching lines. Properties between
classes are shown as green ovals. One ‘shortcut’ property is included in this
view: P59 has section (is located on or within), between E53 Place
and E19 Physical Object, is a shortcut of the path through E46 Section
Definition. In some cases the order of priority for property names has been
modified in order to facilitate reading the diagram from left to right.
As can be seen, an instance of E53 Place is
identified by an instance of E44 Place Appellation, which may be an
instance of E45 Address, E47 Spatial Coordinates, E48 Place Name, or E46
Section Definition such as ‘basement’, ‘prow’, or ‘lower left-hand corner.’ An
instance of E53 Place may consist of or form part of another
instance of E53 Place, thereby allowing a hierarchy of physical ‘containers’ to
be constructed.
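The hierarchy of physical ‘containers’ can be sketched as a chain of "forms part of" links between instances of E53 Place. This sketch makes several assumptions. The relation is stored as a hypothetical child-to-parent dictionary with a single parent per place, whereas real data may allow several. The place names are placeholders, and the function `containers` is illustrative.

```python
from typing import Dict, List, Optional

# Hypothetical "forms part of" links between instances of E53 Place (child -> parent).
PART_OF: Dict[str, str] = {
    "display case 3": "room 12",
    "room 12": "basement",
    "basement": "museum building",
}


def containers(place: str, part_of: Dict[str, str]) -> List[str]:
    """Return the chain of enclosing places, innermost first."""
    chain: List[str] = []
    current: Optional[str] = part_of.get(place)
    while current is not None and current not in chain:  # guard against cycles
        chain.append(current)
        current = part_of.get(current)
    return chain
```

Because the relation is transitive, anything located in "display case 3" is thereby also located in every enclosing place returned by `containers`.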
An instance of E45 Address can be considered
both as an E44 Place Appellation (a way of referring to an E53 Place) and as an
E51 Contact Point for an E39 Actor. An E39 Actor may have any number of
instances of E51 Contact Point. Instances of E18 Physical Stuff are found at locations as a
consequence of being created there or being moved there. Therefore the
properties P53 has former or current location (is former or current location
of) and P55 has current location (currently holds) are regarded as
shortcuts of the fully articulated paths through the respective events. P55
has current location (currently holds) is a subproperty of P53 has
former or current location (is former or current location of). The latter
is a container for location information in the absence of knowledge about time
of validity and related events.
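Because P55 has current location is a subproperty of P53 has former or current location, every P55 statement also answers a P53 query. The following Python sketch shows that entailment over plain (subject, property, object) triples. The triple encoding, the `SUBPROPERTY_OF` table and the instance names are hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

P53 = "P53 has former or current location"
P55 = "P55 has current location"

# Subproperty -> direct superproperties; P55 is declared a subproperty of P53.
SUBPROPERTY_OF: Dict[str, Set[str]] = {P55: {P53}}


def entailed(triples: Set[Triple]) -> Set[Triple]:
    """Add the superproperty statements implied by each subproperty statement."""
    result = set(triples)
    frontier = list(triples)
    while frontier:
        s, p, o = frontier.pop()
        for sup in SUBPROPERTY_OF.get(p, ()):
            t = (s, sup, o)
            if t not in result:
                result.add(t)
                frontier.append(t)  # follow chains of subproperties, if any
    return result
```

A P53 query over the entailed set returns both the current location and any former locations recorded without dates.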
An interesting aspect of the model is the P58
has section definition (defines section) property between E46 Section
Definition and E18 Physical Stuff (and the corresponding shortcut from E53
Place to E19 Physical Object). This allows an instance of E53 Place to be
defined as a section of an instance of E19 Physical Object. For example, we may
know that Nelson fell at a particular spot on the deck of H.M.S. Victory,
without knowing the exact position of the vessel in geospatial terms at the
time of the fatal shooting of Nelson. Similarly, a signature or inscription can
be located “in the lower right corner of” a painting, regardless of where the
painting is hanging.
Fig. 2: Reasoning about temporal information
This second example shows how the CRM handles reasoning about temporal information. Four of the main hierarchy branches are included in this view: E2 Temporal Entity, E52 Time-Span, E77 Persistent Item and E53 Place.
The E2 Temporal Entity class is an abstract class
(i.e. it has no direct instances) that serves to group together all classes with a
temporal component, such as E4 Period, E5 Event and E3 Condition
State.
An instance of E52 Time-Span is simply a
temporal interval that does not make any reference to cultural or geographical
contexts (unlike instances of E4 Period, which took place at a
particular instance of E53 Place). Instances of E52 Time-Span are sometimes
identified by instances of E49 Time Appellation, often in the form of E50 Date.
Both E52 Time-Span and E4 Period have
transitive properties. E52 Time-Span has the transitive property P86 falls
within (contains), denoting a purely incidental inclusion, whereas E4
Period has the transitive property P9 consists of (forms part of) that
supports the decomposition of instances of E4 Period into their constituent
parts. For example, the E52 Time-Span during which a building is constructed
might fall within the E52 Time-Span of a particular government,
although there is no causal or contextual connection between the two instances
of E52 Time-Span; conversely, the E4 Period of the Chinese Song Dynasty consists
of the Northern Song Period and the Southern Song Period.
Instances of E52 Time-Span are related to their
outer bounds (i.e. their indeterminacy interval) by the property P82 at some
time within, and to their inner bounds via the property P81 ongoing
throughout. The range of these properties is the E61 Time Primitive class,
instances of which are treated by the CRM as
application- or system-specific date intervals that are not further
analysed.
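One way to use the outer and inner bounds is to decide when one time-span certainly falls within another, whatever its exact extent. This is a sketch under stated assumptions. Time primitives are encoded as hypothetical numeric (start, end) pairs, such as years, and both bounds are assumed to be known. The function `certainly_falls_within` is illustrative and is not a definition taken from the CRM.

```python
from dataclasses import dataclass
from typing import Tuple

Interval = Tuple[float, float]  # (start, end): a hypothetical encoding of E61 Time Primitive


@dataclass(frozen=True)
class TimeSpan:
    """An E52 Time-Span described by its outer and inner bounds."""
    outer: Interval  # P82 at some time within: the span lies somewhere inside this
    inner: Interval  # P81 ongoing throughout: the span certainly covers this

    def __post_init__(self) -> None:
        (o_start, o_end), (i_start, i_end) = self.outer, self.inner
        if not (o_start <= i_start <= i_end <= o_end):
            raise ValueError("inner bounds must lie within outer bounds")


def certainly_falls_within(a: TimeSpan, b: TimeSpan) -> bool:
    """True if a must lie within b whatever their exact extents are.

    a lies inside its outer bounds, and b certainly covers its inner bounds,
    so it suffices that a's outer interval is contained in b's inner interval.
    """
    return b.inner[0] <= a.outer[0] and a.outer[1] <= b.inner[1]
```

If the construction of a building is known to lie within 1903 to 1906, and a government is known to have been ongoing throughout 1901 to 1910, the construction certainly falls within the government's time-span, even though neither exact extent is known.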
Although they do not provide comprehensive
definitions, compact monohierarchical presentations of the class and property
IsA hierarchies have been found to significantly aid comprehension and
navigation of the CRM, and are therefore provided below.
The class hierarchy presented below has the
following format:
The property hierarchy presented below has the
following format:
Copyright © 2003 International Council of Museums
[1] The ICOM Statutes provide a definition of the term “museum” at http://icom.museum/statutes.html#2