Issue 383: 'has content' property
In the 41st joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 34th FRBR - CIDOC CRM Harmonization meeting, the SIG, in resolving issue 363, decided to open a new issue about the definition of a new property of E90 for capturing the actual content of a symbolic object. This property should be modelled on the R33 property of FRBRoo. HW to MD for the formulation of this property.
Lyon, May 2018
Posted by Martin on 6/11/2018
I had sent the text below as a new issue, but it is in fact the answer to Issue 383.
The question is how to deal with a file that is more specific in content, such as an MS Word file, but represents the character sequence that defines the content of the respective E90. Is it "is incorporated in", or a subproperty of it?
On 9/19/2018 11:09 PM, Martin Doerr wrote:
> Here is my scope note:
> Pxxx has symbolic content
> Domain: E90 Symbolic Object
> Range: E62 String
> Quantification: many to many (0,n:0,n) ??
> In CRM RDFS subproperty of: rdfs:value
> Scope note: This property associates an instance of E90 Symbolic Object with a complete, identifying representation of its content in the form of an instance of E62 String. This property only applies to instances of E90 Symbolic Object that can be represented completely in this form. The representation may be more specific than the symbolic level defining the identity condition of the represented. This depends on the type of the symbolic object represented. For instance, if a name has type "Modern Greek character sequence", it may be represented in a loss-free Latin transcription, meaning however the sequence of Greek letters. As another example, if the represented object has type "English words sequence", American English or British English spelling variants may be chosen to represent the English word "colour" without defining a different symbolic object. If a name has type "European traditional name", no particular string may define its content.
> * The materials description (E33) of the painting (E22) _has symbolic content_ “Oil, French Watercolors on Paper, Graphite and Ink on Canvas, with an Oak frame.”
> * The title (E35) of Einstein’s 1915 text (E73) _has symbolic content_ “Relativity, the Special and the General Theory”
> * The story of Little Red Riding Hood (E33) _has symbolic content_ “Once upon a time there lived in a certain village …”
> * The inscription (E34) on Rijksmuseum object SK-A-1601 (E22) _has symbolic content_ “B”
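As an illustration only, two of the examples above can be written down as plain subject/string statements. This is a minimal sketch in Python, not an RDF encoding: the `ex:` identifiers, the `Statement` type and the `content_of` helper are hypothetical names introduced here, and `Pxxx has symbolic content` is the placeholder label from the proposal.

```python
from typing import List, NamedTuple

# Placeholder label taken from the proposal; no property number is assigned yet.
PROPERTY = "Pxxx has symbolic content"


class Statement(NamedTuple):
    subject: str    # hypothetical identifier of an E90 Symbolic Object instance
    crm_class: str  # class of the subject as given in the example
    content: str    # the E62 String that represents the content completely


# Two of the examples from the scope note, with made-up "ex:" identifiers.
statements = [
    Statement("ex:einstein-1915-title", "E35 Title",
              "Relativity, the Special and the General Theory"),
    Statement("ex:sk-a-1601-inscription", "E34 Inscription", "B"),
]


def content_of(subject: str) -> List[str]:
    """Return every string linked to the subject via the proposed property."""
    return [s.content for s in statements if s.subject == subject]


print(PROPERTY, "->", content_of("ex:sk-a-1601-inscription"))
```

Returning a list rather than a single string leaves the quantification open, which is the point still under discussion.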
Posted by Robert Sanderson on 6/11/2018
Thank you for pushing this forward, Martin!
Quantification wise, I would be in favor of 0,1 : 0,1.
If the structure of the set of symbols changed, then it would be a different symbolic object according to my understanding of E90:
> … identifiable symbols and any aggregation of symbols … that have an objectively recognizable structure and that are documented as single units.
Similarly, if the same string was used by different Symbolic Objects, then it seems like they would actually be the same symbolic object (or you would instead use two strings with the same data).
(And in the RDF projection this makes no difference, as literal values do not have their own separate identity)
For the examples, I would replace the Little Red Riding Hood example with one that is complete, to avoid confusion with the scope note requirement of being represented completely.
> The Accession Number (E42) of the J. Paul Getty Museum’s “Abduction of Europa” (E22) _has symbolic content_ “95.PB.7”
And for the file question, do you mean that the symbolic object is the MS Word file, which has a representable set of (binary) symbols, or that the symbolic object is text which is incorporated within the file, but not verbatim (as the characters in the (e.g.) paragraph are likely to be represented in the file using a very different structure)?
Posted by Martin on 9/11/2018
On 11/6/2018 9:00 PM, Robert Sanderson wrote:
> Thank you for pushing this forward, Martin!
> Quantification wise, I would be in favor of 0,1 : 0,1.
I prefer 0,1:0,n or 0,n:0,n
> If the structure of the set of symbols changed, then it would be a different symbolic object according to my understanding of E90:
> > … identifiable symbols and any aggregation of symbols … that have an objectively recognizable structure and that are documented as single units.
Correct. The question is what to do if we encounter different representations: for instance, one giving the text "hello world" in Latin 1 and another in ASCII, while the E90 instance is of type "Latin characters only"; or if you write my name DOERR or DÖRR, both regarded by German authorities as identical variants representing the "Umlaut" OE or Ö. Of course, in that case, having both representations would be redundant. Still, 0,n is more tolerant.
Another opinion is that one string is enough to define the E90. Then, 0,1.
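The DOERR/DÖRR case can be sketched as a comparison under a transcription rule. This is a minimal Python sketch under stated assumptions: the mapping table covers only the German umlauts, and the function names `normalise` and `same_character_sequence` are hypothetical. Whether two strings that compare equal this way represent the same E90 instance still depends on the type of the symbolic object, as discussed above.

```python
import unicodedata

# Assumed transcription table: German umlauts and their two-letter variants.
UMLAUT_TRANSCRIPTION = {
    "Ä": "AE", "Ö": "OE", "Ü": "UE",
    "ä": "ae", "ö": "oe", "ü": "ue",
}


def normalise(text: str) -> str:
    """Map umlauts to their two-letter transcription.

    NFC composition first, so that 'O' + combining diaeresis is treated
    like the precomposed 'Ö'.
    """
    composed = unicodedata.normalize("NFC", text)
    return "".join(UMLAUT_TRANSCRIPTION.get(ch, ch) for ch in composed)


def same_character_sequence(a: str, b: str) -> bool:
    """True if both strings are variants of one sequence under the rule above."""
    return normalise(a) == normalise(b)


# "hello world" decodes to the same characters from ASCII and from Latin 1.
ascii_text = b"hello world".decode("ascii")
latin1_text = b"hello world".decode("latin-1")

print(same_character_sequence("DOERR", "DÖRR"))        # variants of one name
print(same_character_sequence(ascii_text, latin1_text))  # same characters
```

Under a 0,1 quantification only one of the two spellings would be recorded; under 0,n both could be attached to the same E90 instance, redundantly but without conflict.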
> Similarly, if the same string was used by different Symbolic Objects, then it seems like they would actually be the same symbolic object (or you would instead use two strings with the same data).
This is a long-debated question. In most cases, this appears reasonable, but we do have cases in which the identity of the E90, seen as a message in the sense of Claude Shannon, is bound to the "sender". Discussing the sense of E35 Title, it appears that we cannot take the identity of the Title detached from the thing it was given to. This creates a precedent for the latter interpretation.
As a general principle, a 1:1 dependency raises the suspicion of a hidden identity. To be on the safe side, I would rather not identify the E90 with the content model.
Treating two strings with the same data as different is a (good) implementation choice of RDF, which assigns the identity to the link rather than to the string, exactly in order to distinguish where the message comes from. If two strings with the same data are regarded as different, then we actually have a 0,x:0,n model in the ontology.
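The competing quantifications can be compared against a small set of links. The sketch below is illustrative Python, assuming the usual CRM reading of the notation: the first pair bounds the number of strings per symbolic object, the second the number of symbolic objects per string. The `ex:` identifiers, the sample strings and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple

Link = Tuple[str, str]  # (symbolic object identifier, content string)


def observed_maxima(links: Iterable[Link]) -> Tuple[int, int]:
    """Return (max strings per object, max objects per string)."""
    strings_per_object = defaultdict(set)
    objects_per_string = defaultdict(set)
    for obj, string in links:
        strings_per_object[obj].add(string)
        objects_per_string[string].add(obj)
    max_strings = max((len(s) for s in strings_per_object.values()), default=0)
    max_objects = max((len(o) for o in objects_per_string.values()), default=0)
    return max_strings, max_objects


def satisfies(links: Iterable[Link],
              max_strings: Optional[int],
              max_objects: Optional[int]) -> bool:
    """Check the links against upper bounds; None stands for 'n' (unbounded)."""
    seen_strings, seen_objects = observed_maxima(links)
    ok_domain = max_strings is None or seen_strings <= max_strings
    ok_range = max_objects is None or seen_objects <= max_objects
    return ok_domain and ok_range


# Hypothetical data: one name with two spellings, two titles sharing one string.
links = [
    ("ex:name-doerr", "DOERR"),
    ("ex:name-doerr", "DÖRR"),
    ("ex:title-a", "Untitled"),
    ("ex:title-b", "Untitled"),
]

print(satisfies(links, 1, 1))        # 0,1:0,1
print(satisfies(links, 1, None))     # 0,1:0,n
print(satisfies(links, None, None))  # 0,n:0,n
```

With this sample data only 0,n:0,n admits every link: the two spellings break the "one string per object" bound, and the shared title string breaks the "one object per string" bound.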
> (And in the RDF projection this makes no difference, as literal values do not have their own separate identity)
> For the examples, I would replace the Little Red Riding Hood example with one that is complete, to avoid confusion with the scope note requirement of being represented completely.
> How about:
> > The Accession Number (E42) of the J. Paul Getty Museum’s “Abduction of Europa” (E22) _has symbolic content_ “95.PB.7”
> And for the file question, do you mean that the symbolic object is the MS Word file, which has a representable set of (binary) symbols,
> or that the symbolic object is text which is incorporated within the file, but not verbatim (as the characters in the (e.g.) paragraph are likely to be represented in the file using a very different structure)?
Posted by Martin on 15/11/2018
Continuing the question from my last message below:
Very large strings would normally be kept in a file, and E90 Symbolic Object or a subclass of it would be instantiated with the URL. However, the question is whether the URL would indeed be a good persistent identifier, since the URL stands for a physical location, albeit indirectly addressed. The Linked Open Data community has not yet given satisfactory answers for the long-term validity of resolvable URIs. If the URL is not a good identifier, another, primary URI should be chosen, and the content found at the URL should be related to the primary URI as a representative of the content of the symbolic object identified with the primary URI.
I would like to discuss a new property,
PXXX has content representation
domain: E90 Symbolic Object
range: E90 Symbolic Object
Tentative scope note: This property associates an instance of E90 Symbolic Object with another instance of E90 Symbolic Object (or any of its subclasses) that completely represents the content of the former, identical with respect to the symbol set in which the former is defined, and nothing more. For instance, a text of Aristotle may be defined in terms of the ancient Greek alphabet, paragraphs and section titles, but the representing object may use some typefaces and page layout. Metadata in the range instance are not regarded as part of the content.
What about introductions, footnotes etc.?
Can someone make a scenario with a real canonical instance of a text of Aristotle or Plato, with indexed phrases, and propose how the text itself should be identified, possibly independently of spelling variants?
Another case: I submit to Springer a paper in .doc and they create a pdf and a journal image. How do we define "my paper" regardless of these embodiments?
In the worst case, we would need yet another node in order to specify the part of the file that is the defining text.
Further, P165 incorporates is from information object to symbolic object, hence not compatible.
Another argument is that an ontological link from E90 to E90 doesn't make sense. If the target should be a URL, we may regard this as an implementation-level question.
In the 42nd joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 35th FRBR - CIDOC CRM Harmonization meeting, the crm-sig discussed MD’s proposal to define a new property of E90 Symbolic Object that captures the actual content of a symbolic object, and accepted it as is (Issue 395).
HW: The crm-sig has assigned GB, NC and RS to come up with solutions accounting both for cases of linking an instance of E90 Symbolic Object to other instances of E90 Symbolic Object composing it, and for cases where the same instance of E90 Symbolic Object is conveyed through different means/encodings, i.e. things that might be considered equivalent to ‘spelling variants’.
Berlin, November 2018
Posted by Robert Sanderson on 23/2/2019
Fellows, shall we discuss next Friday when we continue the work on Dig?
My current feeling is close to Martin’s final question – that this doesn’t actually make sense for Symbolic Object directly. It would result in a 2^n style relationship where every expression of the content was related to all of the others. It also comes dangerously close to FRBR and LRM.