Issue 363: Form and persistence of RDF identifiers

Starting Date: 
2018-01-24
Working Group: 
2
Status: 
Open
Background: 

Following the dialog for issue 361, Richard Light posted on 18/1/2018:

Gordon,

Looking at the RDF XML for F10, I see (a) that you make F10 equivalent to the full F10_Person, as the core CRM does in its RDFS Schema and (b) when subclassing from the CRM core, you use the full form E21_Person:
<rdf:Description rdf:about="http://iflastandards.info/ns/fr/frbr/frbroo/F10">
...
<rdfs:subClassOf rdf:resource="http://www.cidoc-crm.org/cidoc-crm/E21_Person"/>
<owl:sameAs rdf:resource="http://iflastandards.info/ns/fr/frbr/frbroo/F10_Person"/>
</rdf:Description>

So I think there are still issues to resolve in this area for FRBRoo.

posted by Richard on 18/1/2018

This is alarming.  I have always assumed that a superseded class or property would simply be flagged as "deprecated" and a new one minted to replace it. There is absolutely no need to re-use numbers, and I am hoping someone will come forward to say that this was a mistake, and not a change which accords with CRM-SIG policy.  Otherwise, as you say, we can have no confidence in the CRM as a persistent RDF framework, whether or not the class and property identifiers include a textual component.  Is this an isolated case, or does anyone know of other cases where domain and range (and indeed meaning) of a class or property has been changed after its initial publication?

(The textual component is, in any case, only meant to be guidance and is explicitly stated not to be unique: 'is identified by' below is a good example of this.)
 

posted by Gordon Dunsire on 18/1/2018

Richard

I guess we were waiting for this discussion; we can only use what is documented in the CRM itself.

posted by Gordon Dunsire on 18/1/2018

[My first response was blocked because the thread was “too long”; here it is again]

I agree with Philip [and Richard]

If the domain or range of a FRBRoo property is changed, or there was a significant change in definition, we would deprecate the old version and declare a new URI. This hasn’t happened yet, but would raise the question of what to use as a new URI; perhaps add a version number to the alphanumeric part. For that reason we would advise the FRBR Review Group to mint a new alphanumeric designation.

posted by Martin on 19/1/2018

We never continue an alphanumeric designation when there is a significant change in definition. You can take for granted that continuing the
designation means that the change is not significant.

The case below (P148) should be due to an internal processing problem, and will never reoccur. It is characteristically the last property of this edition.
The reason, if I am not wrong, was that we got out of sync with the ISO version with the latest changes. Since the ISO team does in general not respect our
continuity concerns when there was parallel work, we sometimes had the bitter choice between our continuity and not creating a different branch from ISO for
typical reasons. Probably should have been explicitly justified.

If you spot any other reuse of an alphanumeric code, please inform us.

Since we have discussed for years the issues with changing labels, I repeat quickly the reasons:
Labels are taken for mnemonics, and people associate, even if they shouldn't, semantics with them.
Therefore labels change when they render the concept better and serious misunderstandings can be reduced following explicit community requests.
The fact that the alphanumeric code is continued makes it absolutely clear that this is a change of name and not meaning.
Labels are also translated, and work as mnemonics of the respective language.
Therefore labels are not part of the definition.

The rest are considerations of use, and a question of utilities, which cannot dictate our practice.
Anyone working in an IT environment should have access to someone doing the trivial task of mapping label changes in his S/W,
if he has chosen to include labels in the URIs without "same_as" statements. Please consider in your requirements that continuity of meaning is as important as comprehensibility. We cannot follow advice which considers only one side of the coin.

F10 was deliberately declared as "F" in FRBRoo to be an FRBRoo concept "same as" E21, for didactic reasons. There is no continuity break.

Please let me know if there is anything wrong with this.
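
The label-mapping task described above could be sketched roughly as follows. This is only an illustration, not an official CRM utility: the function names are hypothetical, and the regular expression assumes that a local name consists of capital letters, digits, an optional "i" for an inverse property, and an optional "_Label" suffix. The example URIs are the ones quoted earlier in this thread.

```python
import re

# Assumed shape of a local name: capital letters, digits, an optional "i"
# (inverse property), then an optional "_Label" part.
CODE_PATTERN = re.compile(r"^([A-Z]+\d+i?)(?:_.*)?$")


def split_uri(uri):
    """Split a URI into (namespace, local name) at the last "/" or "#"."""
    cut = max(uri.rfind("/"), uri.rfind("#")) + 1
    return uri[:cut], uri[cut:]


def crm_code(uri):
    """Return the alphanumeric designation, dropping any label suffix."""
    _, local = split_uri(uri)
    match = CODE_PATTERN.match(local)
    if match is None:
        raise ValueError(f"not a CRM-style identifier: {uri!r}")
    return match.group(1)


def same_designation(uri_a, uri_b):
    """True when both URIs share a namespace and an alphanumeric designation."""
    return (split_uri(uri_a)[0] == split_uri(uri_b)[0]
            and crm_code(uri_a) == crm_code(uri_b))


FRBROO = "http://iflastandards.info/ns/fr/frbr/frbroo/"
print(crm_code("http://www.cidoc-crm.org/cidoc-crm/E21_Person"))
print(same_designation(FRBROO + "F10", FRBROO + "F10_Person"))
```

A comparison of this kind only recognises a renamed label within one namespace. It cannot tell that F10 was declared the same as E21; that still needs an explicit statement in the data.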

Posted by Richard Light on 22/1/2018

On 19/01/2018 13:36, Martin Doerr wrote:
> Dear All,
>
> We never continue an alphanumeric designation when there is a significant change in definition. You can take for granted that continuing the
> designation means that the change is not significant.
>
> The case below (P148) should be due to an internal processing problem, and will never reoccur. It is characteristically the last property of this edition.
> The reason, if I am not wrong, was that we got out of sync with the ISO version with the latest changes. Since the ISO team does in general not respect our
> continuity concerns when there was parallel work, we sometimes had the bitter choice between our continuity and not creating a different branch from ISO for
> typical reasons. Probably should have been explicitly justified.

OK, thanks for the explanation.  Though I don't understand why 'ISO' (who, exactly?) was doing active development work on the CRM.  I thought that they simply took the SIG's work through the ISO formalization process.

> Since we have discussed for years the issues with changing labels, I repeat quickly the reasons:
> Labels are taken for mnemonics, and people associate, even if they shouldn't, semantics with them.
> Therefore labels change when they render the concept better and serious misunderstandings can be reduced following explicit community requests.
> The fact that the alphanumeric code is continued makes it absolutely clear that this is a change of name and not meaning.
> Labels are also translated, and work as mnemonics of the respective language.
> Therefore labels are not part of the definition.
>
> The rest are considerations of use, and a question of utilities, which cannot dictate our practice.
> Anyone working in an IT environment should have access to someone doing the trivial task of mapping label changes in his S/W,
> if he has chosen to include labels in the URIs without "same_as" statements. Please consider in your requirements that continuity of meaning is as important as comprehensibility. We cannot follow advice which considers only one side of the coin.

I think that this argument is perfectly valid for the 'Definition of CRM' document.  However, by publishing an RDFS expression of the CRM we are moving, whether we like it or not, into the realm of 'utilities'.  People are picking up and using our RDFS definitions in a variety of ways.  In this particular implementation context, I would argue that we should ensure that there is a label-free version of each CRM class and property.  Also, our guidance on the use of our RDFS implementation should recommend the use of this label-free version, on the grounds that we cannot guarantee the stability of the version which includes a label.

This talk of preferred labels and your mention of the labels in other languages leads me to wonder whether anyone has produced a SKOS version of the CRM.  This might be a useful exposition of the logic of the CRM, expressed in a format which is widely used and supported.  We could have 'preferred labels' for each concept in as many languages as we like.  A SKOS version would be no use for instance data, because each SKOS concept is itself an instance, in OWL terms, but it might be a powerful tool for expressing relationships between concepts in different schemes, i.e. exactly the purpose for which the CRM was originally created.  Thoughts, anyone?

Posted by Robert Sanderson on 22/1/2018

An interesting investigation would be to try and reuse existing terms from well-known ontologies, rather than creating yet another one.

To Martin’s point about just renaming all the things … that sounds easy in theory, but in the distributed real world of implementations and datasets, in practice it means that everyone needs to support all of the different permutations as there’s always some product or some piece of data that hasn’t updated to the most recent version.

One small benefit would be that new serializations like JSON-LD would have more liberty to assert their own mappings over top of the alphanumeric designations, rather than feeling beholden to the labels.  Of course for every other serialization it’s going to be completely unintelligible and thereby unusable.

Posted by Martin on 22/1/2018

Dear Robert,

On 1/22/2018 7:12 PM, Robert Sanderson wrote:
>

>
> An interesting investigation would be to try and reuse existing terms from well-known ontologies, rather than creating yet another one.
>

>
> To Martin’s point about just renaming all the things … that sounds easy in theory, but in the distributed real world of implementations and datasets, in practice it means that everyone needs to support all of the different permutations as there’s always some product or some piece of data that hasn’t updated to the most recent version.

It is not about renaming all things, it is about not excluding renaming while preserving the identification code.
An unchanged standard is an illusion, a fiction not worth sticking to. I usually present it this way:

Making Standards

The good with standards is there are so many!

When you have a standard,

You need to transform to the standard

You need to renew and adapt the standard

You need to transform to the renewed standards

Why not just transform data?

There are too many transformations, you need a standard



> One small benefit would be that new serializations like JSON-LD would have more liberty to assert their own mappings over top of the alphanumeric designations, rather than feeling beholden to the labels.  Of course for every other serialization it’s going to be completely unintelligible and thereby unusable.
The challenge is:
a) to understand that mapping, not the standard, is the ultimate solution
b) how to standardize the mapping
c) to minimize the need to map

Posted by Martin on 23/1/2018

Dear Richard,

On 1/22/2018 4:37 PM, Richard Light wrote:
>
> On 19/01/2018 13:36, Martin Doerr wrote:
>> Dear All,
>>
>> We never continue an alphanumeric designation when there is a significant change in definition. You can take for granted that continuing the
>> designation means that the change is not significant.
>>
>> The case below (P148) should be due to an internal processing problem, and will never reoccur. It is characteristically the last property of this edition.
>> The reason, if I am not wrong, was that we got out of sync with the ISO version with the latest changes. Since the ISO team does in general not respect our
>> continuity concerns when there was parallel work, we sometimes had the bitter choice between our continuity and not creating a different branch from ISO for
>> typical reasons. Probably should have been explicitly justified.

> OK, thanks for the explanation.  Though I don't understand why 'ISO' (who, exactly?) was doing active development work on the CRM.  I thought that they simply took the SIG's work through the ISO formalization process.
ISO working group decisions supersede ours. They will listen to arguments of our liaison people, but often it is better to accept than to risk another year of discussions about a label.
>

>> Since we have discussed for years the issues with changing labels, I repeat quickly the reasons:
>> Labels are taken for mnemonics, and people associate, even if they shouldn't, semantics with them.
>> Therefore labels change when they render the concept better and serious misunderstandings can be reduced following explicit community requests.
>> The fact that the alphanumeric code is continued makes it absolutely clear that this is a change of name and not meaning.
>> Labels are also translated, and work as mnemonics of the respective language.
>> Therefore labels are not part of the definition.

>>
>> The rest are considerations of use, and a question of utilities, which cannot dictate our practice.
>> Anyone working in an IT environment should have access to someone doing the trivial task of mapping label changes in his S/W,
>> if he has chosen to include labels in the URIs without "same_as" statements. Please consider in your requirements that continuity of meaning is as important as comprehensibility. We cannot follow advice which considers only one side of the coin.

> I think that this argument is perfectly valid for the 'Definition of CRM' document.  However, by publishing an RDFS expression of the CRM we are moving, whether we like it or not, into the realm of 'utilities'.  People are picking up and using our RDFS definitions in a variety of ways.  In this particular implementation context, I would argue that we should ensure that there is a label-free version of each CRM class and property.  Also, our guidance on the use of our RDFS implementation should recommend the use of this label-free version, on the grounds that we cannot guarantee the stability of the version which includes a label.
The issue was decided in the 27th meeting, as documented in the agenda. We had produced label-free definitions with language labels, as you propose, which caused an outcry from implementers who saw only numbers and had no tools showing the display labels. Since there is no new evidence on the issue, I'd propose to stay as we are and I'll try to make the respective discussion thread accessible, so that all the old arguments can be read again.

The current RDFS reads, e.g.:
<rdfs:Class rdf:about="E2_Temporal_Entity">
  <rdfs:label xml:lang="fr">Entité temporelle</rdfs:label>
  <rdfs:label xml:lang="en">Temporal Entity</rdfs:label>
  <rdfs:label xml:lang="ru">Временная Сущность</rdfs:label>
  <rdfs:label xml:lang="el">Έγχρονη  Οντότητα</rdfs:label>
  <rdfs:label xml:lang="de">Geschehendes</rdfs:label>
  <rdfs:label xml:lang="pt">Entidade Temporal</rdfs:label>
  <rdfs:label xml:lang="zh">时间实体</rdfs:label>
  <rdfs:comment>

as outcome of a long-standing discussion...........
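
A tool that shows display labels for such identifiers might look something like the sketch below. It is an illustration under assumptions, not part of any CRM tooling: the function names are hypothetical, and the sample wraps three of the labels above in an rdf:RDF root with the standard RDF and RDFS namespace declarations so that it parses on its own.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Shortened, self-contained sample; the rdf:RDF wrapper is added here.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="E2_Temporal_Entity">
    <rdfs:label xml:lang="fr">Entité temporelle</rdfs:label>
    <rdfs:label xml:lang="en">Temporal Entity</rdfs:label>
    <rdfs:label xml:lang="de">Geschehendes</rdfs:label>
  </rdfs:Class>
</rdf:RDF>"""


def display_labels(rdf_xml):
    """Map each class identifier to a {language: label} dictionary."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for cls in root.findall(f"{{{RDFS}}}Class"):
        about = cls.get(f"{{{RDF}}}about")
        result[about] = {
            label.get(XML_LANG): label.text
            for label in cls.findall(f"{{{RDFS}}}label")
        }
    return result


def label_for(labels, identifier, lang, fallback="en"):
    """Pick a label in the requested language, else the fallback, else the identifier."""
    by_lang = labels.get(identifier, {})
    return by_lang.get(lang) or by_lang.get(fallback) or identifier


labels = display_labels(SAMPLE)
print(label_for(labels, "E2_Temporal_Entity", "de"))
```

The lookup falls back to the English label and finally to the bare identifier, so a missing translation never leaves the display empty.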
>
> This talk of preferred labels and your mention of the labels in other languages leads me to wonder whether anyone has produced a SKOS version of the CRM.
Your suggestion is well taken, but I do not see what this would offer in contrast to the current international display labeling as shown above.

>   This might be a useful exposition of the logic of the CRM, expressed in a format which is widely used and supported.  We could have 'preferred labels' for each concept in as many languages as we like.  A SKOS version would be no use for instance data, because each SKOS concept is itself an instance, in OWL terms, but it might be a powerful tool for expressing relationships between concepts in different schemes, i.e. exactly the purpose for which the CRM was originally created.  Thoughts, anyone?
CRM classes are not terms. The CRM is an ontology of relationships. Classes are only auxiliary for relationships. Therefore we delete classes without relationships. The classes belong to a completely artificial language. Therefore I'd argue there is nothing like a "preferred label". People must understand the scope notes, nothing else.  The purpose for which the CRM was and is created is to mediate data structures, i.e. between relations connecting "fields", not between "terms". If this is not clear enough from the CRM introduction, please let us know how to improve the text.

Therefore, in my opinion, it is impossible in SKOS to represent the logic of the CRM. A pure class-class-mapping is usually misleading. However the X3ML mapping language can map relationships in a declarative way.
 

Posted by Martin on 23/1/2018

Dear All,

Thank you very much for your engagement in these issues!
Let me remark, for all those that find our practices alarming, that none of us is paid for the maintenance of the CRM.
It is exclusively an engagement of volunteers and engagement of organizations for a common good.
What is really alarming for me is the lack of users offering active work beyond criticism.

We are now in the 22nd year of development. If you want to have a CRM in which you can find some practices alarming in the future, better engage now and support us by coming to the meetings, learning to understand the methods and doing editing work, tools development, didactic material etc ;-).

Besides inviting people to our meetings and learning in the discussions, we'll be very glad to offer intensive training in our methods
and principles to anybody interested. Without the one or the other, some e-mail discussions may repeat old arguments in a fragmented way,
never convincing, because the overall logic is not exposed. The art is balancing all practical requirements and a crystal-clear separation
between the intellectual and technological levels.

Interested people may be domain professionals with a long-term data modeling and standards mission, consultants, but in particular also
post-graduate students that can combine their subjects with methodological research and become trainers themselves.

So I hope some of you are alarmed enough to join us actively:-D! 

 

Posted by Richard on 23/1/2018

On 23/01/2018 16:07, Martin Doerr wrote:

>> I think that this argument is perfectly valid for the 'Definition of CRM' document.  However, by publishing an RDFS expression of the CRM we are moving, whether we like it or not, into the realm of 'utilities'.  People are picking up and using our RDFS definitions in a variety of ways.  In this particular implementation context, I would argue that we should ensure that there is a label-free version of each CRM class and property.  Also, our guidance on the use of our RDFS implementation should recommend the use of this label-free version, on the grounds that we cannot guarantee the stability of the version which includes a label.
>>

> The issue was decided in the 27th meeting, as documented in the agenda. We had produced label-free definitions with language labels, as you propose, which caused an outcry from implementers who saw only numbers and had no tools showing the display labels. Since there is no new evidence on the issue, I'd propose to stay as we are and I'll try to make the respective discussion thread accessible, so that all the old arguments can be read again.
I would be interested to (re-)read the arguments.  However, I repeat my assertion above that there should also be a label-free version of each CRM class and property identifier.  I am perfectly happy that this should be flagged as being the same concept as a variant whose identifier includes a label (i.e. "E2 owl:sameAs E2_Temporal_Entity").  I'm even happy for the label-free identifier not to be the "preferred" form.

Implementers who want their CRM RDF to be valid into the long-term future must surely realise that the convenience of a "human-readable" identifier is negated by the possibility that the CRM SIG might change that identifier at some point in the future - which you are claiming the right to do - and so invalidate their RDF.  (Maybe they just need better tools.  Linked Data resources with opaque URIs are hardly unusual: take Geonames and the Getty vocabularies as examples.)  Has this specific argument been made before, and, if so, how was it rebutted?
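
The pairing suggested above ("E2 owl:sameAs E2_Temporal_Entity") could be generated mechanically from the labelled names. The following is a minimal sketch, assuming the namespace of the E21_Person URI quoted earlier in this thread and assuming that the alphanumeric code is everything before the first underscore; the function name is hypothetical. Output is in N-Triples form.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Assumed namespace, taken from the E21_Person URI quoted earlier in the thread.
CRM_NS = "http://www.cidoc-crm.org/cidoc-crm/"


def same_as_triples(labelled_names, namespace=CRM_NS):
    """Emit one N-Triples line linking each label-free URI to its labelled form."""
    lines = []
    for name in labelled_names:
        code = name.split("_", 1)[0]  # text before the first underscore
        if code == name:
            continue  # the name carries no label, nothing to link
        lines.append(
            f"<{namespace}{code}> <{OWL_SAME_AS}> <{namespace}{name}> ."
        )
    return lines


for line in same_as_triples(["E2_Temporal_Entity", "E21_Person"]):
    print(line)
```

owl:sameAs is used here only because it is the property named in the discussion. For classes and properties, owl:equivalentClass and owl:equivalentProperty are the usual alternatives where OWL DL compatibility matters.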

> The current RDFS reads, e.g.:
> <rdfs:Class rdf:about="E2_Temporal_Entity"><rdfs:label xml:lang="fr">Entité temporelle</rdfs:label><rdfs:label xml:lang="en">Temporal Entity</rdfs:label><rdfs:label xml:lang="ru">Временная Сущность</rdfs:label><rdfs:label xml:lang="el">Έγχρονη  Οντότητα</rdfs:label><rdfs:label xml:lang="de">Geschehendes</rdfs:label><rdfs:label xml:lang="pt">Entidade Temporal</rdfs:label><rdfs:label xml:lang="zh">时间实体</rdfs:label><rdfs:comment>
>
> as outcome of a long-standing discussion...........

>>
>> This talk of preferred labels and your mention of the labels in other languages leads me to wonder whether anyone has produced a SKOS version of the CRM.
> Your suggestion is well taken, but I do not see what this would offer in contrast to the current international display labeling as shown above.
>
>> This might be a useful exposition of the logic of the CRM, expressed in a format which is widely used and supported.  We could have 'preferred labels' for each concept in as many languages as we like.  A SKOS version would be no use for instance data, because each SKOS concept is itself an instance, in OWL terms, but it might be a powerful tool for expressing relationships between concepts in different schemes, i.e. exactly the purpose for which the CRM was originally created.  Thoughts, anyone?

> CRM classes are not terms. The CRM is an ontology of relationships. Classes are only auxiliary for relationships. Therefore we delete classes without relationships. The classes belong to a completely artificial language. Therefore I'd argue there is nothing like a "preferred label". People must understand the scope notes, nothing else.  The purpose for which the CRM was and is created is to mediate data structures, i.e. between relations connecting "fields", not between "terms". If this is not clear enough from the CRM introduction, please let us know how to improve the text.
In that case it might be argued that the multilingual labels are not helpful.  And, in fact, we could not meet the uniqueness requirement for SKOS preferred labels, since our labels are not guaranteed to be unique.  What we should have are translations of the full scope notes for each class and property.

> Therefore, in my opinion, it is impossible in SKOS to represent the logic of the CRM. A pure class-class-mapping is usually misleading. However the X3ML mapping language can map relationships in a declarative way.
Having thought about this some more, and started to write an RDF-to-SKOS transformation, I have come to the same conclusion.  Also, I didn't know about X3ML, which I agree is exactly the right sort of approach for expressing mappings declaratively.
 

Posted by Robert Sanderson on 23/1/2018

Hi Martin,

Beyond costly and lengthy in-person meetings, could you describe how best someone might actively participate? In our experience of bringing issues that have arisen in the linked.art work to the list, there has been some discussion, but no useful resolutions that might become editorial work that could be taken on.  The text of the specification, by being managed in a Word document (it would appear, from the PDF), is not able to be edited by a distributed team of volunteers.

Modern specification efforts, including the W3C and IIIF, with linked.art following their lead, use GitHub to manage issues, changes and publication of the content. A modernization of the specification management practices might enable volunteers to be more active, with their efforts recognized and tracked.  If this was decided to be a good way forwards, I would be happy to volunteer the time to set up the repository and publishing platform.

Posted by Richard on 23/1/2018

On 23/01/2018 16:39, Martin Doerr wrote:
> Dear All,
>
> Thank you very much for your engagement in these issues!
> Let me remark, for all those that find our practices alarming, that none of us is paid for the maintenance of the CRM.
> It is exclusively an engagement of volunteers and engagement of organizations for a common good.
> What is really alarming for me is the lack of users offering active work beyond criticism.

I think the recent discussions about RDF and the CRM go well beyond just offering criticism.  You'll find the document which I started on Google Docs:

https://docs.google.com/document/d/1zCGZ4iBzekcEYo4Dy0hI8CrZ7dTkMD2rJaxa...

which is an attempt at a self-contained 'how to' guide for CRM RDF implementers, and which reflects the recent discussions on this topic. I'm happy to develop this document further, and would welcome input from others on this list.  Conversely, if this document doesn't meet a real need, I would be equally happy to be told why this is the case.

Posted by George on 29/1/2018

Dear all,

I think that an official RDF implementation recommendation guide, as has been started, would be a very useful document. I think the Google Doc format is a good place for formulating it. We should aim to consider the resulting document at the next SIG, hopefully with as much input from across the community as possible beforehand. Having such a document should be a big help in eliminating unnecessary variance in implementation. A nice addition to the document would be accompanying example RDF.

Posted by Thanasis Velios on 29/1/2018

I think that offering examples in the form of 3M mappings would also be helpful.

Current Proposal: 

At the 41st joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 34th FRBR - CIDOC CRM Harmonization meeting, the SIG decided to merge this issue with 361, since both concern the encoding of CIDOC CRM classes and properties in RDF. The SIG then reviewed the text initiated by Richard Light and agreed on particular challenges that should be explained and clarified in this text. These are:

1. Identifiers: their role and value, labels and reconciliation in RDFnds
2. RDF and the problem of primitive values (strings, names, dates, space, spacetime); the ontology as it exists must deviate from what a machine can represent
3. How to do properties of properties
4. Identifiers: their role and value in RDFnds (questions of reconciliation and instance management)
5. Types: SKOS recommendation and other possible ontological extensions
6. Identifiers of CRM classes and relations themselves (update processing)
7. Recording string
8. Statement expressing the translation of the ontology into RDFS using IsA and other mechanisms

The SIG also discussed creating manuals like this one for applying other formats.
In addition, the SIG took the following decisions and made the following recommendations about CRMbase:
(a) Create a new issue, ‘has content’, that will work from E90 and allow the semantic capture of the actual content of a symbolic object, to be modelled on the R33 property of FRBRoo. HW to MD for formulation of this property (issue 383).
(b) It is recommended that all nodes in RDF should have labels. Anyone who needs to track an appellation can capture its content through the new property of E90.
(c) Create a general section on recording symbolic objects (to address the content question); the name recording section can then reference it.
(d) Create a list of recommended data types for the primitive types.

RL, MD, GB, OE, TV should review the text about encoding CRM in RDF.

Lyon, May 2018

Posted by Martin on 21/11/2018

Dear All,

Here is my new version of the document, now in a coherent form, all previous comments taken into account.

Still missing, a guideline for the limits of Dimension values:
https://docs.google.com/document/d/1NdrWpzo7EFChryh4Qg-Ue8WLvnwejHx20eiwdJuZEck/edit#

Posted by Martin on 25/11/2018

Dear All,

Here is the new, completely reworked version of the document about implementing CRM in RDF. I hope I have taken all comments into account. A guideline for value limits of dimensions is still missing, as is a guideline about implementing multiple instantiation.

See https://docs.google.com/document/d/1NdrWpzo7EFChryh4Qg-Ue8WLvnwejHx20eiw...

Comments most welcome!

Outcome: 

 

 
 

Reference to Issues:

Meetings discussed: