Issue 555: RDFS Implementation and related issues

ID: 555
Starting Date: 2021-09-29
Working Group: 2
Status: Done
Background: 

Posted by Pavlos Fafalios on 1/9/2021

Dear all,

The "Resources" page of the CIDOC-CRM website (http://www.cidoc-crm.org/versions-of-the-cidoc-crm) has been recently updated to include:

- An RDFS implementation (not yet approved by SIG) of the last official version of CIDOC-CRM (7.1.1). The link points to a gitlab web page which also includes the policies adopted for generating the RDFS file.
- An XML file for each version of CIDOC-CRM, including the classes and properties of the corresponding version.
- An HTML page for each version of CIDOC-CRM, containing declarations for all classes and properties (facilitating navigation to the classes and properties of each version).
- An HTML page for each version of CIDOC-CRM, containing translations and versioning information of all classes and properties.

We (at FORTH) believe that the above will facilitate the adoption of CIDOC-CRM.

We will start gathering comments about errors, improvements, etc., so please do not hesitate to provide your critical feedback.

All the above will be presented and discussed during the next CIDOC-CRM meeting.

Posted by Miel Vander Sande on 1/9/2021

Thanks to all involved for this contribution. This is indeed an important step towards adoption.

I was wondering: is a SHACL profile and a JSON-LD context being considered?

Posted by Robert Sanderson on 1/9/2021

Miel, all,

4 issues below:

(1) There is a 7.1.1 compatible JSON-LD context at:  https://linked.art/ns/v1/linked-art.json 
The description for how the JSON keys are derived from the ontology is: https://linked.art/api/1.0/json-ld/#context-design 
Comments welcome and happy to contribute it to the SIG, and make only a secondary linked art context for the profile specific features!

(2) A second request from me ... it would be great to have owl:inverseOf between each of the property pairs in the ontology.

e.g.

  <rdf:Property rdf:about="P1i_identifies">
    <rdfs:label xml:lang="en">identifies</rdfs:label>
    <rdfs:label xml:lang="de">bezeichnet</rdfs:label>
    <rdfs:label xml:lang="el">είναι αναγνωριστικό</rdfs:label>
    <rdfs:label xml:lang="fr">identifie</rdfs:label>
    <rdfs:label xml:lang="pt">identifica</rdfs:label>
    <rdfs:label xml:lang="ru">идентифицирует</rdfs:label>
    <rdfs:label xml:lang="zh">标识</rdfs:label>
    <rdfs:domain rdf:resource="E41_Appellation" />
    <rdfs:range rdf:resource="E1_CRM_Entity" />
    <owl:inverseOf rdf:resource="P1_is_identified_by" />
  </rdf:Property>
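
A minimal sketch of what such an owl:inverseOf declaration would let a simple reasoner do, assuming triples are held as plain Python tuples. The function name and the `ex:` resources are hypothetical illustrations, not part of the RDFS file or of any library.

```python
# Hypothetical, simplified data: triples are (subject, predicate, object) tuples.
INVERSES = {"P1_is_identified_by": "P1i_identifies"}


def materialize_inverses(triples, inverses):
    """Return the input triples plus those entailed by the inverse declarations."""
    # owl:inverseOf is symmetric, so register both directions.
    both = dict(inverses)
    both.update({v: k for k, v in inverses.items()})
    result = set(triples)
    for s, p, o in triples:
        if p in both:
            # x p y entails y p_inverse x
            result.add((o, both[p], s))
    return result


data = {("ex:painting", "P1_is_identified_by", "ex:title")}
closed = materialize_inverses(data, INVERSES)
```

With the single input triple above, `closed` also contains the P1i_identifies statement in the opposite direction, which is the "basic reasoning" the declaration enables.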

And (3) a minor typo:

  <rdfs:Class rdf:about="E41_E33_Linguistic_Appellation">
    <rdfs:label xml:lang="en">Linguistic Appellation</rdfs:label>
    <rdfs:subClassOf rdf:resource="E41_Appellation" />
    <rdfs:subClassOf rdf:resource="E33_Linguistic_Object" />
  </rdfs:Class>

It was agreed that this was going to be E33_E41 to keep the numbers in order, and to coincidentally correspond to the two parts of the label (E33 -> linguistic, E41-> appellation)
Great if this could be fixed.

And (4) a concern. I don't think that it is good practice to make assertions about other ontologies' predicates:
Line 1176 asserts:

  <rdf:Property rdf:about="http://www.w3.org/2000/01/rdf-schema#label">
    <rdfs:subPropertyOf rdf:resource="P1_is_identified_by" />
  </rdf:Property>

So this means that all of the objects of instances of rdfs:label are, due to the range of P1_is_identified_by, suddenly instances of E41_Appellation.
A system that does even basic inferencing will produce very different results, by assigning E41_Appellation as another class for all of the literals which are the object of rdfs:label.
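
The entailment described above can be sketched with two RDFS rules (subproperty propagation and range typing) applied to tuples until nothing new appears. This is only an illustration under simplifying assumptions: the prefixed names are plain strings, the `ex:animal` resource is hypothetical, and no real reasoner is involved.

```python
# Schema statements relevant to the concern, written as simple lookups.
SUBPROPERTY_OF = {"rdfs:label": "crm:P1_is_identified_by"}
RANGE = {"crm:P1_is_identified_by": "crm:E41_Appellation"}


def rdfs_closure(triples):
    """Apply subPropertyOf and range entailment until a fixed point is reached."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            new = set()
            if p in SUBPROPERTY_OF:
                # x sub y entails x super y
                new.add((s, SUBPROPERTY_OF[p], o))
            if p in RANGE:
                # the object of a property is typed with the property's range
                new.add((o, "rdf:type", RANGE[p]))
            if not new <= inferred:
                inferred |= new
                changed = True
    return inferred


data = {("ex:animal", "rdfs:label", '"fish"@en')}
closed = rdfs_closure(data)
```

Starting from one rdfs:label statement, the closure contains a P1_is_identified_by statement with the literal as object and a type statement with the literal as subject.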

This doesn't affect me, because while inferencing is a good idea in practice in some domains with very tightly controlled data and precisely applied ontologies and vocabularies, I have yet to see any benefit at all from it in ours.

Might I suggest as a compromise instead having this assertion published, but in a different rdfs file? That would make it more noticeable to people who might otherwise have no clue why their system was misbehaving all of a sudden, and also make it easier for it to be omitted from processing if it was not useful in practice. Then we're still making the assertion in an official, public capacity, but we're also giving agency to our users as to whether they want to use it.

Thanks for your hard work on this!

Posted by Alexander Huber on 2/9/2021

Dear Pavlos,

This is fabulous news, thanks to all involved!  One question: is there a timeline for bringing the compatible models in line with this release, particularly LRMoo, CRMinf, and CRMdig?

Thanks! 

Posted by Pavlos on 7/9/2021

Dear Miel, 

Thank you for your comments. 

Having a JSON-LD serialization seems useful, given the increasing adoption of this encoding format. We can start considering its implementation once the RDFS is approved by CRM SIG. 

About SHACL: do you mean using SHACL for schema validation, or for defining constraints/requirements over crm-compliant RDF graphs? I guess the latter. This is interesting as well, but it also requires careful design and thinking/discussions on what types of constraints to consider and define.  

Posted by Pavlos on 7/9/2021

Dear Alexander,

Yes. As far as I know, the plan is to review and update all models based on 7.1.1 (by the corresponding working groups). 
Then, we plan to also provide an RDFS implementation for these models (whose generation has been automated to a great extent; we just need to check if specific policies need to be considered). 

Posted by Pavlos on 7/9/2021

Dear Robert,

Thank you for your comments and feedback. Some answers inline:

On Wed, Sep 1, 2021 at 4:40 PM Robert Sanderson <azaroth42@gmail.com> wrote:

    Miel, all,

    4 issues below:

    (1) There is a 7.1.1 compatible JSON-LD context at:  https://linked.art/ns/v1/linked-art.json 
    The description for how the JSON keys are derived from the ontology is: https://linked.art/api/1.0/json-ld/#context-design 
    Comments welcome and happy to contribute it to the SIG, and make only a secondary linked art context for the profile specific features!

Please see my reply to Miel. 
 

    (2) A second request from me ... it would be great to have owl:inverseOf between each of the property pairs in the ontology.

    e.g.

      <rdf:Property rdf:about="P1i_identifies">
        <rdfs:label xml:lang="en">identifies</rdfs:label>
        <rdfs:label xml:lang="de">bezeichnet</rdfs:label>
        <rdfs:label xml:lang="el">είναι αναγνωριστικό</rdfs:label>
        <rdfs:label xml:lang="fr">identifie</rdfs:label>
        <rdfs:label xml:lang="pt">identifica</rdfs:label>
        <rdfs:label xml:lang="ru">идентифицирует</rdfs:label>
        <rdfs:label xml:lang="zh">标识</rdfs:label>
        <rdfs:domain rdf:resource="E41_Appellation" />
        <rdfs:range rdf:resource="E1_CRM_Entity" />
        <owl:inverseOf rdf:resource="P1_is_identified_by" />
      </rdf:Property>

Our intention was to provide a 'pure' RDFS implementation, since one of our next steps is to provide a rich OWL implementation (and also automate its production, as we do for RDFS). 
Nevertheless, including this OWL property does not seem to cause any problem and allows for at least some basic reasoning. Not sure if it is better to provide it as a separate module/file, or just enrich the existing file. 

 

    And (3) a minor typo:

      <rdfs:Class rdf:about="E41_E33_Linguistic_Appellation">
        <rdfs:label xml:lang="en">Linguistic Appellation</rdfs:label>
        <rdfs:subClassOf rdf:resource="E41_Appellation" />
        <rdfs:subClassOf rdf:resource="E33_Linguistic_Object" />
      </rdfs:Class>

    It was agreed that this was going to be E33_E41 to keep the numbers in order, and to coincidentally correspond to the two parts of the label (E33 -> linguistic, E41-> appellation)
    Great if this could be fixed.

Thanks for spotting, we will fix it
 

    And (4) a concern. I don't think that it is good practice to make assertions about other ontologies' predicates:
    Line 1176 asserts:

      <rdf:Property rdf:about="http://www.w3.org/2000/01/rdf-schema#label">
        <rdfs:subPropertyOf rdf:resource="P1_is_identified_by" />
      </rdf:Property>

    So this means that all of the objects of instances of rdfs:label are, due to the range of P1_is_identified_by, suddenly instances of E41_Appellation.
    A system that does even basic inferencing will produce very different results, by assigning E41_Appellation as another class for all of the literals which are the object of rdfs:label.

    This doesn't affect me, because while inferencing is a good idea in practice in some domains with very tightly controlled data and precisely applied ontologies and vocabularies, I have yet to see any benefit at all from it in ours.

    Might I suggest as a compromise instead having this assertion published, but in a different rdfs file? That would make it more noticeable to people who might otherwise have no clue why their system was misbehaving all of a sudden, and also make it easier for it to be omitted from processing if it was not useful in practice. Then we're still making the assertion in an official, public capacity, but we're also giving agency to our users as to whether they want to use it.

The reason for making this assertion is the fact that rdfs:label has been widely used for providing names/appellations (without making use of "P1 is identified by"). However, all these labels are (semantically) appellations of the corresponding resources. So, using this subproperty declaration, a system can use P1 together with an inference rule for retrieving both names expressed using rdfs:label and instances of E41 (or appellations that are in URL/URI form --a more complex case). 

It's not very clear to me why some systems will start misbehaving. Could you please provide an example of such misbehaviour and the platform of reference? The only case I can imagine (but I might be wrong!) is when a system uses P1 together with an inference rule for retrieving appellations, but for some reason it does not want to get back values of rdfs:label, although these are appellations (but again here SPARQL offers constructs that can be used to distinguish between the different types of appellations).
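
One way to picture the retrieval use case is query-time expansion over the declared subproperties, without materializing any type statements. The sketch below assumes tuples for triples; `names_of` and the `ex:` resources are hypothetical names chosen for illustration.

```python
# Declared subproperties of P1, as in the subPropertyOf assertion under discussion.
SUBPROPERTIES = {"crm:P1_is_identified_by": {"rdfs:label"}}


def names_of(resource, triples):
    """Collect (predicate, object) pairs reachable via P1 or a declared subproperty."""
    wanted = {"crm:P1_is_identified_by"} | SUBPROPERTIES["crm:P1_is_identified_by"]
    return {(p, o) for s, p, o in triples if s == resource and p in wanted}


data = {
    ("ex:vase", "rdfs:label", '"Amphora"@en'),
    ("ex:vase", "crm:P1_is_identified_by", "ex:appellation1"),
    ("ex:vase", "crm:P2_has_type", "ex:ceramic"),
}
found = names_of("ex:vase", data)
```

Keeping the predicate in each result lets a caller still distinguish plain labels from E41 instances, in the spirit of the remark about SPARQL constructs.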

Posted by Miel Vander Sande on 8/9/2021

Hi Pavlos,

Op di 7 sep. 2021 om 09:38 schreef Pavlos Fafalios <fafalios@ics.forth.gr>:

    Dear Miel, 

    Thank you for your comments. 

    Having a JSON-LD serialization seems useful, given the increasing adoption of this encoding format. We can start considering its implementation once the RDFS is approved by CRM SIG. 

When done right, it can make complex models like CIDOC-CRM look a lot less scary. I think the goal should not be a complete implementation of CIDOC-CRM in a single context, but rather lead to an entry-level format that can be expanded when necessary (json-ld allows you to do this). I'm working on similar examples for PREMIS OWL.

    About SHACL: do you mean using SHACL for schema validation, or for defining constraints/requirements over crm-compliant RDF graphs? I guess the latter. This is interesting as well, but it also requires careful design and thinking/discussions on what types of constraints to consider and define.  

Same here. I think both are very useful, but the latter can provide an easy to adopt, entry-level view on CIDOC-CRM. In any case, you cannot capture an open ontology in a single SHACL schema; there will always be one or more constraint views aimed at implementation and easy adoption. 

In fact, this is what we have done in Belgium with the OSLO programme by defining a consensus-driven application profile of cidoc-crm. You can find JSON-LD and SHACL implementations at the bottom of this page: https://data.vlaanderen.be/doc/applicatieprofiel/cultureel-erfgoed-event...

Posted by Thomas Francart on 8/9/2021

Hello

Le mar. 7 sept. 2021 à 12:43, Pavlos Fafalios via Crm-sig <crm-sig@ics.forth.gr> a écrit :

    Dear Robert,

    Thank you for your comments and feedback. Some answers inline:

    On Wed, Sep 1, 2021 at 4:40 PM Robert Sanderson <azaroth42@gmail.com> wrote:

        Miel, all,

        4 issues below:

        (1) There is a 7.1.1 compatible JSON-LD context at:  https://linked.art/ns/v1/linked-art.json 
        The description for how the JSON keys are derived from the ontology is: https://linked.art/api/1.0/json-ld/#context-design 
        Comments welcome and happy to contribute it to the SIG, and make only a secondary linked art context for the profile specific features!

    Please see my reply to Miel. 
     

        (2) A second request from me ... it would be great to have owl:inverseOf between each of the property pairs in the ontology.

        e.g.

          <rdf:Property rdf:about="P1i_identifies">
            <rdfs:label xml:lang="en">identifies</rdfs:label>
            <rdfs:label xml:lang="de">bezeichnet</rdfs:label>
            <rdfs:label xml:lang="el">είναι αναγνωριστικό</rdfs:label>
            <rdfs:label xml:lang="fr">identifie</rdfs:label>
            <rdfs:label xml:lang="pt">identifica</rdfs:label>
            <rdfs:label xml:lang="ru">идентифицирует</rdfs:label>
            <rdfs:label xml:lang="zh">标识</rdfs:label>
            <rdfs:domain rdf:resource="E41_Appellation" />
            <rdfs:range rdf:resource="E1_CRM_Entity" />
            <owl:inverseOf rdf:resource="P1_is_identified_by" />
          </rdf:Property>

    Our intention was to provide a 'pure' RDFS implementation, since one of our next steps is to provide a rich OWL implementation (and also automate its production, as we do for RDFS). 
    Nevertheless, including this OWL property does not seem to cause any problem and allows for at least some basic reasoning. Not sure if it is better to provide it as a separate module/file, or just enrich the existing file. 

I would also find it very useful to have the inverseOf declarations directly in this file. It cannot harm and would indeed be very useful for some simple reasoning use-cases in a triplestore.
 

     

        And (3) a minor typo:

          <rdfs:Class rdf:about="E41_E33_Linguistic_Appellation">
            <rdfs:label xml:lang="en">Linguistic Appellation</rdfs:label>
            <rdfs:subClassOf rdf:resource="E41_Appellation" />
            <rdfs:subClassOf rdf:resource="E33_Linguistic_Object" />
          </rdfs:Class>

        It was agreed that this was going to be E33_E41 to keep the numbers in order, and to coincidentally correspond to the two parts of the label (E33 -> linguistic, E41-> appellation)
        Great if this could be fixed.

    Thanks for spotting, we will fix it. 
     

        And (4) a concern. I don't think that it is good practice to make assertions about other ontologies' predicates:
        Line 1176 asserts:

          <rdf:Property rdf:about="http://www.w3.org/2000/01/rdf-schema#label">
            <rdfs:subPropertyOf rdf:resource="P1_is_identified_by" />
          </rdf:Property>

        So this means that all of the objects of instances of rdfs:label are, due to the range of P1_is_identified_by, suddenly instances of E41_Appellation.
        A system that does even basic inferencing will produce very different results, by assigning E41_Appellation as another class for all of the literals which are the object of rdfs:label.

        This doesn't affect me, because while inferencing is a good idea in practice in some domains with very tightly controlled data and precisely applied ontologies and vocabularies, I have yet to see any benefit at all from it in ours.

        Might I suggest as a compromise instead having this assertion published, but in a different rdfs file? That would make it more noticeable to people who might otherwise have no clue why their system was misbehaving all of a sudden, and also make it easier for it to be omitted from processing if it was not useful in practice. Then we're still making the assertion in an official, public capacity, but we're also giving agency to our users as to whether they want to use it.

    The reason for making this assertion is the fact that rdfs:label has been widely used for providing names/appellations (without making use of "P1 is identified by"). However, all these labels are (semantically) appellations of the corresponding resources. So, using this subproperty declaration, a system can use P1 together with an inference rule for retrieving both names expressed using rdfs:label and instances of E41 (or appellations that are in URL/URI form --a more complex case). 

    It's not very clear to me why some systems will start misbehaving. Could you please provide an example of such misbehaviour and the platform of reference? The only case I can imagine (but I might be wrong!) is when a system uses P1 together with an inference rule for retrieving appellations, but for some reason it does not want to get back values of rdfs:label, although these are appellations (but again here SPARQL offers constructs that can be used to distinguish between the different types of appellations). 

While not incorrect, I also think this could cause confusion; the use-case I have in mind would be a triplestore loaded with CIDOC-CRM data combined with data from other ontologies using rdfs:label, where all of a sudden these entities would get a P1_is_identified_by.
I suggest that these "alignments" with other ontologies (we could think about skos:note <-> P3 has note, E21 Person <-> foaf:Person, etc.) be separated in another file (or other files), so that users have a choice on using them or not, depending on the ontologies they use and the level of interoperability they want to have.

Posted by Robert Sanderson on 8/9/2021

Dear Miel, all,

On Wed, Sep 8, 2021 at 4:11 AM Miel Vander Sande via Crm-sig <crm-sig@ics.forth.gr> wrote:

    Op di 7 sep. 2021 om 09:38 schreef Pavlos Fafalios <fafalios@ics.forth.gr>:

        Having a JSON-LD serialization seems useful, given the increasing adoption of this encoding format. We can start considering its implementation once the RDFS is approved by CRM SIG. 

    When done right, it can make complex models like CIDOC-CRM look a lot less scary. I think the goal should not be a complete implementation of CIDOC-CRM in a single context, but rather lead to an entry-level format that can be expanded when necessary (json-ld allows you to do this). I'm working on similar examples for PREMIS OWL.

I agree that a well crafted context can greatly improve usability of the ontology in modern software frameworks. This has been demonstrated very clearly in different domains since the standardization of JSON-LD 1.0 in 2014. I'm very happy to put as much effort as needed into this.

However, I disagree about the goal. I feel that the SIG should provide a context that covers the same set of classes and predicates as in the RDFS. Composing multiple contexts together with no overlap would be extremely error-prone, with little way to detect those errors. For Linked Art, we started with two contexts ... the terms that we recommend from CRM in the profile, and the second was all the other terms. If you wanted to stick with just Linked Art, you imported one. If you wanted to use anything extra, you also imported the rest. And even that level of composition was highly frustrating for implementation, as you needed to know all of the terms used in the document in order to know whether you should use one or both. The easy solution was to always use both... defeating the purpose of splitting them.

And the errors are difficult to detect because if a key in JSON-LD doesn't match an entry in an included context, it is silently ignored. So data would just disappear, and you had to go hunting through other people's documents (the contexts) to figure out why.
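
The silent loss described above can be pictured with a deliberately simplified stand-in for JSON-LD expansion. This is not a JSON-LD processor: it assumes a flat context with no @vocab, and the key names `identified_by` and `carries` and the `ex:` values are hypothetical.

```python
def toy_expand(document, context):
    """Keep keywords and keys the context maps to an IRI; drop everything else."""
    expanded = {}
    for key, value in document.items():
        if key.startswith("@"):
            expanded[key] = value
        elif key in context:
            expanded[context[key]] = value
        # Keys without a mapping fall through here with no warning or error.
    return expanded


context = {
    "identified_by": "http://www.cidoc-crm.org/cidoc-crm/P1_is_identified_by",
}
doc = {"@id": "ex:vase", "identified_by": "ex:name1", "carries": "ex:text1"}
result = toy_expand(doc, context)
```

Because `carries` is missing from the included context, its value simply does not appear in `result`, and nothing signals that data was lost.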

By sticking to the same divisions as the RDFS files (e.g. one context for base, one for sci, one for geo, etc) it would be straightforward to manage from a publishing and consumption perspective at the ontology level, rather than at an application profile level.

Posted by Miel Vander Sande on 9/9/2021

Hi Robert,

That makes a lot of sense. If you were to only support a subset, you'd need to accompany it with a SHACL profile to indicate what shape the context offers and to make sure that things don't go missing. Although in practice, this might indeed create chaos rather than prevent it.

Posted by Mark Fichtner on 9/9/2021

Dear Pavlos,

To my knowledge, the ecrm has up to now been the official OWL implementation of the CIDOC CRM. Automating the process seems to be a good idea; however, in recent years we were able to provide a lot of feedback while implementing the CIDOC CRM (style and writing mistakes, but also logical inconsistencies, etc.). We are currently working on 7.1.1 but are not completely done yet. I think we should not mix OWL and RDFS at the RDFS level, because that would simply make the RDFS file obsolete. If we do that, we could just use OWL, because it is RDFS anyway.

Posted by Robert Sanderson on 9/9/2021

Hi Mark,

Could you expand a little about "OWL is RDFS anyway"? The advantage of the current RDFS file is that it's easy to understand and process as XML without a full semantic stack. Once decorated with all of the owl details, it becomes more complex. Apart from owl:inverseOf and perhaps owl:ObjectProperty vs owl:DataProperty, is there anything else that would be added? Cardinality? Definition of shortcuts using axioms / rules?

I'm curious also as to your thoughts on the rdfs:label / P1_identifies issue?

Many thanks!

Posted by Francesco Beretta on 9/9/2021

Dear Rob, all,
>> I'm curious also as to your thoughts on the rdfs:label / P1_identifies issue?

À propos: there is a question that has been on my mind for some time, perhaps you can give me some insights.

The P1 is identified by (identifies) property has E41 Appellation as range. This class is subclass of Symbolic Object and Legal Object, therefore a E77 Persistent Item and not a E62 String which is a E59 Primitive Value.

Therefore an instance of E41 Appellation — rdfs:label —> '[label]', right ? So it crm:P1 cannot be equivalent to rdfs:label?

E1 Entity —rdfs:label—> rdfs:Literal  would appear to be a shortcut of:

E1 Entity —P1 is identified by—> E41 Appellation —rdfs:label—> '[label]'

My question(s):

1. is this correct ?
2. is there any way in OWL-DL or any other formal language to express the notion of shortcut or path, with a start and an end class, and the whole path in between ?

<There was unfortunately a copy-paste issue in my email.>
Le 09.09.21 à 17:35, Francesco Beretta a écrit :
> The P1 is identified by (identifies) property has E41 Appellation as range. This class is subclass of Symbolic Object and Legal Object, therefore a E77 Persistent Item and not a E62 String which is a E59 Primitive Value.
>
> Therefore an instance of E41 Appellation — rdfs:label —> '[label]', right ? So it crm:P1 cannot be equivalent to rdfs:label?

I mean:

An instance of E41 can have this property:
E41 Appellation — rdfs:label —> '[label]', right ?

So the crm:P1 property cannot be equivalent to rdfs:label, right?

 

Posted by Thomas Francart on 9/9/2021

Le jeu. 9 sept. 2021 à 17:51, Francesco Beretta via Crm-sig <crm-sig@ics.forth.gr> a écrit :

    Dear Rob, all,
    >> I'm curious also as to your thoughts on the rdfs:label / P1_identifies issue?

    À propos: there is a question that has been on my mind for some time, perhaps you can give me some insights.

    The P1 is identified by (identifies) property has E41 Appellation as range. This class is subclass of Symbolic Object and Legal Object, therefore a E77 Persistent Item and not a E62 String which is a E59 Primitive Value.

    Therefore an instance of E41 Appellation — rdfs:label —> '[label]', right ? So it crm:P1 cannot be equivalent to rdfs:label?

    E1 Entity —rdfs:label—> rdfs:Literal  would appear to be a shortcut of:

    E1 Entity —P1 is identified by—> E41 Appellation —rdfs:label—> '[label]'

    My question(s):

    1. is this correct ?
    2. is there any way in OWL-DL or any other formal language to express the notion of shortcut or path, with a start and an end class, and the whole path inbetween ?

Indeed, this is OWL2 Property Chains : https://www.w3.org/TR/owl2-primer/#Property_Chains
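
As a rough picture of what a property chain expresses, the sketch below composes two properties into a shortcut over tuples. It does not use OWL or a reasoner, and whether such an axiom involving rdfs:label is admissible in a given OWL profile is not addressed here. `apply_chain` and the `ex:` resources are hypothetical.

```python
def apply_chain(triples, first, second, shortcut):
    """If x first y and y second z, add x shortcut z."""
    # Index the second step by its subject for direct lookup.
    by_subject = {}
    for s, p, o in triples:
        if p == second:
            by_subject.setdefault(s, set()).add(o)
    inferred = set(triples)
    for s, p, o in triples:
        if p == first:
            for z in by_subject.get(o, ()):
                inferred.add((s, shortcut, z))
    return inferred


data = {
    ("ex:vase", "crm:P1_is_identified_by", "ex:appellation1"),
    ("ex:appellation1", "rdfs:label", '"Amphora"'),
}
closed = apply_chain(data, "crm:P1_is_identified_by", "rdfs:label", "rdfs:label")
```

Here the long path through the E41 Appellation node yields the short rdfs:label statement on the entity itself, matching the shortcut reading in the question.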

Posted by Martin on 9/9/2021

Dear Francesco,

This is a complex issue, which was discussed at length in 2018 and basically was spelled out in the implementation guidelines for RDFS by Richard Light and me.

All these questions you pose have been taken into account carefully. The text may need improvements, but I'd kindly ask all CRM-SIG members having respective questions to read it carefully and give us feedback.

Let me explain just a bit here from the side of logic, which is tricky and not the usual reasoning we apply within the CRM:

A superproperty is not equivalent to a subproperty. A superproperty is only implied by a subproperty.
 
Therefore: Once E41 Appellation has no necessary property, an instance of E41 Appellation without having a property of its own does not violate the range of the superproperty. It's just a poor case.

(But it is completely true that rdfs:label is without properties. From the time of RDFS 1.1 on, which recommends the use of xsd values in literals, there are hidden properties in the label, such as the language tags.)

This statement does also strictly not hold: "This class is subclass of Symbolic Object and Legal Object, therefore a E77 Persistent Item and not a E62 String which is a E59 Primitive Value",

because a) there is no axiom in CRM saying that Persistent Item and E62 String are disjoint;
        b) there is no declaration in the RDFS implementation that rdfs:Literal equals E62 String or E59 Primitive Value.

Obviously, RDFS makes rich use of Literal, packing stuff like WKT geometric values  into them, which are used in geo-enabled triple stores.

With the superproperty declaration, we say that whoever uses rdfs:label refers to a name (E41 Appellation). Unfortunately, RDFS does not allow us smarter things to do, but this gives the right answers to queries.

Posted by Mark Fichtner on 9/9/2021

Dear all,

I am speaking from OWL-point of view and agree with most of the other writers. 
Concerning the P1-issue:
- rdfs:label has rdfs:Literal as range
- P1 in OWL typically is an object property and not a datatype property. It has E41 as a range and E41 is not in the E59 primitive value subtree. Its subclasses are via multiple inheritance, but it does not hold for E41 itself.
- If you declare rdfs:label a subproperty of P1 you are changing in fact the definition of rdfs:label and the definition of E41. This means you simply change the data on nearly the whole world without even having a glance at a single dataset. This is not only a worst-practice but would lead to massive inconsistencies when it comes to reasoning with OWL. I don't want to tell you what you should do in rdf - because rdf is more flexible here. But it does not seem logical to me. Ontology alignment is a difficult task.
I understand that it might be helpful in some scenarios. But I think it would be confusing if the official CIDOC CRM RDF file did ontology integration that way. Furthermore, RDF is just one implementation of CIDOC CRM, and typically when it comes to implementation only primitive datatypes are replaced by the implementation - not object properties.

In the Erlangen CRM we use:
- owl:Class for the classes,
- we have some owl:Restrictions (74)
- owl:ObjectProperty for the object properties
- owl:DatatypeProperty for the datatypes
- owl:inverseOf for the inverses
If I didn't miss anything, that's all. See http://erlangen-crm.org/200717/

So the difference between OWL and RDF variant won't be big after adding owl:inverse and if you start using owl in your ontology definition it is pretty straight forward to use OWL completely anyway ;-)

We had a long discussion on shortcuts - I think the conclusion was we hardly have shortcuts in CIDOC CRM that could be used as Property chains as the implications are not strong enough. Martin probably can add some in here. 

 

Posted by Robert Sanderson on 9/9/2021

    Dear all,

    I am speaking from OWL-point of view and agree with most of the other writers. 
    Concerning the P1-issue:
    - rdfs:label has rdfs:Literal as range
    - P1 in OWL typically is an object property and not a datatype property. It has E41 as a range and E41 is not in the E59 primitive value subtree. Its subclasses are via multiple inheritance, but it does not hold for E41 itself.
    - If you declare rdfs:label a subproperty of P1 you are changing in fact the definition of rdfs:label and the definition of E41. This means you simply change the data on nearly the whole world without even having a glance at a single dataset. This is not only a worst-practice but would lead to massive inconsistencies when it comes to reasoning with OWL. I don't want to tell you what you should do in rdf - because rdf is more flexible here. But it does not seem logical to me. Ontology alignment is a difficult task.
    I understand that it might be helpful in some scenarios. But I think it would be confusing if the official CIDOC CRM RDF file did ontology integration that way. Furthermore, RDF is just one implementation of CIDOC CRM, and typically when it comes to implementation only primitive datatypes are replaced by the implementation - not object properties.

    In the Erlangen CRM we use:
    - owl:Class for the classes,
    - we have some owl:Restrictions (74)
    - owl:ObjectProperty for the object properties
    - owl:DatatypeProperty for the datatypes
    - owl:inverseOf for the inverses
    If I didn't miss anything, that's all. See http://erlangen-crm.org/200717/

    So the difference between OWL and RDF variant won't be big after adding owl:inverse and if you start using owl in your ontology definition it is pretty straight forward to use OWL completely anyway ;-)

    We had a long discussion on shortcuts - I think the conclusion was we hardly have shortcuts in CIDOC CRM that could be used as Property chains as the implications are not strong enough. Martin probably can add some in here. 

--------------------------

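from rdflib import ConjunctiveGraph
from owlrl import RDFSClosure

# Load the schema under test (path as given in the original message)
g = ConjunctiveGraph()
g.parse("Downloads/minimal_schema.rdfs.xml", format="xml")

# Compute the RDFS closure, including the axiomatic and datatype-axiom triples
rdfs_sems = RDFSClosure.RDFS_Semantics(g, axioms=True, daxioms=True, rdfs=True)
rdfs_sems.closure()

# serialize() returns bytes in rdflib 5 and str from rdflib 6 onwards
out = rdfs_sems.graph.serialize(format="ttl")
print(out.decode("utf-8") if isinstance(out, bytes) else out)

 ...

"fish"@en a crm:E1_CRM_Entity,
        crm:E41_Appellation,
        crm:E90_Symbolic_Object,
        rdfs:Literal,
        rdfs:Resource .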

-------------------------------

As expected, it entails the nonsense that the literal "fish"@en is an E1, E41, E90, etc. which is garbage caused by this pollution in the ontology, as literals cannot be the subject of triples.

Hope that helps explain my unease!

Posted by Martin on 9/9/2021

Dear Robert, Mark,

Of course this is not elegant schema design. Unease is accepted, but what are the alternatives??

On 9/9/2021 10:30 PM, Robert Sanderson wrote:
>
>
>
> As expected, it entails the nonsense that the literal "fish"@en is an E1, E41, E90, etc. which is garbage caused by this pollution in the ontology, as literals cannot be the subject of triples.
This is, in my eyes, not nonsense, but simply reality. The literal "fish" is used as a name. Hence it is ontologically an E41. Following the definition of E90, "fish"@en is also a symbolic object, regardless of whether one distinguishes data objects and literals. Note that the definitions of the CRM are ontological, not syntactic, in the first place.

This is a classical problem of data integration, and the reason formal ontologies were invented. Literature in the 1980s discussed that classes can be hidden in boolean values or strings, or be explicit tables. Whether an application names things via labels or via classes in RDF/OWL is an arbitrary decision. SKOS exclusively names things via labels.

So, if one makes a knowledge base that commits to the CRM, I would like to have a query that returns all names in the whole world I can reach, regardless what encoding variant and KR paradigm is used. Otherwise, SKOS names will not be appellations.

Alternatively, we close our eyes, and hard code in data entry and query that "fish" is used as Appellation, but just don't write it down.

@en actually is equivalent to "has language" etc. With these hidden properties RDFS itself violates the separation of Literals and data objects. It opens up a whole world of user-defined data objects within Literals, with no logical connection to the data objects. This is nothing other than a bad later patch to a problem not initially anticipated. How are these compatible with OWL reasoners?

There is no elegant way to make an ontology that describes a reality based on FOL fit exactly with schema languages.

At least, this is how I perceive this problem, having seen enough knowledge representation languages and information integration literature from the eighties and implementations from the nineties on.

For me, the question is completely practical: We have a CRM compatible KB, a real platform. What is the simplest way to get all names in the KB back? I have not seen a whole "RDF" world that my statement label IsA P1 would turn upside down. Do you have one?
 

Posted by Martin on 9/9/2021

Sorry, I just forgot:

Of course we can provide guidelines and software on how to query all names etc. We can hardly forbid CRM users to put appellations into rdfs:label.

So, how is this problem solved in OWL? Those of you opposing the superproperty hack, how do you solve the query question?

Posted by Robert on 10/9/2021

Thanks Martin :)

As Francesco asked and Thomas answered, I would also recommend a property chain axiom that says:

If: x rdfs:label y 
then: x P1_is_identified_by z ; z a E41_Appellation , P190_has_symbolic_content y .

I quickly defer to those who do OWL more often than I, but I think it's as easy as:

    rdfs:label owl:propertyChainAxiom ( crm:P1_is_identified_by crm:P190_has_symbolic_content ) .
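One caveat on direction: in OWL 2 a property chain axiom entails the named property from the chain, so the axiom above yields an rdfs:label statement whenever P1 followed by P190 is present. The "if/then" rule as written goes the other way and has to mint a new node z, which would typically be done by a rule or a preprocessing step. A minimal sketch of such a step, with triples as plain tuples; `materialize_labels`, `ex:thing` and the `_:app` naming are hypothetical.

```python
from itertools import count


def materialize_labels(triples):
    """For each (x, rdfs:label, y) add: x P1 z ; z a E41 ; z P190 y, with a fresh z."""
    fresh = count(1)
    out = set(triples)
    for s, p, o in sorted(triples):  # sorted only to make node numbering repeatable
        if p == "rdfs:label":
            z = f"_:app{next(fresh)}"  # hypothetical blank-node label
            out |= {
                (s, "crm:P1_is_identified_by", z),
                (z, "rdf:type", "crm:E41_Appellation"),
                (z, "crm:P190_has_symbolic_content", o),
            }
    return out


data = {("ex:thing", "rdfs:label", '"fish"@en')}
result = materialize_labels(data)
for triple in sorted(result):
    print(triple)
```

In this form the literal only ever appears in object position, and the E41 type lands on the minted node rather than on the literal.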

Posted by Pavlos Fafalios on 10/9/2021

Dear all, 

Thank you for the interesting discussion. A notice that might be important: 

In 'OWL Full', which was designed to allow flexibility and preserve compatibility with RDFS: "object properties and datatype properties are not disjoint" and "datatype properties are effectively a subclass of object properties", see: 

https://www.w3.org/TR/owl-ref/#OWLFull   (not sure if there is a change on this in OWL 2) 

On the downside, as also described in the above link, using the OWL Full features means that one loses some guarantees that OWL DL and OWL Lite can provide for reasoning systems. 

So, we can have an RDFS implementation that includes the superproperty hack (and which seems compatible with OWL Full, allowing for some basic reasoning). And then focus on having an OWL DL implementation (Mark's effort) which is more strict, provides guarantees and will be useful for advanced reasoning engines (this implementation can include the property chain axiom). 

Posted by Martin on 10/9/2021

Dear All,

I'd also like to point you to a subtle distinction which is very important for us:

Mark correctly states:

If you declare rdfs:label a subproperty of P1 you are changing in fact the definition of rdfs:label and the definition of E41. This means you simply change the data on nearly the whole world without even having a glance at a single dataset.

This is the mathematical, model-theoretic point of view about what a definition is.

In contrast, we commit to Nicola Guarino's formulation, that the logical declarations of a formal ontology are an approximation of a conceptualization (rather than being it), trying to minimize "unintended models".

Therefore, there exist textual definitions for all constructs of formal ontologies, in order to render the intended models. This is the ontological (philosophical, cognitive) point of view of what a definition is.

The CRM states for E41 Appellation:
This class comprises signs, either meaningful or not, or arrangements of signs following a specific syntax, that are used or can be used to refer to and identify a specific instance of some class or category within a certain context.

RDFS 1.1 states for rdfs:label:
rdfs:label is an instance of rdf:Property that may be used to provide a human-readable version of a resource's name.

According to that, E41 constitutes a generalization of all adequate uses of rdfs:label, and indeed is intended to be so for all possible worlds. Methodologically, for CRM-SIG, the intended meaning has priority over preserving the formal definition in the sense Mark mentioned. This principle is also behind the formulation of the "conservative extension" in the CRM introduction.

Further, I want to express our particular gratitude to Mark Fichtner for creating the OWL implementations of the CRM and his careful semantic checking, which has been helping us a lot. When I mentioned automatic generation, I rather spoke about a utility making work easier, not a mechanism replacing manual scrutiny.

Finally, I would like to confirm that no more constructs of this kind are intended, that it is not intended to introduce a new practice of this kind (please do not generalize and panic!), and that indeed the rdfs:label problem and identifiers play an exceptional role in the ontology - schema transition.
 

Current Proposal: 

Posted by Pavlos Fafalios on 29/9/2021

Dear all, 

I think there is no open issue on this (please let me know if this is not true), so I suggest opening a new issue in order to finalize the discussion on the RDFS implementation. 

Based on the discussion on the other email thread (title: "RDFS, XML and more"), I created the below google doc (homework) where the different issues are summarised. Also, there are suggestions on how to proceed. 

https://docs.google.com/document/d/1oq02aS8xENzGBJAdxlSJzX_n9CE43_Aycl8NttReqis/edit?usp=sharing

There will probably be a slot on this at the forthcoming SIG meeting, so that we can make some final decisions. 
So, please kindly check the doc before the next SIG meeting and feel free to comment (especially in case I forgot something, or something seems not to be the case), or directly reply to this email.

Thank you all for the contributions! 

Posted by Robert Sanderson on 29/9/2021

Thanks Pavlos, that's a great write up!

In case the discussion happens when I can't be at the SIG meeting (likely due to timezone issues), my votes:

A - YES to the suggested scenario of creating a second file, that might currently only hold this one alignment, but in the future might also map between other core properties or classes. I'm okay with leaving it out completely. I would be disappointed if it were left in, but not to the point of a veto -- it's possible to ignore, just annoying to have to do so.

B - I'm okay with any of the results, so long as B3 (don't include them) is also backed up with an OWL representation that /does/ include them.

C - YES. Also, FWIW, my code that generates a context file given the ontology's RDFS: 
    https://github.com/linked-art/crom/blob/master/utils/make_jsonld_context...
    Which generates the context:
    https://linked.art/ns/v1/linked-art.json

D - Would like to see what benefits a SHACL shape file would bring. (abstain)

E - YES
F - YES

And the URI construction is a separate issue?
 

Posted by Pavlos Fafalios on 30/9/2021

Thank you Robert.

About the URI construction: Yes, it is related to issue 460. I will soon send a similar 'summary' document. 

Posted by Martin on 29/9/2021

Dear All,

Please let me summarize the discussions about making rdfs:label IsA P1 is identified by, without triggering any more discussion on it:

a) This declaration obviously is against syntactic practice in RDFS

b) It appears to be a logically correct rendering of the CRM, because we can infer that uses of Literal via rdfs:label are instances of E41 Appellation

c) The ontological interpretation of the (textual) definition of RDFS for rdfs:label supports the interpretation that uses of Literal via rdfs:label are instances of E41 Appellation

d) obviously it does not support inheritance of properties, but this is not fixed by the CRM FOL definition.

e) RDF platforms and SPARQL appear to behave as expected when confronted with the statement

f) Usual OWL versions seem to cause conflicts with this statement, whereas OWL Full does not strictly separate Literals from objects anymore.

g) The latter is serious.

i) making rdfs:label  IsA P1 is identified by is a CRM-SIG decision from 2018. No new evidence since then.

Therefore I propose to follow Robert's proposal, to put into a separate RDFS module the statements that declare subproperties of P1 is identified by and in RDFS have range Literal (albeit in practice filled with syntactic structures, such as xsd:datatypes).

Then, users of OWL versions that would create conflicts may omit this module, and write adequate query inferences to get the respective values.

Other users may use both files in combination.
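For users who omit the module, the "adequate query inference" could be as simple as taking the union of both encodings at query time. The following is a hedged, library-free sketch with triples as plain tuples; `all_names` and the `ex:` data are hypothetical, and the SPARQL in the comment is only one possible equivalent.

```python
P1 = "crm:P1_is_identified_by"
P190 = "crm:P190_has_symbolic_content"
LABEL = "rdfs:label"

# Roughly equivalent SPARQL 1.1, using a property path:
#   SELECT ?x ?name WHERE {
#     { ?x rdfs:label ?name }
#     UNION
#     { ?x crm:P1_is_identified_by/crm:P190_has_symbolic_content ?name }
#   }


def all_names(triples):
    """Return (entity, name) pairs from rdfs:label and from the P1/P190 path."""
    content = {}
    for s, p, o in triples:
        if p == P190:
            content.setdefault(s, set()).add(o)
    names = set()
    for s, p, o in triples:
        if p == LABEL:
            names.add((s, o))
        elif p == P1:
            names.update((s, c) for c in content.get(o, ()))
    return names


kb = {
    ("ex:fish1", LABEL, "fish"),                  # name stored as a plain label
    ("ex:boat1", P1, "ex:app1"),                  # name stored as an E41 node
    ("ex:app1", "rdf:type", "crm:E41_Appellation"),
    ("ex:app1", P190, "Argo"),
}
print(sorted(all_names(kb)))
```

The same function works whether or not the supplementary module is loaded, since it does not rely on any subproperty inference.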

In the 51st CIDOC CRM & 44th FRBRoo SIG meeting, PF walked participants through open issues that relate to the RDFS implementation of the CIDOC CRM. 

The subtopics discussed and a summary of the decisions can be found below: 

  • Inclusion of ‘rdfs:label – subPropertyOf – P1 is identified by’

Decision: statements declaring subproperties of P1 is identified by, whose range is Literal in RDFS, are to be put in a separate RDFS module (one that extends the CRMbase RDFS module). Depending on one’s implementation, one can use both files or only the CRMbase RDFS. The other file will also contain triples for primitives (that are also isA E41 Appellation). 

  • Inclusion of ‘owl:inverseOf’ assertions in the RDFS

Decision: Since using the ‘owl:inverseOf’ property in the same RDFS module will not cause platforms to crash, and in view of the fact that it is part of the model, it should be introduced. Postpone reaching a decision re. whether it will be put in a different module or in the CRMbase.

  • Providing a JSON-LD context

Decision:
•    proceed with automatically generating a JSON-LD context (together with the generation of the RDFS).
•    Start a new issue, where to determine the rules for automatically generating such a context. 
•    Start a new issue on other serializations that it might be useful to autogenerate for different audiences. 

  • Providing a SHACL profile

Decision: formally invite Miel Vander Sande to present creating SHACL profiles in a future SIG meeting (use cases etc.)
Nb. Connect to discussion on creating application profiles (see issues: 236, 364)

  • Change ‘E41_E33_Linguistic_Appellation’ to ‘E33_E41_Linguistic_Appellation’

Decision: keeping numbers of the numeric identifier in order. 

  • Bringing Compatible models in line with this release

Decision: once the family models are reviewed and updated to the point they’re stable (+style guide based on CIDOC CRM v7.1.1) to proceed by providing an RDFS implementation for these models as well. 
Will be done in a separate issue per model

 

The policies for the RDFS derivations of the CIDOC CRM can be found here

Post by Pavlos Fafalios (25 November 2021): 

Dear all,

 
We are happy to announce that the RDFS implementation of CIDOC-CRM 7.1.1 has been finalized (based on the discussions and decisions of the last SIG meeting) and is now available through the webpage of CIDOC-CRM: 
 
 
There, you can find 3 links next to 'RDFS':
1) One pointing to the main RDFS file 
2) One pointing to the supplementary RDFS file which contains subproperty declarations that can cause inconsistencies when reasoning with OWL (mainly because a datatype property is a subproperty of an object property)
3) One pointing to a GitLab web page which describes the policies followed for generating the RDFS file, as well as a folder with additional encodings/serialisations, currently: .nt, .ttl, .json-ld (not context; we work on also providing a JSON-LD context).
 
Please let us know if you have any questions/suggestions, or if you spot an error. 
Some more details:
 
(A) Inclusion of ‘rdfs:label - subPropertyOf - P1 is identified by’
The subproperty relation 'rdfs:label subPropertyOf P1' has been moved to the supplementary RDFS file for the already discussed reason. For exactly the same reason (datatype property subPropertyOf an object property), we also moved the 3 subproperty relations below there:
- `P168 place is defined by` subPropertyOf `P1 is identified by`
- `P169i spacetime volume is defined by` subPropertyOf `P1 is identified by`
- `P170i time is defined by` subPropertyOf `P1 is identified by`

(P168, P169i and P170i all have range Literal, while P1 has range Appellation).
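The criterion used for the move can be checked mechanically. The sketch below is not the generator used for the official files; it is a library-free illustration with triples as plain tuples that flags every subPropertyOf statement whose subproperty has range rdfs:Literal while the superproperty has a class as range. The function name is hypothetical, and P48 is included only as a control case that should not be flagged.

```python
LITERAL = "rdfs:Literal"
P1 = "crm:P1_is_identified_by"

schema = {
    (P1, "rdfs:range", "crm:E41_Appellation"),
    ("rdfs:label", "rdfs:range", LITERAL),
    ("crm:P168_place_is_defined_by", "rdfs:range", LITERAL),
    ("crm:P169i_spacetime_volume_is_defined_by", "rdfs:range", LITERAL),
    ("crm:P170i_time_is_defined_by", "rdfs:range", LITERAL),
    ("crm:P48_has_preferred_identifier", "rdfs:range", "crm:E42_Identifier"),
    ("rdfs:label", "rdfs:subPropertyOf", P1),
    ("crm:P168_place_is_defined_by", "rdfs:subPropertyOf", P1),
    ("crm:P169i_spacetime_volume_is_defined_by", "rdfs:subPropertyOf", P1),
    ("crm:P170i_time_is_defined_by", "rdfs:subPropertyOf", P1),
    ("crm:P48_has_preferred_identifier", "rdfs:subPropertyOf", P1),
}


def literal_under_object_property(triples):
    """List (subproperty, superproperty) pairs mixing Literal and class ranges."""
    ranges = {s: o for s, p, o in triples if p == "rdfs:range"}
    return sorted(
        (sub, sup)
        for sub, p, sup in triples
        if p == "rdfs:subPropertyOf"
        and ranges.get(sub) == LITERAL
        and ranges.get(sup) not in (None, LITERAL)
    )


flagged = literal_under_object_property(schema)
for sub, sup in flagged:
    print(f"{sub} subPropertyOf {sup}")
```

Under these assumptions the four declarations named above are reported and the P48 control is left alone.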

 
The file also contains a comment on the top describing the reason for moving these 4 subproperty declarations to a different file.  
 
(B)  Inclusion of ‘owl:inverseOf’ 
As decided at the last SIG meeting, we have included all inverseOf assertions. 
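To illustrate what the inverseOf assertions make possible, here is a small sketch that materializes the inverse direction of each statement. It is only an illustration with triples as plain tuples; an OWL-aware store would normally do this itself, and `add_inverses` and the `ex:` data are hypothetical.

```python
def add_inverses(triples):
    """Add (o, q, s) for every (s, p, o) where p and q are declared inverses."""
    inverse = {}
    for s, p, o in triples:
        if p == "owl:inverseOf":
            inverse[s] = o  # owl:inverseOf is symmetric, so record both directions
            inverse[o] = s
    derived = {(o, inverse[p], s) for s, p, o in triples if p in inverse}
    return set(triples) | derived


graph = {
    ("crm:P1_is_identified_by", "owl:inverseOf", "crm:P1i_identifies"),
    ("ex:boat1", "crm:P1_is_identified_by", "ex:app1"),
}
completed = add_inverses(graph)
for triple in sorted(completed):
    print(triple)
```

Note that this only makes sense for object-valued statements; a literal object cannot become the subject of the inverse triple, which is the same constraint discussed under (A).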
 
(C) Providing a JSON-LD context
We have already started working on this. The plan is to include it in the 'other_encodings' folder of the "more" link when it is ready.
 
(D) Providing a SHACL profile
The plan is to invite someone with experience in SHACL (Miel?) to make a presentation at a next SIG meeting, describing use cases, usefulness, etc.
 
(E) Change ‘E41_E33_Linguistic_Appellation’ to ‘E33_E41_Linguistic_Appellation’
Done
 
(F) RDFS for compatible models 
The plan is to provide an RDFS for each compatible model that has been updated based on version 7.1.1 of CIDOC-CRM. When such an update is ready, we can open a new issue for its RDFS implementation. 
 
 
Thank you all again for all the feedback and discussions on this issue.
 
Best regards,
Pavlos and Elias
 
Outcome: 

In the 52nd joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9; 45th FRBR - CIDOC CRM Harmonization meeting; the SIG accepted PF's proposal to formally close the issue on the grounds of all prior decisions having been implemented (see post on the SIG list, November 25th, 2021).