Issue 230: Coreference statement in CIDOC CRM

ID: 
230
Starting Date: 
2013-06-07
Working Group: 
3
Status: 
Done
Closing Date: 
2015-02-12
Background: 

At the meeting of the CIDOC co-reference group in Stockholm in June 2013, the group proposed introducing a co-reference statement in CIDOC CRM.

Old Proposal: 

At the meeting, Øyvind Eide was assigned to write the scope note for this statement and to propose it for discussion at the next CRM meeting. 

Posted by Øyvind Eide on 13/10/2013 

Emm Co-Reference Assignment 
Subclass of:    E13 Attribute Assignment 

Scope note:    This class comprises the action of making assertions about the fact that two or more E89 Propositional Objects refer to the same E1 CRM Entity. The assertion is based on the assumption that this was an implicit fact being made explicit by this assignment. In the context of this assignment the interpretation of the referring entity is fixed. 

Examples:
    the assertion that the author name "Hans Jæger" on the title page of the novel "Fra Christiania-Bohêmen" refers to the same historical person as the motive of the painting "Forfatteren Hans Jæger" by Edvard Munch. 

Properties: 
     Pmm assigned co-reference to (co-reference was assigned by): E89 Propositional Object 

     Pnn has co-reference target (is co-reference target of): E1 CRM Entity

Inherited properties: 
    P14 carried out by: E39 Actor 

    P4 has time-span: E52 Time-Span 

    P2 has type: E55 Type 


Pmm assigned co-reference to (co-reference was assigned by) 
Subproperty of:     P140 assigned attribute to 

Scope note:     This property connects an Emm Co-Reference Assignment to one of the entities co-referring to the co-reference target 

Pnn has co-reference target (is co-reference target of) 
Subproperty of:     P67 refers to [it may be that this is not a sub-property, but rather just another P67] 

Scope note:     This property connects an Emm Co-Reference Assignment to the target of the co-referring references. 

[Do we also need the following? In case properties should be added as well] 


Enn Negative Co-Reference Assignment 
Subclass of:     E13 Attribute Assignment 

Scope note:     This class comprises the action of making assertions about the fact that two or more E89 Propositional Objects do not refer to the same E1 CRM Entity. The assertion is based on the assumption that this was an implicit fact being made explicit by this assignment. In the context of this assignment the interpretation of the referring entity is fixed. 

Examples: 

     the assertion that the author name "Hans Jæger" on the title page of the novel "Fra Christiania-Bohêmen" does not refer to the same historical person as the author of the collection of drawings "Til Julebordet : ti Pennetegninger / af H.J." incorrectly attributed to Hans Jæger in the Bibsys database. 

Properties: 
    Pmm assigned co-reference to (co-reference was assigned by): E89 Propositional Object 

    Pnn has co-reference target (is co-reference target of): E1 CRM Entity 

Inherited properties: 

    P14 carried out by: E39 Actor 

    P4 has time-span: E52 Time-Span 

    P2 has type: E55 Type

The CRM-SIG made the above changes. (MD will write an extension about the levels of belief.) 

29th CRM-SIG meeting, Heraklion, October 2013 

E91 Co-Reference Assignment 
Subclass of:     E13 Attribute Assignment 

Scope note:     This class comprises actions of making the assertion whether two or more particular instances of E89 Propositional Object refer to the same instance of E1 CRM Entity. The assertion is based on the assumption that this was an implicit fact being made explicit by this assignment. Use of this class allows for the full description of the context of this assignment.(MD will write an extension about the levels of belief) 

Examples: 

     the assertion that the author name "Hans Jæger" on the title page of the novel "Fra Christiania-Bohêmen" refers to the same historical person as the motive of the painting "Forfatteren Hans Jæger" by Edvard Munch. 
     the assertion that the author name "Hans Jæger" on the title page of the novel "Fra Christiania-Bohêmen" does not refer to the same historical person as the author of the collection of drawings "Til Julebordet : ti Pennetegninger / af H.J." incorrectly attributed to Hans Jæger in the Bibsys database. 

Properties: 
    P154 assigned non co-reference to (was regarded not to co-refer by): E89 Propositional Object 

    P155 has co-reference target (is co-reference target of): E1 CRM Entity 

Posted by Martin 26/3/2014 

Dear All, 

Here is my homework: 
E91 Co-Reference Assignment 

Subclass of:E13 Attribute Assignment 

Scope note:This class comprises actions of making the assertion whether two or more particular instances of E89 Propositional Object refer to the same instance of E1 CRM Entity. The assertion is based on the assumption that this was an implicit fact being made explicit by this assignment. Use of this class allows for the full description of the context of this assignment. (MD will write an extension about the levels of belief) 

A co-reference assertion may admit a certain degree or strength of belief, such as "possibly", "most likely" etc. This can be modelled using the property P2 has type with a suitable terminology. However, this degree of belief will be common to all statements asserted by one instance of E91 Co-Reference Assignment. Otherwise, the assertion must be broken down into a suitable number of instances with different degrees of belief. 

If there exists a document describing particular evidence, this can be referred to by using P used specific object. Nothing more may be known about the instance of E1 CRM Entity to which the described statements are assumed to refer than the facts expressed by these very statements. 

Frequently, scholars may wish to contradict a co-reference statement or point to frequent confusions. This can be modelled using the property P154 assigned non co-reference to. 

The property P155 has co-reference target/allows for associating an ??? 

In the end, I got confused: The range of P155 can be interpreted as a URI used within the same knowledge base as the instance of E91. Then, it would correspond to a co-reference between some text element and the knowledge base in which we implement the CRM, the "local truth". In that case, also one instance of P153 would make sense, even two instances of P155 only. In case we talk about Linked Open Data, the issue becomes more obscure. We could regard the co-reference to be between some text element and the document the URI resolves into. 

If however someone uses this very URI in another context, the question of co-reference is again there. 

It appears as if we need a construct to refer to the use of a URI within a knowledge base or RDF document as an instance of Propositional Object. If we follow this line, then the interpretation of P155 pointing to a "self co-reference" would be consistent, and any other meaning of referring to a URI would need a contextualization of the URI to be discussed. 

Opinions? 

Posted by Max Schich 26/3/2014 

Whoever collects co-references: the number of co-reference links explodes as n*(n-1) with the number n of references to the reference object. Imagine co-reference links between all books citing the bible. This is likely to result in a high error rate, especially in manual curation. 

Workaround: Use simple references to an "unknown/potential reference object" and put a probability on those links => scales with n. 

 

Posted by Martin 27/3/2014 

Hi Maximilian, 

Yes, sure! It is self-evident that you would not state derived links manually. The process we describe here is based on scholarly insight. Only those links are described which result directly from insight and are not already deduced due to transitivity. The minimal number of links is always the same. The deductions, the rest of the n*(n-1), can be managed by a program, or even be calculated at query time from the original statements. 
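
As a rough illustration of the division of labour described above, the following sketch stores only primary co-reference judgements and derives everything else by transitivity with a union-find structure. The reference labels and the names `CoRefIndex`, `assert_same` and `derived_pairs` are hypothetical and are not part of the CRM or of any existing tool.

```python
from itertools import permutations


class CoRefIndex:
    """Groups references into co-reference classes from primary statements."""

    def __init__(self):
        self.parent = {}

    def find(self, ref):
        # Register unseen references as their own class representative.
        self.parent.setdefault(ref, ref)
        # Walk up to the representative, shortening the path on the way.
        while self.parent[ref] != ref:
            self.parent[ref] = self.parent[self.parent[ref]]
            ref = self.parent[ref]
        return ref

    def assert_same(self, a, b):
        # Record one primary (scholarly) co-reference judgement.
        self.parent[self.find(a)] = self.find(b)

    def same(self, a, b):
        # Answered at query time; no derived link is ever stored.
        return self.find(a) == self.find(b)

    def derived_pairs(self):
        # All ordered pairs implied by transitivity, for comparison only.
        refs = list(self.parent)
        return [(a, b) for a, b in permutations(refs, 2) if self.same(a, b)]


index = CoRefIndex()
primary = [("book_1", "book_2"), ("book_2", "book_3"), ("book_3", "book_4")]
for a, b in primary:
    index.assert_same(a, b)

print(len(primary))                    # primary statements actually recorded
print(len(index.derived_pairs()))      # ordered pairs implied for n = 4
print(index.same("book_1", "book_4"))  # deduced, never stated
```

In this toy case three recorded judgements are enough to imply all 4*(4-1) ordered pairs, which is the scaling difference the two posts above are arguing about.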

A "high error rate in manual curation" does not make sense for the rest: we are talking here about primary knowledge from scholarly research. There is no other source than curatorial knowledge, regardless of whether it is supported by an automated "instance matching algorithm", which produces identity assumptions, or detected manually. 

See also: 

1. Meghini, C., Doerr, M., & Spyratos, N. (2009). Managing Co-reference Knowledge for Data Integration. In Proceedings of the 2009 Conference on Information Modelling and Knowledge Bases XX (pp. 224-244). Amsterdam, The Netherlands: IOS Press. ISBN 978-1-58603-957-8. 
2. Doerr, M., Meghini, C., & Spyratos, N. (2007). Leveraging on Associations - a New Challenge for Digital Libraries. In Proceedings of the First International Workshop on Digital Libraries Foundations, in conjunction with the ACM IEEE Joint Conference on Digital Libraries (JCDL 2007), Vancouver, Canada, 23 June. 

See VIAF: over 30% of its co-reference network was created manually. Only 69% could be done by an algorithm, which was itself manually (!) confirmed to be sufficiently reliable. Simple references to "unknown objects" miss the point: who said that they are the same? 

I'll take your point to add this intention to the scope note! 

 

Posted by Martin 27/3/2014 

Dear All, 

I have now changed also the first paragraph following Maximilian's objections to make clear that we talk about scholarly judgements only. 

Best, 

Martin 

E91 Co-Reference Assignment 

Subclass of:E13 Attribute Assignment 

Scope note:This class comprises actions of making the assertion whether two or more particular instances of E89 Propositional Object refer to the same instance of E1 CRM Entity. The assertion is based on the assumption that this was an implicit fact being made explicit by this assignment following scholarly judgement, possibly using background knowledge and other sources of evidence. Use of this class allows for the full description of the context of this assignment. (MD will write an extension about the levels of belief). This class should not be used to describe purely logical deductions of co-reference from scholarly co-reference judgements based on the transitivity of co-reference. 

 

Posted by Øyvind 29/3/2014 

It is quite possible that I do not understand what you say about P155. So this is my understanding of it: 

Co-reference assignment is about making explicit a fact which is assumed by the one making the assignment to be true. An example: 

I claim that the word "Passau" on my train ticket and the place referred to by "city where the first DHd conference took place" both refer to the physical place Passau. If I make this statement in an information system, I would say: 

E91 Co-Reference Assignment P155 has co-reference target E53 Passau. 
(+ the other properties) 

The P155 points to the thing in the world to which the person making the co-reference assignment believes the references point. 

It may not be known, and it may not be documented -- open world. But the links from the two strings above to the physical place we know as Passau were not created in or by the co-reference assignment; they were made explicit in the assignment. 

Of course I can use a URI for Passau. But that is at implementation level, is it not? 

 

Posted by Martin 30/3/2014 

Dear Oeyvind, 

On 29/3/2014 10:13 am, Øyvind Eide wrote: 

It is quite possible that I do not understand what you say about P155. So this is my understanding of it: 

Co-reference assignment is about making explicit a fact which is assumed by the one making the assignment to be true. 


Yes. 

An example: 

I claim that the word "Passau" on my train ticket and the place referred to by "city where the first DHd conference took place" both refer to the physical place Passau. If I make this statement in an information system, I would say: 

E91 Co-Reference Assignment P155 has co-reference target E53 Passau. (+ the other properties) 

The P155 points to the thing in the world to which the person making the co-reference assignment believes the references point. 


Yes. How do I know that the URI (or whatever the range instance of P155 is) points to the Passau I meant when I make this statement? This is why I said: "The range of P155 can be interpreted as a URI (or whatever identity) used within the same knowledge base as the instance of E91. Then, it would correspond to a co-reference between some text element and the knowledge base in which we implement the CRM, the 'local truth'." 

It means that the co-reference statement shares the same reality (my understanding of the world) as the identifier for "Passau" at P155. In other words, I know for sure how to relate this identifier to the city in Bavaria. It could, however, be that I refer to a URI for "Passau" which has been imported into my knowledge base, and which was in fact used for another Passau in the knowledge base that coined the URI. Then, my co-reference statement would be misleading. Indeed, it would be yet another co-reference, but this time to the use of a URI within a knowledge base, rather than a word within a text. 

All the magic is in your phrase: "The P155 points to the thing in the world". Whose world? 

Therefore, I'd suggest that P155 must point to an identifier of something the person who makes the co-reference statement has an unambiguous notion of reality about: either a thing in the world, by use of an identifier the person "knows" how to interpret, or a hypothetical thing, "the thing referred to in these two texts, whatever it is". In the latter case, it has an identity condition based on the text. In any case, the scope note must make clear what the difference is between P155 and the other links in terms of knowing. Therefore I proposed a "local shared truth" for P155. 

Opinions? 

 

Posted by Øyvind 1/4/2014 


Dear Martin, 

I think I understand now. But to make it clear if I do: 

in a normal reference situation, for instance (to go back to the situation of the example in CRM): 

E82 Hans Jæger (the name on the title page of the book) P131 identifies E21 Hans Jæger (the historical person) 

In that case the problem of reference you talk about does not apply. 

But in the situation: 

E91 Co-Reference Assignment P155 has co-reference target E21 Hans Jæger (the historical person) the problem does arise. 

The difference between the two situations is that in the former (P131) the reference is expressed in the model, whereas in the latter (P155) the reference is expressed in a statement recorded in the model. 

Right? 

The question is: do we need to record what the person making the co-reference statement believes the propositional objects refer to? The reference from each of them will/should/may be recorded in the system through properties such as the various "identifies" properties.

 

Posted by Dominic 1/4/2014 

I don't understand this properly despite reading this email quite a few times. 

I have some text contained in a book or on an object that refers to or is about a city in Bavaria called Passau. 

I have a local knowledge of this place that I have my own identifier for. I can therefore co reference the text to my identifier. 

Someone else has an identifier for Passau but I don't know for sure whether it is talking about the same thing. It is outside my local knowledge. 

Is the suggestion that P155 should only be used for co-referencing local knowledge, not external URIs outside local knowledge? 

Thanks and apologies if this is obvious. 
Dominic 

 

Posted by Richard Light 1/4/2014 

I must confess that I have problems with this co-reference framework. 

What I would expect us to be developing is a way of making statements that, in someone's opinion, this CRM entity X is (or is not) the same real-world entity as that CRM entity Y. This is what I understand real-world co-referencing exercises do, e.g. aligning person identifiers in BNB with those in VIAF [1], using owl:sameAs or skos:exactMatch relationships. The difference we would bring to current practice is that we would be providing an attribute assignment context which allows us to state who is asserting the co-reference, when they did it, etc., and allowing them to give their reasons. Having "reified" the initial co-reference assertion, it would be possible for others to take issue with it, support it, etc. Have I misunderstood the intention here? 

If that is what we are aiming to develop, I would expect the co-reference assignment to be populated by two or more E1 CRM Entities, being the subjects of the co-reference statement. Instead, we have a situation where there is only one E1 CRM Entity, the "co-reference target", with E89 Propositional Objects being said to co-refer to it (or not). 

In the example being discussed, I would prefer to say: 

1. The word 'Passau' on my train ticket refers (in my opinion) to an E53 Place X. 
2. The first DHd conference took place in E53 Place Y. 
3. X co-refers to Y. 

Only assertion (3) is a co-reference statement; the other two are simply ways of defining E53 Places by stating their properties, so that the co-reference statement has some meaningful context. 
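
A minimal sketch of the three statements above, written as plain Python tuples and a dictionary, may help to show the shape of this alternative: statements (1) and (2) are ordinary triples, and only statement (3) is reified so that it can carry who asserted it and when. The identifiers (`place:X`, `ticket:word_Passau`, `coref:1`, `actor:me`), the date and the key `subjects` are hypothetical placeholders; the labels `P67 refers to`, `P7 took place at`, `P14 carried out by` and `P4 has time-span` are borrowed from the CRM for illustration only.

```python
# Statements (1) and (2): ordinary descriptions of two places.
triples = [
    ("ticket:word_Passau", "P67 refers to", "place:X"),
    ("event:first_DHd_conference", "P7 took place at", "place:Y"),
]

# Statement (3): the co-reference itself, reified so that it has a context.
coref = {
    "id": "coref:1",
    "subjects": ("place:X", "place:Y"),   # two E1 CRM Entities, not texts
    "P14 carried out by": "actor:me",
    "P4 has time-span": "2014-04-01",
}


def co_referring(entity, assertions):
    """Return (other entity, asserter) pairs for every assertion naming `entity`."""
    found = []
    for assertion in assertions:
        if entity in assertion["subjects"]:
            others = [s for s in assertion["subjects"] if s != entity]
            found.extend((o, assertion["P14 carried out by"]) for o in others)
    return found


print(co_referring("place:X", [coref]))
```

Because the assertion is an object of its own, a second actor could add a contradicting or supporting assertion without touching the two place descriptions.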

A weakness of the current approach, in my view, is that it "finesses" one of the E1 CRM Entities involved, by only inferring it from an E89 Propositional Object, rather than stating it explicitly. In many real-world cases, the entities to be co-referred will be identified by an E42 Identifier (e.g. a Linked Data URL), which would be disallowed by the current range of P153/P154. 

What we currently have, in my view, is actually "reference", rather than "co-reference", i.e. a means of assigning an additional property to a single E1 CRM Entity. 

Richard 
[1] see e.g. http://bnb.data.bl.uk/doc/person/LightAlanR1950-.rdf 

 

Posted by Martin 1/4/2014 

Hi Dominic, 

At least for me, it is not obvious at all. 

I think we have several choices: 

Either we use P155 only for the local knowledge, i.e., within the same knowledge base in which we describe the co-reference assertion. Then, P155 points to the "real thing", i.e., the reference is locally resolved. If the assertion assumes some probability, it is not about what P155 means. 

Or we use P155 for any URI, but then the person who makes the co-reference statement would just be making another assumption about what the URI that P155 points to is about. Then, it would essentially make no difference to P153, except for the range class. 

The latter provokes the next question: By what is the meaning of a URI determined? Does a URI always have a unique "owner", the one who introduced it? Or is the meaning of the URI, as with any other name, determined by the context of use? For instance, I take a URI for Passau from a gazetteer, meaning the city in Bavaria, but pick the one for an American city, and use it in my knowledge base. Which city does it represent? 
On the other hand, I could use the gazetteer, check the georeference etc., and be sure it is the city I mean. Then, the external knowledge becomes local knowledge. 

Depending on the case, pointing to a URI in a co-reference statement may require adding a knowledge base of reference to the URI, and then the URI becomes part of a statement of an information object, the "knowledge base of reference", like any other target of P153. This knowledge base of reference could be that of the defining Actor or of the using Actor. 

We could drop P155 altogether, but I think the question of what meaning a reused URI has is interesting in its own right. I fear it can only be resolved if the Actor who introduced the URI is defined. 

Opinions? 

 

Posted by Martin 1/4/2014 
Hi Richard, 

Actually what we wanted to develop was comparing text sources, not URIs. The discussion, and your comment, shows that probably we have to regard comparing identifiers in texts and in knowledge bases on an equal level. 

I'd modify your example: 
1. The word 'Passau' on the train ticket I found refers to an E53 Place X known exactly by the ticket authority. 
2. The first DHd conference took place in E53 Place Y , following a knowledge base. 
3. X co-refers to Y in my opinion. 

If the train ticket is mine, I should have definite knowledge which place it means. Then it would be a simple refers, in my local knowledge. 

Generally, a CRM instance is an information object in its own right. We can naively assume that the maintainer knows what its identifiers mean (or knows who knows). Then, we need no discourse about potential matches, only about "errors".

We made the following assumptions about what qualifies a URI to represent a reality or to refer to something: 

The URI only makes sense in combination with a knowledge base which describes the reality 

URI & P155 has co-reference target (is co-reference target of) 

    (1) P155 points to a local model of reality => unique URI and knowledge of real world object 
    (2) External URI is used, it is assumed to be known and unique 
    (3) If (1),(2) hold then P155 and P153 is a valid statement 

URI & P153 assigned co-reference to (was regarded to co-refer by) 

    (4) P153 can be used for external URI? Assumed context of URI to be implicit 

Finally we decided 

    (a) to add a note to the scope note of E91 about external URIs
    (b) Resolving "same as" within my knowledge base is not E91
    (c) MD will try to formulate these things 

Hague 30th CRM-SIG meeting

Current Proposal: 
Posted by Martin   27/09/2014
 
Dear All,
 
Here is my final attempt to define E91 and to clarify epistemological positions with respect to URI use:
E91 Co-Reference Assignment
 
Subclass of:           E13 Attribute Assignment
 
Scope note:          This class comprises actions of making the assertion whether two or more particular instances of E89 Propositional Object refer to the same instance of E1 CRM Entity. The assertion is based on the assumption that this was an implicit fact being made explicit by this assignment. Use of this class allows for the full description of the context of this assignment.   The Actor making the assertion may have different kinds of confidence in the truth of the asserted fact of co-reference, because it may imply an interpretation of the (past) knowledge behind the propositional objects assumed to be co-referring. This kind of confidence can be described by using the property P2 has type (is type of). In case different propositional attitudes should be expressed per asserted propositional object, the assertion has accordingly to be divided into one instance of E91 Co-Reference Assignment for each kind of confidence.
 
This class aims at the problem of interpreting, within a particular passage of a historical text, to which real-world entity a particular name, pronoun or equivalent expression was intended to refer by the text’s author. In other words, it expresses the uncertainty of the creator of the assertion about the meaning of the information provided by another person.
 
Each such interpretation can only be documented with respect to another reference – either found in another text by the same or a different author, and/or by referring to the world known to the creator of the co-reference assertion. To do the latter, the property P155 has co-reference target (is co-reference target of) allows for referring to an instance of CRM Entity of the creator’s world. In a sense, the respective instance of E91 Co-Reference Assignment using the property P155 has co-reference target (is co-reference target of) in a knowledge base forms a propositional object referring to the creator’s target entity, since a knowledge base as a whole can be seen as a propositional object. Consequently, if in a Semantic Web implementation the target entity is instantiated by a URI, the meaning of this identifier must be unambiguous to the creator of the co-reference assignment. Similarly, a URI of another authority, such as an author catalogue of a library, can be interpreted as a referring proposition of this catalogue, and be referred to by the property P153 assigned co-reference to (was regarded to co-refer by) or P154 assigned non co-reference to (was regarded not to co-refer by): E89 Propositional Object in order to express that it does not immediately represent the creator’s known world. In this case, the authority that knows the meaning of this URI must be unambiguous by the form of the URI itself.
 
In contrast, the property ‘owl:sameAs’ of the OWL knowledge representation language cannot specify whose knowledge it represents and cannot express a kind of confidence. Therefore it is not adequate to model the progress of scholarly co-reference research.
 
Examples:
 
§   the assertion that the author name “Hans Jæger” on the title page of the novel “Fra Christiania-Bohêmen” refers to the same historical person as the motive of the painting “Forfatteren Hans Jæger” by Edvard Munch.
 
§   the assertion that the author name “Hans Jæger” on the title page of the novel “Fra Christiania-Bohêmen” does not refer to the same historical person as the author of the collection of drawings “Til Julebordet : ti Pennetegninger / af H.J.” incorrectly attributed to Hans Jæger in the Bibsys database.
 
Properties:
 
P153 assigned co-reference to (was regarded to co-refer by): E89 Propositional Object
 
P154 assigned non co-reference to (was regarded not to co-refer by): E89 Propositional Object
 
P155 has co-reference target (is co-reference target of): E1 CRM Entity
In the 31st joint meeting of the CIDOC CRM SIG, ISO/TC46/SC4/WG9 and the 24th FRBR - CIDOC CRM, in Heraklion, Crete, the CRM-SIG proposed that Oeyvind refine the scope note provided by MD.

Posted by Øyvind on 4/2/2015

Please find enclosed my homework for issue 230. It consists of two things:

* New scope notes for E91 Co-Reference Assignment, shortened to keep semantic web complexity out of the CRM. Thanks to Gerald for input. 

* A draft for a document describing the complexity left out of the scope notes, based on Martin's previous scope notes and input from Arianna (but no responsibility on any of them for the result!). This document could be developed into a technical paper referred to from CRM, to an article, or both. 

------------------------------------------------------------

E91 Co-Reference Assignment

Subclass of:          E13 Attribute Assignment

Scope note:           This class comprises actions of making the assertion whether two or more particular instances of E89 Propositional Object refer to the same instance of E1 CRM Entity. The assertion is based on the assumption that this was an implicit fact being made explicit by this assignment. Use of this class allows for the full description of the context of this assignment.

The Actor making the assertion may have different degrees of confidence in the truth of the asserted fact of co-reference, because it may imply an interpretation of the (past) knowledge behind the propositional objects assumed to be co-referring. This degree of confidence, also known as “propositional attitude”, can be described by using the property P2 has type (is type of) for the whole of the co-reference activity. No such degree of confidence can be connected to each of the co-referring propositional objects. In case one E39 Actor creates an E91 Co-Reference Assignment between two or more E89 Propositional Objects and different propositional attitudes should be expressed per asserted propositional object, the assertion has accordingly to be divided into one instance of E91 Co-Reference Assignment for each propositional attitude.

The use of P155 has co-reference target is limited to entities within the knowledge base in which the E91 Co-Reference Assignment is found. This is because the E91 Co-Reference Assignment is making explicit the world view of the E39 Actor carrying out the assignment and this world view is expressed as such only within that specific knowledge base. For a further discussion of the relationship from E91 Co-Reference Assignment via P155 has co-reference target to E1 CRM Entity, see the technical paper … .

This class aims mainly at the problem of interpreting in historical texts to what particular entity a name, pronoun or equivalent kind of expression was intended to refer within a particular passage of the text by its author. In other words, it expresses the uncertainty of the creator of the assertion about the meaning of information provided by another person. Such an interpretation can only be documented with respect to another reference – either found in another text by the same or another author, and/or by referring to the world known to the creator of the co-reference assertion himself. To do the latter, the property P155 has co-reference target (is co-reference target of) allows for referring to an instance of CRM Entity of the creator’s world. In a sense, the respective instance of E91 Co-Reference Assignment using the property P155 has co-reference target (is co-reference target of) in a knowledge base forms a propositional object referring to the creator’s target entity, since a knowledge base as a whole can be seen as a propositional object. Consequently, if in a Semantic Web implementation the target entity is instantiated by a URI, the meaning of this identifier must be unambiguous to the creator of the co-reference assignment. Similarly, a URI of another authority, such as an author catalogue of a library, can be interpreted as a referring proposition of this catalogue, and be referred to by the property P153 assigned co-reference to (was regarded to co-refer by) or P154 assigned non co-reference to (was regarded not to co-refer by): E89 Propositional Object in order to express that it does not immediately represent the creator’s known world. In this case, the authority that knows the meaning of this URI must be unambiguous by the form of the URI itself.

In contrast, the property ‘owl:sameAs’ of the OWL knowledge representation language cannot specify whose knowledge it represents and cannot express propositional attitudes. Therefore it is not adequate to model the progress of scholarly co-reference research.

Examples:

§   the assertion that the author name “Hans Jæger” on the title page of the novel “Fra Christiania-Bohêmen” refers to the same historical person as the motive of the painting “Forfatteren Hans Jæger” by Edvard Munch.

§   the assertion that the author name “Hans Jæger” on the title page of the novel “Fra Christiania-Bohêmen” does not refer to the same historical person as the author of the collection of drawings “Til Julebordet : ti Pennetegninger / af H.J.” incorrectly attributed to Hans Jæger in the Bibsys database.

§   Insert example of the use of P2 has type as described in the scope notes.

Properties:

P153 assigned co-reference to (was regarded to co-refer by): E89 Propositional Object

P154 assigned non co-reference to (was regarded not to co-refer by): E89 Propositional Object

P155 has co-reference target (is co-reference target of): E1 CRM Entity
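
The three properties above can be sketched in code. The following is a minimal illustration in Python, not part of the proposal: the class, the "ex:" identifiers and the actor name are all hypothetical, and the second assignment shows only one possible reading of the negative example (P153 for the title-page name, P154 for the Bibsys attribution). It is meant to show that each assertion carries its own E39 Actor via P14, which a bare owl:sameAs triple has no slot for.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class CoReferenceAssignment:
    """Hypothetical in-memory stand-in for one E91 Co-Reference Assignment."""
    node: str                       # identifier of the assignment itself
    carried_out_by: str             # P14 carried out by: E39 Actor
    co_referring: List[str] = field(default_factory=list)      # P153
    not_co_referring: List[str] = field(default_factory=list)  # P154
    target: Optional[str] = None    # P155, may be left undefined

    def triples(self) -> List[Triple]:
        """Flatten the assignment into (subject, property, object) tuples."""
        out: List[Triple] = [
            (self.node, "rdf:type", "E91 Co-Reference Assignment"),
            (self.node, "P14 carried out by", self.carried_out_by),
        ]
        out += [(self.node, "P153 assigned co-reference to", p)
                for p in self.co_referring]
        out += [(self.node, "P154 assigned non co-reference to", p)
                for p in self.not_co_referring]
        if self.target is not None:
            out.append((self.node, "P155 has co-reference target", self.target))
        return out


# First example: title-page name and painting motif are regarded to co-refer.
positive = CoReferenceAssignment(
    node="ex:assignment-1",
    carried_out_by="ex:some-scholar",  # hypothetical actor
    co_referring=["ex:title-page-name-fra-christiania-bohemen",
                  "ex:motif-forfatteren-hans-jaeger"],
)

# Second example (one possible reading): the Bibsys attribution is
# regarded NOT to co-refer with the title-page name.
negative = CoReferenceAssignment(
    node="ex:assignment-2",
    carried_out_by="ex:some-scholar",
    co_referring=["ex:title-page-name-fra-christiania-bohemen"],
    not_co_referring=["ex:bibsys-attribution-til-julebordet"],
)

# For contrast: a bare identity statement says nothing about who holds it.
same_as_triple: Triple = ("ex:a", "owl:sameAs", "ex:b")

for t in positive.triples() + negative.triples():
    print(t)
```

Neither assignment sets a P155 target, reflecting the point made in the discussion that the target can be specified or left undefined.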

Posted by Richard Light on 6/2/2015

Hi,

If I have interpreted your longer paper correctly, that means that the whole co-reference mechanism that the CRM has erected fails to address the practical requirement which I would have. That is, the ability for me to indicate that a word or phrase in a source document refers (in my opinion) to a specified real-world person (or other non-information object).

Have I got this right, and, if so, is there a CRM mechanism which does allow me to make this kind of assertion?

Posted by Øyvind on 6/2/2015

If one source refers to one object, then it is not a co-reference. Then it is a reference. 

 
Co-reference is there to say that you know (for some reason you may specify if you want to) that two or more words/phrases refer to the same real-world person. The latter can be specified or it can be left undefined.

I fail to see why co-reference should solve the problem of single propositional objects referring to real world objects; we already had mechanisms for that.

I have a feeling that the problems documented in the long paper would apply to single references too if the target is not modelled within your information system. This may be linked to fundamental problems with the whole linked data paradigm. But this is just a feeling, so I have to flesh it out more to say something evidence-based about it.

I may have misunderstood your question so please use smaller spoons if I did!
 
 

Posted by Richard Light on 6/2/2015

On 06/02/2015 18:11, Øyvind Eide wrote:

> If one source refers to one object, then it is not a co-reference. Then it is a reference.
>
> Co-reference is there to say that you know (for some reason you may specify if you want to) that two or more words/phrases refer to the same real-world person. The latter can be specified or it can be left undefined.
>
> I fail to see why co-reference should solve the problem of single propositional objects referring to real world objects; we already had mechanisms for that.

OK, here is an example.  This section of Linked Data text from the recently-opened EEBO:

http://data.modes.org.uk/TEI-P5/EEBO-TCP/id/A01483.d1e2619

is, in my opinion, talking about this non-information object:

http://dbpedia.org/resource/Edward_Plantagenet,_17th_Earl_of_Warwick

How would you model that in the CRM?

> I have a feeling that the problems documented in the long paper would apply to single references too if the target is not modelled within your information system. This may be linked to fundamental problems with the whole linked data paradigm. But this is just a feeling, so I have to flesh it out more to say something evidence-based about it.

This is an aspect of the issue which I don't understand. If you can't (knowingly) decide that you trust an external Linked Data resource and are allowed to make assertions which touch on the entities which it defines, what hope is there for the whole Linked Data project? (Or, if this constraint is specific to the CRM, then the same point applies more locally.)

Posted by Øyvind on 6/2/2015

On 6 Feb 2015, at 19:07, Richard Light <richard@light.demon.co.uk> wrote:

> On 06/02/2015 18:11, Øyvind Eide wrote:
>
>> If one source refers to one object, then it is not a co-reference. Then it is a reference.
>>
>> Co-reference is there to say that you know (for some reason you may specify if you want to) that two or more words/phrases refer to the same real-world person. The latter can be specified or it can be left undefined.
>>
>> I fail to see why co-reference should solve the problem of single propositional objects referring to real world objects; we already had mechanisms for that.
>
> OK, here is an example. This section of Linked Data text from the recently-opened EEBO:
>
> http://data.modes.org.uk/TEI-P5/EEBO-TCP/id/A01483.d1e2619
>
> is, in my opinion, talking about this non-information object:
>
> http://dbpedia.org/resource/Edward_Plantagenet,_17th_Earl_of_Warwick
>
> How would you model that in the CRM?

 
I would say the two are propositional objects co-referring. No problem.
 
>> I have a feeling that the problems documented in the long paper would apply to single references too if the target is not modelled within your information system. This may be linked to fundamental problems with the whole linked data paradigm. But this is just a feeling, so I have to flesh it out more to say something evidence-based about it.
>
> This is an aspect of the issue which I don't understand. If you can't (knowingly) decide that you trust an external Linked Data resource and are allowed to make assertions which touch on the entities which it defines, what hope is there for the whole Linked Data project? (Or, if this constraint is specific to the CRM, then the same point applies more locally. :-) )

 
Sure you can trust something external to your information system. As, for instance, a propositional object.
 
I am afraid we may be talking past each other but it may be too late for me to see how…
 
 

Posted by Øyvind on 7/2/2015

Dear Richard, and all,

 
To try to clarify in the light of the morning, there are (as I see it) two separate issues:
 
1. A simplification of the scope notes for E91 Co-Reference Assignment. This is a topic of the meeting next week. It would be good if you could comment if:
 
a) the changes in scope notes introduce any new problems or solve old problems
b) there are problems in E91 which were not solved by the proposed changes.
 
2. A document about (co)referencing. I would surely be happy to discuss that (at length!) but it is kept outside of CRM on purpose. It is a discussion document, not a part of the standard. Once finalised, it could be referred to from the standard if we so wish. As you can see it is still in draft form. 
 
Also remember the construct:
 
E89 Propositional Object —> P67 refers to (is referred to by) —> E1 CRM Entity
 
which can be used for reference without the co. The latter will easily introduce implicit co-reference. E91 is for explicit co-reference. I would be happy to go on about the difference (and can do it if it helps) but now my train is arriving and I have to do other things.
 
I hope this helps to focus the disagreements,
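
The difference between implicit and explicit co-reference mentioned above can be sketched as follows. This is a hedged illustration in Python only: the function name and the "ex:" identifiers are hypothetical, and treating the DBpedia URI directly as the E1 CRM Entity target of P67 follows Richard Light's reading of his example for the sake of illustration, not an agreed modelling. The sketch groups propositional objects that share a P67 target, which is the co-reference that arises implicitly without any E91 instance recording who asserted it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
P67 = "P67 refers to"


def implicit_co_references(triples: Iterable[Triple]) -> Dict[str, List[str]]:
    """Group referring objects by shared P67 target.

    Any target referred to by two or more objects yields an implicit
    co-reference among those objects.
    """
    by_target = defaultdict(set)
    for subject, prop, obj in triples:
        if prop == P67:
            by_target[obj].add(subject)
    return {target: sorted(sources)
            for target, sources in by_target.items()
            if len(sources) >= 2}


# URIs taken from the example in this thread.
EEBO = "http://data.modes.org.uk/TEI-P5/EEBO-TCP/id/A01483.d1e2619"
WARWICK = "http://dbpedia.org/resource/Edward_Plantagenet,_17th_Earl_of_Warwick"

statements = [
    (EEBO, P67, WARWICK),
    ("ex:another-passage", P67, WARWICK),               # hypothetical second source
    ("ex:unrelated-passage", P67, "ex:someone-else"),   # hypothetical, no co-reference
]

groups = implicit_co_references(statements)
for target, sources in groups.items():
    print(target, "<-", sources)
```

Nothing in the resulting groups says who regarded the two passages as co-referring, or why; recording that is what an explicit E91 Co-Reference Assignment is for.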
 

Posted by Richard Light on 7/2/2015

On 07/02/2015 09:42, Øyvind Eide wrote:
> Dear Richard, and all,
>
> To try to clarify in the light of the morning, there are (as I see it) two separate issues:
>
> 1. A simplification of the scope notes for E91 Co-Reference Assignment. This is a topic of the meeting next week. It would be good if you could comment if:
>
> a) the changes in scope notes introduce any new problems or solve old problems
> b) there are problems in E91 which were not solved by the proposed changes.
>
> 2. A document about (co)referencing. I would surely be happy to discuss that (at length!) but it is kept outside of CRM on purpose. It is a discussion document, not a part of the standard. Once finalised, it could be referred to from the standard if we so wish. As you can see it is still in draft form.
Hi,

While the document is indeed separate, a key conclusion from it is included within the revised E91 scope notes:

The use of P155 has co-reference target is limited to entities within the knowledge base in which the E91 Co-Reference Assignment is found. This is because the E91 Co-Reference Assignment is making explicit the world view of the E39 Actor carrying out the assignment and this world view is expressed as such only within that specific knowledge base.

I'm not clear at what level of abstraction we are meant to interpret this statement.  The CRM is meant to be "a formal ontology which can be expressed in terms of logic or a suitable knowledge representation language".  So presumably "knowledge base" in the above text is meant in an equally abstract sense.  Surely, in that abstract sense, every assertion that any Actor makes ought to be expressed in terms which they understand, i.e. terms from within their "specific knowledge base".  Otherwise they are just blathering on about stuff of which they know naught.  And if all we are saying is that assertions should relate to entities which fall within the knowledge base of the Actor making them, then why is that so particularly true of co-reference assignments that the point has to be laboured here?  Why is it less true of, say, Type Assignment?

If, on the other hand, this statement is meant to be about Semantic Web systems, then (a) I would probably still disagree with the need for it, but (b) I would argue that it is out of scope in the core CRM document.

Outcome: 

In the 32nd joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 25th FRBR - CIDOC CRM Harmonization meeting, the SIG, reviewing the text provided by Øyvind about co-reference between information systems as well as the proposed changes to the existing definition, decided to withdraw the co-reference statement from the text of CIDOC CRM 6.1 and assigned Øyvind, CEO, MD and Arianna Ciula to revise the definition in order to introduce it to CRMinf.

Oxford, February 2015