Issue 382: where to stop documenting the provenance

ID: 382
Starting Date: 2018-05-22
Working Group: 4
Status: Done
Background: 

In the 41st joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 34th FRBR - CIDOC CRM Harmonization meeting, the discussions about Issue 367 raised a new issue about best practice on the epistemology of the knowledge base itself, concerning where to stop documenting provenance. The SIG decided to begin the discussion by email exchange.

Lyon, May 2018

Current Proposal: 

In the 42nd joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 35th FRBR - CIDOC CRM Harmonization meeting, MD and CM were assigned to start the discussion on best practices on the epistemology of the knowledge base, regarding where to stop documenting provenance. The aim is to arrive at a document that will have the status of a recommendation for using the CRM.

Berlin, November 2018

Posted by Martin on 20/10/2019

Dear Carlo, Eleni

Here is my attempt:

Where to Stop the Provenance Chain

 

A guideline

A formal ontology is about “being”. It describes classes of individual items, properties, and logical rules constraining their combinations, which approximate at a categorical level how we perceive certain things and phenomena of reality to be and to behave, including our descriptions of reality. It describes “possible states of affairs”[1]. We require that these concepts are not only conventions between humans, but also sufficiently close to reality that valid deductions about reality can be drawn from the ontology and from instances of it obtained under theoretical, perfect conditions of observation. As an idealized, logical approximation, the ontology deviates from reality in precision and in coverage (i.e., with respect to exceptions); these deviations should be understood and should be tolerable for the purpose of the respective research. Only things and phenomena of reality that behave close enough to the logical form of a formal ontology can usefully be described by it.

We regard knowledge as justified belief in propositions X of a form that makes sense in the statement “I know that X holds”. Beyond formulating the proposition X as an expression of information, a human stating “I know that X holds” must be able to relate all classes, properties and identifiers (names) in such an expression to situations and individual things in the real world. Therefore, only humans have knowledge.

A knowledge base in the sense of the CRM is an information object that instantiates the formal ontology with propositions that the maintainer of the knowledge base believes, i.e., regards as “to the best of my knowledge”. There are subtle but substantial differences between registered knowledge and reality, because registered knowledge includes contradictions, alternatives and uncertainties. The maintainer of the knowledge base may be an individual person or a team whose members trust each other and share the same contextual knowledge of the world (see Doerr, Meghini & Spyratos 20

The maintainer of the knowledge base is its ultimate provenance, providing (or failing to provide) grounds for trust in the care and honesty with which the propositions are described. The maintainer should not appear in the knowledge base as provenance propositions, but should be described in metadata about the knowledge base as a whole, just as we do not repeat the author of a book in every sentence.
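The separation between knowledge-base-level metadata and the propositions themselves can be illustrated with a minimal Python sketch. The class and field names (`KnowledgeBase`, `Proposition`, `maintainer`, `ultimate_provenance`) and the `ex:` identifiers are hypothetical and not part of the CRM; the only point shown is that the maintainer is stored once, for the knowledge base as a whole, and every proposition inherits it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Proposition:
    """One statement held in the knowledge base (subject, property, object).

    It deliberately has no maintainer attribute: the maintainer is recorded
    once, as metadata about the knowledge base as a whole.
    """
    subject: str
    prop: str
    obj: str


@dataclass
class KnowledgeBase:
    # Hypothetical KB-level metadata: the ultimate provenance of every proposition.
    maintainer: str
    propositions: list[Proposition] = field(default_factory=list)

    def add(self, subject: str, prop: str, obj: str) -> Proposition:
        p = Proposition(subject, prop, obj)
        self.propositions.append(p)
        return p

    def ultimate_provenance(self, p: Proposition) -> str:
        # Like every sentence of a book inheriting the book's author,
        # every proposition inherits the maintainer of the knowledge base.
        if p not in self.propositions:
            raise KeyError("proposition is not part of this knowledge base")
        return self.maintainer


kb = KnowledgeBase(maintainer="Example documentation team")  # hypothetical maintainer
p = kb.add("ex:object_1", "P108i_was_produced_by", "ex:production_1")
print(kb.ultimate_provenance(p))  # Example documentation team
```

Keeping the maintainer out of `Proposition` mirrors the guideline's book analogy: adding the maintainer as a per-statement field would repeat the same provenance on every proposition.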

The knowledge may be direct or indirect:

Direct knowledge is knowledge that the maintainers themselves believe for good, explicable reasons, based on their own observation or inference.

Indirect knowledge is knowledge that the maintainer adopts from, or refers to in, other sources. In that case, what the maintainer formally knows is restricted to the information as a formal expression and its provenance; the maintainer may or may not believe the information itself. Therefore, the knowledge base should contain adequate propositions about its provenance (which the maintainer does believe). Where warranted, the maintainer may express doubts about the correctness of this information.
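A minimal sketch, assuming a hypothetical in-memory representation rather than any CRM encoding, of how direct and indirect knowledge might be distinguished: for an indirect statement, what the maintainer asserts is the provenance claim (that the source states the content), optionally with a doubt, and a simple check lists indirect statements that still lack provenance. The names `Statement`, `missing_provenance`, `maintainer_assertion`, the `ex:` identifiers and the example contents are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    content: str
    indirect: bool = False          # adopted from, or referring to, another source?
    source: Optional[str] = None    # provenance of indirect knowledge (hypothetical id)
    doubted: bool = False           # the maintainer doubts the adopted information


def missing_provenance(statements):
    """Indirect statements that still lack propositions about their provenance."""
    return [s for s in statements if s.indirect and s.source is None]


def maintainer_assertion(s: Statement) -> str:
    """What the maintainer actually believes about a statement."""
    if not s.indirect:
        # Direct knowledge: the maintainer believes the content itself.
        return s.content
    if s.source is None:
        raise ValueError("indirect statement without provenance")
    # Indirect knowledge: the maintainer believes the provenance, i.e. that
    # the source states the content, not necessarily the content itself.
    claim = f"{s.source} states: {s.content}"
    if s.doubted:
        claim += " (the maintainer doubts its correctness)"
    return claim


kb = [
    Statement("ex:object_1 was measured at 42 cm"),
    Statement("ex:object_1 was produced by Painter A",
              indirect=True, source="ex:catalogue_1923", doubted=True),
    Statement("ex:object_1 was acquired in 1901", indirect=True),
]
print(maintainer_assertion(kb[1]))
print(len(missing_provenance(kb)))  # 1
```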

It should be possible to communicate with the maintainer and discuss justifications and possible corrections of errors.

Ideally, the source of indirect knowledge should itself contain provenance statements about the indirect knowledge its author has used. The ideal would be to link all those provenance statements together until they lead us to all the direct knowledge used. This is, of course, impossible in practice, but we nevertheless have the means to document, extend and link our provenance knowledge into ever longer chains, which will be extremely useful for validating and improving our overall knowledge.

The maintainer of a knowledge base may decide to document the provenance of provenance if there is no reliable digital resource to link to for the next statement in the provenance chain, or if a local copy of parts of the provenance chain appears useful.
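Following provenance statements from knowledge base to knowledge base until direct knowledge is reached can be sketched as a walk over links. This is a sketch under stated assumptions: the two dictionaries stand in for statements resolvable through reliable digital resources and for parts of the chain the maintainer has copied locally, the statement identifiers are hypothetical, and the `max_steps` cut-off and cycle check are an illustrative way to stop a chain that never reaches direct knowledge, not a rule given in this text.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical provenance links: statement id -> id of the statement it was
# adopted from, or None when it is direct knowledge of that source's maintainer.
Links = Dict[str, Optional[str]]


def follow_chain(start: str, remote: Links, local_copy: Links,
                 max_steps: int = 50) -> Tuple[List[str], str]:
    """Follow provenance links from `start` as far as they can be resolved.

    Returns the chain of statement ids and a status:
      'direct' - direct knowledge was reached,
      'broken' - the next link cannot be resolved,
      'cycle'  - a statement reappeared in the chain,
      'limit'  - max_steps links were followed without reaching an end.
    """
    chain, seen, current = [start], {start}, start
    for _ in range(max_steps):
        # Prefer reliable digital resources; fall back to the local copy.
        if current in remote:
            nxt = remote[current]
        elif current in local_copy:
            nxt = local_copy[current]
        else:
            return chain, "broken"
        if nxt is None:
            return chain, "direct"
        if nxt in seen:
            return chain, "cycle"
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    return chain, "limit"


remote = {"kbA:s1": "kbB:s7", "kbB:s7": "kbC:s2", "kbC:s2": None}
local_copy = {"kbD:s4": "kbE:s9"}
print(follow_chain("kbA:s1", remote, local_copy))
# (['kbA:s1', 'kbB:s7', 'kbC:s2'], 'direct')
print(follow_chain("kbD:s4", remote, local_copy))
# (['kbD:s4', 'kbE:s9'], 'broken')
```

A "broken" result marks the point where, following the guideline, the maintainer might choose to document the provenance of provenance locally. Checking the remote links before the local copy is only one possible design choice.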

 

[1] N. Guarino, …
 

Posted by Carlo Meghini on 22/10/2019

Dear Martin,

Thank you for sharing these thoughts. I agree with what you say; I’ve made some comments in the attached document, hoping that they can be useful to you.

I’ll be around until Friday, so if you wish we can discuss these things in person.

Posted by Martin on 22/10/2019

Dear Carlo,

Good comment! Attached are my improvements. Please check.

In the 45th joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 38th FRBR - CIDOC CRM Harmonization meeting, the SIG reviewed the HW by MD and CM (best practices document on the epistemology of the knowledge base) and took the following decision.

DECISION: Since the text by MD and CM covers, to a great extent, the same conceptual space as the document generated in the context of Issue 297: What is Knowledge base / Knowledge Creation Process (now appearing as the definition of Knowledge Creation Process in the Terminology section of the CRM model), it was decided that the two texts should be compared for overlap, checked for consistency with one another, and possibly merged into a new text. The new text is to appear under Best Practices.

HW: SS is to compare the two texts to check that they are not in conflict and merge them into one. He is also to go into more depth regarding implementation guidelines, incorporating the problem of endlessly iterated provenance statements.

 

Heraklion, October 2019

Posted by George on 19/10/2020

Dear all,

There was a HW to review the text proposed for this issue. I have done so; the review is in the Google folder linked here:

https://drive.google.com/drive/folders/1_TWuqY8tKsowFSWta-T26WEvNaCMNeFu?usp=sharing

I look forward to discussing it this week. My overall point would be that I'm not sure what the case for this document is in the first place. As the issue points out, it does seem very close to the knowledge creation text already in the specification. If it is to be a guideline, then I would add more examples (in CRM). I think it is a nice text; I just wonder where it would work well. Right now it reads more as a theoretical point of view than as practical guidance. I don't see it conflicting with the thoughts of the existing text in the specification document about the knowledge creation process. I wonder if we could link this to an issue on named graphs and how to implement them in practice (in general and with regard to documenting provenance).

 

Posted by Martin on 19/10/2020

Hi George,

I agree. The text about knowledge creation in the new introduction is indeed a sort of principled answer to the question. I think a guideline about Named Graphs is a good idea, as a follow-up to the PDF guidelines.

In the 48th CIDOC CRM and 41st FRBR CRM SIG meeting (virtual), the SIG held an open-ended discussion to establish whether the issue should be pursued further or dropped. MD and CM have put together a text that serves as a guideline for end users on where and how to include provenance information about the statements held in a knowledge base. The SIG proposed that the text be kept as a starting point and revised accordingly. It should be cross-checked with the RDF implementation document, and diagrams based on a toy example should be added to demonstrate when provenance statements are necessary, what the maintainer knows, and how provenance statements are chained. ULAN entries might also be incorporated.
RS: example of a ULAN entry in CIDOC CRM: https://data.getty.edu/vocab/ulan/500027372
NEXT STEPS: 
(1)    Talk with the Getty about getting examples from ULAN.
(2)    Create named graphs (either in X3ML or by hand; HW: ML, consulting MF and NC).
(3)    Put together a task force to work on the text (add examples etc.):
         a.    FORTH: MD
         b.    Takin: GB
         c.    Getty: RS/GB
         d.    CHIN: PM
         e.    SARI: ML, NC
         f.    Uni Köln: OE (can support with critique, examples and chains of provenance)
(4)    Revisit the issue in the next SIG meeting.

 

October 2020
 

In the 57th CIDOC CRM & 50th FRBR/LRMoo SIG Meeting, the SIG discussed how to proceed concerning this issue. 

MD’s text has been reviewed by GB and now has many unresolved comments. The intention was that the text be reviewed by a group of people (not just GB), that all comments and feedback be incorporated into it, and that it then be brought to the SIG to be voted on and published on the website.

Decision:  
Contact the people involved in the HW (MD, GB, RS, PM, ML, NC, OE) and ask them to finalize the text in time for the spring 2024 meeting. If they do not, then the issue will be closed. 

Marseille, October 2023
 

Outcome: 

The document in its final form can be found here.