Issue 417: begin_of_the_begin / end_of_the_end is excluded from time range?
Posted by Robert Sanderson on 8/5/2019
I admit I made the rookie mistake of assuming that the P81a/b and P82a/b properties followed the typical temporal pattern of an inclusive beginning and an exclusive end.
Or using interval notation: [begin_of_the_begin, end_of_the_end)
Thus if you know that an event happened sometime in 1586, the begin of the begin would be 1586-01-01T00:00:00 and the end of the end would be 1587-01-01T00:00:00.
However, http://www.cidoc-crm.org/guidelines-for-using-p82a-p82b-p81a-p81b seems to clarify that both are exclusive.
> "P82a_begin_of_the_begin" should be instantiated as the latest point in time the user is sure that the respective temporal phenomenon is indeed *not yet* happening.
> "P82b_end_of_the_end" should be instantiated as the earliest point in time the user is sure that the respective temporal phenomenon is indeed *no longer* ongoing.
And thus (begin_of_the_begin, end_of_the_end)
Meaning that the begin of the begin would need to be 1585-12-31T23:59:59 such that midnight on January first is included in the range, and the end of the end would be midnight of January first, 1587.
However, in the following paragraph it says:
> … e.g. 1971 = Jan 1 1971 0:00:00. Respectively, for “P82b_end_of_the_end” the implementation should “round it up”, e.g. 1971 = Dec 31 1971 23:59:59.
Which would mean that both ends were *included* in the range.
And thus [begin_of_the_begin, end_of_the_end]
Enquiring minds that need to implement this consistently would like to know which is correct ☺
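To make the three candidate readings concrete, here is a minimal Python sketch that tests a few instants against each interval for "sometime in 1586". The helper name `contains` and the variable names are hypothetical, chosen only for illustration; the boundary values are the ones discussed above.

```python
from datetime import datetime, timedelta

def contains(t, begin, end, closed_begin, closed_end):
    """Return True if instant t lies in the interval with the given boundary types."""
    after_begin = t >= begin if closed_begin else t > begin
    before_end = t <= end if closed_end else t < end
    return after_begin and before_end

first_instant = datetime(1586, 1, 1, 0, 0, 0)
last_second = datetime(1586, 12, 31, 23, 59, 59)
next_year = datetime(1587, 1, 1, 0, 0, 0)
# An instant after 23:59:59 on Dec 31 but still within 1586.
late_instant = next_year - timedelta(milliseconds=500)

def half_open(t):
    # Reading 1: [1586-01-01T00:00:00, 1587-01-01T00:00:00)
    return contains(t, first_instant, next_year, True, False)

def open_both(t):
    # Reading 2: (1585-12-31T23:59:59, 1587-01-01T00:00:00)
    return contains(t, first_instant - timedelta(seconds=1), next_year, False, False)

def closed_both(t):
    # Reading 3: [1586-01-01T00:00:00, 1586-12-31T23:59:59]
    return contains(t, first_instant, last_second, True, True)

for name, reading in [("[,)", half_open), ("(,)", open_both), ("[,]", closed_both)]:
    print(name, reading(first_instant), reading(late_instant), reading(next_year))
```

Under these assumptions the readings only disagree on instants such as `late_instant`, which falls in the last second of the year: the closed reading with a 23:59:59 end excludes it, the other two include it.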
Posted by Florian Kräutli on 9/5/2019
Not having read the guidelines as attentively as you I usually implement P82a/b suggesting that the begin and end date are both included in the range.
For example, here's the date related to a book published in 1586:
I think this is readable as a confidence interval of the book having been published at some point in 1586, lacking better ways to express the level of accuracy in date datatypes.
Posted by Robert Sanderson on 9/5/2019
Thanks Florian, Nicola!
Should the example be updated (and thus we must all update our implementations) or the specification to match the example which everyone seems to do in practice?
My proposal would be to do the latter, in the face of the current ambiguity.
What has everyone else done in this situation? 3 data points is interesting, but still anecdotal.
(And I’m not going to mention leap seconds that would make the end of some years 23:59:60 instead of 23:59:59, which would be solved by an exclusive end date)
Posted by Florian Kräutli on 10/5/2019
I actually think that the text makes the right assumption. If something is said to have happened in 1586 we can be reasonably certain that it happened before 1 January 1587. We can’t be certain that it did not happen a millisecond after 31 December 1586 at 23:59:59.
I think we should provide two examples. One that matches the text and the current one, mentioning that this can be done for ease of implementation.
Which version one implements is after all not the decision of the CRM, but depends on the available knowledge and interpretation of the source data.
Posted by Martin on 10/5/2019
Rob is right.
If we talk about seconds, it is somehow hunting flies. But we really need to test how databases interpret intervals given in dates.
The conversion to the begin of the year,day / end of the year,day should be done by the data entry templates, knowing that we instantiate an ..a or ..b property, and NOT manually. We have written such modules in the past for RDBMS implementations. Could be a standard S/W module. Would someone volunteer to provide?
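A minimal sketch of what such a conversion module could look like in Python. It follows the rounding shown in the guideline example quoted earlier (1971 becomes Jan 1 1971 0:00:00 and Dec 31 1971 23:59:59). The function names are hypothetical, and whether the upper bound should instead be the first instant of the next period is exactly the open question of this issue.

```python
import calendar
from datetime import datetime

def _check(month, day):
    # A day without a month is ambiguous, so reject it.
    if day is not None and month is None:
        raise ValueError("a day requires a month")

def begin_of_the_begin(year, month=None, day=None):
    """Round a partial date down to the first second it can denote."""
    _check(month, day)
    return datetime(year, month or 1, day or 1, 0, 0, 0)

def end_of_the_end(year, month=None, day=None):
    """Round a partial date up to the last whole second it can denote."""
    _check(month, day)
    last_month = month or 12
    # monthrange returns (weekday of first day, number of days in month).
    last_day = day or calendar.monthrange(year, last_month)[1]
    return datetime(year, last_month, last_day, 23, 59, 59)

print(begin_of_the_begin(1971), end_of_the_end(1971))
print(begin_of_the_begin(1972, 2), end_of_the_end(1972, 2))
```

A data entry template would call one function or the other depending on whether it is filling an ..a or a ..b property, so that nobody has to type the rounded values by hand.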
Posted by Martin on 10/5/2019
In other words:
If date <= 1951 internally converts to Dec 31, 23:59... 1951, Florian's solution works out for querying things possibly having happened in this range. If it converts to Jan 1, 0:0, it is wrong. To be checked how all 9 date queries work.
Posted by Martin on 11/5/2019
Sorry for answering in pieces. The "ultima ratio" for all we do are the queries, and not the entities. There are 9 possible questions about a time-span: Give me all events that (1) must have happened before event X started, that may have happened before event X started, that must have happened before event X ended, that may have... etc. If the last second of a day is included or not, is completely irrelevant for our purposes. If the end of the end of 1895 is interpreted as Jan 1, 0:0, 1896, the question is, how implementations will answer the above queries wrt 1896, and not, if the last second is in or out. I'll try in the next weeks to sort that out. I hope, different RDF databases will be consistent at least!
Best wishes and thank you for pointing to this issue!
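As an illustration of the "must" and "may" questions, the following Python sketch covers four of them. It rests on several assumptions that are not settled in this thread: "happened before" is read as "ended before", a time-span is modelled by four boundary instants (the class `TimeSpan` and all function names are hypothetical), and boundaries are compared with `<=`, which is itself one of the choices under discussion.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class TimeSpan:
    begin_of_the_begin: datetime  # earliest possible start
    end_of_the_begin: datetime    # latest possible start
    begin_of_the_end: datetime    # earliest possible end
    end_of_the_end: datetime      # latest possible end

    @classmethod
    def from_outer(cls, begin_of_the_begin, end_of_the_end):
        # Only outer bounds known: start and end may each fall anywhere inside them.
        return cls(begin_of_the_begin, end_of_the_end, begin_of_the_begin, end_of_the_end)

def must_end_before_start(a, x):
    return a.end_of_the_end <= x.begin_of_the_begin

def may_end_before_start(a, x):
    return a.begin_of_the_end <= x.end_of_the_begin

def must_end_before_end(a, x):
    return a.end_of_the_end <= x.begin_of_the_end

def may_end_before_end(a, x):
    return a.begin_of_the_end <= x.end_of_the_end

a = TimeSpan.from_outer(datetime(1890, 1, 1), datetime(1890, 12, 31, 23, 59, 59))
x = TimeSpan.from_outer(datetime(1890, 6, 1), datetime(1895, 12, 31, 23, 59, 59))
print(must_end_before_start(a, x), may_end_before_start(a, x))
```

Whether the last second of a year is in or out only changes an answer when two boundaries coincide exactly, which is why the comparison operator matters more than the stored value.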
In the 44th joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9; 37th FRBR - CIDOC CRM Harmonization meeting, following the discussion initiated by RS’s observation that the time interval selected by P82a/b seems to have both open and closed outer boundaries, the SIG assigned MD and CEO to edit the “Guidelines for using P82a, P82b, P81a, P81b” document. The purpose of the edit is to make sure that (i) the text corresponds to the standard practice of treating time intervals as [,) (closed/open) by default and (ii) the examples are consistent with the best practices documented in the text.
HW: MD & CEO are to edit the document “Guidelines for using P82a/b, P81a/b” accordingly.
Paris, June 2019
Posted by Martin on 12/10/2019
This is my answer 3rd and 4th paragraph:
The range of the properties "P81 ongoing throughout" and "P82 at some time within" are defined in the CRM as E61 Time Primitive. Instances of E61 Time Primitive are defined as closed, contiguous intervals on the natural time dimension in which we live. “Closed” means that the endpoints belong to the interval. “Contiguous” means that there are no gaps between the endpoints in the interval (which holds for “intervals” in general).
The reason to describe time spans with inner and outer intervals is the existence of a very efficient algebra for calculating resulting areas of determinacy and indeterminacy [Plexousakis et al.XXXX]. Further, they are motivated by the British MIDAS Heritage standards [https://en.wikipedia.org/wiki/MIDAS_Heritage] and easy to define in Relational databases.
Databases necessarily use discrete (“granular”) values for time, be it whole years, days, seconds or even smaller units. Physical time is, to the best of our current knowledge, isomorphic to real numbers, i.e., continuous. Therefore, the only reliable way to integrate data about time from sources with different granularity is to normalize discrete values to intervals of real numbers. On a given type of platform, such as RDF triples, this means to normalize to the smallest granularity available.
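A minimal Python sketch of such a normalization, under stated assumptions: values arrive as "YYYY", "YYYY-MM" or "YYYY-MM-DD" strings with positive years, and each granule is mapped to a closed/open interval running from its first instant to the first instant of the next granule. The function names `granule_to_interval` and `within` are hypothetical.

```python
from datetime import datetime, timedelta

def granule_to_interval(value):
    """Map a year, month or day granule to (start, start of the next granule)."""
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        (year,) = parts
        return datetime(year, 1, 1), datetime(year + 1, 1, 1)
    if len(parts) == 2:
        year, month = parts
        start = datetime(year, month, 1)
        if month == 12:
            return start, datetime(year + 1, 1, 1)
        return start, datetime(year, month + 1, 1)
    if len(parts) == 3:
        start = datetime(*parts)
        return start, start + timedelta(days=1)
    raise ValueError(f"unsupported granule: {value!r}")

def within(inner, outer):
    """True if the inner interval lies entirely inside the outer one."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

year = granule_to_interval("1586")
print(within(granule_to_interval("1586-12-31"), year))
print(within(granule_to_interval("1587-01-01"), year))
```

Once both sources are expressed as intervals on the same axis, a day-level record and a year-level record can be compared directly, without either side guessing the other's granularity.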
Therefore, defining an outer bound for a temporal phenomenon may include selecting the outermost time granule within which the phenomenon may actually occur, because the outer boundaries of these granules form indeed outer bounds. Note that no phenomenon occurs at an instant of time anyway. Only human declarations of the validity of some regulations or laws are thought to begin at an instant of time. Anything falling into this validity interval again cannot be instantaneous.
Since the E61 Time Primitive of the CRM cannot be expressed in RDF directly, in the official RDF implementation of the CIDOC CRM, we define four properties replacing P81 and P82 adequately using xsd:dateTime.
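For illustration, a minimal Python sketch that writes the two outer-bound properties for one time-span as Turtle text. The `crm:` namespace IRI, the function name `time_span_turtle` and the example subject IRI are assumptions made for this sketch; only the property names P82a_begin_of_the_begin and P82b_end_of_the_end are taken from the guidelines quoted in this thread.

```python
from datetime import datetime

# Assumed namespace for the CRM RDF properties; adjust to the version in use.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
XSD = "http://www.w3.org/2001/XMLSchema#"

def time_span_turtle(subject_iri, begin_of_the_begin, end_of_the_end):
    """Serialize the outer bounds of a time-span as xsd:dateTime literals."""
    lines = [
        f"@prefix crm: <{CRM}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        f"<{subject_iri}>",
        f'    crm:P82a_begin_of_the_begin "{begin_of_the_begin.isoformat()}"^^xsd:dateTime ;',
        f'    crm:P82b_end_of_the_end "{end_of_the_end.isoformat()}"^^xsd:dateTime .',
    ]
    return "\n".join(lines)

print(time_span_turtle(
    "http://example.org/timespan/1586",  # hypothetical subject IRI
    datetime(1586, 1, 1, 0, 0, 0),
    datetime(1586, 12, 31, 23, 59, 59),
))
```

The end value used here follows the rounding from the guideline example; a closed/open convention would write 1587-01-01T00:00:00 instead.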