Issue 429: P72 has Language

Starting Date: 
2019-08-24
Working Group: 
3
Status: 
Proposed
Background: 

Posted by Thanasis Velios on 24/8/2019

Dear Maria,

I think the complexity is from the fact that Language in the CRM is modelled as a Type, i.e. to be used for classifying things.

How about creating an E7 Activity to describe learning a language and then use P16 used specific object → E56 Language?

Or create a separate authority list of types of people: "greek speakers, english speakers, welsh speakers, etc"

Any other thoughts?

All the best,

Thanasis

On 23/08/2019 17:17, Maria Jose de Almeida wrote:
> Dear all,
>
>
> As some of you may know, I’m working at the Portuguese National Archives and we are building a new data infrastructure using CIDOC-CRM for archival description.
>
> When describing biographical information it’s common to state that some person was fluent in some language, or languages, apart from his/her native one. Using current archival description standards [ISAD(G) 3.2.2; EAD <bioghist>] this is represented within a text, usually a very long text string with information of distinct natures. So far we have been able to decompose the different elements and represent them adequately as instances of CIDOC-CRM classes and link them through the suitable properties. But we are struggling with this one...
>
> We cannot link a Person (E21) to a Language (E56), nor can we use multiple instantiation, as has been suggested in other cases (http://www.cidoc-crm.org/Issue/ID-258-p72-quantification), because Person (E21) and Linguistic Object (E33) are disjoint.
>
> The only way around I can think of is to consider someone’s speech as a linguistic object and state that that person participated in the creation of that linguistic object.
>
> But it seems a rather odd solution, as we would have to create individuals for someone’s speech in Portuguese, in French, in Russian, etc. and describe them in a very broad manner. Because when it is stated that a person is fluent in any of those languages, typically what is meant is that that person could interact with other speakers of the same language, mainly through oral discourse, or read written documents. This is not exactly the same as creating documents in a foreign language, a situation which is much more straightforward to represent.
>
>
> Any thoughts that may help us?
>
> Thanks!
>
> -- 
> Maria José de Almeida
> Técnica Superior
> Direção de Serviços de Inovação e Administração Eletrónica
> Telefone (direto): 210 037 343
> Telefone (geral): 210 037 100
> m-jose.almeida@dglab.gov.pt

Posted by Franco Niccolucci on 24/8/2019

Dear Maria, all

the problem comes from the fact that the CRM usually models what humans DO, not what they ARE. To model the latter, it is therefore necessary to introduce an event in which the person participates, as Thanasis suggested. What he proposes is correct, but considering a language instrumental to the activity of learning it sounds a bit awkward to my ear: common sense would regard a handbook, an app, a teacher etc. as the instruments.
Also, such an activity may be problematic for native languages, where an intentional action (= activity) is difficult to attribute to a baby a few months old.

From your description I believe that you are interested in documenting the factual knowledge of a language, not that/how it was learnt, so I suggest the following approach.

In this specific case you might use membership in an E74 Group, similar to what is suggested in the scope note of E74 for ‘nationality’. Thus you would have very large groupings of speakers of different languages, and speaking one of them would correspond to being a member of that specific group, e.g.
Maria P107 is member of E74 Group ‘Portuguese speakers’.
Incidentally, this option would also enable you (if you wish) to distinguish among the levels of knowledge of that language via P107.1 kind of member E55 Type ‘native speaker’. Thus, the following would also hold for you: Maria P107 is member of E74 Group ‘English speakers’, but with P107.1 kind of member E55 Type ‘second language speaker’. Further flexibility can be introduced with this P107.1 if required, like “writer”, “translator”, etc.
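
The group pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the helper name members_of and the sample values are made up for the example; only the labels E74, E21, E55, P107 and P107.1 are taken from the message.

```python
from typing import NamedTuple, Optional


class Membership(NamedTuple):
    """One P107 membership statement, read from the group side."""
    group: str                  # E74 Group, e.g. "English speakers"
    member: str                 # E21 Person
    kind: Optional[str] = None  # P107.1 kind of member -> E55 Type


# Sample data taken from the example in the message above.
MEMBERSHIPS = [
    Membership("Portuguese speakers", "Maria", "native speaker"),
    Membership("English speakers", "Maria", "second language speaker"),
]


def members_of(group, kind=None, memberships=MEMBERSHIPS):
    """Return the members of a group, optionally restricted by P107.1 kind."""
    return sorted(
        m.member
        for m in memberships
        if m.group == group and (kind is None or m.kind == kind)
    )
```

Asking for members_of("English speakers") would return Maria regardless of level, while adding kind="native speaker" would exclude her.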

 

Current Proposal: 

Posted by Franco Niccolucci on 26/8/2019

Dear George, all

I think that there are two issues (at least) here.

The first one concerns the identity criteria of this class. This discussion started from an issue related to the latter. In this case the grouping of English speakers, for example, is identified as “those people whose bio states so”. It does not matter if they really speak/spoke English or not; this concerns the veridicality of their bio, which is another story.

So the grouping of English speakers is precisely identified. This is not always the case.

This issue is a particular case of a more general issue concerning fiat vs bona-fide objects, to use the terminology introduced by Smith and Varzi about geographical (but not only) objects. As you may remember, fiat ones have precise boundaries, bona-fide don’t. For groupings, belongingness has the same alternatives, and in most cases what we may call “fiat belongingness” is based on a formal definition, like a listing, mathematical criteria, a decree and so on. There are thus groupings for which it is easy (feasible?) to assess belongingness, others for which it is not, others for which it is unclear. The crm-sig mailing list is an example of a fiat group defined by listing, as is the group of the citizens of Italy at the time I am writing this email, defined by the law and recorded in the civil registry. 
Nationality - mentioned in the E74 scope note - could belong to the uncertain case: if you consider nationality as the formal status of being a citizen of a country, it is a fiat criterion. But there may be cases in which the nationality may be uncertain. I don’t want to give present-day examples as they may be politically sensitive, but if you had asked people from Venice in 1861 their nationality, they would have answered “Italian” although their formal nationality was “Austro-Hungarian”. Thanks to the principle of self-determination, such cases are much rarer today than they were in the 19th century, with a few notable exceptions that we all have in mind. However, 99.999% of the cases refer to formal nationality so the above is just a pedantic discussion.

Language(s) spoken is much more difficult to assess: what turns the bona-fide boundary between speakers and non-speakers into a fiat one in this case? A certificate issued by a school? Self-assessment? I think that the case that raised this discussion may be easily solved as I mentioned above. But I would be cautious to use it in other cases.

For the second issue, modelling this grouping as an E74, I understand George’s concern about the use of E74 Group, which is a subclass of E39 Actor and thus is required to “[collectively] have the potential to perform intentional actions of kinds for which someone may be held responsible”, which seems doubtful for speakers of a language. In my opinion this requirement for intentional actions could be considered in a very broad sense; for language, avoiding sexist terminology in English could be an example - stretching it a bit, I admit. But otherwise, how can we model collectivities like this one and others such as “archaeologists”, “Buddhists”, “Real Madrid fans” etc.?

Finally, George’s proposal is nice but addresses only the language issue and not other groupings/features of the same type, i.e. collectivities based on some common characteristic, but not required to be able to collectively perform intentional actions, for example illiterate people.

 

Posted by Robert Sanderson on 26/8/2019

Dear all,

I agree with the concerns about modeling the activity of learning a language as a substitute for the ability to communicate in a language.  On paper I have a Ph.D. in French, so surely I’m fluent? Far far from it, as you doubtless noted in Paris! I also agree that modeling as a Group is problematic for the same reason as modeling gender as a Group – the requirement for concerted action. Finally, I agree with Franco’s concern about the narrowness of the scope to only Language. We also have information about the skills and knowledge of individuals or groups such as Techniques employed.

I would not want to model a complete skills management HR system (or video game!), but having some pattern for expressing relevant, observed abilities would be valuable for searching. Use cases would include:

    Search for Human Made Objects (HMOs) not classified as Paintings, that were produced by an actor that is known for their ability in a painting technique.  (e.g. drawings by Van Gogh)
    Search for HMOs that carry a text in a language that is not known by the owner of the object (e.g. manuscript in Latin owned by someone not known to speak Latin)
    Search for possible attributions for a text in a known language, filtering for people known to speak that language.
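
The second and third use cases can be sketched as simple look-ups. This is a minimal Python sketch, assuming a hypothetical table of documented language knowledge; the identifiers, the sample data and the function names are illustrative and not part of any CRM proposal.

```python
# Hypothetical sample data: which languages each person is documented as knowing.
KNOWN_LANGUAGES = {
    "person:a": {"Latin", "French"},
    "person:b": {"English"},
}


def possible_authors(text_language, known=KNOWN_LANGUAGES):
    """People documented as knowing the language of an unattributed text."""
    return sorted(p for p, langs in known.items() if text_language in langs)


def not_known_to_read(owner, text_language, known=KNOWN_LANGUAGES):
    """True when no statement links the owner to the text's language.

    This reports a missing statement, not evidence that the owner
    could not read the text.
    """
    return text_language not in known.get(owner, set())
```

Both functions only need some consistent link between a person and the same language identifier that the text carries; how that link is modelled is left open here.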

 

In terms of solutions, we might consider:

    A super-class for Group (Set?) that allows non-Persons to be aggregated, and does not have the intentionality of action requirement.
        This would enable further modeling patterns, beyond Group and Curated Holding.
    A property similar to George’s suggestion that has E55 Type as its range to include Technique or other types.
        This would enable more specific recording of skills of an Actor without implying any particular event 
    A broad usage / known for activity without times more precise than the life dates of the actor that encompasses all uses of the language.
        This would enable adding timespans when known, and perhaps be a pattern for other similar information such as when a person is known as an author, but is also a painter

We are also modeling our archives at the moment using CRM – it would be very interesting to compare the results, as there are several issues that we do not have a solution for that we are particularly happy with.  The major area of concern is the association of properties not at the item level, but at the aggregate level meaning that some members of the set have this property. When this can be expressed as data rather than just descriptive text, we are worried about the false precision. The collections include both digital and physical objects, which compounds the issue.

 

Posted by Franco Niccolucci on 27/8/2019

Steve,

something for your breakfast tomorrow morning. 

“Knowing” a language is not the same as “using” it. The case started from documentation stating that somebody knows a language, but not reporting any use, which is just potential but not necessarily actual. For example, I know Latin pretty well, but I have very few - if any - opportunities of using it; on the contrary, I do not know Japanese but sometimes say “sayonara” and “arigato” appropriately. In these Portuguese archives I would be correctly recorded as “Latin speaker” but not as “Japanese speaker”.
Your solution instead refers to “using” the language, as implied by P16, and would state exactly the opposite.

I share with you the hate for negative searches, for the reasons you clearly explain.

Bene valeas placideque quiescas, Stephane (*)

Francus

(*) to enable you to use P16 for my knowledge of Latin

Prof. Franco Niccolucci
Director, VAST-LAB
PIN - U. of Florence
Scientific Coordinator
ARIADNEplus - PARTHENOS

Editor-in-Chief
ACM Journal of Computing and Cultural Heritage (JOCCH) 

Piazza Ciardi 25
59100 Prato, Italy

> Il giorno 26 ago 2019, alle ore 23:32, Stephen Stead <steads@paveprime.org> ha scritto:
>
> Just thinking about this after an interesting game of Railroad Revolution.
> It strikes me that it might be useful to consider language as a Conceptual Object, and an Actor's use of it would be an instance of E7 Activity P2 has type E55 Type {Communication} P16 used specific object E28 Conceptual Object.
> E55 Type {Communication} could be sub-divided into written, spoken, reading etc as necessary.
> The other stuff that Rob mentions is rather different and at first glance looks a lot like the floruit from FRBR which became F51 Pursuit.
> I am concerned about building optimisations of properties that are intended for making searches about negative things like “not known to speak Latin” as this is a nasty place to be: absence of Knowledge versus knowledge of absence……
>  
> Use of a technique: is that also the use of an immaterial object?
>  
> Anyway off to bed now. Very interesting question
> TTFN
> SdS
>  

Posted by Athina Kritsotaki on 27/8/2019

Dear all,

I agree with Christian-Emil, especially for the P2 has type property, as a simple solution for the cases that we don't have enough information to infer this capability or cases lacking temporal information - it reminds me of the issue 277 and the example of the artist 

Posted by George Bruseker on 27/8/2019

Dear all,

Sticking to the question of documenting when we have information that someone knew a language or had a skill in a technique, I reiterate that I believe we really need a new property rather than using p2 has type.

p2 has type is a good solution for classifying a kind of phenomenon or for specializing a class when it does not require a new relation in the ontology. It's a very useful tool but it does not work for what we need to document here.

The semantics of saying that someone had knowledge of a language can indeed be interpreted as E21 Person p2 has type E55 "English Speaker". It could not, however, be typed E21 Person p2 has type E56 Language "English". Why? Because the E56 type classifies the phenomenon of language, not of people. The E55 is relative to the phenomenon it classifies/specializes. People are not language, nor vice versa. One of the things we would want to make possible in linking an E21 Person to an E56 Language is to create consistent and potentially serendipitous relations between an instance of person and an instance of language. (As in one of Rob's examples: the work used E56 English and the person who encountered it (E5) was a knower of E56 English, ergo, they could have read it, but did not necessarily do so!) This would not be facilitated by saying E21 Person p2 has type E55 "English Speaker" because there are no given semantic connections between the instance "English Speaker", which classifies a person as a kind, and the instance E56 Language "English", which classifies linguistic phenomena.
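
The missing connection can be made concrete with a small Python sketch. The URIs, the triple layout and the helper names below are assumptions made for illustration, and "Pxx knew language" stands for the property proposed in this thread, which is not part of the CRM.

```python
ENGLISH = "lang:english"                  # assumed URI for E56 Language "English"
ENGLISH_SPEAKER = "type:english-speaker"  # assumed URI for E55 Type "English Speaker"

# Variant 1: the person is classified with P2 has type.
TYPED = {
    ("person:rob", "P2 has type", ENGLISH_SPEAKER),
    ("text:letter", "P72 has language", ENGLISH),
}

# Variant 2: the person is linked to the language by the proposed property.
DIRECT = {
    ("person:rob", "Pxx knew language", ENGLISH),
    ("text:letter", "P72 has language", ENGLISH),
}


def objects(triples, subject):
    """All nodes that a subject points to, whatever the property."""
    return {o for s, _p, o in triples if s == subject}


def share_language_node(triples, person, text):
    """True if the person and the text point to a common node."""
    return bool(objects(triples, person) & objects(triples, text))
```

With the typed variant the person and the text never meet at a shared node unless an extra mapping from "English Speaker" to "English" is maintained somewhere; with the direct variant they meet at the language itself.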

The semantic intent, I would argue, in the schemas that document fields like language and technique is often not to say that this person is of type "English Speaker" or even of type "Painter" but that they have/had knowledge of English (linguistic phenomenon) or Painting (technical phenomenon). That someone knows or uses a language or technique does not make them someone who would generally be classified (with regard to official documentation) as being an exemplar of that language/technique. So, just as Rob is not necessarily a "French Speaker" though he knows French, George is not necessarily a Painter, though he may have a knowledge of painting notable enough to document. (It is counterfactual; I don't even have this knowledge, but for lack of a better example.)

Human beings are an objective phenomenon that can be witnessed and have certain behaviours and potentials which other phenomena do not, one of which includes the ability to know. We should be able to document this objective phenomenon because it falls within scope. The kind of knowledge in question is not an act of knowing (temporal) but the result of having learned and now acquired a new understanding which allows the human being to act in the world in a new skillful way in certain situations. This knowledge remains, more or less present, in the knower without any particular activation once they have acquired it (forgetting and rustiness notwithstanding). It is simply one of their properties.

It's all a long-winded way of saying that we need a relation from E21 Person (at least) to indicate that they have some knowledge. There should be a binary property for this (which could then be extended) which allows one to make the simple statement, A knows B. This would not be a sub property of P2 has type, but a new property. I'm not sure if it would have an existing superproperty. My original suggestion would be to stick to language and then go for a super class, although the question of technique also arises.

The other issues Rob and Franco raise about documenting fiat groups/sets are very important but perhaps we could make them another discussion and issue (when it comes time to formulating something particular for voting on at SIG). 

About the idea of making language a conceptual object, I think we would have to have a lot of discussion and reflection on that, because it seems like a large metaphysical issue. Language is obviously very particular to human beings; Aristotle called us the rational (logos) animal. But it is not clear that logos is the invention of human beings or that it can be said to be something that we can use in a utilitarian way like a pot or a mould. It seems more like a medium through which certain types of communicative act can/do occur. Anyhow, also a fun discussion, but I think having an E21 Person "has knowledge of / was user of language" X property could be a modest first step that is semantically robust to a real use case and can be extended by further modelling without likely breaks to monotonic development.

Posted by Christian Emil on 27/8/2019

Dear all,
To be my own devil's advocate:
There is another interpretation of P2 has type as "has former or current type” analogously to the properties P49 has former or current keeper, P51 has former or current owner, P55 has former or current location.  In this way P2 has type does not necessarily describe a trait, but a temporal property (in this case) of a person. We simply don’t know (or care to know) at what time the person had this property. I assume this was the original intention of the class.   From this interpretation it should be ok to use the property P2 has type to model passive (understand) and active (speak/write) knowledge of a language.   
It is more tricky to model the time a person got a property. Attribute assignment can be used, but it will model when somebody assigns the property to the person and not when the person got the property. The information in an archive, e.g. “x is fluent in Latin”, can be modelled by E13 Attribute Assignment, giving information as an observation. An examination with a grade is also an E13 Attribute Assignment.
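
The difference between the time of the assertion and the time of acquisition can be illustrated with a short Python sketch. The dataclass is a simplified stand-in for E13 Attribute Assignment; the field names, the sample records and the single-year dating are assumptions for the example. P140, P141, P14 and P4 are the CRM properties the fields loosely correspond to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeAssignment:
    """Simplified stand-in for an E13 Attribute Assignment."""
    assigned_to: str     # P140 assigned attribute to (the person)
    assigned: str        # P141 assigned (here: a language)
    carried_out_by: str  # P14 carried out by (who made the statement)
    year: int            # crude stand-in for P4 has time-span


# Hypothetical records: a biographical note and an earlier examination.
ASSIGNMENTS = [
    AttributeAssignment("person:x", "Latin", "archive:biographical-note", 1950),
    AttributeAssignment("person:x", "Latin", "school:examination", 1920),
]


def first_documented(person, language, assignments=ASSIGNMENTS):
    """Earliest year in which someone asserted the person knew the language.

    Returns None when no assertion is recorded. The year dates the
    assertion, not the moment the person acquired the language.
    """
    years = [a.year for a in assignments
             if a.assigned_to == person and a.assigned == language]
    return min(years) if years else None
```

The earliest assertion gives at best an upper bound on when the language was acquired, which is the limitation noted above.
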
As mentioned in this discussion, the FRBRoo class F51 Pursuit (subclass of E7 Activity) is a possible candidate for describing skills, although this class requires that the person actively uses the skill?

Group again:
I have reread the issues involving the scope note of E74 Group. In 2003 the first sentence of the scope note was
“A group is any gathering of people that acts collectively or in a similar way due to any kind of social bounds or contact”
And the current is 
“This class comprises any gatherings or organizations of E39 Actors that act collectively or in a similar way due to any form of unifying relationship”
The formulation “act collectively or in a similar way” is (intentionally?) vague. Is it possible to say that persons with skills in the same language “act collectively or in a similar way”, and can “gathering” be used about persons spread out over different places on the globe? I cannot find such an extended meaning in the OED.
Maybe we should NOT use E74 Group as a general relationship model?

Posted by Thanasis Velios on 27/8/2019

Dear all,

The fact that our documentation systems document a direct relationship between language/technique and person does not mean that a direct relationship is needed in the CRM (we have many examples of direct relationships in documentation systems which do not exist in the CRM, e.g. "the date" of an object). The act of speaking using a specific language or painting using a specific technique can be modelled through the respective activities (and I would argue that agency is not limited by the person's age, i.e. some babies are certainly determined to do things). Modelling through activity allows searching based on the language of a text in relation to the language spoken by a person during events of a specific time-span. I think the queries that Rob mentioned can be answered that way.
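
A search of this kind, restricted to a time-span, might look like the following Python sketch. The record type, the year-based time-spans and the sample activities are assumptions made for illustration; the fields loosely follow P14 carried out by and P16 used specific object.

```python
from typing import NamedTuple


class LanguageUse(NamedTuple):
    """Simplified E7 Activity in which an actor used a language."""
    actor: str     # P14 carried out by
    language: str  # P16 used specific object
    start: int     # first year of the activity's time-span
    end: int       # last year of the activity's time-span


# Hypothetical recorded activities.
ACTIVITIES = [
    LanguageUse("person:x", "Latin", 1900, 1910),
    LanguageUse("person:x", "French", 1925, 1930),
]


def languages_used(actor, start, end, activities=ACTIVITIES):
    """Languages the actor is recorded as using at some point in [start, end]."""
    return sorted({
        a.language for a in activities
        if a.actor == actor and a.start <= end and start <= a.end
    })
```

The query only returns languages for periods in which a use is actually recorded; years without a recorded activity yield nothing, whatever the person may have known.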

I appreciate the difference between the activity of speaking using a language and the mental capacity of a person to speak a language or to hold any knowledge. I am not sure there are any research questions that cannot be answered by modelling this through using activity. To model knowledge is to model deduction for which we already have necessary classes. 

Posted by Christian Emil on 28/8/2019

Dear George & all,

Your text and sketch of a solution are indeed interesting. I agree that (natural, human) languages are a special case. Animals are currently not in the scope of CRM. I also agree that there are (currently) no links between an instance 'English (language)' of E55 Type and an instance 'speaker/writer of English (language)' of E55 Type. Should such a connection be in the type system (in the fringes or outside CRM)? If we introduce a new property from E21 Person, what is the range, E55 Type?

Posted by George Bruseker on 28/8/2019

Hi Christian-Emil et al.,

Regarding language in particular, my argument would be to make a new direct binary relation, something like E21 Person pxx ‘knew language’ E56 Language.

This relation, to my mind, would be parallel to E18 p45 consists of E57 Material

There is indeed an event which we normally don’t know anything about (nor have a research interest in) of learning a language, which leads to the instance of E21 Person having a constitutional change in knowledge (Aristotle called it Hexis) whereby they then know a language. I believe this change in knowledge state is not something that changes the being of the individual as such (primary quality) which is what p2 has type would indicate but only creates a modification in the secondary qualities of the person. 

To loosely parallel existing CIDOC CRM modelling, a production event creates an object. In creating it, materials are used and it creates a new instance of Human Made Object. This instance of Human Made Object now consists of an E57 Material like ceramic. So qua what it is made of we say p45 consists of, qua what it functionally is, we say that it p2 has type ‘jug’ for example. p45 is not a sub property of has type because the relation is not one of “being" the material but rather having the substance of material x.

Regarding time problems, the instance of E21 Person did not always know the language. That being said, when we declare a relation like ‘knew language’ we state that there was a moment in the existence of this E21 Person when the person had the knowledge (had the hexis) of knowing x. It is actually true for the whole lifetime of the entity that at some time it knew language x, just in case in real life at some time in its life it knew language x.

I think that in the interest of not endlessly filling up CRMBase, it might be better to put such an addition into CRMSoc. The above suggestion does not mean to argue that we couldn’t or shouldn’t also model learning events or use events with regards to language but rather that there is a basic function that is ontologically correct to assert that a Person knows a language which fits a real world use case. 

 

Posted by Franco on 28/8/2019

George,

OK with me, but you should explain why knowing a language has a superior status compared to other abilities like

- making vases
- driving vehicles
- painting
- computation (I am particularly passionate about this one)
- properly defining new classes/properties in the CRM
etc.

It seems to me that (speaking/knowing/using) a language is just one (very important) human skill among many, so I would rather consider a broader class, say Exx Skill, one of which skill types is "knowing a language", and then use something like

E21 person Pxx has skill Exx Skill P2 has type E55 “speaking language” P2 has type E56 “EN”; 
as well as: 
E21 person Pxx has skill Exx Skill P2 has type E55 “computation” P2 has type E55 “four basic operations”. 

I leave to you to correctly place Exx Skill in the CRM hierarchy, maybe a subclass of E28 Conceptual Object. 

I would also be grateful if you are able to point me to a clean and comprehensive description of CRMBase which you refer to in your last sentence. 
Due to my ignorance, it looks to me like the Phoenix that, in the words of Don Alfonso in Mozart’s 'Così fan tutte’, “everybody says it exists, but nobody knows where it is” (a nice performance here: https://www.youtube.com/watch?v=73rY81pT5Wk).

Posted by George Bruseker on 28/8/2019

Dear Franco et Al.,

Actually I have no argument against skill, it would be a similar pattern. My worry would be about making either a language or a skill a conceptual object. I see the reason behind the proposal but I'm not sure especially about language being conceived of as a human made object in the CRM sense. There I think we more refer to an intentionally created intellectual object with discrete boundaries. I am not convinced this is an appropriate apprehension of language. We come to be in language and reproduce it. A great genius may change it. Mostly it just happens to us and we do not employ it as an object nor are we intentionally aware of it. No one made it. It forms a sort of horizon for communicative action. By using type we avoid the controversy. Skill I would associate with texne in the sense of craft. Craft could be conceived as closer to something like a CRM conceptual object, but again a craft seems to go beyond any one person or group qua invention, so I would find it more comfortable to think of as a type.

So if skill were also to be modelled maybe another binary property

E21person had skill exx skill 
Exx skill isa e55

And then perhaps some super property of both

E21 had knowledge of e55
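
How such a superproperty would behave can be sketched in Python. The property labels are the informal ones from this message, not CRM identifiers, and the table, the sample statements and the helper names are assumptions made for the example.

```python
# Proposed property hierarchy (labels from the message above, not CRM identifiers).
SUPERPROPERTY = {
    "knew language": "had knowledge of",
    "had skill": "had knowledge of",
}

# Hypothetical statements using the two specific properties.
STATEMENTS = {
    ("person:george", "knew language", "English"),
    ("person:franco", "had skill", "computation"),
}


def implied_properties(prop):
    """The property itself followed by all of its superproperties."""
    chain = [prop]
    while chain[-1] in SUPERPROPERTY:
        chain.append(SUPERPROPERTY[chain[-1]])
    return chain


def holds(subject, prop, obj, statements=STATEMENTS):
    """True if the statement is asserted directly or via a subproperty."""
    return any(
        s == subject and o == obj and prop in implied_properties(p)
        for s, p, o in statements
    )
```

A query on "had knowledge of" would then find both the language and the skill statements, while queries on the specific properties stay separate.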

I would find the ability to express both of these in crm an extremely useful addition instead of creating ad hoc solutions per project.

As to the base issue, I don't mean anything fancy, just whatever is in the basic standard and not a family model. 

Will check out the demonstration vid when back to WiFi!

Posted by Christian Emil on 29/8/2019

Besides the fact that language is a very central "skill" of humans, which making vases definitely isn't (there is a book on psycholinguistics called The Articulate Mammal; I have never encountered a book called The Vasemaking Mammal), skill in given languages seems to be documented in many documentation systems. According to the principles of CRM development we should always base our model on real examples from museum documentation.

Posted by Martin on 23/9/2019

Dear All,

I support Christian-Emil's proposal. May I remind you that the CRM has become very stable by a careful bottom-up process. Introducing super-concepts in a rush to cover the whole world is disastrous. If anything, we model language skills from documentation first, and then add other skills one by one. Please also compare to the FRBRoo model about extended activities. We describe a potter by doing pottery, not by having the potential. The skill is a potential. CRM is evidence oriented. People speaking a language do it. How do we assess the Skill?? Aren't the events enough? (Exams, professions, use...)

Posted by Franco on 23/9/2019

Dear Martin

The discussion started from “speaking languages” confronted with “language” as defined in the CRM. 

An example concerned speaking dead languages, like Latin. I can speak Latin, but never do it and probably will never do it in the future: although I could, if I so wished. People in the Vatican City do speak Latin every day, as it is the quasi-official language there (complicated story, see Wikipedia for details). One of my colleagues had her first conversation in Latin at the age of 25, with her current husband, as it was the only foreign language both knew. So sometimes people capable of speaking a language never do it, or do it many years after they learnt it, or do it every day.
According to your approach, you would qualify the priest in the Vatican as “Latin speaker”; my colleague also as “Latin speaker” but only since the age of 25, when she magically acquired this qualification uttering some sounds in the language of the Romans; and myself “not Latin speaker”.

About assessment: how do we assess the potential of an Actor? “This class comprises people, either individually or in groups, who have the **potential** to perform intentional actions of kinds for which someone may be held responsible.”
So they are Actors even before doing anything or being in the position to do so, i.e. before any fact that can support assessing their capacity: just because they can “potentially” do it. Like myself with speaking Latin.
Are they Actors since they are humans? Not really, because there are people “unfit to plead”, i.e. legally not responsible for their actions because of some mental infirmity. Nevertheless I would still call them Actors as they can do actions - without being responsible for what they do.

In the particular case that raised the issue, the language knowledge (skill?) of people was reported in archive documents, which to me seems enough to assess these people's capacity. The same documents probably did not state if the people had ever spoken the language they reportedly knew. So an assessment is possible even if it is not factual.

Last, but not least: is there any difference between (being able of) “speaking” and “reading/writing”? I believe “speaking” is just shorthand for any of these... but what name would you give to the capacity of speaking or writing or reading or any combination of these XXXX - I can’t call them skills 

Best

Franco 

PS I attach a short statement voiced in Latin hoping to upgrade from “non-Latin speaker” to “Latin speaker”, as of today.

Posted by Martin on 13/10/2019

Dear Franco,

I agree with all you say, but  I think you misunderstood me.

My approach is not documenting skills. My approach is documenting facts, rather than potentials. I take notice and may document that you spoke Latin, as I last did at school. I have a document stating my grade in Latin at high school. My grade at high school confirms a set of years of continued successful lessons, not that I could understand much Latin now.
Speaking a language can be documented as an extended (observed) activity, as in FRBRoo. For instance, someone writing books in a particular language. This falls under any kind of extended activity not further specified, such as an artist using a technique for some time, and avoids transforming actual activities into potentials.

We can document someone's documented opinion about a potential of a person, as an information object.

In the "Principles for Modelling Ontologies" we refer:
"7.2 Avoid concepts depending on a personal/ spectator perspective"

This could be elaborated more. In the CRM, we do not model concepts "because people use them", but because they can be used to integrate information related to them with URIs. Therefore, what your arguments show, and what I wanted to say, is that "skill" is a bad concept for integration. What should be instantiated are the observable activities, which may or may not indicate skills.

The CRM does not make any statement that forbids describing concepts not in the CRM.

You write:
"So they are Actors even before doing anything or being in the position to do so, i.e. before any fact that can support assessing their capacity: just because they can “potentially” do it. Like myself with speaking Latin. Are they Actors since they are humans? Not really, because there are people “unfit to plead”, i.e. legally not responsible for their actions because of some mental infirmity. Nevertheless I would still call them Actors as they can do actions - without being responsible for what they do."

Of course.

This discourse applies to all concepts in the CRM. In dozens of tutorials we have stated that all real-life concepts have fuzzy boundaries, and that scope notes do not constitute sufficient logical conditions, but are sufficient to make users understand what we want to talk about. The argument you give is irrelevant, if the errors it may produce do not affect the recall of integrating relevant facts with the CRM. The CRM is not made to produce reliable AI results. Reality is not isomorphic to any logical model.  "Unfit to plead" does not change the fact that the kind of action is one a normal human can be made responsible for. How many "unfit to plead" appear in our cultural historical discourse, and would invalidate any of the properties we use? We indeed need to estimate if exceptions prevent us from finding relevant connections with CRM-based integration. For reasons of recall, we allow all concepts to include a few individuals unfit for the required properties.

You can continue if a chair with a broken leg is still a chair etc. I have not seen any concept free of fuzzy boundaries.

In the CRM, we do not define concepts as being the domain of a property, because then the primitive is the property.

I kindly invite you to improve our principles document respectively, to avoid such criticism at each concept, but much more to teach these principles effectively, or to provide better scope notes if there are any.

See also
"5.4 Model domains and range or properties consistent with your level of knowledge of the domain of discourse"

Posted by George Bruseker on 14/10/2019

Dear Martin,

Which is CEO’s proposition that you support? It gets lost in the string. Do you mean that a) a person speaking a language means being part of a group, or b) using the p2 on E21 and then making types for ‘Speakers of...’?

I am (still, and very much) a supporter of a new property ‘knows language’. I do not think that the group solution works, because of the identity criteria of groups. I also don’t think the event solution is necessary (another suggestion that has floated in this conversation). It is often the case that for a person we do not know the events of their acquisition or use of a language or a skill, but we do have the proposition that they had that language or skill! I also don’t support the ‘English Speakers’ type solution, since it provides a different URI than the URI for ‘English’ and forces more, obscure, modelling.

Another CIDOC CRM principle is to model at the level of knowledge that is typically present in information systems. Again, I think the present case (people know languages) is identical to the case of:

E22 consists of E57 Material

This is a typical piece of knowledge held about an object. It would be obtuse to insist that one should create an event node to indicate the manner of this material becoming the constituting material of the object when we don’t know this fact. This is why CRM represents such binary relations, because they are real, they are a level of knowledge and they are observable.

If someone has entered into an information system George: English, Pot Making, it is unlikely that what they want to reconstruct are instances of me using English or performing pot making. Rather, they are interested in the fact that there is an individual with a particular formation, which means that he knows language x, knows skill x. This information is probably used in an actual integration to connect an instance of E21 via an instance of E56 Language to, for example, an E33 that uses the same E56.

It would seem we need some sort of hierarchy in the principles which can also be conflicting.

>
> My approach is not documenting skills. My approach is documenting facts, rather than potentials. I take notice and may document that you spoke Latin, as I have done last time at school. I have a document stating my grade in Latin at high school.  My grade at high school confirms a set of years of continued successful lessons, not that I could understand much Latin now.
> Speaking a language can be documented as an extended (observed) activity, as in FRBRoo. 

It may be, but is it typically? I have never seen an information system, especially in a museum context, that would.

> For instance, someone writing books in a particular language. This falls under any kind of extended activity not further specified, such as an artist using a technique for some time, and avoids transforming actual activities into potentials.
>
> We can document someone's documented opinion about a potential of a person, as an information object. 

That would make this information mostly unusable however. If our goal is to functionally use the observation person x speaks language y, then it needs to be semantically represented and not made a string. 

>
> In the "Principles for Modelling Ontologies" we refer:
> "7.2 Avoid concepts depending on a personal/ spectator perspective"
>
> This could be elaborated more. In the CRM, we do not model concepts "because people use them", but because they can be used to integrate information related to them with URIs.  Therefore, your arguments and what I wanted to say is, "skill" is a bad concept for integration. What should be instantiated are the observable activities, which may or may not indicate skills.

I don’t see that this principle applies. It is not a personal perspective that someone speaks a language, any more than it is a personal perspective that an object is constituted of a material. This fact can be documented and observed. Someone else can come and do the same. Don’t believe Franco can speak Latin? Watch him and see if he can. When someone writes this in an information system, they typically mean: some evidence leads me to assert that Person x knows language y. They do not mean to say that at some point in the past he learned it, or that at some point he performed it.

In the case of documenting that someone knows a language, this can be used practically to integrate using URIs, just in case we use the same URI for English to describe a document and to describe the knowledge of the individual:

E21 knows language E56 Language URI:AA
E33 has language E56 Language URI:AA

answers the query, who in this graph knew the language this document was written in.
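
A minimal sketch of how such a query could run, assuming the graph is held as plain (subject, property, object) tuples. The property label `knows_language` stands for the proposed property, and the identifiers `person:franco`, `person:maria`, `doc:letter1`, `lang:AA` and `lang:BB` are hypothetical placeholders, not CRM definitions.

```python
# Toy graph as (subject, property, object) triples.
# "knows_language" is the proposed (hypothetical) property;
# "P72_has_language" stands for the existing CRM property P72.
TRIPLES = {
    ("person:franco", "knows_language", "lang:AA"),
    ("person:maria", "knows_language", "lang:BB"),
    ("doc:letter1", "P72_has_language", "lang:AA"),
}


def readers_of(document, triples):
    """Return the persons who know at least one language of the document."""
    doc_langs = {
        o for s, p, o in triples
        if s == document and p == "P72_has_language"
    }
    return sorted(
        s for s, p, o in triples
        if p == "knows_language" and o in doc_langs
    )


print(readers_of("doc:letter1", TRIPLES))
```

The join works only because both statements point at the same language URI, which is the point being argued here.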

Functionally, the issue for me is: is there a good reason against adding a binary property off of person which can indicate their knowledge ability and connect to a well known URI for a language?

Posted by Martin on 14/9/2019

Dear George,

The first principle of all is are there relevant queries that need that property for integrating disparate sources, which indeed provide such data, and is that research one we like to support with the CRM?

Second, using p2 on E21 does the job, doesn't it? What is the added value of "knows language"?

Next principle, keep the ontology small. Querying 1000 properties is already more than anybody can keep in mind. Each additional property has an implementation cost. We need strong arguments for relevance.

It has been the most important success factor of the CRM to keep the ontology small and still expressive enough. If we lose this discipline, we will lose the whole project.

Finally, we are not repeating in the CRM the way typically information systems document, but always tried to find a more fundamental representation. With that argument, we could never have introduced events. They did NOT appear in any of the typical systems at that time. It is a principle not to model all the valuable description elements, which are relevant to characterize an item, but not creating interesting links across resources.

I did not say that it is a personal opinion that someone speaks a language. I said, this is observable. I document: Franco has spoken Latin, repeatedly? But talking about skills, is another level, it introduces a quality, which is hard to objectify, as Franco has pointed out. Actually, it is a typical classification problem, with all its boundary case questions, and the CRM is about relations between particulars.

So, what is the added value against p2, and what are the typical research data and typical research questions for integrating such data, that cannot be answered with P2?

Posted by Martin on 14/10/2019

Dear George, All,

As a second thought:

I think documentation formats such as LIDO are an adequate place to add such useful properties to characterize items in a more detailed way, we would not put in the CRM analytically. Shapes, colors etc. being typical examples.

Question: Are there formats from the archival world that are used to describe the languages people speak? EAD CFP?
Libraries are interested in the languages someone publishes in, not speaking.

What are the anthropologists registering? Would they be interested in languages learned at school, or rather in the language used for communication in a typical group? Would they document people being incapable of communicating in that group?
Or just infer language via group?

How to distinguish native speakers from non-native?

Would historians make cases of people that could not communicate in a given language, with societal effects?

What about illiterate people? Speaking, not writing...? Maintaining oral history with great precision, etc.

What about creoles ? 

Posted by George Bruseker on 14/10/2019

Dear Martin,

Well I am curious to know if I am the only one who finds this property useful at a high level for describing humans related to CH (presumably an important factor), or if I have truly not understood something.  So some comments below.
 

>     The first principle of all is are there relevant queries that need that property for integrating disparate sources, which indeed provide such data, and is that research one we like to support with the CRM?
>
>     Second, using p2 on E21 does the job, doesn't it? What is the added value of "knows language"? 

The has type solution has the following problem.

E21 p2 has type E56 Language does not work

It does not work because no E21 is a language. But we have already set off as a major category of documentation in the CRM an entire class for languages. It is called E56. So one would expect information around language to aggregate around... language or E56.

The solution: 

E21 p2 has type E55 English Speakers

is not satisfactory because you cannot make any connection between the type E55 English Speaker and any instance of language. You would have to model this relation, which would be an indirection, which would make a simple query people generally might want to make in a cultural heritage system difficult to make.

E21 knows language E56 http://vocab.getty.edu/page/aat/300388277 

would allow the user to connect some instance of E21 to E56 in a semantically clear way and that instance of E56 would be the same one they would use for talking about other objects in the graphs, thus aiding reasoning and helping in connecting things.
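
The difference between the two solutions can be sketched as follows, assuming a toy graph of (subject, property, object) tuples. The identifiers `person:1`, `doc:1` and `type:english_speakers`, the property label `knows_language`, and the lookup table `SPEAKER_TYPE_TO_LANGUAGE` are hypothetical and serve only to illustrate the extra indirection; the language URI is the one quoted above.

```python
LANG = "http://vocab.getty.edu/page/aat/300388277"  # URI quoted in the post


def objects(triples, subject, prop):
    """Return all objects of the given property for the given subject."""
    return {o for s, p, o in triples if s == subject and p == prop}


# Solution A: P2 has type with a 'speakers' type (a different URI).
TYPE_MODEL = {
    ("person:1", "P2_has_type", "type:english_speakers"),
    ("doc:1", "P72_has_language", LANG),
}
# Extra, separately maintained relation needed to get back to the language.
SPEAKER_TYPE_TO_LANGUAGE = {"type:english_speakers": LANG}

# Solution B: the proposed direct property (hypothetical label).
DIRECT_MODEL = {
    ("person:1", "knows_language", LANG),
    ("doc:1", "P72_has_language", LANG),
}


def can_read_via_type(person, doc, triples, mapping):
    """Needs the mapping from speaker types to languages to answer."""
    types = objects(triples, person, "P2_has_type")
    langs = {mapping[t] for t in types if t in mapping}
    return bool(langs & objects(triples, doc, "P72_has_language"))


def can_read_direct(person, doc, triples):
    """Answers from the graph alone, because both sides share one URI."""
    known = objects(triples, person, "knows_language")
    return bool(known & objects(triples, doc, "P72_has_language"))


print(can_read_via_type("person:1", "doc:1", TYPE_MODEL, SPEAKER_TYPE_TO_LANGUAGE))
print(can_read_direct("person:1", "doc:1", DIRECT_MODEL))
```

Without the mapping table, the type-based graph cannot answer the question at all, which is the indirection described above.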

>
>
>     Next principle, keep the ontology small. Querying 1000 properties is already more than anybody can keep in mind. Each additional property has an implementation cost. We need strong arguments for relevance.
>
>     It has been the most important success factor of the CRM to keep the ontology small and still expressive enough. If we lose this discipline, we will lose the whole project.

In CRM Sig we decided to create a new extension CRMSoc in order to document social factors. I have proposed to put this new property here (though it is so general a type of information this could be argued). Presently we have as primary properties of actors 'contact point'. How, in a non arbitrary way, is this more or less general than language? It is a typical piece of information that one may know about an individual that can be useful in querying your data. Especially in a time of great interest in identity, knowing the variables that are pertinent to the actors related to CH collections does not seem very esoteric.
 

>
>     Finally, we are not repeating in the CRM the way typically information systems document, but always tried to find a more fundamental representation. With that argument, we could never have introduced events. They did NOT appear in any of the typical systems at that time. It is a principle not to model all the valuable description elements, which are relevant to characterize an item, but not creating interesting links across resources.

The proposal of 'knows language' is to follow yet another principle, model bottom up. So indeed, I would not rush to add 'has knowledge' because we don't know what that is, but we certainly do have a good enough mesoscopic understanding of what it means for someone to know a language (in the same rough and ready way we speak of something consisting of a material). Of course this could be specialized for highly specific anthropological studies, but this does not fall into a use case I am aware of. Almost any good CH database I have modelled that has information about the people related to the collection does document this.

I agree we can't add any property for our pleasure, but this is really fundamental documentation, not at all on the sidelines of the information space we are talking about. It is equivalent to saying an object has a material. The people doing CH fall in scope and that they know a language is as fundamental as that. 

About the structure of building the model, of course we should not have it reflect the systems. The principle I refer to is the principle that we have to model at the level of knowledge that is typically available. That is why CRM has shortcuts and allows multiple levels of description given what is known by the user. So we do not force everything to go through an event when we know that typically such a thing will not be known (but may in principle be). So it would be great to imagine also the 'language acquisition' node, which is the moment(s?) of acquiring a language and which would pertain to a very specific field of study. But for the CH information system, we do know that somebody knew a language and it is valuable to the overall study, so it is unclear to me why it is not necessary. Introducing 'knows language' introduces interesting links between resources...see above.
 

>
>     I did not say that it is a personal opinion that someone speaks a language. I said, this is observable. I document: Franco has spoken Latin, repeatedly? But talking about skills, is another level, it introduces a quality, which is hard to objectify, as Franco has pointed out. Actually, it is a typical classification problem, with all its boundary case questions, and the CRM is about relations between particulars.
>
>     So, what is the added value against p2, and what are the typical research data and typical research questions for integrating such data, that cannot be answered with P2?

It would be difficult for me to generate all the potential research questions that people might want to pursue based on knowing the languages of the people related to cultural heritage objects through events. Again, identity studies are very large today.

You may want to know the number of English knowers vs Greek knowers in digs at different periods in the history of archaeological research. You may want to try to see patterns in collections related to speakers of X as opposed to speakers of Y language. You may want to know who at a certain time in your graph may have been able to read a book from that graph. 

I struggle to see the esotericness of the proposal and do see how it fits the modelling principles we elaborated.

Posted by George Bruseker on 14/10/2019

Dear Martin,

The conversation began with a use case from an archive. I just inform that this is also found in all the projects I work on for memory institutions. They find it in scope, so looking further afield for what anthropologists do doesn't seem like a necessary step? Though highly fascinating!

Posted by Detlev Balzer on 14/10/2019

Dear George, Martin,

this discussion made me curious whether or not I can confirm George's assertion that such statements are common in the cultural heritage field.

EAC-CPF does have a language element, which is, however, only used to indicate in which language the name of a person or corporation is expressed. 

GND, the authority file for libraries in German-speaking countries, has a Language entity which is used for making statements about the "field of study" of a person. Other predicates for the person-language pair of entities do occur, but these are obvious data entry errors.

Having extracted person-related data from a dozen or more cultural heritage projects, I don't remember any example where languages spoken or known by somebody have been considered in any other sense than relating to the documented activity, rather than to the (possibly un-instantiated) capacity of the person.

Of course, this is just an observation that doesn't prove anything. Personally, I would tend towards Martin's view that there is little, if anything, to be gained by defining such kind of statement in a reference model such as the CIDOC CRM.
 

Posted by Franco Niccolucci on 15/10/2019

Dear all,

having somehow started this discussion in a hot August evening, let me remind you that the initial question was:

"When describing biographical information [in an archive] it’s common to state that some person was fluent in some language, or languages, apart from his/her native one. Using current archival descriptions standards [ISAD(G) 3.2.2; EAD <bioghist>] this is represented within a text, usually a very long text string with information of distinct natures. So far we have been able to decompose the different elements and represent them adequately as instances of CIDOC-CRM classes and link them trough the suitable properties.
We cannot link a Person (E21) to a language (E56) and neither use multiple instantiation, as it has been suggested in other cases (http://www.cidoc-crm.org/Issue/ID-258-p72-quantification), because Person (E21) and Linguistic Object (E33) are disjoint.”

I understand these bios consist in a text, and metadata are added to it as instances of various CIDOC-CRM classes. The question was how to indicate in such metadata the knowledge of a language as reported in the bio: so not a real quality of the person, but a fact documented. My suggestion was to use E74 Group. I always prefer to use what is already available and avoid the unnecessary proliferation of classes and properties, in my opinion there are already (more than) enough. But in doing so I try to maximize expressiveness, as otherwise one class (E1 CRM Entity) and one property (P2 has type) would be sufficient for the whole world: P2 is not a jack-of-all-trades. 

Reportedly, the Group solution seemed to please the person who made the question.

I don’t know if the "language spoken" is information usually taken into account in CH; but in this case it was by the archivist, otherwise no question would have been asked.

Posted by George Bruseker on 15/10/2019

Dear all,

I think that the turn to the data is the right move here.

So some examples at hand:

Viaf - widely used ref

http://viaf.org/viaf/27251336

Emu - widely adopted collections management system


Chin makers in Canada model - national standard body Canada

It is in the requirements by researchers for building their persons model

Finally, I would come back to the initial example. The project in question, as far as I follow it, looks to extract analytic data typically encoded by researchers in archival formats that HAVE NOT ALLOWED the precise recording of information that they would have liked to record analytically and formally. I.e., they are precisely trying to find good structures where there were none. The solution is certainly not to document it as an information object.

I am glad to keep finding examples. I hope we can use this practice generally and could even think of formalizing a list of schemas of reference.

Best,

Posted by Pat on 24/10/2019

Hello all,

Sorry to come late to this interesting discussion!

Usually library databases link a person to a language via expressions produced by the person that are in a particular language. 

E21 Person P14i performed {P14.1i in the role of E55 Type = author}: F28 Expression Creation R17 created: F2 Expression P2 has type: E33 Linguistic Object P72 has language: E56 Language

This captures documented use of a language for formal communication. Knowledge of a language is required to create expressions in it, but people can know a language without ever producing expressions in it, so it does not capture all cases of knowing a language. 

Going via F51 Pursuit (which is moving to CRMsoc along with its associated properties, R60 among them) and is a subclass of E7 Activity, you can link a person to a language that they usually use in certain contexts, which do not have to result in actual F2 Expressions. This could include contexts with public speaking (which is not recorded and turned into an Expression of a Work), participating in meetings with a specific working language, corresponding on lists in a particular language.

E21 Person P14i performed: F51 Pursuit R60 used to use language: E56 Language

However, note that the scope note of F51 does restrict Pursuit to professional/creative activities, and would not cover the use of a language in an informal context, such as family life. This gap would prevent recording, for example, the use of a native language in a home context, when all professional activity is in a 2nd, 3rd, 4th etc. language. (This situation would be common for indigenous peoples and immigrant communities.)

Scope note: This class comprises periods of continuous activity of an Actor in a specific professional or creative domain or field.

R60

Scope note: This property associates an instance of F51 Pursuit with the instance of E56 Language that was characteristically used for the products of the associated activity.

The property R60.1 has type of use allows for specifying a particular form of use.

Declared as a shortcut of: 

E65 Creation. P94 has created (was created by): E33 Linguistic Object. P72 has language (is language of): E56 Language
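
As an illustration of the two routes from a person to a language described above, here is a minimal sketch that follows a chain of properties through a toy graph of (subject, property, object) tuples. The instance identifiers (`person:x`, `creation:1`, `expression:1`, `pursuit:1`, `lang:fr`, `lang:pt`) and the underscore property labels are hypothetical; only the order of the properties is taken from the paths quoted in this post.

```python
def follow(triples, start, path):
    """Follow a chain of properties from `start`; return the nodes reached."""
    frontier = {start}
    for prop in path:
        frontier = {o for s, p, o in triples if p == prop and s in frontier}
    return frontier


TRIPLES = {
    # Long route: person -> expression creation -> expression -> language
    ("person:x", "P14i_performed", "creation:1"),
    ("creation:1", "R17_created", "expression:1"),
    ("expression:1", "P72_has_language", "lang:fr"),
    # Route via a pursuit, which needs no recorded expression
    ("person:x", "P14i_performed", "pursuit:1"),
    ("pursuit:1", "R60_used_to_use_language", "lang:pt"),
}

EXPRESSION_PATH = ("P14i_performed", "R17_created", "P72_has_language")
PURSUIT_PATH = ("P14i_performed", "R60_used_to_use_language")

print(follow(TRIPLES, "person:x", EXPRESSION_PATH))
print(follow(TRIPLES, "person:x", PURSUIT_PATH))
```

Neither route reaches a language that the person knows but has never used in a documented expression or pursuit, which is the gap noted above.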

For our SIG discussion this afternoon.