Can a conversational agent pass theory-of-mind tasks? A case study of ChatGPT with the Hinting, False Beliefs, and Strange Stories paradigms.

We investigate whether OpenAI's recently released ChatGPT conversational agent can be examined with classical theory-of-mind paradigms. We used three paradigms: an indirect speech understanding task (the Hinting task), a new text version of a False Belief/False Photographs paradigm, and the Strange Stories paradigm. The Hinting task is usually used to assess individuals with autism or schizophrenia by asking them to infer hidden intentions from short conversations between two characters. In a first experiment, ChatGPT 3.5 showed quite limited performance on the Hinting task, whether the original scoring or revised rating scales were used. We introduced slightly modified versions of the Hinting task in which either cues about the presence of a communicative intention were added or a specific question about the character's intentions was asked. Only the latter improved performance. No dissociation between the conditions was found. ChatGPT performed correctly on the Strange Stories, but we could not be sure that the algorithm had no prior knowledge of the test. In the second experiment, the most recent version of ChatGPT (4-0314) performed better on the Hinting task, although its scores did not match the average scores of healthy subjects. In addition, the model could solve first- and second-order False Belief tests but failed on items referring to a physical property, such as object visibility, or requiring more complex inferences. This work illustrates how psychological constructs and paradigms might be applied to a conversational agent of a radically new nature.

Keywords

ChatGPT; theory of mind; indirect speech; false beliefs

Domains

Computer Science [cs]; Cognitive Science

Main file

ChatGPT and ToM Zenodo 21june2023.pdf (500.13 KB)

Exp 1 ChatGPT ToM tests.xlsx (71.17 KB)

Exp 2 ChatGPT ToM tests for API.xlsx (58.42 KB)

Supplementary material.docx (19.08 KB)

License: CC BY-NC-ND - Attribution - NonCommercial - NoDerivatives

Format: Other

Contributor: Eric Brunet-Gouet

https://hal.science/hal-03991530

Submitted on: Wednesday, 21 June 2023, 21:19:53

Last modified on: Thursday, 16 May 2024, 03:14:46

Dates and versions

hal-03991530 , version 1 (15-02-2023)

hal-03991530 , version 2 (21-06-2023)

License

Attribution

Identifiers

HAL Id : hal-03991530 , version 2
DOI : 10.1007/978-3-031-55245-8_7

Cite

Eric Brunet-Gouet, Nathan Vidal, Paul Roux. Can a conversational agent pass theory-of-mind tasks? A case study of ChatGPT with the Hinting, False Beliefs, and Strange Stories paradigms. Human and Artificial Rationalities, Lecture Notes in Computer Science, 14522, Springer Nature Switzerland, pp. 107-126, 2024, 978-3-031-55245-8. ⟨10.1007/978-3-031-55245-8_7⟩. ⟨hal-03991530v2⟩

Collections

INSERM CESP UVSQ UNIV-PARIS-SACLAY GS-SANTE-PUBLIQUE

738 views

540 downloads