About the project – ANR TextToKids

The TextToKids project is supported by the French National Research Agency (ANR) from December 2019 to December 2023. It sits at the crossroads of linguistics, psycholinguistics and natural language processing (NLP). Several open challenges arise from these fields and their intersections. The central question is the following: how can the notion of the “complexity” of a text be translated into an objective measure, quantifiable through “elementary” descriptors?
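To make the idea of “elementary” descriptors concrete, here is a minimal sketch in Python. It is only an illustration and not the project's method: the function name elementary_descriptors, the three descriptors chosen (mean sentence length, mean word length, type-token ratio) and the naive regex-based splitting are all assumptions made for this example.

```python
import re


def elementary_descriptors(text: str) -> dict:
    """Compute a few surface-level descriptors of a text (illustrative only)."""
    # Naive sentence split: one or more of . ! ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    # Naive tokenisation: lowercase runs of letters (Unicode-aware, digits excluded).
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not sentences or not words:
        return {
            "mean_sentence_length": 0.0,
            "mean_word_length": 0.0,
            "type_token_ratio": 0.0,
        }
    return {
        # Average number of words per sentence.
        "mean_sentence_length": len(words) / len(sentences),
        # Average number of characters per word.
        "mean_word_length": sum(len(w) for w in words) / len(words),
        # Share of distinct words among all words (lexical variety).
        "type_token_ratio": len(set(words)) / len(words),
    }


if __name__ == "__main__":
    sample = "The cat sat. The cat ran away!"
    print(elementary_descriptors(sample))
```

Descriptors of this kind are easy to compute but capture only surface properties; the project description also points to syntax, logical and temporal connections, and emotional content, which such a sketch does not address.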

Societal context

Children are mastering computers and the Internet at an increasingly early age. This raises questions about their ability to access the informational content of certain texts and creates a need for filtering, or even reformulation, tools adapted to this new class of users. Children have less command of the language than adults (a more limited lexicon and syntax, a different understanding of logical and temporal connections, etc.) and may therefore have difficulty understanding certain texts. These difficulties stem from the fact that they are still learning the language and from the constraints of their developing cognitive system, particularly their memory capacity. Moreover, many studies in psycholinguistics emphasise that text comprehension is linked to the presence of emotional information, hence the importance of taking this type of information into account.

Expected research results

The TextToKids project fits within this context. It aims to study the linguistic and psycholinguistic characteristics conducive to an optimal understanding of information content by children, and to propose natural language processing (NLP) software components that integrate these characteristics. The project targets young readers, namely children aged 7 to 12, and will distinguish different developmental stages, in particular the 7-9 and 10-12 age groups. The work will be carried out in two experimental settings: the children’s newspapers Le P’tit Libé and Albert, which aim to produce articles describing current events (e.g. the Oscars, Brexit, the reception of migrants in France), and Qwant Junior, a search engine dedicated to children, which indexes texts of various kinds.

The expected research results are:

(1) a typology of the linguistic characteristics of texts intended for children according to their age group (or school level);

(2) a method of calculating the suitability of texts (or portions of texts) for children;

(3) remediation strategies (linguistic justifications, reformulation proposals) for unsuitable portions of text;

(4) tools to help write articles for Albert (a guide to good practices and software tools);

(5) the integration of the suitability measure into the Qwant Junior search engine.