Pretrained Language Models as Containers of Discursive Knowledge
Abstract: Discourses can be treated as instances of knowledge. The dynamic space in which the trajectories of these discourses are described can be regarded as a model of knowledge. Such a space is called a discursive space, and its scope is defined by a set of discourses. Constructing such a space is a serious problem: so far, the only solution has been to identify its dimensions through the qualitative analysis of texts, on the basis of the discourses identified in them. This paper proposes a solution that uses an extended variant of the embedding technique, which is the basis of neural language models (pre-trained language models and large language models) in the field of natural language processing (NLP). This technique makes it possible to create a semantic model of language in the form of a multidimensional space. The solution proposed in this article is to repeat the embedding technique at a higher level of abstraction, namely the discursive level. First, discourses would be isolated from a prepared corpus of texts, preserving their order. From these discourses, identified by names, a sequence of names would then be created, forming a kind of supertext. A language model trained on this supertext would constitute a multidimensional space: a discursive space constructed for one moment in time. Repeating these steps over time would allow one to construct the assumed dynamic space of discourses, i.e., the discursive space.
Based on Michel Foucault’s concept of discourse, especially the text from 1971 (Foucault, 1971), a knowledge model was proposed, named discursive space, in which discourses, as instances of knowledge, travel along trajectories in a multidimensional dynamic space.
The qualitative procedure used so far to construct the dimensions of discursive space was a derivative of discourse analysis itself. The description of the discourse characteristics was based on the identification of a number of features. These features, or more precisely the degree of their impact on various aspects of social phenomena, became the basis for the dimensions of the space. Determining such impact is a standard type of analysis of social phenomena. The coordinates were constructed by arbitrarily scaling this impact. Dimensions and coordinates were therefore constructed on the basis of the same set of data used to analyze the discourse itself.
Such structures are discourses, i.e., linguistic (semantic) structures with a higher degree of abstraction than the sentences they consist of. Therefore, one should search for higher-order units (discourses) composed of lower-order semantic units (words) and their relationships in sentences. This would be a repetition of the embedding technique, but transferred to a higher semantic level, the aim of which is to create a set of vectors describing discourses as semantic units of a higher order.
These discourses would be identified as sets of sentences (fragments of texts) constituting a discourse related to certain concepts (words), represented as tokens at a lower level of embedding. The semantic reach within the text, which determines the size of these sets, would be measured by a relevance coefficient calculated on the basis of a model built at the lower level of embedding (token embedding). This model would also be the basis for selecting the qualified tokens that anchor a discourse. By analyzing the mutual position of the identified discourses in the corpus of texts, a discursive linguistic model would be created. The introduction of a time variable, i.e., the construction of a dynamic discursive model, would fulfill the assumptions of discursive space.
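The selection step described above can be sketched as follows. This is a minimal illustration, not the paper's formal procedure: the token vectors are random placeholders standing in for a trained token-embedding model, and the names (`token_vecs`, `relevance`, the 0.0 threshold) are assumptions introduced here for clarity.

```python
import numpy as np

# Placeholder token vectors; in practice these would come from a trained
# token-embedding model (the "lower level of embedding" in the text).
rng = np.random.default_rng(0)
vocab = ["economy", "market", "inflation", "poetry", "rhyme"]
token_vecs = {w: rng.normal(size=8) for w in vocab}

def sentence_vec(sentence):
    """Represent a sentence as the mean of its known token vectors."""
    vecs = [token_vecs[w] for w in sentence if w in token_vecs]
    return np.mean(vecs, axis=0)

def relevance(sentence, concept):
    """One possible 'relevance coefficient': cosine similarity between
    a sentence vector and the vector of the anchoring concept."""
    s, c = sentence_vec(sentence), token_vecs[concept]
    return float(s @ c / (np.linalg.norm(s) * np.linalg.norm(c)))

corpus = [["economy", "market"], ["inflation", "market"], ["poetry", "rhyme"]]

# The discourse around "economy": all sentences whose relevance to the
# concept exceeds an (arbitrary, illustrative) threshold.
discourse = [s for s in corpus if relevance(s, "economy") > 0.0]
```

The threshold and the averaging scheme are deliberately simple; any sentence-level similarity measure derived from the token model would serve the same role.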
Teun van Dijk wrote: “Discourse presupposes (semantic) situational models of the events it talks about, as well as (pragmatic) context models of the communicative situation, both construed by the application of general, socially shared knowledge of the epistemic community” [9] (p. 601).
According to Foucault, “[a] group of elements [form and rigor, objects, statement types, notions, strategies—author’s note], formed in a regular manner by a discursive practice, and which are indispensable to the constitution of a science, although they are not necessarily destined to give rise to one, can be called knowledge. (. . .) there is no knowledge without a particular discursive practice; and any discursive practice may be defined by the knowledge that it forms”
Foucault proposes the use of a method based on four rules (orig. principes, règles): reversal (principe de renversement), discontinuity (principe de discontinuité), specificity (principe de spécificité), and exteriority (règle de l’extériorité), which also define this specificity. These rules can be transferred to the phenomenon of knowledge, which acquires the unusual form of a set that is numerous, mobile, internally variable, and related, and at the same time elusive directly and observable only indirectly. These features allow this set to meet the conditions of a complex system [14] (p. 92).
Discursive space is an n-dimensional dynamical space in which discourses, as autonomous instances of knowledge, trace trajectories in time that describe the real state of knowledge in the subject they concern.
This leads to the following definition of knowledge: “knowledge is a set of discourses contained in an n-dimensional manifold that can be interpreted locally as a discursive space”.
Byrne and Callaghan describe this approach as follows: “Actually what we have in this example is not a set of models calibrated against real data in terms of initial inputs but rather a modelling process which establishes its correspondence to reality through a qualitative appreciation of how things are working out in reality” [18].
The construction of the discursive space based on the embedding technique would consist of isolating the discourses present in the text and then calculating their location in the abstract space analogously to tokens, i.e., based on the probability of their occurrence in the context of other discourses. This would implement the so-called distributional hypothesis, as described by Jurafsky and Martin cited above, but transposed to a higher level of semantic order.
These discourses would form a sequence analogous to the text that is their source. A discourse is created around a specific issue, which is also the source of the name describing the subject of this discourse and the knowledge it contains. Thus, these names (concepts, words) would form a sequence in the order resulting from the source text, repeating the order of the discourses. The resulting structure can be interpreted as a kind of supertext, which could then become the basis for training a language model, analogous to the token embedding technique. Through a procedure that could be called discourse embedding, a linguistic model based directly on knowledge instances would be created, because discourses, as its base units, would be interpreted in this way. Therefore, this model would necessarily be a model of knowledge.
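The supertext step can be illustrated with a minimal sketch. The discourse names and the count-based construction below are assumptions introduced for illustration: the paper envisages training a neural language model on the supertext, for which a windowed co-occurrence matrix factorized by SVD is a classical count-based stand-in that implements the same distributional hypothesis.

```python
import numpy as np

# The supertext: the sequence of discourse names in corpus order
# (illustrative names, not from the original paper).
supertext = ["economy", "inflation", "economy", "elections",
             "inflation", "economy", "elections", "media"]

names = sorted(set(supertext))
idx = {n: i for i, n in enumerate(names)}

# Windowed co-occurrence counts between discourse names.
window = 2
counts = np.zeros((len(names), len(names)))
for i, w in enumerate(supertext):
    for j in range(max(0, i - window), min(len(supertext), i + window + 1)):
        if j != i:
            counts[idx[w], idx[supertext[j]]] += 1

# Truncated SVD of the co-occurrence matrix yields dense vectors:
# a low-dimensional "discursive space" for this single moment in time.
u, s, _ = np.linalg.svd(counts)
discourse_vecs = u[:, :2] * s[:2]  # one 2-D vector per discourse name
```

A neural skip-gram model trained on the same supertext would play the same role; the count-plus-SVD variant is used here only because it is compact and dependency-free.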
Step two would be to build a series of language models over time. We would then be dealing with a set containing (ordered) sets of discourses (instances of knowledge) appropriate for specific moments in time. This would make it possible to plot the trajectories of relevant discourses in time and would implement the source model of dynamical space, which is the basis of the discursive space.
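The dynamic step can be sketched as follows. The per-slice vectors here are random placeholders standing in for models trained on each period's supertext, and the slice years are invented for illustration; in practice the separately trained spaces would first need to be aligned (e.g., by orthogonal Procrustes) before their vectors can be compared.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# One embedding model per time slice; placeholder vectors stand in for
# models trained on each period's supertext (illustrative assumption).
rng = np.random.default_rng(1)
slices = {t: {"economy": rng.normal(size=4)} for t in (2010, 2015, 2020)}

# The trajectory of one discourse: its vector at each moment in time.
trajectory = [slices[t]["economy"] for t in sorted(slices)]

# Drift between consecutive moments (cosine distance), i.e., how far the
# discourse has moved through the discursive space between slices.
drift = [1 - cosine(a, b) for a, b in zip(trajectory, trajectory[1:])]
```

Plotting such trajectories for a set of discourses, one curve per discourse across the time slices, would realize the dynamic discursive space described in the text.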
Based on significant analogies, primarily concerning the same subject of research, language, manifested as real, empirical text corpora, the paper proposes to use a technology analogous to token embeddings, but applied to hypothetical semantic units of a higher order than tokens. These units, i.e., discourses, have been presented and justified in discourse theory and formalized in discursive space theory. Such a technology could be called discursive embedding. Constructing the space of embeddings for discourses as semantic units of a higher level would solve the problem of constructing the dimensions of the discursive space and would allow it to be completely formalized.
The genesis of the human mind, Habermas notes, lies in the interplay between «the perspective of an observer on what is going on in the world with the perspective of a participant in interaction» with others (Habermas 2008: 171). The “subjective mind” of the individual arises within a communicative process of understanding that constantly generates, in parallel, a linguistic “objective mind” of materially embodied symbols that is to some extent independent of its individual speakers. These subjective and objective sides of the human mind are both distinct and co-implicated. Distinct, since «On the one hand, objective mind evolved out of the interaction between the brains of intelligent animals who had already developed the capacity for reciprocal perspective taking […] On the other hand, the “objective mind” claims relative independence vis-à-vis these individuals, since the universe of intersubjectively shared meanings, organized according to its own grammar, has taken on symbolic form» (Habermas 2008: 174-175). These “two minds” are, however, also tightly co-implicated, since:
These meaning systems can, in turn, influence the brains of participants through the grammatically regulated use of symbols. The “subjective mind” of those individuated participants in shared practices develops only in the course of the socialization of their cognitive capacities. This is what we mean by the self-understanding of a subject who can step into the public space of a shared culture. As actors, they develop the awareness of being able to act one way or another because they are confronted in the public space of reasons with validity claims that challenge them to take positions.
As rational subjects, our agency finds motivations «in this dimension and follows logical, linguistic, and pragmatic rules that are not reducible to natural laws» (Habermas 2008: 173).
The computer analogy that is often invoked to assimilate our thinking to the inner workings of computing machines is, he argues, fundamentally flawed because it misses «the socialization of cognition that is peculiar to the human mind» (Habermas 2008: 175).
LLMs escape, at least to some extent, the narrow formula of the individual hardware that runs its own pre-established software, as they are based on semi-automated learning processes fed by the same kind of socially shared “objective mind” that “programs” the individual human brain.
If we accept this distinction as consistent with the Habermasian stance on computers and programming, we can preliminarily note that, while from the “perspective of observation” humans and AIs clearly are two entirely different kinds of systems operating on their own rules and mechanics, from the “perspective of participation” the difference is much more subtle.1
For Habermas, participation in discursive practices is a central aspect of becoming responsible agents: «People enter the public space of reasons by being socialized into a natural language and by gradually acquiring the status of a member of a linguistic community through practice. Only with the ability to participate in the practice of exchanging reasons do they acquire the status of responsible authors of actions that is definitive of persons as such, i.e. the ability to account for themselves toward others».
The process of becoming responsible agents is accompanied by a distinct reflexive aspect, specifically in the form of «a reflexive stability of our consciousness of freedom» (Habermas 2008: 208) rooted in the self-awareness that our convictions and our actions are grounded in meanings and reasons that inhabit ourselves and are shared, transmitted and revised within a community of communicants we belong to.
Only by growing into an intersubjectively shared universe of meanings and practices through socialization can persons develop into irreplaceable individuals. This cultural constitution of the human mind explains the enduring dependence of the individual on interpersonal relations and communication, on networks of reciprocal recognition, and on traditions. It explains why individuals can develop, revise, and maintain their self-understanding, their identity, and their individual life plans only in thick contexts of this kind (Habermas 2008: 296).
This kind of reflexive consciousness and sense of identity is not a property that can be currently attributed to LLMs, at least based on how they operate and the kind of linguistic output they display. These preliminary considerations, then, suggest that the “perspective of participation” in practices is where the interaction between humans and AIs highlights both their common traits – as in the emergence of discursive capacities out of the learning process upon the “objective mind” of symbolic linguistic repertoires – and their differences – when it comes to the emergence of the intentional, desiring subjective mind of the human participants and the iterative, stochastic simulation that fuels the output of LLMs.2 This, however, leaves substantially intact the problem of what kind of moral status should be attributed to this new kind of actor.