Mixing it up: Computation in dynamical systems

2023, Invited talk. First International Conference on Language and the Brain (Biolinguistics panel)

Abstract

The concept of ‘computation’ is, to make an understatement, multifaceted. ‘Computation’ in syntax, cognitive science, and computer science often receives drastically different definitions, sometimes at direct odds with each other. Are we defining closed input-output mappings over the naturals? Are we integrating information from multiple sources interactively? What are the basic ingredients in a definition of ‘computation’ such that we can say that a digital computer and a human are both doing it? In this talk we will examine some aspects of the relation between what ‘computation’ looks like in the theory of syntax and in some aspects of neurocognition and computer science, and try to establish to what extent these approaches deal with the same kind of process. Asking these questions is important in order to bridge the gap between syntactic theory (which is concerned with providing empirically adequate structural descriptions for natural language sentences) and cognitive neuroscience (which is concerned with the neurocognitive underpinnings of what goes on in language production and processing). Building on the distinction between emulation and simulation, of long pedigree in computer science and AI research, we will focus on the basic properties of syntactic computation, analyse what we should require of a descriptively adequate grammar, and ask whether a correspondence with neurocognitive processes is not only possible, but even desirable.

Mixing it up: Computation in dynamical systems
Diego Gabriel Krivochen
University of Oxford
diego.krivochen@ling-phil.ox.ac.uk

Emulation vs. simulation

• Description of dynamical systems: differential equations
  ❖ These track the changes in the parameters of a DS throughout time
• Extended from physical systems to biological systems, and even the social sciences
• Physical computation has followed the Turing model for decades
• Are we doing simulation or emulation?
  ❖ More specifically: what are we doing, what do we aim to do, and what do we think we are doing?

    the emulation mission is to motivate and create algorithms that can recognise, interpret, learn and express grammars and languages, by performing the same activities that human brains can, to at least the same standard. […] Emulation stands in stark contrast to modelling and simulation. By definition, the simulation mission is to create algorithms that recognise, interpret, learn and express grammars and languages, by performing such activities in the same way that human brains perform them. (Peter Grindrod, p.c.)

  (Binder & Ellis, 2016 use ‘simulation’ vs. ‘computation’ respectively)

• Note: the Turing test, as originally conceived (Turing, 1950), pertains to emulation, not simulation
• Turing himself was aware of the distinction:

    The game may perhaps be criticised on the ground that the odds are weighted too heavily against the machine. If the man were to try and pretend to be the machine he would clearly make a very poor showing. He would be given away at once by slowness and inaccuracy in arithmetic. May not machines carry out something which ought to be described as thinking but which is very different from what a man does? This objection is a very strong one, but at least we can say that if, nevertheless, a machine can be constructed to play the imitation game satisfactorily, we need not be troubled by this objection.
    (Turing, 1950: 435)

• Turing’s idea of a ‘machine’ was a general-purpose mechanism, capable of carrying out any set of instructions in a specific format and able to compute any well-defined function over the reals (Turing, 1936: 249, ff.)
• Not quite ‘syntactic computation’, at least in any concrete sense
  ❖ There may be some connection to the so-called Third Factor, but that is still rather ex machina, at least as far as syntax is concerned
  ❖ Merge (just like, say, LFG’s functional control, TAG’s adjunction, or HPSG’s feature subsumption) is still an analytical tool

Computation with and without context

• What is the nature of the relation between grammars and their neurophysiological underpinnings?
• Turing-computability is sequential, function-based, closed
  ❖ I.e.: computability = computation of functions
  ❖ So what we do as fleshy beings is not computation under this definition
• Goldin & Wegner (2007): the physical Church-Turing thesis is fallacious. The CTT applies to effective computation, not physical computation
  ❖ If the CTT is assumed to encompass all computation, then this entails that all computation is function-based and algorithmic
  ❖ Information needs to be encoded in a sequence of symbols; formulae need to be ‘Gödelised’
• Cognitive ‘computation’ is probably none of those things
  ❖ Are we playing the Turing imitation game? Do we want to say that? Probably not.
• Goldin & Wegner: interactive computation is also computation
• In interactive computation, communication with other systems happens during the computation, not before or after it
  ❖ In function-based computation, interactivity is banned by the very definition of a function
  ❖ This allows the system to adapt to its environment
• What would ‘function-based homeostasis’ look like?
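The contrast between closed, function-based computation and interactive computation in the Goldin & Wegner sense can be made concrete with a toy sketch. Everything below is invented for illustration; the ‘environment’ is modelled as a thermostat-like feedback loop, which is one way a ‘homeostat’ could look:

```python
# Toy contrast between function-based and interactive computation
# (all names invented for illustration; cf. Goldin & Wegner, 2007).

def closed_sum(xs):
    """Function-based: the whole input exists before computation starts;
    nothing outside the function is consulted while it runs."""
    total = 0
    for x in xs:
        total += x
    return total

def interactive_run(env, max_steps=50):
    """Interactive: each input is produced by the environment *during*
    the computation, in response to the system's current state."""
    state = 0.0
    for _ in range(max_steps):
        x = env(state)        # environment observes the running state
        if x is None:         # and decides when the exchange ends
            break
        state += x
    return state

# A thermostat-like environment that nudges the state towards 100.
# The input sequence cannot be fixed in advance, because each input
# depends on what the system has done so far.
setpoint_env = lambda state: None if abs(100 - state) < 1 else (100 - state) / 2

print(closed_sum([1, 2, 3]))          # 6
print(interactive_run(setpoint_env))  # settles within 1 of the set point
```

The point of the sketch: `closed_sum` is a function in the CTT sense (fixed input, fixed output), whereas `interactive_run` has no well-defined input prior to execution, which is exactly why interactivity is banned by the definition of a function.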
• There are parallels with certain derivational grammatical architectures
  ❖ If the ‘interfaces’ only read the outputs of syntax, we have a function-based system
  ❖ Same if we have parallel levels linked by functions (as in LFG; Dalrymple et al., 2019)
  ❖ ‘Invasive’ interfaces were briefly a thing during the late 2000s / early 2010s (e.g., Boeckx, 2007)
  ❖ Proposals which do not separate components (e.g., Stroik & Putnam, 2013) also represent a step towards ‘interactive computation’
• What is ‘physical computation’? What properties does it have?
  ❖ Note: physical materials support classical computation; after all, our computers are made of physical materials. Perhaps ‘embodied’ would be a better word, but it has too many connotations

Computation in nervous systems

• Two main properties of nervous systems:
  ❖ Robustness
  ❖ Adaptability
• Gollo & Breakspear (2014): analysis of computation in neural networks
  ❖ Robustness is provided by synchronisation across interacting regions
  ❖ Adaptability is provided by dynamical frustration in local motifs
• In contrast to digital computers, brains have plasticity
  ❖ There is memory storage and there is information processing: the basic ingredients for ‘computation’
  ❖ This allows creatures to map their environment, extract information from signals, and make predictions about the behaviour of objects around them
  ❖ But there is also permanent integration of contextual information, no pre-determined set of instructions, no ‘halting’, non-linear dynamics…
• The characterisation of the processes that may underpin the brain’s adaptive behaviour across scales is still elusive
  ❖ Are there compilers taking information across levels? Hardly.
  ❖ The role of information compression is crucial in living systems

The ‘as if’ problem

• Models of brain dynamics take the form of systems of (partial) differential equations
• But everyone admits that the brain does not solve PDEs in real time
• Binder & Ellis (2016): physical laws are not the same as computer algorithms
  ❖ Multiple algorithms can solve the dynamics of a two-body system, for example
  ❖ A physical law is not simply the execution of an algorithm
• The same applies to brains
  ❖ ‘Computation’ emerges at the mesoscopic level (Freeman, 2000, a.o.)
  ❖ However, that’s not ‘function-based algorithmic computation’
  ❖ There’s a difference between behavioural outputs, the structural basis for those outputs, and algorithmic modelling of those outputs (e.g., Markov chains for decision making)
• Modelling involves a good amount of as if thinking
  ❖ A system S behaves as if it was doing X, so X is a good approximation to whatever dynamics are really taking place in S
  ❖ This is precisely what emulation does

Computing in dynamical systems

• The physical CTT can be interpreted in terms of simulation or emulation
  ❖ Emulation: nothing that nature does is not Turing-computable
  ❖ Simulation: everything that nature does can be carried out, step by step, by a TM
• Applied to the relation between grammars and their neurophysiological underpinnings, the PCTT can also be interpreted in these two ways
  ❖ Emulation: grammars are approximations of descriptions of stable states of cognitive dynamics and the neurophysiological substratum that supports them (Saddy, 2020; Krivochen, 2021)
  ❖ Simulation: grammars are descriptions of physical observables

    A review of the cognitive neuroscience literature will suggest that non-Phase Heads (PHs) may oscillate at gamma, that transitive PHs […] would do so at [frequency] beta2, and that intransitive PHs oscillate at beta1. In fact, this triangle represents the three elements of phrases/phases: complement, head, and specifier/edge.
    (Ramírez Fernández, 2015: 78; see also Murphy, 2015 and much related work)

• Leaving aside all linguistic issues (e.g., phases may be parametrisable), is this telling us anything about either grammars or neurocognition?
  ❖ We think not.
• Looking at grammars as descriptions of stable states (sort of like Poincaré maps) allows us to retain discreteness in categories and operations
  ❖ Both crucial in procedural and declarative models
  ❖ Merge does not apply ‘gradually’; constraints are functions from expressions to truth values. Constraint satisfaction is binary (Haider, 2019)
• Recognising this entails abandoning the idea that a theory of grammar is a theory of human knowledge of language
• We echo Postal’s (2010) words when he says:

    I understand grammatical study to be concerned with the characterization of NL, not with the characterization of knowledge of NL nor with any mechanisms that yield such knowledge.

• Putnam & Chaves (2022) express a similar sentiment:

    the assumption of anything ‘isomorphic’ between the grammars that linguists invent and the linguistic processes going on in the brain is nothing but speculation at this point.
Cognitive neuroscience has not yet found any clear relation between formal grammars and the neural language system.

• The key point is that it is misguided to ask grammars to emulate cognition, just as it is misguided to ask neuroscience to provide characterisations of expressions and relations in natural language sentences
• The characterisation of computation in grammars and in neurocognition is radically different
• The emergence of tools such as ChatGPT has moved some to claim that ‘no grammar or hierarchical structure [is] needed to learn a language’ (Rens Bod; see also Frank, Bod & Christiansen, 2012; Piantadosi, 2023)
• This is the result of confusing (i) emulation with simulation and (ii) grammar with knowledge of language
• To an extent, the Turing test has been both overhyped and misinterpreted
• Keeping, as Postal does, the syntactic characterisation of expressions and relations in NL distinct from whatever humans do avoids these misunderstandings
  ❖ Accepting strong reductionism, in any of its forms, leaves us with no theory of syntax to speak of
  ❖ And with a definition of ‘computation’ that is either too vague to say anything interesting, or too narrow to apply to everything one may want to apply it to

Towards mixed computation

• An intermediate step: non-biological dynamical systems
• Consider, e.g., the Feigenbaum bifurcation diagram, generated by the recurrence relation

    x_{n+1} = r · x_n (1 − x_n)

  This is a model of population growth, which has been applied, e.g., in biology. 0 < x_n < 1 represents the ratio of the existing population to the maximum possible population.
• Chaotic behaviour emerges from this very simple equation as r approaches ≈ 3.56995
• Zooming in, the diagram shows qualitatively different regimes:
  ❖ Convergence to a fixed point: finite-state (1 < r < 3)
  ❖ Bifurcation (period doubling): good ol’ context-free (3 < r < 3.56995)
  ❖ Chaos!: ??? (r ⪆ 3.56995)
• Binder & Ellis (2016) make a very similar analysis (see also Crutchfield & Young, 1989; Moore & Lakdawala, 2000)
• A lesson for syntax?
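The regimes of the logistic map can be checked numerically. A minimal sketch (function names are ours, not part of any library): iterate past the transient, then count how many distinct values the long-run orbit visits:

```python
# Minimal sketch of the logistic map x_{n+1} = r * x_n * (1 - x_n).
# Function names are ours, for illustration only.

def logistic_orbit(r, x0=0.2, transient=1000, keep=64):
    """Iterate the map past the transient, then record `keep` values."""
    x = x0
    for _ in range(transient):
        x = r * x * (1 - x)
    orbit = []
    for _ in range(keep):
        x = r * x * (1 - x)
        orbit.append(round(x, 6))  # round so converged values collapse
    return orbit

def attractor_size(r):
    """Number of distinct values the long-run orbit visits."""
    return len(set(logistic_orbit(r)))

# 'Finite-state' regime: the orbit settles on a single fixed point.
print(attractor_size(2.5))   # 1
# Period-doubling regime: a 2-cycle at this value of r.
print(attractor_size(3.2))   # 2
# Chaotic regime: the orbit never settles on a finite cycle.
print(attractor_size(3.9))   # many distinct values
```

The same three regimes, generated by one and the same equation, are what makes the diagram a useful analogy for mixed computation: which ‘grammar’ describes the orbit depends on the parameter region, not on the underlying rule.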
• Structural uniformity misses important dynamics

Some data and a problem

(1) Wakanda is a big small country
(2) Gandalf is an old old man
(3) That’s fake fake news

• What do the structural descriptions for (1-3) look like?
  o Minimalist and Cartographic approaches (Cinque, 1999, 2010; Scott, 2002; Ticio, 2003; Alexiadou, 2013; Bortolotto, 2016, etc.): FPns are proxies for whatever functional head is assumed to be there
  o Configurationally, nothing would change if we had multiple N′ / NP or whatever: there is structure in the form of nonterminal symbols
• …and the problem? The interpretation of the sequence of adjectives is not the same in the three cases:
  (1) A country that is big for a small country
  (2) Intensive reduplication: a very old man
  (3) Ambiguity:
      a. News that is fake as fake news (i.e., truthful news)
      b. Extremely fake news (intensive reduplication)
• Only in (1) and (3a) does one of the adjectives have scope over a constituent that includes the other adjective plus the head noun
  ❖ That is: only in (1) and (3a) do we have a hypotactic relation between the As.
• If we define scope in terms of (asymmetric) c-command (Ladusaw, 1980; May, 1985…), then only in (1) and (3a) should we have one of the adjectives c-commanding the other
• However, a computational system based on binary Merge (or binary PSRs, for all we care) can only generate structures where one of the adjectives has scope over the other
• In other words, binarity assigns more structure than necessary (Chomsky & Miller, 1963; Lasnik, 2011; Schmerling, 2018; Krivochen, 2015, 2021)
• Empirical observation: not all sequences of adjectives have the same interpretation
• Theoretical proposal: if we want structural descriptions to connect directly to interpretation, the three cases of adjective iteration cannot all receive the same structural description (whatever your favourite structural description format is)
  ➢ By ‘connect directly to interpretation’ we mean something along the lines of Jacobson’s (2009, 2012) Direct Compositionality or Bach’s (1976) rule-by-rule hypothesis (based on Montague, 1973):

    a. For every syntactic rule, there is a unique translation rule specifying the translation of the output of the rule as a function of the translation(s) of the input(s).
    b. All rules apply strictly locally in a derivation, that is, no rule has access to earlier or later stages of a derivation. […]
    (Bach, 1976: 187)

• What are our options?
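Before weighing the options, the rule-by-rule hypothesis just quoted can be made concrete with a toy sketch: every structure-building step is paired with exactly one translation step. The representations below are ours, invented for illustration, and are not a fragment of any published formalism:

```python
# Toy rule-by-rule pairing (Bach, 1976): each syntactic composition
# step comes bundled with a unique translation step.
# All representations are invented for illustration only.

def compose(syntactic_rule, translation_rule):
    """Pair a syntactic rule with its unique translation rule: the pair
    applies as a single step of the derivation, strictly locally."""
    def step(*parts):
        trees = [t for t, _ in parts]
        meanings = [m for _, m in parts]
        return (syntactic_rule(*trees), translation_rule(*meanings))
    return step

# One toy syntactic rule: prefix an adjective to an NP...
prefix_adj = lambda adj, np: ('NP', [adj, np])
# ...and its translation rule: apply the adjective's value to the NP's.
apply_val = lambda a, n: f'{a}({n})'

modify = compose(prefix_adj, apply_val)

# Signs are (tree, meaning) pairs.
fake = (('A', 'fake'), 'fake')
news = (('NP', [('N', 'news')]), 'news')

# Scopal derivation: each adjective scopes over what it attaches to.
step1 = modify(fake, news)    # meaning: 'fake(news)'
step2 = modify(fake, step1)   # meaning: 'fake(fake(news))' - truthful news
print(step2[1])
```

Notice that no step consults an earlier or later stage of the derivation: each application sees only the signs it combines, which is condition (b) of the quote above.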
  ➢ We may stick to structural uniformity and not worry about interpretation
  ➢ Or (bear with me here), we may give up structural uniformity

What we have and what we want

• We have:
  ➢ A set of sentences where adjective iteration behaves differently, both semantically and syntactically
    o Semantically, depending on what the semantic value of an adjective applies to
    o Syntactically, we can apply certain tests:
      (5) Fake is what the fake news was (only the scopal interpretation is available)
      (6) *Old is what the old man was (impossible on an intensive reduplication reading: the man is just old)
      (7) %Big is what the small country was
  ➢ Instances of adjective iteration in an otherwise unremarkable syntactic context, i.e., modifying an NP
• We want:
  ➢ A syntactic framework where we can capture the behaviour of these adjectives both syntactically and semantically
  ➢ A way to connect syntactic structure and compositional semantic interpretation

Some syntax: Tree Adjoining Grammars

• A TAG is essentially a formal system working on trees, which can be augmented and combined with one another either at the frontier or ‘counter-cyclically’
  ➢ These two cases correspond to the (generalised) operations of substitution and adjunction respectively.
  ➢ Trees may be either elementary or derived: derived trees are obtained by applying composition operations to elementary trees. There are two kinds of elementary trees: initial trees and auxiliary trees.
• Initial trees are single-rooted structures which contain a non-terminal node identical to the root of an auxiliary tree.
• Auxiliary trees are also single-rooted structures, which contain a node in their frontier identical to their root: this allows auxiliary trees to be adjoined to initial trees at the nonterminal that corresponds to the root of the auxiliary tree.

Substitution and Adjunction

What do we do with this?

• First, we define our grammar to be lexicalised (Frank, 1992, 2013; XTAG group, 2001; a.o.)
• That is: each elementary tree (ET) contains a single lexical category (the ‘anchor’ of the elementary tree)
  ❖ For example, in a man there is only one lexical category: man. The ET is then: [tree diagram]
• Second, we define each elementary tree as a computationally uniform unit
  ❖ That is: the formal dependencies within an ET are of a single kind (regular, context-free, context-sensitive…)
• Third, we require that the application of a syntactic composition rule occur in tandem with a semantic interpretation rule (the rule-by-rule hypothesis)
• This allows us to define a system in which substrings that display varying kinds of formal dependencies belong in distinct elementary trees

The system at work

• We want there to be no hypotaxis between old old in the intensive reading, but we do want there to be a scope relation between [old old] as a unit and man
• We have two elementary trees (the choice of labels is inconsequential, but we have chosen to be relatively conservative in this respect for pedagogical purposes): [tree diagrams]
• Note that in the auxiliary tree there is a ‘flat’ dependency between the adjectives
• The core idea is that the iterated adjectives constitute a syntactic object that
  o Cannot be tampered with (recall *Old is what the old man was on an intensive reduplication reading)
  o Modifies an argument as a unit: we do not have ⟦old⟧(⟦old⟧(⟦man⟧)) but rather ⟦old old⟧(⟦man⟧)
• After adjunction: [derived tree diagram] We adjoined the AT at an internal NP node in the IT. The system is based on identifying identically labelled nodes and substituting under label identity
• Of course, as we saw, intensive iteration is not limited to two tokens of a category!

‘Nice and all, but… how does it help?’

• A direct consequence of this approach is that we can assign distinct structural descriptions to the two readings of fake fake news!
• Let the semantic value of an ET be defined by the semantic value of its anchor: ⟦fake⟧ and ⟦news⟧
• The first application of adjunction outputs the following derived tree: [tree diagram]
• At this point, we have the semantic value of fake applied to the semantic value of news
• We apply adjunction again, and the corresponding semantic interpretation rule applies the semantic value of fake to the semantic value of fake news, delivering the ‘truthful news’ reading
• The intensive iteration reading is obtained like old old man (you can do it at home)
• Giving up structural uniformity is not controversial in the study of computation in complex non-linear dynamical systems
• Why is it so in syntax?
  ❖ Mostly a historical-rhetorical artefact, we think
• Note that we are not saying that by giving up structural uniformity in syntax we can suddenly establish a 1-to-1 correspondence with the kind of dynamical system that constitutes the biological underpinnings of language
• The operations are different, the primitives are different, the data are different, the analytical methods are different…
• We can just take a hint from the study of CNDS and stop trying to fit a Calabi-Yau peg into a round hole

Conclusions

• Mixed computation, as just presented, is also emulation
  ❖ We do not claim that humans process linguistic structure in terms of directed graphs, that these graphs are combined via substitution and adjunction in a way that is ‘psychologically real’, that elementary trees correspond to this or that oscillation…
• However, it is far better as an emulation tool than structural uniformity
• And it takes us closer to bridging the gap, since computation in complex dynamical systems is characterised by different coexisting structures
• The angle is different: it is not that the computations are identical or isomorphic between brains and grammars (cf. Putnam & Chaves); rather, they may be characterised by comparable underlying properties:
  ❖ Multistability
  ❖ Dynamical frustration
  ❖ Degeneracy
• These properties hold, so far as we know, across levels of organisation
• While not psycholinguistic at all in nature, mixed computation seems to satisfy Pollard’s (1996: 12) requirement of ‘psycholinguistic responsibility’, and raises the bet

Thank you

Sincere thanks are also due to Doug Saddy for his seemingly limitless knowledge and unwavering patience. Whatever I got right is due to collaboration, while errors are only mine to regret.