Swedish and Danish,
spoken and written language
A statistical comparison
Peter Juel Henrichsen
Copenhagen Business School
Jens Allwood
University of Gothenburg
(c) John Benjamins
Delivered by Ingenta
on: Wed, 05 Apr 2006 14:14:36
to: Guest User
IP: 192.87.50.3
The aim of much linguistic research is to determine the grammar and the
lexicon of a certain language L. The spoken variant of L – in so far as it is
considered at all – is generally taken to be just another projection of the same
grammar and lexicon. We suspect that this assumption may be wrong. Our
suspicion derives from our contrastive analyses of four corpora, two Swedish
and two Danish (covering spoken as well as written language), suggesting
that – in the dimensions of frequency distribution, word type selection, and
distribution over parts of speech – the mode of communication (spoken
versus written) is much more significant as a determining factor than even
the choice of language (Swedish versus Danish).
Keywords: language comparison, spoken language corpora, speech versus
writing, Danish versus Swedish, Scandinavian languages
1. Introduction
Politically, Danish and Swedish are considered to be two different languages;
indeed (and perhaps for this reason) many Swedes declare themselves unable
to understand Danish, and vice versa. At the same time, spoken Danish and
written Danish are taken to be one language as a matter of course (similarly for
Swedish). Our corpus data tell quite another – and perhaps surprising – story.
International Journal of Corpus Linguistics. © John Benjamins Publishing Company
Previous investigations of the differences between the two languages have
been based on intuitions and comparison of traditional grammars. Quantitative corpus-based approaches have, so far as we are aware, not been tried. In
this paper, we will make use of this kind of approach by comparing:
(i) a spoken language corpus for Danish
(ii) a spoken language corpus for Swedish
(iii) a written language corpus for Danish
(iv) a written language corpus for Swedish
The reason for the employment of both spoken and written corpora is that
previous work has shown considerable differences between spoken and written
Swedish (Allwood 1998) and between spoken and written Danish (Henrichsen
2002b). The differences are, in fact, so significant that they raise the question
of whether intralinguistic variation between the spoken and written variants
of a language might actually be greater than the difference between what has
traditionally been regarded as different languages. As we will see below, this
might indeed be the case.
The spoken languages of Swedish and Danish, in many ways, seem to be more similar than spoken Danish or Swedish are to written Danish or Swedish.
In our comparison we will take variation between social activities into account by matching similar activities in spoken Danish and spoken Swedish. To be more explicit, we will match Danish informal interviews with Swedish
informal conversations and interviews. Likewise, in order to keep invariant
the style of writing, we will match Danish newspaper texts with Swedish
newspaper texts.
The implications of our observations are at the moment not entirely clear,
but they may have practical as well as theoretical consequences important in
such areas as conversational training, studies in intercultural communication,
and localization of language technology.
As we shall see, certain parts of speech are typical of the spoken variants
of the Scandinavian languages (and for English and other languages as well),
notably interjections, pronouns, attitudinal adverbs, and conjunctions. Those
parts of speech are not always studied as intensively as e.g. nouns, verbs, and
prepositions in language courses, and so one might speculate that fluency in
conversation could be improved by rehearsing the related communicative functions, such as feedback (attention, acceptance, reluctance, etc.), own communication management (hesitation, resumption, repair), attitude (empathy, scepticism, irony), and pronominal reference (keeping track of the discourse referents in a spoken narration) – as a supplement to more traditional language
exercises focussed on grammatical constructions and categories typical of the
written language.
In Scandinavia today, speech technological projects are almost exclusively
intra-national.1 Adaptation of written language dictionaries and grammars for
use in speech synthesis and recognition is commonplace in our countries and,
we suspect, in other small language areas around the world too. Given the
many common features that we find in the spoken variants of the Scandinavian languages – and the relatively large discrepancies between spoken and
written styles within each language – we speculate that Denmark, Sweden, and
other small countries might benefit from basing their speech technology on
advanced components developed for neighbouring languages rather than on
existing national dictionaries and grammars based on written text sources.
The organization of the paper is as follows. After a short discussion of the
linguistic background of our investigations, we present our four reference corpora in Section 3, while in Section 4 we comment on our methodology. Section
5 presents the details of our contrastive analysis of the Danish corpora, one written and one spoken, and in Section 6 we present – in a condensed form – a Swedish investigation along similar lines. In Section 7 our findings are summed up and a general picture is drawn of spoken and written language, and finally in Section 8 we suggest some guidelines for further research.
English translations of Scandinavian word types are given in a reduced
form in the main text; for fuller translations, see the Appendix.
2. Background
Structuralist linguistics for a long time has favored (sometimes explicitly but
perhaps mostly implicitly) the view that the difference between spoken and
written language is of minor importance to linguistic theory. With the work
reported in this paper, we wish to examine this belief more critically. Is the
difference between spoken and written language really without theoretical importance? Let us consider some reasons why this view might be challenged. A
basic reason is that spoken language has evolutionary primacy and probably is
genetically facilitated. Unless we assume very rapid genetic change, this is not
the case for written language.
Another reason is that the structure of spoken and written language, although similar in some respects, is also very different in many respects. Face-to-face spoken language is interactive (in its most basic form), multimodal (at
the very least containing gestures and talk) and highly context dependent. Further, it is basically organized into utterances which are often no longer than
a word. Written language, on the other hand, in its most typical form is noninteractive, monological and monomodal with a lesser degree of contextualization, organized in sentences which are governed by normative rules of the type
that a proper sentence should contain a subject and a predicate. The norms of
spoken language are usually not of this sort; rather, they concern intelligibility
and adequacy in different social activities.
As part of the mechanisms that make spoken language into the efficient and
finely tuned means of communication that it is, we find ways of changing your
mind or ‘own communication management’ (for example, what from a normative written language perspective might be called ‘disfluencies’, ‘false starts’,
‘self repair’ etc.). We also find short and unobtrusive ways of giving feedback
(for example, by words like yeah and uhuh) while overlapping with another
speaker’s utterance. None of these phenomena that are typical and central to the functionality of spoken language have any place in written language.
The differences between spoken and written language have previously been discussed by, for example, Allwood 1998, Biber et al. 1999 and Leech et al. 2001. Generally, we may say that estimates about the importance of the distinction vary, from holding that it is merely a genre difference, similar to the difference between texts from novels and texts from newspapers (Biber 1988),
to claiming that the difference is of a more radical nature, such as McKelvie
(1998), Debaisieux and Deulofeu (2001). Our belief is that the differences are
fairly significant and that their true nature, to a considerable extent, has remained hidden because most research that has been done, has been focussed
only on those aspects of spoken and written language which are comparable.
What this means is that since it is unsurprising that spoken language does not
contain punctuation marks, this feature is often not even noted as a significant difference. In fact, the most common meaningful signs in written language are indeed ‘,’ (comma) and ‘.’ (period). The term ‘word’ is avoided here,
in order not to block the comparison. Similarly, very significant features of
spoken language, such as overlap between speakers, own communication management and feedback are absent in written language and therefore left out of
the comparison. The missing types of comparison become even more evident,
if we bring in body movements and various types of gestures, which are also
a sine qua non of face-to-face spoken language communication, but absent in
written language.
In our investigation, to be presented below, we will, however, restrict ourselves to features that exist in both spoken and written language and we will
argue that even with this restriction, the differences that can be found are
considerable.
3. Data
In this section we present the linguistic data of our investigation together with
our analytical methods.
3.1. Four reference corpora
Our comparison is mainly based on four corpora referred to as DanSPO, SweSPO, DanWRI, and SweWRI. To enhance comparability, in line with what
was noted above, these corpora were all adjusted by removing all non-lexical
markup (such as punctuation in written corpora and details of pronunciation, pauses etc. in speech). Each reference corpus consists of orthographically controlled words only, organized with one sentence per line in the written corpora and one utterance per line in the spoken corpora.

Table 1. Composition of reference corpora

| Reference corpus | DanSPO | SweSPO | DanWRI | SweWRI |
|---|---|---|---|---|
| Size (words) | 1,335,247 | 380,338 | 1,334,944 | 785,986 |
| Style | Labovian interview (informal conversation) | Informal interview and informal conversation | Mixed newspaper genres | Mixed newspaper genres |
| Source corpus | BySoc | GSLC | Berlingske 99 (Danish daily newspaper) | Göteborgsposten 2001 (Swedish daily newspaper) |
| Selection | All of BySoc | Gbg-fragment of GSLC | Fragment of text body (articles only) | Fragment of text body (articles only) |

DanSPO is identical to the Danish speech corpus BySoc, established in the late eighties in the socio-linguistic project “Bysociolingvistik” (The Copenhagen Study in Urban Sociolinguistics). It consists of so-called Labovian interviews (Labov 1984), i.e. informal conversations without preset topic – about 80 in total, mostly recorded in the informants’ own homes. BySoc is described in Gregersen et al. (1991), Henrichsen (1998), and is available at
www.id.cbs.dk/~pjuel/BySoc
SweSPO is identical to the gbg-fragment of the Göteborg Spoken Language
Corpus (GSLC). GSLC was mainly recorded in the period 1978–2000 as part
of many different projects. GSLC contains 1.3 million word tokens organized
in around 25 sections containing different social activity types such as auction, patient-doctor consultation, and shopping (Allwood 1999; Allwood et al.
2002a; Allwood et al. 2002b). The gbg-fragment – alias SweSPO – consists of
informal interviews and informal conversations.
As seen, the spoken corpora were collected with different purposes. We
have sought to keep constant the activity influence on language style in our
investigation by selecting for SweSPO the segment of GSLC containing informal interviews and conversations, since (only) this style matches the style of
BySoc with respect to number and kind of participants (linguist+informant)
and purpose (free-style conversation).
. DanSPO
DanSPO consists of all files in BySoc sliced one utterance per line. An ‘utterance’ is defined as a sequence of lexical words delimited by any of these events: pause (notated £, ££, £££ for normal/long/very long pause), hesitation with phonation (∼), audible breathing (#), non-verbal communication (e.g. laughter, notated “(ler)”), turn shift, incomplete words (interruption point marked with “-”), partly unintelligible passage (transcription enclosed in square brackets). Other kinds of sound-related information (e.g. rising/falling intonation, hesitation with phonation, prolonged syllable) are ignored, as are passages marked by the transcriber as being atypical (e.g. read-aloud text, foreign language, etc.). Figure 1 shows how the original transcription text was transformed to the present corpus DanSPO.

A> aha ££ jamen hvor- hvor-∼ hvord- hvordan hvordan∼
1>
-------------------------------------------------
A> skete det havde jeg nær sagt (ler) hun blev
1>                            ja (ler) sådan
-------------------------------------------------
A> hun blev gravid som syttenårig
1> [ mente du det nok ]           (ler) #

Figure 1. Sample from corpus BySoc in so-called ‘score-format’ showing the onsets of the interviewer (A>) and the informant (1>) relative to each other.

The sample in Figure 1 is rendered in DanSPO as shown in Figure 2.

aha                                        aha
jamen                                      but
hvordan hvordan                            |:how:|
skete det havde jeg nær sagt               did it happen I almost said
ja                                         yes
hun blev hun blev gravid som syttenårig    |:she became:| pregnant aged 17
sådan mente du det nok                     that’s probably what you meant

Figure 2. The corpus sample from Fig. 1 is here shown in DanSPO format (left column). English glosses are given in the right column. |: xyz :| means xyz repeated.
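The slicing rule described above can be illustrated with a minimal sketch. It is not the authors' conversion tool: the function name `split_utterances` is hypothetical, the input is assumed to be the running text of a single speaker, and only some of the listed delimiters are handled (pauses, hesitation marks, audible breathing, one-token parentheticals such as “(ler)”, and incomplete words). Turn shifts and bracketed passages are left out because they depend on timing information that a flat string does not carry.

```python
import re

# Tokens that act as pure delimiters in this sketch: pauses (£, ££, £££),
# audible breathing (#), a free-standing hesitation mark (∼) and
# one-token parentheticals such as "(ler)".
DELIMITER = re.compile(r"£+|#|∼|\(.*\)")

def split_utterances(speaker_text):
    """Split one speaker's transcript text into utterances of lexical words."""
    utterances, current = [], []

    def flush():
        if current:
            utterances.append(" ".join(current))
            current.clear()

    for tok in speaker_text.split():
        if DELIMITER.fullmatch(tok):
            flush()
        elif tok.endswith("-") or tok.endswith("-∼"):
            # incomplete word: dropped, and it ends the current utterance
            flush()
        elif tok.endswith("∼"):
            # complete word followed by a hesitation mark
            current.append(tok[:-1])
            flush()
        else:
            current.append(tok)
    flush()
    return utterances

interviewer = ("aha ££ jamen hvor- hvor-∼ hvord- hvordan hvordan∼ "
               "skete det havde jeg nær sagt (ler) "
               "hun blev hun blev gravid som syttenårig")
for utterance in split_utterances(interviewer):
    print(utterance)
```

Applied to the interviewer's words from Figure 1, the sketch yields the interviewer's lines of Figure 2; the informant's interleaved turns would have to be merged in by onset time.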
. SweSPO
SweSPO consists of all words in the gbg-fragment of the GSLC corpus, one utterance per line.
The transcriptions of GSLC can be rendered in several formats depending on the closeness desired to standard orthography. The present study employs GSLC with transcriptions in standard Swedish orthography (excluding punctuation), being the style most equivalent to BySoc.
The GSLC transcription format
thus differs somewhat from the BySoc format. Among the differences is the representation of overlapping utterances (demarcated in GSLC with square brackets and in BySoc by using a relative time
axis) and extra-lexical information (rendered in separate lines in GSLC, while
interspersed in the transcription in BySoc). In the corpora used in this paper
we have abstracted away from such differences retaining the lexical word forms
only; compare Figure 2 and 4.
$B: hon är min maka
$A: < ja >
@ < giggling >
$A: ja ja ja det är bara att fylla i det här
$B: ja
$A: [19 ska väl inte va ]19
$B: [19 hur ofta ]19 träffas ni / ganska sällan
Figure 3. Sample from GSLC. The transcription format includes information on relative timing of utterances (pauses, points-of-interruption, overlaps etc.).
hon är min maka                             she is my partner
ja                                          yes
ja ja ja det är bara att fylla i det här    |:yes:| it’s just to fill in this
ja                                          yes
ska väl inte va                             should not be
hur ofta träffas ni                         how often do you meet
ganska sällan                               fairly seldom
Figure 4. Sample from Figure 3 shown in SweSPO format (left column). English glosses are in the right column. |: xyz :| means xyz repeated.
See Allwood et al. (2002b) for a contrastive analysis of the transcription
formats of BySoc and GSLC.
. DanWRI and SweWRI
DanWRI and SweWRI are copied from large newspaper corpora, as mentioned
above. Punctuation is removed, and sentence-initial capital letters are lowered
except for lexically governed capitalizations (proper names, certain pronouns and abbreviations).
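The normalization of the written corpora can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function name `normalize_sentence` and the small lexicon `KEEP_CAPITALIZED` are hypothetical, punctuation is only stripped from the edges of tokens (so word-internal hyphens survive), and a real lexicon of lexically governed capitalizations would be far larger.

```python
import string

# Characters stripped from token edges; the typographic quotes and the dash
# are added because newspaper text commonly contains them.
PUNCT = string.punctuation + "“”‘’«»–"

# Hypothetical mini-lexicon of lexically governed capitalizations.
KEEP_CAPITALIZED = {"Danmark", "Sverige"}

def normalize_sentence(sentence, keep_capitalized=KEEP_CAPITALIZED):
    """Remove punctuation and lower the sentence-initial capital unless it is lexical."""
    words = [w.strip(PUNCT) for w in sentence.split()]
    words = [w for w in words if w]          # drop tokens that were pure punctuation
    if words and words[0] not in keep_capitalized:
        words[0] = words[0][0].lower() + words[0][1:]
    return " ".join(words)

print(normalize_sentence("Det regner i Danmark."))   # det regner i Danmark
print(normalize_sentence("Danmark vandt kampen!"))   # Danmark vandt kampen
```

A lexicon lookup of this kind cannot by itself separate homographs such as the Danish pronoun “I” from a sentence-initial preposition “i”; the sketch makes no attempt to do so.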
4. Methodology
As we intend to compare words in different languages and different modes of
communication, our project involves translation as well as transcription. Neither activity can be claimed to be semantically neutral. Translating – or transcribing – a word potentially changes its meaning. How, then, can we expect
our comparisons to be meaningful? Aren’t we comparing apples to oranges?
4.1. Comparing äpplen and æbler
Cognate languages often have ‘false friends’ – words that are etymologically
related and phonetically similar, and yet do not mean the same. One example is
the Swedish-English pair även – even, “även” meaning also, too, likewise rather
than even.
False friendship is also widespread between Danish and Swedish. The lexeme spelled “rolig” means quiet in Danish, while amusing in Swedish; “spring”
is run in Swedish, but jump in Danish etc., and such superficial similarities often mislead inexperienced translators. In other cases, a certain semantic domain is structured differently in the two languages. For example, Swedish
“kusin” (cousin) is ambiguous in Danish between “fætter” (male cousin) and
“kusine” (female cousin). Like English, Swedish has no single term translating
Danish “fætter” while Danish lacks a collective term for “fætter”∪“kusine”.
Of course, as long as the semantic conflicts are as clear-cut as in these examples, they can be controlled using bilingual dictionaries. However, many semantic displacements are so tiny or subtle that not even the most advanced
of dictionaries are aware of them. In Sweden, for instance, the term “mjölk”
(milk) in general has the default reading whole milk which is what you’ll get in
the dairy shop if you simply ask for mjölk. Danish “mælk” (milk) has no similar default. Ask for mælk, and you’ll probably be met by the question: whole
milk or low-fat?
Semantic displacements,2 large and small, are likely to pervade any translation list, and the lack of semantic control is hence intrinsic to all projects
involving translation. On the other hand, this does not mean that dictionaries are meaningless things. Language users that are firmly rooted in two languages often have precise and inter-subjective judgments in questions of adequate and inadequate translations. What is important to remember is simply that preferred translations should be understood as being best possible rather than exact.
In our project, the problem of semantic displacement has to be considered and, if possible, quantified. As we are comparing varieties of language
organized along two orthogonal axes – viz. national language and mode of
communication – we must consider in which dimension the problems can be
expected to be greatest.
Even very extensive monolingual dictionaries do not usually distinguish
between the meanings of written and spoken realizations of a word, or do so
for a small number of lexemes only. Likewise, linguistic literature on word semantics usually does not state explicitly whether the claims and observations
made hold for spoken or for written language. It hence seems to be a common
understanding among linguists that, concerning word semantics, the mode of
communication is not of great importance. Also the fact that children can learn
to write within a few years is easier to explain assuming that they, at least by default, reuse the semantics of the words they know. So even if neither of these
arguments is completely conclusive, it is fairly uncontroversial to claim that
changing the mode of communication leaves the semantic content of lexemes
largely intact.3
In contrast, we know for a fact that the transition from language to language does imply semantic displacements, as exemplified above and amply
documented in bilingual dictionaries.
Given these facts and assumptions, we still believe our experimental setup
to be meaningful. Our main hypothesis is that – in the statistical dimensions
we are studying – the choice of mode (written versus spoken) is more significant than the choice of tongue (Danish versus Swedish) as a determining factor. Assuming that the meaning of words is less well preserved under translation than under transcription, alignments of Swedish and Danish words can
be expected to be less equivalent, more ‘noisy’, than alignments of written and
transcribed words within the same language. It could be argued, then, that this
will render any result in support of our main hypothesis even more significant; if two hunters compete and the one with the bent gun wins, this certainly
adds to his achievement. However, we do not want to press this point too hard,
and for now we just observe that our assumptions concerning comparability of
languages and modes of communication are generally shared among linguists.
4.2. Three dimensions of comparison
We have chosen to compare our corpora using three different statistical approaches:
– frequency distribution
– word type ranking
– distribution over parts of speech
Since these three levels of description are largely independent of each other,
tendencies observable at more than one level are particularly significant. We
will expand on this in the sections to follow.
In including descriptions of the frequency distribution we adhere to the
Zipfian tradition. George Kingsley Zipf claimed that certain distributional patterns are universal, i.e. are found in any (large) sample of any language (Zipf
1936). An important aspect of Zipf’s programme is the demonstration that
languages can be compared based solely on the frequency distributions in text
collections, one advantage being that arbitrarily different languages become directly comparable since frequency distribution functions make no reference to
the actual inventory of word types.
Word type ranking refers to the ordered frequency lists of word types.
Questions to be addressed at this level include: Which word type is the overall
most frequent in each corpus? To what degree are the lists of frequent types
shared among the corpora (modulo word-to-word translation)? Does the corpus suite subdivide naturally according to the individual word type preferences? If yes, which corpora prefer the same types? As the answering of these
questions does require translation as well as transcription, we must not forget
the risk of semantic displacement when interpreting the results.
Parts of speech (POS) distribution concerns parts of speech rather than
types. For the bilingual comparisons, we settled on a smallish tagset of 7 tags
corresponding to the traditional major parts of speech. The tagset will be
presented and commented on in later sections.
In comparing the POS-distributions we apply methods similar to those of
the word type selections, except that the analytical objects are POS tags rather
than word types. The central questions of this section include: which tag is the
most frequent in each corpus? How are the tags distributed in general? Does the
corpus suite subdivide naturally based on the POS preferences? If yes, which
corpora agree?
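The first two of these questions amount to a simple tally over a tagged corpus, as the following sketch shows. The function name `pos_distribution` and the tag labels in the toy input are hypothetical; the actual tagset of seven tags is not reproduced here, and the input is assumed to be a list of (word, tag) pairs produced by some tagger.

```python
from collections import Counter

def pos_distribution(tagged_tokens):
    """Return each tag's share of all tokens, most frequent tag first."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.most_common()}

# Toy input with made-up tag labels.
toy = [("ja", "INTERJ"), ("det", "PRON"), ("er", "VERB"), ("det", "PRON")]
dist = pos_distribution(toy)
print(max(dist, key=dist.get), dist)
```

Comparing corpora then means comparing such distributions pairwise, in the same way as the word type lists.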
Again a caveat is in place. Automatic tagging is of course fast and convenient, but not without its problems. Our taggers were trained mainly on written text sources, so the quality of the DanSPO and SweSPO tagging cannot be expected to match that of the written corpora. More importantly, however, the very idea of analyzing speech in categories developed for written text is questionable. Our tagset (and most other POS-tagsets) lacks labels for sound
and vision based features such as intonation, stress pattern, pause distribution,
turn shift, facial expression, and gestures – features playing an essential role
in spoken communication. So in general, comparative studies of spoken and
written corpora based on standard POS-tagging are blind to certain aspects of
spoken language expressivity and must be careful not to jump to wrong conclusions concerning the diversity of expression. One of our reasons for including
this type of analysis in the present study is that we wish to be able to relate
to similar investigations in other languages – English in particular. Translation between English and the Scandinavian languages is often not possible on a word-to-word basis (cf. Appendix). On the other hand, all Germanic languages largely share the same grammatical taxonomy allowing POS-based comparisons. For more details on the automatic POS tagging of speech
corpora, see Nivre et al. (1996) and Nivre and Grönqvist (2001).
We are now ready to enter the main part of this report, save a concluding
remark. At each of the three levels of description we wish to determine whether
the corpus suite divides naturally into pairs. If this is the case, the essential question is which corpora go together – those sharing national language, or those
sharing mode of communication?
5. The Danish case
This section reports on our frequency-based contrastive analysis of the two
Danish corpora DanSPO and DanWRI.
5.1. Frequency distribution
As explained in Section 4, we first study the frequency distributions of DanSPO
and DanWRI irrespective of the actual word types (‘Zipf style’).
Figure 5 shows the number of occurrences for the 30 most frequent words
in DanSPO and DanWRI. From here on, we adopt the notation #n for the word
type with rank number n (in a specified corpus). Notice the large difference
between the leftmost columns in the graph: #1 of DanSPO has 74,159 occurrences while #1 of DanWRI has only 44,981. In general, the top ranked types are seen to cover larger parts of DanSPO than of DanWRI. #1–#10 cover 28.5% of DanSPO, but only 20.8% of DanWRI. Put in another way, it takes only 30 types to cover 50% of the DanSPO text mass, while it takes 154 types in DanWRI. In total, DanWRI has 104,968 different types while DanSPO has only 35,112, or one third. The same words are thus reused to a much larger extent in speech than in writing. Table 2 shows the type-token ratio for selected frequency ranges.

[Figure 5: bar chart. Horizontal axis: Rank (#1 to #30). Vertical axis: Count (0 to 80,000). Series: DanSPO, DanWRI.]
Figure 5. Danish frequency distribution. The graph shows the number of occurrences of the 30 most frequent word types in DanSPO and DanWRI respectively.

Table 2. Type-token ratio for DanSPO and DanWRI in various frequency ranges, shown in absolute and relative measures (Accumulated count / Coverage in %)

| Range | DanSPO count | DanSPO coverage | DanWRI count | DanWRI coverage |
|---|---|---|---|---|
| #1–10 | 380,599 | 28.5% | 277,161 | 20.8% |
| #1–20 | 549,283 | 41.1% | 412,762 | 30.9% |
| #1–30 | 671,223 | 50.3% | 473,882 | 35.5% |
| #1–50 | 806,525 | 60.4% | 536,450 | 40.1% |
| #1–100 | 940,834 | 70.5% | 618,383 | 46.3% |
| #1–200 | 1,046,036 | 78.4% | 696,591 | 52.2% |
| #1–500 | 1,144,585 | 85.7% | 797,258 | 59.7% |
| #1–1000 | 1,197,630 | 89.7% | 876,435 | 65.6% |
| #1–10000 | 1,302,670 | 97.6% | 1,144,510 | 85.7% |
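The accumulated counts and coverage percentages of Table 2, and the number of types needed to reach a given share of the text mass, can be computed from any tokenized corpus along the following lines. This is a minimal sketch: the function names `accumulated_coverage` and `types_needed` are hypothetical, the corpus is assumed to be a non-empty list of word tokens, and the toy sentence merely stands in for a real corpus.

```python
from collections import Counter

def accumulated_coverage(tokens, top_n):
    """Accumulated count and share of all tokens for the top_n most frequent types."""
    counts = Counter(tokens)
    covered = sum(c for _, c in counts.most_common(top_n))
    return covered, covered / len(tokens)

def types_needed(tokens, share):
    """Smallest number of top-ranked types whose accumulated count reaches `share`."""
    counts = Counter(tokens)
    target = share * len(tokens)
    running = 0
    for rank, (_, c) in enumerate(counts.most_common(), start=1):
        running += c
        if running >= target:
            return rank
    return len(counts)

toy = "ja det er det ja det og jeg er ja".split()   # 10 tokens, 5 types
print(accumulated_coverage(toy, 2))   # the two top types cover 6 of 10 tokens
print(types_needed(toy, 0.5))         # two types are enough for half the tokens
```

As a check on the table itself, 380,599 accumulated tokens out of the 1,335,247 tokens of DanSPO is 28.5%.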
5.2. Word type ranking
Even if the distributional patterns of DanSPO and DanWRI do differ substantially, of course their preferred types could still be the same. Is this the case?
In Table 3 below we present the 10 most frequent word types in DanSPO and
DanWRI, respectively.

Table 3. Frequency lists: 10 most frequent word types in DanSPO and DanWRI

| Rank | DanSPO type | Gloss | Count | DanWRI type | Gloss | Count |
|---|---|---|---|---|---|---|
| #1 | **det** | it, this, ... | 74,159 | **i** | in | 44,981 |
| #2 | ja | yes | 47,127 | **og** | and | 39,145 |
| #3 | **og** | and | 41,538 | at | to_INF, ... | 34,560 |
| #4 | jeg | I | 39,371 | **er** | BE_PRES | 26,913 |
| #5 | **er** | BE_PRES | 38,317 | **det** | it, this, ... | 26,644 |
| #6 | så | then, so, ... | 36,193 | en | a, one | 23,145 |
| #7 | der | there, it, ... | 32,305 | til | to_PREP, ... | 21,615 |
| #8 | ikke | not | 24,869 | på | on, at | 20,972 |
| #9 | var | BE_PAST | 23,467 | af | of, off, ... | 20,364 |
| #10 | **i** | in, ... | 23,341 | for | for, ... | 18,822 |

Types occurring in both subtables (DanSPO and DanWRI) are in bold typeface. Only minimal English translations are given in the table, cf. Appendix for fuller translations.

Only four types appear in both top-10 lists, namely “det”, “og”, “er”, “i”. Of these four, only “og” (and) covers similar parts of DanSPO and DanWRI while “det” (it/this/that/the) is almost three times as frequent in DanSPO as in DanWRI. Also the frequencies of “er” (BE_PRES) and “i” (in) differ markedly.
Let us pick out the ten most frequent types in DanSPO for a closer study. Each of them does occur in DanWRI, but most are not as frequent. How frequent is illustrated in Figure 6a below showing the relative coverage of DanSPO’s #1–10 in DanWRI. On the logarithmic axis, value “1” means equal coverage in DanSPO and DanWRI, “0.1” means 10% coverage in DanWRI relative to DanSPO, etc. Figure 6b, analogously, shows the coverage of DanWRI’s #1–#10 in DanSPO.

[Figure 6: two bar charts on a logarithmic axis (0.1 to 10), one bar per rank #1 to #10. Left panel (6a): Top 10 DanSPO types, coverage in DanWRI. Right panel (6b): Top 10 DanWRI types, coverage in DanSPO.]
Figure 6. Relative coverage of types #1–10 in DanSPO (Figure 6a, left) and DanWRI (6b, right). Types appearing in both tables are in bold typeface. Eng. glosses: see Table 3.

Examples: From Figure 6a we learn that “der” and “ikke” are only half as frequent in DanWRI as in DanSPO (the exact values are 0.53 and 0.49, respectively). In Figure 6b we see that “til” and “for” are similarly weak in DanSPO (0.49 and 0.48). In both panels the only short bar is that of “og”, being the only top ranked type of equal coverage. In Figure 6a, “ja” is extreme – actually extending over the left edge of the graph by more than one order of magnitude (“ja”, yes, being 128 times less frequent in DanWRI than in DanSPO).
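For the four types that occur in both top-10 lists, the relative coverage can be recomputed from the counts in Table 3 and the corpus sizes in Table 1. The sketch below does this; the names `SIZE`, `TOP10` and `coverage` are introduced for illustration only, and coverage is taken to mean a type's count divided by the corpus size.

```python
# Corpus sizes (Table 1) and top-10 counts (Table 3), as printed in the paper.
SIZE = {"DanSPO": 1_335_247, "DanWRI": 1_334_944}
TOP10 = {
    "DanSPO": {"det": 74159, "ja": 47127, "og": 41538, "jeg": 39371, "er": 38317,
               "så": 36193, "der": 32305, "ikke": 24869, "var": 23467, "i": 23341},
    "DanWRI": {"i": 44981, "og": 39145, "at": 34560, "er": 26913, "det": 26644,
               "en": 23145, "til": 21615, "på": 20972, "af": 20364, "for": 18822},
}

def coverage(corpus, word_type):
    """Share of the corpus taken up by one word type."""
    return TOP10[corpus][word_type] / SIZE[corpus]

# Types present in both top-10 lists.
shared = sorted(set(TOP10["DanSPO"]) & set(TOP10["DanWRI"]))

# Coverage in DanWRI relative to DanSPO (the quantity plotted in Figure 6a).
relative = {t: coverage("DanWRI", t) / coverage("DanSPO", t) for t in shared}

for t in shared:
    print(f"{t}: {relative[t]:.2f}")
```

The four shared types come out at roughly 0.36 for “det”, 0.70 for “er”, 1.93 for “i” and 0.94 for “og”, in line with the remark that “det” is almost three times as frequent in DanSPO while “og” has nearly equal coverage. Values for types outside the other corpus's top-10 list (such as “der” or “ja”) need the full frequency lists, which are not reproduced in the tables.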
In general we find a substantial difference in coverage for almost all top
ranked types, in about half of the cases by a factor 2 or more (corresponding to types that are more than twice as frequent in one corpus as in the other).
This pattern is repeated in the range #11–#20 where we also find but a single
type of similar coverage (#14 “de”, they, covering just 8% more of DanSPO than
DanWRI) while all others differ widely (half of them by a factor 2 or more).
The two corpora thus clearly diverge concerning word type ranking (the
divergence is even more pronounced for low frequency word types as we will
see shortly), but we still need to show that the discrepancy is related to the
mode of communication rather than e.g. genre or topic. We therefore introduce
three new written corpora in various genres: daily newspapers (referred to as
W-1), magazines (W-2), and journals (W-3) (from Maegaard 1975). Compare
now Figure 7 with Figure 8 below. Figure 7 repeats Figure 6b expanding the
data series up to #20 while leaving out some details. Figure 8 presents the same
20 word types showing their coverage in W-1, W-2, and W-3.
Unsurprisingly, the choice of written genre (Figure 8) does have some impact on the word type selection, but the disagreements among the written corpora are generally much smaller than in Figure 7. Also the general picture is far
less chaotic. Among the new corpora, W-2 (“magazines”) diverges most from
DanWRI, yet no type in this corpus is over-selected by a factor greater than 1.5
(or under-selected by less than 0.66). This is clearly in contrast to the DanSPO
versus DanWRI case where we found half of the types being over-selected by a
factor 2+ (or under-selected by 0.5).
[Figure 7: bar chart; axis "DanSPO coverage" on a logarithmic scale (0.1, 1, 10), one bar per DanWRI rank #1–#20.]
Figure 7. Coverage of DanWRI’s word types #1–20 in DanSPO
[Figure 8: three bar-chart panels, W-1, W-2 and W-3, each on a logarithmic coverage axis (0.1, 1, 10), one bar per DanWRI rank #1–#20.]
Figure 8. Coverage of DanWRI’s word types #1–20 in newspapers (W-1), magazines
(W-2) and journals (W-3)
The next question is whether this picture is repeated for the infrequent
types. To answer this question we adopt a formula for quantifying the deviation
in word type preference in two corpora X and Y.4

Deviation in Word Type Preference (DWTP)

$$\mathrm{DWTP}(X, Y, \#a, \#b) = \frac{\sum_{r=a}^{b} \left| \log \frac{\mathrm{Freq}(X,\, \mathrm{Type}^{X}_{r})}{\mathrm{Freq}(Y,\, \mathrm{Type}^{X}_{r})} \right|}{b - a + 1}$$

The logarithm is natural, and each log ratio enters by its magnitude, so that over- and under-selection do not cancel.
Function DWTP(X, Y, #a, #b) measures the mean deviation of types #a–#b
with regard to coverage in X and Y. DWTP is thus a function of the types in
range #a–#b of X. $\mathrm{Type}^{X'}_{r'}$ is the type ranked r′ in (the frequency list of) corpus
X′. Freq(X′, T′) is the frequency of type T′ in X′. Value 0 corresponds to total
agreement (each type in #a–#b covers equal parts in X and Y). Other values
represent disagreement – more so, the larger the value. Examples:
DWTP(DanSPO, DanWRI, 1, 10) = 1.135
DWTP(DanWRI, DanSPO, 1, 10) = 0.599
These values correspond to Figures 6a and 6b above, as DWTP values are equivalent to the mean length of the bars. Notice that, due to the asymmetry of the
formula (range #a–#b referring to X rather than Y), DWTP(X′, Y′, a′, b′) is not
in general equal to DWTP(Y′, X′, a′, b′). The charts in Figures 7 and 8 compute as:
DWTP(DanWRI, DanSPO, 1, 20) = 0.554
DWTP(DanWRI, W-1, 1, 20) = 0.147
DWTP(DanWRI, W-2, 1, 20) = 0.183
DWTP(DanWRI, W-3, 1, 20) = 0.108
For DWTP values <0.2, the deviation is less than 22% (insignificant). Value
0.69 (= log 2) means a deviation of a factor 2; value 1.1 is factor 3, and value
1.61 is factor 5.

Table 4. Deviation in word type preference for assorted type ranges (shown as DWTP
values)

Range          (DanSPO, DanWRI)   (DanWRI, DanSPO)
#1–10          1.135              0.599
#11–20         1.453              0.509
#21–30         1.211              0.920
#31–50         1.014              0.553
#51–100        0.825              0.837
#101–200       0.932              1.039
#201–500       1.108              1.437
#501–1000      1.036              1.711
#1001–10000    1.109              1.642
In Table 4, DanSPO and DanWRI are compared by lexical selection for
various type ranges. As seen, disagreement is generally stronger in the low end
of the frequency scale. In particular, the DanWRI types ranked #201+ (mainly
content words) have markedly different frequencies than have the same types
in DanSPO (differing by more than a factor 4 on average).
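The DWTP measure and its reading as a frequency factor can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `dwtp` and `as_factor`, the alphabetical tie-breaking, the choice to average over the types actually compared, and the toy frequencies are all hypothetical.

```python
import math

def dwtp(freq_x, freq_y, a, b):
    """Mean absolute log-ratio of frequencies for the types ranked a..b in corpus X."""
    # Rank the types of X by descending frequency; ties are broken
    # alphabetically, which is an assumption of this sketch.
    ranked = sorted(freq_x, key=lambda t: (-freq_x[t], t))
    selected = ranked[a - 1:b]
    # Types that do not occur in Y are skipped to avoid log(0).
    deviations = [
        abs(math.log(freq_x[t] / freq_y[t]))
        for t in selected
        if freq_y.get(t, 0) > 0
    ]
    # Assumption: the mean is taken over the types actually compared.
    return sum(deviations) / len(deviations) if deviations else 0.0

def as_factor(dwtp_value):
    """Translate a DWTP value into the corresponding frequency factor."""
    return math.exp(dwtp_value)

# Toy relative frequencies (hypothetical, not taken from the corpora).
corpus_x = {"a": 400, "b": 200, "c": 100}
corpus_y = {"a": 200, "b": 200, "c": 400}
print(dwtp(corpus_x, corpus_y, 1, 3))  # mean over the three types of X
print(dwtp(corpus_y, corpus_x, 1, 1))  # top type of Y only: the measure is asymmetric
print(as_factor(1.437))                # Table 4, (DanWRI, DanSPO), range #201-500
```

The inputs are assumed to be relative frequencies (for instance parts-per-million), so that corpora of different sizes can be compared directly.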
We conclude that spoken and written Danish (as represented in our reference corpora) prefer different types to a large extent. While this holds true
for all type ranges, the disagreement becomes extreme in the lower part of the
DanWRI list (#201+).
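Since DWTP is a mean over ranks, values for adjacent ranges should combine as a length-weighted mean, at least when no types are dropped because of zero counts. The following worked check is our own illustration of that property, using the (DanWRI, DanSPO) values for #1–10 and #11–20 in Table 4; the intermediate rank symbol $c$ is introduced here only for the derivation.

```latex
\[
\mathrm{DWTP}(X, Y, \#a, \#c)
  = \frac{(b-a+1)\,\mathrm{DWTP}(X, Y, \#a, \#b) + (c-b)\,\mathrm{DWTP}(X, Y, \#(b+1), \#c)}{c-a+1},
  \qquad a \le b < c
\]
\[
\mathrm{DWTP}(\mathrm{DanWRI}, \mathrm{DanSPO}, 1, 20)
  = \frac{10 \cdot 0.599 + 10 \cdot 0.509}{20} = 0.554
\]
```

The result agrees with the value 0.554 quoted above for the first 20 DanWRI types.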
5.3 Grammatical observations
We now add a grammatical dimension to our investigations. Using Eric Brill’s
algorithm (Brill 1994), we annotate DanSPO and DanWRI with part of speech
tags, employing the Danish PAROLE tagset (Bilgram & Keson 1998; Henrichsen 2002a). This tagset contains about 150 tags distributed over 10 major parts
of speech: noun, verb, adjective, pronoun, conjunction, preposition, adverb,
interjection, ‘unique’ (for grammatical particles etc.), and ‘residual’ (a rarely
used category for www-addresses, smileys, unidentified tokens etc).

Table 5. Personal pronouns

Type   Gloss      DanSPO rank   DanSPO count   DanWRI rank   DanWRI count   Weighted count W-1 / W-2 / W-3
jeg    I          #4            39,371         #27           5,638          4,800 / 11,668 / 2,787
du     you (SG)   #18           15,553         #197          557            1,166 / 3,610 / 342
vi     we         #22           13,326         #28           5,498          5,415 / 5,810 / 4,790
han    he         #23           13,314         #23           7,122          5,722 / 7,348 / 2,115
hun    she        #40           6,609          #59           2,073          1,653 / 5,036 / 422

W-1,2,3 figures are weighted for direct comparison with DanSPO/DanWRI figures

Table 6. Particles with special discourse functions

Type         Gloss             DanSPO rank   DanSPO count   DanWRI rank   DanWRI count   Weighted count W-1 / W-2 / W-3
så           so, then, ...     #6            36,193         #24           6,351          4,909 / 8,448 / 4,614
der          there, ...        #7            32,305         #12           17,277         15,213 / 13,953 / 17,606
ikke / ik’   not / y’know      #8 / #15      41,968         #17           12,257         9,580 / 10,856 / 9,217
altså        so, well, ...     #20           14,239         #234          453            340 / 486 / 561
sådan        thus, like, ...   #26           12,837         #152          754            590 / 988 / 758

W-1,2,3 figures are weighted for direct comparison with DanSPO/DanWRI figures
The POS-tagged versions of DanSPO and DanWRI allow us to pose new
questions: Which parts of speech dominate in written Danish, and which in
speech? Is there any agreement?
Consider first some examples: the cases of personal pronouns, prepositions, determiners, and ‘discourse particles’. The latter term refers to a loose collection of conjunctions and adverbials etc. acting as modifiers (“altså”, “så”, “sådan”, “der”) or as feedback triggers (“ik’”). They are often found in utterance final
position, and often have a prolonged vowel/sonorant, e.g. “altså∼”, “sån∼”.
As seen, pronouns and discourse particles are in general far more frequent
in DanSPO than in DanWRI. There is, however, considerable variation within
the categories: while “vi”, “han”, and “hun” are 2–3 times more frequent in
DanSPO, the corresponding figures for “jeg” and “du” are 7 and 28, respectively. So, the 1st and 2nd person singular pronouns are extremely common in informal conversation, while
comparatively rare in the daily paper.
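The ratios quoted above can be recomputed from the DanSPO and DanWRI counts in Table 5. The sketch below is only an illustration; it assumes that the two count columns are directly comparable (as the weighting note under the table suggests), and the names `table5` and `spoken_to_written_ratio` are hypothetical.

```python
# (DanSPO count, DanWRI count) for each pronoun, copied from Table 5.
table5 = {
    "jeg": (39371, 5638),
    "du": (15553, 557),
    "vi": (13326, 5498),
    "han": (13314, 7122),
    "hun": (6609, 2073),
}

def spoken_to_written_ratio(table):
    """How many times more frequent each type is in speech than in writing."""
    return {word: spoken / written for word, (spoken, written) in table.items()}

ratios = spoken_to_written_ratio(table5)
for word, ratio in ratios.items():
    print(f"{word}: {ratio:.1f}")
```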
Table 7. Determiners

Type   Gloss                 DanSPO rank   DanSPO count   DanWRI rank   DanWRI count   Weighted count W-1 / W-2 / W-3
en     a (UTT), ...          #19           15,406         #6            23,145         20,736 / 21,691 / 22,214
den    the (SG+UTR), ...     #29           10,590         #11           17,509         15,392 / 15,326 / 17,067
et     a (NEU), ...          #41           6,456          #18           10,799         9,394 / 8,961 / 10,173

W-1,2,3 count figures are weighted for direct comparison with DanSPO/DanWRI figures

Table 8. Prepositions

Type   Gloss             DanSPO rank   DanSPO count   DanWRI rank   DanWRI count   Weighted count W-1 / W-2 / W-3
i      in, ...           #10           23,341         #1            44,981         49,508 / 35,116 / 41,716
til    to (PREP), ...    #28           10,628         #7            21,615         24,453 / 19,374 / 21,397
på     on, ...           #21           13,343         #8            20,972         18,699 / 17,387 / 17,430
af     of, off, ...      #35           7,511          #9            20,364         20,025 / 15,689 / 27,779
for    for, ...          #34           9,099          #10           18,822         17,974 / 14,130 / 19,731
med    with, by, ...     #27           11,384         #13           16,615         17,910 / 17,814 / 16,688

W-1,2,3 figures are weighted for direct comparison with DanSPO/DanWRI figures

Determiners and prepositions, in contrast, are favored in DanWRI (see
Tables 7 and 8).
“I” (in) is the favorite preposition of both DanSPO and DanWRI (by a safe
margin). Otherwise there is little or no agreement among the most frequent
prepositions:
– DanWRI: til > på > af > for > med
– DanSPO: på > med > til > for > af
DanSPO and DanWRI tokens are distributed over parts of speech as follows:5
DanWRI: Noun-Verb-Prep-Pro-Adj-Adv-Conj-Unique-Residual-Interjec
DanSPO: Pro-Verb-Adv-Noun-Conj-Prep-Interjec-Adj-Unique-Residual
In each row the order of the elements reflects the absolute numbers of tokens in the corresponding categories. DanWRI, hence, has more nouns than
verbs, more verbs than prepositions, etc. Interjections is the smallest category
in DanWRI.
Categories in bold are larger in relative measures, i.e. much larger in one
corpus than in the other. By way of example, interjections are not the largest
category of DanSPO, yet much larger in DanSPO than in DanWRI (7th in the
DanSPO sequence, 10th in DanWRI).

Table 9. DanSPO’s favored categories

PAROLE tag   POS                   Examples            Count DanSPO   Count DanWRI
RGU          adverb                så, sådan, altså    200,151        87,813
I=           interjection          ja, nej, nå, mm     88,115         498
PP;          personal pronoun      jeg, han, jeres     192,415        57,899
PT;          interrogative pron.   hvem, hvad          7,404          2,458

Table 10. DanWRI’s favored categories

PAROLE tag   POS            Examples         Count DanSPO   Count DanWRI
SP           preposition    under, i         89,870         178,696
NC;          common noun    dag, pigernes    123,404        296,005
AN;          adjective      stort, bedre     77,278         128,737
NP;          proper name    Bo, Norge        18,875         92,995

In conclusion, written Danish prefers nouns, prepositions, and adjectives,
while spoken Danish prefers pronouns, adverbs, and interjections (including
all sorts of particles used as attention signals, feedback, response elicitors etc).
5.4 Concluding remarks on the Danish data
We have used statistical measures to compare the verbal material of corpora
DanSPO and DanWRI of spoken and written Danish. We have found substantial differences between the two, not only in frequency distribution, but in word
type selection and in categorical preference as well. Each of these three dimensions is independent of the two others, in the sense that large disagreement
in any one dimension may well co-occur with near-agreement in the other
two. It is therefore interesting to observe that DanSPO and DanWRI diverge
substantially in each dimension.
In Section 7 – after having presented the Swedish data – we follow up on
these preliminary observations.
6. The Swedish case
We now turn to the next question: How does Swedish relate to Danish? Where
are the most significant similarities – and the most pronounced differences?
Are the conclusions drawn in the previous section specific for Danish, or do
they hold for Swedish too?
For ease of comparison we present the Swedish and Danish figures side
by side in most of the tables below, repeating some Danish data from the
previous sections.
6.1 Frequency distribution
We first compare the frequency distribution of spoken and written Swedish in
Figure 9, and compare it to the corresponding Danish graph, repeated below
as Figure 10.
[Figure 9: line chart of Frequency (ppm, 0–70,000) against Rank (#1–#28) for SweSPO and SweWRI.]
Figure 9. Swedish frequency distribution, 30 top-ranked types (frequencies shown in
parts-per-million).

[Figure 10: line chart of Frequency (ppm, 0–70,000) against Rank (#1–#28) for DanSPO and DanWRI.]
Figure 10. Danish frequency distributions, 30 top-ranked types. Same data as in Figure
5, rendered here in ppm.

As Figures 9 and 10 show, the frequency distributions for spoken and
written Swedish closely match those of spoken and written Danish. This
is valid for all frequency ranges, as seen in Table 11.

Table 11. Accumulated frequencies

Rank         SweSPO   DanSPO   SweWRI   DanWRI
... #10      28.8%    28.5%    19.8%    20.8%
... #20      41.7%    41.1%    28.8%    30.9%
... #30      50.9%    50.3%    32.8%    35.5%
... #50      60.4%    60.4%    37.1%    40.1%
... #100     71.0%    70.5%    43.5%    46.3%
... #200     78.4%    78.4%    49.6%    52.2%
... #500     86.2%    85.7%    58.1%    59.7%
... #1000    90.7%    89.7%    64.6%    65.6%

The remarkably close match in columns SweSPO and DanSPO is not quite
matched by the written corpora. At rank 100, the accumulated frequency for
DanWRI is about 3 percentage points higher than for SweWRI (46.3% versus 43.5%). This difference may be due to
factors beyond our control. There may for instance be minor differences in the
word token definitions applied in corpus Berlingske-99 (DanWRI) and corpus
Göteborgsposten (SweWRI), or the two newspaper corpora may not consist
of exactly the same text genres. In any case, the difference is fairly small and
certainly insignificant in comparison with the spoken-written discrepancy.
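Accumulated frequencies of this kind are running sums over a frequency list. As a cross-check, the sketch below accumulates the ten parts-per-million counts that Table 12 (in the next subsection) lists for each corpus and converts them to percentages; the result can be compared with the #10 row of Table 11. The helper name `accumulated_percent` is our own.

```python
from itertools import accumulate

# Top-10 counts in parts-per-million, copied from Table 12.
top10_ppm = {
    "SweSPO": [69622, 33641, 29534, 28093, 27270, 22811, 20190, 19964, 18318, 18058],
    "DanSPO": [55552, 35303, 31116, 29493, 28703, 27112, 24200, 18629, 17579, 17479],
    "SweWRI": [34887, 30030, 21633, 17783, 17710, 17542, 17519, 14494, 13398, 13182],
    "DanWRI": [33687, 29317, 25883, 20158, 19954, 17334, 16188, 15706, 15251, 14096],
}

def accumulated_percent(ppm_counts):
    """Running coverage in percent after each rank (10,000 ppm = 1%)."""
    return [round(total / 10_000, 1) for total in accumulate(ppm_counts)]

coverage_at_10 = {name: accumulated_percent(counts)[-1] for name, counts in top10_ppm.items()}
print(coverage_at_10)
```

The gap between the spoken and the written corpora is already about eight to nine percentage points after ten types.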
6.2 Word type ranking
We now take a comparative look at the word type ranking in all four corpora.
Table 12 provides a comparison of the 10 most highly ranked words in the two
languages.
Table 12. Aligned Swedish and Danish frequency lists

Rank   SweSPO (type / count)    DanSPO (type / count)    SweWRI (type / count)    DanWRI (type / count)
#1     det [∼det]    69,622     det [+det]     55,552    i [∼i]        34,887     i [+i]        33,687
#2     är [∼er]      33,641     ja [+ja]       35,303    och [∼og]     30,030     og [+och]     29,317
#3     och [∼og]     29,534     og [+och]      31,116    att [∼at]     21,633     at [+att]     25,883
#4     ja [∼ja]      28,093     jeg [+jag]     29,493    en [∼en]      17,783     er [+är]      20,158
#5     att [∼at]     27,270     er [+är]       28,703    det [∼det]    17,710     det [+det]    19,954
#6     jag [∼jeg]    22,811     så [+så]       27,112    på [∼på]      17,542     en [+en]      17,334
#7     man [∼man]    20,190     der [–]        24,200    som [∼som]    17,519     til [+till]   16,188
#8     så [∼så]      19,964     ikke [+inte]   18,629    är [∼er]      14,494     på [+på]      15,706
#9     som [∼som]    18,318     var [+var]     17,579    av [∼af]      13,398     af [+av]      15,251
#10    inte [∼ikke]  18,058     i [+i]         17,479    med [∼med]    13,182     for [+för]    14,096

Count figures are in parts-per-million. Lexemes with +/∼ are nearest Swe./Dan. equivalents
according to Palmgren et al. (2001) and Molde (2000). Cf. Appendix for Eng. translations.

Table 13. Word type ranking (DWTP values): Swedish speech vs. writing

Range         (SweSPO, SweWRI)   (SweWRI, SweSPO)
#1–#10        1.313              0.571
#11–#20       1.251              0.799
#21–#30       1.289              1.064
#31–#50       1.150              0.988
#51–#100      1.187              0.819
#101–#200     1.038              1.005
#201–#500     0.994              1.152
#501–#1000    1.468              1.462

Types ranked #1001+ are not considered in this table due to the small size of corpus gbg.
The top-10 lists of spoken Danish and spoken Swedish share seven types
(i.e. nearest-equivalent translations according to two leading dictionaries). The
three remaining Swedish types are att, man and som, the Danish residual being der (which lacks a Swedish equivalent), var and i. For the written language,
the overlap is also considerable with eight out of 10 equivalents, the Swedish
residual being som and med, the Danish residual til and for. In contrast, the
SweSPO and SweWRI top-10 lists share five types only, and the DanSPO and
DanWRI lists share only four. We may also note that the rank order is more
similar between the two spoken language variants than it is between the spoken and written variant of the same language. The same holds for the two
written variants.
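The overlap counts in this paragraph can be reproduced from Table 12 by mapping each Swedish type to its bracketed Danish equivalent and intersecting the lists. The sketch below is an illustration only; the mapping dictionary is transcribed from the brackets in Table 12, and the function name `shared` is hypothetical.

```python
# Top-10 lists copied from Table 12.
swe_spo = ["det", "är", "och", "ja", "att", "jag", "man", "så", "som", "inte"]
dan_spo = ["det", "ja", "og", "jeg", "er", "så", "der", "ikke", "var", "i"]
swe_wri = ["i", "och", "att", "en", "det", "på", "som", "är", "av", "med"]
dan_wri = ["i", "og", "at", "er", "det", "en", "til", "på", "af", "for"]

# Nearest Danish equivalents of the Swedish types, as bracketed in Table 12.
swe_to_dan = {
    "det": "det", "är": "er", "och": "og", "ja": "ja", "att": "at",
    "jag": "jeg", "man": "man", "så": "så", "som": "som", "inte": "ikke",
    "i": "i", "en": "en", "på": "på", "av": "af", "med": "med",
}

def shared(list_a, list_b, mapping=None):
    """Types of list_a (optionally translated via mapping) that also occur in list_b."""
    translated = {mapping.get(t, t) if mapping else t for t in list_a}
    return translated & set(list_b)

print(len(shared(swe_spo, dan_spo, swe_to_dan)))  # spoken Swedish vs. spoken Danish
print(len(shared(swe_wri, dan_wri, swe_to_dan)))  # written Swedish vs. written Danish
print(len(shared(swe_spo, swe_wri)))              # Swedish speech vs. writing
print(len(shared(dan_spo, dan_wri)))              # Danish speech vs. writing
```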
Table 14. Word type ranking (DWTP values): Danish versus Swedish

Range        (DanSPO, SweSPO)   (DanWRI, SweWRI)
#1–#10       0.300              0.143
#11–#20      0.316              0.169
#21–#30      0.401              0.158
#31–#50      0.616              0.291
#51–#100     0.646              0.332
#101–#200    0.694              0.373
Table 13 shows that written Swedish and spoken Swedish are quite distinct
in their word type preferences – in parallel with the Danish case as seen in
Table 4 in Section 5.2.
Table 14 is based on a word-to-word Dan-Swe translation list compiled
from a standard dictionary (Svensk-Dansk Ordbog 2001) and carefully examined by three linguists (not including the authors). In preparing the data for
Table 14, all content words were excluded from the calculation, since they are presumably typical of the activity type rather than a more general structural feature
of the language. The common noun “naturen” (nature), for example, is very frequent in
GSLC while absent in BySoc, whereas the proper noun “Nyboder” (a suburb of
Copenhagen) is frequent in BySoc, but absent in GSLC.
The table shows that the distinct preferences noted in Table 4 and Table 13
are indeed upheld, so that (i) spoken Danish and Swedish are similar (especially concerning the top-ranked types), and (ii) written Danish and Swedish
are also similar.
6.3 Grammatical observations
Spoken and written tokens are distributed over the major parts of speech as
follows:
DanWRI: Noun-Verb-Prep-Pro-Adj-Adv-Conj
DanSPO: Pro-Verb-Adv-Noun-Conj-Prep-Adj
SweWRI: Noun-Verb-Prep-Pro-Adv-Adj-Conj
SweSPO: Pro-Verb-Adv-Noun-Conj-Prep-Adj
The tagsets employed in the tagged versions of the four reference corpora are
not identical, but at least compatible. Certain specific categories had to be
omitted from the investigation because they are absent in at least one of the corpora. The
ignored categories are: Feedback, Own Communication Management, Unique,
Residual, Interjections, Numerals. The SweSPO category list is compiled from
Allwood (1999). The exclusion of the categories feedback, own communication management and interjection has the consequence that the apparent differences between spoken and written language are actually diminished, since
these three categories are all very much more common in spoken language
than in written language. If we disregard the caveat of Section 4.2 above, it is
striking that the distributions of the parts of speech are near-identical in spoken Danish and Swedish, and again in written Danish and Swedish. In contrast,
the distributions differ widely when holding constant the language rather than
the mode of communication, i.e. the similarity is much greater between spoken
Danish and spoken Swedish than it is between spoken and written Danish or
spoken and written Swedish.
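One simple way to put a number on how far two of these category orderings are apart is the Spearman footrule: the sum, over categories, of the difference in position. This measure is not used in the paper; it is introduced here purely as an illustration, and the names `orders` and `footrule` are hypothetical.

```python
# Part-of-speech orderings as listed above.
orders = {
    "DanWRI": "Noun-Verb-Prep-Pro-Adj-Adv-Conj".split("-"),
    "DanSPO": "Pro-Verb-Adv-Noun-Conj-Prep-Adj".split("-"),
    "SweWRI": "Noun-Verb-Prep-Pro-Adv-Adj-Conj".split("-"),
    "SweSPO": "Pro-Verb-Adv-Noun-Conj-Prep-Adj".split("-"),
}

def footrule(order_a, order_b):
    """Sum of absolute position differences between two orderings of the same items."""
    position_b = {category: i for i, category in enumerate(order_b)}
    return sum(abs(i - position_b[category]) for i, category in enumerate(order_a))

print(footrule(orders["DanSPO"], orders["SweSPO"]))  # same mode, different language
print(footrule(orders["DanWRI"], orders["SweWRI"]))  # same mode, different language
print(footrule(orders["DanWRI"], orders["DanSPO"]))  # same language, different mode
print(footrule(orders["SweWRI"], orders["SweSPO"]))  # same language, different mode
```

Under this illustrative measure the same-mode pairs are identical or differ by a single swap of neighbours, while the same-language pairs are far apart.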
7. Language versus mode of communication
Having aligned the Swedish and Danish figures, we are now in a position to
draw some overall conclusions concerning the relative importance of mode of
communication in comparison with national language. Tables 15 and 16 below illustrate the relations between the four reference corpora. In each cell is
represented the DWTP value for a certain combination of corpora.
Several conclusions can be read off these tables.

Table 15. Word type ranking (DWTP values, 10 top-ranked types)

Rank #1-#10           ... SPOKEN                  ... WRITTEN
SPOKEN versus ...     (DanSPO,SweSPO) = 0.30      (DanSPO,DanWRI) = 1.14
                      (SweSPO,DanSPO) = 0.29      (SweSPO,SweWRI) = 1.31
WRITTEN versus ...    (DanWRI,DanSPO) = 0.60      (DanWRI,SweWRI) = 0.14
                      (SweWRI,SweSPO) = 0.57      (SweWRI,DanWRI) = 0.11

Table 16. Word type ranking (DWTP values, 100 top-ranked types)

Rank #1-#100          ... SPOKEN                  ... WRITTEN
SPOKEN versus ...     (DanSPO,SweSPO) = 0.45      (DanSPO,DanWRI) = 0.98
                      (SweSPO,DanSPO) = 0.46      (SweSPO,SweWRI) = 1.14
WRITTEN versus ...    (DanWRI,DanSPO) = 0.74      (DanWRI,SweWRI) = 0.27
                      (SweWRI,SweSPO) = 0.85      (SweWRI,DanWRI) = 0.25
First, the choice of mode of communication is clearly more significant than
the choice of national language with respect to the distributional patterns discussed in this paper. Written Swedish is far more similar to written Danish
(DWTP = 0.11 in Table 15) than it is to spoken Swedish (0.57). Spoken Danish
and Swedish are much more similar (0.30) than spoken and written Danish
(1.14), and so forth.
Secondly, concerning lexical preferences, spoken language seems more idiosyncratic than written language meaning that the top ranked types of written
language are all moderately frequent in speech as well while a number of the
top ranked words of speech are extremely rare in writing. Compare values
DWTP(SweSPO,SweWRI,#1,#10)=1.31
DWTP(SweWRI,SweSPO,#1,#10)=0.57,
a highly significant difference recalling that DWTP-values are logarithmic.
Thirdly, the two main conclusions above persist when extending the range
under consideration from #1-#10 to #1-#100, but the distinctions become less
pronounced: large DWTP-values (corresponding to large discrepancies) tend
to decrease, while small values (close similarities) increase. In other words, it is
to a large degree the top-frequent types that account for the differences between
the spoken and written mode of communication. When including more types,
the main conclusions still hold, but less clearly so.
Finally, observe that the figures within each cell are equal within a small
margin, indicating that the transfer between languages (keeping the mode of
communication constant) is largely symmetrical: the difference between written and spoken Danish (DWTP = 0.60) closely matches the difference between
written and spoken Swedish (DWTP = 0.57), and so forth. This symmetry is
perhaps not surprising, yet encouraging since the opposite situation would blur
the otherwise quite clear conclusions that we have been able to draw.
Figure 11 below illustrates the data of Table 15. Each data point represents a
pair of corpora (C1, C2), the X-value corresponding to DWTP(C1,C2,#1,#10),
and the Y-value to DWTP(C2,C1,#1,#10). The geometrical distance to (0,0)
hence measures the disagreement in word type selection (larger distance meaning larger disagreement). In other words, points near to (0,0) represent corpora
with similar word type preferences, while data points far from (0,0) represent
corpora that disagree on which word types to prefer.
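The distances described here can be computed directly from the Table 15 values. The following sketch is illustrative; the dictionary `points` is transcribed from Table 15 and `distance_to_origin` is a hypothetical helper.

```python
import math

# (x, y) = (DWTP(C1, C2, #1, #10), DWTP(C2, C1, #1, #10)), copied from Table 15.
points = {
    ("DanSPO", "SweSPO"): (0.30, 0.29),
    ("DanWRI", "SweWRI"): (0.14, 0.11),
    ("DanWRI", "DanSPO"): (0.60, 1.14),
    ("SweWRI", "SweSPO"): (0.57, 1.31),
}

def distance_to_origin(point):
    """Euclidean distance from (0, 0): larger means stronger disagreement."""
    return math.hypot(point[0], point[1])

by_distance = sorted(points, key=lambda pair: distance_to_origin(points[pair]))
for pair in by_distance:
    print(pair, round(distance_to_origin(points[pair]), 2))
```

The two same-mode pairs come out closest to the origin, and the two same-language pairs lie roughly three times further away or more.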
[Figure 11: scatter plot with DWTP values on both axes (horizontal 0–0.8, vertical 0–1.4). Data points: SWEWRI vs. SWESPO, DANWRI vs. DANSPO, DANSPO vs. SWESPO, DANWRI vs. SWEWRI.]
Figure 11. Pairwise similarities mapped as DWTP-values

The perhaps surprising conclusion is that Danish speech and Swedish
speech are much more similar to each other in the dimensions investigated
here – frequency distribution, word type selection, and distribution of parts
of speech – than Danish speech is to Danish writing and Swedish speech is to
Swedish writing. There seem to be grounds for claiming that if spoken Danish
and written Danish are to be upheld as one language, certainly spoken Danish and spoken Swedish should be considered as one language as well – and
similarly for Swedish. Whether this conclusion holds in other dimensions of
language like syntax, semantics and pragmatics remains to be seen; for now we
leave this interesting question to further investigation. Our conjecture is that
the perceived difference between Danish and Swedish speech is mostly a matter
of minor lexical and phonological transformations.
Open word classes (proper nouns, common nouns, adjectives, content
verbs) are strong in written language while function words including pronouns
dominate in speech.6 Typical written language includes large numbers of different words, while spoken interaction to a much greater extent recycles the same
words. Each type in DanWRI thus has less than 13 occurrences on average,
while each DanSPO type has 38.
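The figure reported here is the mean number of tokens per type, i.e. corpus size divided by vocabulary size. A minimal sketch of the calculation follows; the token list is a toy example rather than corpus data, and `tokens_per_type` is a hypothetical name.

```python
from collections import Counter

def tokens_per_type(tokens):
    """Mean number of occurrences per distinct word type."""
    counts = Counter(tokens)
    if not counts:
        return 0.0  # empty input: no types to average over
    return len(tokens) / len(counts)

# Toy utterance (hypothetical): 6 tokens, 3 types.
sample = "det er det ja det er".split()
print(tokens_per_type(sample))
```

Note that this ratio grows with corpus size, so it is best compared between corpora of similar length.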
However important these general observations may be, a single type seems
to provide the biggest surprise: the strikingly high frequency of the multi-purpose
pronoun “det”. Ask a Dane or Swede which word is the most frequent in their
everyday vernacular, and chances are that they will suggest “og”/“och”, “i”,
“at”/“att”, “jeg”/“jag” or “ja” – hardly ever “det” (we tried this many times).
Even in the process of answering the question, they will inevitably use “det”
a dozen times – without noticing. Why is “det” so frequent, and yet so invisible?
. Biber et al. 1999
The Longman Grammar of Spoken and Written English (Biber et al. 1999) provides grounds for interesting comparisons with our findings. The book reports
on a large-scale corpus-based comparison of various styles of English speech
and writing (termed ‘registers’), including newspaper text (5.4 million words)
and transcriptions of conversations (3.9 million words). Fitting together pieces
of information scattered in various sections (2.3.5, 2.4.14., 4.1.2, 4.10.5, 14.3.3
et pass.), we arrive at this POS distribution:
English speech (conversations): Pro-Noun-Verb-Adv-Prep-Adj
English writing (news): Noun-Prep-Verb-Adj-Pro-Adv
Categories larger in one mode (“register”) than in the other are in bold typeface
(our category of conjunctions (Conj) is incompatible with the Longman POS
inventory). Comparing with our findings (see 6.3 above) we observe that – in
English as in Scandinavian – nouns, prepositions, and adjectives are typical of
writing, while pronouns and adverbs (and obviously interjections) are typical
of speech.
Why are pronouns, adverbs, and interjections typical of speech? What do
these three categories have in common? Certainly not their morphology. In so
far as these items have morphological features at all, pronouns are more like
nouns, adverbs like adjectives, and interjections like grammatical particles – all
categories typical of the written language.
Perhaps some of the reasons are the following: Spoken language draws on
contextually available information more than written language, where less context is available and the text has to be more explicit. One consequence of this is
that written language has to introduce and maintain reference by explicit use
of descriptive nouns and noun phrases, while spoken language, relying on context, can make do with pronouns. Interactive spoken language is also generally
more impulsive and reactive. This means that there is a greater need for and
use of interjections (including words for feedback and own communication
management) and attitudinal adverbs. In written language, on the other hand, there is, as already
mentioned, a greater need for contextual explicitness, which is often met by using longer descriptive noun phrases containing
prepositions, which bind the phrases together, and adjectives, which provide more explicit descriptive information. In spoken language, by contrast, there are often more
conjunctions, which help to spread content that may be more compressed in
written language over several short statements.
. Conclusions and implications for further research
In this paper, we have studied three dimensions of language based on word
frequencies: frequency distribution, word type ranking and the distribution of
parts of speech in spoken and written Danish and Swedish. We have found that
in all of the three dimensions spoken Danish and spoken Swedish are more
similar to each other than are spoken Danish to written Danish or spoken
Swedish to written Swedish.
At the moment, however, it is a little unclear what conclusions can be
drawn from these observations. A first conclusion might be that the differences between spoken and written language are the same in at least three important respects in Danish and Swedish. Given the compatibility with English
data discussed above, it is not unlikely that the same differences might be found
between the spoken and written modes of other languages, i.e.,
(i) Common words are reused more often in spoken language than in written
language and written language has a richer vocabulary in frequent use than
spoken language.
(ii) The discourse functions expressed by certain very frequent words could
represent a constant functional need in the spoken languages of a certain
language type. The picture for written language is less clear.
(iii) The discourse functions expressed by pronouns, adverbs, interjections
and conjunctions are more typical of spoken language than written language, while the discourse functions expressed by nouns, adjectives and
prepositions are more typical of written language.
A second conclusion might be that spoken Danish and spoken Swedish are
more closely related than spoken and written variants of Danish and Swedish
are to each other. The plausibility of this conclusion depends on how the properties we have studied are related to other properties that give a language its
identity. It also depends on whether the properties we have observed are general
and universal features of the difference between spoken and written language
variants rather than a feature of the particular relationships between spoken
and written Danish and Swedish.
We therefore believe our study should be extended in two ways: (i) We
should investigate the difference between spoken and written variants of other
languages than Swedish and Danish, in order to see if our results reoccur.
(ii) We should attempt to correlate our present findings concerning spoken
and written Swedish and Danish with other features of these languages, to see
if what we have found is part of a more general picture of the relationship
between the languages.7
. Acknowledgments
We wish to thank our two anonymous reviewers for their precise and highly
value-adding comments.
Notes
1. – at least since the decline of NST (Nordic Speech Technology) in Voss, Norway.
2. ‘Displacement’ is an intuitive term coined to denote the semantic distance between
Danish-Swedish synonyms (i.e. translations preferred by language users with a solid understanding of both languages). The term ‘displacement’ is thus used for informal presentation
only, not for data analysis.
3. This is also the stance taken in the extensive corpus-based Longman Grammar of Spoken
and Written English (Biber et al. 1999) where spoken language transcripts (conversations)
are compared directly with written sources like newspaper texts and academic prose.
4. To avoid illegal 0s, tokens appearing in X but not in Y are ignored in computing DWTP.
For simplicity we don’t use smoothing to level out the granularity effects of such zeroes; consequently, DWTP values may be too small for low frequency ranges (greatest discrepancies
being ignored).
5. Due to the methodological uncertainties concerning speech tagging (cf. 4.2 above) we
will not present the actual sizes of the categories.
6. Certain function categories are actually more common in writing: determiners, prepositions.
7. Some recent, affirmative results are reported in Henrichsen (2004).
References
Allwood, J. (1998). Some Frequency based Differences between Spoken and Written Swedish.
Proceedings of XVIth Scandinavian Conference of Linguistics (pp. 18–29). University
of Turku.
Allwood, J. (1999). Talspråksfrekvenser. Gothenburg Papers in Theoretical Linguistics S21,
Gothenburg University.
Allwood, J., Grönqvist, L., Ahlsén, E., & Gunnarsson, M. (2002a). Göteborgskorpusen
för Talspråk. In P. J. Henrichsen (Ed.), Korpuslingvistik (pp. 39–58). Copenhagen:
Akademisk Forlag.
Allwood, J., Henrichsen, P. J., Ahlsén, E., Grönqvist, L., & Gunnarsson, M. (2002b).
Transliteration Between Spoken Language Corpora – Moving between Danish BySoc
and Swedish GSLC. Gothenburg Papers in Theoretical Linguistics 86, Gothenburg
University.
Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge University
Press.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman Grammar of
Spoken and Written English. London: Longman.
Bilgram, T. & Keson, B. (1998). The Construction of a Tagged Danish Corpus. Proceedings
of NODALIDA-1998 (pp. 129–139). University of Copenhagen.
Brill, E. (1994). A Report of Recent Progress in Transformation-based Error-driven
Learning. Proceedings of the ARPA Workshop on Human Language Technology 1994,
Princeton, N. J. [Brill’s tagger is available at http://www.cs.jhu.edu/~brill]
Dansk-Svenska Ordbok. (2000). 3rd edition. Molde, B. (Ed.). Stockholm: Norstedts Förlag.
Debaisieux, J.-M., & Deulofeu, J. (2001). Grammatically Unacceptable Utterances are
Communicatively Accepted by Native Speakers; Why are They? Proceedings of DISS-01
(pp. 69–72), University of Edinburgh.
Gregersen, F., & Pedersen, I. L. (Eds.). (1991). The Copenhagen Study in Urban Sociolinguistics. Copenhagen: Reitzel, Vols. 1+2.
Henrichsen, P. J. (1998). Peeking Into The Danish Living Room – Internet access to a large
Danish speech corpus. Proceedings of NODALIDA-1998 (pp. 109–119). University of
Copenhagen.
Henrichsen, P. J. (2002a). Fyrre Kilometer Kryds og Bolle – metoder til grammatisk
opmærkning i største skala. In P. J. Henrichsen (Ed.), Korpuslingvistik (pp. 68–88).
Copenhagen: Akademisk Forlag.
Henrichsen, P. J. (2002b). Some Statistically based Differences Between Spoken and Written
Danish. Gothenburg Papers in Theoretical Linguistics 88, Gothenburg University.
Henrichsen, P. J. (2004). Siblings and Cousins – Statistical Methods for Spoken Language
Analysis. Acta Linguistica Hafniensia, 36, 7–33. Copenhagen: Reitzel.
Labov, W. (1984). Field methods of the Project on Linguistic Change and Variation. In J.
Baugh et al. (Eds.), Language in Use: Readings in Sociolinguistics (pp. 28–53). Englewood
Cliffs, NJ: Prentice Hall.
Leech, G., Rayson, P., & Wilson, A. (2001). Word Frequencies in Written and Spoken English.
London: Longman.
Maegaard, B. (1975). Hyppige ord i Danske Aviser, Ugeblade og Fagblade. Copenhagen:
Gyldendal.
McKelvie, D. (1998). The Syntax of Disfluency in Spontaneous Spoken Language.
Human Communications Research Centre, University of Edinburgh, Research Paper
HCRC/RP-95.
Nivre, J., Grönqvist, L., Gustafson, M., Lager, T., & Sofkova, S. (1996). Tagging Spoken
Language Using Written Language Statistics. 16th ICCL (at COLING-96) (pp. 1078–
1081). University of Copenhagen.
Nivre, J., & Grönqvist, L. (2001). Tagging a Corpus of Spoken Swedish. International Journal
of Corpus Linguistics, 6 (1), 47–78.
Svensk-Dansk Ordbog. (2001). 3rd edition. Palmgren, V., Munch-Petersen, V. P. & Hartmann, E. (Eds.).
Copenhagen: Gyldendal.
Zipf, G. K. (1936). The Psycho-biology of Language – Introduction to Dynamic Philology.
London: Routledge.
Authors’ addresses
Peter Juel Henrichsen
Center for Computational Modelling of Language
Dept. of Computational Linguistics
Copenhagen Business School
Bernhard Bangs Allé 17B
DK-2000 Copenhagen F
Denmark
E-mail: pjuel@id.cbs.dk
Jens Allwood
Dept. of Linguistics,
University of Gothenburg
Box 200
SE-40530 Göteborg
Sweden
E-mail: jens@ling.gu.se
Appendix – English translations
Approximate English translations of top-frequent Swedish and Danish types
Rank | SweSPO | DanSPO | SweWRI | DanWRI
#1 | det: it, this_PRO, that_PRO, the_DEF+NEU | det: = Swe. “det” | i: in_PREP, in_ADV | i: in_PREP, in_ADV
#2 | är: BE_PRES | ja: yes, yeah | och: and | og: and
#3 | och: and | og: and | att: cf. SweSPO | at: = Swe. “att”
#4 | ja: yes, yeah | jeg: I | en: a_UTRUM, one_NUM, one_PRO | er: BE_PRES
#5 | att: to_INF, that_SUBORD | er: BE_PRES | det: cf. SweSPO | det: = Swe. “det”
#6 | jag: I | så: = Swe. “så” | på: on, at | en: = Swe. “en”
#7 | man: one_GENERIC+NOM+SG, you_GENERIC+NOM+SG | der: there_PRO, there_ADV | som: that_SUBORD, who_SUBORD, whom_SUBORD, which_SUBORD, like_CONJ | til: to, till
#8 | så: so, that_SUBORD, then_COORD | ikke: not | är: cf. SweSPO | på: on, at
#9 | var: BE_PAST | var: BE_PAST | av: of, off, by | af: of, off, by
#10 | inte: not | i: in_PREP, in_ADV | med: with, by | for: for, too
Capitalized symbols refer to paradigms (e.g. BE_PRES = am/is/are)
Informatics and Pragmatism
Thanks to Martina Sophia Bach for Kundskabende Relationer (Inquiring Relations) and to Karoline ‘Bille’ Sahlertz for advice and comments along the way
Where is the life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
T.S. Eliot: The Rock, 1934
Where is the information we have lost in data?
Where is the data we have lost in bits?
Leif Bloch Rasmussen, 2020
Institute of Digitalization, CBS
The aim of this contribution is to give an account of the methods I have used, throughout my time in the university world, in action research in the fields of informatics, information systems design, and innovation/entrepreneurship. It is a story that begins with my PhD studies at DtH/DtU on management and control (leadership and organization) in companies and the possibilities of computer (edb) technology for supporting it. It ends with my present work on writing Af-Handling om Sanse(n)de Handletanker for Signifikant Bæredygtighed (roughly: a treatise on sensing and sensible action-thoughts for significant sustainability).
Since this is a story, the article can be read as a reflection on methods of the past together with some pictures of methods of the present. The aim, however, is to contribute to an ante-narrative, i.e. proposals for methods for the future.
1. The History
During my PhD studies I had become acquainted with, and inspired by, Charles West Churchman’s teleological systems thinking and Erich Jantsch’s self-organizing systems. (Jantsch was visiting professor at Driftsteknisk Institut, DtH, in 1975, where he ran a study circle and wrote Design for Evolution, Braziller, 1975.) We were four PhD students, writing on Technological Forecasting, Strategic Planning, Construction of Information Systems, and Technical Control in Industrial Companies, all at Driftsteknisk Institut, Danmarks tekniske Højskole (DtH), now Danmarks tekniske Universitet (DtU). Out of this came an offer to me to take part in building up research and teaching at Handelshøjskolen, HHK (now Copenhagen Business School, CBS), in what in 1977 was called edb (electronic data processing) and later IT. I was invited to help form a center across the existing departments: Informationsforskningsafdelingen (the Information Research Department). At the same time I obtained an associate professorship. We wanted to call it Center for Informatik, but that concept did not exist, we were told by the library and the management.
Our methodological foundation at the time, in the late 1970s, was embedded in a debate/struggle over task-/concept-oriented design, socio-technical design, and a trade-union political approach to the design of computer systems and new technology. The three approaches corresponded to and/or were based on the philosophies of science of positivism, hermeneutics/phenomenology, and Marxism/critical theory. My inspiration from Churchman (Design of Inquiring Systems - Basic Concepts of Systems and Organizations, Basic Books, New York, 1971) suggested, however, more than three methodological/philosophical approaches, since he worked with empiricism (Locke/Hume), rationalism (Leibniz, Descartes), idealism (Kant), dialectics (Hegel), teleological pragmatism (Edgar Arthur Singer) and, as a ‘counterweight’ to these, a- and anti-teleology. As I was part of the ‘student revolt’ of the 1970s at both DtH and HHK, it fell to me to take part in the attempts to establish project-oriented teaching and action research at Handelshøjskolen. For this, Churchman’s methodological breadth and interdisciplinarity were a help, but also a hindrance. A help, because they constantly opened new possibilities; a hindrance, because the debate/struggle often took the form of power struggles over grants and positions, in which the traditions of the economic and technical disciplines weighed more heavily than experiments. Churchman expressed it by saying that one should encourage one’s students to be ‘broad and comprehensive’, not ‘narrow and specific’. But all too often the reverse was the case, in research as well.
Churchman extended his systems thinking in 1979 with ‘The Systems Approach and Its Enemies’ (Basic Books, New York) and in 1982 with ‘Thought and Wisdom’ (The Gaither Lectures, Intersystems Publications, Seaside, California), in which he argued that aesthetics, ethics, politics and religion should be seen in interplay with the teleological systems thinking about inquiry and knowledge: Videndannelse (knowledge formation), as we called it at the time for want of the Swedish concept ‘kunnskapa’. Today I call it kundskabende systemer and kundskabende relationer (inquiring systems and inquiring relations).
My experience, however, was that the three original approaches, based primarily on a dispute over Marxism or capitalism as the foundation for the work in information research, did not allow for further nuances. My understanding of Churchman thus became ‘merely’ a guiding thread for myself and my own work. My action research was carried out in collaboration with the trade unions in FTF. LO was occupied with the NJMF project in Norway, the DEMOS project in Sweden and the DAIMI project in Denmark. A refuge was my participation in the UNESCO organization International Federation for Information Processing (IFIP) (with thanks to Börje Langefors, who was visiting professor with us in 1977-78 and encouraged us to take part in IFIP), where the same struggle nevertheless also manifested itself, among other places in the dispute between Technical Committee 8 (Design of Information Systems) and Technical Committee 9 (Computers and Society). The latter, of which I was a member, over the years formed special ‘Working Groups’ on, among other things, IT and work, ethics, developing countries and technology, the history of IT, women and technology, sustainable development, and IT and innovation. Again a challenge to methods and philosophies.
As an attempt to gather the threads of interdisciplinarity with a view to transdisciplinarity, I wrote about eight different approaches to what was eventually allowed to bear the name Informatik (informatics), in cooperation with Økonomistyring (management accounting and control), which was created by ‘renegades’ from traditional business economics and accounting. An overview of the eight philosophies (‘Weltanschauungen’), as I called them, is given in Appendix 1. They were, and still are, relevant to my work in research, teaching, administration and societal engagement.
The eight philosophies naturally require a willingness to dig deep into them. But I hope that my readers can enter into a mutual dialogue about them.
My experience, however, is that the closer I came (through action research) to the top of the hierarchy concerning co-determination and strategy for computer/information technology, the more the idea of philosophies and world views was shut down. Power was not to be challenged with methods and philosophies that reached further than positivism and phenomenology/hermeneutics, especially since Marxism, and to some extent critical theory, had fallen into discredit with the fall of the Wall and the ‘victory’ of capitalism. Students were far more willing to break with the red-blue dichotomy. The resistance did not come from the use of the words positivism, phenomenology, hermeneutics or Marxism, but from the simple conviction that the market, and nothing else, decided and ought to decide. Edb, IT and information systems were seen as a neutral instrument of efficiency, not a battleground for fictitious theories about the future.
My ‘solution’ then became, in the first instance, what could be derived from common/banal pragmatism: use the philosophy, and the methods derived from it, that fit the situation, including methods from philosophies other than the eight, even if they were not accepted or documented, as long as they worked.
A help in my choices among the philosophies was a closer study of pragmatism in its two forms, Charles Sanders Peirce’s and William James’s, because Peirce’s version had inspired Karl-Otto Apel and Jürgen Habermas in Europe, and James’s version Singer and Churchman in the USA, and thereby my world. Together this led to the recognition that there was a paradoxical connection between Marx’s theories and Peirce’s, and thereby between pragmatism, Marxism and critical theory. The lesson was that I ought to dig deeper into Peirce’s version of pragmatism in order to understand action research better.
This led directly to Peirce’s seven formulations of the pragmatic maxim, one of which struck me as particularly valuable:
“Consider what effects, that might conceivably have practical bearings, we conceive the object of our conception to have. Then, our conception of these effects is the whole of our conception of the object.”
(Peirce on p. 293 of "How to Make Our Ideas Clear", Popular Science Monthly, v. 12, pp. 286–302. Reprinted widely, including Collected Papers of Charles Sanders Peirce (CP) v. 5, paragraphs 388–410.)
This contains more than a rule for clarifying the empirical content of facts, concepts and hypotheses; it implies a never-ending work with signs: abductively, deductively and inductively searching processes. Churchman and Singer called it a ‘sweeping-in process’: always being ready to include more facts, concepts and hypotheses from the real world, from our valued world and from our consciousness. But it also means carrying out a ‘sweeping-out process’: pausing, resting, gathering strength, reflecting, and then starting the experiments again and/or passing the baton on to others.
The challenge/demand of breaking with the three more traditional approaches to the design of computer, IT and information systems, while at the same time creating new interdisciplinary, transdisciplinary methods, was not overcome, perhaps because the language of this form of systems thinking conflicted with well-established professional languages, disciplinary languages and practical/everyday languages. But just as much because the coupling between critical theory and pragmatism raised the question of whether there was a direction to my work, or whether it was just processes, or whether it was hopeless to handle the complexity that was piling up. Peirce himself ‘solved’ the problem by demanding a striving for ideals (‘progress’). He named three ideals: truth (logic), beauty, goodness (science, aesthetics, ethics). Churchman saw systems thinking as an expression of the search for truth (knowledge), but at the same time described aesthetics, ethics, politics and religion as the ‘enemies’ of teleological systems thinking. I have taken them to be friendly enemies, and at the same time added ‘eros’ as empathy and care, cf. the sides of Adam Smith and Charles Darwin other than those most often invoked: ‘the invisible hand’ as the market mechanism in economics, and ‘the right of the strongest, the best adapted’ in biology and the social world.
Our methods, theories, philosophies and practice must strive for progress, knowing and believing that the ultimate ideal can never be reached, and that progress ought to take place in all of them, as a resonance, together and each on its own. (Thanks to Paul Charles Schroeder, who in an e-mail correspondence in 2015 made me aware that information is something other and more than a state and/or a means of transport: it is a resonance phenomenon. See also his PhD dissertation, Spatial Aspects of Metaphors for Information - Implications for Polycentric System Design, University of Maine, 2003. Information is spatial, in 3-D, not just a point and/or a line between points.)
Thus my own challenge, my demand, became and remains to find ways of combining the truth-seeking processes of the eight world views with the other ideals.
2. Ante-narratives
Peirce’s theory of signs (semiotics in today’s language, semiosis in Peirce’s) became a possibility that opened up in interplay with ‘significs’, Victoria Welby’s theory, developed in dialogue with Peirce. She called Peirce’s semiotics language in ‘father-sense’, while she herself sought a language in ‘mother-sense’. So far I have searched mostly in ‘father-sense’.
2.1. The Sign Classes of Semiotics
My use of Peirce’s pragmatic maxims has led me to experiment with his semiotics in the form of constructing sign classes, i.e. combinations of the triadic elements (object, sign, interpretant) with the triadic phenomenological categories: possibility, actuality, habit (firstness, secondness, thirdness).
Peirce takes the phenomenological categories of being to be:
Firstness - a mode of being that has no reference to anything else whatsoever, classified under the heading possibility (‘quality’). Its manifestations include feeling, emotion and imagination.
Secondness - a mode of being that is the experience of effort, classified under the heading actuality (‘fact’). Its manifestations include perception, experience, individual existence, existing objects and events.
Thirdness - a mode of being that connects firstness and secondness, classified under the heading habit (‘argument’, ‘lawfulness’). It manifests itself through thought, consciousness and cognition. Thirdness attunes the inner world of imagination to the outer world of actual behaviour.
He takes the triadic sign elements to be:
Object - the reality we act in and with, which co-determines sign and interpretant.
Sign - the signs we use to describe reality, which act back on reality and co-determine the interpretant.
Interpretant - the interpretation we make of sign and object, which at the same time acts back on and co-determines object and sign.
(Peirce later expanded to 6 and 10 sign elements and sign relations, but here I will ‘make do’ with the three mentioned, in combination with the three phenomenological categories.)
The interpretations above use Peirce’s own words in my own rendering. He has other versions and developed them continually, each on its own and together, but also in a search for further phenomenological categories and sign elements.
Ars combinatoria
Peirce’s qualification rule for forming sign classes for understanding reality, consciousness and the valued world says that the firstness of object, sign and interpretant determines secondness, which then together with firstness determines thirdness. With three sign categories, 10 sign classes result; with 6 sign categories, 28 sign classes; and with 10 sign categories, 66 sign classes.
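The three counts can be checked with a short sketch. It assumes one common reading of the qualification rule, namely that the category level (1 = firstness, 2 = secondness, 3 = thirdness) may never decrease along a fixed ordering of the trichotomies; the function names are hypothetical and chosen only for illustration. Under that assumption a sign class is a multiset of n levels drawn from three, so the count is the binomial coefficient C(n + 2, 2).

```python
from itertools import combinations_with_replacement
from math import comb

# Category levels: 1 = firstness, 2 = secondness, 3 = thirdness.
CATEGORIES = (1, 2, 3)


def sign_classes(n_trichotomies):
    """Enumerate all level sequences that never decrease (assumed reading of the rule)."""
    return list(combinations_with_replacement(CATEGORIES, n_trichotomies))


def count_sign_classes(n_trichotomies):
    """Closed form for the same count: multisets of size n over three levels."""
    return comb(n_trichotomies + 2, 2)


for n in (3, 6, 10):
    # The enumeration and the closed form must agree.
    assert len(sign_classes(n)) == count_sign_classes(n)
    print(n, "trichotomies:", count_sign_classes(n), "sign classes")
```

For 3, 6 and 10 trichotomies this gives 10, 28 and 66 classes, the figures quoted in the text.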
When Peirce applies the ars combinatoria to the phenomenological categories and the sign elements, ten sign classes are formed; here he draws on Llull’s (1232 - 1315) ars combinatoria of deities and Leibniz’s of monads. But Peirce takes the ars combinatoria further, in that he works with what is fundamental in phenomena (essential and existential categories and elements; see A.H. Maróstica: Ars combinatoria and time: Llull, Leibniz and Peirce, SL, Vol. 32, 1992, p. 105-134), though only in semiotics, in the use of logic.
In what follows I will describe the process of forming the ten sign classes, but first a brief introduction to the basic elements: the sign elements object, sign and interpretant, and the phenomenological categories possibility, actuality and habit.
We then get the following schema:
Category | Interpretant | Sign | Object
Firstness, Potentiality | Rheme | Icon | Qualisign
Secondness, Actuality | Dicisign | Index | Sinsign
Thirdness, Law | Argument | Symbol | Legisign
The schema with these nine elements is combined in Peirce’s semiotics into ten sign classes:
6 abductive, all of which start from possibility
3 inductive, all of which start from actuality
1 deductive, based solely on lawfulness
In Peirce’s interpretation, and in my choice of Shank & Cunningham’s interpretations (Shank, Gary & Cunningham, Donald J.: Modeling the Six Modes of Peircean Abduction for Educational Purposes, Northern Illinois and Indiana University, 1996), rendered in my own words, we get:
Inference type (Peirce) | Inference type (Shank & Cunningham) | English | Author’s rendering
1) Rheme Iconic Qualisign | 1) Open Iconic Tone | 1) Omen/Hunch | 1) Omen, hunch, intuition
2) Rheme Iconic Sinsign | 2) Open Iconic Token | 2) Symptom | 2) Symptom
3) Rheme Iconic Legisign | 3) Open Iconic Type | 3) Metaphor/Analogy | 3) Metaphor, analogy, image
4) Rheme Indexical Sinsign | 4) Open Indexical Token | 4) Clue | 4) Pointer, trace, key to understanding
5) Rheme Indexical Legisign | 5) Open Indexical Type | 5) Diagnosis/Scenario | 5) Diagnosis, scenario
6) Rheme Symbolic Legisign | 6) Open Symbolic Type | 6) Explanation | 6) Explanation, understanding; possibly hypothesis
7) Dicent Indexical Sinsign | 7) Singular Indexical Token | 7) Empirical data | 7) Empirical work: data collection on the basis of a hypothesis (explanation, understanding)
8) Dicent Indexical Legisign | 8) Singular Indexical Type | 8) Pattern recognition | 8) Formation of patterns on the basis of data
9) Dicent Symbolic Legisign | 9) Singular Symbolic Type | 9) Falsification/Chosen Pattern | 9) Choice of the pattern that fits the hypothesis; possibly falsification
10) Argument Symbolic Legisign | 10) Formal Symbolic Type | 10) Model/Theory | 10) Model, theory
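The ten Peircean class names in the first column, and the 6/3/1 split into abductive, inductive and deductive classes, can be regenerated mechanically. The sketch below is only an illustration under two assumptions: that the classes are exactly the level triples (interpretant, sign, object) that never decrease, and that the inference type follows the interpretant level. The dictionary names follow the column headings of the schema used in this text and are otherwise hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Column labels follow the schema in this text (levels 1, 2, 3).
INTERPRETANT = {1: "Rheme", 2: "Dicent", 3: "Argument"}
SIGN = {1: "Iconic", 2: "Indexical", 3: "Symbolic"}
OBJECT = {1: "Qualisign", 2: "Sinsign", 3: "Legisign"}

# Assumption: the inference type is read off the interpretant level.
INFERENCE = {1: "abductive", 2: "inductive", 3: "deductive"}


def ten_classes():
    """Return (class name, inference type) pairs in the order of the table."""
    rows = []
    for i, s, o in combinations_with_replacement((1, 2, 3), 3):
        name = f"{INTERPRETANT[i]} {SIGN[s]} {OBJECT[o]}"
        rows.append((name, INFERENCE[i]))
    return rows


rows = ten_classes()
for number, (name, kind) in enumerate(rows, start=1):
    print(f"{number}) {name} ({kind})")

# Tally the inference types across the ten classes.
print(Counter(kind for _, kind in rows))
```

The enumeration order produced by `combinations_with_replacement` coincides with the numbering 1) to 10) in the table.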
My method for the future then consists of two parts:
1. Developing further towards 28 and 66 sign classes in dialogue with Welby’s significs: ‘father-sense’ together with ‘mother-sense’
2. Constructing sign classes for the themes I want to work with
To illustrate the method, cf. part 2 above, I have applied it to what has been my theme since I started in the university world: informatics based on Churchman’s systems thinking.
Hence first a brief introduction to Churchman’s systems thinking. In ‘Design of Inquiring Systems’, p. 43, he introduces the nine elements that he at that time considered essential and necessary for something to be a system:
1. The system is teleological
2. The system has a measure of performance
3. There exists a client whose interests (values) are served by the system in such a way that the higher the measure of performance, the better the interests are served; more generally, the client is the standard of the measure of performance
4. The system has teleological components which co-produce the measure of performance of the system
5. The system has an environment (defined either teleologically or a-teleologically), which also co-produces the measure of performance of the system
6. There exists a decision maker who, via his or her resources, can produce changes in the measures of performance of the system’s components and hence changes in the measure of performance of the system
7. There exists a designer who conceptualizes the nature of the system in such a way that the concepts potentially produce actions in the decision maker, and hence changes in the measures of performance of the components, and hence in the measure of performance of the system
8. It is the designer’s intention to implement changes in the system so as to maximize the system’s value to the client
9. The system is ‘stable’ with respect to the designer, in the sense that there is a built-in guarantee that the designer’s intention is ultimately realizable
Whether these nine necessary conditions are also sufficient is a basic question for the whole book.
One proposal for using Peirce’s signs with the first nine necessary conditions could then be:
Category | Interpretant | Sign | Object
Potentiality | Client | Measure of performance | Value
Actuality | Designer | Implementation | Guarantor
Habit, Law | Decision maker | Components | Environment
This gives a proposal for an ars combinatoria:
1. Client - measure of performance - value: heroic mood
2. Client - measure of performance - guarantor: ideals for inquiry
3. Client - measure of performance - environment: horizontal progress
4. Client - implementation - guarantor: vertical progress
5. Client - implementation - environment: theoria
6. Client - components - environment: formulation of possibilities and experiment portfolio (anomalies, paradoxes, dilemmas; pre-jects, pro-jects)
7. Designer - implementation - guarantor: phronesis
8. Designer - implementation - environment: poiesis
9. Designer - components - environment: praxis
10. Decision maker - components - environment: techne - ‘concludere’
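The ten triples in this list follow from the 3x3 grid by the same combinatorial rule as the ten sign classes. The sketch below assumes that rule is "the row level never decreases from interpretant to sign to object"; the English labels are renderings of the grid entries (Klient, Præstationsmål, Værdi, and so on), and the names `GRID` and `churchman_triples` are hypothetical.

```python
from itertools import combinations_with_replacement

# One tuple per column of the grid; index 0, 1, 2 = potentiality, actuality, habit/law.
GRID = {
    "interpretant": ("client", "designer", "decision maker"),
    "sign": ("measure of performance", "implementation", "components"),
    "object": ("value", "guarantor", "environment"),
}


def churchman_triples():
    """Return the admissible (interpretant, sign, object) triples in list order."""
    return [
        (GRID["interpretant"][i], GRID["sign"][s], GRID["object"][o])
        for i, s, o in combinations_with_replacement(range(3), 3)
    ]


for number, triple in enumerate(churchman_triples(), start=1):
    print(f"{number}. " + " - ".join(triple))
```

The labels attached to each triple in the list (heroic mood, theoria, praxis, and so on) are interpretive choices and are not derived by the enumeration.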
This ars combinatoria holds for all the inquiring systems Churchman works with, cf. Appendix 1. But the lesson from my use of the ars combinatoria on Churchman’s nine ‘elements’ of a system is that much more weight should be placed on the client and on the client’s possibilities for abductive thinking without interference from decision maker and designer. This became the formulation of possibilities (abduction), in place of the traditional emphasis on problem formulation (for induction and deduction), for students, shop stewards and other action researchers alike, on the basis of the PentaHelix model of interplay between companies, public institutions, universities, NGOs and engaged citizens.
In 1979 Churchman adds three further necessary conditions for something to be taken to be a system:
There exists a systems philosopher, who seeks to find the truest ways to inquiry
There are enemies of inquiry, i.e. of the one-sidedness of logic’s ideal of truth: the ideals of aesthetics, ethics, politics and religion
There is a universal and individual will to significance: to point forward to something that reaches beyond my existence and essence, something for future generations to take with them and to strive for, and thereby significant sustainability
But these are still not sufficient. What I should do with them is still an open question. In systems thinking the three open the way to digging deeper into philosophy than ‘just’ science and inquiry alone. And thereby the relation to Peirce: are they within Peirce’s world of semiotics, or do they belong to his metaphysics and cosmology?
2.2. Metaphysics and Pragmatism
Peirce’s pragmatism and Churchman’s systems thinking cannot be ‘expressed/captured’ in a method, but must be developed in dialogue with experiments. Eight metaphysical themes that I have found in Peirce can help here, including the three extra ones from Churchman:
Playful Musement
The Call of Aesthetics
Normative Sciences
Community of Inquiry - Inquiring Relations
Fallibilism
Metaphysics and Synechism
Cosmology and Tychism
Transcendence and Agapism
I will briefly describe each of these eight themes in what follows, as an attempt to create further experiments.
2.2.1. Playful Musement
In ‘A Neglected Argument for the Reality of God’, Hibbert Journal, 1908, Peirce described the necessity of pure play as the basis for understanding. He sees this as ‘playful musement’, which I have permitted myself to render in Danish as ‘legelystne muserier’, since Peirce himself refers to the nine Greek Muses as the inspiration to which we all have access. It is without purpose and without intention, the purity of the soul prior to the normative sciences of aesthetics, ethics and logic. The neglected argument for the reality of God is that this purity of the soul, of nature, is an approximation that we can never reach but can nevertheless strive for. Just like striving for progress in the normative sciences: progress in beauty, goodness and truth. Ideals, yes, but unattainable, whether together or each on its own. This playful musement is a combination of rational, mythological and evolutionary approaches to the stream of reality: observing the stream from the bank, sailing in the stream, and being the stream.
2.2.2. The Call of Aesthetics
Peirce himself wrote that he had neglected aesthetics in his efforts to develop logic. He nevertheless had a basis in the aesthetics of Friedrich Schiller and Alexander Gottlieb Baumgarten. From them Peirce created his own version of aesthetics as the centre of his theory of knowledge, since he saw aesthetics as more fundamental than logic. He regarded ‘sense knowledge’ as being of the highest philosophical importance, with aesthetics striving to discover the most ideal state of things that could have a possible practical bearing on human conduct.
To compensate for Peirce’s limited elaboration of aesthetics, I will draw on the Danish doctor of theology and philosophy Dorthe Jørgensen. She too uses Baumgarten in her work on aesthetics and beauty (‘En engel gik forbi’, 2006, and ‘Den Skønne Tænkning’, 2014; in English: ‘Poetic Inclinations - Ethics, History, Philosophy’, 2020, and ‘Imaginative Moods - Aesthetics, Religion, Philosophy’, 2020, both in translation). She says that philosophy arises as wonder, in wondering, generated by beautiful (aesthetic) thinking. She says, like Peirce though without reference to him, that this aesthetics can foster human well-being and develop our understandings of history, responsiveness, freedom and the good life. She presents the formative nature of beautiful (aesthetic) thinking and establishes its relevance in many disciplines and across a broad spectrum of society, e.g. border studies, education, policy and social work. In this context she develops her experience of beauty and her metaphysics of experience (see later under metaphysics).
In her philosophical treatise ‘Den skønne Tænkning’ she says that this beautiful thinking:
acknowledges, characterizes and handles complexity
is able to move on several different levels simultaneously
can accommodate and work with more than one agenda at a time and can investigate more than one theme at a time
is oriented towards something common (traditionally called an idea), but without forgetting its specificity (the phenomenon); i.e. it finds the common in the specific, the idea in the concrete phenomenon
The experience of beauty as method founds meaning thanks to the feeling of presence that characterizes an experience in which one is not within oneself but out with the phenomenon, with the meaning. The experience of beauty does not tell us what the felt meaning more precisely consists of, what the phenomenon/meaning actually is, or what context it is part of. Therefore the experience of beauty does not close the horizon but opens it. The experience of meaning that it constitutes sets the inquiring and ethical processes in motion and awakens the metaphysical questions of where and what meaning is at all.
Dorthe Jørgensen shows the importance of experiences traditionally characterized as religious or aesthetic for our human understanding, for our philosophy and method.
2.2.3. Normative Sciences
The three normative sciences are aesthetics, ethics and logic (science). They can be seen as ideals with which we must seek to become familiar, indeed to befriend: in dialogue with other people (you, we, us, one another) and in dialogue with nature.
It is the science of ideals without subsequent justifications; these normative sciences study what ought to be. Aesthetics investigates how feeling and sensation occur in the actual world. Ethics investigates deliberate action, and logic deliberate thinking. Aesthetics is the necessary condition for the two other branches. Logic relies on ethics, which in turn relies on aesthetics.
According to the pragmatic maxim, this dialogue between playful cheerfulness, beauty and the normative sciences in the world is a mutual interaction.
Thus ethics becomes an essential part of the method. Peirce does not write much about it directly; Churchman does so indirectly until the last two books (1979 and 1982), and then as ‘enemy’: as direct enemy, which sees logic as a danger that devalues goodness in favour of truth; as ‘non-enemy’, which gives occasion for reflection on inquiry for truth; and as ‘friendly enemy’, which wants dialogue for common progress.
I myself have used the Danish philosopher Knud Ejler Løgstrup as a guiding star, with his Etiske Fordring (The Ethical Demand, 1956), which demands that one understands and works with the spontaneous, sovereign expressions of life such as trust, openness of speech and mercy. That they are spontaneous and sovereign means that they announce themselves of their own accord, assert themselves without the individual’s will and behind the individual’s back, and that they are corrupted the moment they are made motivated or conditional.
The spontaneous, sovereign expressions of life belong to the pre-cultural. They are not due to human beings themselves, but must be credited to the account of given life. For Løgstrup this means that it is natural to interpret reality religiously, the spontaneous expressions of life being seen as an expression of life being a gift.
The spontaneous, sovereign expressions of life play an essential role in ethics, since they reveal what the good life is. They are the fulfilment of love of one’s neighbour. But if they fail to appear or are suppressed, the demand or duty steps in as the second best and requires love of the neighbour.
2.2.4. Community of Inquiry / Inquiring Relations
The question, however, is: is the individual alone in this inquiry? The answer for Churchman and for Peirce is NO: I, you, we as humankind advance, make progress in all three normative sciences, being attentive to one another (‘Hin Anden’, the Other), i.e. in natural, creative, designing dialogue and cooperation in a cooperative environment. We are attentive to the possibilities in inquiring relations; we are among fellows on/in the commons.
Peirce wants a dialogue about whether inquiry is a solitary process, verging on being the capacity only of geniuses, of exceptional individuals. Churchman asks the same. Peirce was, together with John Dewey, one of the first to ask about the possibility of inquiry (and, cf. the above, of the other ideals) as a manifestation of community: joint work as a necessary condition for creating beauty, goodness and truth; the necessity of people working together to create empirical and theoretical knowledge.
Peirce’s view was that knowledge must be embedded in a social context and therefore requires inter-subjective agreement among those involved in order to attain legitimacy. But who is to take part?
Inquiring Relations concerns the essence and existence of inquiry and the processes of scientific questioning, searching and striving. An inquiring relation can in this context be taken to be any group of inter- and transdisciplinary individuals engaged in a process of mutual understanding in sensing, acting and thinking, in physical, chemical, biological, social, cultural, psychic, spiritual and transcendental ‘spaces’, using rational, mythological and evolutionary approaches (cf. Erich Jantsch).
2.2.5. Fallibilism
Peirce claims that any attempt to use or accept a method of doubt will be a self-deception, because we possess a variety of certainties which, as it seems to us, cannot be doubted or called into question. So what we produce is not 'real doubt'. These certainties ('beliefs') will lurk in the background and influence our reflection. But Peirce asks, indeed implores, us not to 'pretend to doubt in philosophy what we do not doubt in our hearts'.
Peirce's doctrine of fallibilism is the view that everything in our present beliefs may be mistaken, a misunderstanding. This is the heart of his philosophical project. Uncertainty is not merely an attitude forced upon us by unfortunate limitations of human cognition.
Uncertainty is a necessary precondition for all knowledge. This is because actual inquiry is the only source of Peircean knowledge. We inquire only when we experience genuine uncertainty. So uncertainty about one's own conviction is an engine under the skin of Peirce's normative sciences and his ...
Fallibilism is the philosophical message that no belief can have a justification that guarantees beauty, goodness, truth. Trying to give such a justification for true belief ('justified true belief', in Peirce's words) by means of authority, the a priori, or tenacity will not help. Only experiments will show us, along the way on our journey into the future. The question, however, is whether Peirce's statements in 'On Justified True Belief' about such experiments belong to science, and, if so, what science then is. I would think that Peirce answers that experiments apply to knowledge of our whole being in the world, in dialogue between reality, consciousness, and the valued world: in a physical, chemical, biological, social, cultural, psychic, mental, and spiritual world, across all of these, and in transcendence.
2.2.6. Metaphysics and Synechism
Metaphysics is the branch of philosophy that deals with the "big" questions about the world's foundation, nature, and most general features. Peirce's metaphysics is often described as 'scientific metaphysics' and thus seeks to explain reality in terms of the phenomenological categories and the logical methods and principles expressed in the normative sciences.
Synechism is the view, within a metaphysical theory, that the universe exists as a continuous whole of all its parts, in which no part is completely separate or determinate, and that it continues to grow in complexity and connectedness through semiosis and through the working of an irreducible, universal, and ambiguous force. As a method, synechism seeks continuities where discontinuities are assumed to be permanent, and seeks semiotic relations where otherwise only dyadic relations are assumed to exist. Synechism and pragmatism support each other: synechism provides a theoretical rationale for pragmatism and at the same time uses the pragmatic maxim to identify conceivable consequences of experimental activity, in order to enrich the theory by revealing and creating relations.
This striving, this seeking, is a matter of individual, local interpretation and understanding, while at the same time it is based on a universal, absolute beauty, goodness, truth. We will never reach these ideals, but we can approach them and make progress toward them, and also create qualitative transformations, metamorphoses in them (indeed, perhaps entirely different ones), yet always in a search for approximations, each on our own and together. Metaphysics is an attempt at explanation, a hope for the possibility of success in inquiry. Just like the search to realise the pragmatic maxim.
2.2.7. Cosmology and Tychism
Cosmology searches for the origin of the universe. In the language of our time: from the Big Bang to today and on into the future. Again, the attempts at understanding spread across the three approaches, or methods, mentioned above: The rational, which sees the universe as astronomy, originally with the earth as its centre but now rather as an eternally expanding and shrinking torus. The mythological, which sees the world as created by a deity who will come again and bring about a blessed ending. The evolutionary, which sees the universe as a self-organising world of order through physical, social, and spiritual fluctuations (chaos).
Ilya Prigogine and Isabelle Stengers: Den nye pagt mellem mennesket og universet, Ask, 1985. Danish edition with a foreword by professor of theology Johannes Witt-Hansen, also responsible for the translation of Karl Marx's Kapitalen...
Tychism is Peirce's theory that absolute chance, randomness, or indeterminism is a real factor in the universe. This doctrine forms an essential part of his evolutionary cosmology. It can be seen as the exact opposite of Einstein's statement that God does not play dice with the universe. Heisenberg's uncertainty principle and Bohr's complementarity principle support Peirce, and the above-mentioned theory of Prigogine confirms the same. Naturally, these are expressions of the natural-scientific understanding of the world in physics and chemistry, but they point to the same processes in the social and spiritual spaces.
2.2.8. Transcendence and Agapism
Transcendence is that which exceeds and lies beyond the domain under consideration, or more broadly beyond the world we exist in, the immanent. The physical, social, and spiritual spaces thus possess a transcendence in relation to consciousness (and vice versa), and hold the potential of something we are not aware of or do not recognise. Transcendence is often ascribed to something divine that lies beyond our world.
Peirce holds that we live and act in both a transcendent and an immanent world, and he links these lives together with evolutionary love. Love as agapism, the most important force of development, which encompasses tychasm (the force of development in firstness) and anancasm (the force of development in secondness). Agapism, love of neighbour, is the most important because it encompasses the two others in thirdness, in a combination of chance and necessity. Agapism is belief in unselfish, charitable, spiritual love, love for the soul. It can mean the belief that such love should be the only ultimate value and that all other values are derived from it, or that the only moral ...
Peder Voetmann Christiansen writes on p. 24 of the introduction to Charles Sanders Peirce: Metafysik og Kosmologi (Gyldendal, 1996):
"If we look at the historical development of institutions and ideas in human society, we can find examples of all three areas of development, and scientific methods can be indicated for distinguishing between them. Agapasm finds expression in the 'spirit of the age', which at times has made possible, for example, magnificent architecture such as the Gothic cathedrals, which cannot be replicated today. Every researcher knows that the great discoveries 'are in the air' and are often made independently by different persons."
3. Conclusion
In 'Design of Inquiring Systems' Churchman ends: “Conclusion comes from latin concludere, which means to shut up together.”
Hence a closing reflection, with the help of T.S. Eliot, on my 40 years in the university world:
Having been part of it from the infancy of computing (edb) up to a world undergoing digitalisation, it seems important to me to ask:
Where is the digitalisation I have lost in data
Where are the data I have lost in information
Where is the information I have lost in knowledge
Where is the knowledge I have lost in living
Where is the living I have lost in life?
Is digitalisation the new world, in which virtual life must not be ruined and complicated by the biological, social, cultural, spiritual, living life?
Appendix 1. Alternatives in Philosophy and Method
Philosophy
Systems theory
Project management
Informatics
Innovation
Philosophy
Capitalism
Liberalism
Conservatism
Digitalisation
0-1 thinking
HUA (Hovedet Under Armen, "head under the arm")
Individual creativity
Focus on innovation in technology and economy
Market-driven
Positivism, Empiricism, Rationalism
e.g.: Locke, Hume, Leibniz, Comte, Popper
General Systems Theory (Kenneth Boulding, Ludwig von Bertalanffy)
Rational phase model
From idea and need to implementation
Technological forecasting
Planned design
Focus on megatrends
Need-driven
Phenomenology, Hermeneutics
e.g.: Kant, Humboldt, Heidegger, Gadamer, Wittgenstein, Chomsky
Cybernetics I (Norbert Wiener, Ross Ashby)
Socio-technics
Interdisciplinarity
Focus on social innovation
Process- and learning-driven
Marxism, Communism, Socialism
e.g.: Marx, Lenin, Mao
Cybernetics II
Class-power model, trade-union political strategy
Negotiations, class struggle
Focus on societal and political innovation
Material choices, power-driven
Critical Theory
e.g.: Horkheimer, Adorno, Arendt, Marcuse, Apel, Habermas, Honneth, Rosa
No systems theory
Cybernetics III
Maturana
Niklas Luhmann's autopoietic systems
Exemplary learning
Pedagogy of the oppressed
Critique of ideology
Objective/subjective truth
Dialectical spiral
Interest model
Teleology
e.g.: Addams, Dewey, Mead, James, Peirce, Bradford-Smith, Singer, Cowan, Ackoff, Churchman
Teleological systems thinking (Churchman)
Cybersemiotics
Teleological design
Systems philosophy
Process
Collective conscious/unconscious
Anarchy
Polycentric management on commons
Existentialism
e.g.: Sartre, Kierkegaard, Derrida, Foucault
None
Anti-planners
None
Being
Essence
Feminology
e.g.: Welby, Greer, Friedan, Seachild, Kock
None
Basic-group model
“You are the goddess”