Summer School on Language Data and Linguistic Questions, Bergen, July 1997
Data, Facts and Concepts of Language
Helge Dyvik
Linguistics is a discipline of many subdisciplines, and the
communication among them is frequently less lively than it could have been.
More typically, linguists from different subdisciplines will entertain
themselves with stereotypes of each other: the corpus linguist talks about the
generative grammarian who studies language by contemplating his or her own
inner linguistic soul while sitting at his or her desk in a comfortable
armchair (the corpus linguist himself, as we know, preferring hard and
uncomfortable stools while perusing his corpus); the generative grammarian
talks about the corpus linguist happily catching stray linguistic butterflies
from his texts to mount them in his rather unprincipled collections; the
conversation analyst laughs immoderately at the impenetrable formulae of
the formal semanticist, which, when you have spent hours deciphering them, turn
out to mean that John snores; the formal semanticist derives much amusement
from the conversation analyst's attempt to explain every obscure noise emitted
during a conversation while shunning every hint of a precise concept. Etcetera.
Of course we should stick to our stereotypes - they are part of what makes
linguistics and linguistics conferences so entertaining. But in our more
serious moments, of which we should have at least some, we may question their
justification, we may suspect that they to some extent block potentially
fruitful communication among the subdisciplines, and we may ask what sources
they have. In answering the latter question I believe that the concept of
linguistic data is central. What constitutes a subdiscipline is the set of
questions about language it wants answered. As the sets of questions vary, so
do the language concepts themselves, as well as the rational methods for
finding answers and the kind of phenomena that count as relevant data. It may
therefore be of interest to focus on the connection between the concepts of
data and the attitudes to data collection on the one hand, and the concepts of
language and the linguistic questions on the other, which we find within the
various subdisciplines of linguistics. Perhaps this will enable us to see a few
distinctions which often go unnoticed, but which may help us in our attempts at
inter-sub-disciplinary communication. I will start with the concept of 'data'
itself.
Data is the neuter plural of
the past participle of the Latin verb 'dare', which means 'to give'.
Etymologically, then, data is what is given. This suggests that data should be
something uncontroversial. The interpretation of data may be subject to
disagreement, of course, but it should not be a matter of controversy whether a
set of data exists or not. Data should in some sense be public - data cannot be private if it is to serve as a basis for
intersubjective control of some claim. This, at least, is the ideal. The
question, then, is how we can more precisely develop a concept of data that
fulfils the requirement of being at once given, public and uncontroversial. Should
we take data to be the immediate, unanalysed objects and states of affairs
which surround us? But unanalysed objects and states of affairs are useless; we
need to capture them within some conceptual apparatus, if only in order to
describe them. According to one such conceptual apparatus we are surrounded by
constellations of elementary particles distributed in space and time. If our
questions concern the use of passive in a Norwegian dialect, this conception of
data is not very helpful. Even a less exotic alternative, such as a conception
of the sounds emerging from speakers as pressure waves in air, is of little help
in such a project. The grammarian is not very interested in spectrograms. In
short, it is evident that our empirical basis cannot be states of affairs that
are seen as independent of human perception and concept formation, or even as
independent of the specific questions we want answered. Our empirical basis
must be some sort of facts, and in the concept
of a fact we already have something proposition-like: "It is a fact that p," we say. A fact, in other words, is a bit of reality captured
in one of several possible conceptual frameworks, or individuation schemes, as some would put it. We could conceive of a fact as
"digitalised", or filtered, information about the world: the
extraction of a fact necessarily involves loss of information. The capacity to
disregard information selectively is a precondition for orientation, or even
survival, in the world, and hence also for research. By talking about facts we
emphasise the importance of identifying the relevant level of abstraction. It is facts at a particular level of abstraction that are relevant
for answering our questions: not facts about particle constellations, nor facts
about sound waves, but, for instance, facts about uttered sequences of word
forms.
However, we cannot replace the concept of data with the concept
of facts. There are many kinds of facts, and many of them are far from
uncontroversial, public, or intersubjectively accessible. Even the most general
statements of a yet-to-be-discovered true theory describe facts, but in that
case facts that are very far removed from what we would call data. Hence it
must be a special subspecies of facts that we associate with data - perhaps
facts that seem immediately accessible intersubjectively, such as the fact that
a certain sequence of recognisable Norwegian word forms was uttered in a given
situation. Theories about the use of passive are built on facts like this (in
conjunction with other facts, for instance facts about the correctness of the
sequence). But the circumstance that a certain sequence of word forms was
uttered is a very abstract kind of fact, indeed an institutional fact, in John
Searle's terminology. An institutional fact is a fact that presupposes a human
institution within which it can exist, in this case in the form of
unformulated, constitutive norms for correct linguistic behaviour. The very
concept of an utterance is unthinkable without such a surrounding institution.
Therefore the fact that a certain word form sequence was uttered cannot be
reduced to a set of natural science facts, or brute facts, in Searle's
terminology. Notice what I am not saying now: I am not saying that there can be
no ultimate natural science account of human behaviour. What I am saying is
that a piece of behaviour once conceived as an utterance cannot be conceptually
reduced to natural science facts. Nevertheless, as linguists we treat such
institutional facts as data, which shows that there is an inevitable
conventional element, an element of consensus, in our delimitation of the kinds
of facts which we regard as immediately and intersubjectively accessible. There
are complex assumptions behind our perception of a sound stream as a certain
sequence of word forms, but the point is that we choose not to thematise those
assumptions in the context of our passive project; they are not among the
assumptions we want to test. It is, anyway, impossible to question all
assumptions at once. Therefore we treat the assumptions as unproblematic
background knowledge. They are part of our observational theory, and this works
fine as long as we are surrounded by colleagues who are not bent on continually
challenging them.
This conclusion may give the impression that the question
of what data is can only be answered relative to some observational theory. But
we should beware of equating data with facts,
even when we restrict our attention to facts that seem immediately and
uncontroversially accessible. True, data is useful only when we have derived
some particular facts from it, and true, it is the type of fact we are
interested in that decides what sort of data is interesting. Still, in order to
remain close to our original intuition about data as something
uncontroversially given, we should try to retain a concept of data as something
richer than the facts we derive from it. Data is more concrete than facts. In
our data no information has yet been lost, as it necessarily is in a fact. It
is the tape recording of the informant which is data, and not the abstract word
form sequences we have derived from it. The tape recording contains much more
information, for instance acoustical information which could have been
extracted as a separate set of facts. Since the tape recording was made with
grammatical questions in mind, the acoustical facts that could be extracted
from it are probably not suited for testing phonetic or phonological hypotheses
about the dialect, and in that sense even the data is coloured by our
assumptions and interests. But we want to incorporate a minimum of
interpretation and sifting of information into the data concept itself - data
is to be a starting point for interpretation rather than a result of it. If we
accept this, we recognise it as a slightly imprecise, if harmless,
simplification when the dialectologist writes: "Here is my data," and
then lists a set of transcribed utterances. Strictly speaking the transcribed
utterances are a representation of a set of assumed facts, derived from the
data by means of an observational theory which the dialectologist regards as
uncontroversial.
Let us therefore distinguish between data, the particular
facts we extract from it, and the theory or the hypothesis which we test
against these particular facts. 'Particular facts', then, are what we refer to
with what philosophers of science call "observational sentences" or
"protocol sentences" - sentences describing individual occurrences
which should be explained by our theory, if explanation is our ambition. If we
somewhat loosely take an explanation to be any kind of satisfactory answer to a
"why"-question, I suppose that most of us will agree that we look for
explanations. But with less loose proposals about what it takes to be an
explanation we experience well-known problems which necessitate a closer look
at the concepts of 'observational sentence' and 'data' in the different
subdisciplines of linguistics.
In the discourse of the philosophy of science, observational
sentences describe not only particular facts, but
particular events, incidents localised in time
and space. Particular events, such as the event of a substance attaining a
certain temperature in an experimental situation, are the normal objects of
classical deductive-nomological explanations. The classical, somewhat
simplified, picture is that an event is explained if the observational sentence
describing it can be deduced from some general law in combination with a set of
initial conditions. But facts are not necessarily events localisable in time
and space, and in central disciplines within linguistics there is reason to ask
whether the particular facts we want to account for are events or something
else. And if they are something else, we may go on to ask in what sense we may
then claim to explain them.
The distinction I have tried to draw between data and facts
may help us to see the possibility that some of the particular facts of
linguistics are not events. The point is that data always arises as the result of some event, whether it is a tape
recording, an utterance, a text corpus or an informant reaction. All these
things either are or immediately result from incidents that take place on
specific locations at determinate times. However, the particular facts which we
extract from such data, and which we want to account for, are not necessarily
events. Take generative theory, which obviously intends to be explanatory. In
an article from 1974 Fred Dretske points out an interesting thing about
explanations within this field: "...if we confine our attention to
language instead of the actual speech acts that embody a use of language, there
is, quite literally, nothing happening." Obviously, speech acts, localised
in time and space, are events. But the question is whether the generative
grammarian can reasonably aim to explain speech acts, or whether her
explananda are rather atemporal facts about the expressions in some
language. And we may doubt that it is the grammarian's job to explain why Mary
said "There is a cat on the mat" to John on Sunday 2 November. The
grammarian's job, rather, is to explain why we in a certain sense cannot say
"There are the cat on the mat" in English, whereas we can say "There is a cat on the mat". I
certain sense", since in a certain other sense we obviously can say "There are the cat on the mat" - I have just said it
twice myself. If, on the other hand, a physicist is right in claiming that an
electron 'cannot' do x, then an electron simply will not do x, not even as an
example of an impossible event. As linguists we easily see that the modal
auxiliary 'can' is used with different modalities by the physicist who says
that an electron cannot do x, and by the linguist who says that we cannot say
"There are the cat on the mat". The physicist's modality is alethic:
it concerns truth in possible worlds. Therefore there is an immediate
connection between what an electron can and cannot do and what it in fact does
- if your theory explains the one, it also explains the other. The linguist's
modality, on the other hand, seems to be deontic: it concerns certain norms or
conventions for correct behaviour which are violated if we act in certain ways.
The particular facts which the grammarian wants to explain, then, concern
institutional, atemporal properties of linguistic expressions seen in
abstraction from concrete utterances of the expressions: the fact that they are
well-formed, or synonymous, or the fact that they have a certain meaning. These
are properties that are anchored in norms for correct behaviour. But it is not
impossible for us to violate the norms and produce ungrammatical utterances,
and furthermore, if we do, this does not in itself falsify our hypotheses about
the norms. Hence we might claim that a grammatical theory which in this sense
explains what people can or cannot say does not
therefore explain that people actually say what they do in concrete situations.
Since norms do not determine behaviour, actual behaviour is not explained by
theories about norms.
This does not mean that we should not also ask about the
explanations of actual linguistic behaviour - the speech acts which constitute
an important part of the grammarian's data. The question is simply whether this
is the grammarian's task or not. Some would claim that it is, and disagree with
what I just said. However that may be, it seems important to distinguish
between these two types of facts which to a large extent have been derived from
the same kind of data: facts about institutional properties of linguistic
expressions considered as types, and facts about specific utterance acts of
expression tokens in specific situations. Dretske discusses this distinction
and illustrates it with an analogy which I find helpful. Let us assume that an
anthropologist wants to collect a set of particular facts about religious
beliefs and rituals in a certain society. His data may be documents, witness
reports and artefacts from the society in question. From these data the
anthropologist extracts, by interpretive methods, a set of assumed facts about
beliefs and rituals. He then uses these particular but atemporal facts as a
basis for developing and testing a general theory about the origins, functions
and consequences of such beliefs and rituals. He wants his theory to explain a
certain type of facts, say, why the rituals survived a radical change in
beliefs. However, his theory has nothing to say about the reasons why the
documents were written, why the witness reports were made or why the artefacts
were produced. In other words, his theory does not explain his concrete data,
it does not explain the events that led to the existence of his data. But it
may also be interesting to look for explanations of these things; the point is
that this would be the task of a different kind of theory. Cf. fig. 1. Theory 2
will contain hypotheses about the conditions for writing things down or
preserving and transmitting them in other ways, etc., and it is clearly
distinct from Theory 1. Theory 2 tries to explain concrete events in space and
time, while Theory 1 has more atemporal explananda, such as the properties of
correctly performed rituals.

Fig. 1 Adapted from Dretske 1974

Fig. 2 Adapted from Dretske 1974
Consider then the analogous linguistic diagram in fig. 2.
The theories 2, 3 and 4 are to explain why speakers produce the utterances they
do in specific situations, and must probably contain psychological hypotheses
about the goals, desires and beliefs of human beings, and the connection
between such factors and actual behaviour. It seems, then, that these theories
take us from linguistics into psychology.
However, some linguists - we may call them the 'mentalists'
- would draw fig. 2 differently. They do not want to leave linguistics in order
to explain Facts B; instead they want to give causal rather than goal-related
explanations of aspects of linguistic behaviour, and of the speaker's
intuitions, by drawing an arrow directly from Theory 1 - the grammar - to Facts
B - linguistic and metalinguistic behaviour. This is obviously not easy; as I
have already discussed, grammars hardly give complete causal explanations of why
people say what they do. Even a perfect theory of English syntax and semantics
does not enable us to predict what John is going to say in a given situation.
However, one might claim that this is not altogether different from the
situation of the physicist, whose theories frequently do not enable him to
predict what is going to happen outside the laboratory, because of the sheer
complexity of the real world. Still, the analogy is not convincing, since a
grammar does not even in an idealised way allow us to predict what is going to
be said. Anyway, the mentalist solution is modularisation: the grammar is
assumed to be one of several mental modules which jointly determine behaviour.
Thus, while for the non-mentalist a grammar specifies the institutional
properties of linguistic expressions, for the mentalist it is interpreted
realistically as a model of the speaker's mentally represented capacity to act
grammatically. While the non-mentalist sees the grammar as one of several
possible ways of calculating the properties of linguistic expressions, the
mentalist sees it as a theory of mental realities in virtue of its form. But
since this grammatical competence obviously does not determine linguistic
behaviour, it is assumed to be just one of several mental modules which
together determine actual behaviour, or performance.
If, on the other hand, we stick to fig. 2 the way Dretske
draws it, a different picture emerges. Then Theory 1 only explains Facts A,
which are institutional facts in John Searle's sense - facts of the same order
as the fact that Parliament passed a law or that John and Mary married or that
a football player was offside. The theory explains such institutional facts by
accounting for the way in which certain acts satisfy the constitutive norms of
the relevant institution. Compare a marriage ritual, disregarding
the fact that such a ritual is normally written down in its prescribed form,
while a grammar is something we must discover. If we want to explain why John
married Mary, the ritual is incapable of explaining the event - why they
performed this act. It might have something to do with love or similar things,
and a marriage ritual is not a theory of love. On the other hand the rules of
the ritual can explain why the movements they made, the words they uttered, and
the movements and words they were exposed to, counted as a well-formed wedding. Similarly, the theories of the grammarian,
the phonologist and the semanticist do not explain why people say what they do,
nor do they explain why people think or say that linguistic expressions are
well-formed, synonymous or ambiguous. The theories simply explain that the
linguistic expressions are well-formed or
synonymous or ambiguous, and they explain this by showing how the expressions
satisfy the constitutive norms of the language as formulated in the
theory, for instance by means of recursive grammatical and semantic
rules.
Can these two views of grammar - the mentalist and the
non-mentalist - be reconciled? They seem to differ in taking a formal grammar
to be a theory of two quite different types of object. However, both types of
object appear to be legitimate objects of study, so it may be worthwhile to
consider the question of where the disagreement, if any, resides. According to the
non-mentalist view, language and grammar are irreducibly supra-individual
things, residing in a community of communicating individuals in the form of
constitutive norms for correct and meaningful linguistic performance. The
concept of a norm or a convention cannot be understood or characterised on a
purely individual level; therefore this conception of language sees it as
something necessarily shared. From the individual perspective, according to
this view, language and grammar are external abstract objects of which the
individual may have more or less partial knowledge. Characterising this partial
knowledge and the way it is structured in the individual is a highly
interesting psycholinguistic project, but it is a different project; it is not
the grammarian's project qua grammarian.
According to the mentalist view, on the other hand, the
study of grammar belongs to the study of individual psychology. To the extent
that grammar and language exist, they exist as knowledge structures in the
individual, and the grammarian's theory is a theory of these knowledge
structures. Here we perceive a point of real disagreement between the two
views: to the mentalist, language as a social or supra-individual object is an
"epiphenomenon"; it is simply a secondary construction on knowledge
structures within individuals. It is only these mental structures that have real
existence and hence can be made the objects of scientific study. To the
non-mentalist, however, language as a supra-individual entity cannot be reduced
to properties of individuals. Obviously there is an intimate connection between
language and the knowledge of it even to the non-mentalist: if nobody knows a
language L, L cannot exist. But even so, non-mentalists deny that language can
be reduced to the knowledge individuals have of it - in much the same (but not
identical) way as the Norwegian Constitution cannot be reduced to the knowledge that lawyers happen to have of it, or mathematics to
the psychology of mathematicians.
Do we have to stop here at the recognition of two
irreconcilable philosophical positions, or is it possible to evaluate their
interrelationship further? One interesting path from this point is to consider
the attitudes which the proponents of the two views have towards data: how do
they see the empirical basis for their respective theories? With two
conceptions of language seemingly so totally different, we expect significant
differences in the kinds of data that would be considered relevant, or at least
differences in the kinds of basic facts that the different linguists want to
extract from their data. If language is a norm-based, supra-individual object
we expect that data in the form of utterances is sifted in such a way that
utterances considered incorrect by informants
are disregarded. The non-mentalist grammarian sees his theory as a theory of
correct language use - not in the prescriptive sense, of course, but as a
consequence of the fact that language is a norm-based phenomenon and hence
inevitably is constituted by a concept of correctness or well-formedness.
Therefore he will quite legitimately take into consideration only those
utterances that are considered well-formed by a critical mass of informants.
On the other hand, if language is a property of the individual mind, residing
in mental structures not directly accessible to consciousness but operative in
governing the linguistic performance of the individual, we expect a different
approach to data. In the first place, we then expect a recognition of the fact
that the mental structures of individuals speaking the same language might be
different: speakers with comparable output might still have different knowledge
structures. In the second place, we expect the total linguistic output of the
individual to be interesting - in other words, we don't expect the concept of
correctness to play a significant role in the analysis of data. And in the
third place, we expect psycholinguistic experimentation to be important, in
order to anchor the linguistic facts in concepts with a wider psychological
applicability.
The interesting observation, of course, is that mentalist grammarians by and
large do not meet these expectations. As for the
point about different structures in different individuals, the strategy among
mentalist grammarians has traditionally been to idealise to the "perfect
speaker-hearer in a completely homogeneous speech community". Divorced
from its seeming reference to an individual - the perfect speaker-hearer - this
comes very close to identifying language as a supra-individual entity, a common
denominator in a community. As for the point about being interested in the
total linguistic output of the individual, mentalist grammarians are very much
concerned with sifting data according to grammaticality, i.e., correctness -
inescapably a normative concept. And as for the last point about
psycholinguistic experimentation, that has never been considered essential by
mainstream mentalist grammarians, although interesting experimental work
certainly does exist.
A tempting conclusion, then, is that even mentalist
grammarians are studying language as a supra-individual entity, and that
the individual psychology mostly resides in the rhetoric of the discipline -
with possible unfortunate consequences for the interpretation of the results. I
believe one unfortunate consequence of the rhetoric of individual psychology
is the insufficient recognition of the difference between certain types of linguistic projects. As an example, take
the study of second language acquisition. From a period when this was conceived
as a study of errors in the output of language learners, the discipline moved
to an understanding of the object of study as the interlanguage of the subjects. In other words, the language of the language
learner is typically recognised as a language in its own right, governed by
rules and amenable to grammatical investigation. Central questions about this
interlanguage will then be to what extent it derives its properties from the
target language, to what extent from the mother tongue of the subject, and to
what extent from universal principles of language acquisition. It is obviously
a step in the right direction to try to account for the specific regularities
in the output of language learners by trying to see their interlanguage as a
rule-governed language in its own right. Still, it is worth reflecting on the
language concept we presuppose when we call this a 'language', compared to the
language studied by the regular grammarian of some dialect. In the case of the
language learner it is quite obvious that we are studying language on the
individual level. In this case there is no doubt that what we are after is a
characterisation of the linguistic competence of individuals. And notice what
this means for our treatment of the data: it would be meaningless in this case
to discard utterances on the grounds that they are 'incorrect' or 'ill-formed'
according to some standard. We are definitely interested in the 'total output'
of the informant, and there is no concept of correctness or well-formedness
around that is relevant to sifting the data before trying to infer the rules.
We don't ask the language learner or some informant whether a certain sentence
produced is well-formed or not, and then exclude it from our data if the answer
is "no". On the contrary, we expect
the language produced to be incorrect. In this case, in other words, our object
of study is definitely not constituted by a set of intersubjective norms. Now,
this does not mean, of course, that the language learner is not following
norms. To some extent he or she is following the norms of the target language, to
some extent possibly norms of the mother tongue, and to some extent other
factors may be operative. But the point is that it is not a language as defined
by a set of norms which is the object of study here, as it is for the
regular grammarian. Rather, it is the total competence of an individual, be it
norm-governed or not. This makes an enormous difference: it is an entirely
different kind of object, requiring a very different method from the method of
the regular grammarian, and a very different approach to data collection (even
though we may want to use the same kind of formal grammars in both kinds of
studies, but then with a very different interpretation). But this difference is
obscured by the rhetoric which describes even the regular grammarian's object
of study as belonging to individual psychology. This leads us too easily to use
the same terminology in the two kinds of studies, using terms such as 'language'
and 'grammar' to refer both to the supra-individual, norm-based system of a
language community and to the developing knowledge structures inside the head
of a language learner, as if they were the same kind of object.
The field of second-language acquisition enables us to see
this difference particularly clearly, since the output of the second-language
learner tends to deviate significantly from the grammatical norms. But the
distinction is just as important when we study the output and knowledge
structures of people speaking their first language: studying the competence of
an individual remains a very different thing from studying the grammatical
norms of the community. It seems plausible to claim that it is the way we
collect and process our data and reason on the basis of it that decides what we
ultimately are studying, rather than the rhetoric with which we like to present
our research. Hence, linguists who write grammars to account for the structure
of well-formed expressions are inevitably studying a supra-individual, abstract
object, and it seems advisable to bring the rhetoric in accordance with that
fact. This will benefit the genuine psycholinguistic study of the linguistic
capacities of individuals by helping us to avoid confusing the two types of
investigation conceptually and methodologically. It will also help us to see
that we need some account of the grammatical norm systems of a community before we can approach the question of the way in which knowledge of these
systems is structured in individuals. To put it simply: we need a definition of
P before the question of the structure of the knowledge of P can even be given
content.
We may still want to ask some questions about this
supra-individual conception of language. What sort of object is it, what sort
of existence can we attribute to it? Is it reasonable to assume that the language
or the languages of a community exist out there with well-defined borderlines
for us to discover, separating them from each other? Or to what extent is there
an element of construction involved in our
isolation of a language? And on what basis does the linguist build his putative
knowledge of a language conceived in this way?
Roughly speaking, the linguist draws on three sources of knowledge:
informants, text corpora and so-called 'introspection'. So far I have
sporadically referred to the use of informants, but have said little about the
other two.
When the linguist consults her own intuitions, this is
often referred to as 'introspection'. This term may be slightly misleading:
there is a difference between consulting one's own intuitions about what is
current in a language community and contemplating one's own emotional reactions
or such-like. In the first case the result of the so-called
'introspection' is in principle controllable, since the object of
interest is not the intuitions themselves, but what they are about: informant
interviews and textual data may show that the intuitions were mistaken (but not,
of course, that the linguist was wrong in assuming that she had them). In the
latter case, however, the result is not controllable.
Still, even if they can be checked with informants, the
linguist's own intuitions hardly qualify as 'data', since they are not public.
Even so, they have an obvious and inevitable place in the method. If the
linguist masters the language under investigation, her own intuitions will
clearly be a source of knowledge and at least a basis for formulating
hypotheses.
The Finnish linguist Esa Itkonen goes even further. As I
read him, he claims that introspection is, in the final analysis, the only source of knowledge about institutional properties of linguistic
expressions, such as their well-formedness and meanings. Itkonen criticises what
he calls the positivistic understanding of science which we find in mentalist,
or Chomskyan, generative grammar. In his view there can be no question of the
grammarian testing hypotheses on the basis of collected data. Language is a
normative, rule-based phenomenon, and the grammarian - as I have already
pointed out - is not interested in all possible utterances, but only in the
correct ones. It is only about correct utterances that the grammarian wants to
generalise. But, Itkonen says, we have a priori
knowledge about what is correct or incorrect; we know with absolute certainty
whether an expression in our language is well-formed or not. Granted, we do not
immediately know what rules correctly summarise the norms, but this knowledge
is something we arrive at by reflecting creatively on what we already know.
Hence the linguist's task is explicative rather than explanatory: the linguist
will explicate linguistic norms of which she has a priori knowledge, and not explain linguistic occurrences as if they were
natural occurrences. According to Itkonen the so-called 'data' is not
independent of the so-called 'theory', since we only take utterances that we already
know are correct into account. Hence 'data' hardly has any place in linguistics
at all as Itkonen sees it. The compilation of text corpora, for instance, he
describes as "idle ceremony".
It is interesting that Itkonen thus recommends - or rather:
presents as the only possibility - precisely the method that is in practice
most used by the linguists he disagrees with most: introspection. Still, to
Itkonen, and perhaps to us, this is not so strange after all. When mentalist
generative grammarians use introspection a lot, in spite of their natural
science rhetoric, this can be taken as another indication of the discrepancy
between practice and rhetoric among them.
Many linguists will find it difficult to reconcile Itkonen's
ideas completely with their own experience, in spite of his convincing
argumentation. In particular, the obvious fallibility of intuitions, both those of
the linguist and those of her informants, indicates that Itkonen may
underestimate the empirical element in linguistic investigations. It is simply
not the case that each and every language user knows it all. Linguistic
knowledge in the individual is partial and perhaps also partly mistaken
compared to the full set of linguistic norms of the community, as we
discern it in the converging and complementary intuitions of a critical mass of
informants. Furthermore, anyone who has used a text corpus in a linguistic
study has had what we in Norwegian call "aha experiences". Such aha
experiences hardly seem compatible with the assumption that we knew it all
before: "Aha! Yes, I knew that," seems a little inconsistent.
Certainly linguistic aha experiences are sufficiently valuable for the linguist
to rebuff Itkonen's claim that the compilation of text corpora is simply an
idle ceremony.
Text corpora are a type of data that amply illustrates the
importance of the distinction between language data and the particular
linguistic facts that can be extricated from such data. It is temptingly easy
to come to regard a text corpus as something very concrete, real and reliable,
almost identical with the language itself, and to forget about the uncertainty
that may reside in the interpretive methods we apply to extricate linguistic
facts from a corpus.
During the past year I have been using a bilingual
Norwegian-English corpus consisting of original texts aligned with their
translations as a source of information about lexical semantics. It is obvious
that this sort of question - how can we delimit and describe the meanings of
words - makes some of the information derivable from the corpus interesting,
and other information uninteresting. In order to illustrate this, let me
mention a few things which would not be of
concern in such a study. We would not be concerned with those aspects of
translation which make it a creative kind of activity, and which presuppose
deep cultural insight and rich knowledge about the topic domains of the texts.
Or rather: these aspects concern us only to the extent necessary in order
to identify them and then disregard them. What we would be interested in
isolating in the data is unimaginative
translation. Unimaginative translation is semantically interesting precisely
because semantics can be seen as the theory of unimaginative language use: the
kind of use (or the aspects of use, rather) that can be accounted for purely on
the basis of literal meanings. The point is that we need a theory of
unimaginative and literal language use as a basis for accounting for any kind
of language use at all with a minimum of precision. Thus we would want to look
for instances of unimaginative translation, under the assumption that it is
unimaginative translation that most closely reflects what we would like to
consider to be the semantic properties of words, phrases and sentences.
So why is not all, or even most, adequate translation
semantically motivated in this sense? The reason is that the translational
relation, as we find it realised in a translational corpus, is not a relation
between abstract linguistic expressions, such as synonymy. Rather, it
is a relation between situated texts. The
translational relation interrelates parole items
rather than langue items: the actual linguistic
expression used is not the only thing which determines what will count as a
useful translation. Relevant also are the context of utterance, the purpose of
the utterance, and various other kinds of background knowledge. We often see
that information which in one text derives from general background knowledge or
contextual factors is given explicit linguistic expression in the other. In
such cases the situated text must be the basis for establishing that a
translational relation obtains: the relation is a relation between situated
texts.
Now, semantic properties are properties of linguistic
expressions seen as types, not only as tokens in texts. In order to use
translations as a source of information about semantics, we therefore need to
extricate the contribution that contextual factors such as these make to the
translational relation from the contribution made by correspondence relations
between words and phrases seen as types. That is, the translational relation we
are interested in isolating is not the one between texts or parole items, but the one between linguistic expressions or 'signs' seen
as types, that is, between langue items as they
occur in grammars or dictionaries. In other words, we unsurprisingly find that
corpus data cannot be used directly in raw form. As usual, linguistic data are
accessible only through an interpretive process, as they are, for instance, in dialogues
with informants; in this case the process consists in "peeling off" two layers
of irrelevant data. In the first place we want to disregard "bad
translations", thereby isolating instances of the genuine textual
translational relation between two languages. In the second place we want to
disregard translational choices that can be motivated only by reference to the
particular text and its circumstances, thereby isolating the linguistically
predictable translations. The linguistically
predictable translations will then be the ones that reflect the translational
correspondence relations between the sign inventories of the two languages -
relations between words and phrases seen as types rather than textual tokens.
This is the relation we are interested in looking at more closely to see what
semantic insights can be gleaned from it. Actual translational corpora will
reflect this relation imperfectly - imperfectly in two important ways: In the
first place any corpus will be severely partial, only containing a very small
part of the total extension of the translational relation. In the second place
it will contain much that does not belong to the relation, in the form of bad
translations and textually bound translational choices.
Now, finally, what is the ontological status of the
relation we would be looking for in this way? What is our justification for
believing in it, and in the concomitant concept of literal meanings associated
with linguistic signs seen as types rather than textual tokens? The picture
painted by this assumption is like the one we find in situation semantics, for
instance, whereby the full interpretation of a text is the joint result of the
meanings of its words and phrases and various factors of context and discourse
situation. In other words, the assumption is that it is meaningful to talk
about the meaning contribution of linguistic expressions seen in isolation -
that is, of literal meanings, or at least literal meaning potentials. This, of
course, is a much-debated issue, some philosophers of language tending to
emphasise that language is only meaningful in contexts of use, and that literal
meanings should be looked at with much skepticism. Personally I don't hold the
view that there is an objective and clear-cut distinction to be discovered
between properties that should be attributed to signs as types (properties such
as 'literal meanings', for instance) and properties that can only be attributed
to sign occurrences in specific contexts. Rather, we find a scale: Studying the
interpretations and possible translations of a sign as we proceed from the
circumstances of a particular text to the circumstances of ever more general
types of texts, it will gradually become more and more natural to see the
interpretations and translational possibilities as predictable from the sign
itself, considered in isolation, rather than as motivated by the text types.
But there will probably be no well-defined borderline between the two types of
cases.
Still, a distinction of this kind is legitimate because
there are clear cases on either side, and because we need to draw the
distinction in order to uphold the obviously necessary assumption that the
linguistic signs chosen in a particular communicative situation also
contribute to what is conveyed. Assuming that everything comes from the context
absurdly implies the claim that language is superfluous - that a situation in
which someone gives a lecture conveys no more to the audience than the same
situation with the single difference that the lecturer shuts up. Still - and
this is my point here - there will inevitably be an element of purpose-driven
construction in drawing the line between semantic properties and other
properties of situated texts. Thus, since semantic properties are properties of
a language, or langue, while properties that
only belong to situated texts are properties of language use, or parole, there is an inevitable element of construction in our isolation of
language itself as an object of study. We are certainly not constructing our
object of study to the point of complete relativism, but it is important to see
the way in which the object itself to some extent is shaped by the questions we
want answered. Perhaps this course can contribute to some insightful
de-construction of our language concepts.
References
Dretske, F.I. 1974: Explanation in linguistics. In: D. Cohen (ed.): Explaining
linguistic phenomena. New York: John Wiley & Sons.
Dyvik, H. 1992: To forelesninger om lingvistikkens vitenskapsteori [Two lectures on the philosophy of science of linguistics]. Department of Linguistics and Phonetics, University of
Bergen: Skriftserie, series B, no. 41.
Dyvik, H. 1995: Språk, språklig kompetanse og lingvistikkens objekt [Language, linguistic competence and the object of linguistics]. In:
Cathrine Fabricius-Hansen and Arnfinn Muruvik Vonen (eds.): Språklig
kompetanse - hva er det, og hvordan kan det beskrives? [Linguistic competence: what is it, and how can it be described?] = Oslo-studier i språkvitenskap 11. Oslo: Novus Forlag, pp.
20-41.
Itkonen, E. 1975: Transformational grammar and the philosophy of science. In:
E.F. Koerner (ed.): The transformational-generative paradigm and modern
linguistic theory. Amsterdam.
Searle, J. 1969: Speech Acts. Cambridge:
Cambridge University Press.