Data, Facts and Concepts of Language
Linguistics is a discipline of many subdisciplines, and the communication among them is frequently less lively than it could have been. More typically linguists from different subdisciplines will entertain themselves with stereotypes of each other: the corpus linguist talks about the generative grammarian who studies language by contemplating his or her own inner linguistic soul while sitting at his or her desk in a comfortable armchair (the corpus linguist himself, as we know, preferring hard and uncomfortable stools while perusing his corpus); the generative grammarian talks about the corpus linguist happily catching stray linguistic butterflies from his texts to mount them in his rather unprincipled collections; the conversation analyst laughs immoderately at the impenetrable formulae of the formal semanticist, which, when you have spent hours deciphering them, turn out to mean that John snores; the formal semanticist derives much amusement from the conversation analyst’s attempt to explain every obscure noise emitted during a conversation while shunning every hint of a precise concept. Etcetera. Of course we should stick to our stereotypes - they are part of what makes linguistics and linguistics conferences so entertaining. But in our more serious moments, of which we should have at least some, we may question their justification, we may suspect that they to some extent block potentially fruitful communication among the subdisciplines, and we may ask what sources they have. In answering the latter question I believe that the concept of linguistic data is central. What constitutes a subdiscipline is the set of questions about language it wants answered. As the sets of questions vary, so do the language concepts themselves, as well as the rational methods for finding answers and the kind of phenomena that count as relevant data. 
It may therefore be of interest to focus on the connection between the concepts of data and the attitudes to data collection on the one hand, and the concepts of language and the linguistic questions on the other, which we find within the various subdisciplines of linguistics. Perhaps this will enable us to see a few distinctions which often go unnoticed, but which may help us in our attempts at inter-subdisciplinary communication. I will start with the concept of ‘data’ itself.
Data is the neuter plural of the past participle of the Latin verb ‘dare’, which means ‘to give’. Etymologically, then, data is what is given. This suggests that data should be something uncontroversial. The interpretation of data may be subject to disagreement, of course, but it should not be a matter of controversy whether a set of data exists or not. Data should in some sense be public - data cannot be private if it is to serve as a basis of intersubjective control of some claim. This, at least, is the ideal. The question, then, is how we more precisely can develop a concept of data that fulfils the requirement of being both given, public and uncontroversial. Should we take data to be the immediate, unanalysed objects and states of affairs which surround us? But unanalysed objects and states of affairs are useless; we need to capture them within some conceptual apparatus, if only in order to describe them. According to one such conceptual apparatus we are surrounded by constellations of elementary particles distributed in space and time. If our questions concern the use of passive in a Norwegian dialect, this conception of data is not very helpful. Even a less exotic alternative, such as a conception of the sounds emerging from speakers as pressure waves in air is of little help in such a project. The grammarian is not very interested in spectrograms. In short, it is evident that our empirical basis cannot be states of affairs that are seen as independent of human perception and concept formation, or even as independent of the specific questions we want answered. Our empirical basis must be some sort of facts, and in the concept of a fact we already have something proposition-like: "It is a fact that p," we say. A fact, in other words, is a bit of reality captured in one of several possible conceptual frameworks, or individuation schemes, as some would put it. 
We could conceive of a fact as "digitalised", or filtered, information about the world: the extraction of a fact necessarily involves loss of information. The capacity to disregard information selectively is a precondition for orientation, or even survival, in the world, and hence also for research. By talking about facts we emphasise the importance of identifying the relevant level of abstraction. It is facts at a particular level of abstraction that are relevant for answering our questions: not facts about particle constellations, nor facts about sound waves, but, for instance, facts about uttered sequences of word forms.
However, we cannot replace the concept of data with the concept of facts. There are many kinds of facts, and many of them are far from uncontroversial, public, or intersubjectively accessible. Even the most general statements of a yet-to-be discovered true theory describe facts, but in that case facts that are very far removed from what we would call data. Hence it must be a special subspecies of facts that we associate with data - perhaps facts that seem immediately accessible intersubjectively, such as the fact that a certain sequence of recognisable Norwegian word forms was uttered in a given situation. Theories about the use of passive are built on facts like this (in conjunction with other facts, for instance facts about the correctness of the sequence). But the circumstance that a certain sequence of word forms was uttered is a very abstract kind of fact, in fact an institutional fact, in John Searle’s terminology. An institutional fact is a fact that presupposes a human institution within which it can exist, in this case in the form of unformulated, constitutive norms for correct linguistic behaviour. The very concept of an utterance is unthinkable without such a surrounding institution. Therefore the fact that a certain word form sequence was uttered cannot be reduced to a set of natural science facts, or brute facts, in Searle’s terminology. Notice what I am not saying now: I am not saying that there can be no ultimate natural science account of human behaviour. What I am saying is that a piece of behaviour once conceived as an utterance cannot be conceptually reduced to natural science facts. Nevertheless, as linguists we treat such institutional facts as data, which shows that there is an inevitable conventional element, an element of consensus, in our delimitation of the kinds of facts which we regard as immediately and intersubjectively accessible. 
There are complex assumptions behind our perception of a sound stream as a certain sequence of word forms, but the point is that we choose not to thematise those assumptions in the context of our passive project; they are not among the assumptions we want to test. It is, anyway, impossible to question all assumptions at once. Therefore we treat the assumptions as unproblematic background knowledge. They are part of our observational theory, and this works fine as long as we are surrounded by colleagues who are not bent on continually challenging them.
This conclusion may give the impression that the question of what data is can only be answered relative to some observational theory. But we should beware of equating data with facts, even when we restrict our attention to facts that seem immediately and uncontroversially accessible. True, data is useful only when we have derived some particular facts from it, and true, it is the type of fact we are interested in that decides what sort of data is interesting. Still, in order to remain close to our original intuition about data as something uncontroversially given, we should try to retain a concept of data as something richer than the facts we derive from it. Data is more concrete than facts. In our data no information is lost yet, as it necessarily is in a fact. It is the tape recording of the informant which is data, and not the abstract word form sequences we have derived from it. The tape recording contains much more information, for instance acoustical information which could have been extracted as a separate set of facts. Since the tape recording was made with grammatical questions in mind, the acoustical facts that could be extracted from it are probably not suited for testing phonetic or phonological hypotheses about the dialect, and in that sense even the data is coloured by our assumptions and interests. But we want to incorporate a minimum of interpretation and sifting of information into the data concept itself - data is to be a starting point for interpretation rather than a result of it. If we accept this, we recognise it as a slightly imprecise, if harmless, simplification when the dialectologist writes: "Here is my data," and then lists a set of transcribed utterances. Strictly speaking the transcribed utterances are a representation of a set of assumed facts, derived from the data by means of an observational theory which the dialectologist regards as uncontroversial.
Let us therefore distinguish between data, the particular facts we extract from it, and the theory or the hypothesis which we test against these particular facts. ‘Particular facts’, then, are what we refer to with what philosophers of science call "observational sentences" or "protocol sentences" - sentences describing individual occurrences which should be explained by our theory, if explanation is our ambition. If we somewhat loosely take an explanation to be any kind of satisfactory answer to a "why"-question, I suppose that most of us will agree that we look for explanations. But with less loose proposals about what it takes to be an explanation we experience well-known problems which necessitate a closer look at the concepts of ‘observational sentence’ and ‘data’ in the different subdisciplines of linguistics.
In the discourse of the philosophy of science observational sentences are something that describes not only particular facts, but particular events, incidents localised in time and space. Particular events, such as the event of a substance attaining a certain temperature in an experimental situation, are the normal objects of classical deductive-nomological explanations. The classical, somewhat simplified, picture is that an event is explained if the observational sentence describing it can be deduced from some general law in combination with a set of initial conditions. But facts are not necessarily events localisable in time and space, and in central disciplines within linguistics there is reason to ask whether the particular facts we want to account for are events or something else. And if they are something else, we may go on to ask in what sense we may then claim to explain them.
The distinction I have tried to draw between data and facts may help us to see this possibility that some of the particular facts of linguistics are not events. The point is that data always arises as the result of some event, whether it is a tape recording, an utterance, a text corpus or an informant reaction. All these things either are or immediately result from incidents that take place at specific locations at determinate times. However, the particular facts which we extract from such data, and which we want to account for, are not necessarily events. Take generative theory, which obviously intends to be explanatory. In an article from 1974 Fred Dretske points out an interesting thing about explanations within this field: "...if we confine our attention to language instead of the actual speech acts that embody a use of language, there is, quite literally, nothing happening." Obviously, speech acts, localised in time and space, are events. But the question is whether the generative grammarian can reasonably have ambitions about explaining speech acts, or whether her explananda are perhaps rather atemporal facts about the expressions in some language. And we may doubt that it is the grammarian’s job to explain why Mary said "There is a cat on the mat" to John on Sunday 2 November. The grammarian’s job, rather, is to explain why we in a certain sense cannot say "There are the cat on the mat" in English, whereas we can say, "There is a cat on the mat". I say "in a certain sense", since in a certain other sense we obviously can say "There are the cat on the mat" - I have just said it twice myself. If, on the other hand, a physicist is right in claiming that an electron ‘cannot’ do x, then an electron simply will not do x, not even as an example of an impossible event.
As linguists we easily see that the modal auxiliary ‘can’ is used with different modalities by the physicist who says that an electron cannot do x, and by the linguist who says that we cannot say "There are the cat on the mat". The physicist’s modality is alethic: it concerns truth in possible worlds. Therefore there is an immediate connection between what an electron can and cannot do and what it in fact does - if your theory explains the one, it also explains the other. The linguist’s modality, on the other hand, seems to be deontic: it concerns certain norms or conventions for correct behaviour which are violated if we act in certain ways. The particular facts which the grammarian wants to explain, then, concern institutional, atemporal properties of linguistic expressions seen in abstraction from concrete utterances of the expressions: the fact that they are well-formed, or synonymous, or the fact that they have a certain meaning. These are properties that are anchored in norms for correct behaviour. But it is not impossible for us to violate the norms and produce ungrammatical utterances, and furthermore, if we do, this does not in itself falsify our hypotheses about the norms. Hence we might claim that a grammatical theory which in this sense explains what people can or cannot say does not therefore explain that people actually say what they do in concrete situations. Since norms do not determine behaviour, actual behaviour is not explained by theories about norms.
This does not mean that we should not also ask about the explanations
of actual linguistic behaviour - the speech acts which constitute an important
part of the grammarian’s data. The question is simply whether this is the
grammarian’s task or not. Some would claim that it is, and disagree with
what I just said. However that may be, it seems important to distinguish
between these two types of facts which to a large extent have been derived
from the same kind of data: facts about institutional properties of linguistic
expressions considered as types, and facts about specific utterance acts
of expression tokens in specific situations. Dretske discusses this distinction
and illustrates it with an analogy which I find helpful. Let us assume
that an anthropologist wants to collect a set of particular facts about
religious beliefs and rituals in a certain society. His data may be documents,
witness reports and artefacts from the society in question. From these
data the anthropologist extracts, by interpretive methods, a set of assumed
facts about beliefs and rituals. He then uses these particular, but atemporal
facts as a basis for developing and testing a general theory about the
origins, functions and consequences of such beliefs and rituals. He wants
his theory to explain a certain type of facts, say, why the rituals survived
a radical change in beliefs. However, his theory has nothing to say about
the reasons why the documents were written, why the witness reports were
made or why the artefacts were produced. In other words, his theory does
not explain his concrete data, it does not explain the events that led
to the existence of his data. But it may also be interesting to look for
explanations of these things; the point is that this would be the task
of a different kind of theory. Cf. fig. 1. Theory 2 will contain hypotheses
about the conditions for writing things down or preserving and transmitting
them in other ways, etc., and it is clearly distinct from Theory 1. Theory
2 tries to explain concrete events in space and time, while Theory 1 has
more atemporal explananda, such as the properties of correctly performed
Fig. 1 Adapted from Dretske 1974
Fig. 2 Adapted from Dretske 1974
Consider then the analogous linguistic diagram in fig. 2. The theories 2, 3 and 4 are to explain why speakers produce the utterances they do in specific situations, and must probably contain psychological hypotheses about the goals, desires and beliefs of human beings, and the connection between such factors and actual behaviour. It seems, then, that these theories take us from linguistics into psychology.
However, some linguists - we may call them the ‘mentalists’ - would draw fig. 2 differently. They don’t want to leave linguistics in order to explain Facts B, but to give causal explanations rather than goal-related explanations of aspects of linguistic behaviour, and of the speaker’s intuitions, by drawing an arrow directly from Theory 1 - the grammar - to Facts B - linguistic and metalinguistic behaviour. This is obviously not easy; as I have already discussed, grammars hardly give complete causal explanations of why people say what they do. Even a perfect theory of English syntax and semantics does not enable us to predict what John is going to say in a given situation. However, one might claim that this is not altogether different from the situation of the physicist, whose theories frequently do not enable him to predict what is going to happen outside the laboratory, because of the sheer complexity of the real world. Still, the analogy is not convincing, since a grammar does not even in an idealised way allow us to predict what is going to be said. Anyway, the mentalist solution is modularisation: the grammar is assumed to be one of several mental modules which jointly determine behaviour. Thus, while for the non-mentalist a grammar specifies the institutional properties of linguistic expressions, for the mentalist it is interpreted realistically as a model of the speaker’s mentally represented capacity to act grammatically. While the non-mentalist sees the grammar as one of several possible ways of calculating the properties of linguistic expressions, the mentalist sees it as a theory of mental realities in virtue of its form. But since this grammatical competence obviously does not determine linguistic behaviour, it is assumed to be just one of several mental modules which together determine actual behaviour, or performance.
If, on the other hand, we stick to fig. 2 the way Dretske draws it, a different picture emerges. Then Theory 1 only explains Facts A, which are institutional facts in John Searle’s sense - facts of the same order as the fact that Parliament passed a law or that John and Mary married or that a football player was offside. The theory explains such institutional facts by accounting for the way in which certain acts satisfy the constitutive norms of the relevant institution. We might compare with a marriage ritual, disregarding the fact that such a ritual normally is written down in its prescribed form, while a grammar is something we must discover. If we want to explain why John married Mary, the ritual is incapable of explaining the event - why they performed this act. It might have something to do with love or similar things, and a marriage ritual is not a theory of love. On the other hand the rules of the ritual can explain why the movements they made, the words they uttered, and the movements and words they were exposed to, counted as a well-formed wedding. Similarly, the theories of the grammarian, the phonologist and the semanticist do not explain why people say what they do, nor do they explain why people think or say that linguistic expressions are wellformed, synonymous or ambiguous. The theories simply explain that the linguistic expressions are wellformed or synonymous or ambiguous, and they explain this by showing how the expressions satisfy the constitutive norms of the language as formulated in the theory, for instance by means of recursive grammatical and semantic rules.
Can these two views of grammar - the mentalist and the non-mentalist - be reconciled? They seem to differ in taking a formal grammar to be a theory of two quite different types of object. However, both types of object appear to be legitimate objects of study, so it may be worthwhile to consider the question where the disagreement, if any, resides. According to the non-mentalist view, language and grammar are irreducibly supra-individual things, residing in a community of communicating individuals in the form of constitutive norms for correct and meaningful linguistic performance. The concept of a norm or a convention cannot be understood or characterised on a purely individual level; therefore this conception of language sees it as something necessarily shared. From the individual perspective, according to this view, language and grammar are external abstract objects of which the individual may have more or less partial knowledge. Characterising this partial knowledge and the way it is structured in the individual is a highly interesting psycholinguistic project, but it is a different project; it is not the grammarian’s project qua grammarian.
According to the mentalist view, on the other hand, the study of grammar belongs to the study of individual psychology. To the extent that grammar and language exist, they exist as knowledge structures in the individual, and the grammarian’s theory is a theory of these knowledge structures. Here we perceive a point of real disagreement between the two views: to the mentalist, language as a social or supra-individual object is an "epi-phenomenon"; it is simply a secondary construction on knowledge structures within individuals. It is only these mental structures that have real existence and hence can be made the objects of scientific study. To the non-mentalist, however, language as a supra-individual entity cannot be reduced to properties of individuals. Obviously there is an intimate connection between language and the knowledge of it even to the non-mentalist: if nobody knows a language L, L cannot exist. But even so, non-mentalists deny that language can be reduced to the knowledge individuals have of it - in a similar (but not identical) way that the Norwegian Constitution cannot be reduced to the knowledge that lawyers happen to have of it, or mathematics to the psychology of mathematicians.
Do we have to stop here at the recognition of two irreconcilable philosophical positions, or is it possible to evaluate their interrelationship further? One interesting path from this point is to consider the attitudes which the proponents of the two views have towards data: how do they see the empirical basis for their respective theories? With two conceptions of language seemingly so totally different we expect significant differences in the kinds of data that would be considered relevant, or at least differences in the kinds of basic facts that the different linguists want to extract from their data. If language is a norm-based, supra-individual object we expect that data in the form of utterances is sifted in such a way that utterances considered incorrect by informants are disregarded. The non-mentalist grammarian sees his theory as a theory of correct language use - not in the prescriptive sense, of course, but as a consequence of the fact that language is a norm-based phenomenon and hence inevitably is constituted by a concept of correctness or wellformedness. Therefore he quite legitimately will only take such linguistic utterances into consideration that are considered well-formed by a critical mass of informants. On the other hand, if language is a property of the individual mind, residing in mental structures not directly accessible to consciousness but operative in governing the linguistic performance of the individual, we expect a different approach to data. In the first place, we then expect a recognition of the fact that the mental structures of individuals speaking the same language might be different: speakers with comparable output might still have different knowledge structures. In the second place, we expect the total linguistic output of the individual to be interesting - in other words, we don’t expect the concept of correctness to play a significant role in the analysis of data. 
And in the third place, we expect psycholinguistic experimentation to be important, in order to anchor the linguistic facts in concepts with a wider psychological applicability.
The interesting observation, of course, is that none of these expectations are necessarily met by mentalist grammarians. As for the point about different structures in different individuals, the strategy among mentalist grammarians has traditionally been to idealise to the "perfect speaker-hearer in a completely homogeneous speech community". Divorced from its seeming reference to an individual - the perfect speaker-hearer - this comes very close to identifying language as a supra-individual entity, a common denominator in a community. As for the point about being interested in the total linguistic output of the individual, mentalist grammarians are very much concerned with sifting data according to grammaticality, i.e., correctness - inescapably a normative concept. And as for the last point about psycholinguistic experimentation, that has never been considered essential by mainstream mentalist grammarians, although interesting experimental work certainly does exist.
A tempting conclusion, then, is that even mentalist grammarians are studying language as a supra-individual entity, and that the individual psychology mostly resides in the rhetoric of the discipline - with possible unfortunate consequences for the interpretation of the results. I believe one unfortunate consequence of the rhetoric of individual psychology is the insufficient recognition of the difference between certain types of linguistic projects. As an example, take the study of second language acquisition. From a period when this was conceived as a study of errors in the output of language learners, the discipline moved to an understanding of the object of study as the interlanguage of the subjects. In other words, the language of the language learner is typically recognised as a language in its own right, governed by rules and amenable to grammatical investigation. Central questions about this interlanguage will then be to what extent it derives its properties from the target language, to what extent from the mother tongue of the subject, and to what extent from universal principles of language acquisition. It is obviously a step in the right direction to try to account for the specific regularities in the output of language learners by trying to see their interlanguage as a rule-governed language in its own right. Still, it is worth reflecting on the language concept we presuppose when we call this a ‘language’, compared to the language studied by the regular grammarian of some dialect. In the case of the language learner it is quite obvious that we are studying language on the individual level. In this case there is no doubt that what we are after is a characterisation of the linguistic competence of individuals. And notice what this means for our treatment of the data: it would be meaningless in this case to discard utterances on the grounds that they are ‘incorrect’ or ‘ill-formed’ according to some standard. 
We are definitely interested in the ‘total output’ of the informant, and there is no concept of correctness or wellformedness around that is relevant to sifting the data before trying to infer the rules. We don’t ask the language learner or some informant whether a certain sentence produced is well-formed or not, and then exclude it from our data if the answer is "no". On the contrary, we expect the language produced to be incorrect. In this case, in other words, our object of study is definitely not constituted by a set of intersubjective norms. Now, this does not mean, of course, that the language learner is not following norms. To some extent he or she is following the norms of the target language, to some extent possibly norms of the mother tongue, and to some extent other factors may be operative. But the point is that it is not a language as defined by a set of norms which is the object of study here, as it is for the regular grammarian. Rather, it is the total competence of an individual, be it norm-governed or not. This makes an enormous difference: it is an entirely different kind of object, requiring a very different method from the method of the regular grammarian, and a very different approach to data collection (even though we may want to use the same kind of formal grammars in both kinds of studies, but then with a very different interpretation). But this difference is obscured by the rhetoric which describes even the regular grammarian’s object of study as belonging to individual psychology. This leads us too easily to use the same terminology in the two kinds of studies, using terms such as ‘language’ and ‘grammar’ to refer both to the supra-individual, norm-based system of a language community and to the developing knowledge structures inside the head of a language learner, as if they were the same kind of object.
The field of second-language acquisition enables us to see this difference particularly clearly, since the output of the second-language learner tends to deviate significantly from the grammatical norms. But the distinction is just as important when we study the output and knowledge structures of people speaking their first language: studying the competence of an individual remains a very different thing from studying the grammatical norms of the community. It seems plausible to claim that it is the way we collect and process our data and reason on the basis of it that decides what we ultimately are studying, rather than the rhetoric with which we like to present our research. Hence, linguists who write grammars to account for the structure of well-formed expressions are inevitably studying a supra-individual, abstract object, and it seems advisable to bring the rhetoric in accordance with that fact. This will benefit the genuine psycholinguistic study of the linguistic capacities of individuals by helping us to avoid confusing the two types of investigation conceptually and methodologically. It will also help us to see that we need some account of the grammatical norm systems of a community before we can approach the question of the way in which knowledge of these systems is structured in individuals. To put it simply: we need a definition of P before the question of the structure of the knowledge of P can even be given content.
We may still want to ask some questions about this supra-individual conception of language. What sort of object is it, what sort of existence can we attribute to it? Is it reasonable to assume that the language or the languages of a community exist out there with well-defined borderlines for us to discover, separating them from each other? Or to what extent is there an element of construction involved in our isolation of a language? And on what basis does the linguist build his putative knowledge of a language conceived in this way?
The linguist roughly utilises three sources of knowledge: informants, text corpora and so-called ‘introspection’. So far I have sporadically referred to the use of informants, but have said little about the other two.
When the linguist consults her own intuitions, this is often referred to as ‘introspection’. This term may be slightly misleading: there is a difference between consulting one’s own intuitions about what is current in a language community and contemplating one’s own emotional reactions or such-like. In the first case the result of the so-called "introspection" is in principle controllable, since the object of interest is not the intuitions themselves, but what they are about: informant interviews and textual data may show that the intuitions were mistaken (but not of course that the linguist was wrong in assuming that she had them). In the latter case, however, the result is not controllable.
Still, even if they can be checked with informants, the linguist’s own intuitions hardly qualify as ‘data’, since they are not public. Even so, they have an obvious and inevitable place in the method. If the linguist masters the language under investigation, her own intuitions will clearly be a source of knowledge and at least a basis for formulating hypotheses.
The Finnish linguist Esa Itkonen goes even further. As I read him, he seems to claim that introspection in the final analysis is the only source of knowledge about institutional properties of linguistic expressions, such as their well-formedness and meanings. Itkonen criticises what he calls the positivistic understanding of science which we find in mentalist, or Chomskyan, generative grammar. In his view there can be no question of the grammarian testing hypotheses on the basis of collected data. Language is a normative, rule-based phenomenon, and the grammarian - as I have already pointed out - is not interested in all possible utterances, but only in the correct ones. It is only about correct utterances that the grammarian wants to generalise. But, Itkonen says, we have a priori knowledge about what is correct or incorrect; we know with absolute certainty whether an expression in our language is well-formed or not. Granted, we do not immediately know what rules correctly summarise the norms, but this knowledge is something we arrive at by reflecting creatively on what we already know. Hence the linguist’s task is explicative rather than explanatory: the linguist will explicate linguistic norms of which she has a priori knowledge, and not explain linguistic occurrences as if they were natural occurrences. According to Itkonen the so-called ‘data’ are not independent of the so-called ‘theory’, since we only take into account utterances that we already know are correct. Hence ‘data’ hardly have any place in linguistics at all as Itkonen sees it. The compilation of text corpora, for instance, he describes as "idle ceremony".
It is interesting that Itkonen thus recommends - or rather: presents as the only possibility - precisely the method that is in practice used the most by the linguists he disagrees most with: introspection. Still, to Itkonen, and perhaps to us, this is not so strange after all. When mentalist generative grammarians use introspection a lot, in spite of their natural science rhetoric, this can be taken as another indication of the discrepancy between practice and rhetoric among them.
Many linguists will find it difficult to reconcile Itkonen’s ideas completely with their own experience, in spite of his convincing argumentation. In particular, the obvious fallibility of intuitions, both those of the linguist and those of her informants, indicates that Itkonen may underestimate the empirical element in linguistic investigations. It is simply not the case that each and every language user knows it all. Linguistic knowledge in the individual is partial and perhaps also partly mistaken compared to the full set of linguistic norms of the community, as we discern it in the converging and complementary intuitions of a critical mass of informants. Furthermore, anyone who has used a text corpus in a linguistic study has had what we in Norwegian call "aha experiences". Such aha experiences hardly seem compatible with the assumption that we knew it all before: "Aha! Yes, I knew that," seems a little inconsistent. Certainly linguistic aha experiences are sufficiently valuable for the linguist to rebuff Itkonen’s claim that the compilation of text corpora is simply an idle ceremony.
Text corpora are a type of data that amply illustrates the importance of the distinction between language data and the particular linguistic facts that can be extricated from such data. It is temptingly easy to come to regard a text corpus as something very concrete, real and reliable, almost identical with the language itself, and to forget about the uncertainty that may reside in the interpretive methods we apply to extricate linguistic facts from a corpus.
During the past year I have been using a bilingual Norwegian-English corpus consisting of original texts aligned with their translations as a source of information about lexical semantics. It is obvious that this sort of question - how can we delimit and describe the meanings of words - makes some of the information derivable from the corpus interesting, and other information uninteresting. In order to illustrate this, let me mention a few things which would not be of concern in such a study. We would not be concerned with those aspects of translation which make it a creative kind of activity, and which presuppose deep cultural insight and rich knowledge about the topic domains of the texts. Or rather: these aspects concern us only to the extent necessary to identify them and then disregard them. What we would be interested in isolating in the data is unimaginative translation. Unimaginative translation is semantically interesting precisely because semantics can be seen as the theory of unimaginative language use: the kind of use (or the aspects of use, rather) that can be accounted for purely on the basis of literal meanings. The point is that we need a theory of unimaginative and literal language use as a basis for accounting for any kind of language use at all with a minimum of precision. Thus we would want to look for instances of unimaginative translation, under the assumption that it is unimaginative translation that most closely reflects what we would like to consider to be the semantic properties of words, phrases and sentences.
So why is not all, or at least most, adequate translation semantically motivated in this sense? The reason is that the translational relation, as we find it realised in a translational corpus, is not a relation between abstract linguistic expressions, like for instance synonymy. Rather, it is a relation between situated texts. The translational relation interrelates parole items rather than langue items: the actual linguistic expression used is not the only thing which determines what will count as a useful translation. Relevant also are the context of utterance, the purpose of the utterance, and various other kinds of background knowledge. We often see that information which in one text derives from general background knowledge or contextual factors is given explicit linguistic expression in the other. In such cases the situated text must be the basis for establishing that a translational relation obtains - the relation is a relation between situated texts.
Now, semantic properties are properties of linguistic expressions seen as types, not only as tokens in texts. In order to use translations as a source of information about semantics, we therefore need to extricate the contribution that contextual factors such as these make to the translational relation from the contribution made by correspondence relations between words and phrases seen as types. That is, the translational relation we are interested in isolating, is not the one between texts or parole items, but the one between linguistic expressions or ‘signs’ seen as types, that is, between langue items as they occur in grammars or dictionaries. In other words, we unsurprisingly find that corpus data cannot be directly used in raw form. Linguistic data are as usual accessible only through an interpretive process, for instance through dialogues with informants, in this case consisting in "peeling off" two layers of irrelevant data. In the first place we want to disregard "bad translations", thereby isolating instances of the genuine textual translational relation between two languages. In the second place we want to disregard translational choices that can be motivated only by reference to the particular text and its circumstances, thereby isolating the linguistically predictable translations. The linguistically predictable translations will then be the ones that reflect the translational correspondence relations between the sign inventories of the two languages - relations between words and phrases seen as types rather than textual tokens. This is the relation we are interested in looking at more closely to see what semantic insights can be gleaned from it. Actual translational corpora will reflect this relation imperfectly - imperfectly in two important ways: In the first place any corpus will be severely partial, only containing a very small part of the total extension of the translational relation. In the second place it will contain much that does not belong to the relation, in the form of bad translations and textually bound translational choices.
Now, finally, what is the ontological status of the relation we would be looking for in this way? What is our justification for believing in it, and in the concomitant concept of literal meanings associated with linguistic signs seen as types rather than textual tokens? The picture painted by this assumption is like the one we find in situation semantics, for instance, whereby the full interpretation of a text is the joint result of the meanings of its words and phrases and various factors of context and discourse situation. In other words, the assumption is that it is meaningful to talk about the meaning contribution of linguistic expressions seen in isolation - that is, of literal meanings, or at least literal meaning potentials. This, of course, is a much-debated issue, some philosophers of language tending to emphasise that language is only meaningful in contexts of use, and that literal meanings should be looked at with much skepticism. Personally I don’t hold the view that there is an objective and clear-cut distinction to be discovered between properties that should be attributed to signs as types (properties such as ‘literal meanings’, for instance) and properties that can only be attributed to sign occurrences in specific contexts. Rather, we find a scale: studying the interpretations and possible translations of a sign as we proceed from the circumstances of a particular text to the circumstances of ever more general types of texts, it will gradually become more and more natural to see the interpretations and translational possibilities as predictable from the sign itself, considered in isolation, rather than as motivated by the text types. But there will probably be no well-defined borderline between the two types of cases.
Still, a distinction of this kind is legitimate because there are clear cases on either side, and because we need to draw the distinction in order to uphold the obviously necessary assumption that the linguistic signs chosen in a particular communicative situation also contribute to what is conveyed. Assuming that everything comes from the context absurdly implies the claim that language is superfluous - that a situation in which someone gives a lecture conveys no more to the audience than the same situation with the single difference that the lecturer shuts up. Still - and this is my point here - there will inevitably be an element of purpose-driven construction in drawing the line between semantic properties and other properties of situated texts. Thus, since semantic properties are properties of a language, or langue, while properties that only belong to situated texts are properties of language use, or parole, there is an inevitable element of construction in our isolation of language itself as an object of study. We are certainly not constructing our object of study to the point of complete relativism, but it is important to see the way in which the object itself to some extent is shaped by the questions we want answered. Perhaps this course can contribute to some insightful de-construction of our language concepts.
Dretske, F.I. 1974: Explanation in linguistics, in: Cohen, D. 1974 (ed.): linguistic phenomena. New York: John Wiley & Sons.
Dyvik, H. 1992: To forelesninger om lingvistikkens vitenskapsteori. Department of Linguistics and Phonetics, University of Bergen: Skriftserie, serie B, nr. 41.
Dyvik, H. 1995: Språk, språklig kompetanse og lingvistikkens objekt. In: Cathrine Fabricius-Hansen and Arnfinn Muruvik Vonen (eds.): Språklig kompetanse - hva er det, og hvordan kan det beskrives? = Oslo-studier i språkvitenskap 11. (1995) Oslo: Novus Forlag, pp. 20-41.
Itkonen, E. 1975: Transformational grammar and the philosophy of science, in: E.F. Koerner (ed.): The transformational-generative paradigm and modern linguistic theory. Amsterdam 1975.
Searle, J. 1969: Speech Acts. Cambridge: Cambridge University Press.