Exploiting Structural Similarities in Machine Translation¹

Helge Dyvik

Department of Linguistics and Phonetics

University of Bergen
 
 

Abstract

The central properties of an experimental system for machine translation, PONS, and the ideas behind them, are presented and motivated. PONS achieves a compromise between linguistic sophistication and efficiency by automatically exploiting structural similarities between source and target language in order to take ‘shortcuts’ during the translation process. The system uses a PATR-type linguistic formalism to encode LFG-type grammatical descriptions and Situation Semantics-type semantic descriptions, and it is implemented in Medley Interlisp.
 
 
 
 
 
 

1. Introduction: Linguistic Sophistication vs. Efficiency
 
 

Linguistic sophistication and efficiency are traditionally regarded as adversaries in machine translation systems. If we can establish simple pointers between corresponding expressions in source and target language, or get from the one to the other by means of a few simple constituent order adjustments or other ad hoc operations, we do not want to waste time finding a lot of redundant grammatical and semantic information about the expressions. This has led some system developers to dispense with complex grammars and resort to ad hoc measures of the indicated kind. On the other hand, only linguistically motivated descriptions allow us to structure the information about a language in such a way that it can be flexibly related to alternative translation partners, and only such descriptions allow us to keep their consequences under control. Thus we seem to have conflicting purposes.

A working hypothesis underlying the work reported here is that this conflict between linguistic sophistication and efficiency is not a necessary one. The foundation for this optimism is the efficiency of human translation, which indicates that we may achieve efficiency by studying the structure of the problem domain itself — the translational relation between languages — and exploiting that in the development of systems. In particular it seems worthwhile to ask what circumstances make the translator’s job easy, and what circumstances make it difficult. One obvious fact is that it is easier to translate between closely related languages than between more distantly related languages. The linguistic scene in Scandinavia, with its closely related standard languages, provides ample illustration of the fact that it is too simple to conceive of translation as the same kind of task irrespective of the kind of language pair that is involved. Obviously the human translator will maximally exploit her bilingual competence, her knowledge of the structural relationship between the languages. For instance, she will use as much of the structure of a source sentence as she can within the limits imposed by idiomaticity. She will not take the trouble to ponder on the semantic and stylistic properties of a sentence in total abstraction from its form and then try to encode these properties from scratch in the target language, with no regard for the way it was encoded in a related source language.

One might claim that it is not simply considerations of pardonable laziness, or, in the case of MT systems, efficiency, which motivate shortcuts of this kind. In addition, exploiting one’s knowledge of structural similarity between two languages is an aspect of capturing the translational relation between them as closely as possible. This is because the translational relation also involves equivalence with respect to the source text’s way of using language. The more closely related a pair of languages is, the greater is probably the overlap between their inventories of linguistic ‘devices’, i.e., the easier it is to express a certain denotational content in the same way. In closely related languages similar effects can be achieved with similar means. Hence aiming at a structurally similar translation may contribute to the achievement of a certain aspect of the translational relation.
 
 
 
 

2. The PONS System
 
 

The PONS System incorporates these ideas in a simple form. ‘PONS’ is an acronym for ‘Partiell oversettelse mellom nærstående språk’ (‘Partial Translation between Closely Related Languages’). The system has two levels of intended users: grammar writers who supply the system with descriptions of languages, and people who use the resulting system to translate texts between the languages described. Hence the system is a development environment for MT systems rather than an MT system itself. The system has mostly been tested on translation of sentence sets and simple texts (e.g., fairy tales) between the closely related languages Norwegian and Swedish, and between the more distantly related English and Norwegian. The linguistic descriptions have comprised small, ad hoc lexicons and medium-sized, less ad hoc grammars of 60-70 fairly complex rules.
 
 

2.1 The Linguistic Descriptions

The linguistic descriptions are developed in a modified and extended version of Lauri Karttunen’s D-PATR, a development environment for unification-based grammars. The descriptions consist basically of a lexicon and a set of syntactic rules.

There is no morphological analysis: the lexicon consists of a list of entries for all word forms and a list of stem entries, or ‘lexemes’. Each entry is basically a set of equations defining a feature structure. There is also a template list exempting the grammar writer from entering recurring sets of equations many times. The lexical entries are compiled to re-entrant, potentially cyclic directed graphs representing feature structures, and are available to the unification-based parsing, translation and generation procedures in that form. Fig. 1 shows in a simplified form the central part of the structure representing the word form spiser ‘eats’.
 
 


 
 

Fig. 1 Lexical entry as feature structure: spiser ‘eats’

Fig. 1 illustrates some central features of the linguistic descriptions. The structure can be seen as an underspecified version of the structure representing any sentence with spiser as its main verb form. It has a syntax substructure roughly corresponding to an LFG-type f-structure, and a trans substructure representing the semantics of the expression in the form of a situation schema (cf. Fenstad et al. (1987) for the concept of situation schemata and their derivation by co-description in unification-based grammars). Thus, fig. 1 informs us that spiser expresses a relation eat’ of two arguments, and that the location of the eating temporally overlaps the discourse-location. The latter piece of information is expressed by the deictic present tense. In situation-theoretic terms it is represented as a value of loc, i.e., the location of the described situation: an indeterminate, or parameter, (ind) of type LOCATION is constrained to enter into the relation temp-overlap with the discourse-location. Furthermore the structure represents the linking information carried by the form spiser: arg1 is linked to the function subject and arg2 to object by the unification of the semantics of the respective functions with the corresponding argument positions.
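The linking just described can be pictured with ordinary Python dictionaries, where re-entrancy is modelled by letting two paths lead to the very same object. This is only an illustrative sketch: the attribute names (`subj`, `obj`, `rel`, `cond`) and the overall layout are assumptions loosely following the description of fig. 1, not the actual PONS encoding.

```python
# Hypothetical, simplified rendering of the entry for "spiser" ('eats').
# Shared (re-entrant) substructures are modelled as shared Python objects.

subj_sem = {}                    # semantics of the subject, still unspecified
obj_sem = {}                     # semantics of the object, still unspecified
loc_ind = {"type": "LOCATION"}   # indeterminate for the described location

spiser = {
    "syntax": {
        "cat": "V",
        "tense": "present",
        "subj": {"trans": subj_sem},
        "obj": {"trans": obj_sem},
    },
    "trans": {
        "rel": "eat'",
        "arg1": subj_sem,        # linking: arg1 is the subject's semantics
        "arg2": obj_sem,         # linking: arg2 is the object's semantics
        "loc": {
            "ind": loc_ind,
            "cond": {"rel": "temp-overlap", "arg1": loc_ind,
                     "arg2": "discourse-location"},
        },
    },
}

# Filling in the subject's semantics is immediately visible at arg1,
# because both paths lead to the same object.
spiser["syntax"]["subj"]["trans"]["rel"] = "named-John'"
```

Because `arg1` and the subject's `trans` value are one object, information added by a later unification at either path shows up at both, which is what the linking information is meant to guarantee.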

The syntactic rules are context-free phrase structure rules annotated with equations defining feature structures. The rules, too, are compiled to directed graphs representing the feature structures; fig. 2 shows a simple example.

Fig. 2 Syntactic rule as feature structure

Fig. 2 represents the rule S -> NP VP with a feature structure showing that the structures of the S mother and the VP daughter are unified, and furthermore that the structure of the NP daughter is the value of the path <syntax subj>. The rule formalism also allows optional constituents, Kleene operators (* and +), wholly or partly unspecified constituent order (‘ID-rules’, in GPSG terms), the equivalent of local head movement (to cope with verb-second phenomena, where finite verbs occur non-adjacent to their complements), and unbounded filler-gap dependencies.
 
 

2.2 Parsing

The source text is divided into substrings at certain punctuation marks, and the strings are parsed by a bottom-up, unification-based active chart parser. It is obviously unrealistic to aim for full-coverage grammars able to assign a representation to any string occurring between two punctuation marks. In general, therefore, the result of the bottom-up parse may be a set of partial analyses, as indicated in fig. 3.

Fig. 3 Partial analysis resulting from bottom-up parse

Each edge in the chart in fig. 3 represents an analysis of the substring it spans. The system chooses the edge sequence(s) containing the lowest number of edges — i.e., the maximal analyses of the string — and translates each edge content separately, concatenating the results. Obviously the quality of the translation improves with increased grammar coverage, but as long as the words are known some output is guaranteed.
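Choosing the edge sequence with the lowest number of edges amounts to a shortest-path search over string positions. The following sketch shows one way this could be done; the edge representation `(start, end, label)`, the function name and the toy chart are assumptions made for illustration, and the function returns only one minimal sequence even where several exist.

```python
from collections import deque

def fewest_edges(edges, n):
    """Return one shortest sequence of edges spanning positions 0..n.

    edges: iterable of (start, end, label) with start < end.
    Returns None if no sequence of edges covers the whole string.
    """
    by_start = {}
    for edge in edges:
        by_start.setdefault(edge[0], []).append(edge)

    # Breadth-first search over string positions: the first time a
    # position is reached, it is reached with the fewest possible edges.
    previous = {0: None}          # position -> edge used to reach it
    queue = deque([0])
    while queue:
        pos = queue.popleft()
        if pos == n:
            break
        for edge in by_start.get(pos, []):
            end = edge[1]
            if end not in previous:
                previous[end] = edge
                queue.append(end)

    if n not in previous:
        return None
    path = []
    pos = n
    while previous[pos] is not None:
        edge = previous[pos]
        path.append(edge)
        pos = edge[0]
    return path[::-1]

# Toy chart for a five-word string that received no spanning analysis.
chart = [
    (0, 1, "Det"), (1, 2, "N"), (2, 3, "V"), (3, 4, "Det"), (4, 5, "N"),
    (0, 2, "NP"), (3, 5, "NP"), (2, 5, "VP"),
]
best = fewest_edges(chart, 5)
```

As long as every word has a lexical edge, some covering sequence always exists, which corresponds to the guarantee that known words always yield some output.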
 
 

Fig. 4 Result of a parse: PS-tree with associated features

The analysis contained in an inactive edge consists of a phrase structure tree with an associated feature structure, as exemplified in fig. 4, which shows a simplified version of the analysis assigned to the sentence John sees Mary. Each node in the PS-tree is associated with a substructure of the full feature structure of the sentence, which is associated with the top node. We find the same bipartite structure as in fig. 1, with the value of trans, a situation schema, serving as the semantic representation of the sentence. The leftmost layer of the schema represents grammaticalized information about the discourse situation: the sentence is declarative, which means that the speaker informs the hearer at the discourse location about a described situation, which in its turn is represented as arg3 of the speech act.

In the most elaborate mode of operation of the system, when no structural similarity between the languages can be exploited (see 2.3, 2.4 below), the situation schema is extracted from the source feature structure and used as an interlingua expression from which target language expressions are generated. Divorced from its syntax partner the situation schema contains a minimum of source language specific grammatical information — thus, for instance, the argument linking to specific syntactic functions is "forgotten".
 
 

2.3 Grammar Comparison

A central property of the PONS system is that it is able to exploit structural similarities between the source and the target language. When the system ‘knows’ — and we will see how it can know this — that the target language allows the expression of the same content by means of the same or a very similar syntactic structure, it takes a shortcut past the complicated semantically based translation procedure and generates the translation directly from the syntactic structure of the source string. To achieve this the system operates in three alternative modes. Modes 1 and 2 are the shortcut modes: They exploit types of structural similarity between source and target language in a transfer procedure. Only in cases when no similarity can be exploited is the situation schema utilized to generate translations "from scratch", using no information about the source sentence structure. This is mode 3. The choice of appropriate mode is made by the system itself on the basis of information about the relationship between the languages. This language pair specific information is also derived by the system itself by means of a grammar comparison, which happens once and for all before translation starts. Information about the result of the comparison is then added to the respective grammars. A grammar can contain information about any number of translation partners.

During the grammar comparison the lexicons and rule sets are compared item by item. Fig. 5 illustrates the criteria taken into account during comparison of two lexical entries.

Fig. 5 Automatic lexicon comparison

For two lexical entries to correspond they must share category, sense and linking information (if any). If they do, each is categorized as ‘mode 1’ with respect to the other language, and a pointer to (the index of) its partner(s) in the other lexicon is added; cf. the pair Eng. ‘frighten’ : Norw. ‘skremme’ in fig. 5. The members of a pair like Eng. ‘please’ : Norw. ‘like’, on the other hand, do not share linking properties and are therefore not interrelated: they count as ‘mode 3’ with respect to each other, in spite of shared sense and category. The consequences of this will be explained in 2.4 below.
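The three criteria can be sketched as a simple comparison of stem entries. The `Lexeme` record, the sense labels and the encoding of linking as argument/function pairs are hypothetical simplifications; the real comparison operates on compiled feature structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lexeme:
    stem: str
    category: str
    sense: str
    linking: tuple   # e.g. (("arg1", "subj"), ("arg2", "obj")); () if none

def compare_lexicons(source, target):
    """Map each source stem to (mode, list of target partner stems)."""
    result = {}
    for s in source:
        partners = [t.stem for t in target
                    if (t.category, t.sense, t.linking)
                    == (s.category, s.sense, s.linking)]
        # Mode 1 with pointers if a partner exists, otherwise mode 3.
        result[s.stem] = (1, partners) if partners else (3, [])
    return result

SUBJ_FIRST = (("arg1", "subj"), ("arg2", "obj"))
OBJ_FIRST = (("arg1", "obj"), ("arg2", "subj"))

english = [
    Lexeme("frighten", "V", "frighten'", SUBJ_FIRST),
    Lexeme("please", "V", "please'", SUBJ_FIRST),
]
norwegian = [
    Lexeme("skremme", "V", "frighten'", SUBJ_FIRST),
    Lexeme("like", "V", "please'", OBJ_FIRST),   # same sense, reversed linking
]
modes = compare_lexicons(english, norwegian)
```

Here ‘frighten’ receives mode 1 with a pointer to ‘skremme’, while ‘please’ receives mode 3 because only the linking differs from that of ‘like’.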

Fig. 6 illustrates aspects of the comparison of syntactic rules.

Fig. 6 Automatic rule comparison

If a source and a target rule have the same mother and daughter categories, and if the daughters occur in the same order and the daughter feature structures end up in the same places in the mother feature structures, then the rules correspond in mode 1. In that case a ‘mode 1’ tag is added, but no pointer to the target rule is needed: the point is that when we know that the target grammar has a rule that similar, we do not need to consult it, but can use the source string structure. This is the case with the S -> NP VP rules in fig. 6. On the other hand, if two rules differ in the order of the daughters, or in the presence or absence of minor syntactic categories, while sharing the other properties, the correspondence is classified as ‘mode 2’, and a pointer to the rule partner is added. This is illustrated by the NP -> POSS N’ : NP -> N’ POSS rules in fig. 6, analyzing phrases like Eng. my dog : Norw. hunden min ‘the-dog my’. When a source string contains a phrase like this, we need quick access to the target rule to build a correct subtree, but the rest of the source analysis (the complete feature structure, and the daughter nodes) can be used as they are. The net effect is a quick constituent order adjustment without the unprincipled ad hoc quality that such adjustments usually have, since it is triggered by information derived from a grammar comparison. If there is no target rule corresponding in either mode 1 or mode 2, the source rule is marked ‘mode 3’ with respect to the target language.
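A minimal sketch of the three-way rule classification follows, assuming that a rule is reduced to its mother category and a sequence of (daughter category, path in the mother structure) pairs. The `Rule` record, the path values and the treatment of minor categories through a `minor` set are illustrative assumptions, and further properties such as added features are ignored.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    mother: str
    daughters: tuple  # ((category, path-in-mother), ...) in surface order

def compare_rule(src, target_rules, minor=frozenset()):
    """Return (mode, partner); a partner rule is returned only in mode 2."""
    def major(rule):
        # Multiset of daughters, disregarding minor categories and order.
        return Counter(d for d in rule.daughters if d[0] not in minor)

    if any(t == src for t in target_rules):
        return 1, None            # identical rule: no pointer needed
    for t in target_rules:
        if t.mother == src.mother and major(t) == major(src):
            return 2, t           # same content, different arrangement
    return 3, None

s_rule = Rule("S", (("NP", ("syntax", "subj")), ("VP", ())))
np_en = Rule("NP", (("POSS", ("syntax", "poss")), ("N'", ())))
np_no = Rule("NP", (("N'", ()), ("POSS", ("syntax", "poss"))))
odd_en = Rule("X", (("Y", ()), ("Z", ("syntax", "z"))))  # made-up rule

norwegian_rules = [s_rule, np_no]
```

With these toy grammars, `compare_rule(s_rule, norwegian_rules)` yields mode 1 without a pointer, `compare_rule(np_en, norwegian_rules)` yields mode 2 together with the Norwegian rule, and the made-up rule is mode 3.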
 
 

2.4 Translation Procedures

Fig. 7 gives an overview of the translation procedures. After parsing the source string the mode of each derived tree is determined. A tree node has the mode of the rule used to expand it, and a tree as such has the highest mode of any of its nodes. That is, for a tree to be mode 1, all its nodes and lexical entries must be mode 1.
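Determining the mode of a tree is then a matter of taking a maximum over its nodes. The sketch below assumes a hypothetical tree representation in which every node, including terminal nodes carrying the mode of their lexical entry, is a dictionary with a `mode` key.

```python
def tree_mode(node):
    """Highest mode (1, 2 or 3) found at any node of the tree."""
    child_modes = [tree_mode(c) for c in node.get("children", [])]
    return max([node["mode"]] + child_modes)

# Toy analysis of "Wagner frightens a dog" with every node in mode 1.
tree = {"cat": "S", "mode": 1, "children": [
    {"cat": "NP", "mode": 1, "children": [
        {"cat": "N", "mode": 1, "word": "Wagner"}]},
    {"cat": "VP", "mode": 1, "children": [
        {"cat": "V", "mode": 1, "word": "frightens"},
        {"cat": "NP", "mode": 1, "children": [
            {"cat": "Det", "mode": 1, "word": "a"},
            {"cat": "N", "mode": 1, "word": "dog"}]}]},
]}
```

A single mode 3 node or lexical entry anywhere in the tree is enough to make the whole tree mode 3.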

Fig. 7 Overview of the translation procedures

The mode of the tree determines the further course of events. Fig. 8 illustrates a mode 1 situation, the result of parsing the sentence Wagner frightens a dog, which can be rendered in Norwegian as Wagner skremmer en hund.

Fig. 8 Mode 1 parse with target pointers

All nodes in fig. 8 are mode 1, and this means that an isomorphic tree with a closely corresponding feature structure could be built from the target grammar, too. This, in its turn, means that we do not have to build it: the procedure can forget about the target grammar and use the source structure already built. In other words, it is possible to perform a kind of word-by-word translation by substituting target words for source words at the terminal nodes. More precisely, the target stem structures pointed to from the terminal nodes are written over the structures at the terminal nodes (see fig. 9). The operation ‘overwrite’ is used rather than ‘unify’ since we want the target stem structures to ‘win’ in cases of feature clashes, e.g., when a target noun stem has a different gender from the source stem. The final stage is finding the correct target word forms, which will be the forms whose structures are simultaneously unifiable with the structures now present at the terminal nodes (see fig. 10). Since agreeing tree nodes partially share feature structures, agreement comes out correctly even in cases where gender features, for instance, have been altered by overwriting, as explained above.
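The difference between the two operations can be made concrete on nested dictionaries. This is a deliberately reduced sketch: both functions are non-destructive and ignore re-entrancy, so the sharing between agreeing nodes mentioned above is not modelled, and the feature names and values are hypothetical.

```python
class Clash(Exception):
    """Raised when two atomic feature values are incompatible."""

def unify(a, b):
    """Return the unification of two feature structures (nested dicts)."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            out[key] = unify(out[key], value) if key in out else value
        return out
    if a == b:
        return a
    raise Clash(f"{a!r} vs {b!r}")

def overwrite(base, winner):
    """Like unify, but values from `winner` replace clashing values."""
    if isinstance(base, dict) and isinstance(winner, dict):
        out = dict(base)
        for key, value in winner.items():
            out[key] = overwrite(out[key], value) if key in out else value
        return out
    return winner

# Structure at a terminal node after parsing, and the target stem entry.
source_node = {"cat": "N", "stem": "src-stem",
               "agr": {"gender": "neut", "number": "sg"}}
target_stem = {"cat": "N", "stem": "tgt-stem", "agr": {"gender": "masc"}}

result = overwrite(source_node, target_stem)
```

Unifying the same two structures would fail on both `stem` and `gender`, whereas overwriting keeps the number information contributed by the source parse and lets the target stem decide the rest.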

Fig. 9 Mode 1: overwrite target stem structures at terminal nodes

Fig. 11 shows a mode 2 situation. One of the nodes in the parse tree is mode 2 and contains a pointer to a target rule with a different constituent order, reflecting the fact that possessives occur on different sides of the head noun in the two languages. The procedure builds a subtree from the target rule pointed to and splices it in at the appropriate position in the source tree. The target rule feature structure is overwritten over the structure at the source tree node. This is shown in fig. 12. The effect is that of a quick switch of constituent order, and in the example also the addition of the feature definite to the N daughter. After this the procedure continues as in mode 1, with stem overwriting and search for compatible word forms.
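The constituent order switch can be sketched as rebuilding one node with its daughters arranged as the target rule requires. The tree representation and function names below are assumptions, the sketch requires each daughter category to occur only once under the node, and the overwriting of the rule's feature structure is left out.

```python
def splice(node, target_daughters):
    """Rebuild `node` with its daughters in the order of the target rule.

    node: {"cat": str, "children": [subtrees]}
    target_daughters: daughter categories of the target rule, in order.
    Assumes each category occurs at most once among the daughters.
    """
    by_cat = {child["cat"]: child for child in node["children"]}
    return {**node, "children": [by_cat[cat] for cat in target_daughters]}

def leaves(node):
    """Terminal words of a tree, left to right."""
    if "word" in node:
        return [node["word"]]
    return [w for child in node.get("children", []) for w in leaves(child)]

# English "my dog", analysed by NP -> POSS N'.
np = {"cat": "NP", "children": [
    {"cat": "POSS", "word": "my"},
    {"cat": "N'", "word": "dog"},
]}

# The Norwegian partner rule is NP -> N' POSS.
reordered = splice(np, ["N'", "POSS"])
```

After the splice the terminals read dog my, and the ordinary mode 1 steps of stem overwriting and word form search would then produce hunden min.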

Fig. 10 Insert compatible word forms at terminal nodes

A mode 3 parse is a parse where at least one node or lexical entry is mode 3, i.e., lacks a structurally close partner in the target language. Thus, if we derive a tree by means of at least one rule without a mode 1 or 2 partner in the target language, or, say, substitute the verb form pleases for frightens in the example, the tree will be mode 3, in the latter case since ‘please’ does not point to Norw. ‘like’ because of the different argument linkings. (The Norwegian translation of Wagner pleases my dog would be Hunden min liker Wagner, with subject and object reversed.) In such a case we have to discard both the tree and the syntactic features, and only hold on to the semantic features: the situation schema. The procedure extracts the situation schema from the feature structure and uses it as an interlingua expression to generate target strings from it. This happens in three stages; cf. fig. 7:

I Target stem entries expressing the relations in the situation schema (see’, frighten’, please’, dog’, named-Mary’, etc.) are found. Appropriate parts of these entries are unified with appropriate parts of the situation schema. The result is that target language specific grammatical features are added to the interlingua structure. Since there may be multiple solutions, the output of this stage is a set of full feature structures containing both syntactic and semantic information, e.g., target language specific argument linking information.

II Target grammar rules and stem entries are predicted top-down to the extent that they are compatible with at least one of the initial feature structures. The output of this stage is a set of trees with stem entries at the terminal nodes, in other words the same kind of structure as the one we have after overwriting target stems over terminal nodes in modes 1 and 2.

III Target word forms whose structures are simultaneously compatible with the terminal tree nodes are found; this is the same procedure as the final procedure in modes 1 and 2. The final output, then, is strings of target word forms.

Fig. 11 Mode 2 parse with target pointers

Fig. 12 Mode 2: splice in target subtrees

The output of mode 3 translation, then, is (with an important qualification) the set of target strings to which the target grammar assigns a situation schema which is compatible with the input schema. The reason why this must be qualified is that the set of strings with a compatible situation schema will in general be infinite, which would give us a non-terminating procedure. It is generally possible to continue adding new constituents to a construction (adjectives, relative clauses, adverbials, etc.), and the resulting augmented situation schema will always be compatible with the input schema, since unification is a purely additive operation. To avoid this the procedure is constrained to find only those strings that do not express any additional relations by lexical means. That is, we never get extra words whose sense is not licensed, so to speak, by the information present in the input situation schema. This is achieved by limiting the set of sense-carrying lexical entries to the set found during stage I of the generation procedure. Thus, the Norwegian NP bilen min will yield English my car, but not my blue car, my new car, my car that I bought yesterday, etc., which all have compatible situation schemata.

This does not mean, however, that the target strings never express relations that do not occur in the input schema. If such relations are expressed by grammatical means (morphological categories or syntactic rules) they may be added to the target schema. We need this degree of flexibility to cope with the fact that languages differ as to what content they grammaticalize. Consider, for instance, translation from a language without tense into English, say, the Chinese sentence Wo qu ‘I go’, ‘I went’. The situation schema resulting from parsing this sentence would contain no relation specifying precedence or overlap with respect to the discourse location, whereas this information is obligatorily present in English sentences. The unification-based translation procedure would lead to the generation of the set of English strings that have a compatible situation schema, that is, both present and past tense sentences. In this case, then, relations would be added to the situation schema, since the relations in this case are expressed by grammatical rather than lexical means in the target language.
 
 
 
 

3. PONS and the Translational Relation
 
 

The alternative modes of operation make PONS at the same time a transfer system and an interlingua system. In modes 1 and 2 translation follows a kind of transfer procedure: the syntactic structure derived by parsing the source string is modified by overwriting it with information from the target grammar, whereupon target strings are generated from it. In mode 3 translation goes via an interlingua expression in the form of a situation schema.

The interlingua method is often regarded with skepticism. One reason is that it may seem to make the translational relation too rigid. If translation presupposes that the source and the target expressions are mapped to identical interlingua expressions, then the translational relation calculated must be an equivalence relation; that is, it must be both symmetric and transitive. All expressions mapped to the same interlingua expression would belong to a translational equivalence class. But the empirical translational relation is not an equivalence relation, given plausible assumptions. If we define a translational relation T as the relation holding between a set of source language expressions A and the set of target language expressions B such that each expression in A can be felicitously translated to every expression in B, and only to them, in some context, then it is easily seen that T is neither symmetric nor transitive. To take one simple example: assume Norwegian as source and Swedish as target, and the sets A and B interrelated by T:

(1)

A = {brevene som har blitt skrevet, brevene som er blitt skrevet,...}

B = {breven som har skrivits, breven som skrivits,...}

All the expressions mean roughly ‘the letters that have been written’. But in Swedish the perfect auxiliary har can be omitted in subordinate clauses, as in breven som skrivits. Thus, this expression does not express tense and therefore corresponds to past tense as well as present tense expressions in Norwegian (‘had been written’ as well as ‘have been written’). Hence the sets A and B would no longer be interrelated by T if we switched source and target language. Similar examples can easily be adduced to show that transitivity fails, the point always being that languages differ in what distinctions they may express or leave unexpressed, so that information may be lost or added in the translation process. Even the translational relation holding between individual expressions (rather than the relation T between sets of expressions) may plausibly be assumed to be non-symmetric, e.g., in cases where the source language expresses by lexicalization or by an idiom something that the target language can only express compositionally. An example would be Norwegian Det var skareføre translated into English as There was a frozen crust on the snow creating special conditions for skiers. Switching source and target here arguably does not take us back to the idiom or single lexeme as the optimal translation, since the compositional quality of the English expression can be recreated in Norwegian. In short, an interlingua system imposing symmetry and transitivity on the translational relation would be formally inadequate.

However, the unification-based translation procedure using situation schemata as interlingua expressions does not have this weakness. This is because the source and the target expressions are not required to be associated with identical situation schemata, only with compatible situation schemata. The operation of unification tests for compatibility of structures, not for identity. As a consequence the non-symmetry of the T relation is easily captured. Examples such as (1) above, for instance, cause no problem: brevene som er blitt skrevet would yield both breven som har skrivits and breven som skrivits, since both Swedish expressions will have a compatible situation schema, while breven som skrivits in its turn, when translated back into Norwegian, would yield both present and past tense translations, for the analogous reason.
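The effect of testing for compatibility rather than identity can be illustrated with heavily simplified schemata. Everything in this sketch is an assumption made for illustration: the schemata are plain nested dictionaries without re-entrancy, the relation name `temp-precede` is invented, the past tense Norwegian string is added as an example, and translation is reduced to a lookup among a few listed strings.

```python
def compatible(a, b):
    """True if two feature structures (nested dicts) can be unified."""
    if isinstance(a, dict) and isinstance(b, dict):
        return all(compatible(a[k], b[k]) for k in a.keys() & b.keys())
    return a == b

# Hypothetical situation schemata for 'the letters that ... been written'.
BASE = {"rel": "write'", "arg2": "letters'"}
PRESENT = {**BASE, "loc": {"rel": "temp-overlap"}}
PAST = {**BASE, "loc": {"rel": "temp-precede"}}

NORWEGIAN = {
    "brevene som er blitt skrevet": PRESENT,
    "brevene som var blitt skrevet": PAST,
}
SWEDISH = {
    "breven som har skrivits": PRESENT,
    "breven som skrivits": BASE,   # auxiliary omitted: no tense expressed
}

def translate(string, source, target):
    """Target strings whose schema is compatible with the source schema."""
    schema = source[string]
    return sorted(t for t, s in target.items() if compatible(schema, s))

to_swedish = translate("brevene som er blitt skrevet", NORWEGIAN, SWEDISH)
back = translate("breven som skrivits", SWEDISH, NORWEGIAN)
```

The tensed Norwegian string maps to both Swedish strings, and the untensed Swedish string maps back to both the present and the past tense Norwegian strings, so the relation computed between the sets is not symmetric even though the compatibility test itself is.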

In the shortcut or transfer modes 1 and 2 we will not, in the normal case, get anything that we would not also have got if we forced the system to use the elaborate mode 3. This is what the initial grammar comparison ensures. But the strings produced by modes 1 and 2 will in general be a subset of the strings produced by mode 3, since modes 1 and 2 enforce the additional demand that the source and the target strings must share formal properties as well as semantic properties. As an example, consider (2) (all translatable as ‘The decisions that were made by the Government’):

(2)

Swedish:

(a) de av regeringen fattade besluten

the of the-government made decisions

(b) de beslut som fattades av regeringen

the decisions that were-made by the-government

Norwegian:

(c) de av regjeringen fattede beslutninger

the of the-government made decisions

(d) de beslutningene som ble fattet av regjeringen

the decisions that were made by the-government

While the Swedish expression (a) in mode 3 might yield both (c) and (d) when translated into Norwegian, it would yield only (c) in mode 1. Now, formal equivalence is one possible component of the translational relation (cf. Koller 1992). In the case of closely related languages, formally similar constructions will typically share stylistic properties; this is precisely the case in the example from Swedish and Norwegian just mentioned. The result of this is that the lower modes 1 and 2 are more than just expedient devices to speed up the translation process — they may actually contribute to capturing further properties of the empirical translational relation. In the example mode 1 gives us the translation that is not only denotationally but also stylistically equivalent to the source expression, while mode 3, given the kind of situation schemata we are using, gives us all denotationally equivalent expressions.

From a more general point of view we may regard this as an exploitation of implicative relations between aspects of the translational relation between a given pair of languages. In the example formal equivalence implies stylistic as well as denotational equivalence. It seems a plausible hypothesis that there will be more implicative relationships like these the more closely related the source and target languages are. In pairs like Norwegian and English we can usually trust that modally and denotationally equivalent constructions are pragmatically equivalent: in both languages interrogatives can be used to make polite requests, etc., while this might not be the case in more distantly related languages. Among Scandinavian languages, furthermore, formal equivalence will often imply denotational and stylistic equivalence, as we just saw. This is the reason why translation is easier the more closely related the languages are: as long as we can trust such implicative relations we know that we do not have to worry about recreating more than the properties that imply all the others — the rest will follow. Hence a study of such implicative relations among aspects of the translational relation ought to benefit the development of MT systems that are at once efficient and linguistically informed.
 
 


References

Barwise, J. & Perry, J. (1983): Situations and Attitudes. Cambridge, Mass.: The MIT Press.

Bresnan, J. (1982) (ed.): The Mental Representation of Grammatical Relations. Cambridge, Mass.: The MIT Press.

Dyvik, H. (1988): Sentence synthesis from situation schemata: a unification-based algorithm. Nordic Journal of Linguistics 11, 17-32.

Fenstad, J.E., Halvorsen, P.-K., Langholm, T. & Benthem, J. v. (1987): Situations, Language and Logic. Dordrecht: D. Reidel.

Gazdar, G., Klein, E., Pullum, G. & Sag, I. (1985): Generalized Phrase Structure Grammar. Oxford: Basil Blackwell.

Halvorsen, P.-K. (1988): Situation Semantics and semantic interpretation in constraint-based grammars. In: Proceedings of the International Conference on Fifth Generation Computer Systems, FGCS-88, Tokyo, Japan, November 1988.

Halvorsen, P.-K. & Kaplan, R. M. (1988): Projections and semantic description in Lexical-Functional Grammar. In: Proceedings of the International Conference on Fifth Generation Computer Systems, FGCS-88, Tokyo, Japan, November 1988.

Kaplan, R.M. & Bresnan, J. (1982): Lexical-Functional Grammar: a formal system for grammatical representation. In: Bresnan (1982), pp. 173-281.

Kaplan, R.M., Netter, K., Wedekind, J. & Zaenen, A. (1989): Translation by Structural Correspondences. In: Proceedings of the Fourth Conference of the European Chapter of the Association for Computational Linguistics, pp. 272-281, University of Manchester.

Karttunen, L. (1986): D-PATR - A Development Environment for Unification-Based Grammars. Center for the Study of Language and Information, Stanford University: Report no. 61.

Koller, W. (1992): Einführung in die Übersetzungswissenschaft. 4. Auflage. (Uni-Taschenbücher 819.) Heidelberg: Quelle & Meyer.
 

1. This article is published in Computers and the Humanities 28: 225-234, 1995.
© 1995 Kluwer Academic Publishers.
 
