TERM AND TRANSLATION VARIATION OF MULTIWORD TERMS

Phraseology is central to specialized language. In scientific and technical communication, multiword terms (MWTs) (e.g. volatile organic compound) are the most frequent type of phraseological units. Rendering them into another language is not an easy task due to their cognitive complexity, the proliferation of different forms, and their unsystematic representation in terminographic resources. This often results in a broad spectrum of translations for MWTs, leading to higher term variation as a result of their composition by two or more constituents. In this study, we carried out a quantitative and qualitative analysis of English term variants of MWTs from the environmental domain and their translations into Spanish. The focus was on translation variation and its occurrence in different linguistic resources.


Introduction
Phraseology has been extensively studied in both general language (Firth 1957; Sinclair 1991; Mel'čuk et al. 1995; Corpas Pastor 1996, 2000; Mogorrón-Huerta 2010; inter alia) and specialized discourse (Picht 1991; Meyer & Mackintosh 1996; inter alia). The guiding principle of phraseology is that language is not only composed of individual words, but also of larger structures that function as a whole (Sinclair 1991). In their broadest sense, phraseological units are lexical units formed by two or more words that frequently co-occur and exhibit a variable degree of lexicalization, syntactic and semantic stability, and a possible idiomatic nature (Gläser 1988; Corpas Pastor 2000). There are different types of phraseological units (e.g. idioms, collocations, proverbs, etc.), whose consideration as such depends on the scope of the approach and how they comply with these features.
Specialized phraseological units have been defined as phraseological units conveying a specialized meaning which are frequently used in a scientific or technical domain, contain at least one term, and exhibit a certain degree of lexicalization (Bevilacqua 2004: 28), e.g. to generate power. Specialized phraseological units are fundamental since they give semantic precision to terms, play a central role in conceptual systems, and allow expression according to the conventions of the different specialized domains. However, they have been less studied than their homologues in general language.
Multiword terms (MWTs) are the main type of phraseological unit in specialized discourse (Meyer & Mackintosh 1996; Ramisch 2015). In MWTs, two or more lexemes converge to form a new unit of meaning (e.g. shrouded wind turbine). These lexemes can join and form a graphic compound (sacacorchos) or keep the space in between (pez espada). Although the designation of compound is often restricted to the first type (Corpas Pastor 1996), we consider both types as compounds, even though our study focuses on separated or syntagmatic compounds.
Some authors refuse to consider MWTs as phraseological units (Zuluaga 1975; García-Page 2008). However, we agree with those including them in the group (Benson et al. 1986; Pawley 2001; Ramisch 2015; inter alia), because they share the defining features of phraseological units: the formation by two¹ or more elements, the frequent co-occurrence, the functioning as a whole, and a certain degree of lexicalization. They differ from idioms, the undeniable example of phraseological unit according to narrow approaches (García-Page 2008), because they are less lexicalized and idiomatic (thus, more transparent), and they convey concepts.
Precisely because these combinations are less lexicalized than other phraseological units, they are especially inclined to term variation or the coexistence of different denominations. For example, the Spanish idiom a tenor de lo establecido is a very lexicalized structure that does not seem to admit term variation. On the contrary, MWTs such as contaminación por ozono show different term variants, such as contaminación por ozono troposférico or contaminación fotoquímica. This happens because MWTs allow lexical expansion by combining concepts in different ways that are feasible in a particular domain (Picht 1991). For this reason, they can be more prone to variation than general phraseological units or other fixed, specialized combinations. Furthermore, knowledge transfer and market expansion highlight the need for specialized translation, which implies dealing with a vast quantity of MWTs. Rendering them into other languages is not exactly easy, in part due to their characteristics, which make them cognitively and structurally complex, as well as their unsystematic treatment in terminological resources. This results in a wide variety of translation solutions, some more adequate than others in each context, which evidence a high degree of term variation when translating MWTs.
1. One of the defining features of phraseological units is their formation by at least two elements. However, these can often converge in a graphic compound (e.g. rompeolas). This does not mean that they lose their phraseological nature (in fact, there are studies, such as Oltra Ripoll's [2018], which include monolexical units in their phraseological analysis). Nevertheless, these are not the focus of our study.
We carried out a quantitative and qualitative analysis of the Spanish translations of a set of English MWTs and their variants, with a focus on translation variation. Our goals were: (1) to define a typology of term and translation variants of MWTs; and (2) to investigate term variation of Spanish MWTs in different contexts (translation, bilingual or multilingual lexicography, and original production in Spanish). Our results showed that MWTs exhibit an enormous degree of term variation of different characteristics, which is particularly present in translation scenarios (i.e. parallel corpora).
The rest of this article is organized as follows. Section 2 explains term variation and its presence in specialized discourse. Section 3 focuses on variation in translation contexts. Section 4 presents the materials and methods of the study, and Section 5 describes the results obtained. Particularly, in Section 5.1 we present a typology of term variation found in translation, and in Section 5.2 translation variation is compared across different resources. Finally, Section 6 summarizes the conclusions of this study and future research lines.

Term variation in terminology
Variation is an essential feature of all languages, even in specialized discourse. It can be conceptual, when it affects meaning, or denominative, as explored in this study, when different designations are used to name the same concept (e.g. wildfire and rural fire). Not only are monolexical units variable, but also phraseological units, in particular MWTs.
However, in specialized discourse, both phraseology and variation were obscured for a long time, mainly because the General Theory of Terminology (TGT, Wüster 1968) downplayed problematic aspects such as context, phraseology, and variation. In fact, although its proponents were aware of the existence of these phenomena in specialized discourse, phraseology and variation were perceived as obstacles to effective expert communication. The General Theory of Terminology thus took a prescriptive approach in which a term was said to refer to only one concept, and a concept was named by only one term. The richness resulting from variation was thus artificially ignored for the sake of precision, even though variation often emerges for the same reason, since new ways of conveying meaning are constantly sought.
Variation did not become a focus until the advent of the new theories of terminology, which formulated communicative and cognitive approaches, and acknowledged the variable nature of both terms and concepts (Cabré 1993; Temmerman 2000; Freixa 2006; León Araúz 2017). Although to a lesser extent compared to general language (Freixa 2006; Sanz Vicente 2011), specialized discourse exhibits a considerable degree of variation, which has been discovered by means of corpora (Fernández-Silva 2018).
MWTs are largely used to illustrate specialized term variation. These terms are formed by two or more elements: a head and one or several modifiers. The head often indicates the category to which the concept belongs, whereas the modifier often indicates the criterion for subdivision of the category (Bowker 1998: 487). Depending on the nature of the head (e.g. object, property, process, etc.), the modifier can specify different types of features (e.g. purpose, location, method, material, etc.) (ibid: 488). Although some MWTs are fixed, it is evident that most of them are very often prone to variation (e.g. air/atmospheric/air-borne/airborne pollutant), as has been explained in different studies (Bowker 1998; Fernández-Silva & Kerremans 2011; Daille 2017; Giacomini 2018; Gledhill & Pecman 2018; Cabezas García & Chambó in press; inter alia).
Initially, the use of one variant or the other might seem arbitrary. However, as shown by Rogers (1997: 219), there are certain systematic patterns of variation that need to be explained. In the same line, Bowker & Hawkins (2006) affirm that variation cannot be attributed to the carelessness of subject field experts, but rather to their desire for precision and the carefulness invested in their choice of expression. On the one hand, variation sometimes happens with a specific purpose (Bowker 1998; Fernández-Silva et al. 2009; Kerremans 2017; Freixa & Fernández-Silva 2017; Gledhill & Pecman 2018), and, on the other, it can also reveal the novelty of concepts (i.e. neologisms) (Cabré 1993; Picton 2011).
As stated by Candel Mora & Carrió Pastor (2012), discovering the causes or types of variation is important for both theoretical and practical reasons. From a theoretical perspective, it reflects the mental processes involved in the selection of one term over another. On a practical level, this information is helpful for terminologists or translators in production tasks, since they need to know when to use one variant over another and the reasons why it is the best choice in a particular context. Traditionally, the reasons for variation have been user-based (resulting in temporal, geographic, or social variation) or usage-based (i.e. field, tenor, and channel) (Gregory & Carroll 1978). Nevertheless, additional reasons can be involved in the variation of term denominations. As Freixa (2006) states, causes for term variation can be (1) dialectal; (2) functional; (3) discursive; (4) interlinguistic; and (5) cognitive. Several of these causes can also co-occur.
Dialectal reasons are based on the geographical, chronological, or social origin of speakers. Functional reasons are related to field, tenor, and channel. Discursive reasons can lead to stylistic and rhetorical changes. For instance, Collet (2003) argues that MWTs contribute to text cohesion by molding their shape in different contexts. Freixa & Fernández-Silva (2017) and Fernández-Silva (2018) state that intratextual term variation, especially in the case of MWTs, facilitates cohesion thanks to its repetitive nature. Along these lines, Fernández-Silva (2016) points out that MWT variants that emphasize different perspectives of the same concept contribute to knowledge construction. Accordingly, Gledhill & Pecman (2018) argue that English N+of+N MWT variants (e.g. release of plumes) usually convey knowledge-rich contexts, while N+N MWTs (e.g. plume release) introduce new information.
Reasons for term variation can also be interlinguistic, when two or more languages are in contact and influence each other; and cognitive, which are based on the different conceptualizations of reality or user motivations. Cognitive variants are thus the natural reflection of multidimensionality, that is, the nature of those concepts that can be organized according to different facets or dimensions (Bowker 1998). As Meyer & Mackintosh (1996) argue, MWTs are the ideal scenario for multidimensionality since different facets can be shown in the modifiers (e.g. photochemical smog [cause], summer smog [time], ozone smog [agent]). Term and concept variation, far from being unrelated, are thus the consequence of the convergent influence of multidimensionality, context and dynamism in specialized domains (León Araúz 2017).
Apart from the causes of variation, it is also important to reflect on its consequences. According to Fernández-Silva et al. (2009), term variation can have no cognitive consequences when there is only a change in the form but not in the meaning (e.g. marine product and sea product). Alternatively, term variation can have cognitive effects, when there is a shift in perception along with the change in form (e.g. sea product and fishing product). In this line, some classifications based on the semantic distance of term variants have emerged, such as the ones presented in Aguado de Cea & Montiel Ponsoda (2012) and Fernández-Silva (2018). Both studies distinguish three main groups: variants with (1) minimum, (2) medium, and (3) maximum semantic distance.
In the first group, the semantic content does not change, that is, terms are conceptually equivalent. For Aguado de Cea & Montiel Ponsoda (2012), this set includes synonyms, such as graphical and orthographical variants (localization, localisation), inflectional variants (cat, cats), and morphosyntactic variants (nitrogen fixation, fixation of nitrogen). Fernández-Silva (2018) adds morphological variants (ozone-depletion potential, ozone-depleting potential) and specifies that, in MWTs, synonymy can affect just one of the constituents (organic matter pollution, organic matter contamination).
For Aguado de Cea & Montiel Ponsoda (2012), variants with a medium semantic distance are partial synonyms or terminological units that highlight different aspects of the same concept, such as stylistic or connotative variants (man, bloke), diachronic variants (tuberculosis, phthisis), dialectal variants (gasoline, petrol), pragmatic or register variants (headache, cephalalgia), and explanatory variants (immigration law, law for regulating and controlling immigration). Fernández-Silva (2018) studies medium semantic distance in MWTs where the conceptual change is reflected in the modifiers. They can be subject to reductions (motor vehicle emission, vehicle emission), additions or deletions of non-defining characteristics (non-point pollution, non-point source pollution), or the use of a different defining feature (summer smog, Los Angeles smog). Again, the implication of some of these characteristics is directly linked to multidimensionality and cognitive variants.
Finally, variants that entail a maximum semantic distance are terminological units that highlight different features of the same concept which belong to different conceptualizations, or variants that refer to two conceptually related concepts (Aguado de Cea & Montiel Ponsoda 2012; Fernández-Silva 2018). In the specific case of MWTs, they involve changes in the MWT head or even in both constituents. These variants are the most cognitively complex since the semantic category of the concept is altered. This change can be just an emphasis of a specific perspective of the conceptualization (exhaust emission, exhaust pollution). Alternatively, a hypernymic category (doubly fed induction generator, DFIG machine) or, on the contrary, a hyponymic category (high level ozone, stratospheric ozone) can be employed. It is also possible to replace both the head and modifier with totally different categories (oil pollution, discharged hydrocarbon), which is the most semantically distant type of variation.
Evidently, term variation can acquire a wide range of forms, as suggested in the examples above. The following list shows the classification proposed in Faber & León Araúz (2016: 12-13)², which encompasses different proposals found in the literature and offers a thorough picture of term variation types, specifying whether semantics or communicative situations are affected:
(A) Orthographic variants that are not influenced by geographic origin and do not alter semantics or the communicative situation, e.g. groundwater, ground water.
(B) Diatopic variants:
(i) Orthographic variants that do not modify semantics, e.g. fecal, faecal.
(ii) Dialectal variants, which can alter semantics if cultural factors are involved, e.g. gasoline, petrol.
(iii) Culture-specific variants, which affect semantics and the communicative situation, e.g. dry lake, sabkha.
(iv) Calques, which can modify semantics and the communicative situation, e.g. environmentally hazardous substance > sustancia ambientalmente peligrosa, sustancia peligrosa para el medio ambiente.
(v) Borrowings, which can alter semantics and the communicative situation and can be adapted or not, e.g. smog > smog, esmog.
(C) Short form variants, which have an effect on the communicative situation:
(i) Abbreviation, e.g. greenhouse gas, GHG.
The nature and scope of variants are very diverse and can have different consequences in communication. Nevertheless, terms can activate more than one variant type, which might make term choice more difficult. For example, H2O and water can be domain-based variants, since the first one is more frequently used in the Chemistry and Water Treatment domains than in Oceanography. However, their use also depends on the communicative situation (i.e. formal or informal). Conversely, the same type of variant can be expressed by more than one term. Diaphasic variants, in particular, form a continuum from more formal to informal (e.g. thermal low pressure system, thermal low, thermal trough, and heat low) (Faber & León Araúz 2016).
2. Some of the examples have been replaced with variants found in this study.

Term variation and translation
Term variation has also been explored in interlinguistic contexts. The notion of 'equivalence' thus becomes crucial since terminologists and translators often follow different criteria. While terminologists usually understand equivalence at the term level, because their goal is the inclusion of terms in terminographic resources, translators look for correspondence at the sentence or text level. That is why they search for the functional equivalence of their translations (Reiss & Vermeer 1984; Nord 1997) instead of a direct term correspondence, which would be the case for terminologists. In this line, Gerzymisch-Arbogast (2008) highlights the fact that translation is a text-based rather than a corpus-based activity, and Kerremans & Temmerman (2016) state that translators are not always guided by the principle (often adopted for structuring multilingual terminology databases) that a term in the source language should be rendered as a direct (literal) equivalent in the target text. Therefore, terminological equivalence does not always correspond to translation equivalence (Kerremans & Temmerman 2016: 59). Equivalence is thus a broader concept for translators than for terminologists and allows for a wide spectrum of translation mechanisms. Consequently, equivalents at the sentence or text level can be those reproducing the same function or effect as the source text (Reiss & Vermeer 1984; Nord 1997), rather than just those conveying the same concept. For this reason, the use of hypernyms or other variants reflecting different conceptualizations (e.g. as a result of multidimensionality) can be justified in a translation equivalence context.
Along these lines, understanding and finding equivalence for phraseological units can be problematic for translators. In the case of MWTs, comprehension problems may be related to the identification of the term as a single unit of understanding, its internal dependencies, or the semantics of its formants. In turn, production-related problems include the order in which its formants should be translated, the prepositions that should accompany them, or the distinction among different variants should they be found in any resource. The well-known notions of translation problem, error, strategy, and technique thus become central. A phraseological unit poses a translation problem when it "puts up resistance to being translated" (Oltra Ripoll 2018: 102). This can lead to translation errors or the inadequate resolution of a problem (Hurtado 2001: 279). Therefore, the analysis of term variation in translation, such as the one carried out in this study, must consider the strategies or mechanisms employed to solve translation problems, and, specifically, the techniques or procedures to find the solution to a particular translation problem (e.g. omission, addition, modulation, paraphrase, etc.).
As for the identification of translation equivalences, parallel corpora have traditionally been used (Déjean & Gaussier 2002; Daille & Morin 2005). These are collections of original texts and their aligned translations, which facilitate equivalent identification. However, they are scarce, particularly in specialized languages and in language pairs in which English is not involved, and the influence of the source language is obvious. These corpora are useful for error analysis or the identification of the different translation options, but complicate the study of idiomatic language uses. Consequently, comparable corpora are increasingly used. These consist of collections of texts of the same type and domain, originally written in each language. They are more easily obtained and allow for the study of real language uses, although equivalent identification is evidently more complicated.
One of the studies devoted to term variation in translation is that of Fernández-Silva et al. (2009), who investigate, among other aspects, the role of the cultural system by means of term variants in French and Galician. Kerremans (2010, 2016) also studies term variation in specialized translation, with a focus on the reflection of the English source language variants in the target languages (Dutch and French). Along these lines, Fernández-Silva & Kerremans (2011) study cognitive term variants, and affirm that source language variants in Galician are reflected in the English target texts. Miyata & Kageura (2016) argue that translated texts (from Japanese into English) show a higher density of term variants as a result of the different translation possibilities. This finding is in line with a previous study by Sanz Vicente (2011), which focused on the translation of English MWTs into Spanish and observed the higher coexistence of term variants in the target language. Accordingly, Jiménez-Crespo & Tercedor-Sánchez (2017) explore term variation in translated (English > Spanish) and non-translated texts (Spanish), paying particular attention to register, determinologization, explicitation, and term variation in translated documents. Consciously or unconsciously, these studies often use MWT examples to illustrate term variation in translations.
As some of these research studies suggest, MWT variation can influence their translations, where a higher degree of variation is usually found. Every language shows different degrees of variation depending on its own linguistic characteristics. For instance, English is prone to graphical variation due to the use of hyphens, among other factors, whereas Spanish is more prone to morphosyntactic variation due to its compounding rules. For this reason, ascertaining term variants in the source language is relevant with a view to facilitating equivalent identification. In this sense, Rogers (1997) underlines that translators are frequently obliged to make decisions not only regarding synonymy within the source text and target text, but also regarding the cross-linguistic relations between these synonyms.
The representation of the different translation possibilities in terminographic resources thus becomes central in descriptive settings (Kerremans 2010), contrary to what has traditionally been done. These resources often describe just a small portion of variants (if at all), which are represented in an unsystematic way. As for MWTs, those formed by more than two constituents are rarely included (Giacomini 2018). Users are often confronted with a lack of information on how term variation arises and which selection criteria to apply. Evidently, they need to know when to use each variant as well as its conceptual and communicative implications, since this will affect the receiver's interpretation of the message. Otherwise, translators can actually over-standardize, thus creating consistency in places where the use of variants was deliberate and well-reasoned (Bowker & Hawkins 2006: 80). Translators and terminologists must thus learn to recognize the patterns that lie beneath variation so that they do not inappropriately standardize terminology (ibid: 101). However, while rejecting the Wüsterian principle of univocity, we believe that the proliferation of term variants in target texts is often puzzling and overwhelming, partly due to an unsystematic treatment of MWTs, since translators do not always consider them as a single unit of understanding. Consequently, besides describing different types of variants, which is undoubtedly important, the added value of a linguistic resource lies in the enhancement of those data with additional information, such as semantic, pragmatic, and usage aspects (Faber & León Araúz 2016; Giacomini 2018), which promotes a sound use of variation in texts.

Materials and methods
According to Kerremans & Temmerman (2016: 45), corpora allow us to study the textual and linguistic features of translations, taking into account different contextual parameters that have an impact on translation choices and, ultimately, on the translation product. For this reason, different corpora were used for MWT extraction and term variant identification.
Since one of our goals was to explore MWT variation patterns in translation situations, the OPUS2 English corpus (Tiedemann 2012) was first used to extract a set of MWTs, which would then be compared with their equivalents in the OPUS2 Spanish parallel corpus (Tiedemann 2012). The OPUS2 English corpus is an open source parallel corpus that can be accessed at Sketch Engine (https://www.sketchengine.eu/) (Kilgarriff et al. 2014) and encompasses 40 languages. It is organized in subcorpora, such as the European Central Bank (ECB), the European Parliament Proceedings (EUROPARL), and the Translated UN documents (MultiUN). Aware of the scarcity of specialized parallel corpora, we selected this parallel corpus, which includes both general and specialized corpora, and decided to focus on specialized terms of general interest. This is the case of Pollution, a specialized concept that is present in everyday communication due to the increasing climate awareness.
Therefore, starting from the term pollution, a conceptual analysis was carried out in the corpus, which allowed us to identify pollution-related concepts, such as emission, ozone, gas, pollutant, substance, contamination, smog, air, etc. These terms were then used as MWT heads in CQL (Corpus Query Language) queries in Sketch Engine, which make it possible to search for specific morphosyntactic patterns, such as MWTs premodified by different elements (Table 1). The CQL expression in Table 1 elicits premodified MWTs (the most frequent structure of these terms), such as heavy metal pollution or indoor air pollution. It searches for the lemma pollution ([lemma="pollution"]) (or emission, ozone, gas, pollutant, etc., in the following queries) preceded by nouns, adjectives, adverbs, past participles, or present participles ([tag="N.*|JJ.*|RB.*|VVN.*|VVG.*"]) appearing one or more times ({1,}). On the right of the head pollution, a restriction is included in order to exclude nouns or adjectives ([tag!="N.*|JJ.*"]), which could indicate that pollution is not the MWT head but takes part in a longer term.
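As an illustration only, the three components described above can be assembled into a single query string per head. The following Python sketch is not the authors' tooling: the function name build_premodified_query and the list of heads are hypothetical, and the past-participle tag is written as VVN.*, which is an assumption about the tagset used in the corpus.

```python
# Hypothetical helper (not from the study): assembles the premodified-MWT
# CQL pattern described in the text for a given head lemma.

# Premodifier tags: nouns, adjectives, adverbs, past and present participles.
# "VVN.*" for past participles is an assumption about the tagset.
PREMODIFIER_TAGS = "N.*|JJ.*|RB.*|VVN.*|VVG.*"

# Tags excluded to the right of the head, so that the head is not itself
# part of a longer term.
EXCLUDED_RIGHT_TAGS = "N.*|JJ.*"


def build_premodified_query(head_lemma: str) -> str:
    """Return a CQL string: one or more premodifiers, the head lemma,
    and a right-hand token that is neither a noun nor an adjective."""
    premodifiers = f'[tag="{PREMODIFIER_TAGS}"]{{1,}}'
    head = f'[lemma="{head_lemma}"]'
    right_context = f'[tag!="{EXCLUDED_RIGHT_TAGS}"]'
    return " ".join([premodifiers, head, right_context])


# Illustrative subset of the heads mentioned in the text.
heads = ["pollution", "emission", "ozone", "gas", "pollutant"]
queries = {head: build_premodified_query(head) for head in heads}

for head, query in queries.items():
    print(f"{head}: {query}")
```

Generating the queries from one template keeps the pattern identical across heads, so that differences in the results reflect the heads themselves rather than variations in the query.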
This query was repeated for the other possible heads (emission, ozone, gas, etc.), and the 234 most frequent MWTs were selected. Several of these MWTs were term variants, and were thus grouped by concept. Finally, we obtained a set of 98 pollution-related concepts.
To identify the Spanish variants of the English MWTs, different resources were used. Firstly, with a view to investigating MWT variants in translation contexts, we used three parallel corpora, which allowed an easy identification of variants thanks to their alignment. Although, ideally, parallel corpora of specialized texts on the environment in English and Spanish would be the best option, due to their scarcity we used parallel corpora encompassing both general and specialized discourse in a wide variety of domains.
These parallel corpora were the OPUS2 English-Spanish corpus (Tiedemann 2012), which was presented above in its monolingual version in English; the EurLex English-Spanish corpus (Vaisa et al. 2016), a multilingual corpus in all the official languages of the European Union that includes texts in the EUR-Lex database and is available in Sketch Engine; and Linguee (https://www.linguee.es/), an online corpus of aligned translations in different languages, such as English and Spanish, which includes general as well as specialized texts. Even though Linguee does not allow specific CQL queries and shows just a summary of the possible translations, it compensated for the alignment mismatches that were often found in OPUS2 and EurLex. As a consequence, it allowed us to collect a wider range of translations.
Secondly, in order to compare term variants found in translation contexts with those present in bilingual or multilingual lexicographic scenarios, Spanish equivalents of English MWTs were also looked up in two terminological databases: TERMIUM Plus and IATE. TERMIUM Plus (https://www.btb.termiumplus.gc.ca/) is a terminological database developed by the Government of Canada, which describes millions of concepts from specialized domains in English, French, and Spanish. Additionally, IATE (https://iate.europa.eu/) is the EU's terminology database and includes terms from a wide range of specialized domains in the official languages of the European Union. The entries consulted in these resources also allowed us to expand the collection of English source terms, since many of their entries contain synonyms. Therefore, a new set of terms was researched in the parallel corpora in order to expand the collection of Spanish translation variants. The final set of terms (a total of 277) ranged from two-word terms (e.g. oil pollution) to six-word terms (e.g. aggregate anthropogenic carbon dioxide equivalent emissions).
Finally, with a view to analyzing term variants in a context of original production in Spanish, we used a Spanish comparable corpus of specialized texts on the environment. The corpus was compiled by the LexiCon research group of the University of Granada while building EcoLexicon (https://ecolexicon.ugr.es, León Araúz, Reimerink & Faber 2019), a terminological knowledge base on the environment, and consists of approximately 10 million words. Since the size of the corpus cannot compete with the size of parallel ones, this was compensated by the use of Google Scholar (https://scholar.google.com/) as a second comparable corpus.
Evidently, Google Scholar is not a comparable corpus strictly speaking. Moreover, it does not allow for flexible searches, such as lemmatized or CQL queries. Searches were time-consuming, since many variants needed to be looked up several times taking into account the inflectional rules of Spanish (i.e. number and gender). However, this was useful to obtain more results and measure the frequency of all variants found in the previous resources. We decided to only retain those terms that occurred a minimum of 10 times. At least in terms of frequency, this corpus should be more reliable due to its size. We agree with Bowker & Hawkins (2006), who state that the Web represents a huge and easily accessible body of linguistic data. Grefenstette (2002: 207) supports this claim by arguing that the size of the Web compensates for its "dirtiness": "the correct form is always orders of magnitude more frequent than the erroneous form (…). The Web is dirty but the signal (correct forms and correct usage) is so strong noise can easily be ignored". In this case, restricting the queries to Google Scholar ensures specialized language since Google Scholar is limited to research works.
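The frequency threshold mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually implemented in the study: it assumes that the hits of the inflected forms of a variant (number and gender) are summed before the threshold is applied, the function names are hypothetical, and the counts are invented placeholders, not results.

```python
# Minimal sketch (assumption: hits of inflected forms are summed per variant).
# All names and counts below are hypothetical and purely illustrative.

MIN_FREQUENCY = 10  # threshold stated in the text


def total_frequency(form_counts: dict) -> int:
    """Sum the hits obtained for every inflected form of one variant."""
    return sum(form_counts.values())


def retain_variants(counts_by_variant: dict, min_frequency: int = MIN_FREQUENCY) -> dict:
    """Keep only the variants whose summed frequency reaches the threshold."""
    totals = {
        variant: total_frequency(form_counts)
        for variant, form_counts in counts_by_variant.items()
    }
    return {variant: n for variant, n in totals.items() if n >= min_frequency}


# Invented counts, used only to show the shape of the data.
example_counts = {
    "contaminante atmosférico": {
        "contaminante atmosférico": 7,
        "contaminantes atmosféricos": 12,
    },
    "contaminante aéreo": {
        "contaminante aéreo": 3,
        "contaminantes aéreos": 4,
    },
}

retained = retain_variants(example_counts)
print(retained)
```

With these placeholder counts, only the first variant reaches the threshold once its singular and plural forms are added together, which shows why each inflected form has to be queried separately when lemmatized searches are unavailable.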
It should be noted that the sequence of resources presented (i.e. parallel corpora, terminographic resources, and comparable corpora) was not random. We started with resources that provide direct access to interlinguistic variants (i.e. parallel corpora and terminographic databases). The last step was querying the comparable corpora, which required specific strategies to find equivalences and involved a more complex search process.
Our equivalence identification strategy in the comparable corpora involved the following queries. The terms found in parallel corpora (e.g. contaminante atmosférico, contaminante aéreo [atmospheric pollutant]) were searched for verbatim to confirm their presence in the comparable corpora. It should be noted that some of the variants obtained from parallel corpora were not queried in the comparable corpora, since they could bias the results. This was the case with hypernyms used as term variants. For instance, although contaminación acústica [acoustic pollution] and ruido [noise] are used as variants in the parallel corpora, searching for ruido [noise] in the comparable corpora would retrieve additional meanings that would distort the results. The same was true of polysemic acronyms in Google Scholar (e.g. GEI [GHG]) and ad hoc variants that could not convey exactly the same meaning (e.g. daños medioambientales [environmental damage] as a term variant of contaminación medioambiental [environmental pollution]).
Additionally, the MWT heads and modifiers found in the parallel corpora were used with a span in between, as in the CQL query shown in Table 2. The span was set at 5 elements in order to allow for different possibilities without being too broad, since previous studies have found that the constituents of an MWT are not usually further apart. However, in longer MWTs, such as those including participles or relative clauses, a wider span was used. Since CQL queries are not possible in Google Scholar, the * wildcard was employed to indicate the span. This allowed us to obtain new possibilities of extended MWTs, such as contaminantes liberados a la atmósfera or contaminantes vertidos a la atmósfera [pollutants released into the air].
Other queries in the specialized corpus of EcoLexicon included searching for the head and sorting the results to the right so as to easily distinguish the different modifiers accompanying the head. The same strategy was applied to modifiers, sorting to the left to discover the different possible heads. These queries are more time-consuming, which is why they were used only in the EcoLexicon corpus, given its size and the possibilities it offers for restricting searches, and not in Google Scholar.
Figure 1
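The span queries described above can be sketched as two small string builders. This is a minimal illustration and not a reproduction of the query in Table 2: the function names are hypothetical, and matching on the lemma attribute is an assumption. The CQL pattern uses the standard form of an unconstrained token, [], repeated between 0 and 5 times.

```python
def cql_span_query(head_lemma, modifier_lemma, max_span=5):
    """Build a CQL query: head lemma, then 0 to max_span arbitrary tokens,
    then modifier lemma. Matching on lemmas is an assumption of this sketch."""
    return f'[lemma="{head_lemma}"] []{{0,{max_span}}} [lemma="{modifier_lemma}"]'


def scholar_wildcard_query(head_form, modifier_form):
    """Build a quoted phrase in which * marks the gap between head and modifier.
    Inflected forms must be supplied explicitly, since lemmatized searches
    are not available in Google Scholar."""
    return f'"{head_form} * {modifier_form}"'


cql = cql_span_query("contaminante", "atmósfera")
scholar = scholar_wildcard_query("contaminantes", "atmósfera")
print(cql)      # [lemma="contaminante"] []{0,5} [lemma="atmósfera"]
print(scholar)  # "contaminantes * atmósfera"
```

A wider span for longer MWTs, as mentioned in the text, only requires passing a larger max_span value.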

Characterizing translation variation
The analyzed concepts show a high degree of variation in Spanish, ranging from 2 term variants (e.g. sectorial emission) to more than 46 (e.g. aircraft noise). Few of them appear to be highly lexicalized, but among those that are, there seems to be a trend for terms with an acronym to be more stable, such as compuesto orgánico volátil (COV) [volatile organic compound, VOC]. In contrast, codifying a causal relation was found to make MWTs more prone to variation, since they usually present multiple periphrastic structures that make the semantics of the concept explicit. One example is anthropogenic emissions, which can be rendered as emisiones antropogénicas, but also as emisiones procedentes de fuentes humanas, emisiones generadas por el hombre, emisiones causadas por el hombre, emisiones producidas por los humanos or emisiones provocadas por el hombre, among other variants. It is also worth noting that the more specific the concepts are, the more stable their designations were found to be, even if their hypernyms show many more variants. For instance, while contaminación atmosférica transfronteriza [transboundary air pollution] can be rephrased as contaminación transfronteriza del aire, its hyponym contaminación atmosférica transfronteriza a larga distancia [long-range transboundary air pollution] does not follow the same pattern: neither contaminación transfronteriza del aire a larga distancia nor contaminación transfronteriza a larga distancia del aire was found in any of the resources. Thus, depending on the bracketing structure (i.e. internal dependencies in the MWT), the distance between the different elements of the MWT seems to present a limited span.

Inaccuracies (some of them can be considered translation errors [a-d] and others cognitive or intentional variants [e]).
a. Inaccuracies related to the semantics of one of the formants
- e.g. contaminación atmosférica transfronteriza a larga distancia, contaminación atmosférica transfronteriza prolongada
- e.g. contaminación por nitratos, contaminación por nitrógeno
- e.g. contaminación por fuente no localizada, contaminación que no viene de fuente
b. Inaccuracies related to the semantic relation between the formants
i. Confusion origin-patient
- e.g. contaminación del terreno, contaminación de origen terrestre
ii. Confusion patient-agent
- e.g. contaminación en materias orgánicas, contaminación por materia orgánica
c. Inaccuracies related to bracketing (i.e. internal dependency analysis)
- e.g. contaminación del aire urbano, contaminación urbana del aire
- e.g. gases de efecto invernadero antropogénico, gases antropogénicos con efecto invernadero
- e.g. gases de efecto invernadero causados por la actividad humana, gases de efecto invernadero causado por la actividad humana
- e.g. calidad exterior del aire, calidad del aire exterior
d. Inaccuracies due to style and redundancy
- e.g. contaminación del aire al aire libre
- e.g. emisión fugitiva fugaz
- e.g. marea negra por petróleo
- e.g. contaminación por contaminantes orgánicos
e. Inaccuracies due to ad hoc translations (modulation)
- e.g. desastre ecológico marino, contaminación marina
- e.g. paisajes degradados, contaminación visual
- e.g. exceso de ruido, contaminación acústica
- e.g. envenenamiento industrial, contaminación industrial

As can be inferred from the classification above, there are several structural shifts that also convey a difference in meaning (i.e. inaccuracies, modulations and certain omissions). Cognitive variants mostly involve changes affecting nouns, whether in the modifier or in the head, but especially in the latter. As for term opacity, different structures convey more transparent meanings thanks to explicitation. For instance, the preposition por is more specific than de for making causal relations explicit (e.g. contaminación por petróleo, contaminación de petróleo), since de is naturally more ambiguous in Spanish. The most frequent types of variation found in our study were: (1) the omission of articles; (2) changes in modifiers (reflecting structural or semantic modifications); and (3) the introduction of periphrastic structures, often through participles such as causado, producido, provocado, inducido, originado, ocasionado, etc. (e.g. cambio climático producido/provocado por el hombre), and relative clauses containing verbs such as provocar, causar, contribuir, originar, and producir (e.g. emisión de gases que provocan el efecto invernadero). Inaccuracies, while not that frequent, are worth mentioning, since they are mainly due to the incorrect interpretation of the semantics and internal dependencies of MWTs.
Quite often, several of these types coincide within the same set of variants conveying the same concept, as in contaminación transfronteriza a gran distancia (where only structural changes and synonyms apply) (Table 4) or smog fotoquímico (where cognitive variants stand out) (Table 5):
contaminación atmosférica transfronteriza a larga distancia (adjective changes)
contaminación atmosférica transfronteriza de larga distancia (preposition changes)
contaminación atmosférica transfronteriza de largo alcance (noun changes)
contaminación atmosférica transfronteriza a largas distancias (number changes)
contaminación atmosférica internacional de largo alcance (adjective changes)
contaminación atmosférica transfronteriza a gran distancia (adjective changes)

Whereas, in Table 4, the variants for contaminación transfronteriza a gran distancia usually imply structural changes and synonyms (e.g. gran > larga; distancia > alcance; a larga distancia > de larga distancia; a larga distancia > a largas distancias, etc.), term variants in Table 5 are quite different. In this case, cognitive variants are noticeable, which result from the different conceptualizations and modulation of the same concept. For instance, different dimensions are highlighted in the modifiers, whether they are adjectives or nouns. Some of them point to the time when this type of pollution usually occurs (bruma de verano, contaminación de verano, contaminación estival, esmog de verano [summer smog]). Others show the city where it was first described (smog de Los Ángeles, smog tipo Los Ángeles [Los Angeles smog]). Variants can also introduce the agent producing this pollution (contaminación por ozono, esmog de ozono, nubes de ozono, ozono en el aire ambiente), or even the process that causes it, the chemical reaction of ozone and light (bruma fotoquímica, contaminación fotoquímica, esmog fotoquímico, neblumo fotoquímico).
Heads also show cognitive variation, which can also entail changes in the general conceptual category, as in esmog de ozono and ozono fotoquímico, or less complex and less accurate conceptualizations, such as bruma de verano, niebla sucia, and polución de verano. Additional structural aspects are also observed in Table 5, such as the use of the adapted borrowing esmog and its non-adapted variant smog. Smog fotoquímico is thus a clear example of the richness of term variation.
Therefore, the representation of term variation in terminological resources should be adapted to the different types and consequences of variants. A possible way of covering structural variants (e.g. morphological or morphosyntactic variants) would be to group them and rank them by frequency. However, frequency cannot be used as the sole criterion of classification, since other motivations, such as stylistic, cognitive or functional aspects, can be involved in term selection and should also be represented in some way. Consequently, for cognitive variants, both frequency and the semantic content emphasized by the term should be made explicit. In turn, morphological and morphosyntactic variants should highlight their differences contrastively (as in Table 4). Moreover, when necessary, specific markers for term formation devices should also be included (e.g. calque, borrowing, acronym, etc.). However, these issues are beyond the scope of this study and will be investigated in future research.
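One way to picture the representation suggested above is a small record per variant, grouped by variant type and ranked by frequency. The sketch below is purely illustrative and does not describe any existing terminological resource: the class, its field names, the type labels, and the frequency values are all hypothetical, and the frequencies are invented placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TermVariant:
    form: str
    variant_type: str                 # hypothetical labels: "structural", "cognitive"
    frequency: int                    # invented placeholder counts in the example
    dimension: Optional[str] = None   # semantic content emphasized (cognitive variants)
    formation: Optional[str] = None   # formation marker, e.g. "borrowing", "acronym"


def group_and_rank(variants):
    """Group variants by type and sort each group by descending frequency."""
    groups = {}
    for variant in variants:
        groups.setdefault(variant.variant_type, []).append(variant)
    for members in groups.values():
        members.sort(key=lambda v: v.frequency, reverse=True)
    return groups


# Variant forms are taken from the text; all counts are invented placeholders.
variants = [
    TermVariant("smog fotoquímico", "structural", 120, formation="borrowing"),
    TermVariant("esmog fotoquímico", "structural", 80, formation="adapted borrowing"),
    TermVariant("smog de Los Ángeles", "cognitive", 15, dimension="place"),
    TermVariant("bruma de verano", "cognitive", 25, dimension="time"),
]

groups = group_and_rank(variants)
for variant_type, members in groups.items():
    print(variant_type, [(v.form, v.frequency, v.dimension or v.formation) for v in members])
```

Keeping the emphasized dimension and the formation marker as separate optional fields lets frequency ranking coexist with the semantic and formation information that the text argues frequency alone cannot capture.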

Comparing translation variation across different resources
In line with the findings reported by Sanz Vicente (2011) and Miyata & Kageura (2016), parallel corpora show more translation variants than comparable corpora. Based on our results, 68 out of the 98 concepts were designated by a greater number of distinct terms in parallel corpora, whereas 30 showed more term variants in comparable corpora. In those cases where the comparable corpora provided more variants, the difference in the number of possible term choices was slight. In contrast, when the parallel corpora showed more variants, the difference amounted to up to 24 additional translations for the same concept. The proliferation of variants in parallel corpora could confirm the hypothesis that translators do not always consider MWTs as a whole unit of understanding but rather as different chunks of independent strings.
In parallel corpora, the use of Spanish variants often depends on the choice made in the source language. For example, anthropogenic climate change is usually translated as cambio climático antropógeno or cambio climático antropogénico, whereas when the same concept is referred to as human-induced climate change, the most frequent Spanish variants include cambio climático inducido por el hombre or cambio climático provocado por las actividades humanas. A similar observation can be made regarding synthetic or longer variants in source terms. Emissions of greenhouse gases gives rise to more periphrastic choices in Spanish than greenhouse gas emissions (e.g. producción de gases que contribuyen a la creación del efecto invernadero). However, this dependence concerns not only synthetic variants, but also register and transparency. When the source text contains atmospheric pollution, it is usually translated as contaminación atmosférica, whereas when air pollution is chosen, both contaminación atmosférica and contaminación del aire emerge in Spanish texts. Contaminación de la atmósfera, a much less frequent variant in Spanish, only arises when atmospheric pollution appears in English texts, which is an example of how parallel corpora do not always show the most idiomatic forms of specialized terms. Conversely, there are also cases where the use of English variants does not give rise to clear-cut cross-linguistic equivalent variants, as in ozono troposférico, which is not only used as the literal equivalent of tropospheric ozone but also as that of ground-level ozone in approximately half the cases where ground-level ozone is used in English texts.
Looking at the quantitative results in more detail, it is worth noting that among the seven resources (Figure 2), OPUS and Google Scholar were those where most translation variants were found (597 and 758 respectively), followed by Linguee (480) (even though not all results are displayed), EcoLexicon (450) and EurLex (444), and finally TERMIUM Plus (105) and IATE (118), which is not surprising since terminological resources usually reflect up to two or three variants. Evidently, this does not mean that OPUS and Google Scholar are the best resources to find more reliable results, since reliability would rather depend on the number of resources containing the same term choices. In this sense, not all translation variants were found in all seven resources; only 25 Spanish terms (pertaining to 25 concepts) were confirmed by all. A staggering 73% of the terms were found in only one of the resources (434 terms) or two of the resources (598 terms) (Figure 3). This means that only 25 concepts have a clearly preferred variant and that most translation variants are not highly lexicalized. The analysis of the terms found in only one of the resources confirms that parallel corpora contain a higher number of infrequent or ad hoc translation variants (Figure 4). What is striking is the fact that, although few in number, terminological resources also present very rare translation variants, such as bióxido de carbono equivalente (TERMIUM Plus) or contaminante corpuscular (IATE). Furthermore, frequency is not always taken into account in these term bases. For example, TERMIUM Plus only includes the variant cambio climático mundial as an equivalent of global climate change, although cambio climático global has exhibited a much higher frequency over the last four decades, as shown in the Google Ngram Viewer (Figure 5).
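The kind of count reported above (how many terms are attested in one, two, or all of the resources) can be sketched as follows. This is an illustrative computation under assumptions, not the script used in the study: the function names are hypothetical, the term labels are neutral placeholders, and the attestations are invented.

```python
from collections import Counter

RESOURCES = ["OPUS", "Linguee", "EurLex", "TERMIUM Plus", "IATE",
             "EcoLexicon", "Google Scholar"]


def coverage_distribution(resources_by_term):
    """Map 'number of resources' to the number of terms attested in exactly that many."""
    return Counter(len(set(resources)) for resources in resources_by_term.values())


def share_in_at_most(distribution, k):
    """Proportion of terms attested in at most k resources."""
    total = sum(distribution.values())
    covered = sum(count for size, count in distribution.items() if size <= k)
    return covered / total


# Invented attestations for placeholder terms (not data from the study).
resources_by_term = {
    "term_a": RESOURCES,                    # confirmed by all seven resources
    "term_b": ["OPUS"],                     # found in a single resource
    "term_c": ["Linguee", "Google Scholar"],
    "term_d": ["EurLex"],
}

distribution = coverage_distribution(resources_by_term)
print(distribution[1], distribution[2], distribution[7])
print(share_in_at_most(distribution, 2))
```

Terms with a count equal to the number of resources correspond to the variants confirmed by all of them, while a high share of terms attested in at most two resources points to low lexicalization.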

Conclusions
This study delves into English term variants of MWTs from the environmental domain and their translations into Spanish. The focus was on translation variation and its occurrence in different linguistic resources (i.e. parallel corpora, terminographic resources, and comparable corpora). A typology specifically conceived to characterize translation correspondences of MWTs has been proposed based on omissions, structural shifts, transpositions, expansions and inaccuracies. Additionally, parallel corpora have been found to include more translation variants than other linguistic resources. This proliferation could confirm the hypothesis that translators do not always consider MWTs as a whole unit of understanding but rather as different chunks of independent strings. However, the greater number of translation variants in parallel corpora may also be due to the general texts included in these corpora, even though our set of MWTs was not expected to appear very frequently in general texts, given their specialization. Since, to the best of our knowledge, parallel corpora on environmental science are scarce or even non-existent, an investigation of term variation in this type of corpus should be carried out in the future. Our results highlight the need for an accurate representation of term variation in terminographic resources, such as terminological knowledge bases (TKBs). Since they are conceived for different user types and specialized knowledge is not only conveyed in expert-to-expert scenarios, we believe that term variants should be extensively covered in TKBs. However, a balance between thoroughness and over-information must be sought, because entries presenting 46 different variants (as was found in some of the concepts studied) could be counter-productive. Variation information should thus cover different types of variants and usage-related information, such as frequency and semantic or pragmatic indications.
As a future line of research, we plan to implement the findings of this study in the TKB EcoLexicon, where new modules will be offered to compare the use, frequency, and meaning of term variants.

Pilar León Araúz is an Associate Professor (Profesora Titular) at the Faculty of Translation and Interpreting of the University of Granada, where she teaches courses related to Terminology and computer-assisted translation tools. She received her PhD in Translation and Interpreting from the University of Granada in 2009. She holds a degree in Applied Foreign Languages from the University of Provence (France) and a degree in Applied Modern Languages from Northumbria University (United Kingdom). Her main research areas are terminology, knowledge representation and extraction, and corpus linguistics, on which she has published articles in international journals such as Terminology, International Journal of Lexicography and Frontiers. She also serves on the scientific committees of international journals and conferences.
Melania Cabezas García holds a degree in Translation and Interpreting from the University Pablo Olavide and a Master's degree in Professional Translation from the University of Granada, where she teaches courses in the Bachelor's Degree in Translation and Interpreting. She received her PhD in Translation and Interpreting from the University of Granada in 2019. She is a member of the LexiCon research group and was awarded a research fellowship by the Spanish Ministry of Education to write her PhD dissertation on the formation, translation, and representation of English and Spanish multiword terms. Her research interests are Terminology, Corpus Linguistics, and Specialized Translation. She has published papers in international journals on Linguistics, Terminology, and Specialized Translation, and serves on the scientific boards of international journals and conferences, such as LREC and the International Journal of Lexicography.