Linguistic Obstacles in Machine Translation: English and Kurdish Languages

Machine Translation (MT) is an application of Natural Language Processing (NLP), which is in turn a branch of Artificial Intelligence (AI). NLP combines the fields of Computer Science and Linguistics. One motivation for MT is the set of challenges facing traditional translation, such as cost and time. Although MT now plays an important role in translation, it still cannot be considered a complete alternative to traditional translation. Unfortunately, we as a Kurdish nation still lack a machine translation system capable of translating Kurdish sentences accurately. This paper therefore examines the major problems and obstacles for Kurdish machine translation in detail, and it can serve as a basic starting point for anyone who would like to work in this area.


Introduction
In computational linguistics, the first task is to understand as fully as possible the nature of language and the rules by which human language operates. The aim is to discover the mechanism behind this operation and then to simulate it by automatic means. When translating, human beings usually interpret the source language on three levels: the semantic level (understanding words out of context, as in a dictionary), the syntactic level (understanding words within a sentence), and the pragmatic level (understanding words in situation and context). To obtain high-quality output, the same levels should be processed in automatic translation (machine translation).

Obstacles and Issues
Word Equivalency between the Two Languages
In some cases, a Kurdish word has no single-word equivalent in another language such as English or Arabic; instead, its meaning must be expressed by a group of words, and vice versa. Kurdish-English examples:

Kurdish = English
کونە = a water container made of animal skin, especially lamb or sheep skin
مەشکە = a container for sour milk made of animal skin, especially lamb or sheep skin

Salam (no year: 96) notes that Kurdish has more kinship terms than English and Arabic. For example, the English word "cousin" is general, whereas Kurdish uses more specific terms that identify the exact relationship and also take gender into account, as in ئامۆزا (the children of one's paternal uncle, i.e., the father's brother).

Phrase Translation
One problem in traditional or direct oral translation is that "not all phrases in one language have equivalent phrases in another language"; in such cases, dictionary translation cannot provide a proper or meaningful translation. Take the verb phrase (VP), which Crystal (2003: 352) defines as "a type of verb consisting of a sequence of a lexical element plus one or more particles": a Kurdish VP contains only one main verb, whereas an English VP consists of two parts, an auxiliary or modal verb and a main verb. In addition, Kurdish linguists generally define a sentence as "a sequence of words that provide a full perfect meaning that has a symbol at the end" (Wirya, 2011 A: 34). Nariman (2012: 13-14), however, rejects generalizing this definition to all Kurdish sentences, since a sentence may consist of a single phrase, as in: مرد. = (s/he passed away), S → VP → V.
He also argues that the rule S = NP + VP cannot be applied to all Kurdish sentences; it covers only sentences with intransitive or "weak" verbs. (2005: 98) explains another problematic issue for machine translation: in Kurdish, a noun phrase can be post-modified by an ordinal number and keep the same meaning, whereas in English post-modification by an ordinal is not acceptable and a cardinal number is used instead. In English, moreover, the noun is pluralized after a cardinal number greater than one, as in: He has one sister and three brothers (Quirk and Greenbaum, 1973: 65).

Different Structures
English has an SVO structure, whereas Kurdish sentence structure differs. Wirya (2011 A: 35-36) and Omed (2011: 135-7) state that Kurdish has an SOV structure. Yet in some cases OSV, OVS, VSO, and SVO orders are also found in the surface structure; these are derived from the same deep structure by applying different transformational rules, as shown in the following sentences. In verb phrases where the subject and object are represented by personal suffixes, the orders VSO, SVO, and OVS are found.
acadj@garmian.edu.krd Vol.5, No.4 (August, 2018)
(OVS) = I send them.

Kurdish is an agglutinative language, whereas English is not, since English grammatical elements tend to appear as separate words (Wirya, 2011B: 123). Moreover, Nariman (2012: 14) says that the definition "a Kurdish sentence contains a subject, a direct object, and a verb" does not apply to all Kurdish sentences; it covers only those whose verbs are transitive, as in: کارزان وانەکەی خوێند. (SOV) = Karzan studied the subject.

Rasul (2005: 192) highlights that Kurdish has a transformational rule that moves syntactic items from their usual positions. Although Kurdish is an SOV language, because of these movements the verb can sometimes come first, as in: چووم بۆ بازار. (VSO; چوو = went, -م = I) = I went to the market. Such changes break the usual grammatical order, and poets in particular often rearrange the items for the sake of their poems.

Omer (2015: 56) explains that mismatches between languages make the task of machine translation very hard. Idioms are a clear case: not only must the translation use different words, but the cultural symbols and meanings do not match either. For instance, the Kurdish تەور و داس دەبارێت. (literally, It is raining axes and billhooks.) corresponds to a different expression in English: It is raining cats and dogs. In translation, therefore, the structures of Kurdish described above should be well defined.

Passive forms also do not align between Kurdish and English. Sajida (2013: 38) describes the Kurdish passive as follows: the active form is SOV (1, 2, 3), and in the passive it becomes OV (2, 3). The process differs in English: Leech et al. (2001: 363) describe the English active form as SVO (1, 2, 3), which becomes OV (3, 2) in the passive. The following examples illustrate the change from active to passive:

Kurdish passive: خانووەکە فرۆشرا. = The house was sold. (O = 2, V = 3)
English active: The cat chased the mouse. (S = 1, V = 2, O = 3)
English passive: The mouse was chased. (O = 3, V = 2)
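To make the word-order and passive mismatches above concrete for an MT pipeline, here is a minimal Python sketch. It assumes the sentence has already been parsed into labelled constituents (S, O, V) and uses English glosses in place of Kurdish text; `reorder` and `passive_positions` are hypothetical helpers for illustration, not part of any existing MT system. The sketch reorders a Kurdish SOV clause into English SVO and reproduces the position numbering reported by Sajida (2013) and Leech et al. (2001).

```python
from typing import Dict, List

# Canonical active orders described in the text (assumed; the OSV/OVS/VSO
# variants produced by movement rules are deliberately ignored here).
KURDISH_ACTIVE: List[str] = ["S", "O", "V"]
ENGLISH_ACTIVE: List[str] = ["S", "V", "O"]


def reorder(constituents: Dict[str, str], target_order: List[str]) -> List[str]:
    """Arrange labelled constituents in target_order, skipping absent roles."""
    return [constituents[role] for role in target_order if role in constituents]


def passive_positions(active_order: List[str]) -> List[int]:
    """1-based positions, in the active clause, of the elements kept by the
    passive, listed in passive surface order (object first, then verb)."""
    return [active_order.index(role) + 1 for role in ("O", "V")]


if __name__ == "__main__":
    # English glosses of کارزان وانەکەی خوێند (Karzan studied the subject).
    karzan = {"S": "Karzan", "O": "the subject", "V": "studied"}
    print(reorder(karzan, KURDISH_ACTIVE))    # ['Karzan', 'the subject', 'studied']
    print(reorder(karzan, ENGLISH_ACTIVE))    # ['Karzan', 'studied', 'the subject']
    print(passive_positions(KURDISH_ACTIVE))  # [2, 3]
    print(passive_positions(ENGLISH_ACTIVE))  # [3, 2]
```

A real system would also need morphology on top of reordering, for example inserting the English auxiliary "was" and a past participle in the passive, which this sketch leaves out.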

Parts of Speech Equivalence
English parts of speech (POS) may not have corresponding Kurdish POS. For Omer (2015: 57-58), the issue for Kurdish machine translation is that the parts of speech in a Kurdish sentence do not match those of another language, as shown in the table below.

One Kurdish verb is problematic for machine translation, since Kurdish has no auxiliary or modal verbs except (ە) in the present and (بوو) in the past. For (ە) to function as a verb, it must be attached to an adjective. The problem is that in some cases (ە) loses its function as a verb and instead takes part in changing the part of speech, with different grammatical functions.

For Zare (2005: 98), the agreement in Kurdish between the noun (subject) and the verb according to form (not function) is sometimes broken, as in: منداڵی هەژار هەن. Here the noun منداڵ (child) is singular in form while the verb هەن is plural; in English: There are a lot of poor children (plural).

Aurahman (1979: 13-14) believes that as Kurdish develops, parts of speech change: an element of a sentence may change its function and take on a new one. For example, هێمن is:
an adjective in ئازاد منداڵێکی هێمنە. (Azad is a calm child.)
a subject in هێمن كوڕێکی زیرەکە. (Hemn is a clever boy.)
an adverb in دارا هێمن دەرؤیشت. (Dara walked calmly.)
In this way, هێمن can function as a subject, an adjective, or an adverb.

Pronouns raise another problem for MT: Kurdish pronouns can be bound morphemes (attached to the verb) or free morphemes, whereas English pronouns are only free. This is shown in the table below by Omed (2011: 33).

Differences in Tenses
Tenses that exist in English may not exist in Kurdish. For example, Kurdish has no form that differentiates the present simple, the present continuous, and the future simple, as in: ئەو دێ. = S/he comes. / S/he is coming. / S/he will come. (Salam, no year: 97)

Semantic Issues
Semantics is the study of direct meaning in language (Hurford et al., 2007: 1). In MT, semantic problems arise when items are insufficiently encoded. Talib (2014: 44-47) also states that modern linguistics distinguishes grammatical problems from semantic ones, that is, ungrammatical sentences from semantically incompatible words. For example, it is easy to identify an ungrammatical sentence such as: *منداڵەکان لە ژوورەکەدا نووست. (intended: The children slept in the room.) This sentence is ungrammatical because the verb نووست carries singular agreement while the subject is plural. Compare:
*دار سێوەکە زیرەکە. = *The apple tree is clever.
*سیروان کتێبەکەی خوارد. = *Sirwan ate the book.
These two sentences are grammatically correct but semantically anomalous, since their elements do not agree in meaning. Hence, the syntagmatic and paradigmatic relations of sentences should be considered in translation: the words that go with each other should be identified, otherwise the rules reject such combinations as ungrammatical or incorrect. However, in some cases these sentences can be meaningful and correctly expressed in negative or interrogative form, for example: Who says Sirwan ate the book?
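One common way an MT or parsing system can flag sentences like *Sirwan ate the book is to check selectional restrictions: each verb or predicate states which semantic features its arguments must carry. The Python sketch below is a toy illustration of that idea; the feature labels and the tiny lexicon (`NOUN_FEATURES`, `RESTRICTIONS`) are hypothetical assumptions, not drawn from Talib (2014) or from any real Kurdish resource, and English glosses stand in for the Kurdish words.

```python
from typing import Dict, List, Optional, Set

# Hypothetical semantic features for a few English glosses.
NOUN_FEATURES: Dict[str, Set[str]] = {
    "Sirwan": {"human", "animate"},
    "the book": {"physical_object"},
    "the apple": {"physical_object", "edible"},
    "the apple tree": {"plant"},
}

# Hypothetical selectional restrictions: features each argument slot requires.
RESTRICTIONS: Dict[str, Dict[str, Set[str]]] = {
    "ate": {"subject": {"animate"}, "object": {"edible"}},
    "is clever": {"subject": {"human"}},
}


def violations(predicate: str, subject: str, obj: Optional[str] = None) -> List[str]:
    """List the selectional restrictions broken by the given arguments.

    An empty list means the combination is semantically acceptable under
    this toy lexicon; it says nothing about grammaticality.
    """
    problems: List[str] = []
    rules = RESTRICTIONS[predicate]
    for slot, filler in (("subject", subject), ("object", obj)):
        if filler is None or slot not in rules:
            continue
        missing = rules[slot] - NOUN_FEATURES.get(filler, set())
        if missing:
            problems.append(f"{slot} '{filler}' lacks {sorted(missing)}")
    return problems


if __name__ == "__main__":
    print(violations("ate", "Sirwan", "the apple"))   # []
    print(violations("ate", "Sirwan", "the book"))    # ["object 'the book' lacks ['edible']"]
    print(violations("is clever", "the apple tree"))  # ["subject 'the apple tree' lacks ['human']"]
```

The negative or interrogative reading mentioned above (Who says Sirwan ate the book?) is exactly what a hard filter like this would miss, so a practical system might treat violations as a penalty rather than an outright rejection.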

Pragmatic Issues
Another challenge for MT is that vocabulary and rules alone are not sufficient for a good translation; past experience also plays an important role. Pragmatically, the meaning of any word depends on the context and on shared knowledge. As Yule (2006: 248) states, pragmatics is "the study of speaker meaning and how more is communicated than is said". Aurahman (2002: 135-136) remarks that arranging and differentiating homonyms, polysemous words, and synonyms when making dictionaries is a very hard job that requires careful attention. It is also worth mentioning that many Kurdish dictionaries still do not address this area. In his view, a good Kurdish dictionary should briefly explain how each word is used, so many of its words should be placed in simple sentences to clarify them. Likewise, Avesta (2009: 92) shows that implicature is a pragmatic challenge in any language, and Kurdish is no exception:
A. پارەی پسوولەکەت دا؟ (Did you pay the bill?)
B. ئەڵێن ئەمرۆ ڕۆژێکی خۆش ئەبێت. (They say today will be a nice day.)
Here, A's question is meant to provoke a quarrel, and B deliberately replies "today will be a nice day", so the intended meaning must be inferred and is open to interpretation.

Talib (2014: 61-62) points to another challenge for Kurdish MT: the meaning and intention of a sentence vary with tone, i.e., intonation and pauses can express criticism, humor, disagreement, anger, surprise, etc. For example: کوڕێکی زیرەکە. (He is a clever boy.) A speaker can say this sentence with a low-high intonation to express contempt and criticism, so that the actual meaning becomes "he is not a clever boy".

Metaphor is also a problem for translation in every language, since every society has its own metaphors for comparing and expressing ideas. Kurdish linguists argue that some Kurdish sentences have only one intention and meaning, while others stand for more than one. Kamil (1981: 85-86) supports this idea with the following example:
لە خەو هەستن میللەتی کورد خەو زەرەرتانە. = Wake up, Kurdish people; sleeping does you harm.
Here, the metaphor is لە خەو هەستن = wake up, which means do not be silent, be aware, revolt, etc.

Ambiguity is another obstacle for MT: it is a tough task for the machine to provide the suitable meaning in the right position. One Kurdish word may have many different English meanings, and vice versa, and ambiguity occurs at both the word level and the sentence level. Talib (2014: 51, 65-71) explains syntactic-level ambiguity with these sentences:
گەلاوێژ هەڵهات.
لە شیرەکە ئەگەرێم.
To understand this ambiguity, the purpose of the words in these sentences must be considered, and other phrases must be added to show the exact meaning. In the first sentence, گەلاوێژ means either a proper noun or a Kurdish month, and هەڵهات means either "escaped" or "appeared". The second sentence means "I am searching for the milk/sword", since شیر stands for both milk and sword. Similarly, Nariman (2012: 22) gives a Kurdish sentence with multiple meanings, پێم دەشکێ. (پێ = leg / ability), which can be translated as:
I am able to break it.
My leg is breaking.
My leg will break.
My leg breaks.
Putting stress on different elements of the same sentence may also cause ambiguity, for example:
'هەر چواریان هاتن. = All four of them came.
هەر 'چواریان هاتن. = Only four of them came.
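A common baseline for the lexical side of this problem is to choose the sense whose list of typical context words overlaps most with the surrounding words, in the spirit of the simplified Lesk algorithm. The Python sketch below applies that idea to شیر (milk / sword) and پێ (leg / ability) from the examples above; the sense signatures are hypothetical English clue words chosen for illustration, and the context is assumed to be already glossed into English. When no clue word appears, as in the bare sentence لە شیرەکە ئەگەرێم, the function returns None, mirroring the point that extra phrases are needed to fix the meaning.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical sense inventory: each sense has a small set of clue words.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "شیر": {
        "milk": {"drink", "glass", "cow", "cheese"},
        "sword": {"battle", "blade", "warrior", "sharp"},
    },
    "پێ": {
        "leg": {"walk", "foot", "fell", "pain"},
        "ability": {"can", "able", "strength", "try"},
    },
}


def disambiguate(word: str, context: Iterable[str]) -> Optional[str]:
    """Return the sense with the largest clue-word overlap, or None if there
    is no overlap at all or the best score is tied between senses."""
    context_words = set(context)
    scores = {sense: len(clues & context_words) for sense, clues in SENSES[word].items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


if __name__ == "__main__":
    print(disambiguate("شیر", ["I", "search", "glass", "drink"]))  # milk
    print(disambiguate("شیر", ["warrior", "battle"]))             # sword
    print(disambiguate("شیر", ["I", "search"]))                   # None
    print(disambiguate("پێ", ["fell", "pain"]))                   # leg
```

Returning None instead of guessing keeps the ambiguity visible, so a later stage (or a human post-editor) can decide; the stress-based ambiguity just described would need prosodic information that plain text does not carry.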
In some cases, the same sentence may even take on contradictory meanings when it is translated, as in the following table:

Izadin (2005: 94) shows that in English the inflectional plural suffix (-s), when added to a noun, does not change its part of speech, whereas in Kurdish the suffix (-ان) has many functions and is considered derivational, since it can change the part of speech, meaning, and intention of the word in the sentence.