Orthographic and morphological processing for English-Arabic statistical machine translation

Ahmed El Kholy and Nizar Habash
Machine Translation
Vol. 26, No. 1/2, Machine Translation for Arabic (March 2012), pp. 25-45
Published by: Springer
Stable URL: http://www.jstor.org/stable/41410958
Page Count: 21

Abstract

Much of the work on statistical machine translation (SMT) from morphologically rich languages has shown that morphological tokenization and orthographic normalization help improve SMT quality because of the sparsity reduction they contribute. In this article, we study the effect of these processes on SMT when translating into a morphologically rich language, namely Arabic. We explore a space of tokenization schemes and normalization options. We also examine a set of six detokenization techniques and evaluate on detokenized and orthographically correct (enriched) output. Our results show that the best performing tokenization scheme is that of the Penn Arabic Treebank. Additionally, training on orthographically normalized (reduced) text then jointly enriching and detokenizing the output outperforms training on enriched text.
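As a rough illustration of what training on orthographically normalized (reduced) text could involve, the sketch below collapses a few commonly confused Arabic letter forms before training. The specific character mapping (Hamzated and Madda Alif forms to bare Alif, Alif Maqsura to Ya) and the names `REDUCE_MAP` and `reduce_orthography` are assumptions made for illustration; the abstract does not spell out the exact normalization rules, and the enrichment and detokenization steps are not shown.

```python
# Hypothetical sketch of "reduced" orthographic normalization for Arabic.
# The mapping below is an assumption for illustration, not the article's
# documented rule set.
REDUCE_MAP = str.maketrans({
    "\u0623": "\u0627",  # ALEF WITH HAMZA ABOVE -> bare ALEF
    "\u0625": "\u0627",  # ALEF WITH HAMZA BELOW -> bare ALEF
    "\u0622": "\u0627",  # ALEF WITH MADDA ABOVE -> bare ALEF
    "\u0649": "\u064A",  # ALEF MAKSURA -> YEH
})


def reduce_orthography(text: str) -> str:
    """Return text with the assumed ambiguous letter forms collapsed."""
    return text.translate(REDUCE_MAP)


if __name__ == "__main__":
    # Two words whose spelling changes under the assumed mapping.
    for word in ("\u0623\u062D\u0645\u062F", "\u0639\u0644\u0649"):
        print(word, "->", reduce_orthography(word))
```

Because the mapping is many-to-one, it cannot be inverted character by character, which may be one reason the enriched forms are restored as a separate step on the system output.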
