Monday, December 25, 2006

GIZA++ Guide

A newer and easier guide for Ubuntu/Debian users is available at http://blog.vee-u.com/2008/03/02/giza_pp/
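  • First, prepare two text files that are aligned line by line: both must have the same number of lines, and each line in one file must be the translation of the line at the same position in the other. For example,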

    tha.txt:
    ฉัน กิน ข้าว
    ฉัน ไป โรงเรียน

    eng.txt:
    I eat rice.
    I go to school.
  • Second, generate the vocabulary files and the sentence-correspondence file using plain2snt.out. For example: plain2snt.out eng.txt tha.txt. This should generate eng_tha.snt, eng.vcb and tha.vcb.
  • Third, write a configuration file. For example,

    config:
    outputfileprefix play_giza
    sourcevocabularyfile eng.vcb
    targetvocabularyfile tha.vcb
    c eng_tha.snt
  • Finally, run GIZA++ with the command "GIZA++ config". The final result should be in the file play_giza.A3.final. (Be careful if you use Mac OS X.)
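
The first and third steps are easy to get wrong by hand, so here is a minimal Python sketch that checks the two text files have the same number of lines and then writes the configuration file. It is only an illustration and not part of GIZA++: the function names (count_lines, check_parallel, write_config) are made up for this example, and the only configuration keys it writes are the four shown in the config above.

```python
from pathlib import Path


def count_lines(path):
    """Return the number of lines in a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def check_parallel(source_path, target_path):
    """Return the shared line count, or raise ValueError if the counts differ.

    This only compares line counts; it cannot tell whether the lines
    are actually translations of each other.
    """
    n_source = count_lines(source_path)
    n_target = count_lines(target_path)
    if n_source != n_target:
        raise ValueError(
            f"{source_path} has {n_source} lines but "
            f"{target_path} has {n_target} lines"
        )
    return n_source


def write_config(config_path, prefix, source_vcb, target_vcb, snt_file):
    """Write the four settings used in this guide, one 'key value' per line."""
    settings = [
        ("outputfileprefix", prefix),
        ("sourcevocabularyfile", source_vcb),
        ("targetvocabularyfile", target_vcb),
        ("c", snt_file),
    ]
    text = "".join(f"{key} {value}\n" for key, value in settings)
    Path(config_path).write_text(text, encoding="utf-8")
    return text
```

With the example files from this guide, check_parallel("eng.txt", "tha.txt") returns 2, and write_config("config", "play_giza", "eng.vcb", "tha.vcb", "eng_tha.snt") writes the same four lines as the config shown above. Running plain2snt.out and GIZA++ themselves is still done from the command line.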

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.