- First, prepare two text files whose lines correspond one to one: line n of one file is the translation of line n of the other, so both files have the same number of lines. For example,
tha.txt:
ฉัน กิน ข้าว
ฉัน ไป โรงเรียน
eng.txt:
I eat rice.
I go to school.
- Second, generate the vocabulary files and the sentence correspondence file using plain2snt.out, for example "plain2snt.out eng.txt tha.txt". It should generate eng_tha.snt, eng.vcb and tha.vcb.
- Third, write a configuration file. For example,
config:
outputfileprefix play_giza
sourcevocabularyfile eng.vcb
targetvocabularyfile tha.vcb
c eng_tha.snt
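- Finally, run GIZA++ with the command "GIZA++ config". The final alignment result should then be in the file play_giza.A3.final. (Be careful on Mac OS X: HFS+ is case-insensitive by default, so play_giza.A3.final and play_giza.a3.final end up being the same file.)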
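The steps above can be sketched as a small Python helper. This is only a sketch under assumptions: the function names (count_lines, check_parallel, write_config, build_commands) are made up for illustration, and it assumes plain2snt.out and GIZA++ are on your PATH. It checks that the two text files have the same number of lines, writes the four-line config shown above, and builds the two command lines without running them.

```python
from pathlib import Path


def count_lines(path):
    """Return the number of lines in a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def check_parallel(src_path, tgt_path):
    """Raise ValueError unless both files have the same number of lines."""
    n_src, n_tgt = count_lines(src_path), count_lines(tgt_path)
    if n_src != n_tgt:
        raise ValueError(
            f"{src_path} has {n_src} lines but {tgt_path} has {n_tgt}"
        )
    return n_src


def write_config(path, prefix, src_vcb, tgt_vcb, snt):
    """Write a GIZA++ config file with the four keys used in the example."""
    lines = [
        f"outputfileprefix {prefix}",
        f"sourcevocabularyfile {src_vcb}",
        f"targetvocabularyfile {tgt_vcb}",
        f"c {snt}",
    ]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def build_commands(src_txt, tgt_txt, config_path):
    """Return the two command lines from the guide as argument lists."""
    return [
        ["plain2snt.out", src_txt, tgt_txt],  # writes the .snt and .vcb files
        ["GIZA++", config_path],              # writes <prefix>.A3.final
    ]
```

Each list returned by build_commands can be passed to subprocess.run(cmd, check=True) once the binaries are installed. Running check_parallel first catches a missing or extra line before GIZA++ is run on a misaligned corpus.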
Monday, December 25, 2006
GIZA++ Guide
A newer and easier guide for Ubuntu/Debian users is available at http://blog.vee-u.com/2008/03/02/giza_pp/
Labels: Alignment, GIZA++, Guide, howto, manual, Simplify, tutorial, word aligner, word alignment
GIZA++: XML output
GIZA++ writes its alignment output in a special format of its own. It looks nice and readable, but I just didn't want to write a parser, so I modified GIZA++ to output XML instead. [Download the patch]
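For readers who would rather keep the stock output, here is a minimal parser sketch in Python. It is not the XML patch mentioned above. It assumes the usual layout of an A3.final file: three lines per sentence pair (a "#" comment line, one sentence, then the other sentence with each word followed by the positions it aligns to, in "({ ... })"). The name parse_a3 and the SAMPLE record are made up for illustration, and SAMPLE is not real GIZA++ output.

```python
import re

# One "word ({ 1 2 })" group on the alignment line; positions may be empty.
TOKEN_RE = re.compile(r"(\S+)\s+\(\{([\d\s]*)\}\)")

# Hypothetical record in the assumed three-line layout.
SAMPLE = [
    "# Sentence pair (1) source length 3 target length 3 alignment score : 0.1",
    "ฉัน กิน ข้าว",
    "NULL ({ }) I ({ 1 }) eat ({ 2 }) rice. ({ 3 })",
]


def parse_a3(lines):
    """Yield (header, sentence_tokens, alignment) for each 3-line record.

    alignment is a list of (word, [1-based positions in sentence_tokens]).
    """
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(lines) - 2, 3):
        header, sentence, aligned = lines[i:i + 3]
        pairs = [
            (word, [int(p) for p in positions.split()])
            for word, positions in TOKEN_RE.findall(aligned)
        ]
        yield header, sentence.split(), pairs


for header, tokens, pairs in parse_a3(SAMPLE):
    for word, positions in pairs:
        # Positions are 1-based indexes into the sentence line.
        print(word, [tokens[p - 1] for p in positions])
```

To parse a real file, pass an open file object instead of SAMPLE, for example parse_a3(open("play_giza.A3.final", encoding="utf-8")). If your GIZA++ build writes a different layout, adjust the regular expression.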
Sunday, December 24, 2006
GIZA++ on Mac OS X (HFS+)
Today I found that foobar.a3.final and foobar.A3.final are the same file on HFS+ (the file system used on my iBook). Now I know why foobar.A3.final in my working directory is not the same as what is described in GIZA++'s README. A workaround is as follows:
diff -Nuar GIZA++-v2/model3.cc GIZA++-v2-osx/model3.cc
--- GIZA++-v2/model3.cc Tue Sep 30 21:24:18 2003
+++ GIZA++-v2-osx/model3.cc Sat Dec 23 18:16:08 2006
@@ -318,8 +318,8 @@
 d4file = Prefix + ".d4." + number ;
 d4file2 = Prefix + ".D4." + number ;
 d5file = Prefix + ".d5." + number ;
- alignfile = Prefix + ".A3." + number ;
- test_alignfile = Prefix + ".tst.A3." + number ;
+ alignfile = Prefix + ".uA3." + number ;
+ test_alignfile = Prefix + ".tst.uA3." + number ;
 p0file = Prefix + ".p0_3." + number ;
 }
 // clear count tables
I noticed this after running GIZA++ on NetBSD, where the result was just like in the README.
Update: I have since switched from Mac OS X to Ubuntu; see http://blog.vee-u.com/2008/03/02/giza_pp/
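Before deciding whether the patch is needed, you can check whether a working directory sits on a case-insensitive file system. The following Python sketch is only an illustration (is_case_insensitive and the probe file names are my own, not part of GIZA++). It creates a lowercase probe file in a temporary subdirectory and tests whether the mixed-case name resolves to it.

```python
import os
import tempfile


def is_case_insensitive(directory="."):
    """Return True if names differing only in case refer to the same file."""
    # The temporary subdirectory is removed again when the block exits.
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        lower = os.path.join(tmp, "probe.a3.final")
        upper = os.path.join(tmp, "probe.A3.final")
        with open(lower, "w"):
            pass  # create an empty file under the lowercase name only
        # On a case-insensitive file system the mixed-case name also exists.
        return os.path.exists(upper)


if is_case_insensitive("."):
    print("case-insensitive: .A3.final and .a3.final would collide here")
else:
    print("case-sensitive: no collision expected here")
```

If the check reports a case-insensitive file system, the two output names would collide there, and renaming one of them (as the patch does with ".uA3.") avoids the clash.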
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.