Thursday, May 24, 2007
CakePHP: Donation
Yesterday, I donated 5 USD to the Cake Software Foundation.
5 USD = 5 meals (for me, in Thailand).
So it is a lot of money :-P
Tuesday, April 17, 2007
For GNU/Linux only
I noticed that I have posted a lot of articles about human language technology and other topics here. Thus, I have created a new blog (and homepage) at www.vee-u.com, and I will try to post mostly GNU/Linux and free software related stuff here.
Labels: Blog, GNU/Linux, Vee Satayamas, Vee's blog, www.vee-u.com
Saturday, March 31, 2007
Converting Orchid corpus to XML
The Orchid corpus is a Thai part-of-speech annotated corpus that used to be freely available on Nectec's website. (I hope it becomes available again.) Its format is rather unusual, so it is inconvenient to handle. Therefore, I wrote a script to convert it to XML. Now I can process the corpus with an XML parser such as pulldom, through a familiar API, i.e. (pull)DOM.
An example of the Orchid corpus format:
%metadata
%metadata
#P1
#1
blaa blaa blaa//
blaa/NNNN
blaa/NNNN
blaa/NNNN
//
An example of the corresponding XML:
<corpus>
<document author="abcd" ...>
<paragraph>
<sentence raw_txt="blaa blaa blaa">
<word surface="blaa" pos="NNNN"/>
<word surface="blaa" pos="NNNN"/>
<word surface="blaa" pos="NNNN"/>
<word surface="blaa" pos="NNNN"/>
</sentence>
</paragraph>
</document>
...
</corpus>
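The conversion from the Orchid layout to this XML layout can be sketched as below. This is not my original script, just a minimal Python illustration using the standard `xml.etree.ElementTree` module. It assumes one document per input, that metadata lines look like `%Key: value` (the example above only shows `%metadata`, so that shape is an assumption), that `#P<n>` starts a paragraph, `#<n>` starts a sentence, a line ending in `//` holds the raw sentence text, `word/TAG` lines are tokens, and a bare `//` ends the sentence. The function name `orchid_to_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def orchid_to_xml(lines):
    """Convert the lines of one Orchid-style document into a <corpus> element.

    Line types assumed from the example above:
      %Key: value  -> document metadata (the "Key: value" shape is an assumption)
      #P<n>        -> start of a paragraph
      #<n>         -> start of a sentence
      text//       -> raw text of the current sentence
      word/TAG     -> one token and its part-of-speech tag
      //           -> end of the current sentence
    """
    corpus = ET.Element("corpus")
    document = ET.SubElement(corpus, "document")
    paragraph = sentence = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("%"):
            key, _, value = line[1:].partition(":")
            if value:  # e.g. "%Author: abcd" -> author="abcd"
                name = key.strip().lower().replace(" ", "_")
                document.set(name, value.strip())
        elif line.startswith("#P"):
            paragraph = ET.SubElement(document, "paragraph")
        elif line.startswith("#"):
            if paragraph is None:  # tolerate a sentence before any #P marker
                paragraph = ET.SubElement(document, "paragraph")
            sentence = ET.SubElement(paragraph, "sentence")
        elif line == "//":
            sentence = None  # end of the current sentence
        elif sentence is None:
            continue  # stray line outside a sentence: skipped in this sketch
        elif line.endswith("//"):
            sentence.set("raw_txt", line[:-2])
        elif "/" in line:
            # rsplit keeps any "/" that belongs to the surface form itself.
            surface, pos = line.rsplit("/", 1)
            ET.SubElement(sentence, "word", surface=surface, pos=pos)
    return corpus


sample = """%Author: abcd
#P1
#1
blaa blaa blaa//
blaa/NNNN
blaa/NNNN
blaa/NNNN
//""".splitlines()
print(ET.tostring(orchid_to_xml(sample), encoding="unicode"))
```

Splitting each token on the last `/` only is a deliberate choice, so a surface form that itself contains a slash still keeps its tag.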
The TEI format is probably suited to this job, but I am just too lazy to read the specification.
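Once the corpus is in XML, reading it with pulldom (Python's standard `xml.dom.pulldom` module) can look roughly like the sketch below. It streams events instead of building the whole DOM, and for `<word/>` elements the attributes are already available on the start-element node, so `expandNode()` is not needed. The helper name `iter_words` and the sample string are illustrative only.

```python
from xml.dom import pulldom


def iter_words(xml_text):
    """Stream (surface, pos) pairs from corpus XML laid out as in the example above."""
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT and node.tagName == "word":
            # pulldom attaches the attributes to the start-element node,
            # so the subtree does not have to be expanded for <word/>.
            yield node.getAttribute("surface"), node.getAttribute("pos")


sample = (
    '<corpus><document><paragraph><sentence raw_txt="blaa blaa">'
    '<word surface="blaa" pos="NNNN"/><word surface="blaa" pos="NNNN"/>'
    "</sentence></paragraph></document></corpus>"
)
for surface, pos in iter_words(sample):
    print(surface, pos)
```

For a large corpus file, `pulldom.parse(path)` can be used in place of `parseString`; the path would be whatever the converted file is called.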
Labels: corpus, format, orchid corpus, part-of-speech, thai, XML
Wednesday, March 28, 2007
Displaying multilingual text in SVG using Firefox
In Khem's tree editor, SVG is used to display trees in Firefox. Firefox 2.x on Windows XP displays both English and Thai text in SVG correctly. But when I tried Firefox 2.x on Mac OS X, the Thai, Bengali and Chinese text was rendered as boxes, as shown below.
I tried other font families, namely Times, Sans and Helvetica, but only the English text was displayed
(using the following code):
<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="1.1"
baseProfile="full">
<text x="50" y="50"
font-size="16" fill="blue" >
Wikipedia 維基百科 วิกิพีเดีย উইকিপিডিয়া
</text>
</svg>
So I tried assigning a font family to the text, as in the following code:
<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="1.1"
baseProfile="full">
<text x="50" y="50"
font-family="Garuda" font-size="16"
fill="blue" >
Wikipedia 維基百科 วิกิพีเดีย উইকিপিডিয়া
</text>
</svg>
It works: Firefox now displays the Thai text correctly. However, it still cannot display the Bengali and Chinese text, as shown below.
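A possible next step is a font-family fallback list: the SVG `font-family` attribute takes a comma-separated list, with the same syntax as CSS, so a renderer can use later families for glyphs the first one lacks. The Python sketch below generates the same test SVG with such a list using `xml.etree.ElementTree`. Apart from Garuda, the font names are hypothetical placeholders, the helper names are illustrative, and whether Firefox 2.x on Mac OS X actually falls back glyph by glyph this way is untested here.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements without a prefix, i.e. with xmlns="...".
ET.register_namespace("", SVG_NS)


def css_family_list(families):
    """Join font names into a font-family value, quoting names that contain spaces."""
    return ", ".join(f"'{name}'" if " " in name else name for name in families)


def svg_text(label, families, x=50, y=50, size=16):
    """Return a minimal SVG document with one <text> element using a font fallback list."""
    svg = ET.Element(f"{{{SVG_NS}}}svg", version="1.1", baseProfile="full")
    text = ET.SubElement(
        svg,
        f"{{{SVG_NS}}}text",
        attrib={
            "x": str(x),
            "y": str(y),
            "font-family": css_family_list(families),
            "font-size": str(size),
            "fill": "blue",
        },
    )
    text.text = label
    return ET.tostring(svg, encoding="unicode")


# Garuda comes from the post; the other two names are hypothetical placeholders.
doc = svg_text("Wikipedia 維基百科 วิกิพีเดีย উইকিপিডিয়া",
               ["Garuda", "Example CJK Font", "Example Bengali Font", "sans-serif"])
```

The generic `sans-serif` entry goes last so there is always some family left to try.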
Sunday, February 25, 2007
A pure ruby ternary search tree implementation
Source code. Loading the Yaitron dictionary with it takes 10 minutes, so I am trying ctst instead :-P. Thanks to lindever for introducing me to TSTs :-)
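For readers new to ternary search trees, here is a separate minimal sketch in Python (not the Ruby implementation linked above, and not ctst) of the data structure: each node stores one character and three children, `lo` for smaller characters, `eq` for the next character of the key, and `hi` for larger characters. The class and method names are illustrative.

```python
class _Node:
    __slots__ = ("ch", "lo", "eq", "hi", "value", "is_end")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.value = None
        self.is_end = False  # True if a key ends at this node


class TernarySearchTree:
    """Minimal ternary search tree mapping non-empty strings to values."""

    def __init__(self):
        self.root = None

    def put(self, key, value):
        if not key:
            raise ValueError("key must be a non-empty string")
        if self.root is None:
            self.root = _Node(key[0])
        node, i = self.root, 0
        while True:
            ch = key[i]
            if ch < node.ch:  # go left, creating the node if needed
                if node.lo is None:
                    node.lo = _Node(ch)
                node = node.lo
            elif ch > node.ch:  # go right, creating the node if needed
                if node.hi is None:
                    node.hi = _Node(ch)
                node = node.hi
            elif i + 1 < len(key):  # match: move on to the next character
                i += 1
                if node.eq is None:
                    node.eq = _Node(key[i])
                node = node.eq
            else:  # last character matched: store the value here
                node.value = value
                node.is_end = True
                return

    def get(self, key, default=None):
        node, i = self.root, 0
        while node is not None and key:
            ch = key[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(key):
                i += 1
                node = node.eq
            else:
                return node.value if node.is_end else default
        return default


tree = TernarySearchTree()
tree.put("cat", "noun")
tree.put("cap", "noun")
print(tree.get("cat"), tree.get("ca"))
```

The loops are iterative rather than recursive, so long keys do not run into Python's recursion limit.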
Friday, January 5, 2007
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.