Joseph Chee Chang, Jason S. Chang, and Roger Jang. ACL 2012.
TermMine is an information extraction system that can automatically mine translation pairs of terms from the web. We used a small set of terms and translations to gather mixed-code text from the web to train a CRF model that can identify translation pairs at run-time.
In this paper, we present a new method for learning to finding translations and transliterations on the Web for a given term. The approach involves using a small set of terms and translations to obtain mixed-code snippets from a search engine, and automatically annotating the snippets with tags and features for training a conditional random field model. At run-time, the model is used to extracting translation candidates for a given term. Preliminary experiments and evaluation show our method cleanly combining various features, resulting in a system that outperforms previous work.
1 2 3 4
1 2 3 4 5 6 7 8 9 10 11 12 13 14