This is Gosling’s latest hack, now online. Internationalization is a big hairy tedious ugly necessary job, and if we can turn some community mojo loose on it, that has to be a good thing. OpenOffice.org has done a good job of building a community around an internationalization effort, so we’re not starting from zero. I’ll be keeping an eye on this.
Comment feed for ongoing:
From: Tim Foster (Oct 17 2006, at 14:05)
Yes, Internationalisation is hard: http://blogs.sun.com/timf/entry/internationalisation_is_hard_still. Writing good i18n tools is hard too - and unfortunately imho, this is not one of them. The segmentation sucks. Not a problem if you can guarantee that 100% of the content is going to be translated by a human, but if you start using CAT tools in any way, shape or form, or expect to be able to reuse these translations in your friendly neighbourhood translation-memory further down the line, then you're in for a disappointment. Getting segmentation right ( http://blogs.sun.com/timf/entry/how_to_write_a_tm ) is one of the most important things in any CAT tool, and you only get one chance, or risk spending megabucks down the line, either retranslating material, or using expensive sentence alignment technologies (expensive, because humans need to review them).
A quick look at java.lang.Object segmentation ( http://doc.java.sun.com/DocWeb/api/java.lang.Object?lang=da&mode=Translate ) suggests that this needs more work imho. I mentioned this to James already - it'll be interesting to see how this experiment pans out :-)
From: Andrew Kobayashi (Oct 18 2006, at 02:01)
Personally I think it's great, but I don't necessarily think that it even counts as a CAT tool. TMs are great, but segmentation is so hard across multiple languages - if you are trying sentence-based segmentation, for example - I don't think that a sentence has the same "semantic payload" in Japanese as in English. Japanese is an example of a language that allows a lot of implied content, which means that a sentence translated from Japanese to English is going to be stripped of a lot of meaning. Translation memory is a useful tool, but so far a community has way more collective understanding of meaning than any tool/database can.