
Translations

December 16th, 2004, 04:28 PM
PhD MD MFA MBA Join Date: Sep 2000
Location: USA
c0deblue, would you be willing to start a new thread describing exactly how you did that translation? There are so many non-English references available that it would be nice to have some process of machine translating them, at least the bulk of the words. There are a million things I want to get from the Beilstein Index, Chemische Berichte, Annalen, Angewandte Chemie, and various patents that I just ignore because the thought of going through by hand and translating the words would be too time-consuming. Your process sounds like it could speed things up greatly.

Not sure how much this will add to the knowledge base or simplify what by any definition is a difficult process, but here goes.

The first step in translating anything is GOOD COPY. If it's already a text file you save some work, but more often the original is an image scan or graphics file of some sort. If it's a good sharp image, you can do OCR directly. If not, you'll probably have to use a graphics editor (like Photoshop) to clean it up: lighten dark backgrounds, improve contrast, etc. OCR accuracy depends on this. Illustrations, tables and multi-column layouts often cause OCR problems, so crop illustrations and tables and save them as separate graphic files to be re-inserted later. Where original pages are two columns wide, crop the image and do OCR on one column at a time.
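
For anyone who would rather script the cleanup than do it by hand, here's a rough sketch of two of the operations just described: stretching contrast and cutting a two-column page into single columns. It assumes the scan has already been loaded as a 2-D list of gray values (0 = black, 255 = white); loading and saving image files is left out, and the function names are hypothetical, not part of Photoshop or any OCR package. The gutter finder simply picks the column in the middle half of the page with the least ink, which works for clean two-column layouts but not for pages where a figure spans the gutter.

```python
# Sketch only: pixels are a list of rows of 8-bit gray values (0 = black, 255 = white).

def stretch_contrast(pixels):
    """Linearly map the darkest pixel to 0 and the lightest to 255."""
    lo = min(min(row) for row in pixels)
    hi = max(max(row) for row in pixels)
    span = hi - lo
    if span == 0:  # flat image: nothing to stretch
        return [row[:] for row in pixels]
    # Integer arithmetic, rounding half up, so results are exact.
    return [[((p - lo) * 255 + span // 2) // span for p in row] for row in pixels]

def binarize(pixels, threshold=128):
    """Force every pixel to pure black (0) or pure white (255)."""
    return [[0 if p < threshold else 255 for p in row] for row in pixels]

def find_gutter(pixels, margin=0.25, ink_threshold=128):
    """Return the x position with the least ink in the middle band of the page."""
    width = len(pixels[0])
    start, stop = int(width * margin), int(width * (1 - margin))

    def ink(x):
        # Count dark pixels in column x.
        return sum(1 for row in pixels if row[x] < ink_threshold)

    return min(range(start, stop), key=ink)

def split_columns(pixels):
    """Cut the page at the gutter into left and right column images."""
    x = find_gutter(pixels)
    return [row[:x] for row in pixels], [row[x:] for row in pixels]
```

Each column image can then be saved and fed to the OCR program separately.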

Once the OCR step is done, careful proofreading and correction are essential. This means a line-by-line comparison against the original using a split screen, because the translation software won't recognize a word unless every letter is correct. An error of this kind typically affects more than the unrecognized word itself: other words in the sentence may also be mistranslated because the context-recognition algorithms that determine meaning fail. The importance of meticulously proofing the OCR product against the original can't be overemphasized.
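
A word list can take some of the drudgery out of proofing. The sketch below uses only Python's standard library (difflib) to flag every OCR'd word that isn't in a German word list you supply and to suggest near matches. That catches typical OCR slips such as m read for n, or a dropped umlaut; "umentflammbares" and "Bewußtseinsstorungen" in the example text below look like exactly these slips. The lexicon is an assumption (you would need a real German word list), and the function name is hypothetical. It only catches non-words: an OCR error that happens to produce another valid word slips through, so this supplements the split-screen comparison rather than replacing it.

```python
import difflib
import re

WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, including ä, ö, ü and ß

def flag_ocr_suspects(text, lexicon, cutoff=0.85, max_suggestions=3):
    """Map each word not found in `lexicon` to its closest lexicon entries.

    Comparison is case-insensitive (str.casefold), so suggestions come back
    casefolded. An empty suggestion list means "unknown, no close match".
    """
    known = {w.casefold() for w in lexicon}
    candidates = sorted(known)
    suspects = {}
    for word in WORD_RE.findall(text):
        key = word.casefold()
        if key in known or word in suspects:
            continue  # recognized word, or already reported
        suspects[word] = difflib.get_close_matches(
            key, candidates, n=max_suggestions, cutoff=cutoff)
    return suspects
```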

Once the foreign-language version of your document is error-free, you can do a quick machine translation using Systran. Some words will remain unrecognized and sentence structure will be poor, but in most cases the output will be good enough to get the thrust of the article. If that's all you wanted, the job is complete. However, if you need a more polished translation for permanent reference, you first have to go back and find meanings for the unrecognized words. Run any sentences that make no sense at all through GE Trans; if they still make no sense, select "Display alternate meanings" to see all the possibilities. Look up still-unrecognized words in an online dictionary (see notes below).
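
In the raw output below, Systran passes unrecognized words through untranslated (umentflammbares, Sulfamid, schwerst, knirschten and others). A rough way to collect them for dictionary lookup is to list the words in the output that also occur in the source. This sketch assumes only that pass-through behavior, nothing else about Systran, and the function name is hypothetical. Words are compared with str.casefold(), which also folds ß to ss, matching how the raw output below spells Bewusstseinsstorungen. Short words are skipped to avoid hits like "in", and a German word spelled the same as its English translation will show up as a false positive.

```python
import re

WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, including ä, ö, ü and ß

def passthrough_words(source, output, min_len=4):
    """Words in the machine output that also appear in the source text.

    These are usually words the translator didn't recognize and copied
    through. Returned in order of first appearance, spelled as in `output`.
    """
    source_words = {w.casefold() for w in WORD_RE.findall(source)}
    seen, result = set(), []
    for word in WORD_RE.findall(output):
        key = word.casefold()
        if len(word) >= min_len and key in source_words and key not in seen:
            seen.add(key)
            result.append(word)
    return result
```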

When you've filled in all the blanks, run Systran again on the edited text. This should improve the translation and sometimes even make sense of sentences that were incomprehensible the first time. The final step in the translation process is transposing: moving the verbs and modifiers from the ends of sentences to their appropriate locations and conforming sentence structure to grammatically acceptable English. This can require some creative thinking, because it's sometimes necessary to reconstruct sentences using different words without changing the overall meaning.
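
As you dig up meanings, it helps to keep them in one place so the same word is rendered the same way every time. Here's a minimal sketch of a personal glossary applied to machine output with whole-word, case-insensitive matching. The glossary and function name are hypothetical, and this is a stand-in for, not a description of, Systran's own user dictionaries. Replacements are inserted exactly as written in the glossary, so the capitalization of the original word is not preserved.

```python
import re

def apply_glossary(text, glossary):
    """Replace whole words that appear in `glossary` (matched case-insensitively)."""
    if not glossary:
        return text
    lookup = {k.casefold(): v for k, v in glossary.items()}
    # Longest keys first, so a longer entry wins over a shorter one it contains.
    keys = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, keys)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: lookup[m.group(0).casefold()], text)
```

Because of the word boundaries (\b), an entry for "sulfamid" will not touch "Sulfamide".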

The following examples will illustrate:

(Original text):
Während des Krieges wurde eine auswärtige Firma beauftragt, umentflammbares Fasermaterial anzufertigen. Dazu wurde einmalig auch eine Versuchsmenge mit Sulfamid und Formaldehyd imprägniert und deren Restbestände nach Jahren verarbeitet. Dabei erkrankten plötzlich einige Arbeiter unter schweren Vergiftungserscheinungen. Die anfänglichen Symptome wurden nur von Laien beobachtet, es wurde mitgeteilt, daß die Leute bei der Arbeit plötzlich umfielen und krampfartige Zustände bekamen, wobei ihnen, ähnlich epileptischen Krämpfen, Schaum vor dem Munde stand. Zwei Leute erkrankten erst zu Hause nach dem Verlassen des Arbeitsplatzes. Die vier schwerst Erkrankten wurden in ein Krankenhaus eingeliefert. Sie zeigten schwere Bewußtseinsstorungen und starke motorische Unruhe, die Patienten warfen sich hin und her, stöhnten, griffen mit den Händen in die Luft und knirschten mit den Zähnen. Diese Erscheinungen klangen langsam ab, bei einem Patienten gingen sie in einen ausgesprochenen psychischen Verwirrungszustand über, aber auch die anderen zeigten Gedächtnisstörungen. Im Verlaufe von 8 Tagen bis 3 Wochen erholten sich alle Patienten, Dauerschäden blieben nicht zurück. Die sonstigen körperlichen Untersuchungen der Kranken ergaben nichts bemerkenswertes.

(Raw Systran output):
During the war a foreign company was assigned to make umentflammbares synthetic material. In addition uniquely also an attempt quantity with Sulfamid and formaldehyde was impregnated and their residues after years was processed. Suddenly some workers under heavy symptoms of intoxication got sick. The initial symptoms were observed only by laymen, it was communicated that the people suddenly fell down with the work and got cramp-like conditions, whereby foam before the mouth stood for them, similarly epileptischen cramps. Two people got sick only at home after leaving the job. The four gotten sick ones schwerst were in-supplied to a hospital. They showed heavy Bewusstseinsstorungen and strong motor unrest, which patients threw themselves, groaned back and forth, seized with the hands into air and knirschten with the teeth. These features faded away slowly, with a patient changed them into an expressed psychological confusion condition, in addition, the others showed memory disturbances. In run from 8 days to 3 weeks recovered all patients, permanent damages did not stay not. The other physical investigations of the patients did not result in anything remarkable.

(After further translation and transposing):
During the war a foreign company was assigned to make flame-resistant fiber material. For this purpose, a single trial batch was also impregnated with sulfamide and formaldehyde, and its leftover stock was processed years later. During this work some workers suddenly got sick with severe symptoms of poisoning. The initial symptoms were observed only by laymen, who reported that the people suddenly fell down at their work with cramp-like conditions and foamed at the mouth, as in epileptic seizures. Two people got sick only after leaving the job and getting home. The four most severely affected workers were admitted to a hospital. They showed profound disturbances of consciousness and marked motor restlessness: the patients thrashed back and forth, groaned, clawed at the air with their hands and ground their teeth. These features slowly diminished; in one patient they turned into a pronounced state of psychological confusion, but the others also showed memory disturbances. Over a period of 8 days to 3 weeks, all patients recovered without permanent damage. Other physical examinations of the patients revealed nothing remarkable.

When you're satisfied with the translation, you're ready to format it as an MS Word document, insert the illustrations etc. that you saved as individual graphic files, and make it look as close as possible to the original. Systran is also integrated with Word, so presumably one could start with a Word document and do all the translating from there. However, this generates enormous temp files and sucks up system resources like mad. For me (Pentium I with 128 MB of memory), it's not worth the risk of losing everything mid-task to a system crash, but you could try it on a newer computer with lots of memory.

The only thing remaining is to convert the Word document to a PDF file, a straightforward process if you have the full version of Acrobat or a program like PDF Maker. Be sure to check the PDF carefully, though, since the conversion isn't always perfect. Things to watch for include incorrect indents, spacing, superscripts and subscripts, and graphic boundaries. If there are errors, correct or compensate for them in the original Word document and do a new PDF conversion. Don't try to correct them in Acrobat: it causes more problems than it's worth.

The last step in the process is to post the goodies here so the rest of us can have the benefit of all your hard work! :D


Systran: Multi-language translation software. The starting point for translation. Initial output leaves something to be desired, but the program can be customized with your own dictionaries ("CSDs") for greater versatility. Systran's strength is in its ability to assign to each word the most "appropriate" of many possible meanings based on context and sentence structure. However, this is also a weakness since Systran is far from perfect and there's no convenient way of viewing alternative meanings for problem words; even though alternative meanings may be in the program's dictionary, they aren't displayed if the software thinks they don't fit.

GE Trans: A German-to-English machine translator. Works like Systran but doesn't provide much in the way of context recognition. It does, however, let you display alternative meanings for every word in the translation, which can be valuable since the initial output of GE Trans sometimes doesn't make much sense with technical writing. You can customize the program by adding new words as you find them, tailoring it to your needs. Free download at http://www.theabsolute.net/sware/getrans.html
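
To make the "alternative meanings" idea concrete, here's a toy sketch of a user-extendable word list that keeps every meaning for a word instead of silently picking one. The class and method names are hypothetical; it does not read or write GE Trans's (or Systran's) dictionary files, and it does no context analysis at all. It just shows the data structure: each German word maps to an ordered list of English meanings, and you can add words or extra meanings as you find them.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, including umlauts and ß

class UserDictionary:
    """Hypothetical German-English word list that keeps every meaning."""

    def __init__(self):
        self._entries = defaultdict(list)  # casefolded German word -> meanings

    def add(self, german, *meanings):
        """Add a new word, or extra meanings for a word already listed."""
        bucket = self._entries[german.casefold()]
        for meaning in meanings:
            if meaning not in bucket:
                bucket.append(meaning)

    def alternatives(self, german):
        """All stored meanings in the order added; [] for unknown words."""
        return list(self._entries.get(german.casefold(), []))

    def gloss(self, sentence):
        """Pair every word with its alternatives so you can choose by hand."""
        return [(w, self.alternatives(w)) for w in WORD_RE.findall(sentence)]
```

An empty list in the gloss output marks a word that still needs a dictionary lookup.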

The Beilstein Dictionary: For translating the technical terms standardized in the Beilstein Handbook of Organic Chemistry. Good for filling in some of the blanks: http://www-sul.stanford.edu/depts/swain/beilstein/bedict1.html

Online German-English dictionary: Provides a convenient lookup for words unrecognized by the translation software. The nice thing about it is that it returns lists of similar words, so you can often get the meaning of difficult compound words even if they themselves aren't in the online database: http://dict.tu-chemnitz.de/
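
Long German compounds are a big source of unrecognized words, and the dictionary above helps by returning similar entries. If you'd rather break a compound apart yourself before looking up the pieces, here's a small sketch. It splits a word into parts found in a word list you supply (an assumption), allowing the common linking elements between parts (-s-, -es-, -n-, -en-), and tries longer parts first, backtracking if the rest of the word can't be split. The function name is hypothetical and it returns lowercase parts, or None if no split is found.

```python
from functools import lru_cache

LINKERS = ("", "s", "es", "n", "en")  # common German linking elements

def split_compound(word, lexicon):
    """Split a German compound into known parts plus linking letters, or None."""
    word = word.lower()
    known = {w.lower() for w in lexicon}

    @lru_cache(maxsize=None)
    def best(start):
        if start == len(word):
            return ()  # nothing left to split
        # Try the longest leading part first.
        for end in range(len(word), start, -1):
            part = word[start:end]
            if part not in known:
                continue
            for link in LINKERS:
                if link and not word.startswith(link, end):
                    continue
                rest_start = end + len(link)
                if link and rest_start == len(word):
                    continue  # a linking element can't end the word
                rest = best(rest_start)
                if rest is not None:
                    return (part,) + ((link,) if link else ()) + rest
        return None  # no way to split from this position

    result = best(0)
    return list(result) if result is not None else None
```

For example, with "Bewußtsein" and "Störungen" in the word list, "Bewußtseinsstörungen" splits into bewußtsein + s + störungen, and each part can then be looked up separately.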

Other German translation dictionaries:

Lots of miscellaneous translation resources: