TermExtractor.com
Frequently Asked Questions

TermExtractor.com is a service for extracting bilingual terminology from Translation Memory files (TMs).

The service is provided by Masterin® and is free of charge for TMs containing up to 2,000 translation units.

Contact us for information about the Intranet version.

Supported languages are English, Swedish and Finnish.

Frequently Asked Questions

  1. Why should I extract terms from my Translation Memory file?
  2. What languages are supported by TermExtractor.com?
  3. How come TermExtractor.com is free of charge?
  4. Can I extract monolingual terminology with TermExtractor too?
  5. How do I determine the format of my Translation Memory file?
  6. What are UTF-8 and ISO-8859-1?
  7. What is a Trados Translator's Workbench TXT file? How do I create such a file?
  8. Who owns the rights to the extracted terms?
  9. How do I know that my Translation Memory file and any bilingual terminology extracted from it are kept private?
  10. Can I close my web browser when terms are being extracted? Can I turn off my computer?
  11. When I try to start an extraction process, TermExtractor.com says a required language component is not running and cannot be restarted automatically. What should I do?
  12. My bilingual term file contains a lot of crap. How come?
  13. My bilingual term file contains no terms/very few terms. What's wrong?
  14. The estimated time left of my extraction process is way in the future. What is going on?
  15. My Macintosh-formatted Wordfast translation memory file fails to process. Can I fix this?
  16. I cannot find an answer to my question in this FAQ. What should I do next?

1. Why should I extract terms from my Translation Memory file?

It depends on what you do.

If you are a translator, TermExtractor.com will help you to be more consistent when working with domain-specific terminology. Spend less time on laborious terminology research in dictionaries and on the web, looking up target-language equivalents of the same technical terms over and over again. Instead, extract the terminology you need for a particular translation task by feeding TermExtractor.com the Translation Memory file you are using for the job at hand.

If you are a translation buyer or a communications officer at a company, you can promote a unified, multilingual company tone of voice by maintaining a term base of your company's language use. Use TermExtractor.com to build and update this term base from the Translation Memory files that you or your translation provider maintain.

If you are a Language Service Provider (LSP), you can help your translators produce higher quality in less time (thus making more profit yourself) by providing them with an accurate term base. TermExtractor.com helps you keep your term base up to date with the latest domain-specific terms and equivalents.

Read more in this whitepaper (PDF, 0.5 MB).

2. What languages are supported by TermExtractor.com?

TermExtractor.com supports English, Swedish and Finnish (all language pairs and translation directions).

3. How come TermExtractor.com is free of charge?

TermExtractor.com is in a public beta phase. We value the opinions of our users and aim to improve the service based on the feedback we get from the field. To reach as many users as possible, we have decided to offer TermExtractor.com as a free service for the time being. However, to avoid overloading the service, TermExtractor.com accepts only small translation memories (up to 2,000 translation units per extraction task). Ask us about an unlimited, paid version for your company's Intranet.

4. Can I extract monolingual terminology with TermExtractor too?

No. TermExtractor.com only creates bilingual vocabularies using Translation Memory files as input.

5. How do I determine the format of my Translation Memory file?

Files ending with the file extension .tmx should be in Translation Memory eXchange (TMX) format. Files ending with the file extension .txt can be of various formats.

If you are unsure, open the file in Notepad or WordPad and look for clues, but make sure you do not alter the file. If the first row of the file contains the string "Wordfast", you are dealing with a Wordfast file. If the file contains lines with the string "<TrU>", you are dealing with a Trados Translator's Workbench TXT file. If the file is not readable, you have probably opened a binary (machine-coded) file, which TermExtractor.com cannot handle at all.
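
The same clues can be checked with a short script instead of a text editor. The following is only a sketch: the function name `guess_tm_format` and the returned labels are made up for illustration, and the test for "not readable" (NUL bytes near the start of the file) is a rough heuristic of our own, not the check TermExtractor.com performs.

```python
from pathlib import Path


def guess_tm_format(path):
    """Guess a Translation Memory format from the clues in this FAQ.

    Hypothetical helper: the labels returned here are illustrative only.
    The file is opened read-only and is never modified.
    """
    p = Path(path)
    data = p.read_bytes()

    # Heuristic (assumption): NUL bytes near the start suggest a file
    # that is not readable as plain ISO-8859-1 or UTF-8 text.
    if b"\x00" in data[:4096]:
        return "unreadable"

    # Files with the .tmx extension should be in TMX format.
    if p.suffix.lower() == ".tmx":
        return "tmx"

    # latin-1 maps every byte to a character, so this never fails.
    lines = data.decode("latin-1").splitlines()
    first_line = lines[0] if lines else ""

    if "Wordfast" in first_line:
        return "wordfast"
    if any("<TrU>" in line for line in lines):
        return "trados-txt"
    return "unknown"
```

Calling `guess_tm_format("my_memory.txt")` on one of your own files returns one of the labels above. A result of "unknown" means none of the clues matched, so the file still needs a manual look.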

Read on if you need help with file encoding schemes (Unicode, UTF-8, ISO-8859-1, ASCII etc.).

6. What are UTF-8 and ISO-8859-1?

UTF-8 and ISO-8859-1 are file encoding schemes. Translation Memory files can be encoded using different schemes. TermExtractor.com accepts files encoded as ISO-8859-1 ("Western European") and as UTF-8 ("Unicode"). Normally, you can start with the file format option that does not specify an encoding, as this defaults to ISO-8859-1. Should TermExtractor.com report an error, simply switch to the UTF-8 version of the file format and try again.
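
If you would rather check the encoding before uploading, the sketch below tries to decode the file contents as UTF-8 and falls back to ISO-8859-1 when that fails. The function name `suggest_encoding` is hypothetical, and the result is only a hint: every byte sequence is technically valid ISO-8859-1, so the script cannot prove that a file really uses that encoding.

```python
def suggest_encoding(raw: bytes) -> str:
    """Suggest which encoding option to try first (illustrative helper)."""
    # Pure ASCII is identical in UTF-8 and ISO-8859-1, so either option works.
    if raw.isascii():
        return "ASCII"
    try:
        # Strict decoding raises an error on byte sequences that are not UTF-8.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: a file that is not valid UTF-8 is ISO-8859-1.
        return "ISO-8859-1"
    return "UTF-8"
```

To use it on a file, read the file in binary mode first, for example `suggest_encoding(open("my_memory.txt", "rb").read())`.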

7. What is a Trados Translator's Workbench TXT file? How do I create such a file?

If you want to extract terms from files created and maintained by Trados Translator's Workbench (TWB), you need to export the Translation Memory database from Trados' internal format to a readable text format. Open the Translation Memory in TWB (File > Open...), then select File > Export..., then select the source and target languages as fields to be exported, click OK and then select a file name and format for the exported file. TermExtractor.com supports all of the export formats available in TWB (version 7.5).

8. Who owns the rights to the extracted terms?

You. All copyrights and other intellectual property rights to any Translation Memory files uploaded to TermExtractor.com, and to any bilingual term files generated by and downloaded from it, are owned by the user and kept at the user's disposal only.

9. How do I know that my Translation Memory file and any bilingual terminology extracted from it are kept private?

Your Translation Memory file is uploaded to the TermExtractor.com server where it is processed automatically without any kind of human intervention. The translation memory file is not accessible from the web and it is automatically deleted from the server immediately after the extraction process has finished. In the rare case of a fatal processing error, all uploaded data is deleted automatically within one week after the upload.

Each bilingual terminology extraction process is identified by an email address and a process identification number (PIN code). Only users who can provide both these pieces of information have access to bilingual terminology extracted in the identified process. All bilingual term files are automatically deleted from the server within one week after they were generated.

TermExtractor.com stores the email addresses used to identify extraction processes. These addresses will not be sold, rented or used for sending unsolicited mail.

10. Can I close my web browser when terms are being extracted? Can I turn off my computer?

Yes and yes. The terminology extraction process is fully executed on the TermExtractor.com server. When you have received confirmation that the extraction process is successfully in progress, you can continue working normally. There is no need for the web browser to be open or even for your computer to be switched on.

11. When I try to start an extraction process, TermExtractor.com says a required language component is not running and cannot be restarted automatically. What should I do?

Normally you can just retry starting the extraction process a couple of times. If TermExtractor.com repeatedly reports problems with a component of a particular language, please do not hesitate to report the issue.

12. My bilingual term file contains a lot of crap. How come?

TermExtractor.com extracts bilingual terminology based on a combination of linguistically motivated rules and well-known statistical measures. At the end of the day, it is still an automated process, which is by definition error-prone.

Here are some tips and tricks that might help you tune TermExtractor.com to better fit your needs:

  • If the extracted terminology seems to be "foreign", e.g. inflections are repeatedly handled incorrectly, then check that you have set the source and target languages properly.
  • If the extracted terminology contains a lot of funny characters (slashes, weird accented characters etc.), then check that you have set the format correctly.
  • If most of the errors are on rows where the confidence score is low (less than 1.0), then try tightening the advanced settings on the Extract Terms tab and redo your terminology extraction.

Please note that, according to user reports, Translation Memory files containing a lot of machine-readable coding (e.g. XML/HTML tags) embedded in the human-readable text are not processed properly by TermExtractor.com. If possible, use a Translation Memory file with as little machine coding as possible.

13. My bilingual term file contains no terms/very few terms. What's wrong?

Here are some likely reasons:

  • You are extracting terminology from a Translation Memory file that is too small (< 500 translation units/sentence pairs)
  • You have set the advanced settings too tight
  • The source and target languages are not correct

TermExtractor.com uses a combination of sophisticated linguistic rules and well-known statistical measures to extract term pair candidates from a Translation Memory file. If the file does not contain recurring pairs of terms, or the terms cannot be paired up properly, TermExtractor.com will not find them.
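
To rule out the first reason, you can count the translation units before uploading. The sketch below is an illustration only: the helper names and format labels are invented, and it assumes that each unit is a `<tu>` element in a TMX file or starts with a `<TrU>` line in a Trados TXT export. The two thresholds are the figures mentioned in this FAQ.

```python
import xml.etree.ElementTree as ET

SMALL_TM = 500     # below this, the FAQ warns that few or no terms may be found
FREE_LIMIT = 2000  # upper limit of the free service per extraction task


def count_units(raw: bytes, tm_format: str) -> int:
    """Count translation units in TMX or Trados TXT data (hypothetical helper)."""
    if tm_format == "tmx":
        # Assumption: each translation unit is one <tu> element.
        return sum(1 for _ in ET.fromstring(raw).iter("tu"))
    if tm_format == "trados-txt":
        # Assumption: each translation unit starts with a <TrU> line.
        return sum(1 for line in raw.splitlines() if b"<TrU>" in line)
    raise ValueError(f"unsupported format: {tm_format}")


def size_advice(units: int) -> str:
    """Compare a unit count with the two thresholds from the FAQ."""
    if units < SMALL_TM:
        return "too small: expect few or no terms"
    if units > FREE_LIMIT:
        return "too large for the free service"
    return "ok"
```

For example, `size_advice(count_units(open("my_memory.tmx", "rb").read(), "tmx"))` reports whether a TMX file falls inside the range where the free service is likely to produce useful results.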

14. The estimated time left of my extraction process is way in the future. What is going on?

Oops! It seems your terminology extraction process has stopped for some reason. The server may have rebooted during the extraction process, or some of the required language components may have gone offline.

You can try to redo the extraction if you like. If the same problem occurs again, we would appreciate it if you reported the issue. Sorry for the inconvenience!

15. My Macintosh-formatted Wordfast translation memory file fails to process. Can I fix this?

Yes.

On Microsoft Windows: Create a copy of the Macintosh-formatted Wordfast translation memory file. Open the copy in Microsoft Word on Windows. Microsoft Word now brings up the File Conversion dialog. In this dialog, set the text encoding scheme of the Macintosh-formatted file. Normally this is Other encoding > Western European (Mac). Close the File Conversion dialog, and then choose Save As... in the File menu to save yet another copy of the plain text file containing the translation memory. After you have given the file a new name and clicked Save, Microsoft Word brings up the File Conversion dialog again. Set text encoding to Unicode (UTF-8) and then save the file by clicking OK in the File Conversion dialog. Set file format to Wordfast (UTF-8) in TermExtractor.com to use the newly created file.

On Macintosh: Follow the instructions above for the Windows platform, adjusting them as needed for Microsoft Word for Mac.
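
If Microsoft Word is not at hand, the same conversion can be sketched in a few lines of Python. This assumes that the file really is encoded as Western European (Mac), which Python calls `mac_roman`. The function name `mac_to_utf8` is made up for this example, and the line-ending conversion is an extra assumption on our part (classic Macintosh files often end lines with a bare carriage return). Work on a copy of the file, as in the instructions above.

```python
def mac_to_utf8(raw: bytes) -> bytes:
    """Re-encode Western European (Mac) text as UTF-8 (illustrative sketch)."""
    # Assumption: the input bytes are Mac Roman encoded text.
    text = raw.decode("mac_roman")
    # Assumption: convert classic Mac (CR) and Windows (CRLF) line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")
```

Read the copy in binary mode, pass its contents through `mac_to_utf8`, write the result to a new file in binary mode, and then set the file format to Wordfast (UTF-8) in TermExtractor.com.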

16. I cannot find an answer to my question in this FAQ. What should I do next?

Have you seen the Video Tutorial? If not, start by viewing it. If you still need assistance, please do not hesitate to send us a message.
