TermExtractor.com is a service for extracting bilingual terminology from Translation Memory files (TMs).
The service is provided by Masterin® and is free of charge for TMs containing up to 2,000 translation units.
Contact us for information about the Intranet version.
Supported languages are English, Swedish and Finnish.
It depends on what you do.
If you are a translator, TermExtractor.com will help you be more consistent when working with domain-specific terminology. Spend less time on laborious terminology searches in dictionaries and on the web, looking for target-language equivalents of the same technical terms over and over again. Instead, extract the terminology you need for a particular translation task by feeding TermExtractor.com the Translation Memory file you are using for the job at hand.
If you are a translation buyer or a communications officer at a company, you can promote a unified, multilingual company tone of voice by maintaining a term base of your company's language use. Use TermExtractor.com to build and update this term base from the Translation Memory files that you or your translation provider maintain.
If you are a Language Service Provider (LSP), you can help your translators produce higher quality in less time (thus making more profit yourself) by providing them with an accurate term base. TermExtractor.com helps you keep your term base up to date with the latest domain-specific terms and equivalents.
Read more in this whitepaper (PDF, 0.5 MB).
TermExtractor.com supports English, Swedish and Finnish (all language pairs and translation directions).
TermExtractor.com is in a public beta phase. In other words, we value the opinions of our users and aim to improve the service based on the feedback we get from the field. To reach as many users as possible, we have decided to offer TermExtractor.com as a free service for the time being. However, to avoid overloading the service, TermExtractor.com accepts only small translation memories (up to 2,000 translation units per extraction task). Ask us about an unlimited, paid version for your company's Intranet.
No. TermExtractor.com only creates bilingual vocabularies using Translation Memory files as input.
Files ending with the file extension .tmx should be in Translation Memory eXchange (TMX) format. Files ending with the file extension .txt can be of various formats.
If you are unsure, open the file in Notepad/Wordpad and look for clues but make sure you do not alter the file. If the first row of the file contains the string "Wordfast", you are dealing with a Wordfast file. If the file contains lines with the string "<TrU>", you are dealing with a Trados Translator's Workbench TXT file. If the file is not readable, you might have opened a machine-coded (binary) file that TermExtractor.com cannot handle at all.
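The clues above can also be checked with a short script instead of a text editor. The following is only a minimal sketch: the function name `guess_tm_format` and the returned labels are hypothetical, and treating any file that contains NUL bytes as binary is a rough heuristic, not something TermExtractor.com itself is known to do. The file is opened read-only, so it is never altered.

```python
from pathlib import Path


def guess_tm_format(path):
    """Guess a translation memory file format from simple textual clues.

    The labels returned here are illustrative only.
    """
    p = Path(path)

    # Files with the .tmx extension are expected to be in TMX format.
    if p.suffix.lower() == ".tmx":
        return "tmx"

    data = p.read_bytes()  # read-only access, the file is not modified

    # Heuristic (assumption): plain-text TMs should not contain NUL bytes.
    if b"\x00" in data:
        return "binary"

    # ISO-8859-1 can decode any byte sequence, which is enough for
    # searching the ASCII-only marker strings below.
    text = data.decode("iso-8859-1")
    lines = text.splitlines()

    # A Wordfast file has the string "Wordfast" in its first row.
    if lines and "Wordfast" in lines[0]:
        return "wordfast"

    # A Trados Translator's Workbench TXT file contains "<TrU>" lines.
    if "<TrU>" in text:
        return "trados-txt"

    return "unknown"
```

Calling `guess_tm_format("memory.txt")` on a hypothetical file returns one of the labels above. A result of "unknown" or "binary" suggests a closer manual look before uploading.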
Read on if you need help with file encoding schemes (Unicode, UTF-8, ISO-8859-1, ASCII etc.).
UTF-8 and ISO-8859-1 are file encoding schemes. Translation Memory files can be encoded using different schemes. TermExtractor.com accepts files encoded as ISO-8859-1 ("Western European") and as UTF-8 ("Unicode"). Normally, you can start with the file format option that does not specify an encoding, as this defaults to ISO-8859-1. Should TermExtractor.com report an error, simply switch to the UTF-8 version of the file format and try again.
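If you would rather check the encoding before uploading, a rough test is to see whether the file's bytes form valid UTF-8. The sketch below is an illustration only: the function name `guess_encoding` is hypothetical, and the approach is a common heuristic rather than the check TermExtractor.com performs. UTF-8 is tried first because ISO-8859-1 accepts any byte sequence and so can never fail.

```python
def guess_encoding(data: bytes) -> str:
    """Return "utf-8" if the bytes are valid UTF-8, else "iso-8859-1".

    Pure ASCII content is valid in both encodings and is reported
    as "utf-8" here; either file format option should work for it.
    """
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the other supported encoding.
        return "iso-8859-1"
```

For example, passing the raw bytes of a translation memory file (read with `open(path, "rb").read()`) tells you which of the two file format variants to try first.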
If you want to extract terms from files created and maintained by Trados Translator's Workbench (TWB), you need to export the Translation Memory database from Trados' internal format to a readable text format. Open the Translation Memory in TWB (File > Open...), then select File > Export..., then select the source and target languages as fields to be exported, click OK and then select a file name and format for the exported file. TermExtractor.com supports all of the export formats available in TWB (version 7.5).
You. All copyrights and intellectual property rights (IPRs) in any Translation Memory files uploaded to TermExtractor.com, and in any bilingual term files generated by and downloaded from it, are owned by the user and remain at the user's disposal only.
Your Translation Memory file is uploaded to the TermExtractor.com server where it is processed automatically without any kind of human intervention. The translation memory file is not accessible from the web and it is automatically deleted from the server immediately after the extraction process has finished. In the rare case of a fatal processing error, all uploaded data is deleted automatically within one week after the upload.
Each bilingual terminology extraction process is identified by an email address and a process identification number (PIN code). Only users who can provide both these pieces of information have access to bilingual terminology extracted in the identified process. All bilingual term files are automatically deleted from the server within one week after they were generated.
TermExtractor.com stores the email addresses used to identify extraction processes. These addresses will not be sold, rented or used for sending unsolicited mail.
Yes and yes. The terminology extraction process is fully executed on the TermExtractor.com server. When you have received confirmation that the extraction process is successfully in progress, you can continue working normally. There is no need for the web browser to be open or even for your computer to be switched on.
Normally, you can simply try starting the extraction process again a couple of times. If TermExtractor.com repeatedly reports problems with a component of a particular language, please do not hesitate to report the issue.
TermExtractor.com extracts bilingual terminology based on a combination of linguistically motivated rules and well-known statistical measures. At the end of the day, it is still an automated process, which is by definition error-prone.
Here are some tips and tricks that might help you tune TermExtractor.com to better fit your needs:
Please note that it has been reported that Translation Memory files containing a lot of machine-readable coding (e.g. XML/HTML tags) embedded in the human-readable text will not get processed properly by TermExtractor.com. If possible, use a Translation Memory file with as little machine coding as possible.
Here are some likely reasons:
TermExtractor.com uses a combination of sophisticated linguistic rules and well-known statistical measures to extract term pair candidates from a Translation Memory file. If the file does not contain recurring pairs of terms, or the terms cannot be paired up properly, TermExtractor.com will not find them.
Whoops! It seems your terminology extraction process has stopped for some reason. The server computer might have rebooted itself during the extraction process, or some of the required language components might have gone offline.
You can try to redo the extraction if you like. If the same problem occurs again, we would greatly appreciate it if you reported the issue. Sorry for the inconvenience!
Yes.
On Microsoft Windows: Create a copy of the Macintosh-formatted Wordfast translation memory file. Open the copy in Microsoft Word on Windows. Microsoft Word now brings up the File Conversion dialog. Set the correct text encoding scheme of the Macintosh-formatted file in this dialog. Normally this is Other encoding > Western European (Mac). Close the File Conversion dialog, and then choose Save As... in the File menu to save yet another copy of the plain text file containing the translation memory. After you have given the file a new name and clicked Save, Microsoft Word brings up the File Conversion dialog again. Set text encoding to Unicode (UTF-8) and then save the file by clicking OK in the File Conversion dialog. Set file format to Wordfast (UTF-8) in TermExtractor.com to use the newly created file.
On Macintosh: Follow the instructions above for the Windows platform, making any adjustments needed for the process to work with Microsoft Word for Mac.
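If Microsoft Word is not at hand, the same re-encoding can be sketched in a few lines of Python on either platform. This is a sketch under stated assumptions, not an officially supported procedure: it assumes that Word's "Western European (Mac)" corresponds to the Mac OS Roman character set (Python codec `mac_roman`), and the function name `convert_mac_roman_to_utf8` is hypothetical. The original file is left untouched and a new copy is written.

```python
from pathlib import Path


def convert_mac_roman_to_utf8(src, dst):
    """Write a UTF-8 copy (dst) of a Mac OS Roman encoded text file (src)."""
    raw = Path(src).read_bytes()

    # Assumption: the source file uses Mac OS Roman, which is what
    # "Western European (Mac)" is taken to mean here.
    text = raw.decode("mac_roman")

    # Working with bytes keeps the line endings exactly as they were;
    # only the character encoding changes.
    Path(dst).write_bytes(text.encode("utf-8"))
```

After running, for example, `convert_mac_roman_to_utf8("tm_mac.txt", "tm_utf8.txt")` with hypothetical file names, upload the new file with the file format set to Wordfast (UTF-8). Open the result in a text editor first to confirm that accented characters look right.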
Have you seen the Video Tutorial? If not, start by viewing it. If you still need assistance, please do not hesitate to send us a message.