Compatibility between OmegaT and OmegaT-DGT in TM-server connectivity

Added by Redmine Admin about 6 years ago

Hi,
This is not a personal question, just curiosity: I've never actually used a team project (and I have only a vague idea of how they work). In fact, I'm not sure what features OmegaT and OmegaT-DGT (which I've heard has added new functions in its latest update) offer in this field, nor what the difference between them is.
Are they compatible? I mean, can someone using OmegaT-DGT work on a team project created with OmegaT, and vice versa?

As I was saying, this is not a personal question, but I would like to ask it anyway, because I think it may be of general interest.

Thank you very much,
Concepción Martín


Replies (5)

RE: Compatibility between OmegaT and OmegaT-DGT in TM-server connectivity - Added by Redmine Admin about 6 years ago

Hi Concepción

As the developer of DGT-OmegaT, I thank you for giving me the opportunity to clarify some complex things.

In fact, DGT-OmegaT has two ways to share data between translators in real time, while standard OmegaT has only one.

OmegaT's "team projects" are also present in DGT-OmegaT, simply because we started from OmegaT 3.6 and made no changes to this feature. You can share a project between OmegaT 3.6 and DGT-OmegaT using "team projects" without any problem, because this is exactly the same code. As for compatibility with OmegaT 4.1, I'll let the core team answer: the level of compatibility would be exactly the same as between standard OmegaT 3.6 and the latest OmegaT 4.1.

On the other hand, inside DGT we developed a totally different approach, based on an SQL server (namely PostgreSQL).
We wanted from the beginning to do it as a plugin, so that it can be maintained separately from OmegaT. Moreover, this server is also compatible with other CAT tools (currently only one, but if somebody wants to write a plugin, we are open to it).
The problem is that the notion of a TM server is not present in OmegaT's API (neither in version 3 nor in 4). So we first needed to extend the API. That is why the plugin currently cannot work with standard OmegaT. But we would be very happy to see our patches in OmegaT 4 too, so that the plugin (still maintained separately) could be compatible with both versions.
But we are also aware that this set of patches may appear complex, so we would like some people to test it first, before we open an RFE to have the patches (but not the plugin!) eventually integrated into OmegaT 4.
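To make "the notion of a TM server in OmegaT's API" more concrete, here is a minimal sketch of what such an extension point could look like. The interface name `ExternalMemory`, the `Match` record and the `MatchCollector` merging logic are hypothetical illustrations, not the actual DGT-OmegaT patches. The point is only that, unlike an MT provider, which returns a single target string, a TM provider returns scored source/target pairs that the core can merge into the Matches pane. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical extension point: a translation memory that lives outside the project.
interface ExternalMemory {
    String getName();

    // Return fuzzy matches for one source segment (score from 0 to 100).
    List<Match> findMatches(String sourceText);
}

// Unlike an MT result, a TM match carries its source text, a score and its origin.
record Match(String source, String target, int score, String origin) {}

class MatchCollector {
    private final List<ExternalMemory> memories = new ArrayList<>();

    void register(ExternalMemory memory) {
        memories.add(memory);
    }

    // Query every registered memory and merge the results, best score first,
    // keeping only matches at or above the threshold.
    List<Match> collect(String sourceText, int minScore) {
        List<Match> all = new ArrayList<>();
        for (ExternalMemory memory : memories) {
            for (Match m : memory.findMatches(sourceText)) {
                if (m.score() >= minScore) {
                    all.add(m);
                }
            }
        }
        all.sort(Comparator.comparingInt(Match::score).reversed());
        return all;
    }

    public static void main(String[] args) {
        MatchCollector collector = new MatchCollector();
        collector.register(new ExternalMemory() {
            public String getName() { return "demo"; }

            public List<Match> findMatches(String s) {
                return List.of(new Match("Hello world", "Bonjour le monde", 90, "demo"),
                               new Match("Hello", "Bonjour", 60, "demo"));
            }
        });
        System.out.println(collector.collect("Hello world!", 70));
    }
}
```

Keeping the provider interface this small is what would let a plugin be maintained outside OmegaT: the core only needs to know how to ask for matches and how to display them.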

Let's do it step by step, so that nobody feels lost from the beginning. Even the notion of a translation memory plugin is, in a sense, something new to OmegaT users.
I just published a new release of DGT-OT (3.0 update 13). In this version, you may notice a new menu "Options -> External Translation Memories", with "MyMemory" given as an example.
A short description of this new menu, and of how it differs from the "Machine Translation" menu: http://www.silvestris-lab.org/node/25
This example is meant to convince users of the usefulness of TM plugins, compared to MT plugins, which have existed since very old versions of OmegaT.
A second example shows that the API is not only for building client/server setups, but can also be used for locally indexed translation memories: http://www.silvestris-lab.org/node/26
Both examples are included in the latest release of DGT-OmegaT.

Let's stop here for the moment: if you can test both examples and give us feedback, I can later introduce the plugin for PostgreSQL, which is based on the same API as the examples.

Regards
Thomas Cordonnier

RE: Compatibility between OmegaT and OmegaT-DGT in TM-server connectivity - Added by Redmine Admin about 6 years ago

Hi Thomas,

As TM-server connectivity in OmegaT is of great interest to me, I took your 4.1.3_2 binary for a spin.

If I understand it right, it is currently capable of fetching matches from the MyMemory server without the need to specify anything in a .properties file. It seems to work pretty well. There's a small delay before matches are shown, but that's understandable.

There's also Http-TM-sample.properties, which can be used to connect to an arbitrary TM server. Now, if I understand it right, the server to connect to must be a Cyclotis server, not just any TM server. If so, it would be helpful if that were stated explicitly; if not, I think it would be awesome to have some examples, for instance how to connect to a Wordfast server, or even to another instance of MyMemory. I guess the same applies to writable external memories, too.

And it looks like there's no way to search in MyMemory, and probably not in other remote TMs either. If that is the case, it makes external TM sources far less usable.

I really liked what you did to make it possible to read/write TMs in txt format (Anaphraseus/Wordfast). It probably isn't revolutionary, but it's always nice to be able to work with different formats without needing a converter.

I looked at your patches too and applied one (M1), with slight modifications so that it works only with private memories from MyMemory. I exported a TMX from a huge project and created an empty project with the same source files, which had already been translated. Then I tested with the old project_save.tmx put into the /tm folder, and with the MyMemory external translation memory, which got matches only from my private TM. If the local TMX file is disabled, I get matches slightly faster, as the network delay to fetch matches is smaller than the time my machine needs to find them locally.

I haven't tried indexed TMs yet. They can probably give me a speed boost, taking fewer processor cycles to find matches, but if one can't search in them, their use is rather limited.

All in all, I'm quite impressed. I can't speak to the technical details, as I can't tell bad code from good, but I would definitely find some use for TM-server functionality in OmegaT, so I feel that what you've done so far is worthwhile. But as for integration, maybe for starters it would be possible to produce just one patch that would turn the current /trunk into a "translation-memory plugins enabled" version, and, in addition to that patch, provide one or two separate plugins implementing some of what is now available in the DGT version (for instance, MyMemory-TMX and Anaphraseus/Wordfast memory support) to serve as examples. One patch would be much easier to discuss with the OmegaT dev team, and to change as needed.

Meanwhile, thank you very much for all your efforts; I hope to see at least some of this migrated into the main source.

RE: Compatibility between OmegaT and OmegaT-DGT in TM-server connectivity - Added by Redmine Admin about 6 years ago

Hi Kos

Thanks for the tests, your report is very interesting.

If I understand it right, currently it's capable of fetching matches from MyMemory server without a need to specify anything in a .properties file.

Yes. MyMemory does not need a properties file because it is a singleton (there is only one MyMemory, and it does not even have collections). A properties file could, however, make it possible to access private memories that exist in MyMemory. This is not implemented because I do not have such memories and I did not want to make the prototype too complex, but it is perfectly possible.
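As a sketch of what "using a properties file to access private memories" could mean in practice: the plugin reads optional credentials from the `Properties` it was configured with and adds them to the MyMemory lookup request. The class name `MyMemoryQuery` and the property names `langpair` and `key` are assumptions chosen for this example. The `get` endpoint and its `q`, `langpair` and `key` request parameters are given as I understand MyMemory's API specification (https://mymemory.translated.net/doc/spec.php) and should be checked there. Requires Java 10+ for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class MyMemoryQuery {
    private static final String GET_URL = "https://api.mymemory.translated.net/get";

    private final String langPair;   // e.g. "en|it"
    private final String privateKey; // null when only public matches are wanted

    // Hypothetical property names: "langpair" and "key".
    MyMemoryQuery(Properties props) {
        this.langPair = props.getProperty("langpair", "en|it");
        this.privateKey = props.getProperty("key");
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Build the lookup URL for one source segment; the HTTP call and JSON parsing are left out.
    String buildUrl(String sourceText) {
        StringBuilder url = new StringBuilder(GET_URL)
                .append("?q=").append(enc(sourceText))
                .append("&langpair=").append(enc(langPair));
        if (privateKey != null && !privateKey.isEmpty()) {
            url.append("&key=").append(enc(privateKey));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("langpair", "en|fr");
        System.out.println(new MyMemoryQuery(props).buildUrl("Hello world"));
    }
}
```

Without a `key` entry the request stays anonymous, which corresponds to the current singleton behaviour.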

It seems to work pretty well. There's a small delay before matches are shown, but that's understandable.

Yes, of course. The real question is: how would you compare this new plugin with the already existing MyMemory plugin (the one which displays in the Machine Translation pane)? Do you agree that it makes sense to display it in the Matches pane instead, as I did? In any case, do not forget that I wrote this plugin only as an example: I do not really use it, so I did not take the time to optimize it. If you see potential improvements, I remain open.

There's also Http-TM-sample.properties which can be used to connect to an arbitrary TM-server.

Unfortunately, no. Everything in the directory plugins/Silvestris-Cyclotis-Plugin is tied to the Cyclotis server only. There are two ways to access such a server: directly via JDBC, or indirectly via a REST proxy application, which is part of the server and is absolutely not generic. We can come back to the Cyclotis server later. I did not write a plugin that makes it possible to connect to an arbitrary HTTP-based server. It is probably possible, but it would need a lot of parameters in the properties file (not only to specify the URL and its parameters, but also how to parse the result, which can be XML, JSON or something else). If you want to do it, or if you know another server you would like to give access to, don't hesitate: it is a good exercise to check whether my API is correctly documented. The documentation for developers is here: http://www.silvestris-lab.org/node/60; in case you test it and have difficulties, leave a comment on the web page: this makes it possible for other people to share their experience with this API (while the discussion list is normally only for users).
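To illustrate why a generic HTTP connector needs many parameters, here is a sketch of only the first half of the job: expanding a URL template read from the properties file. The class name `HttpTemplateQuery`, the property name `url.template`, the `${source}`, `${srcLang}` and `${tgtLang}` placeholders and the `tm.example.org` host are all invented for this example. A real connector would also need properties describing how to parse the response (XML, JSON or something else), which is the harder part and is not shown.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HttpTemplateQuery {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    private final String template;

    // Hypothetical property, e.g.
    // url.template=https://tm.example.org/lookup?q=${source}&sl=${srcLang}&tl=${tgtLang}
    HttpTemplateQuery(Properties props) {
        this.template = props.getProperty("url.template");
        if (template == null) {
            throw new IllegalArgumentException("url.template is required");
        }
    }

    // Replace each ${name} with its URL-encoded value; unknown names are left untouched.
    String expand(Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = value == null
                    ? m.group()
                    : URLEncoder.encode(value, StandardCharsets.UTF_8);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("url.template",
                "https://tm.example.org/lookup?q=${source}&sl=${srcLang}&tl=${tgtLang}");
        Map<String, String> values = Map.of("source", "Hello world", "srcLang", "en", "tgtLang", "fr");
        System.out.println(new HttpTemplateQuery(props).expand(values));
    }
}
```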

And it looks like there's no way to search in MyMemory, and, probably, in other remote TM's either. If that is the case, it makes external TM sources far less usable.

If by "search" you mean using the search screen, patches M2 and M4 introduce the interface ISearchable. All plugins that implement it can be included in the search screen. Unfortunately, MyMemory is the exception, because I did not find in their API how to implement a non-fuzzy search. Again, if you know how to do it, it can be a good occasion to try the API from a developer's point of view.
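The idea behind `ISearchable` can be sketched as an optional capability interface: the search screen only queries the memories that implement it. `ISearchable` is the name used in the patches, but its method signature, the `Memory` base interface, the `SearchHit` record and the `SearchScreen` helper below are hypothetical; the real interface has to cover far more of the search screen's options than a text and a case flag. Records and `instanceof` pattern matching require Java 16+.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal base type for any external memory (hypothetical).
interface Memory {
    String getName();
}

// Optional capability: memories that can answer exact (non-fuzzy) searches.
interface ISearchable {
    List<SearchHit> search(String text, boolean caseSensitive);
}

record SearchHit(String source, String target, String origin) {}

class SearchScreen {
    // Only memories implementing ISearchable take part in the search screen;
    // the others (MyMemory, in the case discussed here) are simply skipped.
    static List<SearchHit> searchAll(List<Memory> memories, String text, boolean caseSensitive) {
        List<SearchHit> hits = new ArrayList<>();
        for (Memory memory : memories) {
            if (memory instanceof ISearchable searchable) {
                hits.addAll(searchable.search(text, caseSensitive));
            }
        }
        return hits;
    }
}
```

Making search a separate interface rather than a mandatory method means a provider such as MyMemory can still supply fuzzy matches without pretending to support exact search.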

I really liked the thing you did to be able to read/write TM's in txt format

I mainly wrote this plugin to illustrate the interface for writable memories. Apart from this example, I suppose Cyclotis is probably the only other server where it makes sense to register every new entry, as I do.

One patch would be much easier to discuss with OmegaT dev team, and to change as needed.

Given that this is a very complex set of features, I tried to prepare patches that correspond not to the order in which I developed them, but to how they can serve as a basis for explanation. Thanks to that, I can describe each patch individually and add an example to it.
MyMemory and Wordfast/Anaphraseus were developed only as examples to illustrate the API: the Cyclotis plugin can work without them, but not without the new interfaces. But I think that without the samples, it would have been difficult to explain what this API is for. In any case, I can produce the same set of patches without the samples, if that is the final decision. The main feature of this set of patches is the API, not the samples.

It is good that you already tried to modify the MyMemory plugin to access private memories. Now, with the documentation, try to make it configurable via a properties file and tell me whether it is easy or not. And if you see another server that you would like to give access to, try to do it: currently I am the only one who has ever written something with this API. I always try to provide at least two implementations of each interface, to check that it is generic enough, but seeing another person provide a third implementation would be the ultimate proof.

In other words, thanks for your tests; please continue and give me feedback, preferably as comments on the dedicated web page rather than on this list.

Thanks for your help

Regards
Thomas

RE: Compatibility between OmegaT and OmegaT-DGT in TM-server connectivity - Added by Redmine Admin about 6 years ago

Hi Thomas,

I thought we could continue this conversation off-list.

Thanks for the tests, your report is very interesting.

Well, I wish I were more technically skilled to be able to test some more or help with something.

Yes. MyMemory does not need properties file because this is a singleton (there is only one MyMemory and it does not even have collections).

Using a properties file may enable, however, to access private memories which exist in MyMemory. It is not implemented because I do not have such memories and I did not want to make the prototype too complex, but it is perfectly possible.

MyMemory's API is nicely described, but I don't see how to use .properties files to make those API calls.

It seems to work pretty well. There's a small delay before matches are shown, but that's understandable.

Yes, of course. The real question is : how would you compare this new plugin with the already existing MyMemory plugin (the one which displays in the Machine Translation pane)? Do you agree that it makes sense to display it in the Matches pane instead, as I did?

It has made sense for a long time; it simply was never developed, so thank you very much for taking the time to do it. The current Machine Translation functionality in OmegaT is far from optimal, one of the biggest problems being the inability to choose which MT match is inserted into the target. And fetching TM-server matches is something that is badly needed, especially when OmegaT is used by companies with a lot of linguistic data to be leveraged. Sometimes I get contracted by an LSP that uses OmegaT, and they provide a few hundred MBs of TMX files. I like working with the company, but finding matches for segments is a bit of an overwhelming task for my not-too-outdated machine. I wish they had a TM server that did all the fuzzy matching on their side and let me just fetch the results. And yes, seeing the source of the match is a must, if it's not an MT result.

In any case, do not forget that I did this plugin only as an example: I do not really use it, so I did not take time to optimize it. If you see potential improvements, I remain open.

I do see potential improvements, but more on this further down.

There's also Http-TM-sample.properties which can be used to connect to an arbitrary TM-server.

Unfortunately, no. All what is in the directory plugins/Silvestris-Cyclotis-Plugin is only linked to the Cyclotis server. The fact is that there are two ways to access such a server: directly via JDBC, or indirectly via a REST proxy application, which is a part of the server and is absolutely not generic. We can go back to Cyclotis server later.

Is the Cyclotis server open source? Maybe it's due to my technical illiteracy, but I find the information on most of this stuff a bit cryptic.

I did not write the plugin which makes possible to connect to an arbitrary HTTP-based server. It is probably possible, but it would need lot of parameters in the properties file (not only to specify the URL and its parameters, but also how to parse the result, which can be XML, JSON or something else). If you want to do it, or if you know another server you would like to give access to, don't hesitate: it is a good exercise to check whenever my API are correctly documented. The documentation for developers is here: [[http://www.silvestris-lab.org/node/60]] ; in case you test and have difficulties, leave a comment in the web page: this makes possible for other people to share their experience with this API (while the discussion list is normally only for users)

One widely used TM-server product is Wordfast Server https://www.wordfast.net/?go=wfserver (a Windows executable that doesn't require additional components).

They provide a full-featured demo version with a 3-client limit. In the manual that comes with the server installation file, Appendix 2 describes the REST API that can be used to communicate with the server. There's a Wordfast server running for Wordfast Anywhere users (https://freetm.com). They hid it behind a JSP interface (available by clicking "Concordance" once a TM is set up and a document for translation has been added), but it's actually the same kind of server, which I suspect can be accessed at 207.223.244.237 on port 47110.

If OmegaT were able to work with WF servers, it would mean a completely new level of compatibility.

There's also tmserver/amaGama from the Translate Toolkit. I don't think it's widely used, but it's open source, and its API is quite simple.

And it looks like there's no way to search in MyMemory, and, probably, in other remote TM's either.
If that is the case, it makes external TM sources far less usable.

If by "search" you mean using the "search screen", in patch M4 and M6 is introduced the interface ISearchable. All plugins which implement it can be included in the search screen. Unfortunately MyMemory is the exception because I did not find in their API how to implement a non-fuzzy search. Again if you know how to do it, it can be a good occasion to try the API from a developer point of view.

There's something similar to search if you just use the "q" parameter, the same as for fuzzy matching.

I really liked the thing you did to be able to read/write TM's in txt format

I mainly wrote this plugin to illustrate the interface for writable memories. Except for this example, I suppose that Cyclotis is probably the only other server where it makes sense to register every new entry, as I do.

You can store your entries in MyMemory; see the "Set" section in the API technical specifications (https://mymemory.translated.net/doc/spec.php).

The same is true of the WF server. amaGama doesn't seem to accept new entries submitted by users, at least not at first glance at their API.
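For reference, a write request to MyMemory might be built roughly as in the sketch below. The `set` endpoint and its `seg`, `tra`, `langpair` and `key` parameters are given as I understand the "Set" section of the specification linked above; please verify them against spec.php before relying on this. The class name `MyMemorySetRequest` and the `YOUR_KEY` placeholder are hypothetical. Only the URL is built here; sending it and handling the response are omitted. Requires Java 10+.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class MyMemorySetRequest {
    private static final String SET_URL = "https://api.mymemory.translated.net/set";

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Build a URL that stores one source/target pair; "key" selects a private memory
    // (parameter names assumed from the MyMemory spec, not verified here).
    static String buildUrl(String segment, String translation, String langPair, String key) {
        return SET_URL
                + "?seg=" + enc(segment)
                + "&tra=" + enc(translation)
                + "&langpair=" + enc(langPair)
                + "&key=" + enc(key);
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("Hello world", "Ciao mondo", "en|it", "YOUR_KEY"));
    }
}
```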

One patch would be much easier to discuss with OmegaT dev team, and to change as needed.

Considering the fact that this is a very complex set of features, I tried to prepare patches corresponding not to how I developed, but how it can serve as a support for explanation. Thanks to that I can describe each patch individually and add an example into it.

MyMemory and Wordfast/anaphaseus have only been developed to serve as examples to illustrate the API: Cyclotis plugin can work without them, but not without the new interfaces. But I think that without the samples, it would have been difficult to explain what this API is made for. In any case I can produce the same set of patches without the samples, if it is the final decision. The main feature of this set of patches is the API, not the samples.

Well, to me as a non-techie, it seems that it would be easier to work with just one patch that enables TM plugins in OmegaT, stripped of all the extra code. The MyMemory, Anaphraseus, Cyclotis (or any other) examples could be provided as external plugins that can be compiled independently outside the OmegaT source (https://github.com/omegat-org/plugin-skeleton) and used as jar files.
Besides, if it's just a bare patch that makes OmegaT pluggable, stripped of any specific implementations, it would be easier to discuss merging it into the main source with the OmegaT devs, I think.

It is good that you ever tried to modify the MyMemory plugin to access private memories. Now, with the documentation, try to make it configurable via Properties file and tell me if it is easy or not.

I haven't done it yet, but I'm going to sometime this week.

Thanks for all your time and effort, Thomas. I hope to see my much-loved OmegaT capable of communicating with TM servers in the near future.

Best regards,

RE: Compatibility between OmegaT and OmegaT-DGT in TM-server connectivity - Added by Redmine Admin about 6 years ago

Hi Kos

I thought we could continue this conversation off-list.

I agree. However, I tried to send an answer directly to you via the Yahoo interface, but apart from the confirmation that the message was sent, I saw no reaction, so I suppose you did not receive it.
That's why I am now creating a topic in my forum. This discussion is probably of no interest to normal users, but it is to technicians, so it makes sense to continue outside OmegaT's list, but there is no reason not to make it public.

MyMemory API is nicely described, but I don't see how to use .properties files to make those API calls.

Add a constructor that takes a java.util.Properties instance as its only parameter. Then, each time you create such a file in the tm/ folder or one of its subfolders, OmegaT will look for the "class" entry in the file and call the constructor of the given class.
The "class" parameter is the only one common to all plugins: beyond it, you can put in the configuration file whatever your plugin needs. In the case of MyMemory, you would probably have to put your private credentials there.
A description of the mechanism is here: http://www.silvestris-lab.org/node/44
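The mechanism described here can be sketched in a few lines of reflection: read the properties file, look up the `class` entry, and call the public constructor that takes a single `java.util.Properties`. `MemoryLoader` and `DemoMemory` are hypothetical names for this illustration; the real loading code in DGT-OmegaT (described at the page linked above) may differ, for example in how it reports errors or which class loader it uses.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

class MemoryLoader {
    // Instantiate a plugin from the contents of a .properties file found in tm/.
    static Object load(Reader propertiesFile) throws IOException, ReflectiveOperationException {
        Properties props = new Properties();
        props.load(propertiesFile);
        String className = props.getProperty("class");
        if (className == null) {
            throw new IllegalArgumentException("missing mandatory 'class' entry");
        }
        // The plugin receives the whole Properties object, so it can read
        // whatever other entries it needs (URL, credentials, ...).
        return Class.forName(className).getConstructor(Properties.class).newInstance(props);
    }

    public static void main(String[] args) throws Exception {
        String file = "class=" + DemoMemory.class.getName() + "\nurl=https://tm.example.org\n";
        DemoMemory memory = (DemoMemory) load(new StringReader(file));
        System.out.println(memory.url);
    }
}

// A hypothetical plugin: only "class" is shared by all plugins; "url" is specific to this one.
class DemoMemory {
    final String url;

    public DemoMemory(Properties props) {
        this.url = props.getProperty("url", "http://localhost");
    }
}
```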

There's something similar to search if you just use "q" parameter, the same as for fuzzy matching.

Unfortunately, it is not complete enough. Again, if by "search" we mean giving access through OmegaT's search screen, then we need to be able to implement most features of that screen, not only search in the source. Another possibility would be to retrieve the full contents and let OmegaT do the filtering, but MyMemory is one of the rare engines that do not provide this feature (that is why I introduced the interface IBrowsableMemory, implemented by most providers except MyMemory).
If you mean something else, for example something similar to OmegaT's "external searcher", that's another story we can discuss later.
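The fallback mentioned above, retrieving the full contents and letting OmegaT do the filtering, can be sketched as follows. `IBrowsableMemory` is the name from the patches, but its single `entries()` method and the `LocalFilter` helper are assumptions made for this example. The sketch shows why a provider that cannot enumerate its contents (like MyMemory) cannot support the search screen this way, and why one that can gets every search option for free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical shape: the memory can enumerate all its source/target pairs.
interface IBrowsableMemory {
    Iterable<Map.Entry<String, String>> entries();
}

class LocalFilter {
    // OmegaT-side filtering: options such as regex or search-in-target are applied
    // here, because the provider only has to hand over its entries.
    static List<Map.Entry<String, String>> find(IBrowsableMemory memory, String regex, boolean inTarget) {
        Pattern pattern = Pattern.compile(regex);
        List<Map.Entry<String, String>> result = new ArrayList<>();
        for (Map.Entry<String, String> e : memory.entries()) {
            String text = inTarget ? e.getValue() : e.getKey();
            if (pattern.matcher(text).find()) {
                result.add(e);
            }
        }
        return result;
    }
}
```

The trade-off is bandwidth and memory: browsing a large remote TM entry by entry is only reasonable when the provider can stream or page its contents.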

You can store your entries in MyMemory, see section "Set" in the API

Yes, I know, but the problem here is that I don't know what policy the company applies to entries submitted this way. In my API, when a class implements IWritableMemory, by default all validated segments are sent to the memory. This is made specifically for Cyclotis (which is a tool for sharing in real time during a complex translation), but I am not sure this is what MyMemory users would want (maybe we can think about a contextual menu for on-demand submission, for example).
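The policy question raised here can be sketched as a per-memory switch. `IWritableMemory` is the name from the patches, but the `registerEntry` method, the `autoSubmit` flag and the `SegmentValidationHook` class are hypothetical. With the flag on (the Cyclotis case), every validated segment is pushed; with it off, entries are only sent on an explicit user action, such as the contextual menu suggested above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape of a memory that accepts new entries.
interface IWritableMemory {
    void registerEntry(String source, String target);

    // Shared real-time memories (Cyclotis-like) want every validated segment;
    // a public service might prefer explicit submission only.
    default boolean autoSubmit() {
        return true;
    }
}

class SegmentValidationHook {
    private final List<IWritableMemory> memories = new ArrayList<>();

    void add(IWritableMemory memory) {
        memories.add(memory);
    }

    // Called when the translator validates a segment.
    void onSegmentValidated(String source, String target) {
        for (IWritableMemory m : memories) {
            if (m.autoSubmit()) {
                m.registerEntry(source, target);
            }
        }
    }

    // Called from an explicit user action (e.g. a contextual menu entry) for one memory.
    void submitOnDemand(IWritableMemory memory, String source, String target) {
        memory.registerEntry(source, target);
    }
}
```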

The current Machine Translate functionality in OmegaT is far from optimal, one of the biggest problems being impossibility to select MT match to be inserted into the target.

DGT-OmegaT does it, and if I remember correctly, I submitted the patch to them in 2012; it was discussed for a while, but finally totally forgotten. Result: the corresponding RFE is still open (https://sourceforge.net/p/omegat/feature-requests/776/).

Sometimes I get contracted by a LSP that uses OmegaT, and they provide a few hundred MB's of TMX files.

You are lucky: my clients don't hesitate to take gigabytes of TMX files, put them all in the tm/ folder of one project, then ask me why it does not work, saying that it is absolutely vital for them to have access to all of them, even if it is slow... The Lucene indexes I propose are made for exactly that: they are not always faster than TMX in memory, but their advantage is precisely that they are not held in RAM, meaning you can use them without any RAM limitation.

Is Cyclotis server open source? Maybe it's due to my technical illiteracy, but I find information on most of this stuff to be a bit cryptic.

Yes, absolutely: http://www.silvestris-lab.org/node/5; however, as you can see, it is not necessarily easy to install (you have at least two servers to install, and up to five). So I also distribute it under a dual model, common in the open source world: if you want to use it without having to install anything, go to the SaaS service http://www.silvestris-dom.com/ where you can rent some space (not free, except for very small memories). However, the SaaS contains exactly the same software as the open source version.
The fact that it sounds cryptic is entirely my fault: I did not communicate about it properly. Now I want to take advantage of the fact that a few people have expressed interest in it to take the time to understand what is clear to people and what is not.

If OmegaT was able to work with WF servers, it would mean a complete new level of compatibility.

Yes, of course.
Have a look at http://www.silvestris-lab.org/node/60 and check whether you are able to write the plugin. It is not that I do not want to do it, but if the API is as simple as you describe, I think it is a good occasion for me to check whether my documentation is really usable by someone other than me.

There's also tmserver/amaGama from Translate Toolkit. I don't think it's widely used, but it's open source, and their API is quite simple.

Yes, I know. In fact, amaGama looks to me like an alpha version (version 0.0.1) of a service that would be similar to Cyclotis once finished. But if this server seems simpler for you to try an implementation with, feel free to do so.

Well to me as a non-techie it seems that it would be easier to work with just one patch that enables TM-plugins in OmegaT, stripped of all the extra code.

I took you for a developer; this set of patches is for technicians, more precisely the OmegaT developer team (but for the moment I want external people to test it, which is another matter).

MyMemory, Anaphraseus, Cyclotis (or any other) examples could be provided as external plugins

For Cyclotis, that is already the case. For the other ones, I preferred to have them in the code just because I created them as samples (I do not use them). But let's wait and see what the core team finally thinks about this point.

Besides, if it's just a bare patch that makes OmegaT pluggable, stripped of any specific implementations, it would be easier to discuss its merging to the main source with OmegaT devs, I think.

If they finally tell me that they prefer one single patch, then I will do so. But if they want to discuss the code in detail, I think it is easier for them to see which part of the code adds which specific feature. Wait and see.

Regards
Thomas
