Contributed by Jonathan Pool on 2009-09-18. Revised on 2009-09-21. This document describes a set of revisions to PanLex.
The Performance Tuning document “PanLex Performance Recommendations” describes two sets of improvements to PanLex, called “Baseline” and “Post-Baseline” revisions.
I have implemented some of each, for reasons explained in other forum posts. In summary: I want to make the revisions that I can readily make myself, so that consulting funds are concentrated on more complex work; some “Baseline” work was not straightforward; and some “Post-Baseline” work was straightforward. To test the effects of these revisions, and to extract benefits from them, I have also made one further revision.
To avoid ambiguity, I shall call the set of revisions that I have made “Set A”.
The revisions in Set A are as follows:
Elements 1-6 of Set A are, to the best of my knowledge, implemented as recommended.
Element 7 of Set A is a feature that is expected to be in demand but has been impractical because of its execution-time cost. It allows a user to specify an expression in some language variety V and ask for its translations into some language variety W, where a translation may be direct (i.e. attested by a PanLex source) or indirect (i.e. a translation of a translation). In practice, users of such a feature would usually want not only a list of indirect translations, but also one or more routines that report statistics on, assign probabilities to, or select among the elements of the list. Such routines have not yet been provided, but the mere discovery of all 2-step indirect translations has been a task with execution times of up to several minutes.
This feature resembles the one that populates the labels of the UI by translating expressions in the PanLex variety into the user’s preferred variety. The latter includes a selection routine to produce a unique result, which adds to its expense, but it accepts only PanLex’s own translations of the source expression as the first translation step, which decreases the expense.
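
To make the distinction between direct and 2-step indirect translations concrete, here is a minimal sketch in Python. It is not the PanLex implementation: the in-memory pair list, the variety codes, the function names `build_index` and `find_translations`, and the treatment of attestation as symmetric are all assumptions made for illustration. PanLex itself stores this data in a database, which is where the execution-time cost discussed above arises.

```python
from collections import defaultdict

# Hypothetical sample data (not taken from PanLex): each attested translation
# links two (variety, expression) pairs. Attestation is assumed symmetric.
ATTESTED = [
    (("eng", "dog"), ("fra", "chien")),
    (("eng", "dog"), ("deu", "Hund")),
    (("eng", "dog"), ("spa", "perro")),
    (("deu", "Hund"), ("spa", "perro")),
    (("fra", "chien"), ("spa", "perro")),
    (("fra", "chien"), ("spa", "can")),
]


def build_index(pairs):
    """Map each (variety, expression) to the set of its attested translations."""
    index = defaultdict(set)
    for a, b in pairs:
        index[a].add(b)
        index[b].add(a)
    return index


def find_translations(index, source, target_variety):
    """Return (direct, indirect) translations of source into target_variety.

    direct   : set of expressions attested as translations of source.
    indirect : dict mapping each expression reachable in exactly two steps
               to the set of intermediate (pivot) expressions that lead to it.
    """
    neighbours = index.get(source, set())
    direct = {expr for (var, expr) in neighbours if var == target_variety}
    indirect = defaultdict(set)
    for pivot in neighbours:
        for (var, expr) in index.get(pivot, set()):
            if var == target_variety and (var, expr) != source:
                indirect[expr].add(pivot)
    return direct, dict(indirect)


index = build_index(ATTESTED)
direct, indirect = find_translations(index, ("eng", "dog"), "spa")
print(sorted(direct))                                        # ['perro']
print({e: len(p) for e, p in sorted(indirect.items())})      # {'can': 1, 'perro': 2}
```

The number of distinct pivots per candidate is reported here only as an example of the kind of statistic a later selection routine might use. No such routine is part of Set A.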
Set A has been implemented in a temporarily parallel version (2.3) of PanLex, accessible via the URL http://panlex.org/cgi-bin/plxu23.cgi. I intend to make this the ordinary version and commit its code to the repository by 2009-09-30.
I have created a revised list of evaluation tasks and have measured performance on it. The results are in the document “UI Tasks 2”.