UI Tasks 2 Comments

Introduction

Contributed by Jonathan Pool on 2009-09-30. This document contains comments on the document “UI Tasks 2”.

Legend

See “UI Tasks 1 Comments” for the legend for “UI Tasks 2”.

Introduction

“UI Tasks 2” describes and times tasks performed with the “PanLem” PanLex Web interface, version 2.3, after Set A of performance tuning measures were implemented (see “PanLex 2.3”).

This test shows the following facts:

  1. Character tabulation has been made faster and is approaching satisfactory speed.
  2. Indirect translation has been made faster, but is still very unsatisfactory.
  3. Character tabulation does not become faster in the “hot” condition.
  4. Indirect translation approaches satisfactory speed in the “hot” condition.
  5. Out-of-memory failures on character tabulation have been eliminated.

Details

The execution of character tabulation has become faster. For example, before Set A the cold tabulation for Mandarin was 278 seconds, and after Set A it is 17 seconds. The tabulation for English was not executable before Set A because of memory exhaustion, and after Set A it takes 45 seconds. This is probably the slowest language variety. With 8 seconds as the declared satisfactory execution time, this result is unsatisfactory but is not extremely so. The tabulation for a language variety with 3,000 expressions takes only 1 second. Further improvement of the speed of character tabulation seems to deserve a low priority, particularly because it is not expected to be a popular feature.

The execution of indirect translation has become faster, but remains far from satisfactory. One type of indirect translation is that which provides UI labels. Before Set A, the first two steps of a session in Italian, in the “cold” condition, required 95 and 198 seconds, respectively. After Set A, they require 52 and 110 seconds, respectively. This is about a 45% decrease in execution time, but the satisfactory times are 1 second each, so the speeds after Set A are still far from satisfactory. A new UI feature, allowing the user to obtain arbitrary indirect translations, was added after Set A. The testing shows that the execution of this feature is sometimes satisfactorily fast. Satisfactory speed is achieved when there are few translations and the target language variety has few expressions. Task 6 illustrates this case. However, tasks 5 and 4 show that the execution becomes much slower when the translations and target-variety expressions become numerous. In task 4, the cold execution time is 106 seconds, with 8 seconds being declared satisfactory. In complete contrast to the character-tabulation case, however, the execution times for indirect translation become easily satisfactory in the “hot” condition.

Diagnosis

This testing reveals that there is one feature whose execution time is very unsatisfactory, and then only in the “cold” condition. This is indirect translation. This feature is implemented with an SQL function in the database, “trtr (integer, integer)”. Its arguments are the ID of the source expression and the ID of the target language variety. The code of this function is:

select distinct dn4.ex as exid, tt as extt
from dn as dn1, dn as dn2, dn as dn3, dn as dn4, ex
where dn1.ex = $1
and dn2.mn = dn1.mn
and dn2.ex != $1
and dn3.ex = dn2.ex
and dn4.mn = dn3.mn
and dn3.mn != dn2.mn
and dn4.ex != dn3.ex
and ex.ex = dn4.ex
and lv = $2
order by extt;

In the case of indirect translation from the French expression “sens” into Japanese (task 4), the query planner’s report on the execution of this function in the “cold” condition is:

                                                                        QUERY PLAN                                                                        

 Unique  (cost=783981.61..804092.91 rows=2681507 width=23) (actual time=112647.920..112649.073 rows=991 loops=1)
   ->  Sort  (cost=783981.61..790685.38 rows=2681507 width=23) (actual time=112647.919..112648.255 rows=3649 loops=1)
         Sort Key: ex.tt, dn4.ex
         Sort Method:  quicksort  Memory: 288kB
         ->  Hash Join  (cost=23648.32..497668.89 rows=2681507 width=23) (actual time=1822.086..112644.352 rows=3649 loops=1)
               Hash Cond: (dn4.ex = ex.ex)
               ->  Nested Loop  (cost=0.00..399172.54 rows=4269597 width=4) (actual time=150.781..110910.856 rows=82956 loops=1)
                     Join Filter: (dn4.ex <> dn3.ex)
                     ->  Nested Loop  (cost=0.00..7610.25 rows=93878 width=8) (actual time=150.776..57192.587 rows=25998 loops=1)
                           Join Filter: (dn3.mn <> dn2.mn)
                           ->  Nested Loop  (cost=0.00..556.14 rows=1592 width=8) (actual time=137.682..893.878 rows=368 loops=1)
                                 ->  Index Scan using dn_ex on dn dn1  (cost=0.00..131.30 rows=35 width=4) (actual time=88.993..492.152 rows=171 loops=1)
                                       Index Cond: (ex = 269378)
                                 ->  Index Scan using dn_mn on dn dn2  (cost=0.00..11.80 rows=27 width=8) (actual time=2.305..2.348 rows=2 loops=171)
                                       Index Cond: (dn2.mn = dn1.mn)
                                       Filter: (dn2.ex <> 269378)
                           ->  Index Scan using dn_ex on dn dn3  (cost=0.00..3.91 rows=35 width=8) (actual time=10.407..152.955 rows=72 loops=368)
                                 Index Cond: (dn3.ex = dn2.ex)
                     ->  Index Scan using dn_mn on dn dn4  (cost=0.00..3.77 rows=27 width=8) (actual time=2.017..2.065 rows=4 loops=25998)
                           Index Cond: (dn4.mn = dn3.mn)
               ->  Hash  (cost=17515.52..17515.52 rows=490624 width=23) (actual time=1670.903..1670.903 rows=477425 loops=1)
                     ->  Index Scan using ex_lv on ex  (cost=0.00..17515.52 rows=490624 width=23) (actual time=91.486..1512.429 rows=477425 loops=1)
                           Index Cond: (lv = 315)
 Total runtime: 112652.670 ms

Since the planner evaluates plans exhaustively until more than 12 tables are joined, I presume it doesn’t help to force the planner to change its plan. To check this, I rewrote “trtr” to list the “where” clauses in the opposite direction, and a test revealed no difference in execution time.

The fact that execution times for indirect translation are satisfactory in the “hot” condition prompts the question: Is it possible to force all queries to be executed in the “hot” condition, and, if so, how?