PanLex Performance Tuning Tasks in Progress Compiled by Jonathan Pool Last revised 2009/11/02 Project description: Analyze and improve performance =========================================================================================== Task 1: Upgrade to PostgreSQL 8.4 Mostly done. Installed non-RPM version 8.4.0 in /opt/pgsql in parallel with Red Hat 8.1.11 in standard locations. Version 8.4.0 is listening on port 54321, while 8.1 is listening on standard port 5432. Archived 8.1.11 databases with 8.4 pg_dumpall and restored it in the 8.4 cluster. Modified all application parameters and /etc/profile variables to prioritize 8.4.0 over 8.1.11. Initial few time checks suggest slow tasks are about tripled in speed and no incompatibilities found, so 8.1.11 can likely be safely uninstalled. =========================================================================================== Task 2: Increase server RAM Done for now. Determined from Supermicro X7DWE mainboard documentation that existing 4GB DIMM kit provided 2.84GB of memory to the OS and applications, after memory reserved for system devices. Decided to increase RAM from 4GB to 16GB with 2 8GB kits and surplus the existing 4GB kit, because 8GB kits cost about $250 but 16GB kits cost about $1300. Ordered 2 Crucial CT2KIT51272AQ667 8GB DIMM kits (4GBx2): 240-pin, DDR2 PC2-5300, CL=5, Fully Buffered, ECC, DDR2-667, 1.5V, 512Meg x 72, Energy Efficient. Installed. Could increase RAM to 32GB in future. =========================================================================================== Task 3: Configure PostgreSQL to match RAM Done for now. Nondefault parameters: autovacuum = on autovacuum_analyze_scale_factor = 0.05 autovacuum_naptime = 10800 autovacuum_vacuum_scale_factor = 0.2 datestyle = 'iso, ymd' effective_cache_size = 2750MB listen_addresses = 'localhost,192.168.1.16' log_destination = 'syslog' log_error_verbosity = verbose log_min_duration_statement = 10000 log_min_error_statement = warning maintenance_work_mem = 256MB max_prepared_transactions = 5 max_stack_depth = 8192 shared_buffers = 1GB syslog_facility = 'LOCAL0' syslog_ident = 'pg' vacuum_cost_delay = 0 (setting VCD to 1000 seemed to slow vacuuming enormously) wal_buffers = 1MB work_mem = 512MB If RAM increased to 32GB in future, reconfiguration will be necessary. =========================================================================================== Task 4: Use mod_perl and Apache::Registry Not done. Attempts produced apparently complex failures. =========================================================================================== Task 5: Under mod_perl, use Apache::DBI for connection persistence Not done. Depends on Task 4. =========================================================================================== Task 6: Restore dropped database constraints Done. =========================================================================================== Task 7: Protect database passwords Done. Password found in exrm.pl 2009-08-10. Password removed from exrm.pl and code change committed to SVN 2009-08-12. Apache password changed 2009-08-15. Apache password change has no effect on code, because Apache connects via local socket with ident-based authentication, which requires no password. We could change ident-based to service-connection-file-based authentication, as recommended in the report. This change would apparently confer no benefit now, but may improve security if and when we move the Web server to a different host (see “Ident Authentication over TCP/IP” in PostgreSQL documentation). =========================================================================================== Task 8: Tabulate characters differently Done. =========================================================================================== Task 9: In Perl, use $dbh->prepare/execute/fetch and placeholders Not done. =========================================================================================== Task 10: Partition large tables Not done. =========================================================================================== Task 11: Consider denormalizing dn and mn Not done. =========================================================================================== Task 12: Optimize indexes Not done. =========================================================================================== Task 13: Stop decreasing the scheduling priority of long-running PostgreSQL processes Done. The crontab for user root was amended on 2009/10/14 to omit its execution every minute of /var/www/local/panlex/daemon/nice.pl. This script had executed “renice 15” on each normal-priority PostgreSQL process with more than 30 seconds of CPU time. PostgreSQL Experts had advised that such a treatment would likely introduce inefficiencies. =========================================================================================== Task 14: Rebenchmark after initial changes Done for now. UI Tasks 1 documents the execution times before the “Panlex Performance Recommendations” report. UI Tasks 1a, 1b, 1c, and 2 document execution times after some changes recommended in the report were made. =========================================================================================== Task 15: Add raw column data output to API Not done. =========================================================================================== Task 16: Automate chacheing of tables Done. The crontab for user pool was amended on 2009/11/03 to include this line: PATH=/opt/pgsql/bin:/usr/local/bin:/bin:/usr/bin:/var/local/utils PGDATABASE=/opt/pgsql/data PGPORT=54321 # Create a PanLex archive every 24 hours for redundancy and table cacheing, # replacing the previous archive. 38 2 * * * pg_dump plx -a -f /tmp/plxdump =========================================================================================== Task 17: Automate chacheing of indices Done. The crontab for user pool was amended on 2009/11/03 to include these lines: PATH=/opt/pgsql/bin:/usr/local/bin:/bin:/usr/bin:/var/local/utils PGDATABASE=/opt/pgsql/data PGPORT=54321 # Make PostgreSQL cache large PanLex indices every 24 hours. 37 2 * * * psql -c 'select * from ixck ()' plx This procedure calls the function ixck (), defined as: create or replace function ixck () returns void language plpgsql as declare begin perform * from ex order by ex limit 1000; perform * from ex order by lv, tt limit 1000; perform * from ex order by lv limit 1000; perform * from dn order by dn limit 1000; perform * from dn order by mn, ex limit 1000; perform * from dn order by ex limit 1000; perform * from dn order by mn limit 1000; perform * from mn order by mn limit 1000; perform * from mn order by ap limit 1000; end; This procedure was timed. In the cold condition, it executed in 7 seconds. In the warm condition, it executed in 1 second. Each statement was also subjected to EXPLAIN ANALYZE. Each execution used an index scan on the corresponding index.