Bug 415137 - h2 model db should be indexed
Summary: h2 model db should be indexed
Status: RESOLVED FIXED
Alias: None
Product: DLTK
Classification: Technology
Component: Common (show other bugs)
Version: 5.0   Edit
Hardware: PC Linux
: P3 major (vote)
Target Milestone: ---   Edit
Assignee: dltk.common-inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-08-15 06:36 EDT by Kaloyan Raev CLA
Modified: 2014-09-12 06:50 EDT (History)
4 users (show)

See Also:


Attachments
Profiling snapshot of the DLTK Indexer thread (202.35 KB, image/png)
2013-08-15 06:36 EDT, Kaloyan Raev CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Kaloyan Raev CLA 2013-08-15 06:36:34 EDT
Created attachment 234452 [details]
Profiling snapshot of the DLTK Indexer thread

Attached is a screenshot from a profiling snapshot taken during import of huge PHP project in Eclipse.

It can be seen that a great amount of CPU time is spent on select statements in the h2 model db. When analyzing the code I can see that there is a check in the db if I file record exists before the file gets indexed. In the case of importing new project this check always results that a file does not exists neither in the db or the db cache. So, its wasted amount of time that is much greater than the time to insert records in the db, and comparable to the time spend for parsing the indexed files. All this is visible in the attached snapshot. 

I found that the h2 db schema does not create indexes, which can greatly optimize the time for executing select statements. When there is no index, the complexity for executing select statements is linear to the size of the db table. So, as more files are indexed by DLTK the h2 model db grows and the performance of executing select statements degrades. Adding index turns the complexity to logarithmic, which is a great improvement for large tables.
Comment 1 Kaloyan Raev CLA 2013-08-15 07:20:21 EDT
Patch: https://github.com/kaloyan-raev/dltk.core/commit/931fcc8c7d037f5b62a7d945c580f824508dc415

This patch adds index for the (PATH, CONTAINER_ID) columns of the FILE table. This improves significantly the performance of executing select/update/delete statements on the h2 db when it grows with many file records.

Other columns used in prepared statements are indexed automatically by the h2 db and it is not necessary to explicitly create indices for them:
  - CONTAINERS > PATH - because it is UNIQUE
  - FILES > CONTAINER_ID - because it is a foreign key

With this patch I get about 25% improvement in CPU utilization when importing huge PHP projects.