415137 – h2 model db should be indexed

Bug 415137 - h2 model db should be indexed

Summary: h2 model db should be indexed

Status:	RESOLVED FIXED

Alias:	None

Product:	DLTK
Classification:	Technology
Component:	Common (show other bugs)
Version:	5.0
Hardware:	PC Linux

Importance:	P3 major (vote)
Target Milestone:	---
Assignee:	dltk.common-inbox
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2013-08-15 06:36 EDT by Kaloyan Raev
Modified:	2014-09-12 06:50 EDT (History)
CC List:	4 users (show)

See Also:

Attachments
Profiling snapshot of the DLTK Indexer thread (202.35 KB, image/png) 2013-08-15 06:36 EDT, Kaloyan Raev	no flags	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Kaloyan Raev

2013-08-15 06:36:34 EDT

Created attachment 234452 [details]
Profiling snapshot of the DLTK Indexer thread

Attached is a screenshot from a profiling snapshot taken during import of huge PHP project in Eclipse.

It can be seen that a great amount of CPU time is spent on select statements in the h2 model db. When analyzing the code I can see that there is a check in the db if I file record exists before the file gets indexed. In the case of importing new project this check always results that a file does not exists neither in the db or the db cache. So, its wasted amount of time that is much greater than the time to insert records in the db, and comparable to the time spend for parsing the indexed files. All this is visible in the attached snapshot. 

I found that the h2 db schema does not create indexes, which can greatly optimize the time for executing select statements. When there is no index, the complexity for executing select statements is linear to the size of the db table. So, as more files are indexed by DLTK the h2 model db grows and the performance of executing select statements degrades. Adding index turns the complexity to logarithmic, which is a great improvement for large tables.

Comment 1 Kaloyan Raev

2013-08-15 07:20:21 EDT

Patch: https://github.com/kaloyan-raev/dltk.core/commit/931fcc8c7d037f5b62a7d945c580f824508dc415

This patch adds index for the (PATH, CONTAINER_ID) columns of the FILE table. This improves significantly the performance of executing select/update/delete statements on the h2 db when it grows with many file records.

Other columns used in prepared statements are indexed automatically by the h2 db and it is not necessary to explicitly create indices for them:
  - CONTAINERS > PATH - because it is UNIQUE
  - FILES > CONTAINER_ID - because it is a foreign key

With this patch I get about 25% improvement in CPU utilization when importing huge PHP projects.

Comment 2 Alex Panchenko

2013-10-19 02:50:06 EDT

Applied the patch http://git.eclipse.org/c/dltk/org.eclipse.dltk.core.git/commit/?id=f72af29b6021a71342a659ba7cc0b22cf6fb5171

Thanks.