Created attachment 234452: Profiling snapshot of the DLTK Indexer thread

Attached is a screenshot of a profiling snapshot taken while importing a huge PHP project into Eclipse. It shows that a large share of CPU time is spent on select statements against the H2 model database.

Analyzing the code, I can see that before a file is indexed, the database is checked for an existing file record. When a new project is imported, this check always finds that the file exists neither in the database nor in the database cache. That time is wasted: it is much greater than the time needed to insert the records into the database, and comparable to the time spent parsing the indexed files. All of this is visible in the attached snapshot.

I found that the H2 database schema does not create indexes, which could greatly reduce the time for executing select statements. Without an index, the cost of executing a select statement is linear in the size of the table. So, as DLTK indexes more files, the H2 model database grows and the performance of select statements degrades. Adding an index makes the cost logarithmic, which is a great improvement for large tables.
Patch: https://github.com/kaloyan-raev/dltk.core/commit/931fcc8c7d037f5b62a7d945c580f824508dc415

This patch adds an index on the (PATH, CONTAINER_ID) columns of the FILES table. This significantly improves the performance of select/update/delete statements on the H2 database once it has grown to hold many file records.

Other columns used in prepared statements are indexed automatically by H2, so it is not necessary to create indexes for them explicitly:
- CONTAINERS > PATH - because it is UNIQUE
- FILES > CONTAINER_ID - because it is a foreign key

With this patch I get about 25% improvement in CPU utilization when importing huge PHP projects.
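To illustrate why an index on (PATH, CONTAINER_ID) helps the existence check, here is a minimal in-memory sketch. It is only an analogy, not the DLTK or H2 code: the class FileIndexSketch, the FileRow type, and the method names are hypothetical, and a sorted TreeMap stands in for the database index. The scan compares against every row, while the sorted lookup needs only a logarithmic number of comparisons.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the FILES table; not DLTK or H2 code.
class FileIndexSketch {

    // One row, reduced to the two columns covered by the index.
    static final class FileRow {
        private final String path;
        private final int containerId;

        FileRow(String path, int containerId) {
            this.path = path;
            this.containerId = containerId;
        }

        String path() { return path; }
        int containerId() { return containerId; }
    }

    // Orders rows by PATH first, then CONTAINER_ID, like a composite index key.
    static final Comparator<FileRow> BY_PATH_AND_CONTAINER =
            Comparator.comparing(FileRow::path).thenComparingInt(FileRow::containerId);

    private final List<FileRow> rows = new ArrayList<>();
    private final TreeMap<FileRow, Integer> index = new TreeMap<>(BY_PATH_AND_CONTAINER);

    // Appends a row and records its position in the sorted index.
    void insert(String path, int containerId) {
        FileRow row = new FileRow(path, containerId);
        index.put(row, rows.size());
        rows.add(row);
    }

    // No index: every row may be inspected, so the cost grows linearly.
    boolean existsByScan(String path, int containerId) {
        for (FileRow row : rows) {
            if (row.containerId() == containerId && row.path().equals(path)) {
                return true;
            }
        }
        return false;
    }

    // Sorted index: the lookup walks a balanced tree, so the cost is logarithmic.
    boolean existsByIndex(String path, int containerId) {
        return index.containsKey(new FileRow(path, containerId));
    }

    public static void main(String[] args) {
        FileIndexSketch table = new FileIndexSketch();
        for (int i = 0; i < 100_000; i++) {
            table.insert("/project/src/file" + i + ".php", 1);
        }
        // A fresh import asks about files that are not yet stored,
        // which is the worst case for the scan: it visits all rows.
        String missing = "/project/src/new.php";
        System.out.println("scan:  " + table.existsByScan(missing, 1));
        System.out.println("index: " + table.existsByIndex(missing, 1));
    }
}
```

Both lookups return the same answer; only the number of comparisons differs, and the gap widens as the table grows.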
Applied the patch: http://git.eclipse.org/c/dltk/org.eclipse.dltk.core.git/commit/?id=f72af29b6021a71342a659ba7cc0b22cf6fb5171

Thanks.