Community
Participate
Working Groups
When running a JGit fetch on a local repository with millions of refs, the BasePackFetchConnection.negotiateBegin() needs to mark all local commits pointed by the refs on a RevWalk. The RevWalk.markStart(Collection<RevCommit>) does not manage this particular bulk addition and instead is calling the individual markStart() for each commit, which will re-calculate the index every 1000 commits. The problem was previously identified by Gustaf Lundh back in 2013 with a repository with 80k refs and that's why he identified batches of 1000 commits which would have reduced the number of reindexes to 80. He admitted that the static bucketing of 1000 was not ideal and he mentioned: "In the future, it should probably be dynamic and based on the number of refs in the queue, but this should serve well as a starting point." I believe the "future" has come and we need to make the bucketing more dynamic based on the total number of refs in the repository.
By timing the time to run buildIndex(), it looks quadratic and it doubles every 1000s refs. Example: 6000 refs => buildIndex takes 124us 7000 refs => buildIndex takes 226us
When testing with a repository with 1.5M refs, the buildIndex() execution times goes up to 42ms per call, but it is called 1500 times, which eventually takes 33s to complete (the earlier buildIndex iterations are faster than 42ms). I am now going to try to change the implementation and use a standard priority queue and see the progress.
The complexity of the current DevRevQueue is O(n^2/2) whilst the Java priority queue will be O(n log n). I'll create an alternative implementation (DevRevPriorityQueue) that relies on a standard java priority queue and share the resulting E2E performance improvements on the Git fetch protocol.
I've done an initial re-implementation of DevRevQueue (named DevRevPriorityQueue) which uses a Java PriorityQueue: the performance results are astonishing !!! DevRevQueue.add(RevCommit) - 850k times: 4515 ms DevRevPriorityQueue.add(RevCommit) - 850k times: 58 ms The priority queue is 77x time faster.
New Gerrit change created: https://git.eclipse.org/r/c/jgit/jgit/+/194140