Community
Participate
Working Groups
I have created a test repository with 240k refs and 720k objects, with 240k deltas (every commit is modifying the same file). See below the script used: for ((i=0; i<245760; i++ )) do echo foo$i >> foo jgit add foo jgit commit -m foo$i jgit checkout -b branch-$(expr $i '/' 100)/part-$(expr $i '%' 100) done After the repo creation, it has been fully GCed and bitmaps created, so that there is a single packfile, index and bitmap. All refs are a single packed-refs file. The clone with Git takes 2 mins and 57s: time git clone http://localhost:8080/load-test-1 Cloning into 'load-test-1'... remote: Counting objects: 16563, done remote: Finding sources: 100% (726619/726619) remote: Total 726619 (delta 242147), reused 726615 (delta 242147) Receiving objects: 100% (726619/726619), 94.22 MiB | 31.16 MiB/s, done. Resolving deltas: 100% (242147/242147), done. git clone http://localhost:8080/load-test-1 1134.36s user 98.73s system 696% cpu 2:57.12 total The clone with JGit takes 1h and 3mins: time ./jgit clone http://localhost:8080/load-test-1 Cloning into 'load-test-1'... remote: Counting objects: 1 remote: Finding sources: 100% (726619/726619) Receiving objects: 100% (726619/726619) Resolving deltas: 100% (242147/242147) Checking out files: 100% (1/1) jgit clone http://localhost:8080/load-test-1 4974.86s user 53.23s system 132% cpu 1:03:20.57 total I have already applied all the improvements on top of the latest stable-5.13: - https://git.eclipse.org/r/c/jgit/jgit/+/193464 - https://git.eclipse.org/r/c/jgit/jgit/+/194140 From what I see, the majority of the time is spent in the deltas resolution, which I know is huge in this case because all commits are appending to the same file. However, why JGit is 20x times slower than C-Git?
looks like it spends most time in methods of org.eclipse.jgit.util.sha1.SHA1 which was introduced in 982f5d1bf184a2edab726f3c77f09a1157059e35 and replaced MessageDigest in 0f25f64d4882f7853c9d3dc84ec382d8a78ae646
Created attachment 288612 [details] method profiling screenshot of method profiling done with Java Mission Control/Flight Recorder
On my Mac jgit on Java 11 took 1756 seconds vs git taking 131 seconds, that means it was 13.4 times slower.
(In reply to Matthias Sohn from comment #1) > looks like it spends most time in methods of org.eclipse.jgit.util.sha1.SHA1 > which was introduced in 982f5d1bf184a2edab726f3c77f09a1157059e35 and > replaced MessageDigest in 0f25f64d4882f7853c9d3dc84ec382d8a78ae646 I can't find those commits above, did you mean c90ab1a77430562cb543d0177fd94cf67ef97bcf? @Jonathan, I'm just curious: I don't see any comments *why* jgit switched from MessageDigest in JDK to own "pure Java SHA1" implementation? What was wrong with the one from JDK?
(In reply to Andrey Loskutov from comment #4) > (In reply to Matthias Sohn from comment #1) > > looks like it spends most time in methods of org.eclipse.jgit.util.sha1.SHA1 > > which was introduced in 982f5d1bf184a2edab726f3c77f09a1157059e35 and > > replaced MessageDigest in 0f25f64d4882f7853c9d3dc84ec382d8a78ae646 > > I can't find those commits above, did you mean > c90ab1a77430562cb543d0177fd94cf67ef97bcf? See: [1]. [1] https://git.eclipse.org/r/c/jgit/jgit/+/91837 > @Jonathan, I'm just curious: I don't see any comments *why* jgit switched > from MessageDigest in JDK to own "pure Java SHA1" implementation? What was > wrong with the one from JDK? See commit message in: [1].
Thanks @Matthias and @Davido for the analysis, very interesting finding. I was wondering if the SHA1 implemented by Shawn could fallback to MessageDigest if org.eclipse.jgit.util.sha1.detectCollision is set to false. At the end of the day, if we don't detect the SHA1 collisions, then the pure-Java implementation isn't going to have substantial benefits against the one in MessageDigest of the JDK, isn't it? WDYT?
(In reply to Luca Milanesio from comment #6) > Thanks @Matthias and @Davido for the analysis, very interesting finding. > > I was wondering if the SHA1 implemented by Shawn could fallback to > MessageDigest if org.eclipse.jgit.util.sha1.detectCollision is set to false. > > At the end of the day, if we don't detect the SHA1 collisions, then the > pure-Java implementation isn't going to have substantial benefits against > the one in MessageDigest of the JDK, isn't it? I tend to agree. But could you try to re-run your test with: -Dorg.eclipse.jgit.util.sha1.detectCollision=false With collision detection disabled, the method `UbcCheck#check(int[] w)` shouldn't be called any more, so that you should already see significant performance improvement, right?
(In reply to David Ostrovsky from comment #7) > (In reply to Luca Milanesio from comment #6) > > Thanks @Matthias and @Davido for the analysis, very interesting finding. > > > > I was wondering if the SHA1 implemented by Shawn could fallback to > > MessageDigest if org.eclipse.jgit.util.sha1.detectCollision is set to false. > > > > At the end of the day, if we don't detect the SHA1 collisions, then the > > pure-Java implementation isn't going to have substantial benefits against > > the one in MessageDigest of the JDK, isn't it? > > I tend to agree. But could you try to re-run your test with: > > -Dorg.eclipse.jgit.util.sha1.detectCollision=false > > With collision detection disabled, the method `UbcCheck#check(int[] w)` > shouldn't be called any more, so that you should already see significant > performance improvement, right? I'm running it with detectCollision disabled, however, the improvement is pretty minor.
(In reply to Luca Milanesio from comment #8) > (In reply to David Ostrovsky from comment #7) > > (In reply to Luca Milanesio from comment #6) > > > Thanks @Matthias and @Davido for the analysis, very interesting finding. > > > > > > I was wondering if the SHA1 implemented by Shawn could fallback to > > > MessageDigest if org.eclipse.jgit.util.sha1.detectCollision is set to false. > > > > > > At the end of the day, if we don't detect the SHA1 collisions, then the > > > pure-Java implementation isn't going to have substantial benefits against > > > the one in MessageDigest of the JDK, isn't it? > > > > I tend to agree. But could you try to re-run your test with: > > > > -Dorg.eclipse.jgit.util.sha1.detectCollision=false > > > > With collision detection disabled, the method `UbcCheck#check(int[] w)` > > shouldn't be called any more, so that you should already see significant > > performance improvement, right? > > I'm running it with detectCollision disabled, however, the improvement is > pretty minor. Cloning into 'load-test-1-no-cd'... remote: Counting objects: 1 remote: Finding sources: 100% (726619/726619) remote: Getting sizes: 100% (242270/242270) remote: Compressing objects: 100% (4897/4897) Receiving objects: 100% (726619/726619) Resolving deltas: 100% (242150/242150) Checking out files: 100% (1/1) ./jgit clone /tmp/gerrit_setup/instance-1/git/load-test-1.git 3675.27s user 54.14s system 126% cpu 49:01.23 total 63 - 49 = 14 mins (22% of improvement by disabling collision detection)
> > I'm running it with detectCollision disabled, however, the improvement is > > pretty minor. > > Cloning into 'load-test-1-no-cd'... > remote: Counting objects: 1 > remote: Finding sources: 100% (726619/726619) > remote: Getting sizes: 100% (242270/242270) > remote: Compressing objects: 100% (4897/4897) > Receiving objects: 100% (726619/726619) > Resolving deltas: 100% (242150/242150) > Checking out files: 100% (1/1) > > ./jgit clone /tmp/gerrit_setup/instance-1/git/load-test-1.git 3675.27s > user 54.14s system 126% cpu 49:01.23 total > > 63 - 49 = 14 mins (22% of improvement by disabling collision detection) Can you re-run the tests with this change: [1], and set the property to use JDK native message digest implementation? Also note, that SHA1 benchmark was added here: [2]. [1] https://git.eclipse.org/r/c/jgit/jgit/+/196905 [2] https://git.eclipse.org/r/c/jgit/jgit/+/196906
I'll re-run with the native MessageDigest implemented in [1]. Also, the SHA1 was responsible for a high execution time when checking if a packedrefs file was changed on the filesystem (when trustFolderStat = false, JGit computes the SHA1s and compares them) Luca.
Gerrit change https://git.eclipse.org/r/c/jgit/jgit/+/196905 was merged to [stable-5.13]. Commit: http://git.eclipse.org/c/jgit/jgit.git/commit/?id=59029aec30ff6b3cb2d9c74af77fe96a5f108595
Gerrit change https://git.eclipse.org/r/c/jgit/jgit/+/196906 was merged to [stable-5.13]. Commit: http://git.eclipse.org/c/jgit/jgit.git/commit/?id=c20e9676c4ba79ff5fa1ac0ee1a818a94abcc43c
Coming here back from release notes, https://projects.eclipse.org/projects/technology.jgit/releases/6.4.0 where this bug is mentioned as fixed and that there is a new option available. However the bug is still open (mistake or missing something?) and as user I don't see which option/API is supposed to be used now?
After the merge of [1], according to Matthias, there is a 27% improvement, which is good, but still, very far away from the C-Git performance. I believe there could be something more going on. I'd suggest to keep this bug open for now. [1] https://git.eclipse.org/r/c/jgit/jgit/+/196906
New Gerrit change created: https://git.eclipse.org/r/c/jgit/jgit/+/199711
(In reply to Andrey Loskutov from comment #14) > Coming here back from release notes, > https://projects.eclipse.org/projects/technology.jgit/releases/6.4.0 where > this bug is mentioned as fixed and that there is a new option available. > > However the bug is still open (mistake or missing something?) and as user I > don't see which option/API is supposed to be used now? The new option is core.sha1implementation, it can be set to "java" (Java implementation) or "jdkNative" (native implementation available in JDK). You can also set this via system property "org.eclipse.jgit.util.sha1.implementation". If both are set the system property takes precedence. Sorry for missing to document this, please review https://git.eclipse.org/r/c/jgit/jgit/+/199711
Gerrit change https://git.eclipse.org/r/c/jgit/jgit/+/199711 was merged to [stable-5.13]. Commit: http://git.eclipse.org/c/jgit/jgit.git/commit/?id=e55bad514bc7e66b5a2b3406e4f28545f437b8f0
I also saw hotspot in SHA1Java - and it's kinda frustrating that network is faster then CPU. An improvement could be to utilize more threads. SHA1 computation can probably not parallelized well but distinct SHA1 computations could run in parallel. For example the loop in PackParser.java:534 Stacktrace: void org.eclipse.jgit.util.sha1.SHA1Java.compress() (SHA1Java.java:258) void org.eclipse.jgit.util.sha1.SHA1Java.compress(byte[], int) (SHA1Java.java:163) void org.eclipse.jgit.util.sha1.SHA1Java.update(byte[], int, int)(SHA1Java.java:151) void org.eclipse.jgit.transport.PackParser.whole(long, int, long) (PackParser.java:1064) void org.eclipse.jgit.transport.PackParser.indexOneObject() (PackParser.java:983) PackLock org.eclipse.jgit.transport.PackParser.parse(ProgressMonitor, ProgressMonitor) (PackParser.java:534) PackLock org.eclipse.jgit.internal.storage.file.ObjectDirectoryPackParser.parse(ProgressMonitor, ProgressMonitor) (ObjectDirectoryPackParser.java:170) PackLock org.eclipse.jgit.transport.PackParser.parse(ProgressMonitor) (PackParser.java:495) void org.eclipse.jgit.transport.BasePackFetchConnection.receivePack(ProgressMonitor, OutputStream) (BasePackFetchConnection.java:1115) void org.eclipse.jgit.transport.BasePackFetchConnection.doFetchV2(ProgressMonitor, Collection, OutputStream):545 void org.eclipse.jgit.transport.BasePackFetchConnection.doFetch(ProgressMonitor, Collection, Set, OutputStream):425 void org.eclipse.jgit.transport.TransportHttp$SmartHttpFetchConnection.doFetch(ProgressMonitor, Collection, Set, OutputStream):1562 void org.eclipse.jgit.transport.BasePackFetchConnection.fetch(ProgressMonitor, Collection, Set, OutputStream):351 void org.eclipse.jgit.transport.BasePackFetchConnection.fetch(ProgressMonitor, Collection, Set):342 void org.eclipse.jgit.transport.FetchProcess.fetchObjects(ProgressMonitor):290 void org.eclipse.jgit.transport.FetchProcess.executeImp(ProgressMonitor, FetchResult, String):182 void org.eclipse.jgit.transport.FetchProcess.execute(ProgressMonitor, FetchResult, String):105 FetchResult org.eclipse.jgit.transport.Transport.fetch(ProgressMonitor, Collection, String):1462 FetchResult org.eclipse.jgit.api.FetchCommand.call():238 FetchResult org.eclipse.jgit.api.CloneCommand.fetch(Repository, URIish):325 Git org.eclipse.jgit.api.CloneCommand.call():191