Community
Participate
Working Groups
Version: 5.13 (but possibly even older) up to 6.7 and master Steps to reproduce the problem: 1. Clone a repository 2. Run jgit gc 3. Observe that the bitmap is generated ``` Pack refs: 100% (6330/6330) Counting objects: 252734 Finding sources: 100% (252734/252734) Getting sizes: 100% (80803/80803) Compressing objects: 100% (1343813/1343813) Writing objects: 100% (252734/252734) Selecting commits: 100% (8254/8254) Building bitmaps: 100% (383/383) Finding sources: 100% (41991/41991) Getting sizes: 100% (22649/22649) Writing objects: 100% (41991/41991) ``` 4. In the bare repo on the server side, go to the `objects/pack` directory and list the content. 5. Select one of the listed packfile and touch a keep file with the same same. For example, if a `pack-66ab9c61b8d9f46172bd5c1e1f5b6fd979123dbe.pack` exists, execute a `touch pack-66ab9c61b8d9f46172bd5c1e1f5b6fd979123dbe.keep` file 6. Run jgit gc 7. Observe that the bitmap is NOT generated Pack refs: 100% (6330/6330) Counting objects: 252734 Finding sources: 100% (252734/252734) Getting sizes: 100% (80522/80522) Writing objects: 100% (252734/252734) ============ The reason the bitmap is not generated is to do with the existence of the keep file. Specifically, this is the line of code that causes the canBuildBitmaps boolean flag to be set to false [1]. The exclusion list (represented by the `excludeInPacks` variable) is populated with the keep packfiles, during repack [2]. The rationale behind this approach is that the packfile in the process of being written (i.e. in the pack-123.keep file) cannot be considered of the bitmap, since the reachability of its content is still unknown. Hence the bitmap is skipped altogether (`canBuildBitmaps` evaluate to false in [3]). I believe there are quite a few problems with this: - This decision is taken by JGit silently. There is no warning or logging or output that informs the client this is happening. This has already been reported here [4] - Not having a bitmap might be a killer for repositories big repositories (i.e. mono-repos) with loads of read/write access. If the repo is the subject of many git-receive-packs the chances to encounter a keep file during GC might be high, causing the bitmap not to be generated very often. The proposal: Instead of completely skipping the building bitmap phase, it should be possible to *ignore* whatever is contained in the keep file, but *still* build a bitmap for everything else. In many scnearios this would be preferable that *not* having a bitmap at all. I believe this behaviour should at least be configurable in a back-compatible way. [1] https://git.eclipse.org/r/plugins/gitiles/jgit/jgit/+/refs/changes/40/7940/18/org.eclipse.jgit/src/org/eclipse/jgit/storage/pack/PackWriter.java#1617 [2] https://git.eclipse.org/r/plugins/gitiles/jgit/jgit/+/refs/heads/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/GC.java#845 [3] https://git.eclipse.org/r/plugins/gitiles/jgit/jgit/+/refs/changes/40/7940/18/org.eclipse.jgit/src/org/eclipse/jgit/storage/pack/PackWriter.java#1614 [4] https://bugs.eclipse.org/bugs/show_bug.cgi?id=565733
I believe this should be split into two parts: 1. Generate a warning (see [4]) 2. Giving the option to skip the ".keep" files processing altogether, rather than excluding them just for the bitmap generation. @Matthias @DavidO what do you think?
An initial proposal: https://git.eclipse.org/r/c/jgit/jgit/+/202394
After a deep analysis and discussion with Tony, I have decided to improve the GcKeepFilesTest with a condition that makes sure that .keep files are completely transparent to the whole process, including to the generation of the bitmap. [1] https://git.eclipse.org/r/c/jgit/jgit/+/202434 The above change is supposed to fail, showing that the presence of .keep files isn't transparent from the point of view of the .bitmap index generation. After interacting with Matthias on the review of the initial change [2] done by Tony, I agree that the packfiles having a .keep file should be completely ignored from a JGit GC perspective and completely excluded from all the repacking phase. The expectation of the test at [1] is that the resulting repacked file must not include the objects contained in the packfile having a .keep file and the bitmap should continue to be created as normal. The [1] should fail the build and, once that is verified, I can draft the fix. [2] https://git.eclipse.org/r/c/jgit/jgit/+/202394
The [1] is effectively failing the build [3] for the following reason: [3] https://ci.eclipse.org/jgit/job/stable/job/jgit.gerrit-pipeline.java11/3914/console 20:55:41 [ERROR] Failures: 20:55:41 [ERROR] GcKeepFilesTest.testKeepFiles:59 expected:<1> but was:<0> I am now ready to fix the problem by introducing a more explicit intention of creating the bitmap or not from the repack() method of the JGit GC.
See the smallest fix possible to the problem: https://git.eclipse.org/r/c/jgit/jgit/+/202435/1 This is addressing directly the root cause of the problem, which was accidentally introduced in: https://git.eclipse.org/r/c/jgit/jgit/+/7940/18/org.eclipse.jgit/src/org/eclipse/jgit/storage/pack/PackWriter.java#1617 It is a better fix than @Tony for the following reasons: 1) It is way smaller (+28 LOCs vs. +116 LOCs) 2) Does not introduce any changes in JGit config (less is more !) 3) Addresses the problem at its roots 4) Has no performance impact on the preparation of the bitmap @Tony @Matthias what do you think?
The test, rebased on my fix, works :-) https://ci.eclipse.org/jgit/job/stable/job/jgit.gerrit-pipeline.java11/3919/console