Bug 582039 - JGit GC does not generate bitmap when keep files exist
Summary: JGit GC does not generate bitmap when keep files exist
Status: RESOLVED FIXED
Alias: None
Product: JGit
Classification: Technology
Component: JGit (show other bugs)
Version: 6.5   Edit
Hardware: PC Mac OS X
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-07 06:45 EDT by Antonio Barone CLA
Modified: 2023-08-10 07:54 EDT (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Antonio Barone CLA 2023-06-07 06:45:23 EDT
Version: 5.13 (but possibly even older) up to 6.7 and master

Steps to reproduce the problem:
1. Clone a repository
2. Run jgit gc
3. Observe that the bitmap is generated

```
Pack refs:              100% (6330/6330)
Counting objects:       252734
Finding sources:        100% (252734/252734)
Getting sizes:          100% (80803/80803)
Compressing objects:    100% (1343813/1343813)
Writing objects:        100% (252734/252734)
Selecting commits:      100% (8254/8254)
Building bitmaps:       100% (383/383)
Finding sources:        100% (41991/41991)
Getting sizes:          100% (22649/22649)
Writing objects:        100% (41991/41991)
```

4. In the bare repo on the server side, go to the `objects/pack` directory and list the content.

5. Select one of the listed packfile and touch a keep file with the same same.
For example, if a `pack-66ab9c61b8d9f46172bd5c1e1f5b6fd979123dbe.pack` exists, execute a `touch pack-66ab9c61b8d9f46172bd5c1e1f5b6fd979123dbe.keep` file

6. Run jgit gc
7. Observe that the bitmap is NOT generated

Pack refs:              100% (6330/6330)
Counting objects:       252734
Finding sources:        100% (252734/252734)
Getting sizes:          100% (80522/80522)
Writing objects:        100% (252734/252734)

============

The reason the bitmap is not generated is to do with the existence of the keep file.

Specifically, this is the line of code that causes the canBuildBitmaps boolean flag to be set to false [1].


The exclusion list (represented by the `excludeInPacks` variable) is populated with the keep packfiles, during repack [2].

The rationale behind this approach is that the packfile in the process of being written (i.e. in the pack-123.keep file) cannot be considered of the bitmap, since the reachability of its content is still unknown.

Hence the bitmap is skipped altogether (`canBuildBitmaps` evaluate to false in [3]).


I believe there are quite a few problems with this: 

- This decision is taken by JGit silently. There is no warning or logging or output that informs the client this is happening. This has already been reported here [4]

- Not having a bitmap might be a killer for repositories big repositories (i.e. mono-repos) with loads of read/write access.
If the repo is the subject of many git-receive-packs the chances to encounter a keep file during GC might be high, causing the bitmap not to be generated very often.


The proposal:

Instead of completely skipping the building bitmap phase, it should be possible to *ignore* whatever is contained in the keep file, but *still* build a bitmap for everything else.

In many scnearios this would be preferable that *not* having a bitmap at all. 

I believe this behaviour should at least be configurable in a back-compatible way.


[1] https://git.eclipse.org/r/plugins/gitiles/jgit/jgit/+/refs/changes/40/7940/18/org.eclipse.jgit/src/org/eclipse/jgit/storage/pack/PackWriter.java#1617

[2] https://git.eclipse.org/r/plugins/gitiles/jgit/jgit/+/refs/heads/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/GC.java#845

[3] https://git.eclipse.org/r/plugins/gitiles/jgit/jgit/+/refs/changes/40/7940/18/org.eclipse.jgit/src/org/eclipse/jgit/storage/pack/PackWriter.java#1614

[4] https://bugs.eclipse.org/bugs/show_bug.cgi?id=565733
Comment 1 Luca Milanesio CLA 2023-06-07 07:57:32 EDT
I believe this should be split into two parts:

1. Generate a warning (see [4])
2. Giving the option to skip the ".keep" files processing altogether, rather than excluding them just for the bitmap generation.

@Matthias @DavidO what do you think?
Comment 2 Antonio Barone CLA 2023-06-07 15:11:11 EDT
An initial proposal:
https://git.eclipse.org/r/c/jgit/jgit/+/202394
Comment 3 Luca Milanesio CLA 2023-06-09 20:50:55 EDT
After a deep analysis and discussion with Tony, I have decided to improve the GcKeepFilesTest with a condition that makes sure that .keep files are completely transparent to the whole process, including to the generation of the bitmap.

[1] https://git.eclipse.org/r/c/jgit/jgit/+/202434

The above change is supposed to fail, showing that the presence of .keep
files isn't transparent from the point of view of the .bitmap index generation.

After interacting with Matthias on the review of the initial change [2] done
by Tony, I agree that the packfiles having a .keep file should be completely
ignored from a JGit GC perspective and completely excluded from all the repacking
phase.

The expectation of the test at [1] is that the resulting repacked file must
not include the objects contained in the packfile having a .keep file and
the bitmap should continue to be created as normal.

The [1] should fail the build and, once that is verified, I can draft the fix.

[2] https://git.eclipse.org/r/c/jgit/jgit/+/202394
Comment 4 Luca Milanesio CLA 2023-06-09 21:14:11 EDT
The [1] is effectively failing the build [3] for the following reason:

[3] https://ci.eclipse.org/jgit/job/stable/job/jgit.gerrit-pipeline.java11/3914/console

20:55:41  [ERROR] Failures: 
20:55:41  [ERROR]   GcKeepFilesTest.testKeepFiles:59 expected:<1> but was:<0>

I am now ready to fix the problem by introducing a more explicit intention of
creating the bitmap or not from the repack() method of the JGit GC.
Comment 5 Luca Milanesio CLA 2023-06-09 21:28:42 EDT
See the smallest fix possible to the problem:
https://git.eclipse.org/r/c/jgit/jgit/+/202435/1

This is addressing directly the root cause of the problem, which was
accidentally introduced in:
https://git.eclipse.org/r/c/jgit/jgit/+/7940/18/org.eclipse.jgit/src/org/eclipse/jgit/storage/pack/PackWriter.java#1617

It is a better fix than @Tony for the following reasons:

1) It is way smaller (+28 LOCs vs. +116 LOCs)
2) Does not introduce any changes in JGit config (less is more !)
3) Addresses the problem at its roots
4) Has no performance impact on the preparation of the bitmap

@Tony @Matthias what do you think?
Comment 6 Luca Milanesio CLA 2023-06-09 21:51:17 EDT
The test, rebased on my fix, works :-)

https://ci.eclipse.org/jgit/job/stable/job/jgit.gerrit-pipeline.java11/3919/console