Bug 581532 - jgit GC: loosen phase does not account for pruned packfiles
Summary: jgit GC: loosen phase does not account for pruned packfiles
Status: NEW
Alias: None
Product: JGit
Classification: Technology
Component: JGit (show other bugs)
Version: 5.13   Edit
Hardware: PC Linux
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Project Inbox CLA
QA Contact:
URL: https://groups.google.com/g/repo-disc...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-02-15 05:10 EST by Antonio Barone CLA
Modified: 2023-06-10 19:03 EDT (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Antonio Barone CLA 2023-02-15 05:10:27 EST
We noticed an issue whereby unreadable pack errors are raised during `GC`.
The issue appear during the execution of the `loosen`ing phase [1], which loops through the entire packfile list.

However, such packfile list might be stale, if the `prune` phase removed (or preserved) some packfiles, since their file descriptor will no longer be available.

Here below the full stack trace:

2023-02-12 23:52:06 [main] ERROR org.eclipse.jgit.internal.storage.file.PackDirectory - Exception caught while accessing pack file repo.git/objects/pack/pack-654b809b43c8b7eadd85c15c4c15eb7e728027a7.pack, the pack file might be corrupt. Caught 1 consecutive errors while trying to read this pack.
java.io.IOException: Unreadable pack index: repo.git/objects/pack/pack-654b809b43c8b7eadd85c15c4c15eb7e728027a7.idx
        at org.eclipse.jgit.internal.storage.file.PackIndex.open(PackIndex.java:69)
        at org.eclipse.jgit.internal.storage.file.Pack.idx(Pack.java:161)
        at org.eclipse.jgit.internal.storage.file.Pack.get(Pack.java:273)
        at org.eclipse.jgit.internal.storage.file.PackDirectory.open(PackDirectory.java:212)
        at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:390)
        at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:354)
        at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObjectWithoutRestoring(ObjectDirectory.java:345)
        at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:330)
        at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:132)
        at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:212)
        at org.eclipse.jgit.internal.storage.file.GC.loosen(GC.java:294)
        at org.eclipse.jgit.internal.storage.file.GC.deleteOldPacks(GC.java:348)
        at org.eclipse.jgit.internal.storage.file.GC.repack(GC.java:866)
        at org.eclipse.jgit.internal.storage.file.GC.doGc(GC.java:270)
        at org.eclipse.jgit.internal.storage.file.GC.gc(GC.java:221)
        at org.eclipse.jgit.api.GarbageCollectCommand.call(GarbageCollectCommand.java:179)
        at org.eclipse.jgit.pgm.Gc.run(Gc.java:37)
        at org.eclipse.jgit.pgm.TextBuiltin.execute(TextBuiltin.java:239)
        at org.eclipse.jgit.pgm.Main.execute(Main.java:245)
        at org.eclipse.jgit.pgm.Main.run(Main.java:133)
        at org.eclipse.jgit.pgm.Main.main(Main.java:105)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
        at org.springframework.boot.loader.Launcher.launch(Launcher.java:107)
        at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
        at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:88)
Caused by: java.io.FileNotFoundException: repo.git/objects/pack/pack-654b809b43c8b7eadd85c15c4c15eb7e728027a7.idx (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.eclipse.jgit.util.io.SilentFileInputStream.<init>(SilentFileInputStream.java:31)
        at org.eclipse.jgit.internal.storage.file.PackIndex.open(PackIndex.java:65)
        ... 28 more

[1] https://gerrit.googlesource.com/jgit/+/refs/heads/stable-5.13/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/GC.java#286
Comment 1 Matthias Sohn CLA 2023-02-24 09:35:01 EST
(In reply to Antonio Barone from comment #0)
> We noticed an issue whereby unreadable pack errors are raised during `GC`.
> The issue appear during the execution of the `loosen`ing phase [1], which
> loops through the entire packfile list.

Loosening happens during repack(), this doesn't match line number of [1] 
but is called 1 line earlier in doGc().
 
> However, such packfile list might be stale, if the `prune` phase removed (or
> preserved) some packfiles, since their file descriptor will no longer be
> available.

Pruning happens at [1] after loosening which is called from repack().
 
> Here below the full stack trace:

Which JGit version did you use ? Line numbers in the stacktrace don't match the current stable-5.13.

> [1]
> https://gerrit.googlesource.com/jgit/+/refs/heads/stable-5.13/org.eclipse.
> jgit/src/org/eclipse/jgit/internal/storage/file/GC.java#286
Comment 2 Thomas Wolf CLA 2023-05-28 04:05:04 EDT
I'm seeing exactly this with scheduled GC in Gerrit 3.7.3. The log is full of these; GC is scheduled to run once a week on Sun 02:00, and now I have an 81MB error_log containing nothing but these exceptions.

Had me really worried for a moment. git fsck reports no errors, though, just a few dangling commits, and at least fetching still works, so I guess the repos have not been corrupted.

Not sure it's relevant: the git repos are on an AWS EFS, core.trustFolderStat = false.

Three side remarks: (maybe I should report these at Gerrit)

* I have _only_ these exceptions in the log. It would be nice if Gerrit at
  least produced INFO logs when it starts GC on a repo, and when it's done.
* This Gerrit is a small single-team installation; it has 12 not overly large
  git repos (plus AllProjects and AllUsers). The last exception in the log for
  the last repository has a time stamp of 2023-05-28T03:45:43.188Z. Is it
  normal that GC for 12 repos takes 1:45 hours?
* Is running command line git gc (and possibly prune or repack) via a cron job
  an alternative, or will that confuse Gerrit?

[2023-05-28T02:00:39.033Z] [WorkQueue-2[java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@5bdde903[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@106fc0bf[Wrap
ped task = com.google.gerrit.server.logging.LoggingContextAwareRunnable@15a935ad]]]] ERROR org.eclipse.jgit.internal.storage.file.PackDirectory : Exception caught while accessing pack file /mnt/efs/ge
rrit/git/repo.git/objects/pack/pack-d7bd836c8bcab96fe1c11ee0713c0c8ac10b4e22.pack, the pack file might be corrupt. Caught 1 consecutive errors while trying to read this pack.
java.io.IOException: Unreadable pack index: /mnt/efs/gerrit/git/repo.git/objects/pack/pack-d7bd836c8bcab96fe1c11ee0713c0c8ac10b4e22.idx
	at org.eclipse.jgit.internal.storage.file.PackIndex.open(PackIndex.java:70)
	at org.eclipse.jgit.internal.storage.file.Pack.idx(Pack.java:160)
	at org.eclipse.jgit.internal.storage.file.Pack.get(Pack.java:272)
	at org.eclipse.jgit.internal.storage.file.PackDirectory.open(PackDirectory.java:219)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:420)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:384)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObjectWithoutRestoring(ObjectDirectory.java:375)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:360)
	at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:140)
	at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:216)
	at org.eclipse.jgit.internal.storage.file.GC.loosen(GC.java:313)
	at org.eclipse.jgit.internal.storage.file.GC.deleteOldPacks(GC.java:367)
	at org.eclipse.jgit.internal.storage.file.GC.repack(GC.java:885)
	at org.eclipse.jgit.internal.storage.file.GC.doGc(GC.java:285)
	at org.eclipse.jgit.internal.storage.file.GC.gc(GC.java:234)
	at org.eclipse.jgit.api.GarbageCollectCommand.call(GarbageCollectCommand.java:180)
	at com.google.gerrit.server.git.GarbageCollection.run(GarbageCollection.java:98)
	at com.google.gerrit.server.git.GarbageCollection.run(GarbageCollection.java:72)
	at com.google.gerrit.server.git.GarbageCollection.run(GarbageCollection.java:68)
	at com.google.gerrit.server.git.GarbageCollectionRunner.run(GarbageCollectionRunner.java:64)
	at com.google.gerrit.server.logging.LoggingContextAwareRunnable.run(LoggingContextAwareRunnable.java:113)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
	at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:612)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.FileNotFoundException: /mnt/efs/gerrit/git/repo.git/objects/pack/pack-d7bd836c8bcab96fe1c11ee0713c0c8ac10b4e22.idx (No such file or directory)
	at java.base/java.io.FileInputStream.open0(Native Method)
	at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
	at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
	at org.eclipse.jgit.util.io.SilentFileInputStream.<init>(SilentFileInputStream.java:31)
	at org.eclipse.jgit.internal.storage.file.PackIndex.open(PackIndex.java:65)
	... 27 more
Comment 3 Thomas Wolf CLA 2023-05-28 04:15:03 EDT
Maybe relevant: That GC run on 2022-05-28 02:00 was the first scheduled GC run. The previous maintainer had never set up GC. Half of the repositories are old and inactive, and I don't think any GC was ever run on them. The other half is active, and I had GC'ed them (with command-line git) before moving them onto that EFS disk. The log also has these exceptions for the active repositories.
Comment 4 Thomas Wolf CLA 2023-05-28 04:20:40 EDT
Related: https://groups.google.com/g/repo-discuss/c/rfKEZnH1HoI
Comment 5 Thomas Wolf CLA 2023-05-28 05:29:29 EDT
(In reply to Thomas Wolf from comment #2)
> ... it has 12 not overly large git repos...

Total size of /mnt/efs/gerrit before GC was 3.5GB, after GC 2.7GB. db, cache, index, etc, and git subdirectories.
Comment 6 Antonio Barone CLA 2023-05-31 04:38:55 EDT
Thanks for your findings @Thomas.
Are you already looking into this? Otherwise, this is something we can take and then add you as a reviewer?
Comment 7 Thomas Wolf CLA 2023-05-31 07:28:55 EDT
(In reply to Antonio Barone from comment #6)
> Thanks for your findings @Thomas.
> Are you already looking into this? Otherwise, this is something we can take
> and then add you as a reviewer?

No, I'm currently not looking into this. Busy with other things. I also didn't quite understand your comment 0, since the prune phase comes later as far as I saw. Loosening is part of repacking, pruning comes after repacking.

Feel free to take this.
Comment 8 Luca Milanesio CLA 2023-06-10 10:49:26 EDT
I believe the problem is in the logic of looping through packs implemented in deleteOldPacks.

See below (master branch version - c7960910f0baa0fe7e5b7e24e8681afeffea22a0)

oldPackLoop: for (Pack oldPack : oldPacks) {
	checkCancelled();
	String oldName = oldPack.getPackName();
	// check whether an old pack file is also among the list of new
	// pack files. Then we must not delete it.
	for (Pack newPack : newPacks)
		if (oldName.equals(newPack.getPackName()))
			continue oldPackLoop;

	if (!oldPack.shouldBeKept()
			&& repo.getFS()
					.lastModifiedInstant(oldPack.getPackFile())
					.toEpochMilli() < packExpireDate) {
		if (shouldLoosen) {
			loosen(inserter, reader, oldPack, ids);
		}
		oldPack.close();
		prunePack(oldPack.getPackFile());
	}
}

The call to loosen() internally uses all the pack list at the time of the start of the for-loop, however, some of the packfiles could be removed by the prunePack(oldPack.getPackFile()). When that happens, you may see that the loosen() call fails because it does not expect that some of the packfiles have disappeared.

The fix is quite simple: duplicate the loop in two parts:
1st loop) loop through all the packfiles for loosening the refs, but do not actually prune any of them.
2nd loop) loop through all the packfiles for pruning some of them.

I tested the above split for several months in production and we did not have even a single instance anymore of the exceptions reported in this ticket.

Going to propose a fix on master today or tomorrow at the latest.
Comment 9 Luca Milanesio CLA 2023-06-10 19:03:04 EDT
I managed to reproduce easily the problem by modifying the test GcBasicPackingTest.testDonePruneTooYoungPacks().

Basically, the test didn't show the problem before because the loosening of the objects was done for a single packfile, hence the fact that it was done in a single loop didn't make any difference, as it is a look of 1 element.

If I create 10 packfiles that will have 10 objects to loosen, then after the first packfile is processed, the Git GC will start showing tons of errors.

See the test (still in WIP) at:
https://git.eclipse.org/r/c/jgit/jgit/+/202436

The total list of packfiles is 11, 10 of them need to be removed by the GC and its objects loosened. After the 1st packfile is processed, there are tons of errors on the log about packfiles being corrupted or not found.

That is because the in-memory list still contains 11 packfiles, but the loop is removing 10 of them one by one, so the can of packfiles from the list is going to hit inexistent files.