Bug 574178 - JGit garbage collection fails to delete pack file because the JGit library still has that file open.
Status: RESOLVED FIXED
Alias: None
Product: JGit
Classification: Technology
Component: JGit
Version: 5.12
Hardware: PC Windows 10
Importance: P3 normal
Target Milestone: 5.13
Assignee: Project Inbox CLA
Reported: 2021-06-14 00:48 EDT by andy xian CLA
Modified: 2021-06-26 14:24 EDT (History)

Attachments
solution (1.18 KB, application/octet-stream)
2021-06-14 01:22 EDT, andy xian CLA

Description andy xian CLA 2021-06-14 00:48:15 EDT
The Git repository has some large pack files, e.g. pack-ea5214c3fa09a76b9ac1dba0d2d65c8dd903d1ce.pack. JGit garbage collection does not clean up that pack file, while the command-line "git gc" does clean up pack-ea5214c3fa09a76b9ac1dba0d2d65c8dd903d1ce.pack.
Comment 1 andy xian CLA 2021-06-14 00:55:31 EDT
The direct cause of this issue is that the JGit library fails to delete the file pack-ea5214c3fa09a76b9ac1dba0d2d65c8dd903d1ce.pack at https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/util/FileUtils.java#L147 with the following exception (java.nio.file.FileSystemException):
The process cannot access the file because it is being used by another process.


After debugging the source code, I have found that:
1. The JGit library correctly marks the file pack-ea5214c3fa09a76b9ac1dba0d2d65c8dd903d1ce.pack for deletion.
2. The JGit library fails to delete pack-ea5214c3fa09a76b9ac1dba0d2d65c8dd903d1ce.pack because of an exception that says "The process cannot access the file because it is being used by another process."
3. I have verified that only one process, the one running the JGit library, is using pack-ea5214c3fa09a76b9ac1dba0d2d65c8dd903d1ce.pack.
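
The failure mode described above can be observed with plain JDK calls. The following is a minimal sketch, not JGit code: `DeleteProbe` and `tryDelete` are hypothetical names, and it assumes the open handle is a `java.io.RandomAccessFile`. On Windows, deleting a file held open this way is expected to fail with a `FileSystemException`; on POSIX systems the same delete usually succeeds, which would be consistent with the problem showing up on Windows 10.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical probe for illustration only; not part of JGit.
class DeleteProbe {
    /** Tries to delete the file. Returns null on success, else a short failure description. */
    static String tryDelete(Path file) {
        try {
            Files.delete(file);
            return null;
        } catch (FileSystemException e) {
            // A file that is still held open on Windows typically ends up here.
            return e.getClass().getSimpleName() + ": " + e.getReason();
        } catch (IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path pack = Files.createTempFile("pack-demo", ".pack");
        try (RandomAccessFile fd = new RandomAccessFile(pack.toFile(), "r")) {
            // The outcome here depends on the operating system.
            System.out.println("delete while open: " + tryDelete(pack));
        }
        if (Files.exists(pack)) {
            // Once the handle is closed, the delete is expected to go through.
            System.out.println("delete after close: " + tryDelete(pack));
        }
    }
}
```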
Comment 2 andy xian CLA 2021-06-14 00:57:09 EDT
The corresponding stack trace is:

delete:221, FileUtils (org.eclipse.jgit.util)
removeOldPack:378, GC (org.eclipse.jgit.internal.storage.file)
prunePack:411, GC (org.eclipse.jgit.internal.storage.file)
deleteOldPacks:351, GC (org.eclipse.jgit.internal.storage.file)
repack:862, GC (org.eclipse.jgit.internal.storage.file)
doGc:270, GC (org.eclipse.jgit.internal.storage.file)
gc:221, GC (org.eclipse.jgit.internal.storage.file)
call:179, GarbageCollectCommand (org.eclipse.jgit.api)
main:45, Test
Comment 3 andy xian CLA 2021-06-14 01:09:28 EDT
Root cause analysis:

The line https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/GC.java#L349 opens the file pack-ea5214c3fa09a76b9ac1dba0d2d65c8dd903d1ce.pack, and this file is still open when execution reaches line https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/GC.java#L351.

Therefore, the JGit library fails to delete the pack file pack-ea5214c3fa09a76b9ac1dba0d2d65c8dd903d1ce.pack.

Here is the stack trace from the point where the JGit library opens the file pack-ea5214c3fa09a76b9ac1dba0d2d65c8dd903d1ce.pack before trying to delete it:

doOpen:649, Pack (org.eclipse.jgit.internal.storage.file)
beginWindowCache:628, Pack (org.eclipse.jgit.internal.storage.file)
load:510, WindowCache (org.eclipse.jgit.internal.storage.file)
getOrLoad:602, WindowCache (org.eclipse.jgit.internal.storage.file)
get:385, WindowCache (org.eclipse.jgit.internal.storage.file)
pin:327, WindowCursor (org.eclipse.jgit.internal.storage.file)
copy:226, WindowCursor (org.eclipse.jgit.internal.storage.file)
readFully:604, Pack (org.eclipse.jgit.internal.storage.file)
load:787, Pack (org.eclipse.jgit.internal.storage.file)
get:274, Pack (org.eclipse.jgit.internal.storage.file)
open:211, PackDirectory (org.eclipse.jgit.internal.storage.file)
openPackedObject:390, ObjectDirectory (org.eclipse.jgit.internal.storage.file)
openPackedFromSelfOrAlternate:354, ObjectDirectory (org.eclipse.jgit.internal.storage.file)
openObjectWithoutRestoring:345, ObjectDirectory (org.eclipse.jgit.internal.storage.file)
openObject:330, ObjectDirectory (org.eclipse.jgit.internal.storage.file)
open:132, WindowCursor (org.eclipse.jgit.internal.storage.file)
open:212, ObjectReader (org.eclipse.jgit.lib)
loosen:294, GC (org.eclipse.jgit.internal.storage.file)
deleteOldPacks:349, GC (org.eclipse.jgit.internal.storage.file)
repack:862, GC (org.eclipse.jgit.internal.storage.file)
doGc:270, GC (org.eclipse.jgit.internal.storage.file)
gc:221, GC (org.eclipse.jgit.internal.storage.file)
call:179, GarbageCollectCommand (org.eclipse.jgit.api)
main:44, Test
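
The pattern visible in this trace, where a read opens the file on demand and the handle then stays open, can be sketched in isolation. This is a simplified illustration under stated assumptions: `LazyPackHandle` and its methods are hypothetical and do not mirror the actual `Pack` or `WindowCache` classes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a pack object; names do not come from JGit.
class LazyPackHandle implements AutoCloseable {
    private final Path file;
    private RandomAccessFile fd; // opened by the first read, then kept for later reads

    LazyPackHandle(Path file) {
        this.file = file;
    }

    /** Reads one byte at the given offset, opening the file on demand. */
    int readByte(long offset) throws IOException {
        if (fd == null) {
            fd = new RandomAccessFile(file.toFile(), "r");
        }
        fd.seek(offset);
        return fd.read();
    }

    boolean isOpen() {
        return fd != null;
    }

    @Override
    public void close() throws IOException {
        if (fd != null) {
            fd.close();
            fd = null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path pack = Files.createTempFile("pack-demo", ".pack");
        Files.write(pack, new byte[] {42});
        LazyPackHandle handle = new LazyPackHandle(pack);
        handle.readByte(0); // a read, like the one issued while loosening objects
        System.out.println("open after read: " + handle.isOpen());
        handle.close(); // without this, a later delete can fail on Windows
        Files.delete(pack);
    }
}
```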
Comment 4 andy xian CLA 2021-06-14 01:21:31 EDT
Solution proposed:

Close the file pack-ea5214c3fa09a76b9ac1dba0d2d65c8dd903d1ce.pack before the prunePack method is called.

I have tested that this approach works. There is a close method on the pack object (see https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/GC.java#L347), but it seems to me that this close method is used for "clearing the cache".

I have added a new forceClose method to the Pack.java file; the patch file is attached.

I also believe this patch would improve garbage collection performance, because the delete method of FileUtils would no longer need to sleep and retry.
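
The performance argument can be illustrated with a generic sleep-and-retry delete. This is a sketch under assumptions, not the FileUtils implementation: `RetryingDelete`, its parameters, and the retry policy are hypothetical. It returns the number of attempts used, so a file whose handle was closed beforehand should need a single attempt and no sleep.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical retry helper for illustration; not the JGit FileUtils code.
class RetryingDelete {
    /** Returns the number of attempts used, or -1 if every attempt failed. */
    static int delete(Path file, int maxAttempts, long sleepMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                Files.delete(file);
                return attempt;
            } catch (IOException e) {
                // Each failed attempt costs one sleep before the next try.
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        Path pack = Files.createTempFile("pack-demo", ".pack");
        RandomAccessFile fd = new RandomAccessFile(pack.toFile(), "r");
        fd.close(); // close first, as the proposed fix does before pruning
        System.out.println("attempts used: " + delete(pack, 10, 100));
    }
}
```

If the handle were still open on Windows, the same call would be expected to sleep between every attempt and still fail, which is the cost the proposed change avoids.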
Comment 5 andy xian CLA 2021-06-14 01:22:30 EDT
Created attachment 286588
solution
Comment 6 andy xian CLA 2021-06-14 20:08:47 EDT
Pull request is https://github.com/eclipse/jgit/pull/116/
Comment 7 Eclipse Genie CLA 2021-06-22 05:52:18 EDT
New Gerrit change created: https://git.eclipse.org/r/c/jgit/jgit/+/182339
Comment 9 Eclipse Genie CLA 2021-06-24 18:41:01 EDT
New Gerrit change created: https://git.eclipse.org/r/c/jgit/jgit/+/182426