Bug 571585 - Can't apply patch file with LF against a file with CR-LF line endings
Summary: Can't apply patch file with LF against a file with CR-LF line endings
Status: RESOLVED FIXED
Alias: None
Product: JGit
Classification: Technology
Component: JGit (show other bugs)
Version: 5.10   Edit
Hardware: All All
: P3 major (vote)
Target Milestone: 5.12   Edit
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-03-01 05:44 EST by Ryan Milner CLA
Modified: 2021-05-26 08:21 EDT (History)
2 users (show)

See Also:


Attachments
ZIP containing the mentioned patch file and screenshots (65.08 KB, application/x-zip-compressed)
2021-03-01 05:44 EST, Ryan Milner CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ryan Milner CLA 2021-03-01 05:44:41 EST
Created attachment 285691 [details]
ZIP containing the mentioned patch file and screenshots

The attached patch file doesn't apply to the file it should. In command-line git on my windows pc, it has no issues but using this lib on android it fails with the below exception.

org.eclipse.jgit.api.errors.PatchApplyException: Cannot apply: HunkHeader[6310,11->6310,11]
	at org.eclipse.jgit.api.ApplyCommand.applyTextPatch(ApplyCommand.java:320)
	at org.eclipse.jgit.api.ApplyCommand.apply(ApplyCommand.java:370)
	at org.eclipse.jgit.api.ApplyCommand.call(ApplyCommand.java:105)
	at dev.projectearth.patcher.steps.PatchApp.run(PatchApp.java:47)
	at dev.projectearth.patcher.utils.LogStep$LoggedAsyncRunnable.doInBackground(LogStep.java:100)
	at dev.projectearth.patcher.utils.LogStep$LoggedAsyncRunnable.doInBackground(LogStep.java:86)
	at android.os.AsyncTask$3.call(Unknown Source:20)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at android.os.AsyncTask$SerialExecutor$1.run(Unknown Source:2)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
	at java.lang.Thread.run(Thread.java:919)


I'm unable to provide the file this is patched onto publicly, but via email I'm happy to supply it.
I have verified the file matches the hunk and have attached a screenshot also with this.


Also side note, we are using a slightly modified fork that supports binary patches fully. See https://github.com/eclipse/jgit/pull/113 for more info. But this shouldn't affect how text-based patches are applied. (this is compiled using jitpack as there is no other way of me using it currently)
Comment 2 Thomas Wolf CLA 2021-03-01 06:23:11 EST
What are the line endings in the files being patched? The patch has LF. I think if the file has CR-LF, there might be problems.

Compare https://github.com/eclipse/jgit/pull/96 .
Comment 3 Ryan Milner CLA 2021-03-01 06:30:05 EST
(In reply to Thomas Wolf from comment #2)
> What are the line endings in the files being patched? The patch has LF. I
> think if the file has CR-LF, there might be problems.
> 
> Compare https://github.com/eclipse/jgit/pull/96 .

After checking yes that's the issue, thanks for the pull request link. I will likely merge this into my fork to fix this issue.
Comment 4 Ryan Milner CLA 2021-03-01 06:34:12 EST
Looking at what you linked in the comment, this has been added? Is there something I'm missing then as I'm running on top of master and using this to apply the patches to the repo?

git.apply().setPatch(new FileInputStream(file)).call();
Comment 5 Thomas Wolf CLA 2021-03-01 07:04:28 EST
(In reply to Ryan Milner from comment #3)
> (In reply to Thomas Wolf from comment #2)
> > What are the line endings in the files being patched? The patch has LF. I
> > think if the file has CR-LF, there might be problems.
> > 
> > Compare https://github.com/eclipse/jgit/pull/96 .
> 
> After checking yes that's the issue, thanks for the pull request link. I
> will likely merge this into my fork to fix this issue.

I don't think PR 96 is the right approach. Compare these C git commits [1] and [2].

There is another difference: JGit applies the patch directly against the file, which I believe is wrong. It should apply the patch against the internal git representation, i.e., after having run the file through all clean and CRLF filters. After having applied the patch, it should pretend again to check out the file (applying smudge and CRLF filters).

That would be the correct fix for your case. Of course applying clean and smudge filters may give, let's say "interesting", effects if the file is managed by git-lfs. If we have several GB of data in the file, the current JGit implementation will likely run into an OOM.

The case handled in the two C git commits is where the patch itself contains CR-LF in the old or context lines, in which case normal CRLF filtering must be overridden to ensure that one gets an internal git representation that still has the CRLFs, no matter what the current CRLF settings say.

[1] https://github.com/git/git/commit/c24f3abaceabb590125751a67ec0e32946780ac7
[2] https://github.com/git/git/commit/2fea9de61857986431982ae89c01c89a2fc10038
Comment 6 Thomas Wolf CLA 2021-03-01 07:05:37 EST
(In reply to Ryan Milner from comment #4)
> Looking at what you linked in the comment, this has been added?

No, only the EolStreamType stuff has been fixed. That's an unrelated commit that was pushed onto PR 96.
Comment 7 Ryan Milner CLA 2021-03-01 07:28:11 EST
(In reply to Thomas Wolf from comment #5)
> (In reply to Ryan Milner from comment #3)
> > (In reply to Thomas Wolf from comment #2)
> > > What are the line endings in the files being patched? The patch has LF. I
> > > think if the file has CR-LF, there might be problems.
> > > 
> > > Compare https://github.com/eclipse/jgit/pull/96 .
> > 
> > After checking yes that's the issue, thanks for the pull request link. I
> > will likely merge this into my fork to fix this issue.
> 
> I don't think PR 96 is the right approach. Compare these C git commits [1]
> and [2].
> 
> There is another difference: JGit applies the patch directly against the
> file, which I believe is wrong. It should apply the patch against the
> internal git representation, i.e., after having run the file through all
> clean and CRLF filters. After having applied the patch, it should pretend
> again to check out the file (applying smudge and CRLF filters).
> 
> That would be the correct fix for your case. Of course applying clean and
> smudge filters may give, let's say "interesting", effects if the file is
> managed by git-lfs. If we have several GB of data in the file, the current
> JGit implementation will likely run into an OOM.
> 
> The case handled in the two C git commits is where the patch itself contains
> CR-LF in the old or context lines, in which case normal CRLF filtering must
> be overridden to ensure that one gets an internal git representation that
> still has the CRLFs, no matter what the current CRLF settings say.
> 
> [1]
> https://github.com/git/git/commit/c24f3abaceabb590125751a67ec0e32946780ac7
> [2]
> https://github.com/git/git/commit/2fea9de61857986431982ae89c01c89a2fc10038

So I don't checkout this folder from a remote, its created from a directory on the file system. How could I apply these filters? I see no way in their code to apply it to the repo.
https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit.lfs/src/org/eclipse/jgit/lfs/CleanFilter.java
https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit.lfs/src/org/eclipse/jgit/lfs/SmudgeFilter.java

I may be missing something but I'm not too experienced with having to deal with line endings.
Comment 8 Thomas Wolf CLA 2021-03-01 07:46:25 EST
I'd ignore the LFS relation for now. There can be other clean/smudge filters, also user-defined ones. The patch/diff is computed on the internal git representation, so logically it must also be applied on that. Basically one has to feed the file through the correct filter chain on reading and writing the file. It's done in several places in JGit already (check-in/check-out) and in EGit (loading a git blob for the compare editor), but not here.
Comment 9 Ryan Milner CLA 2021-03-01 08:30:28 EST
I managed to fix this by getting all the files that would be altered by the patch then replacing the line endings with just LF.
Comment 10 Thomas Wolf CLA 2021-03-02 17:41:44 EST
(In reply to Ryan Milner from comment #9)
> I managed to fix this by getting all the files that would be altered by the
> patch then replacing the line endings with just LF.

Yes, manually changing files to have LF line endings works, too.

A fix along the lines mentioned in comment 5 is nearly ready.
Comment 11 Thomas Wolf CLA 2021-03-02 18:00:40 EST
(In reply to Thomas Wolf from comment #10)
> A fix along the lines mentioned in comment 5 is nearly ready.

@Matthias: my change has files with CR-LF in test data for ApplyCommandTest. It looks like the JGit repo on Gerrit is configured to reject pushes when a file contains a CR-LF. Can we lift that restriction for test resources?
Comment 12 Matthias Sohn CLA 2021-03-03 04:15:11 EST
(In reply to Thomas Wolf from comment #11)
> (In reply to Thomas Wolf from comment #10)
> > A fix along the lines mentioned in comment 5 is nearly ready.
> 
> @Matthias: my change has files with CR-LF in test data for ApplyCommandTest.
> It looks like the JGit repo on Gerrit is configured to reject pushes when a
> file contains a CR-LF. Can we lift that restriction for test resources?

the uploadvalidator plugin does not support configuring this based on the path [1].
The documentation says that this check is skipped for binary files so we could try to use a filename we configure to be binary here [2] or configure a group whose members can skip restrictions or we temporarily switch the configuration for this change and then revert it back.

[1] https://git.eclipse.org/r/plugins/uploadvalidator/Documentation/config.md
[2] https://git.eclipse.org/r/admin/repos/jgit/jgit
Comment 13 Thomas Wolf CLA 2021-03-03 04:35:07 EST
(In reply to Matthias Sohn from comment #12)
> The documentation says that this check is skipped for binary files so we
> could try to use a filename we configure to be binary here [2] or configure
> 
> [2] https://git.eclipse.org/r/admin/repos/jgit/jgit

Hm. That plug-in doesn't use .gitattributes (as they are in the pushed commit) to figure out what is binary?
Comment 14 Thomas Wolf CLA 2021-03-11 05:24:15 EST
@Matthias: I'd really like to push a series of changes for ApplyCommand now. I cannot do so because of the CRLF check. And while I can change the setting at [1], I cannot save that settings change. (Which in turn might be related to the problem with Gerrit always rejecting my attempts to push Jenkinsfiles to tools/jenkins in EGit; compare bug 548214.)

[1] https://git.eclipse.org/r/admin/repos/jgit/jgit
Comment 15 Matthias Sohn CLA 2021-03-11 09:48:35 EST
(In reply to Thomas Wolf from comment #14)
> @Matthias: I'd really like to push a series of changes for ApplyCommand now.
> I cannot do so because of the CRLF check. And while I can change the setting
> at [1], I cannot save that settings change. (Which in turn might be related
> to the problem with Gerrit always rejecting my attempts to push Jenkinsfiles
> to tools/jenkins in EGit; compare bug 548214.)
> 
> [1] https://git.eclipse.org/r/admin/repos/jgit/jgit

Looks like you don't have sufficient permissions. I switched off the CRLF check.
Probably it worked for me since I am in the administrators group of the server.
Comment 16 Eclipse Genie CLA 2021-03-11 10:09:50 EST
New Gerrit change created: https://git.eclipse.org/r/c/jgit/jgit/+/177591
Comment 17 Thomas Wolf CLA 2021-03-11 10:17:08 EST
(In reply to Matthias Sohn from comment #15)
> I switched off the CRLF check.

Thanks. I've pushed my changes. Probably may need to keep it disabled until those are submitted (in case updates in response to build failures or reviews are needed). Or disable now again and re-enable whenever updates would be needed.
Comment 18 Thomas Wolf CLA 2021-03-11 15:02:02 EST
(In reply to Thomas Wolf from comment #17)
> Or disable now again and re-enable whenever updates would be needed.

Of course I meant the inverse: enable now again and re-disable whenever updates would be needed.