Bug 576971 - JGit text/binary detection differs from C git
Summary: JGit text/binary detection differs from C git
Status: NEW
Alias: None
Product: JGit
Classification: Technology
Component: JGit (show other bugs)
Version: 6.0   Edit
Hardware: All All
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-10-30 09:33 EDT by Thomas Wolf CLA
Modified: 2021-11-01 14:56 EDT (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas Wolf CLA 2021-10-30 09:33:10 EDT
From bug 575393 comment 24:
(In reply to Jörg Kubitz from comment #21)
> Created attachment 287397 [details]
> NewWizardMessages - after git reset --hard.properties

So this is a simple problem. The file is generally LF terminated, but contains 6 CR-LFs, the first at offset 0x261F.

I was waiting for this. (Really I was. I *knew* it would happen some day.)

JGit, for whatever reasons, checks only the first 8kB of a file for CRs. Thats 0x2000. The first 8kB of this file are all properly LF terminated. JGit never sees these CRs.

git, OTOH, appears to check the whole file. (And also its logic has evolved a little bit, whereas JGit appears to use simpler older logic. For instance, git considers files containing *lone* CRs as binary, in addition to files containing NUL (0x00) bytes. JGit only considers NUL.)

The quick fix here is to increase that size, or also scan the whole blob. The next fix would then be to ensure JGit uses the same logic as C git.
Comment 1 Eclipse Genie CLA 2021-10-30 16:25:35 EDT
New Gerrit change created: https://git.eclipse.org/r/c/jgit/jgit/+/187189
Comment 3 Eclipse Genie CLA 2021-10-31 08:09:14 EDT
New Gerrit change created: https://git.eclipse.org/r/c/jgit/jgit/+/187198
Comment 5 Jörg Kubitz CLA 2021-10-31 10:44:29 EDT
So the next error to happen has a filessize of> 32KiB? Feels like "autocrlf" implementation is unsafe.
Comment 6 Thomas Wolf CLA 2021-10-31 10:57:52 EDT
(In reply to Jörg Kubitz from comment #5)
> So the next error to happen has a filessize of> 32KiB? Feels like "autocrlf"
> implementation is unsafe.

Maybe. This is giving users a pragmatic way to deal with these rare cases. And you're free to increase the level to 128KiB, making the occurrence of this even rarer.

A fuller solution will have to wait. In some cases JGit may have the full blob in memory anyway (depending on another setting), but when it doesn't and streams the blob, we'll still hit this problem. I don't have time to deal with this; it might result in fairly intrusive changes and would need extensive testing; also to make sure it doesn't break Gerrit. A single-user application like Eclipse is one thing; a git server with many users and potentially many threads running diffs or merges is another.