Bug 576250 - ArrayIndexOutOfBoundsException when creating more than 10k refs in RefTable
Summary: ArrayIndexOutOfBoundsException when creating more than 10k refs in RefTable
Status: RESOLVED FIXED
Alias: None
Product: JGit
Classification: Technology
Component: JGit (show other bugs)
Version: 5.9   Edit
Hardware: PC Mac OS X
: P3 critical (vote)
Target Milestone: 5.6.2   Edit
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-09-24 09:28 EDT by Matthias Sohn CLA
Modified: 2021-10-07 20:19 EDT (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Matthias Sohn CLA 2021-09-24 09:28:41 EDT
While running some tests using the new benchmark [1] I reproducibly hit this exception [2] when creating >10k refs in RefTable.

[1] https://git.eclipse.org/r/c/jgit/jgit/+/185683

[2] 
# JMH version: 1.21
# VM version: JDK 17, OpenJDK 64-Bit Server VM, 17+35-LTS-sapmachine
# VM invoker: /Users/d029788/.sdkman/candidates/java/17-sapmchn/bin/java
# VM options: -ea
# Warmup: 2 iterations, 10 ms each
# Measurement: 2 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.eclipse.jgit.benchmarks.GetRefsBenchmark.testGetExactRef
# Parameters: (doGc = true, numBranches = 50000, trustFolderStat = true)

# Run progress: 0,00% complete, ETA 00:00:40
# Fork: 1 of 1
# Warmup Iteration   1: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Preparing test
- repository: 		/var/folders/16/nqvdxf5s45309nxkv4k4r20r0000gq/T/dir601914784298366167/testrepo/.git
- refDatabase: 		reftable
- trustFolderStat: 	true
- doGc: 		true
- branches: 		50000

10% of branches created
20% of branches created<failure>

java.lang.ArrayIndexOutOfBoundsException: Index -16258 out of bounds for length 4096
	at org.eclipse.jgit.util.NB.decodeUInt24(NB.java:97)
	at org.eclipse.jgit.internal.storage.reftable.BlockReader.seekKey(BlockReader.java:406)
	at org.eclipse.jgit.internal.storage.reftable.ReftableReader.seek(ReftableReader.java:216)
	at org.eclipse.jgit.internal.storage.reftable.ReftableReader.seekLog(ReftableReader.java:204)
	at org.eclipse.jgit.internal.storage.reftable.MergedReftable.seekLog(MergedReftable.java:159)
	at org.eclipse.jgit.internal.storage.reftable.Reftable.seekLog(Reftable.java:165)
	at org.eclipse.jgit.internal.storage.reftable.ReftableReflogReader.getReverseEntries(ReftableReflogReader.java:80)
	at org.eclipse.jgit.internal.storage.reftable.ReftableReflogReader.getReverseEntries(ReftableReflogReader.java:52)
	at org.eclipse.jgit.internal.storage.file.GC.listRefLogObjects(GC.java:1035)
	at org.eclipse.jgit.internal.storage.file.GC.repack(GC.java:809)
	at org.eclipse.jgit.internal.storage.file.GC.doGc(GC.java:269)
	at org.eclipse.jgit.internal.storage.file.GC.gc(GC.java:220)
	at org.eclipse.jgit.api.GarbageCollectCommand.call(GarbageCollectCommand.java:179)
	at org.eclipse.jgit.benchmarks.GetRefsBenchmark$BenchmarkState.setupBenchmark(GetRefsBenchmark.java:109)
	at org.eclipse.jgit.benchmarks.generated.GetRefsBenchmark_testGetExactRef_jmhTest._jmh_tryInit_f_benchmarkstate1_G(GetRefsBenchmark_testGetExactRef_jmhTest.java:457)
	at org.eclipse.jgit.benchmarks.generated.GetRefsBenchmark_testGetExactRef_jmhTest.testGetExactRef_AverageTime(GetRefsBenchmark_testGetExactRef_jmhTest.java:166)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453)
	at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Comment 1 Han-Wen NIenhuys CLA 2021-09-27 06:32:59 EDT
What is the recommended way to run these benchmarks? I just get complaints about slf4j.

this is with commit 677910252bcd5b59d4b2a71106deaa0ef9ae7ee3 and running bazelisk

bazelisk build  org.eclipse.jgit.benchmarks/...

bazel-bin/org.eclipse.jgit.benchmarks/benchmarks 


(I removed all other benchmarks, but the other benchmarks fail as well.)
# Run progress: 90.00% complete, ETA 00:00:00
# Fork: 5 of 5
# Warmup Iteration   1: <failure>

java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
	at org.eclipse.jgit.util.SystemReader.<clinit>(SystemReader.java:55)
	at org.eclipse.jgit.lib.BaseRepositoryBuilder.readEnvironment(BaseRepositoryBuilder.java:363)
	at org.eclipse.jgit.api.InitCommand.call(InitCommand.java:58)
	at org.eclipse.jgit.benchmarks.GetRefsBenchmark$BenchmarkState.setupBenchmark(GetRefsBenchmark.java:85)
	at org.eclipse.jgit.benchmarks.generated.GetRefsBenchmark_testGetRefsByPrefix_jmhTest._jmh_tryInit_f_benchmarkstate1_G(GetRefsBenchmark_testGetRefsByPrefix_jmhTest.java:457)
	at org.eclipse.jgit.benchmarks.generated.GetRefsBenchmark_testGetRefsByPrefix_jmhTest.testGetRefsByPrefix_AverageTime(GetRefsBenchmark_testGetRefsByPrefix_jmhTest.java:166)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453)
	at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	... 18 more
Comment 2 Matthias Sohn CLA 2021-09-27 09:39:19 EDT
yes, running benchmarks from bazel seems broken:

bazelisk run //org.eclipse.jgit.benchmarks:benchmarks GetRefsBenchmark
INFO: Invocation ID: e7b8ae84-89fe-444a-ab0f-dfbac783f2d2
DEBUG: Rule 'com_googlesource_gerrit_bazlets' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1581093008 -0500"
DEBUG: Repository com_googlesource_gerrit_bazlets instantiated at:
  no stack (--record_rule_instantiation_callstack not enabled)
Repository rule git_repository defined at:
  /Users/d029788/.cache/bazel/b645215973ac043930fa5900e337e822/external/bazel_tools/tools/build_defs/repo/git.bzl:195:33: in <toplevel>
INFO: Analyzed target //org.eclipse.jgit.benchmarks:benchmarks (8 packages loaded, 67 targets configured).
INFO: Found 1 target...
Target //org.eclipse.jgit.benchmarks:benchmarks up-to-date:
  bazel-bin/org.eclipse.jgit.benchmarks/benchmarks.jar
  bazel-bin/org.eclipse.jgit.benchmarks/benchmarks
INFO: Elapsed time: 2.931s, Critical Path: 1.42s
INFO: 5 processes: 4 darwin-sandbox, 1 worker.
INFO: Build completed successfully, 9 total actions
INFO: Build completed successfully, 9 total actions
# JMH version: 1.21
# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08
# VM invoker: /Users/d029788/.sdkman/candidates/java/8.0.302-tem/jre/bin/java
# VM options: <none>
# Warmup: 3 iterations, 100 ms each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.eclipse.jgit.benchmarks.GetRefsBenchmark.testGetExactRef
# Parameters: (doGc = false, numBranches = 10000, trustFolderStat = true, useRefTable = false)

# Run progress: 0.00% complete, ETA 00:08:23
# Fork: 1 of 5
# Warmup Iteration   1: <failure>

java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
	at org.eclipse.jgit.util.SystemReader.<clinit>(SystemReader.java:55)
	at org.eclipse.jgit.lib.BaseRepositoryBuilder.readEnvironment(BaseRepositoryBuilder.java:363)
	at org.eclipse.jgit.api.InitCommand.call(InitCommand.java:58)
	at org.eclipse.jgit.benchmarks.GetRefsBenchmark$BenchmarkState.setupBenchmark(GetRefsBenchmark.java:86)
	at org.eclipse.jgit.benchmarks.generated.GetRefsBenchmark_testGetExactRef_jmhTest._jmh_tryInit_f_benchmarkstate1_G(GetRefsBenchmark_testGetExactRef_jmhTest.java:460)
	at org.eclipse.jgit.benchmarks.generated.GetRefsBenchmark_testGetExactRef_jmhTest.testGetExactRef_AverageTime(GetRefsBenchmark_testGetExactRef_jmhTest.java:166)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453)
	at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 18 more

Using Maven it works:
$ mvn -Dmaven.javadoc.skip=true -DskipTests package
$ java -jar org.eclipse.jgit.benchmarks/target/benchmarks.jar GetRefsBenchmark
Comment 3 Luca Milanesio CLA 2021-09-27 09:44:21 EDT
I also did like Matthias mentioned:
$ java -jar org.eclipse.jgit.benchmarks/target/benchmarks.jar GetRefsBenchmark
Comment 4 Matthias Sohn CLA 2021-09-27 09:56:14 EDT
fix for running benchmarks from bazel https://git.eclipse.org/r/c/jgit/jgit/+/185855
Comment 5 Matthias Sohn CLA 2021-09-27 09:57:28 EDT
run only GetRefsBenchmark using bazel:

$ bazelisk run //org.eclipse.jgit.benchmarks:benchmarks GetRefsBenchmark
Comment 6 Han-Wen NIenhuys CLA 2021-09-27 10:36:46 EDT
With this diff:

```
diff --git a/org.eclipse.jgit.benchmarks/BUILD b/org.eclipse.jgit.benchmarks/BUILD
index 66a2482f0..45daefec9 100644
--- a/org.eclipse.jgit.benchmarks/BUILD
+++ b/org.eclipse.jgit.benchmarks/BUILD
@@ -9,6 +9,7 @@ jmh_java_benchmarks(
     srcs = SRCS,
     deps = [
         "//lib:slf4j-api",
+        "//lib:javaewah",
         "//org.eclipse.jgit:jgit",
     ],
 )
diff --git a/org.eclipse.jgit.benchmarks/src/org/eclipse/jgit/benchmarks/GetRefsBenchmark.java b/org.eclipse.jgit.benchmarks/src/org/eclipse/jgit/benchmarks/GetRefsBenchmark.java
index eab01a150..df96ea23c 100644
--- a/org.eclipse.jgit.benchmarks/src/org/eclipse/jgit/benchmarks/GetRefsBenchmark.java
+++ b/org.eclipse.jgit.benchmarks/src/org/eclipse/jgit/benchmarks/GetRefsBenchmark.java
@@ -57,9 +57,9 @@ public class GetRefsBenchmark {
        @State(Scope.Benchmark)
        public static class BenchmarkState {
 
-               boolean useRefTable = false;
+               boolean useRefTable = true;
 
-               @Param({ "100", "2500", "10000", "50000" })
+               @Param({ "50000" })
                int numBranches;
 
                @Param({ "true", "false" })
hanwen@hanwen1:~/vc/jgit$ git rev-parse HEAD
47fc86ba3999e20803a222dab1151248902289bb
```

I get the following:

```
$ bazelisk run //org.eclipse.jgit.benchmarks:benchmarks GetRefsBenchmark

# JMH version: 1.21
# VM version: JDK 11.0.12, OpenJDK 64-Bit Server VM, 11.0.12+7-post-Debian-2
# VM invoker: /usr/lib/jvm/java-11-openjdk-amd64/bin/java
# VM options: <none>
# Warmup: 2 iterations, 100 ms each
# Measurement: 2 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.eclipse.jgit.benchmarks.GetRefsBenchmark.testGetExactRef
# Parameters: (doGc = true, numBranches = 50000, trustFolderStat = true)

# Run progress: 0.00% complete, ETA 00:13:28
# Fork: 1 of 5
# Warmup Iteration   1: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Preparing test
- repository: 		/tmp/dir14905737446309329634/testrepo/.git
- refDatabase: 		reftable
- trustFolderStat: 	true
- doGc: 		true
- branches: 		50000
100% of branches created
Pushing 50000 branches ... DONE
Running GC ... DONE
5.788 us/op
# Warmup Iteration   2: 4.545 us/op
Iteration   1: 2.461 us/op
Iteration   2: 2.264 us/op

# Run progress: 2.50% complete, ETA 02:17:00
# Fork: 2 of 5
# Warmup Iteration   1: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Preparing test
- repository: 		/tmp/dir15815808046509383323/testrepo/.git
- refDatabase: 		reftable
- trustFolderStat: 	true
- doGc: 		true
- branches: 		50000
100% of branches created
Pushing 50000 branches ... DONE
Running GC ... DONE
9.032 us/op
# Warmup Iteration   2: 6.676 us/op
Iteration   1: 2.598 us/op
Iteration   2: 2.671 us/op
```

note that the test setup is e x t r e m e l y   s l o w, compared to the operation being benchmarked.
Comment 7 Han-Wen NIenhuys CLA 2021-09-27 10:37:19 EDT
Is this a problem with JDK 11 vs JDK 17 ?
Comment 8 Han-Wen NIenhuys CLA 2021-09-27 10:47:32 EDT
could you provide me with a .tar.gz of the testrepo just before you start the GC operation?
Comment 9 Matthias Sohn CLA 2021-09-27 12:26:19 EDT
(In reply to Han-Wen NIenhuys from comment #6)
...
> note that the test setup is e x t r e m e l y   s l o w, compared to the
> operation being benchmarked.

I think I hit this using patchset 9 of the benchmark which was 
e v e n  m u c h  s l o w er (didn't sit in front of the computer while it was running)
Comment 10 Han-Wen NIenhuys CLA 2021-09-27 14:10:28 EDT
thanks, I can see the problem with PS 9.
Comment 11 Han-Wen NIenhuys CLA 2021-09-28 04:31:16 EDT
it's trying to find the reflog for "HEAD".  This somehow reads an incorrect offset, which then reads garbage, leading to negative indices.

I cross-checked with my own reftable code in Go, which can do the seek correctly, so it's likely a problem on the read side.
Comment 12 Han-Wen NIenhuys CLA 2021-09-28 11:28:49 EDT
reftable.BlockReader has a `truncated` boolean field, described as 


			// Its OK during sequential scan for an index block to have been
			// partially read and be truncated in-memory. This happens when
			// the index block is larger than the file's blockSize. Caller
			// will break out of its scan loop once it sees the blockType.


this is bogus as: 

- index blocks are never sequentially scanned
- no caller is checking the `truncated` field

reflog blocks can be variably sized. In the error scenario, the block is 8188 bytes long (default blocksize is 4096). 

A few lines below, the code gets the restart counts from the end of the buffer. However, if the buffer is truncated (like here), that location does not contain the restart count. 

The Go/C code I wrote correctly issues a second read if the actual blocksize is more than the initial guess.

This has not surfaced at Google, as we currently don't store the reflogs in reftable.

since the `truncated` bit seems unused, maybe it's best to drop this and issue a second read if necessary.
Comment 13 Eclipse Genie CLA 2021-09-28 14:14:56 EDT
New Gerrit change created: https://git.eclipse.org/r/c/jgit/jgit/+/185902