Bug 481796 - [newindex] Proposal for a faster JDT index
Status: RESOLVED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core
Version: 4.5
Hardware: All
OS: All
Importance: P1 enhancement with 9 votes
Target Milestone: 4.7 M3
Assignee: Stefan Xenos CLA
QA Contact:
URL:
Whiteboard:
Keywords: noteworthy
Duplicates: 442924
Depends on: 490010 491461 491558 492488 492504 492506 492941 495557 495817 496044 496129 496142 497044 497119 497168 497355 497518 497996 498658 498730 499122 499256 499391 499472 499635 499708 500006 500095 500234 500362 500365 500462 500467 500545 500714 500785 501034 502192 502259 502884
Blocks: 84916 351410 496136 513334 495062 497513 503619
 
Reported: 2015-11-09 22:16 EST by Stefan Xenos CLA
Modified: 2018-02-06 09:19 EST
CC List: 29 users

See Also:


Attachments
Minutes of discussion (4.52 KB, text/plain)
2015-12-11 09:30 EST, Manoj N Palat CLA
Test Code to reproduce the type hierarchy issue (264 bytes, text/plain)
2016-01-13 01:21 EST, Manoj N Palat CLA
Stack dump from the frozen test case (12.30 KB, text/plain)
2016-08-26 21:51 EDT, Stefan Xenos CLA
Failing jdt.ui tests (40.49 KB, text/xml)
2016-09-26 05:44 EDT, Jay Arthanareeswaran CLA

Description Stefan Xenos CLA 2015-11-09 22:16:32 EST
JDT is currently slow for many operations and its indexes often get corrupted. We want to make it fast and reliable.

Our plan is to introduce a global index that holds semantic information rather than filename mappings, supports reliable error detection, and is extensible enough to replace all current indexes and caches. It will support most common JDT operations without the need to ever open a .jar file or perform an exhaustive search.

We have prepared a design document discussing the proposal in more detail and a functioning demo for consideration and inspection.
Comment 1 Stefan Xenos CLA 2015-11-09 22:18:44 EST
The design doc can be found here:

https://drive.google.com/open?id=1w3-ufZyISbqH8jxYv689Exjm0haAGufdcSvEAgl2HQ4

It should be publicly visible and should allow public comments, so please provide feedback there if you have any.
Comment 2 Eclipse Genie CLA 2015-11-09 22:19:56 EST
New Gerrit change created: https://git.eclipse.org/r/60004

WARNING: this patchset contains 15586 new lines of code and requires a Contribution Questionnaire (CQ), as author sxenos@gmail.com is not a committer on jdt/eclipse.jdt.core. Please see: https://wiki.eclipse.org/Project_Management_Infrastructure/Creating_A_Contribution_Questionnaire
Comment 3 Stefan Xenos CLA 2015-11-09 23:42:19 EST
Some info on the currently-attached patch:

The database itself is nearly complete. The coding patterns you see in the various PDOM* objects are representative of how we'd expect the final version to look and the schema contains enough to fill in most of the fields in an IBinding.

The indexer is preliminary. Currently it only indexes the minimum information necessary to feed the first pass of the type hierarchy view when it's in subtypes mode. It works incrementally, but it is not yet being triggered at all the appropriate times. You can force it to run by saving a file.

We've only adopted it in one place - in the first pass of the type hierarchy view, in the code which searches for all possible subclasses. This demonstrates how the index can be used from the reader's point of view and is sufficient for running some performance benchmarks.

In our tests we created a tiny demo project containing one .java file, then hit F4 on java.lang.Object. The time spent doing index searches dropped from around 10 seconds to almost nothing (less time than we could measure with our rough sampling).

The total runtime dropped from 30 seconds to 19 seconds, so with this one partial adoption we sped up the type hierarchy view by about 30%. Bear in mind that most of the rest of the data used by the hierarchy view could also be provided by the index, so we expect savings of over 90% once the adoption is complete.
Comment 4 Sebastian Zarnekow CLA 2015-11-10 03:05:15 EST
I'm convinced this goes in the right direction. Thanks for the effort, Stefan and Sergey.

Maybe it's stating the obvious, but I'm not seeing anything about concurrent reads, index building in parallel, or the constraints that apply to readers while the index is still being built. What are your plans with regard to concurrent access?
Comment 5 Sebastian Zarnekow CLA 2015-11-10 03:06:31 EST
Another thing: currently JDT allows shipping pre-computed indexes as part of a jar. Will that still be possible, e.g. would there be a way to merge the pre-computed data into the new, more scalable index tree without having to scan the jar contents from scratch?
Comment 6 Stefan Xenos CLA 2015-11-10 10:57:07 EST
Re: Comment 4

You're absolutely right. This is important enough to deserve its own section in the design document. I'll update the doc. 

FYI, we've put a lot of thought into concurrency and the implementation already supports the intended concurrency model. Its omission from the design doc was accidental.


Re: Comment 5

I have no plans to support this unless the JDT community feels it's important. The old precomputed indices didn't contain enough information to feed the new index, and scanning the jar to pull in the extra data would take just as long as rebuilding it from scratch, so they would be useless.

As for making a new precomputed index format that could be used for the same purpose, this is difficult because of the amount of cross-referencing in the index - but I think it's not impossible. We could merge the PDOM*ID types across indexes and put an abstraction on top of them for readers, so that whenever you iterate over the backpointers for one index you'd actually be iterating over the backpointers for all of them. You'd also multiply your search time by the number of indices, since you couldn't put everything in one giant b-tree like we do now. CDT still does something similar, and we intentionally dropped that approach here to speed up searches and simplify the code.

Another problem is how the index deals with versioning and errors. Whenever the index version changes (and this is expected to happen frequently), the normal upgrade process is to delete the entire index and rebuild it using the indexer. This is also what happens when database corruption is detected. If we supported pre-built indexes, we'd have to maintain backwards compatibility and be unable to upgrade and fix them, or we'd end up rebuilding them from the .jars anyway after the first database version change.

I think this would be a huge amount of work and it would add a good chunk of complexity. It would slow down searches in order to speed up the indexer. I'm not ruling it out, but the costs are pretty hefty if we go this route.

Could you help me understand the use-case better? Would this actually be desirable?
Comment 7 Dani Megert CLA 2015-11-10 11:04:19 EST
(In reply to Stefan Xenos from comment #6)
> Re: Comment 5
> 
> I have no plans to support this unless the JDT community feels it's
> important. 

This was an important request from clients. It's fine if they can still do this the old way, but if that's invalidated by your approach then this is definitely a no-go.
Comment 8 Sebastian Zarnekow CLA 2015-11-10 11:13:35 EST
(In reply to Stefan Xenos from comment #6)
> Could you help me understand the use-case better? Would this actually be
> desirable?

In fact I don't have a use case for that; I just remembered this being one of the notable new JDT features from one or two years ago.
Indexing from scratch makes much more sense to me, too. Be it versioning / evolution of the index structure or simply bugs in an older version, having to stay backwards compatible for the sake of a proprietary jar format does not seem useful.

A few more questions that pop up:

Are there plans to support user defined fields in the index, e.g. for information specific to other JVM languages?

What will be the key that is used in the index, e.g. what kind of query will be the most efficient?

Will it support querying for fields, methods, annotations, references to any of these?
Comment 9 Stefan Xenos CLA 2015-11-10 12:10:56 EST
Re: Comment 7

> This was an important request from clients. It's fine if they can till
> do this the old way but if that's invalidated by your approach then this
> is definitely a no go.

Could you help me understand what they hoped to achieve with those precomputed indexes? What is the use-case? A distributed build system that builds these indexes as part of its output? People packaging these indexes as part of their library distribution?

It seems like all they do is speed up the initial indexing pass. I can understand why this was beneficial for the old index (it sped up indexing and had essentially no downside), but with the new index we'd need to make some serious tradeoffs to continue supporting it.

Also: please clarify what, specifically, you are asking for. Do you want continued support for the existing precomputed index format? Or a new precomputed index format that provides the benefits of the new index without the cost of initial indexing?
Comment 10 Stefan Xenos CLA 2015-11-10 12:28:07 EST
> Are there plans to support user defined fields in the index, e.g.
> for information specific to other JVM languages?

Yes and no. It's very easy to add new fields, new types, and new search trees to the index and I expect JDT core to do this often. However, I don't have plans to add an extension point to allow third-party extensions. Such newly-added fields would require JDT core code changes. At the moment, the index itself is just used by JDT internals and is not (yet) exposed as API.

If that's not sufficient for your use-case, help me understand the use-case better and I'll try to cover it.


> What will be the key that is used in the index, e.g. what kind of query will
> be the most efficient?

Each B-tree in the index provides a different kind of query. It is very easy to add new B-trees to the index to support new kinds of searches. The current implementation has three B-trees.

- Search by exact binary name
  - Mainly used by the indexer
  - Supports anything with a JVM binary name (types, methods, variables, etc).
- Search by simple name
  - Supports prefix searches, exact searches
  - Case-sensitive or insensitive searches
  - No wildcard or regex support yet
  - Supports anything with a binding (types, methods, variables, annotations).
- Search by fully-qualified path name (.jar or .class).
  - Mainly used by the indexer
  - Filenames only

Although these are the only text searches available, there are a lot of cross-references too. For example, you don't need to search to find the list of subclasses for a class -- those can be found in the SUBCLASSES list in TypeId. It is very easy to add new search trees to support additional query types.
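
To make the prefix-search idea concrete, here is a tiny standalone sketch (plain java.util.TreeMap standing in for the on-disk B-tree; this is not the index's actual API):

import java.util.NavigableMap;
import java.util.TreeMap;

public class PrefixSearchDemo {
	public static void main(String[] args) {
		// The simple-name tree is sorted, so a prefix query is just a range
		// scan from the prefix (inclusive) to the next possible string (exclusive).
		NavigableMap<String, String> bySimpleName = new TreeMap<>();
		bySimpleName.put("ArrayDeque", "java/util/ArrayDeque");
		bySimpleName.put("ArrayList", "java/util/ArrayList");
		bySimpleName.put("HashMap", "java/util/HashMap");

		String prefix = "Array";
		// Upper bound: the prefix with its last character incremented ("Arraz")
		String upperBound = prefix.substring(0, prefix.length() - 1)
				+ (char) (prefix.charAt(prefix.length() - 1) + 1);
		System.out.println(bySimpleName.subMap(prefix, true, upperBound, false));
		// prints {ArrayDeque=java/util/ArrayDeque, ArrayList=java/util/ArrayList}
	}
}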


> Will it support querying for fields, methods, annotations, references to
> any of these?

All of the above, but the "references to" use-case is handled by cross-references, not searches. That is, there is already a cached list of references to every method and field stored directly in the index. No need to search for them. :-)
Comment 11 Stefan Xenos CLA 2015-11-10 12:38:46 EST
Re: comment 7

> This was an important request from clients. It's fine if they can till 
> do this the old way but if that's invalidated by your approach then
> this is definitely a no go.

Actually, I think I understand what you mean. I have not broken or disabled the old index search APIs. Third-parties can still use them if they wish, and they will still read the old cached indexes. Is this what you're asking?


What I've changed at the moment is that JDT views (like the hierarchy view) obtain their content from the new index rather than the old one, and so would ignore old-style precomputed indexes if present. We could include a preference to disable the new index and use the old one. That may be useful for testing, but I'm not sure why an end-user would select a preference to use the old index when the new one is well over 10x faster.
Comment 12 Stephan Herrmann CLA 2015-11-10 12:49:16 EST
I understand the old index design to be driven by the desire to make *indexing* as fast as possible. The resulting design uses only a parser; no further compilation phases are triggered (with one exception since Java 8: lambdas, which cannot meaningfully be indexed without resolving). The downside we know well: the index has no semantic information, because it was felt we didn't have the time to compute that information.

Do you have any numbers for comparing the old indexing with the new strategy?

I understand that *using* the index will be more efficient, but how much do we have to pay during index computation?

Anyway, it was the cost of creating indexes that motivated implementing pre-computation. If creation gets more expensive, pre-computing will become even more relevant, no?

What exactly is the difficulty in precomputing indexes in the new format? Are you saying that cross-references cannot be established in a "modular" way, i.e. that the index cannot be composed from several partial indexes?


BTW: does "PDOM" have a long form, what does it stand for?
Comment 13 Stefan Xenos CLA 2015-11-10 15:01:02 EST
> BTW: does "PDOM" have a long form, what does it stand for?

It came from the CDT index. I believe Doug Schafer came up with it. It stands for Persistent Document Object Model. When I started, I planned to retain as much terminology as I could from CDT, for consistency. At this point I really dislike having an all-caps word in my class names and would like to change it. Any suggestions?

> Do you have any numbers for comparing the old indexing with the new strategy?

Not yet. That is part of the reason for attaching my implementation here - so we can start benchmarking it. The indexer isn't done yet, but there's enough in place to test how long it takes to index large projects. Unfortunately, I just got everything working two days ago and haven't had time to take these sorts of measurements yet (very soon).

> I understand that *using* the index will be more efficient, but how much do
> we have to pay during index computation?

Good question. Measuring this is pretty high on my to-do list. In the meantime, feel free to take some measurements of your own and let me know. I'll post my findings here as soon as I know.


> What exactly is the difficulty in precomputing indexes in the new format?
> Are you saying that cross-references cannot be established in a
> "modular" way, the index cannot be composed from several partial indexes?

That's the easy part of the problem. I can think of a couple ways to make that work (I described one such way in comment 6). It would take a fair amount of effort, but it's doable.

The bigger problems are:

- Slow searches. Since searches need to run separately on each module, searches run in O(m * lg(n)), where m is the number of modules and n is the size of the largest module. Having one big module is really fast, but lots of small modules are slow (see the rough numbers after this list). We've identified this as a major source of slowness in the existing JDT index when operating on large numbers of .jars. Supporting a modular index would mean the new index would inherit this performance problem from the old one.

- Upgrades. Now that the index contains (a lot of) semantic information, we're going to need to update its format to handle each new language feature or new type of query needed by the UI, so the format will change frequently. There is no backwards-compatibility support -- upgrades are currently dealt with by deleting and re-indexing. If we needed to retain support for every old database version, it would be a lot of hard-to-test code and the migration and upgrade code could easily be as slow as the indexer (since it would be doing roughly the same thing from a different data source).

- Errors. The new index supports error detection and recovery. If the index fails an integrity check, it is deleted and rebuilt using the indexer. If we were required to keep around some prebuilt index, there would be no recourse when it gets corrupted.

- The extra effort involved. Making it modular will likely add months to this development effort.
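
To put rough numbers on the slow-searches point above (illustrative figures, not measurements): with 10,000 jars of about 100 classes each, a single combined B-tree needs roughly log2(1,000,000), i.e. about 20 comparisons per lookup, while per-jar trees need roughly 10,000 * log2(100), i.e. about 66,000 comparisons for the same query.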


> I understand the old index design to be driven by the desire to
> make *indexing* as fast as possible. 

I just realized I forgot to explain one of the coolest aspects of this new design: with the new index, the indexer doesn't block readers!

While the indexer runs, you can perform queries on the index and will get back the previously-indexed values for that .jar. Any value associated with the new state will be hidden from readers until indexing is done, at which point the index fires a change event that can be used to update the UI with the new state (the change events aren't done yet, but are planned).

Rather than holding a write lock the whole time indexing occurs, the indexer grabs and releases the write lock in between writing each class... so it never holds a write lock for more than a few milliseconds.

That means that the consequences of indexing aren't severe - indexing is just a background task that the user is barely aware of. If they hit F4 on a class during indexing, the view opens immediately. When indexing completes, the view would then update if the indexer encountered something new.


I know with the old index, the indexer could hold locks that could block the UI and I'm guessing (please correct me if I'm wrong) that this is the reason indexing time was considered important. Since that is not the case with the new index, would it be reasonable to say that indexing time would be less important now than it was previously?
Comment 14 Stefan Xenos CLA 2015-11-10 15:16:03 EST
Re: comment 12, comment 4

I've updated the design document with more information about concurrency and how the indexer manages to run in parallel with read operations.
Comment 15 Stefan Xenos CLA 2015-11-10 21:33:31 EST
> the indexer grabs and releases the write lock in between writing each class... 

Let me elaborate on that. The actual algorithm for the indexer looks like this:

indexJar(jarFile) {
  timestamp = jarFile.timestamp()
  whileWriteLocked {
    resFile = createResFile(jarFile);
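    // Timestamp 0 marks the new file as still being indexed; readers skip it
    // and keep seeing the previously-indexed copy until the final step below.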
    resFile.timestamp = 0;
  }

  for each class in resFile {
    classContents = parseClass(class);
    whileWriteLocked {
      writeClassToIndex(resFile, classContents);
    }
  }

  whileWriteLocked {
    resFile.timestamp = timestamp;
    delete all other copies of jarFile from the index
  }
}

Operations like writeClassToIndex write data that is already in memory to the index. The index uses a write-back cache, so the operation is almost always a memory-to-memory copy. The slow parts of indexing happen in operations like parseClass, but those are done without holding the write lock or the workspace lock.

This is why, even if the indexer takes a very long time to run, it can never block a reader for more than a few milliseconds. 

If we wanted, we could even use something similar to UI freeze monitoring to ensure that a write lock is never retained for more than - say - 50ms at a time. It should really never even get close to that.

I hope this makes sense. Given the above, is there still a reason why we should be concerned about the runtime of the indexer?
Comment 16 Stefan Xenos CLA 2015-11-11 13:28:52 EST
If anyone would like to discuss this proposal in realtime, I'll be lurking on IRC all day today and tomorrow, in the channel #eclipse-dev on irc.freenode.net. Use my username (sxenos) in your message so it flashes on my screen.
Comment 17 Jay Arthanareeswaran CLA 2015-11-11 23:16:44 EST
(In reply to Stefan Xenos from comment #16)
> If anyone would like to discuss this proposal in realtime, 

Manoj is the owner of JDT search and might have a thing or two to say. Unfortunately though, he is on vacation till next week.
Comment 18 Stefan Xenos CLA 2015-11-11 23:20:36 EST
> Manoj is the owner of JDT search and might have a thing or two to say.
> Unfortunately though, he is on vacation till next week.

In that case, his feedback is essential. I'll keep lurking until then.

Also, I wrote an email on the JDT mailing list about trying to organize a google hangout where we can discuss the new index. I'll try to arrange a time when Manoj can attend. If you'd like to attend too, please let me know your availability and hangouts/gmail username.
Comment 19 Stefan Xenos CLA 2015-11-13 00:32:14 EST
In tonight's code, I've added self-timing to the index. I used a workspace containing 93289 classes.

First I built a new index from scratch, which looked like this:

  Tested 10177 fingerprints in 2174ms, average time = 0.21361894467917855ms
  Indexed 93289 classes in 76710ms, average time = 0.8222834417777015ms

Then I retriggered indexing without changing anything, to test how long it took the incremental indexer to determine that nothing had changed:

  Tested 10177 fingerprints in 330ms, average time = 0.0324260587599489ms
  Indexed 0 classes in 0ms, average time = 0.0ms

So it currently takes 0.82ms to add each class to the index and 0.03ms per .jar or .class to test whether it has changed from its last-indexed state. I haven't compared these numbers with the old indexer yet, but they seem fairly reasonable to me.

I'll upload a new snapshot of the branch shortly -- it contains a few important bugfixes.
Comment 20 Sebastian Zarnekow CLA 2015-11-13 04:29:44 EST
At first glance, and without knowing the performance of the old indexer, this sounds promising. Did you index the classes from jar files, directories with class files, or directories with source files?
Comment 21 Stefan Xenos CLA 2015-11-13 10:16:17 EST
> Did you index the classes from jar files, directories with class
> files or directories with source files?

Last night's build only indexed jar files. IMO this is the most important use-case because it's where the old index performed the worst. The "directories with class files" use case was broken due to a bug (which I'm fixing now).

I disabled the "directories with source files" case intentionally because I haven't decided on the best approach for it.

The options (for source files) as I see them:

1. Index the source files directly using ASTParser.
2. Index only the .class files produced from the source files.
3. Use 2 if an up-to-date class file exists, fall back to 1 otherwise.
4. Don't index source files at all in the new index, but track them separately using something closer to the old index and merge the results at read-time.

Right now, I'm leaning toward #2. The code will be simpler and it will be very fast since the indexer won't have to duplicate any work that would be done anyway by the java compiler. The downside, of course, is that files won't show up in the index until an autobuild runs. We'll have to decide if that's an acceptable compromise for the performance gain.
Comment 22 Stefan Xenos CLA 2015-11-13 12:51:27 EST
Re: comment 20

I've fixed the bug that was interfering with .class indexing. Loose .class files are about 20% slower to index than classes in jars -- they take about a millisecond per class. Still not bad, though.
Comment 23 Stefan Xenos CLA 2015-11-13 13:00:21 EST
I also have data about index database size. For my test workspace containing 40905 classes, the database is 73.9MB. This will probably grow by several times once all the metadata is in there, but probably not by more than an order of magnitude.

It currently uses about 1.85 KB per class, and I'd expect the final number to be no more than 10x this value. There's a lot of room for space optimizations if we feel this is too large.
Comment 24 Igor Fedorenko CLA 2015-11-13 15:51:00 EST
How do you expect the new indexer performance to scale from 500K+ class files and up (this is the actual size of one of my workspaces)?
Comment 25 Stefan Xenos CLA 2015-11-13 16:25:27 EST
> How do you expect the new indexer performance to scale from 500K+ class files
> and up (this is the actual size of one of my workspaces)?

It should work extremely well. Supporting very large workspaces is the whole reason we're doing this. Our intended use-cases are workspaces that depend on millions of classes. Here is how I would expect it to work:

- The current indexer will take about 8 minutes (probably longer after we add more metadata) to process a half million classes the first time you run it on that workspace... but since it's nonblocking, Eclipse will be fully functional during this time.

- Subsequent updates should be fairly fast. The current implementation can scan a half million files for changes in about 15 seconds... and if that is too slow, I can cut it down much further using change deltas.

- After indexing is done, everything in Eclipse will work faster than you've ever seen it before. The call hierarchy view and type hierarchy views will open in well under a second, search-for-references (ctrl-shift-g) will open its results almost immediately, and you won't get random freezes when Eclipse opens hovers with quick fix proposals. I'm sure if you have a workspace with a half-million classes, you've seen all of these things go slowly.

An 8 minute wait for the indexer may seem like a lot, but on workspaces the size you're talking about we've often seen UI freezes much longer than this due to JDT searches. This index should replace the frequent UI freezes with a nonblocking background operation that only needs to happen once.
Comment 26 Jay Arthanareeswaran CLA 2015-11-16 03:57:34 EST
(In reply to Jay Arthanareeswaran from comment #17)
> Manoj is the owner of JDT search and might have a thing or two to say.
> Unfortunately though, he is on vacation till next week.

I meant to say that Manoj is on vacation till the end of next week. He will be back only on the 27th of November. Sorry for the confusion.

Stefan, can you reschedule the call accordingly? Thanks!
Comment 27 Dani Megert CLA 2015-11-16 04:50:53 EST
> Re: comment 7
> 
> > This was an important request from clients. It's fine if they can till 
> > do this the old way but if that's invalidated by your approach then
> > this is definitely a no go.
> 
> Actually, I think I understand what you mean. I have not broken or disabled
> the old index search APIs. Third-parties can still use them if they wish,
> and they will still read the old cached indexes. Is this what you're asking?
> 
> 
> What I've changed at the moment is that JDT views (like the hierarchy view)
> obtain their content from the new index rather than the old one so would
> ignore old-style precomputed indexes if present. We could include a
> preference to disable the new index and use the old one. That may be useful
> for testing, but I'm not sure why an end-user would select a preference to
> use the old index when the new one is well over 10x faster.

Well, if JDT UI is not using the old index then that effectively removes the feature. The main reason for having it was indeed to allow clients to ship their big libraries with the pre-built index. You can find all the discussion in bug 356620.


So, does that mean only some features would use the new index and we would still have to build both? That would be quite bad.

When you say "JDT views", I assume this includes the Open Type dialog.


> would it be reasonable to say that indexing time would be less important now 
> than it was previously?

It depends on the speed. I'd rather wait a bit and get all the results than see them dropping in slowly.


> The current indexer will take about 8 minutes (probably longer after we add 
> more metadata) to process a half million classes the first time you run it on 
> that workspace... but since it's nonblocking, Eclipse will be fully functional 
> during this time.

You mean it won't block, but I still have to wait the 8 minutes until I'm sure I see all results, right? Like e.g. Search or Open Type already do now (they don't block the UI).


> Anyway it was the creation of indexes that motivated to implement 
> pre-computing them. If creation gets more expensive, pre-computing will become 
> even more relevant, no?

I'd say so too. This would give us the best of both worlds.


Note that we already have some performance tests in org.eclipse.jdt.core.tests.performance. Did you run and compare them with your new approach?
Comment 28 Dani Megert CLA 2015-11-16 06:39:51 EST
The current index can be deleted from disk by the user, e.g. in case it becomes corrupt. I assume the same will be possible with the new solution.
Comment 29 Stefan Xenos CLA 2015-11-16 10:20:59 EST
> So, does it mean only some features would use the new index and we still 
> have to build both? That would be quite bad.

I'd prefer not. I was offering this in case you wanted to keep the prebuilt indices in some form. I agree that keeping the old index around is undesirable and I'm glad to hear you feel this way.

> When you say "JDT views", I assume this includes the Open Type dialog.

Yes. Once this work is done, the Open Type dialog will also run against the new index, so there will be no need to wait for the long delay the first time you open it after restarting Eclipse.

Re: precomputed indices
> I'd say so too. This would give us the best of both worlds.

It would be possible to permit a modular index and I'm not ruling it out. Why don't we finish building and adopting the new index first and then see if we actually feel this is a problem that needs solving?

Just to be clear: A modular index would let you prebuild and share indices like you can now. What it won't get you is backwards-compatibility. Whenever the database format changes, any existing prebuilt indices would need to be discarded and rebuilt. There would be no point in migrating them since the migration step would be as expensive as re-indexing... so any prebuilt indices would only work for a short time.

If this is deemed an acceptable limitation, it's an option -- but it would be a lot of work and even this limited support for prebuilt indices would slow down searches so I'm not convinced it would be desirable even if the development cost was free.

Why would we want to sacrifice search time (which the user waits for and happens over and over) in order to speed up initial indexing time (which only happens once)?

Before rejecting the new index on the basis that "the indexer is too slow", we'd need to establish both that 1. the new indexer is, in fact, slower (we don't know this) and 2. that indexing time is more important than all the other benefits of the new index.

> Note that we already have some performance tests in 
> org.eclipse.jdt.core.tests.performance. Did you run and compare them 
> with your new approach?

I have not ported the performance tests yet. This is planned but I need to do more adoption first.

> You mean it won't block, but I still have to wait the 8 minutes until I'm
> sure I see all results, right? Like e.g. Search or Open Type already do
> now (they don't block the UI).

If you had a workspace containing a half-million classes, yes... but only once. After that everything would run quickly, even after a restart.

With the old index, ALL searches take a long time - not just the ones you run during first-time indexing. With the new index, you may have one slow search the first time you use it, and all subsequent searches will run incredibly fast.

> The current index can be deleted from disk by the user, e.g. in case it
> becomes corrupt. I assume the same will be possible with the new solution.

Yes. In fact, the new index can normally detect corruption and will delete itself automatically without user intervention.
Comment 30 Dani Megert CLA 2015-11-17 03:54:11 EST
(In reply to Stefan Xenos from comment #29)
> > Note that we already have some performance tests in 
> > org.eclipse.jdt.core.tests.performance. Did you run and compare them 
> > with your new approach?
> 
> I have not ported the performance tests yet. This is planned but I need to
> do more adoption first.

Wouldn't it be better to first validate the performance via those tests before migrating the views?


As for the pre-built index, I agree we should first see how the new solution "feels". For that, a project with several large JARs and a pre-built index needs to be measured first with the pre-built index and then against the new index solution.
Comment 31 Stefan Xenos CLA 2015-11-17 09:11:26 EST
> As for the pre-built index, I agree we should first see how the new
> solution "feels". For that, a project with several large JARs and
> pre-built index needs to be measured with pre-built index and then 
> against the new index solution.

Agreed, but I think that "several" may be too small. I think a workspace depending on 10000 jars and a million classes would be a good test case. We could construct such a test case with code generation.

Let's build this to work well in the worst-possible scenario so we're certain it will work well in all real-world scenarios. If we choose a test case that's too easy, our users will discover bottlenecks before we do (and I don't want that).

> Wouldn't it be better to first validate the performance via those tests
> than migrating the views?

I'll look at the performance tests and will follow whatever path looks easiest when I see the code. I agree that porting the performance tests (or writing new, harder, ones) is a requirement for this work, but I want to get a more impressive demo quickly.
Comment 32 Igor Fedorenko CLA 2015-11-17 09:23:00 EST
If you happen to have access to Google compute infra, there is a copy of the Maven Central repository. It may be worth using real jars from Central instead of artificially generated jar files in indexer performance stress tests. I can find more details on how to access it, if you think it'll be useful.
Comment 33 Dani Megert CLA 2015-11-17 09:33:11 EST
(In reply to Igor Fedorenko from comment #32)
> If you happen to have access to google compute infra, there is a copy of
> Maven Central repository. It maybe worth using real jars from Central
> instead of artificially generated jar files in indexer performance stress
> tests. I can find more details how to access it, if you think it'll be
> useful.

Sounds good!
Comment 34 Stefan Xenos CLA 2015-11-17 09:49:58 EST
> It maybe worth using real jars from Central instead of artificially
> generated jar files in indexer performance stress tests.
> I can find more details how to access it, if you think it'll be useful.

Actually, I already have access to enormous workspaces containing real code. The problem here is that I can't easily share this code with the rest of the Eclipse community. 

The benefit of an artificially generated performance test is that it would be a tiny download for anyone wishing to replicate my results, I could include them as part of the performance suites, and the test data won't change over time.

Please send me the instructions on how to get those jars. I'll use them for my own manual tests, but I think I'll stick with artificial data for the unit tests.
Comment 35 Jason van Zyl CLA 2015-11-17 10:13:31 EST
There are no instructions per se, as it's a relatively new thing to be able to provide full copies of Maven Central for research purposes (http://takari.io/2015/10/28/google-maven-central.html). If you would like a VM with a read-only copy of Maven Central, I'm happy to help you get set up. These are exactly the kinds of things the new infrastructure was set up to accommodate.
Comment 36 Sebastian Zarnekow CLA 2015-11-18 02:59:35 EST
My gut feeling is that indexing Maven Central is not very similar to a real-world scenario, since many different versions of the same jars will be indexed - and those will inherently produce a great deal of duplicate names. It could be an interesting exercise to use only the latest versions and compare the index performance with a fully indexed Central repo.
Comment 37 Jason van Zyl CLA 2015-11-18 09:01:04 EST
Sure, with access to everything one can take a projection, or projections, of what might be useful: latest versions, random samples, dependencies with a high degree of clustering. Happy to provide access. Just let me know if you're interested.
Comment 38 Eclipse Genie CLA 2015-11-18 12:00:20 EST
New Gerrit change created: https://git.eclipse.org/r/60724
Comment 39 Stefan Xenos CLA 2015-11-20 16:49:36 EST
Updates in tonight's version:
- A lot more data (methods, variables) is being added by the indexer
- Types are now identified using field descriptors rather than binary names
- Support for inner types
- A number of schema changes, to support faster lookup of method names
Comment 40 Stefan Xenos CLA 2015-11-25 01:40:40 EST
More changes tonight:

- Constants are now being indexed.
- Added support for one-to-one relationships.
- Renamed FieldNodePointer to FieldOneToMany and renamed FieldBackPointer to FieldManyToOne.
- Declaration annotations are now being indexed. (No type annotations yet)
- Variables are now being indexed.

I won't upload an updated patch tonight but will include these changes with the next upload.
Comment 41 Stefan Xenos CLA 2015-11-26 13:53:46 EST
I'm currently working on indexer support for generic signatures and type annotations.
Comment 42 Stefan Xenos CLA 2015-11-27 00:10:39 EST
I've updated the index schema to support generic signatures and type annotations, but haven't made the relevant changes to the indexer yet.

Average indexing time is currently 1.2ms per class, average storage cost is currently 968 bytes per class (averages are based on a test workspace with 134309 classes from JDT, CDT, and platform).
Comment 43 Stefan Xenos CLA 2015-11-27 00:12:28 EST
There's an updated patch in gerrit, for anyone who's curious.
Comment 44 Manoj N Palat CLA 2015-11-29 23:45:31 EST
Stefan & Sergey - Thanks for the effort. From the comments and the design document (comment 1), I have the following queries:

- The design doc says a single index, so I understand that multiple jars shared across projects will have only one entry - is that correct?

- Do you have numbers that compare the size of the index files with the existing infra? [you can probably give the existing index file size for those mentioned in comment 23]

- I tried fishing out some info regarding PDOMs in general, and from https://wiki.eclipse.org/CDT/designs/PDOM/Overview (the link looks dated, as it was last updated in 2008) I see that data in the B-trees is not deleted. Are you planning to follow the same algorithm here? If so, can you provide more info on how memory will be affected in that case?

- How easy/difficult is it to do incremental changes? (Disclaimer: I have not looked into the patch.) For example, when a source file changes, say a method is deleted or a type is added, how complex is the index change, i.e. would it mean updating information in multiple B-trees in addition to the plain data ("plain data" meaning the unordered store mentioned in Section 4 of the design doc)? This query is mainly from the perspective of the network object model, since such a change might mean updating multiple pointers - hence the question.
Comment 45 Manoj N Palat CLA 2015-11-30 05:09:14 EST
(In reply to Manoj Palat from comment #44)
> 
> - Design doc says a single index. So I understand that multiple jars that
> are shared across projects will have only one entry - Is that correct?
> 
From the code I see that you are removing the duplicates across all the projects.
Comment 46 Stefan Xenos CLA 2015-12-01 16:42:57 EST
> I see that the data in B-Trees are not deleted. Are you planning to
> follow the same algorithm here?

We plan to support deletions. Actually, I'm pretty sure we support them already... but I haven't yet added a unit test to confirm that the amount of allocated heap in the DB actually gets reduced during deletion.

Also, I'd like to switch from b-trees to tries for the text search indices. I think they'll be quite a bit faster.

> - How easy/difficult is to do incremental changes?

I'm not sure I understand what you mean by "easy". Do you mean how much code does it take? How expensive is the operation in CPU/disk access time? How many nonconsecutive writes will it make to the database? I'll try to answer all three, but please ask follow-up questions if I missed the heart of your question.

Let's consider what happens when someone deletes a method from a .java file and saves it. The current implementation doesn't index .java files - just the derived .class files. The smallest scope for an incremental update is an entire file... so when a file changes, we deal with it by re-indexing that file.

How much code does it take: Since we're re-indexing the entire file, you could either say none (since there is no special case for method deletions) or lots (if you want to count all the code in the indexer as being part of method deletion case).

How much cpu/disk time does it take: The current implementation takes 1.2ms per class (average).

Does it write to multiple B-trees: Yes. There are currently 3 b-trees in the database (for field descriptors, simple names, and filenames) and any index update will write to all three. It will also rewrite all the simple information that was scraped from the file, even if only one field changed. This is very fast, though, and is included in the 1.2ms, above.

Does this answer your question?
Comment 47 Manoj N Palat CLA 2015-12-03 18:39:52 EST
(In reply to Stefan Xenos from comment #46)
Thanks for the details, Stefan. Going through the code now. Will ping if there are queries.
Comment 48 Manoj N Palat CLA 2015-12-11 09:30:51 EST
Created attachment 258616 [details]
Minutes of discussion

Please find attached the important points discussed over the call. If anyone who attended the meeting finds any point missing, please feel free to add it.
Comment 49 Stefan Xenos CLA 2015-12-11 10:17:15 EST
From the meeting notes:

> S: If the expectation that they can re-generate every few months
> (when indexes changes), this is reasnable. Backward compatibility is a
> hard thing.
> M: It is ok, if for the next generation eclipses, to regenerate.
> S: Doable. 

Just one clarification: I said it's doable, but still undesirable due to the impact that a fragmented index has on search time. Someone (I'm not sure who) volunteered to talk with the teams that asked for the requirement originally and find out if the prebuilt indices could be dropped.
Comment 50 Manoj N Palat CLA 2015-12-13 22:26:15 EST
(In reply to Stefan Xenos from comment #49)
> 
> Just one clarification: I said it's doable, but still undesirable due to the
> impact that a fragmented index has on search time. Someone (I'm not sure
> who) volunteered to talk with the teams that asked for the requirement
> originally and find out if the prebuilt indices could be dropped.

The next line of the minutes has the owner (D for Dani):
D: Lets not worry too much here. Will get more information here.
Comment 51 Dani Megert CLA 2015-12-14 07:39:59 EST
(In reply to Manoj Palat from comment #48)
> Created attachment 258616 [details]
> Minutes of discussion
> 
> Please find the important points discussed over the call attached. If anyone
> who has attended the meeting finds any point missing please feel free to add

"The performance is evident in the case of  large number of classes in jar."

Isn't it the large number of JARs rather than the size of the JAR(s)?

Also, the second reason for the work was mentioned to be the corruption of the index, right?
Comment 52 Stefan Xenos CLA 2015-12-14 13:48:14 EST
Re: comment 51

That's correct. Dealing with large numbers of jars is the primary motivation and the ability to detect corruption is secondary.
Comment 53 Stefan Xenos CLA 2015-12-14 14:39:13 EST
Manoj asked me for a code example for adding a new search index. Here it is:

Example 1: Adding a new search index

Let's say we wanted to add a search index containing the value of all string literals. Currently string literals are stored in PDOMConstantString, and they hold their value in a string VALUE attribute. 

First, we change VALUE from being a FieldString to being a FieldSearchKey. So the new declaration for PDOMConstantString looks like this:

public static final FieldSearchKey<JavaIndex> VALUE;

@SuppressWarnings("hiding")
public static StructDef<PDOMConstantString> type;

static {
	type = StructDef.create(PDOMConstantString.class, PDOMConstant.type);
	VALUE = FieldSearchKey.create(type, JavaIndex.STRING_LITERALS);
	type.done();
}

Notice that FieldSearchKey is parameterized by the struct that holds the singleton search index. Now we need to add the index itself. We do that by adding two lines to the JavaIndex.java class.

Declare the new index:

public static final FieldSearchIndex<PDOMConstantString> STRING_LITERALS;

...and initialize it:

STRING_LITERALS = FieldSearchIndex.create(type, PDOMConstantString.VALUE);

After that we need to advance the database version number since we made a breaking change. We need to change these constants in JavaPDOM.java:

	private static final int MIN_SUPPORTED_VERSION = PDOM.version(1, 7);
	private static final int CURRENT_VERSION = PDOM.version(1, 7);

...and that's all there is to do. Literally just 6 lines of code to add the new search index to the database.

Since FieldSearchKey has the same public interface as FieldString and we were already filling in the strings, there is no extra work to do in the indexer to fill in the new search index. FieldSearchKey knows about its search index, so just assigning to the string is enough to cause the btree insertions to happen. It also has a destructor which handles the btree removal, so there is nothing extra to do to handle cleanup.



Reading the new search field:
-----------------------------

You'd probably want to also add some helper methods for doing searches on the new btree. 

You can add this helper to JavaIndex.java:

public PDOMConstantString findStringConstant(String value) {
    SearchCriteria searchCriteria = SearchCriteria.create(value);
    return STRING_LITERALS.findBest(this.pdom, this.address, 
        searchCriteria, this.anyResult);
}


The helper is also responsible for filtering out any search results that come from PDOMConstantStrings that are currently being indexed (that belong to a PDOMResourceFile with a timestamp of 0). I don't have any good examples of doing this filtering yet but will be adding it shortly.
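
For completeness, a call site would then look something like this (hypothetical usage of the helper above; I'm assuming findBest returns null when nothing matches):

JavaIndex index = ...; // obtain the index instance (not shown here)
PDOMConstantString match = index.findStringConstant("some literal");
if (match != null) {
	// navigate from the constant to whatever owns it, as needed
}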
Comment 54 Stefan Xenos CLA 2015-12-14 14:45:07 EST
Note that all the pointers in this model are bidirectional, so you always add cross-references in two places. The following fields are always added in pairs:

FieldSearchIndex/FieldSearchKey
FieldOneToMany/FieldManyToOne
FieldOneToOne/FieldOneToOne

I've set up all the constructors so that these types always receive their counterpart in their constructor, so you'll get a compiler error if you fail to add the other half of a backpointer.
Comment 55 Stefan Xenos CLA 2015-12-14 22:08:53 EST
FYI, I'm still working on parsing type signatures. I've got most of it working except for inner classes.

I'll upload the next update to a public git server so folks can access it easier.
Comment 56 Dani Megert CLA 2015-12-16 07:18:07 EST
Stefan, during the call you mentioned the long time it takes to open the Type Hierarchy on Object in a new workspace with a new Java project. I can confirm that this takes quite a long time, but I don't exactly understand how the new index would make this much faster. Can you explain?
Comment 57 Stefan Xenos CLA 2015-12-16 10:33:58 EST
Sure!

The hierarchy view seems to perform three slow operations.

1. It performs a sequence of searches in order to locate the set of possible subtypes of the selected object.

2. It constructs its content provider (which goes back to the jars).

3. It draws the labels (which accesses the jars again).

I'm being a bit vague on 2 and 3 because I haven't dug into them yet. Step 1 is where I've gained the current 30% speed-up - instead of searching for classes, I just return the prebuilt list. You should be able to confirm this 30% speed-up for yourself with the attached patch.

I anticipate a speed-up for steps 2 and 3 since everything the view needs from the JAR files can be found in the index and will most likely be sitting in cache after step 1.
Comment 58 Stefan Xenos CLA 2015-12-16 15:48:37 EST
Re: comment 56

Let me explain another way: my profiler tells me that most of the time is spent either searching or reading .jar files, and I know that with the new index there will be no reason to either open a .jar file or perform a search.

There will still be some time spent dumping the raw data from the index (which appears to be much faster than getting it from .jars), and the time spent constructing the actual tree widget will be unchanged, but that doesn't seem to dominate the runtime.

I haven't yet dug into all the details of how the hierarchy view works -- but the above is my reasoning as to why the index should be much faster. Of course, we'll know for sure if it's faster and by how much after it's coded.
Comment 59 Sergey Prigogin CLA 2015-12-16 16:50:54 EST
(In reply to Dani Megert from comment #56)

An easy way to get an approximate profile of the Type Hierarchy view is to enable UI Responsiveness Monitoring and to increase the maximum number of samples to 30.
Comment 60 Stefan Xenos CLA 2015-12-16 20:00:29 EST
I've uploaded my code to this github repository, in the branch "newindex"

https://github.com/sxenos/eclipse.jdt.core

You can clone it like this:

git clone https://github.com/sxenos/eclipse.jdt.core.git
cd eclipse.jdt.core
git checkout newindex

The first commit is the big patch I attached here. All subsequent work has been added as follow-up commits. The latest code supports type annotations, type parameters, and generic signatures on class declarations.

I've also moved the functionality of JavaPDOM into JavaIndex.
Comment 61 Manoj N Palat CLA 2016-01-13 01:21:21 EST
Created attachment 259142 [details]
Test Code to reproduce the type hierarchy issue

Stefan: still in the midst of reading the code, but here is an intermediate comment: in PDOMMethod I believe we will have to keep pointers to the types declared.

Please use the attached code and export a jar. Then add the jar to the referenced libraries of another project, say Test2, and use it from a class, say A. In A, add the "import p.X;" statement and then open the type hierarchy. The "Y"s (there should be two of them) are not displayed.
Comment 62 Stefan Xenos CLA 2016-01-13 10:45:19 EST
Re: comment 61

Thank you so much for the test case! I'm actually not finished writing the indexer yet so I'm not surprised that it's missing a bunch of important stuff. In particular, there's a bunch of stuff about methods that I'm not storing yet. I agree that the methods will need pointers back to their embedded types.

I'll look at that in the next round of changes (which I hope to start on later this week).
Comment 63 Stefan Xenos CLA 2016-01-22 00:01:46 EST
I've been away / working on other urgent issues for the past few weeks, but I'll be back on the index tomorrow.

Next up, finish indexing IBinaryType.

Here's what's still missing:
  - IBinaryField.getTagBits() and IBinaryField.getTypeAnnotations()
  - getMemberTypes()
  - getMethods() and the contents of IBinaryMethod
  - getMissingTypeNames()
  - getSourceName()   (?)
  - getTagBits()
  - isAnonymous()
  - isBinaryType()
  - isLocal()
  - isMember()
  - sourceFileName()
Comment 64 Stefan Xenos CLA 2016-01-22 15:39:39 EST
I found a problem with the asynchronous deletion algorithm I was using for the index.

When an object is unused (for example, when a refcounted entity has a refcount that reaches 0), it is marked for deletion... then whenever the write lock is released (and before any waiting readers or writers are unlocked), the entities are actually removed from the database.

This avoids recursion problems in the destructors and means that I could potentially run the destructors in an order which minimizes the probability of a page fault... but there are cases where an object marked for deletion can acquire new references prior to removal (which actually turns out to be fairly common for objects that show up in the search indices).

I'm working on a fix now.
Comment 65 Stefan Xenos CLA 2016-01-25 16:07:00 EST
I just pushed a fix for comment 64, and added support for generic type signatures and type annotations on fields.
Comment 66 Stefan Xenos CLA 2016-01-26 16:23:48 EST
I just added support for indexing methods (including pretty much everything on the IBinaryMethod interface).

That covers the "getMethods() and the contents of IBinaryMethod" item from comment 63.

Re: comment 61

I haven't tested it yet, but this version should now contain a cross-reference between methods and their embedded types. To extract them, use this:

PDOMMethod myMethod = ...
List<PDOMType> declaredTypes = myMethod.getMethodId().getDeclaredTypes();

Note that this will return all types declared for all methods with an identical signature. If you only want the ones in the same class file, you'll need to filter by PDOMType.getResourceFile().
Comment 67 Stefan Xenos CLA 2016-01-27 14:10:02 EST
I just pushed a big code cleanup.
- Fixed all warnings in the code
- Standardized the index to use char[] instead of String
- Renamed all the classes and packages to use "Nd" (Network Database) instead of "PDOM" (Persistent Document Object Model).
- Renamed all variables that hold a database address to use the suffix "address" rather than "record".
Comment 68 Stefan Xenos CLA 2016-01-27 16:17:00 EST
I think the indexer is pretty much done. Going to move onto the adoption on the type hierarchy view next.
Comment 69 Stefan Xenos CLA 2016-01-28 10:26:14 EST
I've reproduced the problem in comment 61. I followed your steps and then took the class hierarchy of java.lang.Object. I found that not only was Y missing from the hierarchy, so was X.

However, when I waited a few minutes and repeated the test, X and both Ys were present.

Manoj: are you seeing the same thing?

I believe the problem is that the new index is still missing change notifications, so the hierarchy view isn't getting notified when the indexer is finished indexing X and Y.
Comment 70 Stefan Xenos CLA 2016-01-28 10:29:25 EST
Re: comment 69

Never mind. I was seeing the X and Ys that were getting imported from the original project. The ones from the jar file were missing, which I confirmed by trying to open the hierarchy for X from the import in A.

...so I think I've reproduced your exact problem. I'll look at this next.
Comment 71 Eclipse Genie CLA 2016-01-30 00:13:02 EST
New Gerrit change created: https://git.eclipse.org/r/65507
Comment 72 Stefan Xenos CLA 2016-01-30 00:15:32 EST
Re: comment 69

I've pushed fixes for your test case. It works now. I've also added a new preference that lets you enable or disable the new index and have attached a JDT UI patch that lets you toggle it from the UI.

The new index is currently disabled by default, so if you want to test it you'll either need this JDT UI patch or will need to manually edit your preference store to enable the functionality.
Comment 73 Manoj N Palat CLA 2016-02-02 02:24:36 EST
(In reply to Stefan Xenos from comment #72)
> Re: comment 69
> 
> I've pushed fixes for your test case. It works now.

Thanks Stefan. I can confirm that it works now.

A query regarding the code around BTree: in delete(), the comment says that the record itself is not deleted and its storage is not deallocated.
 - What are the scenarios that would lead to this delete? Can it be considered insignificant enough not to affect memory?
Also, possibly related: in FieldSearchKey.put(), when would the address already be present while putting the address into the BTree?
Comment 74 Stefan Xenos CLA 2016-02-02 12:52:18 EST
> Thanks Stefan. I can confirm that it works now.

Excellent. Also - be sure you're enabling the preference which activates the new index. The new index is now disabled by default.

> - What are scenarios that would lead to this delete?

Whenever any object that is present in a BTree is deleted, this delete is invoked in order to remove that object's key from the BTree. For example, when reindexing a jar, a new NdResourceFile is created to represent the jar file. After indexing is completed, any older NdResourceFile records for that same jar are deleted and this delete is invoked in order to remove the filenames from the global filename lookup BTree.

This happens many times during every reindex operation.

> Can this be considered insignificant so as not to affect the memory?

It's not a memory leak, if that's what you're asking. This comment says that - in the above scenario - the NdResourceFile object won't be deleted by this call... but that's because it happens in the other direction. Deleting the NdResourceFile record calls this method in its destructor. The record will still be deleted -- just not by this method call.

> also, related maybe, in the FieldSearchKey.put() when would the address
> be already present while putting the address in the Btree?

That would happen if you ever changed an object's search key (for example, if we renamed a class in the index). This sort of operation is supported by the public interface on the Nd* classes, but the current indexer doesn't ever actually do this so it doesn't happen right now.

The intended use-case is to support things like detecting moved jars and renaming the NdResourceFile rather than reindexing it.

Currently that delete(...) branch gets hit far more often than it should since we could skip it on first initialization of a search key (which is currently 100% of our use-cases).
Comment 75 Stefan Xenos CLA 2016-02-02 13:48:47 EST
> also, related maybe, in the FieldSearchKey.put() when would the address be already present while putting the address in the Btree?

In response to your comments, I've just pushed a new version of FieldSearchKey which avoids calling BTree.delete(...) unless the old value of the key was actually present in the index.
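
For readers following along, a minimal sketch of the guard this describes, with a TreeMap standing in for the on-disk BTree (the names are illustrative, not the real FieldSearchKey API):

import java.util.TreeMap;

// Illustrative sketch only: skip the index delete when no key was indexed yet.
public class SearchKeySketch {
	private final TreeMap<String, Long> index = new TreeMap<>(); // stand-in for the BTree
	private String currentKey; // stand-in for the key currently stored in the field

	public void put(long address, String newKey) {
		// Previously the equivalent of index.remove(...) ran unconditionally;
		// on first initialization there is nothing to remove, so guard it.
		if (currentKey != null) {
			index.remove(currentKey);
		}
		currentKey = newKey;
		index.put(newKey, address);
	}
}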
Comment 76 Stefan Xenos CLA 2016-02-02 13:56:00 EST
FYI, that change to FieldSearchKey improved indexing time by about 20%! Current performance looks like this:

  Indexed 50195 classes in 44708ms, average time = 0.890686323338978ms
Comment 77 Stefan Xenos CLA 2016-02-08 22:32:07 EST
I just hit save on the last line of code for my index-backed implementation of IBinaryType.

I haven't pushed it yet because it's going to need some testing, which I'll start tomorrow.

Once this works it should be a fairly short sprint to having the whole type hierarchy populated from the contents of the index.

To get the rest of the hierarchy view working, I'm going to need an implementation of IType. My plan is to modify the existing BinaryType to use my new implementation of IBinaryType instead of ClassFileReader if the requested type can be found in the index.
Comment 78 Stefan Xenos CLA 2016-02-10 01:20:45 EST
I've pushed my work-in-progress for the IBinaryType implementation. It includes a bunch of self-tests for the indexer. They're currently failing, so this code should be considered unstable in its current form.

I've pushed it in this form mainly so Manoj can keep up with the code inspections. If you'd like to see an example of reading an API-accessible model object directly from the index, check out IndexBinaryType.
Comment 79 Stefan Xenos CLA 2016-02-10 22:47:21 EST
I've pushed a bunch of bugfixes today, but the indexer's self-tests are still failing.

I'm going to have to rework how I store array types in the index. Currently, I store arrays as though they're a generic type called "[" with one type argument that is the array's element type. The trouble with this approach is that - unlike generics - arrays have a different field descriptor depending on the argument type... so with the way I've been storing them, you can't find array field descriptors in the index. It also means that array types end up with correct generic signatures but incorrect field descriptors.

I'll try to fix this tomorrow by adding more special cases for arrays and making a unique NdTypeId for each unique array field descriptor.

I've discovered that IBinaryType doesn't obey its documented API contracts. Most of its methods say that they return null instead of an empty array if their output is empty, but many of them actually return empty arrays. I've mimicked this behavior in my implementation, but I'm not sure if it's a bug in the JavaDoc or the implementation.
Comment 80 Stefan Xenos CLA 2016-02-17 00:04:26 EST
All the tests now pass in my small test workspace (which contains all of JDT, equinox, and platform UI).

The next step will be to redirect all the hierarchy view's calls from ClassFileReader to the index-backed IBinaryType.
Comment 81 Stefan Xenos CLA 2016-02-18 23:45:53 EST
Adoption in the type hierarchy view is now mostly complete, in the sense that almost all the data is read from the index rather than being read from jars. However, this initial implementation hasn't been performance tuned yet and is currently not much faster than the old version.

The next step: profiling!

(This push also contains a number of bugfixes and reverts the new locking code, which was buggy. I'll restore the new locking code once I've had time to investigate.)
Comment 82 Stefan Xenos CLA 2016-02-24 20:33:36 EST
I just pushed a number of optimizations and bugfixes. The index version is now about 40% faster than the non-index version in the Hello World type hierarchy test on GTK. The improvement should be more impressive on Windows and Mac since they don't suffer from the GTK tree widget bug.

Some other improvements:

I've adopted the index in parts of ClassFile.java -- so it should be speeding up other areas of Eclipse as well, such as the compiler and various other views... but I haven't set up benchmarks to measure that yet.
Comment 83 Stefan Xenos CLA 2016-02-26 14:32:35 EST
Pushed a bunch more optimizations.

I ran my benchmark on a Mac, and the improvement is pretty impressive. With my same test case, the old index ran in 37.0s and the new index ran in 4.4s average.

...so the new index is 88% faster on OSX. The improvement is less impressive on GTK because of the tree widget bug -- it's about 50% faster there.
Comment 84 Manoj N Palat CLA 2016-03-28 10:19:47 EDT
@Stefan: In Database.java:

In the documentation of Database it is mentioned that * (1) where 2 <= m <= CHUNK_SIZE / BLOCK_SIZE_DELTA - MIN_BLOCK_DELTAS + 1; If we go by this, the last offset (sans multiplication by INT_SIZE) would be MAX_BLOCK_DELTAS + 1; Is this intended? Isn't this inequality m < CHUNK_SIZE/BLOCK_SIZE_DELTA - MIN_BLOCK_DELTAS + 1?
Comment 85 Manoj N Palat CLA 2016-03-28 10:58:19 EDT
(In reply to Manoj Palat from comment #84)
> @Stefan: In Database.java:
> 
> In the documentation of Database it is mentioned that   * (1) where 2 <= m
> <= CHUNK_SIZE / BLOCK_SIZE_DELTA - MIN_BLOCK_DELTAS + 1; If we go by this,
> the last offset (sans multiplicatin by INT_SIZE) would be MAX_BLOCK_DELTAS
> +1; Is this intended? Is n't this inequality m < CHUNK_SIZE/BLOCK_SIZE_DELTA
> - MIN_BLOCK_DELTAS + 1?

A couple more queries in Database.java, now regarding datasize,
i.e. in Database.malloc(int datasize):

1. Is MAX_MALLOC_SIZE correct? In Database.malloc(int datasize), there is a check for datasize <= MAX_MALLOC_SIZE, which is CHUNK_SIZE - BLOCK_HEADER_SIZE. In this maximum case, where is the space for the two pointers PREV and NEXT? Shouldn't this be CHUNK_SIZE - BLOCK_HEADER_SIZE - 8, where 8 (i.e. BLOCK_SIZE_DELTA) is the space for the two pointers?

2. In Database.malloc() (lines 325 to 328 in the current tot in the concerned branch):

int needDeltas= (datasize + BLOCK_HEADER_SIZE + BLOCK_SIZE_DELTA - 1) / BLOCK_SIZE_DELTA;
if (needDeltas < MIN_BLOCK_DELTAS) {
	needDeltas= MIN_BLOCK_DELTAS;
}

If a datasize of 7 is requested, needDeltas would be (7+2+8-1)/8 = 2, i.e. MIN_BLOCK_DELTAS; but the header and the pointers will take up 2 + 2*4 = 10 bytes, leaving only 6 bytes for the data whereas the requested datasize is 7. Is there something I am missing here?
Comment 86 Stefan Xenos CLA 2016-03-28 14:42:16 EDT
> Is the MAX_MALLOC_SIZE correct?

I believe so. The next and previous pointers are only used when the block itself is unallocated. They occupy the same memory that would normally be used by the structure itself when the block has been malloc'd so they don't add to the total size of the block.

The two bytes in BLOCK_HEADER_SIZE are used unconditionally, which is why they cut into the MAX_MALLOC_SIZE.

I'll respond to the rest of your questions shortly.
Comment 87 Stefan Xenos CLA 2016-03-28 14:52:47 EDT
> 2. In Database.malloc() (line 325 to 328 in current tot in the
> concerned branch)

Again, this is accounted for by the fact that the next and previous pointers are overwritten by the content of the struct when the block is in use, so the actual memory needed to store the struct + header would only be 9 bytes when in use rather than 17.

When unused, the total number of bytes needed would be the size of the header plus the next and previous pointers - which is 10 bytes as you observed. In both cases (used and unused), the 16 bytes allocated would be sufficient to hold the necessary contents.

I'll update the JavaDoc to explain the difference in header size between allocated and unallocated blocks.
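
To make the arithmetic concrete, here is a small worked example using the constants as described in this discussion (2-byte header, 4-byte next/prev pointers, 8-byte deltas); an illustration only, not the real Database code:

public class BlockSizeSketch {
	static final int BLOCK_HEADER_SIZE = 2;
	static final int BLOCK_SIZE_DELTA = 8;
	static final int MIN_BLOCK_DELTAS = 2;
	static final int POINTER_SIZE = 4;

	public static void main(String[] args) {
		int datasize = 7;
		int needDeltas = (datasize + BLOCK_HEADER_SIZE + BLOCK_SIZE_DELTA - 1) / BLOCK_SIZE_DELTA;
		if (needDeltas < MIN_BLOCK_DELTAS) {
			needDeltas = MIN_BLOCK_DELTAS;
		}
		int blockBytes = needDeltas * BLOCK_SIZE_DELTA; // 16 bytes allocated
		int allocatedLayout = BLOCK_HEADER_SIZE + datasize; // 9 bytes: header + struct content
		int freeLayout = BLOCK_HEADER_SIZE + 2 * POINTER_SIZE; // 10 bytes: header + next + prev
		System.out.println(blockBytes >= allocatedLayout); // true
		System.out.println(blockBytes >= freeLayout); // true
	}
}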
Comment 88 Stefan Xenos CLA 2016-03-28 16:26:06 EDT
> If we go by this, the last offset (sans multiplicatin by INT_SIZE) would
> be MAX_BLOCK_DELTAS +1; Is this intended?

I think it is. Since the version number occupies INT_SIZE bytes at the start of the file, the offset for the minimum block delta would be at 1 * INT_SIZE and the offset for the maximum would be at (MAX_BLOCK_DELTAS + 1) * INT_SIZE.

Your calculation is correct for the offset from the start of the malloc table, but once you take into account the offset from the start of the file, I think everything matches the actual code.

Please correct me if you see an error in my math.
Comment 89 Stefan Xenos CLA 2016-03-28 16:45:59 EDT
I've updated the comments to describe the optional nature of the next and prev pointers better, and I've updated the computation of several Database constants to make their relationships more clear.
Comment 90 Stefan Xenos CLA 2016-03-28 16:56:25 EDT
> If we go by this, the last offset (sans multiplicatin by INT_SIZE) would
> be MAX_BLOCK_DELTAS +1; Is this intended?

Of course, if we got this wrong the worst consequence would just be 4 bytes wasted in the Database header, which really isn't a big deal.
Comment 91 Manoj N Palat CLA 2016-03-29 01:11:44 EDT
(In reply to Stefan Xenos from comment #90)
> > If we go by this, the last offset (sans multiplicatin by INT_SIZE) would
> > be MAX_BLOCK_DELTAS +1; Is this intended?
> 
> Of course, if we got this wrong the worst consequence would just be 4 bytes
> wasted in the Database header, which really isn't a big deal.

Thanks Stefan for making the documentation clear - Let me respond to all the replies in this comment:


Let me table the facts as I understand:
MALLOC_TABLE_OFFSET is at offset 1 * INT_SIZE 

So the MIN_BLOCK_DELTA (2) will be at offset 1 * INT_SIZE (ie 0 * INT_SIZE from base)
3 -> 2 * INT_SIZE
.
.
continuing we get
MAX_BLOCK_DELTA -> (MAX_BLOCK_DELTA - 1) * INT_SIZE which is at (MAX_BLOCK_DELTA - 2)*INT_SIZE from MALLOC_TABLE_OFFSET.

I think this is only a documentation error. When I go through the code in malloc, I see that the correct offset is taken. Moreover, your calculation of WRITE_NUMBER_OFFSET is correct and it is at 2048 bytes.

I was not worried about the 4 bytes, which as you said is insignificant, but wanted to check the cascading effect it would have if we allocated memory from the next higher bucket - higher than necessary - in malloc() when we calculate the offset.
Comment 92 Manoj N Palat CLA 2016-03-29 10:04:29 EDT
A general comment - please rename/modify the references to PDOM and PDOM* (Type/Node etc.) in the documentation to the new names in the latest implementation - Nd, NdType etc. [Note: This needs to be done only by the time the code gets into the main branch]. Please update the design doc as well by that time.
Comment 93 Stefan Xenos CLA 2016-03-29 16:10:25 EDT
> I think this is only a documentation error. 

Aha! I think I found the error you were speaking of. m was used in two places. In one of the places the + 1 was necessary but not the other. I've updated the doc with the fix.

Also, with the way m was used it was more appropriate for it to be an equality rather than an inequality.

The correction of the PDOM prefix in the doc will be done in a follow-up.
Comment 94 Manoj N Palat CLA 2016-03-31 04:59:31 EDT
A couple of minor comments for the record:
-Doc incomplete at  FieldSearchIndex    * This field may only ever  
-Unnecessary repeat check if (oldTargetAddress != 0) { at FMTO.put() line 127
Comment 95 Manoj N Palat CLA 2016-03-31 09:02:12 EDT
@Stefan: review in progress - I have been reviewing the index generation part and did a fairly detailed review of the fundamental data structure Database, and some parts of the BTree, along with the general flow and the NdType and struct definition infra. So far the code looks logical and good. Yet to review the consumer part - in this case the HBuilder - but a query: currently only the TypeHierarchy is using the new index, right? Do you plan to migrate the rest of the APIs - e.g. the search for method and type references et al. - to the new index infra soon?
Comment 96 Stefan Xenos CLA 2016-03-31 10:59:27 EDT
> Currently only the TypeHierarchy is using the new index, rt? 

Yes. Only TypeHierarchy and other code (such as ClassFile) used directly by the type hierarchy view.

> do you plan to migrate the rest of apis - for eg the search for method
> and type references et al soon to the new index infra?

Yes, we plan to migrate the rest of the APIs (SearchEngine, call hierarchy, etc.) to use the new index infra.

No, we don't plan to do it soon. Right now, our goal is to focus on the things you consider essential for getting this code into master. After that has happened, we'll expand the scope of the adoption (hopefully with some help from the community).
Comment 97 Stefan Xenos CLA 2016-03-31 16:33:34 EDT
Manoj, I just pushed fixes for all the code issues you mentioned above.

I'll follow up with class name changes in the design doc.
Comment 98 Manoj N Palat CLA 2016-04-04 23:28:41 EDT
(In reply to Stefan Xenos from comment #96)
> > do you plan to migrate the rest of apis - for eg the search for method
> > and type references et al soon to the new index infra?
> 
> Yes, we plan to migrate the rest of the APIs (SearchEngine, call hierarchy,
> etc.) to use the new index infra.
> 
> No, we don't plan to do it soon. Right now, our goal is to focus on the
> things you consider essential for getting this code into master. After that
> has happened, we'll expand the scope of the adoption (hopefully with some
> help from the community).

Just an early query here: did you have any initial thoughts on how the search will be implemented? For example, let's take the initial case of finding all the types with a CamelCase search facility (the current Ctrl-Shift-T on Windows). In the type hierarchy case, the BTree is sorted on the full name; but in this case, since the search will be based on CamelCase, the BTree would be based on different sort criteria. Wouldn't this mean populating the db (on the physical disk) with a different set of pointers for each of the search criteria?
Comment 99 Stefan Xenos CLA 2016-04-06 13:44:59 EDT
> Just an early query here: did you have any initial thoughts on how the
> search will be implemented

Yep! I've both thought about it and implemented it. :-) The relevant B-tree is called JavaIndex.SIMPLE_INDEX. It is intended to support things like the open type dialog -- case-insensitive prefix searches. Right now I'm populating it but haven't written any code which reads from it yet.

> Wouldn't this mean populating the db (in the physical disk) with a
> different set of pointers with for each of the search criteria?

As the index is shaped right now, yes it does. Once we switch from B-trees to tries (which I'm planning for some future version), we would have the option of using a permuterm index which would let us accelerate any sort of wildcard search rather than just prefix searches.

http://nlp.stanford.edu/IR-book/html/htmledition/permuterm-indexes-1.html

But for now I plan to just add additional b-trees for each kind of prefix search we may want to support.
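
To illustrate the kind of lookup those prefix trees support, here is a sketch using a TreeMap as a stand-in for the on-disk B-tree keyed on lower-cased names (the names below are illustrative, not the real JavaIndex API):

import java.util.NavigableMap;
import java.util.TreeMap;

public class PrefixSearchSketch {
	public static void main(String[] args) {
		NavigableMap<String, String> simpleNames = new TreeMap<>();
		simpleNames.put("arraylist", "java.util.ArrayList");
		simpleNames.put("arrays", "java.util.Arrays");
		simpleNames.put("hashmap", "java.util.HashMap");

		String prefix = "Arr".toLowerCase();
		// Every key in [prefix, prefix + '\uffff') starts with the prefix,
		// so a case-insensitive prefix search is a single range query.
		simpleNames.subMap(prefix, true, prefix + '\uffff', false)
				.values()
				.forEach(System.out::println); // java.util.ArrayList, java.util.Arrays
	}
}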
Comment 100 Manoj N Palat CLA 2016-04-06 20:01:46 EDT
(In reply to Stefan Xenos from comment #99)
> > Just an early query here: did you have any initial thoughts on how the
> > search will be implemented
> 
> Yep! I've both thought about it and implemented it. :-) The relevant B-tree
> is called JavaIndex.SIMPLE_INDEX. It is intended to for supporting things
> like the open type dialog -- case-insensitive prefix searches. Right now I'm
> populating it but haven't written any code which reads from it yet.

That's really great!

> > Wouldn't this mean populating the db (in the physical disk) with a
> > different set of pointers with for each of the search criteria?
> 
> As the index is shaped right now, yes it does.

This brings us back to the performance question (as long as we bank on the BTree) - what do your initial perf figures show for the time to populate the indexes with a CamelCase tree?

> Once we switch from B-trees
> to tries (which I'm planning for some future version), we would have the
> option of using a permuterm index which would let us accelerate any sort of
> wildcard search rather than just prefix searches.

I believe the same would go for supporting regex searches (if we decide to support regex searches sometime in the future) - I don't see a way of supporting regex searches with the BTree structure - but with this new data structure proposed, is it possible to support an arbitrary regex search? [note: currently we don't have regex support save for the very limited case of wildcards]
> 
> http://nlp.stanford.edu/IR-book/html/htmledition/permuterm-indexes-1.html

Thanks for sharing the link. Will take a look.

> But for now I plan to just add additional b-trees for each kind of prefix
> search we may want to support.
Comment 101 Sergey Prigogin CLA 2016-04-06 20:24:11 EDT
(In reply to Manoj Palat from comment #100)
When the related pages are already in the cache, a full BTree scan is pretty fast and can be used for regex and other advanced searches. One possible alternative to using more advanced indexes is to just pre-warm the page cache on startup.
Comment 102 Stefan Xenos CLA 2016-04-06 20:31:28 EDT
> What do your initial perf figures show the time for populating the indexes
> with a CamelCase tree?

The tree is being populated already. All the trees are currently case-insensitive and will work perfectly for CamelCase searches. The complete indexing time currently averages out to around 1.1ms per class and that includes the time for populating all the search trees. I haven't bothered collecting separate numbers for each search tree since that's already really darn fast.

> but with the this new data structure proposed is it possible to support
> an arbitrary regex search?

You're talking about a permuterm index? Probably. I haven't done a literature search on that yet... but bear in mind I'm not planning to implement a permuterm index in the near future. That would be an enhancement for the distant future, if and when we decided that we want to optimize the index for fast wildcard searches on large indices.

The first version of the index -- the one we have here -- is tuned to be fastest for prefix searches - including camelcase, case sensitive, and case insensitive searches. Arbitrary regexes are supported and will still be faster than they are now, but they'll degenerate to linear time in the worst case (like they do now).
Comment 103 Stefan Xenos CLA 2016-04-06 20:39:21 EDT
> a full BTree scan is pretty fast and can be used for regex and other
> advanced searches

Sergey is right: even when running a regex search that degenerates to linear time, it's still very quick compared with the I/O that such searches currently do. There would be a lot of other stuff to optimize before this ended up as the biggest remaining bottleneck.
Comment 104 Lars Vogel CLA 2016-04-07 02:05:17 EDT
Stefan and Manoj,

as an Eclipse user I have been following this discussion for a long time now. It feels to me that Manoj's review is very detailed and that Stefan reacts very promptly in fixing the issues Manoj identifies.

Personally I would like to have the option to use this new index (and I also have customers who would like to try it out). 

Would it be possible to merge the development of Stefan into JDT core and make it available by an environment variable or (hidden?) preference value? 

This way the "innovative" people can try it easily and provide more feedback without affecting the stability of the established workflow.
Comment 105 Dani Megert CLA 2016-04-07 02:36:25 EDT
(In reply to Lars Vogel from comment #104)
> Stefan and Manoj,
> 
> as a Eclipse user I'm following this discussion now for a long time. It
> feels to me that the review of Manoj is very detailed and that Stefan reacts
> very promptly in fixing the issues identified by Manoj.
> 
> Personally I would like to have the option to use this new index (and I have
> also customers which would like to try it out). 
> 
> Would it be possible to merge the development of Stefan into JDT core and
> make it available by an environment variable or (hidden?) preference value? 
> 
> This way the "innovative" people can try it easily and provide more feedback
> without affecting the stability of the established workflow.

We first have to have something reviewed, running and tested by the JDT team for a while before doing that and M7 is definitely too late. Innovative people can take the work from the github repo. We also have the option to offer a preview as feature patch for Neon on the Marketplace.
Comment 106 Manoj N Palat CLA 2016-04-07 04:50:14 EDT
(In reply to Stefan Xenos from comment #103)
> > a full BTree scan is pretty fast and can be used for regex and other
> > advanced searches
> 
> Sergey is right: even when running a regex search that degenerates to linear
> time it's still very quick compared with the I/O that such currently do.

@Stefan and Sergey: a fast BTree scan addresses my concern. Thanks for the immediate response as well.
Comment 107 Lars Vogel CLA 2016-04-07 05:54:54 EDT
(In reply to Dani Megert from comment #105)
> We first have to have something reviewed, running and tested by the JDT team 
> for a while before doing that and M7 is definitely too late. Innovative people
> can take the work from the github repo. We also have the option to offer a
> preview as feature patch for Neon on the Marketplace.

The proposal was to have it disabled by default but allow enabling it, for example via an environment flag. Similar to -Dswt.enable.autoScale=false for HDPI.

Not sure if that was clear. From your response, it feels like you are proposing a similar approach (feature patch), only a little bit more complex and less flexible.

I would think that if it is an opt-in, that is actually better than a feature patch. A feature patch is harder to create and needs additional installation effort to enable/disable it. Also, such a new feature patch cannot be combined with the Java 9 feature patch.
Comment 108 Dani Megert CLA 2016-04-07 09:30:11 EDT
(In reply to Lars Vogel from comment #107)
> (In reply to Dani Megert from comment #105)
> > We first have to have something reviewed, running and tested by the JDT team 
> > for a while before doing that and M7 is definitely too late. Innovative people
> > can take the work from the github repo. We also have the option to offer a
> > preview as feature patch for Neon on the Marketplace.
> 
> The proposal was to have it disabled by default but allow to enable it for
> example via a environment flag. Similar to -Dswt.enable.autoScale=false for
> HDPI.
> 
> Not sure if that was clear. From your response, as it feels like your are
> proposal a similar approach (feature patch), only a little bit more complex
> and less flexible.
> 
> I would think that if it is a op-in, that is actually better than a feature
> patch. Using  a feature patch is harder to create and needs additional
> installation effort to enable / disable it. Also such a new feature patch
> cannot be combined with the Java9 feature patch.

We don't release work that's still in the works and that hasn't been tested over some milestones by the JDT developers. We're at M7 now. The SWT example is the other way around: we expect this to work, but have a backdoor in case it fails.
Comment 109 Lars Vogel CLA 2016-04-07 10:26:37 EDT
(In reply to Dani Megert from comment #108)
> We don't release work that's still in the works and that hasn't been tested
> over some milestones by the JDT developers. We're at M7 now. The SWT example
> is the other way around: we expect this to work, but have a backdoor in case
> it fails.

Thanks for the clarification.
Comment 110 Stefan Xenos CLA 2016-04-09 00:38:59 EDT
Re: Comment 92

I've updated the design doc with an updated diagram, changed the code references to reflect the current class names, and fixed a few out-of-date paragraphs and code samples.

I think there's a lot more that should go into the doc (such as how external references to database entities are stored and more info about the java-specific bits), but the bits that are in there should now be closer to reality.
Comment 111 Stefan Xenos CLA 2016-04-09 00:40:32 EDT
BTW, Manoj if you (or anyone) would like to make comments on the design doc, you should have permission to annotate the google doc. I'll update it with any comments I find there.
Comment 112 Manoj N Palat CLA 2016-04-11 04:19:38 EDT
(In reply to Stefan Xenos from comment #111)
> BTW, Manoj if you (or anyone) would like to make comments on the design doc,
> you should have permission to annotate the google doc. I'll update it with
> any comments I find there.

Thanks Stefan. I will take a look. With that we can plan to close the review of the changes attributed to this bug so far.
For further changes, please create smaller dependent bugs (dependent on this bug, or vice versa, depending on the bug) whenever a new feature is added or a design change or optimization (or a set of optimizations) is done - that helps manage these changes more effectively.
Comment 113 Stefan Xenos CLA 2016-04-11 12:53:34 EDT
> With that we can plan to close the review of the changes so far
> attributed to this bug.

Am I correct in understanding that this means the code currently in my branch has been reviewed and accepted by JDT? Are you giving it a +2, in the gerrit sense?

> For further changes, you can create smaller dependent (on this bug or
> vice versa depending on the bug) bugs

Will do.
Comment 114 Stefan Xenos CLA 2016-04-11 22:24:00 EDT
I've spent the day looking into the problem of merging the results between the old and new indices, and I think I've found an easy way to make it work.

JDT core has the concept of a search scope, which allows you to filter what paths are included in search results. My plan is:

- Implement a new scope which decorates another scope, removing all results that would be covered by the new indexer.
- Keep all the old query methods around. The "new" versions will first consult the new index and then invoke the old version using a decorated scope.

Short term, I think this should work and will get us correct results.

Long term, I hate it. The problem with this as a long-term solution is that it prevents us from cleaning up the code related to querying the old index. It basically doubles the code size for index access, which is awful.

IMO, it would be a good idea to start writing a source indexer for the new index very soon so that the old index could be deleted entirely and such merging code would become unnecessary.

Unless someone would like to offer an alternative, my plan is to proceed with this as the mechanism for result merging.
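
To make the decoration idea concrete, here is a minimal sketch with java.util.function.Predicate standing in for the real search-scope type; the names and the coverage check are placeholders rather than JDT API:

import java.util.function.Predicate;

// Sketch: filter out of the old-index query anything already covered by the new index.
public class FilteredScopeSketch {
	static Predicate<String> excludeNewIndexCoverage(Predicate<String> original,
			Predicate<String> coveredByNewIndex) {
		return path -> original.test(path) && !coveredByNewIndex.test(path);
	}

	public static void main(String[] args) {
		Predicate<String> workspaceScope = path -> true;
		Predicate<String> coveredByNewIndex = path -> path.endsWith(".jar"); // placeholder check

		Predicate<String> oldIndexScope = excludeNewIndexCoverage(workspaceScope, coveredByNewIndex);
		System.out.println(oldIndexScope.test("/project/src/Foo.java")); // true: still handled by the old index
		System.out.println(oldIndexScope.test("/libs/guava.jar"));       // false: results come from the new index
	}
}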
Comment 115 Stefan Xenos CLA 2016-04-13 00:12:54 EDT
I just pushed the code for merging the result sets from the new and old index to bug 491558.
Comment 116 Manoj N Palat CLA 2016-04-14 23:23:38 EDT
(In reply to Manoj Palat from comment #112)
> (In reply to Stefan Xenos from comment #111)
> > BTW, Manoj if you (or anyone) would like to make comments on the design doc,
> > you should have permission to annotate the google doc. I'll update it with
> > any comments I find there.
> 
> Thanks Stefan. I would take a look. With that we can plan to close the
> review of the changes so far attributed to this bug.

Stefan asked me whether the review is over - I have reviewed the code up until last week and I found the code good. Thanks a lot Stefan and Sergey!
So what is pending:
a) design doc review and b) the review of the code changes post 7th April
Planning to do these post M7.
Comment 117 Sergey Prigogin CLA 2016-04-21 14:16:21 EDT
*** Bug 442924 has been marked as a duplicate of this bug. ***
Comment 118 Stefan Xenos CLA 2016-05-03 23:24:12 EDT
I've spent the past few days working on error detection. Most of the code for reads is written, and I'll be working on the updated algorithm for page writes shortly.

I'll post updates on bug 492941.
Comment 119 Stefan Xenos CLA 2016-05-10 22:29:08 EDT
After benchmarking various CRC approaches, I've abandoned them in favor of a "lock flag" similar to what CDT uses. If that's not sufficient, we can always go back to my CRC approach, which is finished and working. See bug 492941 for details.
Comment 120 Stefan Xenos CLA 2016-05-27 13:18:56 EDT
I've been experimenting with various eventing approaches in the new index. I wanted to change the timing of java deltas so that anything that might open a jar file in response to a delta would only do so after the index had finished indexing (and caching) the contents of the jar file.

My current approach works like this:

DeltaProcessor no longer fires any change events for external archives (although it still fires the add and remove events). These events are now fired at the end of indexing by the indexer itself.

Bug 492488 holds the state of the implementation.

If anyone has opinions about this approach, I'd appreciate the feedback.
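
As a rough illustration of the timing change (hypothetical names, not the real DeltaProcessor or indexer code): change notifications for external archives are queued and only fired once indexing of those archives has finished.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class DeferredJarEventsSketch {
	private final List<String> pendingChangedJars = new ArrayList<>();
	private final Consumer<String> changeListener;

	DeferredJarEventsSketch(Consumer<String> changeListener) {
		this.changeListener = changeListener;
	}

	void onExternalJarChanged(String jarPath) {
		pendingChangedJars.add(jarPath); // no event yet; wait for indexing to finish
	}

	void onIndexingFinished() {
		// Anything that opens the jar in response now hits the freshly indexed (and cached) data.
		pendingChangedJars.forEach(changeListener);
		pendingChangedJars.clear();
	}

	public static void main(String[] args) {
		DeferredJarEventsSketch sketch =
				new DeferredJarEventsSketch(jar -> System.out.println("changed: " + jar));
		sketch.onExternalJarChanged("/libs/foo.jar");
		sketch.onIndexingFinished(); // listener runs only now
	}
}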
Comment 121 Stefan Xenos CLA 2016-06-14 09:12:33 EDT
We're hoping to have the index merged into master by the end of next week. The code is ready to go, but it's waiting on:

- An updated code review.
- A resolution to bug 490010.
- Someone to commit the attached JDT UI patch.
Comment 122 Dennis CLA 2016-06-17 02:37:12 EDT
Hi all, I've read the design document and the slide from https://www.infoq.com/presentations/java-jdt-index

I have a suggestion regarding the index.
The design document suggests that data is "replaced atomically by the new one" and that previous versions of records are deleted.

Can we keep the previous copies (immutable pattern) and reuse them later?

In my case, I often checkout/switch to different branches on a Java project.
A timestamp index will force a long wait for re-indexing (and re-compile) after every checkout.

If it is possible, adding an indirect index with a static checksum (MD5 if it is cheap, or the JAR meta header?) would allow reusing the index across branches or projects if the JAR files are identical.
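
A rough sketch of what I mean - hash the jar contents so that identical jars map to the same index entry regardless of path or timestamp (illustrative only; this is not how the current index keys its entries):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class JarChecksumSketch {
	public static String md5Of(String jarPath) throws Exception {
		MessageDigest md5 = MessageDigest.getInstance("MD5");
		try (InputStream in = Files.newInputStream(Paths.get(jarPath))) {
			byte[] buffer = new byte[8192];
			int read;
			while ((read = in.read(buffer)) != -1) {
				md5.update(buffer, 0, read);
			}
		}
		StringBuilder hex = new StringBuilder();
		for (byte b : md5.digest()) {
			hex.append(String.format("%02x", b));
		}
		return hex.toString();
	}

	public static void main(String[] args) throws Exception {
		// Same jar bytes => same key => the cached index data could be reused.
		System.out.println(md5Of(args[0]));
	}
}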
Comment 123 Manoj N Palat CLA 2016-06-20 01:56:00 EDT
(In reply to Stefan Xenos from comment #121)
> We're hoping to have the index merged into master by the end of next week.
> The code is ready to go, but it's waiting on:
> 
> - An updated code review.

Stefan: From the history, I see that the commits since the last review have been made under various bugs (thanks) - such as bug 492488, 492503, 492504, 492506, 495557, 495817, 496044, 496129, 496142, 490010 - 16 commits spread over 10 bugs, plus a few merges and an NPE commit. I'm planning to give the review comments in the respective bugs for easier tracking, and general review comments, if any, here.
Comment 124 Manoj N Palat CLA 2016-06-20 07:55:58 EDT
(In reply to Manoj Palat from comment #123)
> bugs, a few merges and an npe commit - planning to give the review comments
> in the respective bugs for easier tracking and general review comments if
> any here.

https://github.com/sxenos/eclipse.jdt.core/commit/94ef3213e4ac19488a3c6f924c7e0232d53bf4c7 - This commit does not have a bug - it is an NPE fix. Please do provide the stack trace for future reference when fixing NPE issues. No additional comments for this commit - +1.
Comment 125 Stefan Xenos CLA 2016-06-20 11:38:47 EDT
> Can we keep the previous copies (immutable pattern) and reuse them later?

We could consider this as a future enhancement, based on actual measurements of the reindexing time in this scenario.
Comment 126 Lars Vogel CLA 2016-06-30 11:47:51 EDT
Can I ask what the status is on this development? I would love to use / test the new index in my daily work.
Comment 127 Stefan Xenos CLA 2016-06-30 13:02:35 EDT
Re: comment 126

Manoj did a round of code reviews late last week and gave some great feedback. I finished making the last of the requested changes yesterday. It's possible that there may be another round of change requests, but I suspect the code is in good shape.

We had an email thread last night discussing the merge date. Some of the JDT committers have holidays coming up so we're going to delay it until they get back. The current plan is to do the merge Wednesday July 6th.

Until that time, I still have a couple changes I want to get in:

- I temporarily disabled the fix for bug 495062 because I was concerned about a possible race condition. Since I have another week to work now I'll try to investigate that corner case and restore the change.

- There is a unit test failure in one of the new test cases I added for bug 496044. I'll be looking at this today.
Comment 128 Stefan Xenos CLA 2016-06-30 14:42:31 EDT
> There is a unit test failure in one of the new test cases

Actually, that test failure seems to be gone.
Comment 129 Manoj N Palat CLA 2016-07-04 23:12:55 EDT
@Stefan: I still see quite a lot of warnings (400 odd) in the code. Please clean up and merge the code only after these warnings are fixed.
Comment 130 Manoj N Palat CLA 2016-07-04 23:25:41 EDT
(In reply to Manoj Palat from comment #129)
> 400 odd) in the code. 
 ^^^^^
Around 200+ are in the modified code - so only those need to be cleared.
Comment 131 Stefan Xenos CLA 2016-07-05 17:08:32 EDT
Works for me, Manoj. I'll sort out the warnings.

FYI, I'm getting 53 junit errors and 240 test failures on my branch at the moment... but it's weird: the test failures occur even if I completely disable the indexer and all reads from the index. They also go away when I try to run individual tests directly -- I only see them when I run the entire (hour-long) test suite from the start.

The majority of the test failures seem to be build path errors (where it can't find java.lang.Object). I'm starting to suspect that the PDE classpath containers are misbehaving when I'm running the tests locally, but I may need some help investigating this further.

I'll start by squash-merging all the changes on my branch and pushing them to gerrit so that we can see if jenkins is seeing the same errors.
Comment 132 Stefan Xenos CLA 2016-07-05 18:27:04 EDT
Manoj, given the presence of the unit test failures and your comments in bug 496044, I don't think I'm going to try to do the big merge tomorrow.

I'll need more than the available time to investigate things.
Comment 133 Eclipse Genie CLA 2016-07-06 00:01:58 EDT
New Gerrit change created: https://git.eclipse.org/r/76660
Comment 134 Eclipse Genie CLA 2016-07-06 00:02:07 EDT
New Gerrit change created: https://git.eclipse.org/r/76661
Comment 135 Stefan Xenos CLA 2016-07-07 13:22:47 EDT
Many of the test failures relate to my handling of the JDT cache (in JavaModelManager). I've posted a proposal on bug 497513. Suggestions and comments are welcome.
Comment 136 Lars Vogel CLA 2016-07-22 04:25:50 EDT
Adding Tobias as an interested party. He reports that the current JDT indexer frequently blocks their workspace for ~40 min.
Comment 137 Stefan Xenos CLA 2016-07-27 12:04:17 EDT
Moving to M2. Turning the preference on by default revealed a bunch of test failures, and the test failures are taking a while to sort through.
Comment 138 Stefan Xenos CLA 2016-08-22 18:35:37 EDT
I'm having some trouble getting the builds to run. They seem to freeze midway.

Could someone with root access to the build servers please ssh into the one running the build for this patch and grab a stack trace from the JVM that is running the unit tests? I'd like to see what all the threads are doing while the build is frozen.

Specifically, the tests for the latest build attached here:

https://git.eclipse.org/r/#/c/76660/
Comment 139 Jay Arthanareeswaran CLA 2016-08-22 23:45:38 EDT
I don't have shell access. Copying Markus in case he can spare some time for this.
Comment 140 Stefan Xenos CLA 2016-08-26 15:06:33 EDT
I added some support in bug 500358 to help track down JDT test deadlocks.
Comment 141 Stefan Xenos CLA 2016-08-26 21:51:39 EDT
Created attachment 263807 [details]
Stack dump from the frozen test case
Comment 142 Stefan Xenos CLA 2016-08-26 22:13:20 EDT
I think I've sorted out the deadlocks. See bug 500365.
Comment 143 Manoj N Palat CLA 2016-09-06 10:55:46 EDT
Targeting early M3 and moving out of M2, as we have M2 next week.
Comment 144 Stefan Xenos CLA 2016-09-06 12:06:55 EDT
> To target early M3 and moving out of M2 as we have M2 next week.

Actually, this might still make it into M2. I only have 4 remaining test failures and I have patches in progress for 3 of them. There's a chance I'll have this ready by the end of the week, and if so I'd like to get it into M2.

Any objection if we leave this targeted to M2 until Friday? If I don't have all the tests fixed by then, we can bump to M3.
Comment 145 Manoj N Palat CLA 2016-09-06 23:18:56 EDT
(In reply to Stefan Xenos from comment #144)
> 
> Any objection if we leave this targetted to M2 until Friday? 
Yes. We have too little time to do a round of testing for M2 for a bug of this size. You can commit this during the early weeks of M3.

>If I don't have
> all the tests fixed by then, we can bump to M3.
I've moved it to M3

Can you list here the features that are expected to be complete (with the new indexer) by the end of this bug (e.g. Type Hierarchy)?
Comment 146 Stefan Xenos CLA 2016-09-07 00:31:19 EDT
> Can you list down here what are the features that are expected to be
> complete (with the new indexer) for by the end of this bug (eg: Type
> Hierarchy)?

The initial merge will only include the indexer, basic infrastructure, and adoption in the type hierarchy view. The initial performance improvements in the type hierarchy will be modest.

After the code is merged, I plan to close this bug and start tracking additional features in separate bugs.

In order to get the massive speed improvements we demonstrated at EclipseCon, we also need the cache improvements described in bug 497513. Those are in progress but aren't stable enough to merge yet... and I'll track that work in that bug rather than this one.

For the past few months I've only been fixing test failures, so the feature list hasn't changed in some time.

> I've moved it to M3

Fair enough.
Comment 147 Lars Vogel CLA 2016-09-21 03:05:26 EDT
(In reply to Manoj Palat from comment #145)
> 
> I've moved it to M3

Sounded like it was almost ready for M2. What is the status? Can it be merged?
Comment 148 Stefan Xenos CLA 2016-09-21 08:48:42 EDT
> What is the status? Can it be merged?

In last night's build, there was only one remaining test failure. I'll be investigating it today.

After that I want to look at previous intermittent failures that are no longer occurring (in case some of them are bugs which are still present but aren't flaking at the moment).

Assuming all goes well, I can probably do the merge on Monday.
Comment 149 Jay Arthanareeswaran CLA 2016-09-26 05:44:05 EDT
Created attachment 264408 [details]
Failing jdt.ui tests

I went through the model related changes and the changes look good to me. Just a bunch of points I noted down:

1. There are some TODOs left in DeltaProcessor, ClassFile etc. Is there a plan to address these in the time frame of this bug? Otherwise, it would be good to raise a follow-up bug to handle them later.

2. Just curious about JavaElementDelta#ignoreFromTests. Doesn't seem to be used anywhere except a toString() method. Is it meant to be part of the patch?

Couple of notes to self:
  1) Changes in ClassFile about reading class content need to be looked at closely when merging into BETA_JAVA9.

 2) To remind Stephan to review the changes about ExternalAnnotationProvider in ClasspathJar and ClasspathDirectory. Seem alright to me, though.

And finally, I ran the jdt.ui tests and there were 3 failures in total. I am attaching the export here.
Comment 150 Jay Arthanareeswaran CLA 2016-09-26 06:51:28 EDT
Another point:

What should be the version for the @since tags? This should be 3.13 as opposed to 3.12.
Comment 151 Stefan Xenos CLA 2016-09-26 12:12:13 EDT
>  2) To remind Stephan to review the changes about
> ExternalAnnotationProvider in ClasspathJar and ClasspathDirectory. Seem
> alright to me, though.

I think he'll probably be okay with those changes since they came from his update to the patch for bug 490010, which I cherry-picked into my branch. He either authored or already reviewed them all as part of that earlier bug.

> What should be the version for the @since tags? This should be 3.13 as
> opposed to 3.12.

I don't think it matters since this patch introduces no new API. I just put the @since tags there to stop the API tooling from complaining about false positives during incremental builds. But I'm happy to change them to 3.13 (or delete them) if you like.

> Just curious about JavaElementDelta#ignoreFromTests. Doesn't seem to be
> used anywhere except a toString() method. Is it meant to be part of the patch?

Yes. It's required, or a bunch of tests fail. There are a number of tests that verify the event order, and they do this using said toString method. These new events can be fired asynchronously at any time and are unrelated to the behaviors being tested by those tests.

This flag is a temporary thing, though. We plan to eventually change the event order to eliminate the current double notifications that the index does for external .jar files. At that point, we will update all the tests related to event order and remove this flag.
Comment 152 Stefan Xenos CLA 2016-09-26 12:23:46 EDT
To those following here:

After some discussion on the mailing list, we decided not to merge today. There are some items I've been asked to look into prior to merging. I'll be doing the merge after we get a good I-build and after I've identified the problem with the frozen builds and the UI suite test failures.
Comment 153 Stefan Xenos CLA 2016-09-26 21:50:40 EDT
> And finally, I ran the jdt.ui tests and there were 3 failures in total.

Problems reproduced.
Comment 154 Stefan Xenos CLA 2016-09-27 01:26:37 EDT
The test failures are due to a real issue in the index.

I implemented IBinaryType.getFileName() according to the JavaDoc (in IDependent.getFileName()), which describes the string as a filename followed by an optional separator and binary type name. However, the actual usage in the JDT UI expects the first portion of the string to be an IJavaElement handle rather than a filename.

This is definitely a bug in that the required contract on getFileName() doesn't match its JavaDoc... but I'll roll with it and update the index to return the same sort of string that JDT UI expects.
Comment 155 Stefan Xenos CLA 2016-09-27 01:32:08 EDT
Actually, it may make sense to change it the other way -- to make getFileName() actually return the sort of strings its JavaDoc claims it will return in all cases and update the callers to handle real filenames.
Comment 156 Stephan Herrmann CLA 2016-09-27 07:55:03 EDT
(In reply to Stefan Xenos from comment #151)
> >  2) To remind Stephan to review the changes about
> > ExternalAnnotationProvider in ClasspathJar and ClasspathDirectory. Seem
> > alright to me, though.
> 
> I think he'll probably be okay with those changes since they came from his
> update to the patch for bug 490010, which I cherry-picked into my branch. He
> either authored or already reviewed them all as part of that earlier bug.

I already have bug 500024 to follow up, once this is merged :)
Comment 157 Stefan Xenos CLA 2016-09-27 10:48:14 EDT
Re: comment 149

I've fixed the UI failures in bug 502259.
Comment 158 Eclipse Genie CLA 2016-09-27 10:58:53 EDT
New Gerrit change created: https://git.eclipse.org/r/82003
Comment 159 Manoj N Palat CLA 2016-09-30 00:23:59 EDT
(In reply to Eclipse Genie from comment #158)
> New Gerrit change created: https://git.eclipse.org/r/82003

@Stefan:
a) What does this patch fix?
b) The patch shows failures - can you comment on the same?
Comment 160 Manoj N Palat CLA 2016-09-30 00:34:52 EDT
(In reply to Manoj Palat from comment #159)
> (In reply to Eclipse Genie from comment #158)
> > New Gerrit change created: https://git.eclipse.org/r/82003
> 
> @Stefan:
> a) What does this patch fix?
> b) The patch shows failures - can you comment on the same?

If this patch is abandoned, please create a new patch for merging to master, upload it to gerrit, and let the tests run.
Comment 161 Manoj N Palat CLA 2016-09-30 00:41:01 EDT
The @since tag would now have to be 3.13 instead of 3.12, e.g. for the type Nd and wherever applicable.
Comment 162 Stefan Xenos CLA 2016-09-30 01:28:32 EDT
> What does this patch fix?

Nothing. A few days ago the build machines were locking up so I uploaded a new copy of the patch as part of the diagnostics. Please disregard.
Comment 163 Stefan Xenos CLA 2016-09-30 01:31:44 EDT
You can find the latest patch here. The last two builds were successful.

https://git.eclipse.org/r/#/c/76660/

However, I plan to merge the original branch if gerrit will let me - not this patch. That way the change history will be preserved.
Comment 164 Lars Vogel CLA 2016-09-30 02:44:31 EDT
(In reply to Stefan Xenos from comment #163)
> However, I plan to merge the original branch if gerrit will let me - not
> this patch. That way the change history will be preserved.

You can push directly to Git. Do not use Gerrit for this, otherwise you will create a review for each commit in your branch.
Comment 165 Till Brychcy CLA 2016-09-30 10:15:50 EDT
I've fetched the gerrit change and worked a bit with it and after a time got an NPE:

java.lang.NullPointerException
	at org.eclipse.jdt.internal.core.nd.java.NdType.getSourceName(NdType.java:211)
	at org.eclipse.jdt.internal.core.nd.java.model.IndexBinaryType.getSourceName(IndexBinaryType.java:319)
	at org.eclipse.jdt.internal.compiler.classfmt.ExternalAnnotationDecorator.getSourceName(ExternalAnnotationDecorator.java:126)
	at org.eclipse.jdt.internal.compiler.lookup.BinaryTypeBinding.<init>(BinaryTypeBinding.java:279)
	at org.eclipse.jdt.internal.compiler.lookup.BinaryTypeBinding.<init>(BinaryTypeBinding.java:249)
	at org.eclipse.jdt.internal.compiler.lookup.LookupEnvironment.createBinaryTypeFrom(LookupEnvironment.java:692)
	at org.eclipse.jdt.internal.compiler.lookup.LookupEnvironment.createBinaryTypeFrom(LookupEnvironment.java:688)
	at org.eclipse.jdt.internal.compiler.Compiler.accept(Compiler.java:301)
	at org.eclipse.jdt.internal.compiler.lookup.LookupEnvironment.askForType(LookupEnvironment.java:154)
	at org.eclipse.jdt.internal.compiler.lookup.UnresolvedReferenceBinding.resolve(UnresolvedReferenceBinding.java:102)
	at org.eclipse.jdt.internal.compiler.lookup.BinaryTypeBinding.resolveType(BinaryTypeBinding.java:212)
	at org.eclipse.jdt.internal.compiler.lookup.BinaryTypeBinding.superInterfaces(BinaryTypeBinding.java:1906)
	at org.eclipse.jdt.internal.compiler.lookup.ClassScope.detectHierarchyCycle(ClassScope.java:1259)
	at org.eclipse.jdt.internal.compiler.lookup.ClassScope.connectEnumSuperclass(ClassScope.java:997)
	at org.eclipse.jdt.internal.compiler.lookup.ClassScope.connectSuperclass(ClassScope.java:949)
	at org.eclipse.jdt.internal.compiler.lookup.ClassScope.connectTypeHierarchy(ClassScope.java:1115)
	at org.eclipse.jdt.internal.compiler.lookup.ClassScope.connectMemberTypes(ClassScope.java:922)
	at org.eclipse.jdt.internal.compiler.lookup.ClassScope.connectTypeHierarchy(ClassScope.java:1124)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.connectTypeHierarchy(CompilationUnitScope.java:324)
	at org.eclipse.jdt.internal.compiler.lookup.LookupEnvironment.completeTypeBindings(LookupEnvironment.java:247)
	at org.eclipse.jdt.internal.compiler.Compiler.internalBeginToCompile(Compiler.java:843)
	at org.eclipse.jdt.internal.compiler.Compiler.beginToCompile(Compiler.java:385)
	at org.eclipse.jdt.internal.compiler.Compiler.resolve(Compiler.java:999)
	at org.eclipse.jdt.internal.compiler.Compiler.resolve(Compiler.java:1075)
	at org.eclipse.jdt.internal.core.CompilationUnitProblemFinder.process(CompilationUnitProblemFinder.java:205)
	at org.eclipse.jdt.internal.core.CompilationUnitProblemFinder.process(CompilationUnitProblemFinder.java:271)
	at org.eclipse.jdt.internal.core.ReconcileWorkingCopyOperation.makeConsistent(ReconcileWorkingCopyOperation.java:191)
	at org.eclipse.jdt.internal.core.ReconcileWorkingCopyOperation.executeOperation(ReconcileWorkingCopyOperation.java:99)
	at 
[...]

In the debugger, clicking on the NDType said "Serializable".

After a clean build, the problem disappeared.

I cannot reproduce it yet, but will continue to work with the new index.
Comment 166 Stefan Xenos CLA 2016-09-30 11:06:51 EDT
I've extracted bug 502884 to cover comment 165.
Comment 167 Manoj N Palat CLA 2016-10-03 09:59:05 EDT
I am getting an IndexException (IOException) - yet to figure out how to reproduce.


org.eclipse.jdt.internal.core.nd.db.IndexException: IOException
	at org.eclipse.jdt.internal.core.nd.db.Chunk.read(Chunk.java:43)
	at org.eclipse.jdt.internal.core.nd.db.Database.getChunk(Database.java:317)
	at org.eclipse.jdt.internal.core.nd.db.Database.getShort(Database.java:574)
	at org.eclipse.jdt.internal.core.nd.field.FieldShort.get(FieldShort.java:28)
	at org.eclipse.jdt.internal.core.nd.NdNode.load(NdNode.java:65)
	at org.eclipse.jdt.internal.core.nd.field.FieldOneToMany.get(FieldOneToMany.java:113)
	at org.eclipse.jdt.internal.core.nd.field.FieldOneToMany.accept(FieldOneToMany.java:85)
	at org.eclipse.jdt.internal.core.nd.field.FieldOneToMany.asList(FieldOneToMany.java:92)
	at org.eclipse.jdt.internal.core.nd.java.NdBinding.getVariables(NdBinding.java:57)
	at org.eclipse.jdt.internal.core.nd.java.model.IndexBinaryType.getFields(IndexBinaryType.java:162)
	at org.eclipse.jdt.internal.compiler.classfmt.ExternalAnnotationDecorator.getFields(ExternalAnnotationDecorator.java:91)
	at org.eclipse.jdt.internal.compiler.lookup.BinaryTypeBinding.cachePartsFrom(BinaryTypeBinding.java:522)
	at org.eclipse.jdt.internal.compiler.lookup.LookupEnvironment.createBinaryTypeFrom(LookupEnvironment.java:705)
	at org.eclipse.jdt.internal.compiler.lookup.LookupEnvironment.createBinaryTypeFrom(LookupEnvironment.java:688)
	at org.eclipse.jdt.internal.codeassist.impl.Engine.accept(Engine.java:65)
	at org.eclipse.jdt.internal.compiler.lookup.LookupEnvironment.askForType(LookupEnvironment.java:178)
	at org.eclipse.jdt.internal.compiler.lookup.PackageBinding.getTypeOrPackage(PackageBinding.java:201)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.findImport(CompilationUnitScope.java:466)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.findSingleStaticImport(CompilationUnitScope.java:523)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.findSingleImport(CompilationUnitScope.java:519)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.faultInImports(CompilationUnitScope.java:397)
	at org.eclipse.jdt.internal.compiler.lookup.Scope.getBinding(Scope.java:2124)
	at org.eclipse.jdt.internal.compiler.lookup.BlockScope.getBinding(BlockScope.java:485)
	at org.eclipse.jdt.internal.compiler.ast.QualifiedNameReference.resolveType(QualifiedNameReference.java:999)
	at org.eclipse.jdt.internal.compiler.ast.MemberValuePair.resolveTypeExpecting(MemberValuePair.java:89)
	at org.eclipse.jdt.internal.compiler.ast.Annotation.resolveType(Annotation.java:829)
	at org.eclipse.jdt.internal.compiler.ast.ASTNode.resolveAnnotations(ASTNode.java:837)
	at org.eclipse.jdt.internal.compiler.ast.ASTNode.resolveAnnotations(ASTNode.java:705)
	at org.eclipse.jdt.internal.compiler.lookup.SourceTypeBinding.getAnnotationTagBits(SourceTypeBinding.java:1022)
	at org.eclipse.jdt.internal.compiler.lookup.LookupEnvironment.initializeUsesNullTypeAnnotation(LookupEnvironment.java:1176)
	at org.eclipse.jdt.internal.compiler.lookup.LookupEnvironment.usesNullTypeAnnotations(LookupEnvironment.java:1151)
	at org.eclipse.jdt.internal.compiler.lookup.SourceTypeBinding.evaluateNullAnnotations(SourceTypeBinding.java:2121)
	at org.eclipse.jdt.internal.compiler.ast.ASTNode.resolveAnnotations(ASTNode.java:707)
	at org.eclipse.jdt.internal.compiler.lookup.SourceTypeBinding.getAnnotationTagBits(SourceTypeBinding.java:1022)
	at org.eclipse.jdt.internal.compiler.lookup.SourceTypeBinding.checkAnnotationsInType(SourceTypeBinding.java:887)
	at org.eclipse.jdt.internal.compiler.lookup.SourceTypeBinding.faultInTypesForFieldsAndMethods(SourceTypeBinding.java:898)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.faultInTypes(CompilationUnitScope.java:448)
	at org.eclipse.jdt.internal.codeassist.SelectionEngine.select(SelectionEngine.java:1028)
	at org.eclipse.jdt.internal.core.Openable.codeSelect(Openable.java:163)
	at org.eclipse.jdt.internal.core.ClassFile.codeSelect(ClassFile.java:186)
	at org.eclipse.jdt.internal.core.ClassFile.codeSelect(ClassFile.java:175)
	at org.eclipse.jdt.internal.ui.text.java.hover.AbstractJavaEditorTextHover.getJavaElementsAt(AbstractJavaEditorTextHover.java:121)
	at org.eclipse.jdt.internal.ui.text.java.hover.JavadocHover.internalGetHoverInfo(JavadocHover.java:631)
	at org.eclipse.jdt.internal.ui.text.java.hover.JavadocHover.getHoverInfo2(JavadocHover.java:627)
	at org.eclipse.jdt.internal.ui.text.java.hover.BestMatchHover.getHoverInfo2(BestMatchHover.java:164)
	at org.eclipse.jdt.internal.ui.text.java.hover.BestMatchHover.getHoverInfo2(BestMatchHover.java:130)
	at org.eclipse.jdt.internal.ui.text.java.hover.JavaEditorTextHoverProxy.getHoverInfo2(JavaEditorTextHoverProxy.java:86)
	at org.eclipse.jface.text.TextViewerHoverManager$4.run(TextViewerHoverManager.java:165)
Caused by: java.nio.channels.ClosedByInterruptException
	at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
	at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:713)
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:691)
	at org.eclipse.jdt.internal.core.nd.db.Database.read(Database.java:188)
	at org.eclipse.jdt.internal.core.nd.db.Chunk.read(Chunk.java:41)
	... 47 more
Comment 168 Jay Arthanareeswaran CLA 2016-10-03 10:07:19 EDT
(In reply to Jay Arthanareeswaran from comment #149)
> Created attachment 264408 [details]
> Failing jdt.ui tests

These no longer occur with the latest changeset.
Comment 169 Manoj N Palat CLA 2016-10-03 11:09:23 EDT
(In reply to Manoj Palat from comment #167)
> I am getting an IndexException (IOException) - yet to figure out how to
> reproduce.
> 
Not always reproducible, but it can sometimes be reproduced by the following:

Have a project dependent on org.eclipse.team.ui.mapping.DiffTreeChangesSection
Get the DTCS, click on the runnable of the constructor, select it for getting the type hierarchy (F4), and then this stack dump happens.
Comment 170 Sergey Prigogin CLA 2016-10-03 14:49:00 EDT
(In reply to Manoj Palat from comment #169)

Manoj, could you please file a separate bug for this issue. Thank you.
Comment 171 Stefan Xenos CLA 2016-10-03 16:01:53 EDT
Re: comment 170

I've extracted bug 503149.

Note that this appears to be harmless log spam. Although it's worth fixing, I don't think it's cause to delay Tuesday's merge.
Comment 172 Manoj N Palat CLA 2016-10-03 20:46:11 EDT
(In reply to Stefan Xenos from comment #171)
> Re: comment 170
> 
> I've extracted bug 503149.

Thanks Stefan for extracting the bug.

> Note that this appears to be harmless log spam. Although it's worth fixing,
> I don't think it's cause to delay Tuesday's merge.

+1. This is not a blocker for the merge. I've been doing sanity testing on the code and haven't found any blockers. 

+1 for merge.
Comment 173 Stefan Xenos CLA 2016-10-04 14:37:11 EDT
I'm going to mark this as fixed now that the initial implementation of the new index has been merged to master.

I realize we have a lot to do before we'll really be able to experience the intended benefits from this code, but we can track the remaining items in follow-up bugs.

I'll continue to use the [newindex] tag on those bugs to make them easy to find.

Most important follow-up items:
- Fix the change notifications to remove the double events for external jars.
- Implement a source indexer.
- Adopt the new index in the Open Type dialog.
- Update the type hierarchy view to avoid blocking on the indexer.
- Update the JavaModelManager cache to be more efficient for use with the indexer.
- Update the SearchEngine APIs to use the new index.

I'll file follow-up bugs for these items shortly.
Comment 174 Eclipse Genie CLA 2016-10-18 13:36:22 EDT
New Gerrit change created: https://git.eclipse.org/r/83464
Comment 176 Lorenzo Bettini CLA 2017-01-17 01:49:03 EST
From what I understand, this new indexing mechanism in Oxygen is the one creating the "index.db" file in the workspace, which is rather large (around 300 MB). Besides, *.index files are still created (I guess those are the ones produced by the old index).

For the moment, I'd like to avoid having this huge index.db file in all my workspaces that use Oxygen, so I checked the "Disable new Java index" preference. I removed the index.db, but it gets recreated when Eclipse opens the workspace...

Is this expected?
Comment 177 Stefan Xenos CLA 2017-01-17 10:12:53 EST
Re: comment 176

Yes, the new indexer is responsible for creating index.db. No, the "use new java index" preference is not meant to prevent the index from being created. It is meant to help track down regressions caused by the index's adoption, not to save disk space -- so it only disables reads from the index. Yes, this is expected.

There is no preference that will save the disk space.

Can you explain your use-case? Why is the 300MB index problematic for you? If I understand your use-case better, perhaps there's some way to address it directly without disabling the index entirely.
Comment 178 Lorenzo Bettini CLA 2017-01-24 02:14:44 EST
(In reply to Stefan Xenos from comment #177)
> Can you explain your use-case? Why is the 300MB index problematic for you?
> If I understand your use-case better, perhaps there's some way to address it
> directly without disabling the index entirely.

Hi Stefan

It's just that with Oxygen every workspace is 300 MB bigger... Since I have several workspaces (around 20), switching to Oxygen will "eat" 6 GB more. It's nothing crucial, but I would have liked to avoid it, especially because there are still lots of big .index files around. I guess the .index files are generated by the old index, is that right?
Comment 179 Fred Bricon CLA 2017-04-13 10:46:32 EDT
I'm currently in the same situation as Lorenzo. We consume JDT from Oxygen in the Java Language Server[1]. The server is currently consumed by our Java extension for Visual Studio Code[2] (and eventually Eclipse Orion, Eclipse Che, vim and others). 
Because each (multi)project we open is assigned a new workspace, all these index.dbs add up *really* fast[3].
Are there any plans to optimize disk space eventually, maybe by using a shared index or limiting the index size? I can open a new BZ for an enhancement request, if that makes sense.

[1] https://github.com/eclipse/eclipse.jdt.ls
[2] https://marketplace.visualstudio.com/items?itemName=redhat.java
[3] https://github.com/redhat-developer/vscode-java/issues/185
Comment 180 Stefan Xenos CLA 2017-04-13 13:03:23 EDT
Re: comment 179

It's unlikely that the database will ever get much smaller. I have some optimizations planned, but there's also a lot of information we need to add to it that isn't there yet (such as the call site information needed to speed up the call hierarchy view).

> maybe using a shared index

As for a shared index, that should be possible and would probably work quite well. Those portions of the index that reflect the JRE and any external .jars on the classpath that are common among the different workspaces would only get indexed once. So depending on the workspace, it could actually make a huge difference in both disk space and indexing time. We should file a bug for this.

Something else you could try: the content of index.db is entirely derived data, so it can be safely deleted when Eclipse isn't using it. Eclipse will just rebuild it the next time it restarts. If you're auto-creating these workspaces, you probably know where they all are and could simply iterate over them and delete the index.db of any workspace that hasn't been used recently.
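
For illustration only (this isn't part of JDT), here's a minimal sketch of such a cleanup. It assumes the index lives at .metadata/.plugins/org.eclipse.jdt.core/index.db inside each workspace and that you supply the list of workspace roots yourself; both are assumptions to verify in your setup, and it should only be run while Eclipse isn't using those workspaces:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;

/**
 * Deletes the derived index.db from workspaces that have not been used recently.
 * Eclipse rebuilds the index the next time it opens the workspace.
 */
public class StaleIndexCleaner {

	// Assumed location of the new index inside a workspace; adjust if yours differs.
	private static final String INDEX_RELATIVE_PATH =
			".metadata/.plugins/org.eclipse.jdt.core/index.db";

	public static void main(String[] args) throws IOException {
		// Hypothetical list of auto-created workspace roots; supply your own.
		List<Path> workspaces = List.of(
				Paths.get("/home/user/ls-workspaces/project-a"),
				Paths.get("/home/user/ls-workspaces/project-b"));

		// Treat the index file's own last-modified time as a rough proxy
		// for when the workspace was last used.
		Instant cutoff = Instant.now().minus(30, ChronoUnit.DAYS);

		for (Path workspace : workspaces) {
			Path indexDb = workspace.resolve(INDEX_RELATIVE_PATH);
			if (!Files.exists(indexDb)) {
				continue;
			}
			FileTime lastModified = Files.getLastModifiedTime(indexDb);
			if (lastModified.toInstant().isBefore(cutoff)) {
				Files.delete(indexDb);
				System.out.println("Deleted stale index: " + indexDb);
			}
		}
	}
}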
Comment 181 Fred Bricon CLA 2017-04-13 13:16:26 EDT
Thanks Stefan, I opened bug #515268.
Comment 182 Eclipse Genie CLA 2018-02-05 16:49:45 EST
New Gerrit change created: https://git.eclipse.org/r/116751