Bug 436060 - Race condition in CProjectDescriptionManager.updateProjectDescriptions()
Summary: Race condition in CProjectDescriptionManager.updateProjectDescriptions()
Status: NEW
Alias: None
Product: CDT
Classification: Tools
Component: cdt-build (show other bugs)
Version: 8.3.0   Edit
Hardware: All All
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: cdt-build-inbox@eclipse.org CLA
QA Contact: Jonah Graham CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-05-28 09:46 EDT by Christian Walther CLA
Modified: 2020-09-04 15:25 EDT (History)
4 users (show)

See Also:


Attachments
reproducing example plugin (source + binary) (13.27 KB, application/zip)
2014-05-28 11:06 EDT, Christian Walther CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Christian Walther CLA 2014-05-28 09:46:55 EDT
As discussed starting at http://dev.eclipse.org/mhonarc/lists/cdt-dev/msg27570.html :

I am having a problem with a race condition between the GCCBuiltinSpecsDetector and a job of my own plugin: I am trying to modify the project description by calling

  ICProjectDescription writableProject = CoreModel.getDefault().getProjectDescription(m_project);
  // .. modify writableProject ...
  CoreModel.getDefault().setProjectDescription(m_project, writableProject, false, monitor);

but find that my modifications are overwritten soon after by GCCBuiltinSpecsDetector writing back what appears to be an older project description.

This happens in org.eclipse.cdt.internal.core.settings.model.CProjectDescriptionManager.updateProjectDescriptions(IProject[], IProgressMonitor), called indirectly from org.eclipse.cdt.managedbuilder.language.settings.providers.AbstractBuiltinSpecsDetector.execute() via AbstractBuiltinSpecsDetector.runForEachLanguage() ... LanguageSettingsSerializableProvider.serializeLanguageSettings() ... LanguageSettingsProvidersSerializer.notifyLanguageSettingsChangeListeners() ... CModelManager.handleEvent(ILanguageSettingsChangeEvent). 

CProjectDescriptionManager.updateProjectDescriptions() as per the logic in AbstractBuiltinSpecsDetector.execute() runs with the scheduling rule of the project and gets the project description (A), then schedules its second part to write it back (B) with the scheduling rule of the workspace root (because project does not contain workspace root). Between the execution of A and B, no scheduling rules are held and others are free to call getProjectDescription() and setProjectDescription() themselves, only to have their modifications overwritten by B.

This seems like a bug to me, updateProjectDescriptions() shouldn't let its two parts be separated like this, getProjectDescription() and setProjectDescription() should be executed as an atomic unit while holding the scheduling rule of at least the affected project (in fact, it must be at least the workspace root because that's what setProjectDescription() uses internally).

As an experiment I tried modifying AbstractBuiltinSpecsDetector.execute() to always run its job with the workspace root scheduling rule, so that updateProjectDescriptions() wouldn't schedule its part B but run it synchronously, however that just resulted in an infinite stream of GCCBuiltinSpecsDetector executions, because every call of setProjectDescription() causes another execution via XmlProjectDescriptionStorage.setCurrentDescription() ... LanguageSettingsProvidersSerializer.reRegisterListeners() ... AbstractBuiltinSpecsDetector.registerListener().

Concerning possible solutions,
I wrote:
> Maybe updateProjectDescriptions() should put both the getting and the setting of project descriptions into its scheduled part (B)? I don't even completely understand yet what the purpose of updateProjectDescriptions() is - documentation says it recalculates "cached data", but I'm not familiar enough with CDT internals to know what "cached data" is.
Andrew Gvozdev wrote:
> Those are dark parts of CDT where knowledge is lost... Maybe updateProjectDescriptions() should check if the project description is still up to date before setting? If not just forget about it, this old description is gone already.

Simply applying the former idea leads to the same infinite loop of GCCBuiltinSpecsDetector executions as described above. The reason for the loop is that GCCBuiltinSpecsDetectors are cloned faster than they can execute. As part of every setProjectDescription() operation (or possibly earlier if the description is modified, I haven't checked exactly), the project description including all its contained GCCBuiltinSpecsDetectors is cloned. If the GCCBuiltinSpecsDetector (a) hasn't gotten around to executing yet, the clone (b) inherits the isExecuted = false flag, so now both the original (a) and the clone (b) are set to execute some time in the future. When the original (a) finally executes (even though the project description it belongs to is no longer the current one and only waiting to be garbage-collected), it calls updateProjectDescriptions() at the end, causing another round of cloning, where (b), which is now the one belonging to the current project description and hasn't executed yet, is cloned into another one (c) that still needs to execute, and so on ad infinitum. It gets worse when there is more than one configuration, because every updateProjectDescriptions() call from *one* executing GCCBuiltinSpecsDetector causes the unexecuted GCCBuiltinSpecsDetectors of *all* configurations to be cloned - exponential explosion.

The reason why the infinite loop does not occur with the current (arguably flawed) implementation of updateProjectDescriptions() is presumably that because the project description that it sets is an older one, its GCCBuiltinSpecsDetectors have had a chance to execute by then and further cloning will create ones with isExecuted = true, so that their execution job will exit early and not call updateProjectDescriptions() to perpetuate the loop.

This may be related to bug 408541. I don't think it's exactly the same issue but the fix is likely to affect it.

Reproducing example and proposed fixes forthcoming.
Comment 1 Christian Walther CLA 2014-05-28 11:06:27 EDT
Created attachment 243639 [details]
reproducing example plugin (source + binary)

Here is a minimal reproducing example and a proposed fix.

The fix consists of two parts 1 and 2, and part 1 has two alternative variants 1a and 1b of which I am not sure which is better.

1a: https://git.eclipse.org/r/27462 - Fix the race condition in updateProjectDescriptions() by moving both parts into the scheduled job. By itself, this causes the infinite loop, fix 2 is required to remedy that.

1b: https://git.eclipse.org/r/27463 - Fix the race condition in updateProjectDescriptions() by checking whether the project description is still current before calling setProjectDescription(). This does not absolutely require fix 2, although it is still beneficial. The advantage of this compared to 1a is increased efficiency because setProjectDescription() is called fewer times. The potential disadvantage is that I had to change from writable project descriptions to read-only ones, and I am not sure whether that has any undesired consequences. As far as I can see it should not, but I would appreciate a second opinion on that.

2: https://git.eclipse.org/r/27464 - Don't unnecessarily run an AbstractBuiltinSpecsDetector that belongs to a project description that is no longer current (because it was replaced by a clone during a setProjectDescription() operation before it had a chance to run for the first time). This saves unnecessary GCC executions and by avoiding the final call to updateProjectDescriptions() avoids chain reactions such as the mentioned infinite loop. Required for 1a, not absolutely required but improves user experience and may avoid future problems for 1b.

Steps to reproduce:
- Install the attached plugin ch.indel.projectdescriptiontest, which adds a dummy page and corresponding operation to the New Project wizard (this is not exactly what we do in the real situation, but it's a convenient way of hooking into the project creation process at around the same time). The operation makes some changes to the CDT project, once synchronously and once in a scheduled job (both are necessary to provoke the problem and correspond to changes that occur in the real situation), see ch.indel.projectdescriptiontest.WizardPage1Operation.run().
- Bring the Progress view to the front.
- Make a new "C Project" with project type "Executable > Empty Project" and toolchain "Cross GCC". All other settings left at defaults, cross compiler prefix/path should point to a working GCC (may or may not work without one, I didn't try). (I chose "Cross GCC" because I could test that on both Windows and Linux - "Linux GCC" does not work to reproduce the problem as is, probably due to different steps being done during project creation, but could probably be made to with the right changes to WizardPage1Operation.)
- Note that a lot more "Discover compiler built-in language settings" jobs appear to be going on in the Progress view than one could expect.
- Open the .cproject file in a text editor and search for "-Dvalue".

Expected result: An option with value "-Dvalue2" should be found.

Actual result: An option with value "-Dvalue1" is found, even though the job that set it to "-Dvalue2" ran later - the new value (which once existed in the .cproject file, as you can verify by stepping through the procedure with a breakpoint in org.eclipse.cdt.internal.core.settings.model.CProjectDescriptionManager.setProjectDescription(IProject, ICProjectDescription, int, IProgressMonitor)) was overwritten by an earlier one.

Now apply fix 1a and repeat the steps to reproduce.

Actual result:
- "-Dvalue2" is now correctly saved in the .cproject file. (expected)
- The Progress view shows an infinite sequence of "Discover compiler built-in language settings" jobs running. (not expected)

Additionally apply fix 2 and repeat.

Actual and expected result:
- "-Dvalue2" is correctly saved in the .cproject file.
- Very few "Discover compiler built-in language settings" jobs appear in the Progress view.

Alternatively, start over from master and apply fix 1b, then repeat.

Actual result:
- "-Dvalue2" is correctly saved in the .cproject file. (expected)
- Still more "Discover compiler built-in language settings" jobs appear to be going on in the Progress view than one could expect. (not expected)

Additionally apply fix 2 and repeat.

Actual and expected result:
- "-Dvalue2" is correctly saved in the .cproject file.
- Very few "Discover compiler built-in language settings" jobs appear in the Progress view.
Comment 2 Christian Walther CLA 2014-05-29 06:54:36 EDT
Fix 2 causes test failures on Hudson… embarrassing. Apparently this whole business of project descriptions and language settings providers is more complicated than I thought. I can ad hoc fix the one in testAbstractBuiltinSpecsDetector_EnvChangesConfiguration_2 but I'm not sure right now what to do about the one in testAbstractBuiltinSpecsDetector_EnvChangesConfiguration_1. Under what circumstances does it occur in the wild that a language settings provider is registered on a configuration but not part of it?
Comment 3 Christian Walther CLA 2014-06-02 07:03:38 EDT
I've pushed an updated version that passes unit tests and still addresses the original problem, however I'm less convinced now that it is correct in all circumstances. Review appreciated.
Comment 4 Andrew Gvozdev CLA 2014-06-03 15:33:37 EDT
I suggest a different way of checking whether a configuration is stale. We can check read-only "cached" cfg, if it did not get replaced then our writable configuration is still up to date.
Could you try to test with this patch? For some reason I cannot reproduce it following your steps. I run debugging session with your plugin and it appears in plugins list but I don't see your project wizard page and breakpoints in your plugin are not triggered.
Comment 5 Andrew Gvozdev CLA 2014-06-03 17:13:13 EDT
I pushed a sample fix into gerrit. The same 2 tests failed but I think that it is likely a problem with the tests which check some incidental assumptions.
Comment 6 Christian Walther CLA 2014-06-04 04:13:15 EDT
I can confirm that your patch set 3 also fixes the problem, both with my example plugin and in our real situation.

I'm not sure why the example plugin wouldn't work for you, I've tried it on several installations on Windows, Linux, and Mac OS X, on Kepler and Luna. (For some reason the "nothing to see here" page obscures the Cross GCC page on Linux and Mac OS X, but I probably just forgot to override some method there.)

As to whether the test failures are significant, you are in a better position to judge that than I. It's obvious that testAbstractBuiltinSpecsDetector_EnvChangesConfiguration_1 would fail this way, because the provider was explicitly registered on the old configuration. For testAbstractBuiltinSpecsDetector_EnvChangesConfiguration_2, the difference to the non-test situation is that some language settings providers (namely those that are not ILanguageSettingsEditableProviders, such as MockBuiltinSpecsDetectorEnvironmentChangeListener, whereas GCCBuiltinSpecsDetector is one) are not cloned during setProjectDescription(), but just assigned to the new configuration, so that the provider is still registered on the old configuration while being contained in the new one (as well as the old one).
Comment 7 Andrew Gvozdev CLA 2014-06-10 17:34:10 EDT
The fix part 1 https://git.eclipse.org/r/#/c/27463/2 merged to master.
Comment 8 John Moule CLA 2014-10-08 06:11:31 EDT
I too found problems with my project modifications being overwritten when the scanner ran. Thank goodness I found this bugzilla. I've applied the patch and it solves my problem. Many, many thanks. rgds john
Comment 9 NINGAREDDY MODASE CLA 2014-12-14 09:59:58 EST
I am also facing this issue.
Could you please let me know how to apply patch to solve this problem?

I am currently using 8.3.0.

I had installed 8.4.0 and 8.5.0 as well, but look like these patch changes were not there
Comment 10 Christian Walther CLA 2014-12-15 02:40:01 EST
(In reply to NINGAREDDY MODASE from comment #9)
> Could you please let me know how to apply patch to solve this problem?

Set up the development environment as desribed in http://wiki.eclipse.org/Getting_started_with_CDT_development .

Get the patch as described on the Gerrit page (https://git.eclipse.org/r/#/c/27464/):
git fetch git://git.eclipse.org/gitroot/cdt/org.eclipse.cdt refs/changes/64/27464/4 && git checkout FETCH_HEAD

You'll probably want to try to merge it with current master or whatever release branch your product is based on.
Comment 11 NINGAREDDY MODASE CLA 2014-12-15 04:00:10 EST
Thank you.

I have taken the update from daily build. It is working.
Comment 12 x korat CLA 2015-02-03 22:22:50 EST
(In reply to Christian Walther from comment #10)
> (In reply to NINGAREDDY MODASE from comment #9)
> > Could you please let me know how to apply patch to solve this problem?
> 
> Set up the development environment as desribed in
> http://wiki.eclipse.org/Getting_started_with_CDT_development .
> 
> Get the patch as described on the Gerrit page
> (https://git.eclipse.org/r/#/c/27464/):
> git fetch git://git.eclipse.org/gitroot/cdt/org.eclipse.cdt
> refs/changes/64/27464/4 && git checkout FETCH_HEAD
> 
> You'll probably want to try to merge it with current master or whatever
> release branch your product is based on.

Thanks, this patch is very useful!
Why CDT master don't merge this patch?
Comment 13 Christian Walther CLA 2015-02-04 01:59:21 EST
As far as I am concerned: because there are still open questions about it (on Gerrit). You are welcome to chip in over there if you think you can help resolve them.
Comment 14 John Moule CLA 2015-08-12 12:35:14 EDT
Hi,
I've tried fix part 2 (https://git.eclipse.org/r/27464). I see a big improvement in the time it takes the CDT Scanner Discovery to quiesce after creating a new Cross GCC C++ Executable project. The number of "Discover compiler built-in language settings" jobs is reduced from about 20 to about 3 for me which is great.

I note there's still outstanding discussion about test failures (testAbstractBuiltinSpecsDetector_EnvChangesConfiguration_2). Is there any progress on whether this fix is safe to apply for real world?

cheers john
Comment 15 Christian Walther CLA 2015-08-17 05:54:53 EDT
No progress as far as I’m aware (still waiting for an answer from Andrew), but my understanding is that the test failures indicate problems with the test, not with the subject. I believe the patch (part 2) is safe for real-world use, but as I just checked we have not included it in our product yet so I cannot say that we tested it in the real world.
Comment 16 Andrew Gvozdev CLA 2015-08-17 10:37:57 EDT
Sorry I did not have much time to work on this but hope to get it done eventually.
Comment 17 John Moule CLA 2015-08-17 12:30:59 EDT
Thanks.
Comment 18 Christian Walther CLA 2016-01-05 10:18:18 EST
I have pushed patch set 5 to Gerrit, which modifies patch set 4 with
- what I believe is a better fix for the failing test
- a fix for NullPointerExceptions when abandoned BuiltinSpecsDetectors try to run after their project has been deleted (seen during unit tests on Windows).

I am going to integrate this in our product now.

Unfortunately the cdt-verify build failed (through none of my doing) before it reached my modified test.
Comment 19 Jonah Graham CLA 2019-12-30 17:05:01 EST
This bug was assigned and targeted at a now released milestone. As that milestone has now passed, the milestone field has been cleared. If this bug has been fixed, please set the milestone to the version it was fixed in and marked the bug as resolved.