Bug 426648 - boost/typeof/std/functional.hpp does not have 2,883,584 base classes
Summary: boost/typeof/std/functional.hpp does not have 2,883,584 base classes
Status: NEW
Alias: None
Product: CDT
Classification: Tools
Component: cdt-indexer (show other bugs)
Version: Next   Edit
Hardware: PC Linux
: P3 normal with 1 vote (vote)
Target Milestone: ---   Edit
Assignee: Project Inbox CLA
QA Contact: Jonah Graham CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-01-25 12:27 EST by Andrew Eidsness CLA
Modified: 2020-09-04 15:25 EDT (History)
4 users (show)

See Also:


Attachments
test case to reproduce the problem (3.92 KB, patch)
2014-01-29 16:06 EST, Andrew Eidsness CLA
no flags Details | Diff
text file with embedded ANSI colour codes (2.22 KB, text/plain)
2014-02-06 14:32 EST, Andrew Eidsness CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Eidsness CLA 2014-01-25 12:27:03 EST
The CDT has a few different problems trying to index the boost sources.  At this point I'm using boost version 1.54.0, but I've also run tests with 1.35.0 and 1.55.0.

Some of the problems are caused by intentionally aggressive code in boost (e.g., see Bug 425711).  By excluding these files from indexing you can eventually get the indexer to read boost/libs/local_function/test/macro_commas_seq.cpp.  This file includes boost/boost/typeof/std/functional.hpp.

The CDT will eventually run out memory while indexing this file.  My examining the heap dump I was able to see that the indexer had created 2.9 million base classes for this class.  They are all in a single array that consumes about 423Mb.

With further debugging I was able to trace the problem to PDOMCPPLinkage.onCreateName(PDOMFile file, IASTName name, PDOMName pdomName).  When the IASTName is a CPPASTTempateId, it will evaluate the final condition:

1085: } else if (parentNode instanceof ICPPASTCompositeTypeSpecifier) {
1086:     IBinding classBinding = name.resolveBinding();
1087:          if (classBinding instanceof ICPPClassType) {
1088:         ICPPBase[] bases;
1089:             if (classBinding instanceof ICPPClassSpecialization) {
1090:                 bases= ((ICPPClassSpecialization) classBinding).getBases(name);
1091:             } else {
1092:                 bases= ((ICPPClassType) classBinding).getBases();
1093:         }
1094:         if (bases.length > 0) {
1095:             PDOMBinding pdomBinding = pdomName.getBinding();
1096:             if (pdomBinding instanceof PDOMCPPClassType) {
1097:                 ((PDOMCPPClassType) pdomBinding).addBases(pdomName, bases);
1098:             } else if (pdomBinding instanceof PDOMCPPClassSpecialization) {
1099:                 ((PDOMCPPClassSpecialization) pdomBinding).addBases(pdomName, bases);
1100:             }
1101:         }
1102:     }
1103: }

This code incorrectly assumes that if a PDOMName has just been created then the PDOMBinding must also have just been created.

What is actually happening is that a specialization of the template has already been created in a different source file.  When this code runs, the binding returned on line 1086 is a PDOMBinding that has the same record number as the pdomBinding returned on line 1095.  The code reads all bases from the binding and then adds them back to the same PDOMBinding.

This is easier to see in boost/libs/multiprecision/test which has several nearly identical test files.  Each of those files has a specialization for is_twos_complement_integer.  Putting a breakpoint on line 1085 and indexing that folder makes the problem clear.

Each time a template is specialized with the same parameters it doubles the number of bases in the list.

At a minimum, this code should make sure that it is not adding bindings to itself:

1096: if( ! pdomBinding.equals( classBinding ) ) {

However, I'm not really convinced that is the right answer.  The PDOMBinding should be properly populated when it is first created, not just when a new name is created to point to it.
Comment 1 Andrew Eidsness CLA 2014-01-29 16:06:12 EST
Created attachment 239448 [details]
test case to reproduce the problem

This patch has a new test case that reproduces the problem.

The original source code is from boost (I used version 1.55.0, but the code doesn't seem to have changed for many versions).

    boost/libs/multiprecision/test/test_arithmetic_cpp_int_10.cpp
    boost/libs/multiprecision/test/test_arithmetic_cpp_int_11.cpp

Have identical partial specializations for is_twos_complement_integer.

Actually there are many files in that folder with specializations for is_twos_complement_integer.  If you try to index this folder the PDOM node for is_twos_complement_integer has 262144 bases.
Comment 2 Andrew Eidsness CLA 2014-02-06 14:32:19 EST
Created attachment 239712 [details]
text file with embedded ANSI colour codes

This bug is more general that what I previously described.  After more investigation I can reproduce a more general form of the same problem.  These are the new test files:

header.hh:
	1
	2 #ifndef HEADER_HH
	3 #define HEADER_HH
	4
	5 class B {};
	6 template <typename T> class Te : public B {};
	7
	8 #endif
	9

file1.cpp:
	1
	2 #include "header.hh"
	3 class B1 {};
	4 template <typename T> class Te<T *> : public B1 {};
	5
	6 int main() { }
	7

file2.cpp:
	1
	2 #include "header.hh"
	3 class B1 {};
	4 class B2 {};
	5 template <typename T> class Te<T *> : public B2 {};
	6
	7 int main() { }
	8

The Makefile for this project produces two binaries, file1 and file2, which means there is case of duplicate definitions.

There is a dump of the pdom that is generated for this project (the attachment is a text file that has this content with embedded ANSI colour codes which make it easier to follow the names and bindings):

Linkage C++[1]
12ea-1333 PDOMCPPClassType B
   defns: 1622 5: class B {};                                         header.hh[44,45]
    refs: 1942 6: template <typename T> class Te : public B {};       header.hh[90,91]

1642-1695 PDOMCPPClassTemplate Te<T>
   bases: 12ea B
   defns: 190a 6: template <typename T> class Te : public B {};       header.hh[78,80]
    refs: 274a 5: template <typename T> class Te<T *> : public B2 {}; file2.cpp[76,78]
          20f2 4: template <typename T> class Te<T *> : public B1 {}; file1.cpp[63,65]

16aa-16d8 PDOMCPPTemplateTypeParameter #0
   decls: 16fa 6: template <typename T> class Te : public B {};       header.hh[69,70]

1a6a-1ab3 PDOMCPPClassType B1 file1.cpp
   defns: 1ca2 3: class B1 {};                                        file1.cpp[28,30]
    refs: 2112 4: template <typename T> class Te<T *> : public B1 {}; file1.cpp[80,82]

1cc2-1d25 PDOMCPPClassTemplatePartialSpecialization Te<T> primary "Te<T> {1642}" file1.cpp
   bases: 1a6a B1
          1a6a B1
   defns: 272a 5: template <typename T> class Te<T *> : public B2 {}; file2.cpp[76,83]
          20ba 4: template <typename T> class Te<T *> : public B1 {}; file1.cpp[63,70]

1d3a-1d68 PDOMCPPTemplateTypeParameter #0
   decls: 26e2 5: template <typename T> class Te<T *> : public B2 {}; file2.cpp[67,68]
          1daa 4: template <typename T> class Te<T *> : public B1 {}; file1.cpp[54,55]

2132-2172 PDOMCPPFunction main()
   defns: 278a 7: int main() { }                                      file2.cpp[105,109]
          218a 6: int main() { }                                      file1.cpp[92,96]

2232-227b PDOMCPPClassType B1 file2.cpp
   defns: 246a 3: class B1 {};                                        file2.cpp[28,30]

248a-24d3 PDOMCPPClassType B2 file2.cpp
   defns: 26c2 4: class B2 {};                                        file2.cpp[41,43]
    refs: 276a 5: template <typename T> class Te<T *> : public B2 {}; file2.cpp[93,95]

[
Interpretation:
- the first line of each section shows the binding:
  - xxxx-yyyy is the range where the binding is stored
  - then binding.getClass().getSimpleName()
  - then binding.toString() (with extra detail for some types)
- the rest of each section is the lists of names where:
  - the first column is the name of the list:
    - bases: only for ICPPClassType the result of getBases
    - decls: getFirstDeclaration() and then getNextInBinding()
    - defns: getFirstDefinition() and then getNextInBinding()
    - refs:  getFirstReference() and then getNextInBinding()
    - extrefs: getExternalReferences()
  - then zzzz shows the record of the name in the pdom
  - then the line of the source file containing the name
  - then the offsets in the file where the name appears
]

The things to notice are:

1) The source code has two definitions of B1, and the PDOM has two bindings for it.

2) The source code has two partial specializations for Te<T *>, but the PDOM has only one binding for it.

3) The base class specifier list for Te<T *> incorrectly has B1, B1 instead of incorrectly having B1, B2.

I think that the fix for this bug is to handle partial template specializations the same as simple class definitions -- create a binding for each.
Comment 3 Sergey Prigogin CLA 2014-02-08 17:29:30 EST
The code in comment #2 violates SDR (single definition rule).

CDT indexer is not aware of how the code is linked and requires the whole code to conform to SDR. Attempts to make CDT index work with SDR violations are mostly hopeless.

The following code, for example, would result in a single index binding for A that will have two base classes and two members:

h1.h:
class B1 { void v1(); };
class A : public B1 {};

file1.cpp:
#include "h1.h"
int main() {
}

h2.h:
class B2 { void v2(); };
class A : public B2 {};

file2.cpp:
#include "h2.h"
int main() {
}
Comment 4 Andrew Eidsness CLA 2014-02-08 19:55:10 EST
The code compiles and runs without problems using gcc and clang.  The one definition rule applies to a translation unit or program, not to an entire codebase.

Further, as I pointed out in the PDOM dump, the index contains two bindings for B1 (which takes the place of A in your example).
Comment 5 Sergey Prigogin CLA 2014-02-08 23:48:19 EST
(In reply to Andrew Eidsness from comment #4)
Two bindings for B1 are created probably because the definitions of the class are located in cpp files and are considered "local to file". Move them to header files as in comment #3, and there will be only one binding.

There is no good reason not to apply the same "local to file" rules to template specializations as the ones applied to non-template classes. This would help with the example from comment #2, but would not help if the specializations were defined in two separate header files included by file1.cpp and file2.cpp respectively.
Comment 6 Andrew Eidsness CLA 2014-02-09 08:26:10 EST
I see:

12ea-1333 PDOMCPPClassType B1
   defns: 1622 5: class B1 { void v1(); }; header1.hh[46,48]
    refs: 192a 6: class A : public B1 {};  header1.hh[82,84]

1642-1683 PDOMCPPMethod v1()
   decls: 169a 5: class B1 { void v1(); }; header1.hh[56,58]

16ba-1703 PDOMCPPClassType A
   bases: 12ea B1
          1bb2 B2
   defns: 1eaa 6: class A : public B2 {};  header2.hh[71,72]
          18f2 6: class A : public B1 {};  header1.hh[71,72]

1a52-1a92 PDOMCPPFunction main()
   defns: 1f62 3: int main() { }           file2.cpp[27,31]
          1aaa 3: int main() { }           file1.cpp[27,31]

1bb2-1bfb PDOMCPPClassType B2
   defns: 1dea 5: class B2 { void v2(); }; header2.hh[46,48]
    refs: 1eca 6: class A : public B2 {};  header2.hh[82,84]

1e0a-1e4b PDOMCPPMethod v2()
   decls: 1e62 5: class B2 { void v2(); }; header2.hh[56,58]

Why does the pre-processing language affect the way that the C++ language is indexed and why do you consider it a feature instead of a bug?

Applying that example to this bug, if A was a partial template specialization, the index looks like the equivalent of:

    class A : public B1, B1 { };
Comment 7 Doug Schaefer CLA 2014-02-09 09:10:53 EST
I don't care if we're being correct or not, but the number of reports of our of memory errors we're getting is unacceptable.
Comment 8 Doug Schaefer CLA 2014-02-10 10:14:48 EST
Sorry, that was a half completed thought.

My first reaction to this problem was that, yes, as Sergey says, this violates the SDR rule that we've generally employed in CDT projects. However, it's clear that we're trying in some areas to support multiple definitions because, let's face it, a lot of CDT users do have multiple build outputs in a single project, unit testing is a great example of that. Sometimes it's just better to do things that way because it becomes very hard to manage if you have to set up a project for each one.

But it does take some magic to know which definition a reference or declaration is referring to. To do this properly we would need to track which header files are active when creating a definition and then look at that list when considering a reference/decl. And even then it's hard to get things right.

But at the end of the day, I'm more concerned about the out of memory errors we are getting. Assuming SDR in the cases Andrew has found here I believe is causing it. At the very least, we need to be smarter about it and make sure the definitions we do create have some sanity to them.
Comment 9 Sergey Prigogin CLA 2014-02-10 13:40:16 EST
(In reply to Andrew Eidsness from comment #6)
> Why does the pre-processing language affect the way that the C++ language is
> indexed and why do you consider it a feature instead of a bug?

Special treatment for bindings defined in source files is a heuristic intended to at least partially alleviate the problem of multiple conflicting definitions.
Comment 10 Sergey Prigogin CLA 2014-02-10 13:47:45 EST
(In reply to Doug Schaefer from comment #8)
We are already considering declaration reachability in binding resolution. The harder part is that to disambiguate bindings with multiple conflicting definitions we would have to include the definition location in the signature of every index binding. This would open a huge can of worms. Catching all theses worms would be quite an effort.

Instead of doing that we can try to extend the use of "local to file" approach to apply to template specializations.
Comment 11 Doug Schaefer CLA 2014-02-10 16:44:01 EST
(In reply to Sergey Prigogin from comment #10)
> Instead of doing that we can try to extend the use of "local to file"
> approach to apply to template specializations.

Ok. Not sure how much time I have left from Andrew so whatever is practical.
Comment 12 Andrew Eidsness CLA 2014-02-20 15:31:32 EST
I think this problem comes down to incorrect handling of section 14.5.5(6) of the standard (C++11).

    "Partial specialization declarations themselves are not found
     by name lookup. Rather, when the primary template name is used,
     any previously-declared partial specializations of the primary
     template are also considered."

I don't think there is handling for the "previously-declared" part.

I also think it is a problem that the binding is modified in the PDOMCPPLinkage.onCreateName callback.  Names are just handles on bindings, we shouldn't be modifying the binding just because a new handle has been created.  The binding should have been fully constructed when the AST definition was processed.

I think that both problems can be solved by adding a list of CPPASTTemplateIds to the CPPASTTranslationUnit.  The list would not be persisted, it would only be used to implement the "previously-declared" part during ambiguity resolution.

Secondly (and independently), I haven't figured out why the partial specialization binding is created during the ambiguity resolution phase, I think it can be done when the AST definition is processed.  The declarations and references sort make sense, since they sound like they could be ambiguous, I'm just wondering about the definitions in particular.
Comment 13 Andrew Eidsness CLA 2014-02-20 17:01:20 EST
Here's another an error related to the "previously declared" portion.

header.hh:
    1 template <typename T> struct Te { };

file1.cpp:
    1 #include "header.hh"
    2 template <typename T> struct Te<T *> { };
    3 static Te<int *> partial;

file2.cpp:
    1 #include "header.hh"
    2 static Te<int *> primary;

In this case file2.cpp cannot see the partial specialization in file1.cpp and therefore primary should be an instance of the primary template.  However:

12ea-133d PDOMCPPClassTemplate Te<T>
   defns: 16c2 5: template <typename T> struct Te { };      header.hh[67,69]
    refs: 1dea 3: static Te<int *> primary;                 file2.cpp[29,31]
          1bca 4: static Te<int *> partial;                 file1.cpp[71,73]
          1b02 3: template <typename T> struct Te<T *> { }; file1.cpp[51,53]

17ea-184d PDOMCPPClassTemplatePartialSpecialization Te<T> primary "Te<T> {12ea}" file1.cpp
   defns: 1ae2 3: template <typename T> struct Te<T *> { }; file1.cpp[51,58]

1b22-1b75 PDOMCPPClassInstance Te<int *> of {17ea}
    refs: 1dca 3: static Te<int *> primary;                 file2.cpp[29,38]
          1baa 4: static Te<int *> partial;                 file1.cpp[71,80]

1bea-1c17 PDOMCPPVariable partial file1.cpp
   defns: 1c2a 4: static Te<int *> partial;                 file1.cpp[81,88]

1e0a-1e37 PDOMCPPVariable primary file2.cpp
   defns: 1e4a 3: static Te<int *> primary;                 file2.cpp[39,46]
Comment 14 Sergey Prigogin CLA 2014-02-20 17:09:56 EST
(In reply to Andrew Eidsness from comment #13)
It looks like "declared before" check is missing or is not applied to specializations from the index.