Bug 435933 - Tracking contributors in Git vs file header comment
Status: RESOLVED FIXED
Alias: None
Product: Community
Classification: Eclipse Foundation
Component: License
Version: unspecified
Hardware: PC Windows 7
Importance: P3 normal
Target Milestone: 2017-Q2
Assignee: Generic Inbox CLA
QA Contact:
URL:
Whiteboard: stalebug
Keywords:
Duplicates: 438633
Depends on: 387767
Blocks:
Reported: 2014-05-27 10:58 EDT by Teodor Madan CLA
Modified: 2019-04-03 16:57 EDT (History)
CC List: 20 users

See Also:


Description Teodor Madan CLA 2014-05-27 10:58:52 EDT
The copyright template at https://www.eclipse.org/legal/copyrightandlicensenotice.php states that the list of contributions to a file is maintained in the header comments.

 "Subsequent authors are listed on proceding lines with a short description of the contribution. ... "

The rationale given is to "acknowledge the contributions of individuals" and "tracking the pedigree of the code".

The guideline has not been updated since 2005, when neither Git nor Gerrit was in place. Given that Git preserves the original author and that sign-off is mandatory, is this still required, at least for minor changes?

Could the guideline be relaxed slightly, for example by accepting other means of tracking the pedigree and authors of the code?

The Handling Git Contributions page does not cover this aspect either:
https://wiki.eclipse.org/Development_Resources/Handling_Git_Contributions
Comment 1 Wayne Beaton CLA 2014-05-27 12:04:06 EDT
I think that this is a better fit under the License component.
Comment 2 Matthias Sohn CLA 2014-09-02 10:46:17 EDT
+1 for removing this requirement since contributions are already tracked in git
Comment 3 Marc Khouzam CLA 2015-01-09 16:19:07 EST
I'm being asked repeatedly about this.

Having to update the copyright header is becoming one of our most common comments on Gerrit reviews.  It can get annoying for contributors and committers to pay attention to this.

More importantly, if a file is modified a lot, changing the copyright header will cause merge conflicts quite often, which slows down the contribution process.  For example, today I pressed Gerrit's nice "rebase" button and the rebase failed.  I had to manually fetch the code change into my local git repo, do the rebase manually, merge the conflicts which were all in the copyright headers, and push back to gerrit.

I think it would be appreciated by many not having to do this anymore.
Comment 4 Mike Milinkovich CLA 2015-01-09 17:16:23 EST
I think you guys make a good point. I will raise this as a topic with the IP Advisory Committee. (Warning - this will take some time.)
Comment 5 Marc Khouzam CLA 2015-01-12 10:26:22 EST
(In reply to Mike Milinkovich from comment #4)
> I think you guys make a good point. I will raise this as a topic with the IP
> Advisory Committee. (Warning - this will take some time.)

Thanks Mike!
Comment 6 John Arthorne CLA 2015-01-12 15:41:50 EST
I agree with the general sentiment that the copyright header is not a reliable way to track pedigree. The Git history is a much more reliable way to do this.

However, the other purpose is to acknowledge contributions. Some contributors are very keen to have their name added to the copyright header. It is a publicly visible acknowledgement of their contribution, and this can be a point of pride for many people contributing to Eclipse. I would certainly not want to prevent contributors from adding their name if they want to.

I think the onus should be on the *contributor* to update the author list in the copyright header along with their contribution, if they want to. If they do not, then their ownership of those changes is not publicly acknowledged (although it is still tracked via the Git history).
Comment 7 Marc Khouzam CLA 2015-01-12 15:53:36 EST
(In reply to John Arthorne from comment #6)

> However, the other purpose is to acknowledge contributions.

Good point.  Heck, I put my name up there the times I add something meaningful to the code :)

> I think the onus should be on the *contributor* to update the author list in
> the copyright header along with their contribution, if they want to.

"If they want to" only, that's what we should aim for.
Comment 8 Alexandre Montplaisir CLA 2015-02-06 15:50:36 EST
If I may express a long-going concern of mine, one thing I never really liked about the Eclipse copyright header template is that it muddles the distinction between copyright owners and the actual people contributing to a given file.

Most open-source projects I know (like the Linux kernel, among others) use a template where each copyright holder has its own line:

>  Copyright (C) <years> Company One, Person One
>  Copyright (C) <years> Company Two
>  Copyright (C) <years> Person Three

The example here assumes:
 - "Person One" works at a company where they have a joint-copyright agreement with their employer (like my previous employer did).
 - "Company Two" requests exclusive copyright from their employees' work (like my current employer does).
 - "Person Three" did a personal contribution on their own time, or their company lets employees keep exclusive copyright.

Whereas under Eclipse, the equivalent header would look like:

>  Copyright (c) <years> Company One, Company Two, and others
>  ...
>  Contributors:
>    Person One
>    Person Two
>    Person Three

With this format, it's really not clear that "Person Two" should not technically have copyright on the file, but Person Three does. The "and others" part makes it particularly unclear.

Another advantage of using one "Copyright (C)" line per holder is that it makes it easy to find all the holders of a project using tools like 'grep'. Debian packaging scripts make use of this for their licensing checks, for example.
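
As a rough illustration of the grep-style check described above, here is a minimal Python sketch. It assumes the one-holder-line layout shown earlier in this comment ("Copyright (C) <years> Holder, Holder"); the regular expression and the function name `copyright_holders` are hypothetical and are not taken from any Debian or Eclipse tool. Splitting on commas would mis-handle names that themselves contain a comma, so treat it as a starting point only.

```python
import re
from collections import Counter

# Matches lines such as "Copyright (C) 2012, 2015 Company One, Person One".
# The exact header layout is an assumption made for this sketch.
COPYRIGHT_LINE = re.compile(
    r"Copyright \([cC]\)\s+(?P<years>\d{4}(?:\s*[-,]\s*\d{4})*)\s+(?P<holders>.+)"
)


def copyright_holders(text):
    """Count how often each holder name appears on 'Copyright (C)' lines."""
    holders = Counter()
    for line in text.splitlines():
        match = COPYRIGHT_LINE.search(line)
        if not match:
            continue
        # Several holders may share one line, separated by commas.
        for name in match.group("holders").split(","):
            name = name.strip().rstrip(".")
            if name:
                holders[name] += 1
    return holders


if __name__ == "__main__":
    sample = """/*
 * Copyright (C) 2012, 2015 Company One, Person One
 * Copyright (C) 2014 Company Two
 * Copyright (C) 2015 Person Three
 */"""
    for holder, count in sorted(copyright_holders(sample).items()):
        print(f"{holder}: {count} line(s)")
```

Run over every file in a project, the combined counts would give a quick list of holders to review by hand.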

Just my 2c.
Comment 9 Mike Milinkovich CLA 2015-02-06 16:21:12 EST
Alexandre,

I hadn't ever thought of that, but I absolutely see your point. 

However, at this point the thought of changing all of our file headers seems impractical. 

What do others think? Can anyone think of how we could handle doing such a massive update to the content of our file headers?

An alternative under discussion in bug 435933 is to use the git logs to track contributors rather than the file headers.
Comment 10 Andrey Loskutov CLA 2015-02-27 14:40:15 EST
As already mentioned, git provides a far better way to track the actual contributors to the code, along with the dates.

A practical solution would be:

 * leave the current code as is
 * recommend that contributors decide for themselves whether they want to be in the header or not
 * provide the recommended header format rules for new code
 * do *not* enforce date/contributor updates for new reviews
 * provide and document common scripts (using git) for all requested automated reports regarding legal questions, authorship etc - on demand.
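
To make the last point concrete, here is a minimal sketch of what such a script might look like in Python. It is not an existing Eclipse tool. The function names and the tab-separated log format are choices made for this example, and it assumes `git` is on the PATH and that author name, email and date are enough for the report.

```python
import subprocess
from collections import defaultdict

# Author name (mailmap-aware), author email, author date, tab separated.
LOG_FORMAT = "%aN%x09%aE%x09%ad"


def read_git_log(path, repo="."):
    """Return the raw per-commit author log for one file (requires git)."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--follow", "--date=short",
         f"--format={LOG_FORMAT}", "--", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def summarize_authors(log_text):
    """Map 'Name <email>' to (commit count, first year, last year)."""
    stats = defaultdict(lambda: [0, None, None])
    for line in log_text.splitlines():
        if not line.strip():
            continue
        name, email, date = line.split("\t")
        year = int(date[:4])  # --date=short yields YYYY-MM-DD
        entry = stats[f"{name} <{email}>"]
        entry[0] += 1
        entry[1] = year if entry[1] is None else min(entry[1], year)
        entry[2] = year if entry[2] is None else max(entry[2], year)
    return {author: tuple(values) for author, values in stats.items()}


if __name__ == "__main__":
    import sys
    report = summarize_authors(read_git_log(sys.argv[1]))
    for author, (count, first, last) in sorted(report.items()):
        print(f"{author}: {count} commit(s), {first}-{last}")
```

Keeping the parsing in `summarize_authors` separate from the `git` call means the report logic can be checked without a repository at hand.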

I had the pleasure of contributing to 2 projects using completely different strategies in the past few months: egit (no update required) and platform UI (strict update policy). Contributing to platform UI was not fun at all. I had to update headers in files where just one single line of code was changed, and because of a missing header update I had to rebase my commits and remerge them. I even had merge conflicts because changes to the same file, made for different bugs by different committers, conflicted in the contributors part of the header. This wasn't cool: resolving merge conflicts which were completely unrelated to the actual fix!

So a clear vote for relaxing the rules.
Comment 11 Ian Bull CLA 2015-02-27 14:54:02 EST
There appear to be two reasons for adding these lines: tracking pedigree and giving acknowledgements. As mentioned above, commit history is better for tracking pedigree, and we could easily have a 'shout-out section' where everyone lists their names if they worked on a file, and we all give a big thank-you.

But neither of those relate to copyright. So I'm going to ask a question that's probably going to make me look stupid (or more stupid than usual): 

What does owning (or co-owning) copyright on a single file in an open source project give someone? 

We go through the trouble of updating dates every year (if a file changed), and we list the companies / authors that work on the project. 
 - What if we forget to update the date? 
 - What if a single individual owned the copyright for an important file at the core of Eclipse? 
 - What if a committer put their name, but their company owned the copyright? 

Could any of these things harm Eclipse? 

I've searched around for why we do this, and I haven't really found a good answer. Maybe everyone else understands this and it's just me :-).
Comment 12 Alexandre Montplaisir CLA 2015-02-27 15:36:47 EST
(In reply to Ian Bull from comment #11)
> What does owning (or co-owning) copyright on a single file in an open source
> project give someone? 

IANAL, but from what I know, in the context of an open-source project, to relicense a given file you need the permission of *all* copyright owners. Some organizations see this as a feature (having many different copyright owners makes it harder to change the license of a project, if not impossible, which "sets in stone" the current license). Other organizations prefer explicit copyright assignments, so that they have the exclusive copyrights, which allow them to relicense the code should they wish to.

> What if we forget to update the date?

There is a tool [1] to automatically update the copyright headers in the files, including the years, by looking through the Git history. As long as you run it once a year I assume everything should be in order.
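
For illustration only, the core of such a year update can be sketched in a few lines of Python. This is not the implementation of the tool in [1]. The "first year, last year" layout, the regular expression and both function names are assumptions made for this example.

```python
import re
import subprocess

# Matches "Copyright (c) 2005" or "Copyright (c) 2005, 2012" (also "2005-2012").
YEARS = re.compile(r"(Copyright \([cC]\) )(\d{4})(?:\s*[,-]\s*(\d{4}))?")


def last_commit_year(path, repo="."):
    """Year of the most recent commit touching path (requires git)."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "-1", "--date=short", "--format=%ad",
         "--", path],
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip()[:4])


def update_copyright_years(line, commit_year):
    """Extend the year range on a copyright line up to commit_year."""
    match = YEARS.search(line)
    if not match:
        return line  # not a copyright line, leave untouched
    first = int(match.group(2))
    last = int(match.group(3) or first)
    newest = max(last, commit_year)
    years = str(first) if newest == first else f"{first}, {newest}"
    return line[:match.start()] + match.group(1) + years + line[match.end():]


if __name__ == "__main__":
    header = " * Copyright (c) 2005, 2012 Company One and others."
    print(update_copyright_years(header, 2015))
```

The helper `last_commit_year` is only needed when the year comes from a real repository; `update_copyright_years` works on plain strings.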

> What if a single individual owned the copyright for an important file at the
> core of Eclipse?

Nothing much. He could not "take it back", because it was contributed to the project under the EPL, so anybody who already has a copy can always redistribute it under the EPL's terms.
If he is the sole copyright owner he could technically distribute copies of that file under any other license or terms he wishes, however useful that could be.

> What if a committer put their name, but their company owned the copyright? 

Depends where they "put their name". Like I mentioned in comment #8, the current header template does not make it very clear who are the copyright holders.

If an employee that is supposed to assign copyright to their employer instead puts:

  Copyright (C) 2015 Me Myself

that technically is the same as stealing from their employer. In the Contributor Assignment they sign with Eclipse, all contributors promise to only submit things they have the right to. So I think Eclipse is protected in that case, and this is a problem between the employee and employer.

[1] https://wiki.eclipse.org/Development_Resources/How_to_Use_Eclipse_Copyright_Tool
Comment 13 Christian Pontesegger CLA 2015-02-28 03:05:38 EST
I am wondering if we need a copyright at all. Is it necessary? Or is it sufficient to release the file under the EPL? A contributor could choose to add a copyright, but as mentioned above, it might be of little use to him.

A 'credits' section would be nice, however. I remember how proud I was of my first commit. Now I see the same expression on the face of my students.

I would also appreciate having the name of the original contributor recorded in the file. For deeper technical questions it helps to find the person to get in touch with (if he is still active). While this is available via git, you often stumble over files during a Google search.

The year, that needs updating all the time seems to be a totally useless information from a programmer's point of view.
Comment 14 Alexandre Montplaisir CLA 2015-02-28 06:52:26 EST
(In reply to Christian Pontesegger from comment #13)
> I am wondering if we need a copyright at all. Is it necessary?

It's very important. FOSS licenses depend on copyright law to be enforceable.

> The year, that needs updating all the time seems to be a totally
> useless information

Copyright expires after 70-130 years iirc, depending on where you live. So the year is also important, technically. But I agree with the general sentiment that it's overkill to update it at every. single. commit. especially if we can run a tool to do it automatically once in a while.
Comment 15 Christian Pontesegger CLA 2015-02-28 12:59:56 EST
As Apache is discouraging author notes, how do they handle the copyright message?
If there is no author mentioned, who would be the copyright holder? Would I have to query git to find out?

About the year:
who would be allowed to update the year in the copyright header? Only the committer(s)?
Comment 16 Mike Milinkovich CLA 2015-02-28 18:15:48 EST
Folks,

We are discussing this topic in the IP Advisory Committee of the Board, with an eye to implementing pretty much what Andrey described in comment 10. In other words, a pragmatic policy that says we will rely on the git history as the "real" record, but if a project or a contributor wants to see names in their headers, we're not going to say no.

We have a bit of a queue of IP topics at the moment, so I cannot promise a resolution immediately. I expect to be able to report back by mid-March. I hope that's okay.
Comment 17 Alexandre Montplaisir CLA 2015-06-07 08:25:49 EDT
Any news on this subject?

As Marc pointed out previously, copyright header changes often lead to merge conflicts and diff noise that makes the review process difficult. I would like to be able to tell contributors to stop updating these (outside of required year and copyright holder changes), and to iteratively start removing the Contributors section.
Comment 18 Mike Milinkovich CLA 2015-06-08 09:31:09 EDT
So this topic has been discussed by the IP Advisory Committee, and the EMO and there is a consensus that we will let each project decide whether or not they want to allow names in the header, but Eclipse IP will rely fully on the Git logs. 

That's the policy decision. Now we need to figure out what process documentation, and tools need to be updated to reflect that.
Comment 19 Marc Khouzam CLA 2015-07-16 10:27:06 EDT
(In reply to Mike Milinkovich from comment #18)
> So this topic has been discussed by the IP Advisory Committee, and the EMO
> and there is a consensus that we will let each project decide whether or not
> they want to allow names in the header, but Eclipse IP will rely fully on
> the Git logs. 

To be clear, it is now ok that contributions don't have the name of the contributor in each file's copyright header?
Comment 20 Mike Milinkovich CLA 2015-07-16 11:26:39 EDT
(In reply to Marc Khouzam from comment #19)
> (In reply to Mike Milinkovich from comment #18)
> > So this topic has been discussed by the IP Advisory Committee, and the EMO
> > and there is a consensus that we will let each project decide whether or not
> > they want to allow names in the header, but Eclipse IP will rely fully on
> > the Git logs. 
> 
> To be clear, it is now ok that contributions don't have the name of the
> contributor in each file's copyright header?

Yes.
Comment 21 Marc Khouzam CLA 2015-07-16 11:28:36 EDT
(In reply to Mike Milinkovich from comment #20)

> > To be clear, it is now ok that contributions don't have the name of the
> > contributor in each file's copyright header?
> 
> Yes.

Awesome!  I'll notify the CDT committers and community.

Thanks for getting this through!
Comment 22 Stefan Xenos CLA 2015-08-07 21:54:55 EDT
Re: comment 18

What was the official decision on copyright dates? Do we still need to update the date on every single commit, or will we now be running a script once every year or so to update everything?
Comment 23 Lars Vogel CLA 2015-08-08 03:31:38 EDT
(In reply to Stefan Xenos from comment #22)
> Re: comment 18
> 
> What was the official decision on copyright dates? Do we still need to
> update the date on every single commit, or will we now be running a script
> once every year or so to update everything?

I asked Dani about it and we still need to update the copyright date. You can use the releng tool for that. https://wiki.eclipse.org/Development_Resources/How_to_Use_Eclipse_Copyright_Tool

What I do is run the tool before I change a plug-in to ensure its dates are up to date. After I make some changes, I run the tool again to update the files that need updating.
Comment 24 Marc-André Laperle CLA 2015-08-14 15:52:02 EDT
Just a clarification: are we allowed to *remove* previously added contributors? In order to clean things up?
Comment 25 Dani Megert CLA 2015-08-17 06:47:15 EDT
(In reply to Marc-Andre Laperle from comment #24)
> Just a clarification: are we allowed to *remove* previously added
> contributors? In order to clean things up?

As per comment 18 each project can define its policy. For the 'Eclipse' top-level project we decided to keep them and also continue to allow (but not require) contributors to add their credentials if they want to do so.
Comment 26 Mike Milinkovich CLA 2015-08-17 10:10:24 EDT
(In reply to Marc-Andre Laperle from comment #24)
> Just a clarification: are we allowed to *remove* previously added
> contributors? In order to clean things up?

I think that the old ones should be left as is. 

Unless I'm wrong, the _really_ old ones from CVS days are the only records we have of those contributors.
Comment 27 Lars Vogel CLA 2015-09-17 05:01:31 EDT
*** Bug 438633 has been marked as a duplicate of this bug. ***
Comment 28 Eclipse Genie CLA 2016-11-08 13:25:00 EST
New Gerrit change created: https://git.eclipse.org/r/84691
Comment 29 Eclipse Genie CLA 2019-04-03 14:23:01 EDT
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

--
The automated Eclipse Genie.
Comment 30 Wayne Beaton CLA 2019-04-03 16:57:31 EDT
The cited page no longer exists and instead refers now to the section on copyrights in the handbook. That section describes a "contributors" section as optional.

The Git log can reasonably serve as a record of contribution.

I think that we're done here.