An Eclipse mirror has reported that the disk footprint for being an Eclipse mirror for download.eclipse.org is 1.4T.

download.eclipse.org should only be used to store current release builds. Archived builds should be moved to archive.eclipse.org, and nightly and integration builds should be deleted once stale. See the docs for more info: https://wiki.eclipse.org/IT_Infrastructure_Doc#Downloads

Below is a table of projects that seem to be consuming more disk space than I'd expect.

_ALL PROJECTS SHOULD CLEAN THEIR AREA, NOT JUST THOSE LISTED BELOW_

14G   4diac
6.6G  acceleo
15G   app4mc
57G   birt
11G   dirigible
20G   e4              e4/sdk/drops 2011 -> move to archive.eclipse.org
72G   eclipsescada
12G   ecp
9.7G  efxclipse
9.5G  epsilon
16G   gemoc
16G   ice
25G   jdtls
9.8M  keyple
13G   kura
6.5G  mat
4.8G  mmt
160G  modeling
37G   emf
36G   gmp
72G   mdt
8.0G  tmf
14G   n4js
41G   orion
98G   rcptt
76G   eclipselink
38G   sirius
18G   staging
398G  epp
71G   cdt
110G  orbit
41G   ptp
36G   tracecompass
40G   webtools        Lots of old stuff that can be moved
> 20G e4 e4/sdk/drops 2011 -> move to archive.eclipse.org
Sravan, please do this for e4. Move all but the latest.
Lots of people and infrastructure rely on the stability of update sites, even for older releases. Simply moving them is definitely going to break references from builds and from target platforms, which in turn will break those builds and break target platform resolution across the ecosystem. If a release is to be moved to a different host, it seems important that the existing site be transformed into a composite site that references the new (archived) location to prevent such breakage. Keep in mind, for example, how many older Eclipse installations will reference older update sites...

So what exactly does "current" release builds mean? When does a release stop being current? Unfortunately a large part of the ecosystem really does use very old releases...
For "160G modeling", I think the number is a sum total of all the modeling subprojects... It seems to me there are no "modeling" sites other than those of the subprojects. But there is no such sum-total number for "tools", for example.

Thinking about this some more, i.e., preserving the integrity of the long-established stable update sites, it's really not safe to move a composite because it's likely to have relative references based on the current location. Of course composites use no significant space, so they really aren't much of a concern.

It's not totally clear that it's even safe for simple repositories when they have a mirror URL in them:

https://wiki.eclipse.org/Equinox/p2/p2.mirrorsURL#Moving_a_repo_to_archive.eclipse.org

The wiki suggests the artifacts likely would still be found if available at the corresponding location on archive.eclipse.org (or that, because it's not mirrored anymore, the mirror script will return an empty list so that only archive.eclipse.org will be used), but it's not totally clear that this all really works without a problem. If this really does work, then it should be pretty easy to script the moving of a simple repository to archive.eclipse.org and to also script the replacement composite that is left behind at the established old location...
(In reply to Denis Roy from comment #0)
> ...
> and nightly and integration builds should be deleted once stale.
> ...
True enough, but shouldn't these not be mirrored in the first place? For many, by the time the "outgoing bytes" from a build get pushed to, what, 50 mirrors, it is already time for another build to start becoming "outgoing out". During that time, perhaps 2 to 20 people used the mirrors to get the content. The benefit is not worth the cost, if those numbers are close to valid.

I know that you do (or used to) filter out the N and I builds from some projects. Perhaps this procedure should be codified.
I'm pretty sure (based on watching how the platform's I builds are resolved in a target platform resolution) that I builds (and N builds) are not mirrored, though they are probably included in the disk usage count...

I know that GMP was not properly cleaning up milestones and has (had?) a monstrous composite with endless outdated repos in it; I'm sure that's included in the sums. I only noticed it because it takes a very long time to load such a huge composite...
(In reply to Ed Merks from comment #5)
> I'm pretty sure (based on watching how the platform's I builds are resolved
> in a target platform resolution) that I builds (and N builds) are not
> mirrored,
Correct.
FYI, https://wiki.eclipse.org/IT_Infrastructure_Doc#Use_mirror_sites.2Fsee_which_mirrors_are_mirroring_my_files.3F (see the "note" at the bottom of this section with the list of excluded file patterns).
Are these totals taking into account the exclusion patterns? How can I compute these numbers for a given folder? I.e., I can do the following to compute such information for "just the EMF project's downloads" (excluding other projects such as CDO, Compare, and so on), but this total will include nightly and integration builds as well as Javadoc:

emerks@build:/home/data/httpd/download.eclipse.org/modeling/emf> du -h -c -s emf
5.2G emf
5.2G total
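One way to approximate the mirrored footprint of a folder is to sum file sizes while skipping every path that matches an exclusion pattern. The following is only a sketch: the patterns shown are hypothetical placeholders (the real list is in the wiki note), `mirrored_size` is an illustrative name, and the result is the apparent file size rather than the block usage that du reports.

```python
import fnmatch
import os

# Hypothetical exclusion patterns; substitute the real list from the wiki note.
EXCLUDE_PATTERNS = ["*/nightly/*", "*/integration/*", "*/javadoc/*"]


def mirrored_size(root, patterns=EXCLUDE_PATTERNS):
    """Return total bytes under root, skipping files whose path matches a pattern."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # count only real files, not symbolic links
            # Path relative to root, with a leading "/" so "*/dir/*" also matches at the top.
            rel = "/" + os.path.relpath(path, root).replace(os.sep, "/")
            if any(fnmatch.fnmatch(rel, pattern) for pattern in patterns):
                continue
            total += os.path.getsize(path)
    return total
```

Calling `mirrored_size(root, patterns=[])` gives the unfiltered total, so the difference between the two calls is the space that excluded builds occupy.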
(In reply to Ed Merks from comment #8)
> Are these totals taking into account the exclusion patterns?
No idea, sorry.
(In reply to Ed Merks from comment #5)
> I'm pretty sure (based on watching how the platform's I builds are resolved
> in a target platform resolution) that I builds (and N builds) are not
> mirrored, though they are probably included in the disk usage count...
>
> I know that GMP was not properly cleaning up milestones and has (had?) a
> monstrous composite with endless outdated repos in it; I'm sure that's
> included in the sums. I only noticed it because it takes a very long time
> to load such a huge composite...
I thought I had fixed it once, but apparently there's some subtlety in the legacy releng scripts that I missed. I just checked, and GMF Runtime alone uses 27G, a large part of which can probably be simply removed.

I'll clean this and Sirius (though I can't guarantee this will be done this week).
We really need to know how to compute numbers that reflect how much space we will actually save for the mirrors. Of course generally reducing disk space is a good thing, but much of the action (other than deleting stale/outdated builds/drops) simply moves the disk usage from one host to another, which does not seem all that useful for saving resources across the overall set of Eclipse hosts themselves.

Just as a suggestion, perhaps one way to get such accurate information would be to temporarily set up a host that acts as a mirror, so that each project can see which of its files are actually copied to a mirror. On that file system we could simply use "du" to compute numbers that would definitely be relevant.
Has anyone ever checked if deduplication of files would improve the situation? I.e. if there are many files with identical content in different update sites, it would be sufficient to store one copy. However, I'm a Windows guy, so I really don't have a good understanding whether that would require the underlying file system to support this, or if something needs to be done on the update site file level (and whether that would only be effective for new or also for existing update sites).
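Before deciding whether deduplication is worth pursuing, the potential saving can be measured. The sketch below groups files by size and then by SHA-256 digest and reports how many bytes are redundant copies. It only measures; it changes nothing on disk, and `duplicate_bytes` is an illustrative name.

```python
import hashlib
import os
from collections import defaultdict


def duplicate_bytes(root):
    """Return the bytes that would be saved if each distinct file content were stored once."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    saved = 0
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue  # a unique size implies unique content
        copies_per_digest = defaultdict(int)
        for path in paths:
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            copies_per_digest[digest.hexdigest()] += 1
        # Every copy beyond the first one is redundant.
        saved += sum((count - 1) * size for count in copies_per_digest.values())
    return saved
```

Whether that saving can be realized is a separate question: it would need hard links or a file system with deduplication support on the server, and a mirror syncing with rsync would need the `--hard-links` option to preserve hard links.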
(In reply to Ed Merks from comment #3)
> For "160G modeling", I think the number is a sum total of all the modeling
> subprojects...
Papyrus contributes its fair share to that amount (about 60G+) and could be trimmed down significantly. I'll try to tackle this tomorrow during the M1 release.
(In reply to Ed Merks from comment #2)
> Lots of people and infrastructure rely on the stability of update sites even
> for older releases. Simply moving them is definitely going to break
We can implement stable paths for download.e.o to redirect to archive.e.o if the same path exists. Would this be helpful?

if (download.eclipse.org/some/path/myfile == 404 Not Found) {
    if (file_exists(archive.eclipse.org/some/path/myfile)) {
        send_302_redirect(archive.eclipse.org/some/path/myfile);
    } else {
        send_404();
    }
}

> So what exactly does "current" release builds mean? When does a release no
> longer become current? Unfortunately a large part of the ecosystem really
> does use very old releases...
I don't have specific guidelines here. 2011 is not current. Eclipse Neon is not current. Lots of people use Windows 7, but it is not current.

(In reply to Ed Merks from comment #3)
> For "160G modeling", I think the number is a sum total of all the modeling
> subprojects...
Correct, I should have removed it.

> The wiki suggests the artifacts likely would still be found if available at
> the corresponding location on archive.eclipse.org (or that because it's not
> mirrored anymore the mirror script will return an empty list so that only
> archive.eclipse.org will be used) but it's not totally clear that this
> all really works without a problem.
It works if you use the mirrors explicitly. It does not work for direct links to download.eclipse.org but, as above, we can make it work transparently.

(In reply to Ed Merks from comment #8)
> Are these totals taking into account the exclusion patterns?
They do not. The Eclipse Foundation still needs to maintain backups of download.e.o and, regardless of mirror footprint, stale files are still costly to maintain.
(In reply to Ed Merks from comment #11)
> Just as a suggestion perhaps one way to get such accurate information would
> be to temporarily set up a host that acts as a mirror
I'll try to put together a size report based on:

rsync -a --list-only rsync://rsync.osuosl.org/eclipse/
This one-liner will create a full directory structure of sparse files based on what is on the OSUOSL mirror. It's much cheaper than creating a mirror, takes a fraction of the space and allows disk space calculations in the same manner.

rsync -a --list-only rsync://rsync.osuosl.org/eclipse/ | awk '/^-r/ {gsub(",","",$2); print $2 " " $5}' | while read -r size file ; do echo "Create file: $file size: $size bytes"; mkdir -p "$(dirname "$file")"; truncate -s "$size" "$file"; done

I'll make this available shortly.
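If only the totals are needed, the same listing can be aggregated without creating any files. The sketch below sums file sizes per top-level folder. It assumes the same line layout the awk script relies on (permissions, size with thousands separators, date, time, path); `sizes_by_top_folder` is an illustrative name.

```python
from collections import Counter


def sizes_by_top_folder(listing_lines):
    """Sum file sizes per top-level folder from `rsync --list-only` output lines."""
    totals = Counter()
    for line in listing_lines:
        # Expected fields: permissions, size, date, time, path
        fields = line.rstrip("\n").split(None, 4)
        if len(fields) < 5 or not fields[0].startswith("-"):
            continue  # skip directories, symbolic links and malformed lines
        size = int(fields[1].replace(",", ""))
        top = fields[4].split("/", 1)[0]
        totals[top] += size
    return totals
```

Passing `sys.stdin` as `listing_lines` lets the rsync command be piped straight into a script built around this function. Unlike the awk version, the path is taken as the whole remainder of the line, so file names containing spaces are kept intact.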
(In reply to Ed Merks from comment #2)
> Lots of people and infrastructure rely on the stability of update sites even
> for older releases. Simply moving them is definitely going to break
I've made a small change to our 404 handler, for file requests only.

BEFORE:
wget -S https://download.eclipse.org/modeling/OLD/birt-repo-3.7.2.v20120207.zip
HTTP request sent, awaiting response...
  HTTP/1.1 404 Not Found
  Date: Tue, 16 Apr 2019 14:31:17 GMT
  X-NodeID: download1
2019-04-16 10:31:20 ERROR 404: Not Found.

AFTER:
wget -S https://download.eclipse.org/modeling/OLD/birt-repo-3.7.2.v20120207.zip
HTTP request sent, awaiting response...
  HTTP/1.1 307 Moved Permanently
  Date: Tue, 16 Apr 2019 18:40:43 GMT
  Location: http://archive.eclipse.org/modeling/OLD/birt-repo-3.7.2.v20120207.zip
  X-NodeID: download1
Location: http://archive.eclipse.org/modeling/OLD/birt-repo-3.7.2.v20120207.zip [following]
Connecting to archive.eclipse.org (archive.eclipse.org)|198.41.30.199|:443... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
(snip)

> references from builds and from target platforms that in turn will break
> those builds and break target platform resolution across the ecosystem. If
> a release is to be moved to a different host, it seems important that the
> existing site be transformed into a composite site that references the
> different new (archived) location to prevent such breakage. Keep in mind,
> for example, how many older Eclipse installations will reference older
> update sites...
>
> So what exactly does "current" release builds mean? When does a release no
> longer become current? Unfortunately a large part of the ecosystem really
> does use very old releases...
Because EPP is one of the largest consumers of disk space, I've synchronised the important parts (again) to archive.eclipse.org, and removed *many* of the older packages from download.eclipse.org. This reduces the size below /technology/epp a lot.
I've indexed data available on the mirrors. I specifically used http://ftp.fau.de/eclipse/ as the data source. The index is available here:

https://download.eclipse.org/oomph/archive/eclipse

This structure mirrors the folder structure found on the mirrors, but includes the cumulative sizes for each folder and sorts folders according to size, largest first. It also shows the % of space used by that folder relative to the other sibling folders as well as the date of the folder. The header of each page shows the total size of the parent folder and its % usage relative to the total size of the mirror. The link on the header is navigable; it opens a mirror page that allows you to navigate to the actual folders hosted by an actual mirror (so you can look at what files are actually in the folders).

The EPP changes have already reduced the total mirror size to 895G, so that's a big improvement.

Papyrus stands out as a heavy hitter:

https://download.eclipse.org/oomph/archive/eclipse/modeling/mdt/papyrus/index.html

But that will apparently be addressed.

I don't see so many folks on the CC list so I don't think many people are paying attention. Perhaps the additional useful details would be helpful.
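For anyone who wants the same kind of figures from their own file listing, the sketch below computes cumulative folder sizes and each child folder's share of its parent, largest first. It is not the code behind the index above; the function names and the (path, size) input format are assumptions made for illustration.

```python
from collections import defaultdict


def cumulative_sizes(files):
    """Map each folder path to the total size of all files beneath it.

    files: iterable of (path, size) pairs with '/'-separated relative paths.
    The root folder is represented by the empty string.
    """
    totals = defaultdict(int)
    for path, size in files:
        parts = path.split("/")[:-1]  # folder components only
        totals[""] += size
        for depth in range(1, len(parts) + 1):
            totals["/".join(parts[:depth])] += size
    return dict(totals)


def sibling_report(totals, parent):
    """Return (folder, size, percent of parent) rows for direct child folders, largest first."""
    prefix = parent + "/" if parent else ""
    parent_total = totals.get(parent, 0)
    rows = []
    for folder, size in totals.items():
        is_direct_child = (
            folder and folder.startswith(prefix) and "/" not in folder[len(prefix):]
        )
        if is_direct_child:
            percent = 100.0 * size / parent_total if parent_total else 0.0
            rows.append((folder, size, percent))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

The percentage here is taken against the parent folder's total, which also counts files stored directly in the parent, so the child percentages need not add up to 100.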
Darn, Ed, that is really cool. I've also filed bug 546528 - with very little code and effort, I think we can make managing downloads and archives much, much easier.
Is it possible to accommodate not-yet-decontainerized projects? e.g ocl and qvtd are both "0" probably because I have requested the root downloads but have yet to pluck up courage to actually move from e.g. modeling/mdt/ocl.
(In reply to Pierre-Charles David from comment #10)
> (In reply to Ed Merks from comment #5)
> > I'm pretty sure (based on watching how the platform's I builds are resolved
> > in a target platform resolution) that I builds (and N builds) are not
> > mirrored, though they are probably included in the disk usage count...
> >
> > I know that GMP was not properly cleaning up milestones and has (had?) a
> > monstrous composite with endless outdated repos in it; I'm sure that's
> > included in the sums. I only noticed it because it takes a very long time
> > to load such a huge composite...
>
> I thought I had fixed it once, but apparently there's some subtlety in the
> legacy releng scripts that I missed. I just checked, and GMF Runtime alone
> uses 27G, a large part of which can probably be simply removed.
Done for EMF Services (gained about 2G) and GMF Notation/Runtime (gained almost 27G).

> I'll clean this and Sirius (though I can't guarantee this will be done this
> week).
Partly done, down to 17G, from 38G initially. I should be able to remove at least 10G more, but I'm waiting for feedback from downstream projects which may still depend on milestones before removing them.
Denis, do you know what mirrors do with symbolic links?

For example, CDO uses quite a bit of space, almost 6G or 0.66% of the mirror size, but that is "accounted for" multiple times. The "real" location of CDO is accounted for like this:

https://download.eclipse.org/oomph/archive/eclipse/modeling/emf/cdo/index.html

I.e., the following folder exists at this location on build.eclipse.org:

/home/data/httpd/download.eclipse.org/modeling/emf/cdo

But this "same" content is also "accounted for" here:

https://download.eclipse.org/oomph/archive/eclipse/modeling/emft/cdo/index.html

That's because /home/data/httpd/download.eclipse.org/modeling/emft/cdo is a symbolic link like this:

lrwxrwxrwx 1 nickb modeling.emft 10 Feb 18 2009 cdo -> ../emf/cdo

I have a feeling that mirrors actually do duplicate the content because, while the following link works to load a p2 repository:

http://ftp.fau.de/eclipse/modeling/emft/cdo/updates/releases/

this direct download.eclipse.org link does not work:

http://download.eclipse.org/eclipse/modeling/emft/cdo/updates/releases/

That kind of makes sense, because I've been told that servers won't serve up links.

There are yet more links to make this even worse. I.e., we find CDO again at this location because emft_LNK is a link to ../technology/emft:

https://download.eclipse.org/oomph/archive/eclipse/modeling/emft_LNK/cdo/index.html

And then we find it yet again in technology because technology/emft/cdo is a link to modeling/emft/cdo (which, as mentioned above, is a link to modeling/emf/cdo):

https://download.eclipse.org/oomph/archive/eclipse/technology/emft/cdo/index.html

So it seems to me that CDO has 4 copies on every mirror. I don't know how best to unravel this mess of links. :-(

I'm tempted just to delete all these links; it seems that download.eclipse.org can't serve them and so mirrors should not copy them... Their only possible use is by builds that are directly accessing the file system.
Or do I miss some other important reason for these links existing? Please share your understanding of how mirrors handle symbolic links.
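To find out how widespread such links are before deleting anything, the directory symlinks under a download area can be listed together with the folders they resolve to. This is only a sketch; `directory_symlinks` is an illustrative name and it assumes direct file system access such as on build.eclipse.org.

```python
import os


def directory_symlinks(root):
    """Return sorted (link_path, resolved_target) pairs for symlinks under root that point at directories."""
    found = []
    # os.walk does not descend into symbolic links by default, but it still
    # reports links to directories in dirnames, which is what we inspect here.
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                found.append((path, os.path.realpath(path)))
    return sorted(found)
```

Whether a mirror ends up with duplicate content depends on how rsync is run: with `--links` a symbolic link is copied as a link, while with `--copy-links` the item it points to is copied in its place.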
(In reply to Ed Willink from comment #20)
> Is it possible to accommodate not-yet-decontainerized projects? e.g. ocl and
> qvtd are both "0" probably because I have requested the root downloads but
> have yet to pluck up courage to actually move from e.g. modeling/mdt/ocl.
What specifically would it entail to "accommodate" that? The data of course reflects (and should reflect) the actual contents of a mirror, so if you have nothing in those folders at download.eclipse.org, then there is nothing in the mirrors for them...
> Please share your understanding of how mirrors handle symbolic links.
Mirrors likely don't handle symbolic links, and that's why we don't enable their usage on http://download.e.o. However, to be safe, the rsync mechanism that mirrors use probably translates links to real paths, so it's entirely possible that mirrors duplicate data :/ I was not even aware of this.

I can likely turn that off at our server but I wouldn't want to break anything. I don't think many Eclipse projects make use of symlinks.

(In reply to Ed Merks from comment #22)
> I'm tempted just to delete all these links; it seems that
> download.eclipse.org can't serve them and so mirrors should not copy them...
> Their only possible use is by builds that are directly accessing the file
> system. Or do I miss some other important reason for these links existing?
That is my understanding as well.
(In reply to Ed Merks from comment #23)
> (In reply to Ed Willink from comment #20)
> > Is it possible to accommodate not-yet-decontainerized projects? e.g. ocl and
> > qvtd are both "0" probably because I have requested the root downloads but
> > have yet to pluck up courage to actually move from e.g. modeling/mdt/ocl.
>
> What specifically would it entail to "accommodate" that? The data of course
> reflects (and should reflect) the actual contents of a mirror, so if you
> have nothing in those folders at download.eclipse.org, then there is nothing
> in the mirrors for them...
Request retracted. All the information I wanted is in your hyper-linked report. I naively assumed it was a flat report file.
(In reply to Dani Megert from comment #1)
> > 20G e4 e4/sdk/drops 2011 -> move to archive.eclipse.org
> Sravan, please do this for e4. Move all but the latest.
Hi Dani,

I don't have committer rights on the e4 project, so I don't have permission to do this. The folder has group permissions eclipse.e4. There are two ways to approach this:

1. Add me to the committers list
2. Combine the e4 project with the Eclipse Platform project

Since this project is used for tips, I suggest going with option 2.

Thanks
Sravan
Note that I've enhanced the support for producing this page:

https://download.eclipse.org/oomph/archive/eclipse/

There is now a job that rebuilds it once per day:

https://ci.eclipse.org/oomph/job/mirror-index/

The page header now has a breadcrumb for better navigation/summary information, and the mirror page (accessed via any -> icon in the nav bar) shows a nice table:

https://download.eclipse.org/oomph/archive/mirror.php?location=

The only problem I can't work around is the automatic computation of the list of mirrors. That's because, while this URL works for me at home:

http://www.eclipse.org/downloads/download.php?file=/favicon.ico&format=xml

when running on Jenkins it produces an empty list, i.e., this is in the log:

<?xml version="1.0" encoding="ISO-8859-1"?>
<mirrors></mirrors>
No mirrors found; hard-coded defaults will be used.

Denis, is there any way/URL that would return me a list of mirrors while running on Jenkins? (And isn't this a poor choice of encoding, especially given the file contains Chinese characters?)
(In reply to Ed Merks from comment #27)
> There is now a job that rebuilds it once per day:
Thanks.

After cleaning up some of my own projects, some of which have long established releng practices, I see that some practices are questionable.

Download ZIPs are pruned to the last two years of R-builds, 3 recent S-builds and perhaps a couple of I-builds and N-builds. Older R-builds are moved to archive and linked from the downloads page. Seems good, albeit a bit manual.

P2 repos are not pruned to the same extent, since relevant Wiki authors seem to have neglected to advocate P2 repo archiving. The release aggregate therefore has every release ever, costing mirror space and useless content scanning time. The milestone repo grows and grows unless some enthusiastic releng manually removes both composite entry and content consistently.

Taking EMF as a typical example of a long established project...

https://download.eclipse.org/oomph/archive/eclipse/modeling/emf/emf/updates/index.html#releases

has 15 release versions from 2.6 to 2.14. Since policies/tooling evolve, earlier and later releases are somewhere else.

Surely we should try to have just the last ?5 years of P2 repo releases in one place, with all older P2 repos moved to archive without aggregation from the main releases aggregate? Perhaps a separate archive aggregate might point at them, perhaps just a Wiki/PMI page. Perhaps for really old releases, users can be told that the archive ZIPs are the only option. Why waste archiving space on almost identical P2 repos and ZIPs?
> Denis, is there any way/URL that would return me a list of mirrors while
> running on Jenkins?
It's designed to not return mirrors for internal (to us) hosts.
(In reply to Denis Roy from comment #29)
> > Denis, is there any way/URL that would return me a list of mirrors while
> > running on Jenkins?
>
> It's designed to not return mirrors for internal (to us) hosts.
Is there perhaps some file in the file system that contains this same information? The PHP script must compute it from something... Though I suppose the list of mirrors doesn't often change so I'm being overly picky...
(In reply to Ed Willink from comment #28)
> After cleaning up some of my own projects, some of which have long
> established releng practices, I see that some practices are questionable.
Yes, when I migrated EMF to Tycho I was not very happy with the old structure under modeling/emf/emf/downloads and modeling/emf/emf/updates, replacing it all with modeling/emf/emf/builds.

> Download ZIPs are pruned to the last two years of R-builds, 3 recent
> S-builds and perhaps a couple of I-builds and N-builds. Older R-builds are
> moved to archive and linked from the downloads page. Seems good, albeit a
> bit manual.
Nightly builds and integration builds are generally excluded from the mirrors, but that depends on the naming pattern used. The EMF build job automatically removes stale builds, i.e., it keeps at most 5 N builds, and all "stale" milestone builds are removed as soon as there is a milestone build with an incremented version. So it's all completely automatic. I will not move/remove releases at this time; modeling/emf/emf/builds uses 0.07% of the mirror space, so it's not exactly compelling to spend time on this.

> P2 repos are not pruned to the same extent, since relevant Wiki authors seem
> to have neglected to advocate P2 repo archiving. The release aggregate
> therefore has every release ever, costing mirror space and useless content
> scanning time. The milestone repo grows and grows unless some enthusiastic
> releng manually removes both composite entry and content consistently.
Yes, that is why I provide a "latest" child and have asked on cross-projects for others to do the same; it's pointless to process through large composites when generally (almost inevitably) one ends up resolving to the last version anyway.

> Taking EMF as a typical example of a long established project...
>
> https://download.eclipse.org/oomph/archive/eclipse/modeling/emf/emf/updates/index.html#releases
>
> has 15 release versions from 2.6 to 2.14. Since policies/tooling evolve,
> earlier and later releases are somewhere else.
Yes, I'd like to move all stuff under updates and downloads to archive.eclipse.org, but this would have a 0.16% impact so it is also not the most compelling activity.

> Surely we should try to have just the last ?5 years of P2 repo releases in
> one place, with all older P2 repos moved to archive without aggregation from
> the main releases aggregate? Perhaps a separate archive aggregate might
> point at them, perhaps just a Wiki/PMI page. Perhaps for really old
> releases, users can be told that the archive ZIPs are the only option. Why
> waste archiving space on almost identical P2 repos and ZIPs?
Overall modeling/emf/emf uses 0.27% of the space, so there are definitely *many* projects that could invest time to have a more significant impact. But already we see the mirror size reduced from 1.1T to close to 800G...

In principle, the following "automatic" process should work for a folder such as /modeling/emf/emf/updates/ and probably more generally:

- Copy the entire folder to the corresponding file location in archive.eclipse.org, preserving the path structure.
- Delete all files in all folders of the original folder tree (but preserve the folder structure).
- For each folder which is/was a p2 repository in the original folder tree, replace it with a p2 composite that references http(s?)://archive.eclipse.org/<corresponding-archived-folder-copy>.
- Finally, prune empty folders.

This way all the older established URLs for p2 repositories continue to work and, if what Denis suggests is working properly, mirror URLs in the copied/archived repos do not need to change because archive.eclipse.org will act as a mirror automatically.

And if everyone were well-behaved, links would all be using download.php and also would continue to work properly (according to Denis' suggestion):

https://www.eclipse.org/downloads/download.php?file=/modeling/emf/emf/builds/release/2.17/EMF-Updates-2.17.zip

When I have some spare time, I will experiment with this and test that it actually works. But I will not do something like this manually. It's too time consuming and too error prone!
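The copy, delete, replace and prune process described in this comment could be scripted roughly as follows. This is an untested sketch under several assumptions: both trees are reachable as local paths, a folder counts as a p2 repository if it contains one of the marker files listed, the composite XML follows the usual p2 composite layout as I understand it, and the archive URL (http or https) is passed in by the caller. The function and constant names are illustrative.

```python
import os
import shutil
import time

# Assumption: a folder is treated as a p2 repository if it holds one of these files.
P2_MARKERS = {"content.jar", "content.xml", "content.xml.xz",
              "compositeContent.xml", "compositeContent.jar"}

COMPOSITE_TEMPLATE = """<?xml version='1.0' encoding='UTF-8'?>
<?{pi} version='1.0.0'?>
<repository name='{name}' type='{type}' version='1.0.0'>
  <properties size='1'>
    <property name='p2.timestamp' value='{stamp}'/>
  </properties>
  <children size='1'>
    <child location='{child}'/>
  </children>
</repository>
"""

# File name -> (processing instruction, repository type) for the two composite files.
COMPOSITE_FILES = {
    "compositeContent.xml": (
        "compositeMetadataRepository",
        "org.eclipse.equinox.internal.p2.metadata.repository.CompositeMetadataRepository"),
    "compositeArtifacts.xml": (
        "compositeArtifactRepository",
        "org.eclipse.equinox.internal.p2.artifact.repository.CompositeArtifactRepository"),
}


def archive_tree(source, archive, archive_url):
    """Copy source to archive, then leave composites pointing at archive_url behind in source."""
    # Step 1: copy everything, preserving the path structure (archive must not exist yet).
    shutil.copytree(source, archive)
    # Steps 2 to 4: walk bottom-up so that emptied folders can be pruned on the way.
    for dirpath, _dirnames, filenames in os.walk(source, topdown=False):
        was_repo = bool(P2_MARKERS & set(filenames))
        for name in filenames:
            os.remove(os.path.join(dirpath, name))
        if was_repo:
            rel = os.path.relpath(dirpath, source).replace(os.sep, "/")
            child = archive_url.rstrip("/") + ("" if rel == "." else "/" + rel)
            for filename, (pi, repo_type) in COMPOSITE_FILES.items():
                with open(os.path.join(dirpath, filename), "w", encoding="utf-8") as out:
                    out.write(COMPOSITE_TEMPLATE.format(
                        pi=pi, name="Archived repository", type=repo_type,
                        stamp=int(time.time() * 1000), child=child))
        elif dirpath != source and not os.listdir(dirpath):
            os.rmdir(dirpath)
```

A call might look like `archive_tree(".../download.eclipse.org/modeling/emf/emf/updates", ".../archive.eclipse.org/modeling/emf/emf/updates", "http://archive.eclipse.org/modeling/emf/emf/updates")`. It would be wise to try this on a scratch copy first and confirm with a p2 client that the composites left behind actually resolve.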
(In reply to Ed Merks from comment #31)
> When I have some spare time, I will experiment with this and test that it
> actually works. But I will not do something like this manually. It's too
> time consuming and too error prone!
It would be great to have something automatic that we could all share since from the moment we have built and tested a P2 repo I think many projects' requirements are identical but independently and often manually implemented. I regularly raise bugs in regard to bad download maintenance.

The new EMF Updates page is a huge improvement on its predecessor and a few initial limitations have now vanished. (The traditional alias name such as emf-xsd-Update-2.12.0M6.zip is perhaps the main regression. ?? also pre-release hiding ??)

An integration of the EMF Updates page with archiving and the PMI would definitely prompt me to rip-off the technology now that bit rot has set into the PHP underlying the old modeling downloads pages. Bug 534467.
> > It's designed to not return mirrors for internal (to us) hosts.
>
> Is there perhaps some file in the file system that contains this same
> information? The PHP script must compute it from something... Though I
> suppose the list of mirrors doesn't often change so I'm being overly picky...
The mirrors are stored in a database and the list is dynamic based on the GeoIP lookup of the caller. It's specifically designed to not give a mirror list to callers on our LAN as downloading from mirrors wouldn't make sense.

I'm trying to think of ways this could work for you without adding a kludge to the code.
I plan to do this long overdue cleanup for CDT in the coming days. I will be sending an email to cdt-dev as a heads up.
Thanks, Jonah. I've recently seen another ping about this on the mirrors mailing list, so expect me to start yelling about this on multiple channels.

***************** A forewarning *****************

We will -- eventually -- be implementing CBI disk quotas, just as we've implemented CPU and memory quotas, because we do not have enough resources for unlimited storage and free-for-all disk space, and neither do our mirrors.

As a reminder, Ed's disk space browser tool is here: https://download.eclipse.org/oomph/archive/eclipse/

Many thanks to all the projects that perform regular housecleaning. Your work is appreciated. If you need help doing this maintenance, please file a separate bug.
Since we no longer have the ability to 'cd' or even find the size of a directory using 'df', please provide information on how you expect us to do this? Or are you planning to provide some web based tools that enable us to do this?
(In reply to Greg Watson from comment #36)
> Since we no longer have the ability to 'cd' or even find the size of a
> directory using 'df', please provide information on how you expect us to do
> this? Or are you planning to provide some web based tools that enable us to
> do this?
I should have read the previous post!
(In reply to Greg Watson from comment #37)
> (In reply to Greg Watson from comment #36)
> > Since we no longer have the ability to 'cd' or even find the size of a
> > directory using 'df', please provide information on how you expect us to do
> > this? Or are you planning to provide some web based tools that enable us to
> > do this?
>
> I should have read the previous post!
The report is definitely useful for identifying where you can save space, but the ability to manage the files on disk with the badly crippled set of tools available is definitely a problem.
(In reply to Ed Merks from comment #38)
> The report is definitely useful for identifying where you can save space,
> but the ability to manage the files on disk with the badly crippled set of
> tools available is definitely a problem.
Agreed. Someone will need to figure out how to move directories to archive.eclipse.org with the restricted shell before this can be done.
(In reply to Greg Watson from comment #39)
> (In reply to Ed Merks from comment #38)
> > The report is definitely useful for identifying where you can save space,
> > but the ability to manage the files on disk with the badly crippled set of
> > tools available is definitely a problem.
>
> Agreed. Someone will need to figure out how to move directories to
> archive.eclipse.org with the restricted shell before this can be done.
You can move directories just fine with the restricted shell:

ssh <user>@build.eclipse.org
$ mv /home/data/httpd/download.eclipse.org/tools/cdt/dir1 /home/data/httpd/archive.eclipse.org/tools/cdt/
(In reply to Jonah Graham from comment #40)
> You can move directories just fine with the restricted shell:
I should have said *I* can. But I assume that I am not privileged in this way.

I also find it useful to browse with my file manager to sftp://jograham@build.eclipse.org/home/data/httpd/

baobab, on Linux, can even make pretty charts if Ed's tool (https://download.eclipse.org/oomph/archive/eclipse/) does not work.
Created attachment 280901: screenshot of baobab at work

It is fairly fast to use baobab, but I don't know about using it on very large directories. This 25GB (which is most of CDT's mirrored downloads) took less than a minute.
(In reply to Jonah Graham from comment #40)
> You can move directories just fine with the restricted shell:
Or you can use a 'shell' Jenkins job to run each of, e.g.:
  ssh genie.qvt-oml@projects-storage.eclipse.org cd /home/data/httpd/download.eclipse.org/mmt/qvto/updates/releases ; ant -f /shared/modeling/tools/promotion/manage-composite.xml remove -Dchild.repository=3.4.0
  ssh genie.qvt-oml@projects-storage.eclipse.org cd /home/data/httpd/download.eclipse.org/mmt/qvto/updates/releases ; mv 3.4.0 /home/data/httpd/archive.eclipse.org/mmt/qvto/updates/releases
  ssh genie.qvt-oml@projects-storage.eclipse.org cd /home/data/httpd/archive.eclipse.org/mmt/qvto/updates/releases ; ant -f /shared/modeling/tools/promotion/manage-composite.xml add -Dchild.repository=3.4.0
Another option in /shared/modeling/tools/promotion/manage-composite.xml would be good.
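The three steps above (remove the child from the download composite, move the directory, add the child to the archive composite) could be generated by a small helper. This is only a sketch: the function names are hypothetical, the two root paths and the ant targets are copied from the commands in this comment, and quoting each remote step and joining with `&&` instead of `;` is my own choice, made so that the whole step runs on the remote host and stops if the `cd` fails.

```python
import shlex
from pathlib import PurePosixPath
from typing import List

# Roots and script path as they appear in the commands quoted in this thread.
DOWNLOAD_ROOT = PurePosixPath("/home/data/httpd/download.eclipse.org")
ARCHIVE_ROOT = PurePosixPath("/home/data/httpd/archive.eclipse.org")
MANAGE_COMPOSITE = "/shared/modeling/tools/promotion/manage-composite.xml"


def archive_path(download_path: str) -> PurePosixPath:
    """Map a path under the download root to the same relative path on archive.

    Raises ValueError if the path is not under DOWNLOAD_ROOT.
    """
    relative = PurePosixPath(download_path).relative_to(DOWNLOAD_ROOT)
    return ARCHIVE_ROOT / relative


def archive_commands(login: str, releases_dir: str, child: str) -> List[str]:
    """Build the three ssh commands that archive one child repository.

    Nothing is executed here; the caller decides how to run the strings.
    """
    src = shlex.quote(str(PurePosixPath(releases_dir)))
    dst = shlex.quote(str(archive_path(releases_dir)))
    name = shlex.quote(child)
    ant = f"ant -f {MANAGE_COMPOSITE}"
    remote_steps = [
        f"cd {src} && {ant} remove -Dchild.repository={name}",
        f"cd {src} && mv {name} {dst}",
        f"cd {dst} && {ant} add -Dchild.repository={name}",
    ]
    # Quote each step as one argument so the remote shell receives it whole.
    return [f"ssh {login} {shlex.quote(step)}" for step in remote_steps]


if __name__ == "__main__":
    for command in archive_commands(
        "genie.qvt-oml@projects-storage.eclipse.org",
        "/home/data/httpd/download.eclipse.org/mmt/qvto/updates/releases",
        "3.4.0",
    ):
        print(command)
```

Printing the commands rather than running them keeps the sketch safe to try; a Jenkins shell step could then execute the reviewed output.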
(In reply to Jonah Graham from comment #41)
> I also find it useful to browse with my file manager to
> sftp://jograham@build.eclipse.org/home/data/httpd/
Provided projects have not put in a custom index.html, as was once a good idea, the EF's default 404 page is now quite good. You can happily browse in your favourite browser. (I steadily raise Bugzillas against projects that have an inferior index.html.)
Is it actually useful to have milestones and release candidates available for eternity on the download server? Looking at the report from Eike, it looks like every release version has between 4 and 7 subdirectories due to the RCs and Ms. To my mind those should be removed the moment the final release becomes available. Is there any document describing the process for retiring RCs and Ms?
I think the process is as easy as "rm -rf" ;-) Of course a project should publicly document the retention policies for their different build types (I, M, S, R) and for M/S builds it should probably be something like "are kept here until the next release".
... and if M/S builds are also offered in a composite repo, it would be great if that was primed with the latest release build until the first M/S build of the subsequent release shows up.
(In reply to Michael Keppler from comment #45)
> Is it actually useful to have milestones and release candidates available
> for eternity on the download server?
No, these should be removed at some point after the release. Of course, the composites that reference them need updating too... Overeager removal is not so great because it's possible and perhaps even likely that your downstream consumers are using your integration builds in their builds, so it would be nasty to potentially break their builds before 2019-12 itself is available. And as Eike mentions, leaving an integration composite empty also makes it useless.
What I do for EMF is ensure that the whole cleanup process is automated. So only the last 5 nightly builds are retained (and referenced by the composite). For the milestones, the process detects that a new version is being added and then removes all older versions. So this deletes the folders and cleans the composite as soon as I do a milestone build for the next release...
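A retention policy like the one described for EMF could be sketched as two selection functions. This is not EMF's actual automation: the function names, the assumption that nightly build IDs sort chronologically as strings, and the hypothetical `<version>-<qualifier>` folder naming for milestones are all mine. The functions only decide what to delete; removing the folders and rewriting the composite so it lists only the retained children would be separate steps.

```python
from typing import List, Tuple


def nightlies_to_delete(build_ids: List[str], keep: int = 5) -> List[str]:
    """Return the nightly builds to delete, keeping the `keep` newest.

    Assumes build IDs sort chronologically as plain strings
    (for example timestamp-based names).
    """
    ordered = sorted(build_ids)
    if keep <= 0:
        return ordered
    return ordered[:-keep]


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '2.10.0' into (2, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def milestones_to_delete(existing: List[str], new_build: str) -> List[str]:
    """Return milestone folders that belong to an older version than new_build.

    Folder names are assumed to look like '<version>-<qualifier>',
    e.g. '2.20.0-M1' (a hypothetical naming scheme).
    """
    new_version = version_key(new_build.split("-")[0])
    return [
        name for name in existing
        if version_key(name.split("-")[0]) < new_version
    ]


if __name__ == "__main__":
    nightlies = [f"N2019110{day}" for day in range(1, 8)]
    print(nightlies_to_delete(nightlies))
    print(milestones_to_delete(["2.19.0-M1", "2.19.0-M2", "2.20.0-M1"], "2.20.0-M2"))
```

Milestones of the same version as the new build are kept, which matches the idea that cleanup is triggered only when a milestone for the next release appears.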
> The report is definitely useful for identifying where you can save space,
> but the ability to manage the files on disk with the badly crippled set of
> tools available is definitely a problem.
Agreed. I filed bug 546528. I think we could address this rather easily with my initial proposal there.
> Provided projects have not put in a custom index.html, as was once a good
> idea, the EF's default 404 page is now quite good. You can happily browse in
> your favourite browser.
Thanks - agreed the 404 does an honest job of providing users with workarounds.
(In reply to Denis Roy from comment #0)
> 71G cdt
CDT is done (pending Bug 553887 for a small cleanup from webmaster). CDT is now <9GB on download, approx 5GB in mirrored directories. The tools.cdt archive is up to 100GB - however I plan to delete 70GB of it (old builds dating back to 2007).
(In reply to Denis Roy from comment #16)
> I've made a small change to our 404 handler, for file requests only.
This was very helpful in doing this cleanup - Thank you!
Created attachment 283412: Screenshot
The 404 handler on download and archive can now give you options on your project's downloads if you're logged into https://eclipse.org
From the Download server, files/folders can be moved to the same location on the Archive server. If the parent directory doesn't exist on Archive, the action will fail.
From the Archive server, files/folders can be deleted permanently.
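The "same location, parent must exist" rule can be illustrated with a local sketch. This is not the Foundation's handler and says nothing about how it is implemented; the function name and the two root arguments are hypothetical, and the extra check that refuses to overwrite an existing target is my own addition.

```python
import shutil
from pathlib import Path


def move_to_archive(download_root: Path, archive_root: Path, relative: str) -> Path:
    """Move download_root/relative to archive_root/relative.

    Fails, like the behaviour described above, when the parent
    directory does not already exist under the archive root.
    """
    source = download_root / relative
    target = archive_root / relative
    if not source.exists():
        raise FileNotFoundError(f"nothing to move at {source}")
    if not target.parent.is_dir():
        raise FileNotFoundError(f"parent directory missing on archive: {target.parent}")
    if target.exists():
        # Assumption: never overwrite something that is already archived.
        raise FileExistsError(f"already present on archive: {target}")
    shutil.move(str(source), str(target))
    return target
```

A project that wants to archive `tools/cdt/dir1` would therefore first need `tools/cdt` to exist on the archive side, either created by hand or by an earlier move of a sibling directory.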
Moved to GitLab Helpdesk: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/78