CoH Website - Reference Archive

Felderburg · September 03, 2012, 05:42:16 PM

Given that there is a possibility the CoH website may not exist in the future, it seems that there should be an effort to archive the parts of it that relate to lore - particularly those referenced in the Wiki. I say this because I was looking through the wiki, part of a return to CoH and CoH related stuff, and noticed several references that direct a reader to the CoH website. It seems pretty clear that the wiki itself will go on for a while, and I'd hate for it to have links or references to things that don't exist, or that should be archived somewhere.

I'm not sure how that archive would work, but at the very least it seems we could just copy / paste pages like the Coralax description page from the website: http://na.cityofheroes.com/en/game_info/know_your_adversary/coralax_hybrids.php

Aggelakis · September 03, 2012, 09:36:17 PM

Can anyone write up a script/program/something to request the Wayback Machine to visit pages on cityofheroes.com? There is a TON of content pages there. The Wayback Machine archives stuff based partly off requests, so if we request a whole bunch of pages of the main site, it will be preserved there and *we* don't have to host it :)

http://archive.org/web/web.php

eabrace · September 04, 2012, 12:25:18 AM

Quote from: Aggelakis on September 03, 2012, 09:36:17 PM
Can anyone write up a script/program/something to request the Wayback Machine to visit pages on cityofheroes.com? There is a TON of content pages there. The Wayback Machine archives stuff based partly off requests, so if we request a whole bunch of pages of the main site, it will be preserved there and *we* don't have to host it :)

That's a really good idea.

Sekoia · September 04, 2012, 02:31:59 PM

From http://archive.org/about/faqs.php#The_Wayback_Machine -- blue is my highlighting:

Quote:
How can I get my site included in the Wayback Machine?

Much of our archived web data comes from our own crawls or from Alexa Internet's crawls. Neither organization has a "crawl my site now!" submission process. Internet Archive's crawls tend to find sites that are well linked from other sites. The best way to ensure that we find your web site is to make sure it is included in online directories and that similar/related sites link to you.

Alexa Internet uses its own methods to discover sites to crawl. It may be helpful to install the free Alexa toolbar and visit the site you want crawled to make sure they know about it.

Regardless of who is crawling the site, you should ensure that your site's 'robots.txt' rules and in-page META robots directives do not tell crawlers to avoid your site.

When a site is crawled, there is usually at least a 6-month lag, and sometimes as much as a 24-month lag, between the date that web pages are crawled and when they appear in the Wayback Machine.

In some cases, crawled content from certain projects may appear in a much shorter timeframe — as little as a few weeks from when it was crawled. Older material for the same pages and sites may still appear separately, months later.

So the Alexa toolbar may help. However, we won't know whether they archived it until the content shows up, which will take at least 6 months, and by then the originals will likely be gone.

So I wouldn't count on the Internet Archive since we won't know if it worked until it's too late.
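
For the robots.txt point in the quoted FAQ, here is a minimal sketch of how one could check whether a given set of rules would turn a crawler away, using Python's standard urllib.robotparser. The robots.txt content below is invented for illustration and is not the real cityofheroes.com file, and the "ia_archiver" user-agent name is an assumption about how the Alexa / Internet Archive crawler identifies itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, made up for this example only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ia_archiver
Disallow: /
"""

# Page mentioned earlier in the thread.
PAGE = "http://na.cityofheroes.com/en/game_info/know_your_adversary/coralax_hybrids.php"


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ia_archiver" is an assumed crawler name; "SomeOtherBot" is a placeholder.
    for agent in ("ia_archiver", "SomeOtherBot"):
        print(agent, can_crawl(SAMPLE_ROBOTS_TXT, agent, PAGE))
```

With rules like the sample above, the archive crawler would be blocked from the whole site while other bots could still reach the page. To check the real site, the same parser can load a live file with set_url() and read().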

Mister Bison · September 04, 2012, 03:51:10 PM

Can't we just HTTrack it?

http://www.httrack.com/

It works pretty well. You can even set filters for which pages to dump or skip, choose what to do with dangling links, make it stop at a specific depth, and so on...

Wouldn't work with the forum, but definitely should work with the official site.
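
As a rough illustration of the filter and depth options mentioned above, here is a small sketch that assembles an HTTrack command line from Python. It assumes the httrack command-line tool is installed and relies on its -O (output path) and -rN (mirror depth) options and its +pattern / -pattern filters. The output folder, depth, and filter patterns are hypothetical values picked for the example, not a tested recipe for the official site.

```python
import shlex


def build_httrack_command(start_url, output_dir, max_depth, filters):
    """Assemble an httrack invocation as an argument list.

    -O sets the output directory and -rN limits the link depth.
    Filters are "+pattern" (include) or "-pattern" (exclude) strings.
    """
    command = ["httrack", start_url, "-O", output_dir, f"-r{max_depth}"]
    command.extend(filters)
    return command


if __name__ == "__main__":
    # All values below are illustrative assumptions.
    cmd = build_httrack_command(
        start_url="http://na.cityofheroes.com/en/",
        output_dir="coh_mirror",                 # hypothetical local folder
        max_depth=4,                             # stop after 4 levels of links
        filters=["+na.cityofheroes.com/en/*",    # stay inside the English pages
                 "-*.zip"],                      # skip large downloads
    )
    # Print the command for review rather than running it automatically.
    print(shlex.join(cmd))
```

Printing the command instead of running it makes it easy to look over the filters first. Once it looks right, it can be pasted into a shell or passed to subprocess.run.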

Felderburg · September 05, 2012, 09:17:22 PM

Quote from: Mister Bison on September 04, 2012, 03:51:10 PM
Can't we just HTTrack it?

http://www.httrack.com/

It works pretty well. You can even set filters for which pages to dump or skip, choose what to do with dangling links, make it stop at a specific depth, and so on...

Wouldn't work with the forum, but definitely should work with the official site.

That looks like it would be pretty useful for an individual person, but can you use it to make websites available to anyone? It says you can copy and send projects, but do you need HTTrack to view an HTTrack-downloaded website? I can't seem to find that in the FAQs. Either way, it'd definitely be good for a few people to have the website downloaded using this tool.

Sekoia · September 05, 2012, 09:46:37 PM

Quote from: Felderburg on September 05, 2012, 09:17:22 PM
That looks like it would be pretty useful for an individual person, but can you use it to make websites available to anyone? It says you can copy and send projects, but do you need HTTrack to view an HTTrack-downloaded website? I can't seem to find that in the FAQs. Either way, it'd definitely be good for a few people to have the website downloaded using this tool.

It generates HTML output, so you can view it in a web browser normally (no need to have HTTrack to view it).

Making a whole website's mirror publicly available is a pretty big copyright violation. If anyone were to make such a mirror, I'd recommend against posting it up on the web for now as it might just serve to antagonize NCsoft (which we don't want to do).

However, there's nothing stopping anyone from making a local copy and sitting on it, just in case. Or from zipping it up and sharing it non-publicly. :)
