oreogay.blogg.se

The way back machine

When scoping out the size of Google+, one of ArchiveTeam's recent projects, it emerged that the typical size of a post was roughly 120 bytes, but total page weight was a minimum of 1 MB, for a payload to throw-weight ratio of roughly 0.01%. And that excludes external assets: images, JS, CSS, etc. This seems typical of much of the modern Web.

Promises of persistent URLs are just that: promises. Over a long period of time, no one can truly guarantee the persistence of the relationship between a URI and the resource it references. That's not something technology itself solves. The "original" URI still carries the most authority, as that's the domain on which the content was first published. Moreover, the author can explicitly point to the original URI as the "canonical" URI in the HTML head of the document.

Moreover, when you link to the WB machine, what do you link to? A specific archived version? Or the overview page with many different archived versions? Which of those versions is currently endorsed by the original publisher, and which are deprecated? How do you know this?

Part of ensuring persistence is the responsibility of the original publisher. That's where solutions such as URL resolving come into play. In the academic world, schemes such as DOI are trying to solve this problem. Protocols such as ORE or Memento further try to cater to this issue. It's a rabbit hole, really, when you start to think about this.
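To make the "what do you link to?" question concrete, here is a minimal Python sketch of the two link shapes involved: one pointing at a specific capture and one pointing at the overview of all captures. The URL patterns follow how Wayback Machine links commonly appear (a 14-digit UTC timestamp, or `*` for the overview); the function names are made up for illustration, and as far as I know the archive serves the capture nearest to the timestamp rather than requiring an exact match.

```python
from datetime import datetime, timezone

WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_url(original_url: str, captured_at: datetime) -> str:
    """Link to one specific capture. Expects a timezone-aware datetime."""
    # Wayback timestamps are written as YYYYMMDDhhmmss in UTC.
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"


def overview_url(original_url: str) -> str:
    """Link to the overview page that lists every capture of the URL."""
    return f"{WAYBACK_PREFIX}/*/{original_url}"


when = datetime(2019, 4, 2, 12, 0, 0, tzinfo=timezone.utc)
print(snapshot_url("http://example.com/", when))
print(overview_url("http://example.com/"))
```

Neither form says anything about which version the original publisher endorses; that still has to come from the publisher.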

The way back machine archive

So, this is the problem of the persistence of URLs: always referencing the original content, regardless of where it is hosted, in an authoritative way. It's an okay idea to link to WB, because (a) it's de facto assumed to be authoritative by the wider global community and (b) as an archive it provides a promise that its URLs will keep pointing to the archived content come what may.

I am thinking of data products here, but even if the 'product' is a paper, presentation, or report that involves human judgements, there should be a structured process to propagate changes.


If something goes wrong, there are sufficient diagnostics and tests to show that invariants are broken, or that the system can't tell how many fingers you are holding up, and in that case you can revert to "known good" inputs.

The way back machine download

My central use case is that I might 'scrape' content from various sources and have the process be "repeatable" in the sense that: (1) the system archives the original inputs and the process used to create refined data outputs; (2) if the inputs change, the system should normally be able to download updated versions of the inputs, apply the process and produce good outputs.

Some consumers will want the latest and greatest content. To please everyone (other than the owner) you'd need to look at the content across time, versions, and alternate world views.

I think a link check would probably need to treat redirects like broken links, given the prevalence of corporate sites where content is simply removed and redirected to the homepage, or geo-locked and redirected to the homepage in other locales (I'm looking at you and your international warranty, and access to tutorials, Fender). I also probably wouldn't run it on every build because it would take a while, but once a week or once a month would probably do it.
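Returning to the "repeatable" scraping use case, here is a rough Python sketch of what such a process might look like: every input is archived under its content hash, the process is applied, invariants are checked, and on failure the last known-good input is used instead. The class name, the in-memory archive, and the choice of SHA-256 are all assumptions made for illustration, not a description of any existing tool.

```python
import hashlib
from typing import Callable, Dict, List, Optional


class RepeatablePipeline:
    """Archives each input by content hash and remembers the last good one."""

    def __init__(self, process: Callable[[bytes], str],
                 invariants: List[Callable[[str], bool]]) -> None:
        self.process = process
        self.invariants = invariants
        self.archive: Dict[str, bytes] = {}     # content hash -> raw input
        self.known_good: Optional[str] = None   # hash of last input that passed

    def run(self, raw_input: bytes) -> str:
        digest = hashlib.sha256(raw_input).hexdigest()
        self.archive[digest] = raw_input        # archive the original input
        try:
            output = self.process(raw_input)    # apply the process
            ok = all(check(output) for check in self.invariants)
        except Exception:
            ok = False                          # a crash counts as a broken run
        if ok:
            self.known_good = digest
            return output
        if self.known_good is None:
            raise RuntimeError("invariants broken and no known-good input yet")
        # Revert: re-run the process on the last input that passed all checks.
        return self.process(self.archive[self.known_good])


pipeline = RepeatablePipeline(
    process=lambda raw: raw.decode("utf-8").strip().upper(),
    invariants=[lambda out: len(out) > 0],
)
first = pipeline.run(b"hello\n")   # passes the invariant
second = pipeline.run(b"   ")      # empty output, so the pipeline reverts
print(first, second)
```

A real version would persist the archive to disk and record which version of the process produced each output, so that both halves of the "inputs plus process" pair can be restored.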

The way back machine update

I'm not sure I'm a fan of this because it just turns WayBackMachine into another content silo. It's called the world wide web for a reason, and this isn't helping. I can see it for corporate sites where they change content, remove pages, and break links without a moment's consideration.

But for my personal site, for example, I'd much rather you link to me directly rather than to content in WayBackMachine. Apart from anything else, linking to WayBackMachine only drives traffic to WayBackMachine, not my site. Similarly, when I link to other content, I want to show its creators the same courtesy by linking directly to their content rather than WayBackMachine.

What I can see, and I don't know if it exists yet (a quick search suggests perhaps not), is some build task that will check all links and replace those that are broken with links to WayBackMachine, or (perhaps better) generate a report of broken links and allow me to update them manually, just in case a site or two happen to be down when my build runs.
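The report-generating variant of that build task could look something like the Python sketch below. It is only an illustration: `check_links`, `BrokenLink`, and the `Fetcher` signature are invented names, and the fetcher is passed in so the check can run against any HTTP client (or a fake one). It assumes the fetcher does not follow redirects, so a redirect can be reported as a suspect link rather than silently accepted.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

# A fetcher returns (HTTP status, redirect target or None) and must NOT
# follow redirects itself. It may raise OSError if the site is unreachable.
Fetcher = Callable[[str], Tuple[int, Optional[str]]]


@dataclass
class BrokenLink:
    url: str
    reason: str
    suggestion: str  # Wayback overview link, to be reviewed by hand


def check_links(urls: Iterable[str], fetch: Fetcher) -> List[BrokenLink]:
    """Return a report of suspect links; nothing is rewritten automatically."""
    report: List[BrokenLink] = []
    for url in urls:
        try:
            status, location = fetch(url)
        except OSError as exc:
            reason = f"unreachable: {exc}"
        else:
            if 300 <= status < 400:
                reason = f"redirects to {location}"
            elif status >= 400:
                reason = f"HTTP {status}"
            else:
                continue  # link looks fine
        report.append(
            BrokenLink(url, reason, f"https://web.archive.org/web/*/{url}")
        )
    return report


def fake_fetch(url: str) -> Tuple[int, Optional[str]]:
    # Stand-in for a real HTTP client, so the example runs offline.
    table = {
        "https://ok.example/": (200, None),
        "https://moved.example/page": (301, "https://moved.example/"),
    }
    if url not in table:
        raise OSError("no route to host")
    return table[url]


for item in check_links(
    ["https://ok.example/", "https://moved.example/page", "https://down.example/"],
    fake_fetch,
):
    print(item.url, "|", item.reason, "|", item.suggestion)
```

Flagging every redirect is deliberately blunt: it will also catch harmless moves such as a switch from http to https, which is one more reason to produce a report for manual review instead of replacing links automatically.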






