HN is flooded by "archive.ph" campaign which injects Russian (and other) trackin...

cato_the_elder · on July 1, 2022

It does do archiving, and it does it very well. It's fast, provides short URLs, blocks ads in the archived pages, tries to bypass paywalls and login walls, and resists censorship. That's why people use it.

And it's funded by ads, so yes, it does have tracking scripts, just like a good chunk of the internet.

If you are privacy-conscious (or just don't like ads), then you can use uBlock Origin to get rid of them.

rootusrootus · on July 1, 2022

So it blocks ads making the content producer money, then puts in some ads of its own along with trackers. This is good because?

cato_the_elder · on July 1, 2022

> This is good because?

Two reasons. One is that ads probably aren't worth archiving, and removing the bloat makes the service run smoother (and it's easier to block the archive.today ads than it is to block all those popups, random redirects, etc.).

The other is that "content producers" aren't necessarily good guys. Depriving all those clickbait manufacturers of clicks when you are hate-reading them is probably a good thing.

hericium · on July 1, 2022

Why would I want to go to a JS-infested copy of a site whose original version does not _require_ JS? Advice to do so and use uBlock to block something that wouldn't be run on original site is ridiculous.

If you guys do archiving, stop spamming HN with links that are fresh and don't need archiving, just to run your tracking on victims' devices for profit. How is it not just stealing content from other sites?

> It's fast, provides short URLs, blocks ads

> And it's funded by ads, so yes, it does have tracking scripts

So which one is it...?

cato_the_elder · on July 1, 2022

> If you guys do archiving

It's not my website. Also, the people who run it (almost certainly) owe us nothing.

> So which one is it...?

Try fetching a URL on both Internet Archive [1] and archive.today, and you'll know what I mean.

[1]: https://web.archive.org/

freeflight · on July 1, 2022

> If you guys do archiving, stop spamming HN with links that are fresh and don't need archiving

Just because a link is fresh does not mean that it does not need archiving.

If that link is a news article that is then taken offline, or changed, it will trivially be forgotten like it never existed, and you will have no evidence to prove that ever happened.

And it very much does happen: the www of the year 2022 has gotten scarily good at forgetting and burying stuff through SEO and delisting.

jjulius · on July 1, 2022

>Advice to do so and use uBlock to block something that wouldn't be run on original site is ridiculous.

You generally browse the internet without the use of uBlock or something similar?

oefrha · on July 1, 2022

I’ve posted archive.is links to HN because TFA was paywalled or hugged. Glad to know I joined up with a “campaign” to flood HN with Russian tracking. /s

Guess why people don’t post web.archive.org links more? Because it’s frigging slooooow and frequently 500s itself. God forbid people flock to a more reliable service.

cronix · on July 1, 2022

Neither. It bypasses paywalls and opens the discussion up to everyone, which is the main reason why it's used on HN, or the discussion on each article would only be between the (relatively few) subscribers of that particular site and those who haven't used up the "3 free views for this month" quota.

hericium · on July 1, 2022

Horseshit.

Fire up search at the bottom of the page and tell me what percent of "archived" sites don't require JS at all (so don't have any "pay us or fu" overlays popping up). I can tell you right now that 0% of those tracking-infested archive.ph copies of original sites work without JS on Firefox on Linux and certain other configurations.

As in: without JS, you can't see the stolen content on archive.ph while you can see it w/o JS (in most cases) under original addresses.

Most recent comment-link from 16 minutes ago:

    > curl -s https://archive.ph/EHxch | grep \\.js

    <script type="text/javascript" src="//go.ezoic.net/ezoic/ezoic.js"></script>

    <script src="https://www.google.com/recaptcha/api.js?onload=onloadCallback&render=explicit" async defer>    </script>
    //     .then(response => response.json());

Should you run those, mail.ru's and other tracking scripts await you on the copy of the original page.

This is your "discussion open to anyone"?
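
For anyone who wants to repeat the check without eyeballing grep output, here is a minimal sketch in Python that lists the hosts a saved page loads external scripts from. It assumes the HTML has already been fetched (for example with the curl command above); the names `ScriptSrcCollector`, `script_hosts` and `SAMPLE` are made up for illustration, and the sample markup is just the two tags from the output above plus a hypothetical inline script.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:  # inline scripts have no src and are skipped
                self.sources.append(src)


def script_hosts(html):
    """Return the sorted, de-duplicated hostnames that external scripts load from."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    hosts = set()
    for src in collector.sources:
        # Protocol-relative URLs ("//host/path") still parse with a netloc.
        host = urlparse(src).netloc
        if host:  # same-origin relative paths have no host
            hosts.add(host)
    return sorted(hosts)


# Assumption: sample markup copied from the grep output in this comment,
# plus one hypothetical inline script to show that it is ignored.
SAMPLE = """
<script type="text/javascript" src="//go.ezoic.net/ezoic/ezoic.js"></script>
<script src="https://www.google.com/recaptcha/api.js?onload=onloadCallback&render=explicit" async defer></script>
<script>fetch("/x").then(response => response.json());</script>
"""

if __name__ == "__main__":
    for host in script_hosts(SAMPLE):
        print(host)
```

This only sees scripts referenced in the static HTML; anything injected later by those scripts would not show up.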

LordDragonfang · on July 1, 2022

Archive links get posted in the comments to bypass paywalls and make sure the OP is available if the site gets hugged to death. It's not some conspiracy. Also, I usually see it used for sites that are absolutely bloated with javascript, so I don't know what you're talking about there.

hericium · on July 1, 2022

archive.org is an archive site

archive.ph is a tracking-infested host of stolen content which replaces the original ads for profit

jjulius · on July 1, 2022

>... which injects Russian (and other) tracking scripts to copies of crawled 3rd party site...

Source?

ta988 · on July 1, 2022

Simply go to archive.ph and look at a website: you get google.com (on the front page only), buysellads.net (based in the US, from a quick search), and mail.ru (this one Russian, obviously). At least that's what I see after a couple of tries.

hericium · on July 1, 2022

Apologies but at the moment I'm on Firefox on Linux and can't see anything past a page pretending to look like Cloudflare's but with Google's recaptcha.