Why, when we're reading an article about leaked information, why don't we get to see o...

bobthepanda · on Nov 15, 2020

Generally speaking, uploading the original unaltered document is a great way of accidentally outing your source, since often printed materials and even digital copies have some sort of uniquely identifying marks.

izacus · on Nov 15, 2020

It's also an amazing way to provide context and sabotage the narrative agenda the news article is pushing. We can't have that kind of nuance in modern media.

bobthepanda · on Nov 15, 2020

You say this as if it was ever the case in pre-modern media either.

skinkestek · on Nov 15, 2020

Extremely valid point, I'd consider this close to table stakes.

Also, in some more "interesting" settings there are exploit-loaded documents to be aware of. Here's an excerpt from a leak I've been following lately:

> If you really want to open a Word document from Psy-Group, please go ahead, knock yourself out. Below are the links to the Word document as well as the email. Just remember that Mr. [...] mistake might have been opening a Word file from Psy-Group in the first place…

I took the liberty of removing the name, as I guess that particular guy is probably suffering enough at the moment.

intricatedetail · on Nov 15, 2020

Documents may also have unique wording for each recipient, so whoever created the document could easily identify the source from it.

piaste · on Nov 15, 2020

The full English prose text, minus headers or footers, would still provide almost all the context required to inform the reader without the fingerprinting.

ohgodplsno · on Nov 15, 2020

What if there are ten variants, all with slightly modified wording, making it immediately clear who leaked it?

bobthepanda · on Nov 15, 2020

You don't even need that. Documents have been identified before because some versions replace characters with nearly identical-looking but different Unicode characters (say, the various variations of spaces, or the semicolon with the Greek question mark).

https://en.wikipedia.org/wiki/Whitespace_character

https://en.wikipedia.org/wiki/Question_mark#Greek_question_m...
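To make the lookalike-character point concrete, here is a minimal Python sketch of how someone might scan text for such substitutions and replace them with plain equivalents. The `LOOKALIKES` table and the function names are hypothetical and deliberately non-exhaustive; a real sanitizer would need a far larger table, and passing this check says nothing about fingerprints hidden in the wording itself.

```python
import unicodedata

# Hypothetical, non-exhaustive map of lookalike characters to plain ones.
LOOKALIKES = {
    "\u00a0": " ",  # NO-BREAK SPACE
    "\u2009": " ",  # THIN SPACE
    "\u202f": " ",  # NARROW NO-BREAK SPACE
    "\u037e": ";",  # GREEK QUESTION MARK
}


def find_lookalikes(text):
    """Return (index, Unicode name) for every known lookalike in text."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in LOOKALIKES
    ]


def sanitize(text):
    """Replace every known lookalike with its plain equivalent."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in text)


# Sample text with a thin space and a Greek question mark slipped in.
sample = "Revenue fell\u2009sharply\u037e see appendix."
print(find_lookalikes(sample))
print(sanitize(sample))
```

The scan reports the position and official Unicode name of each suspicious character, which is roughly what "noticeable to a machine" means here: the two strings render almost identically but compare as different.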

piaste · on Nov 15, 2020

Yes, I've seen that episode of Game of Thrones too :)

First, consider the requirements to set such a trap. The authors of the document need to be actively concerned about a leaker, and to be OK with the document itself being leaked as long as they catch the culprit - at the same time, they need the document to be juicy enough that it will be leaked. They need to share the document in such a way that no two of the suspects will be able to compare notes, otherwise the jig is up. So no putting the file on a common internal resource (unless the server can stealthily serve different versions based on the user's login data); no attachments, else a reply all / forward would reveal the trap; no collaboration; no physical office where two suspects may see each other's copy.

Is that still possible? Yes. But a _lot_ of times it won't be possible, and the would-be leaker will know it's not possible. It's much more likely, and makes much more sense, for critical documents to be shared in such a way that the users _know_ they are fingerprinted, and won't leak them. IIRC, major Hollywood studios do that with their film scripts.

Second, what if the _key phrases_ are slightly altered in each version? Or hell, if your bosses want to finger you so bad, what if they changed a small factual detail in each version? Then even the journalist quotes would reveal the leaker.

bobthepanda · on Nov 15, 2020

The not-so-great news is that common characters like spaces and semicolons have various similar-looking characters defined in Unicode, which would not be very noticeable to a human but would be noticeable to a machine.

So you just need to do random substitutions that uniquely identify the document and you'll have a fingerprint. It wouldn't be very challenging to do, and the record of which copy went to whom wouldn't be very challenging to maintain.

You also don't need to uniquely identify it to a person; you just need to narrow the search space and then apply other techniques that would narrow it down. If it's a version of a document that leaked through an email chain then you've just limited the search space to the recipients, which is still plenty useful.
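As a rough illustration of the substitution idea above, the Python sketch below hides a small numeric recipient ID in the first few spaces of a text by swapping some of them for thin spaces. Note this is a deterministic bit encoding rather than the random substitutions described, chosen because it needs no lookup table; `embed_id`, `extract_id`, the two-character `SPACE_VARIANTS` alphabet, and the 8-bit ID width are all assumptions made for the example.

```python
# Bit 0 -> ordinary space, bit 1 -> THIN SPACE (an arbitrary choice).
SPACE_VARIANTS = [" ", "\u2009"]


def embed_id(text, recipient_id, bits=8):
    """Encode recipient_id into the first `bits` spaces of text."""
    out, k = [], 0
    for ch in text:
        if ch == " " and k < bits:
            bit = (recipient_id >> k) & 1
            out.append(SPACE_VARIANTS[bit])
            k += 1
        else:
            out.append(ch)
    if k < bits:
        raise ValueError("text has too few spaces to hold the ID")
    return "".join(out)


def extract_id(text, bits=8):
    """Recover the ID from the first `bits` space-like characters."""
    recipient_id, k = 0, 0
    for ch in text:
        if ch in SPACE_VARIANTS and k < bits:
            recipient_id |= SPACE_VARIANTS.index(ch) << k
            k += 1
    return recipient_id


memo = "The plan is to ship the new product in the third quarter of next year."
marked = embed_id(memo, recipient_id=37)
print(marked == memo)      # the copies differ even though they look alike
print(extract_id(marked))  # the ID read back from the marked copy
```

Eight spaces are enough to distinguish 256 recipients, which is why even a short leaked excerpt can narrow the search space. Retyping the text or normalizing all whitespace would strip this particular mark.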

darkwater · on Nov 15, 2020

Then inevitably somebody would complain that the original document wording might have been altered.

piaste · on Nov 15, 2020

As opposed to a PDF scan which can definitely not be forged at all? ;)

Nothing less than a digital signature can prove the integrity of a digital document, and even that is worthless unless the corresponding public key has been publicly made available via a separate and trusted channel, which is unlikely.

bobthepanda · on Nov 15, 2020

Anything that can be used to prove a document's integrity can generally also be used to identify where it came from and how it was produced, which is why we generally don't see any effort to do this at all.

In fact, plenty of things that can't prove a document's integrity can also be used to identify its source, which is why this isn't done; you can't be sure that you've sanitized the document enough to protect the leaker.

foepys · on Nov 15, 2020

The NSA was once handed a copy of a leaked document by a journalist to confirm its authenticity, and they tracked the source down within days.

https://www.theatlantic.com/technology/archive/2017/06/the-m...

mannykannot · on Nov 15, 2020

This seems moot, as Google is denying neither the existence of this document, nor its tenor.

crocodiletears · on Nov 15, 2020

Conclusions are dangerous, and direct exposure to unregulated information may result in the wrong (potentially dangerous) ones.

You need the disciplined mind of a journalist to filter out the most dangerous elements of context and content so your mind isn't dazzled and deranged by the intricacies of Google's corporate strategies.

But, in all seriousness, there are a number of reasons at play here.

Firstly, they want to increase the friction required for you to leave the site. News publications are all but interchangeable anymore, and if they're reporting a story second-hand, then they don't want you to realize you could get more information by swimming upstream.

Secondly, they want to milk it. There's very little information in this article. If there's anything else worth knowing in the document, they probably want to atomize it into multiple articles to stimulate pageviews.

Thirdly, if you have the document, and the document is short, the journalist likely contributes no value to the system. Short-term strategy documents like the one described in the article are often concise and easily consumed by members of the public. The document likely outlines the context in which it was written, and then contains one or two pages of a proposed strategy (if that, going by the sparseness of the quotes). Almost anybody curious enough to actually read the article beyond the headline would likely be better served by the document itself.

Great journalism has been done over leaked documents. The Snowden leaks involved reams of content produced by intelligence concerning a plethora of subjects, for a diversity of purposes. Weaving that information together into a coherent story that exposed the broad strokes of what the NSA was up to was great journalism.

Likewise, great journalism can be done by distilling and contextualizing singular documents that are long, nuanced, and require domain knowledge.

This likely isn't one of those circumstances. If you had the document, you wouldn't need the writer, who may not even have seen the original document (the story dates at least back to the 29th [0]).

Fourthly, if you had the document, you would be able to check the author's work. This has been an underconsidered thorn in journalists' sides for some time. Different publications have different content production policies. A writer may be expected to produce three to four pieces a day. In this context, even when they aren't trying to manipulate information to push a narrative or belief on you (though they often are), the writer may only have skimmed the original document before putting out their piece. As a result, they may have been wrong in some essential or technical manner that astute readers could pick up on.

This would diminish the writer's reputation, as well as the publication's and the journalistic field's.

[0] https://www.gizchina.com/2020/10/29/google-launches-a-new-st...

bobthepanda · on Nov 15, 2020

5) Generally speaking, uploading unaltered documents is a great way to burn your source, since both printed and digital copies of documents can be easily fingerprinted.

Massive one-time dumps like Chelsea Manning's or Edward Snowden's are not the norm when it comes to leaks; leakers often stay in their positions for quite some time, and journalists have every incentive to protect their ability to get inside information.

StreamBright · on Nov 15, 2020

That’s very 90s. Today you get the interpretation from either highly partisan media or big tech.

AmericanChopper · on Nov 15, 2020

If you pay enough attention, I think you'd find that primary sources have nearly completely disappeared from the news media. They'd much rather tell you what to think than risk having you make your own mind up.