Monday, March 21, 2016

Sci-Hub, LibGen, and Total Information Awareness

"Good thing downloads NOT trackable!" was one twitter response to my post imagining a skirmish in the imminent scholarly publishing copyright war.

"You wish!" I responded.

Sooner or later, such illusions of privacy will fail spectacularly, and people will get hurt.

I had been in no hurry to see what the Sci-Hub furor was about. After writing frequently about piracy in the ebook industry, I figured that Sci-Hub would be just another copyright-flouting, adware-infested Russian website. When I finally took a look, I saw that Sci-Hub is a surprisingly sophisticated website that does a good job of facilitating evasion of research article paywalls. It styles itself as "the first pirate website in the world to provide mass and public access to tens of millions of research papers" and aspires to the righteous liberation of knowledge. David Rosenthal has written a rather comprehensive overview of the controversy surrounding it.

I also observed how easy it would be to track all the downloads being made via Sci-Hub. Today's internet is an environment where someone is tracking everything, and in the case of Sci-Hub, everything is being tracked.

My follow-up article was going to describe all the places that could track downloads via Sci-Hub, and how easy it would be to obtain a list of individuals who had downloaded or uploaded a Sci-Hub article – in violation of the laws currently governing copyright. But Sci-Hub is not doing things in the usual way of pirate websites. They're actually working to improve  user privacy. Around the time of my last post, they implemented HTTPS (SSLLabs grade: B) on their website. So instead of inducing users to announce their downloading activity to fellow WiFi users and every ISP on the planet, which is what Sci-Hub was doing in February, today Sci-Hub only registers download activity with Yandex Metrics, the Russian equivalent of Google Analytics.

As long as you trust a Russian internet company to NEVER monetize data about you by selling it to people with more money than good sense, you're not being betrayed by Sci-Hub. Unless the data SOMEHOW falls into the wrong hands.

There are more ways to track Sci-Hub downloads. Many of the downloads facilitated by Sci-Hub are fulfilled by a.k.a. "Library Genesis". LibGen is doing things in the usual way of pirate websites. The LibGen site does NOT support encryption, and it makes money by running advertising served by Google. As a result, Google gets informed of every LibGen download, and if a user has ever registered with Google, then Google knows exactly who they are, what they've downloaded and when they downloaded it. So to get a big list of downloaders, you'd just need to get Google to fork it over.

History suggests that copyright owners will eventually try to sue or otherwise monetize downloaders, and will be successful. In today's ad-network-created Total Information Awareness environment, it might even be a viable business model.

The best solution for a user wanting to download articles privately is to use the Tor Browser and Sci-Hub's onion address, http://scihub22266oqcxt.onion. Onion addresses provide encryption all the way to the destination, and since SciHub uses LibGen's onion address for linking, neither connection can be snooped by the network. Google and Yandex still get informed of all download activity, but the Tor browser hides the user's identity from them. ...Unless the user slips up and reveal their identity to another web site while using Tor.

Since .onion addresses don't use the DNS system (they won't work outside the Tor network), they won't be affected by legal attacks on the .io registrar. If you use the address in the Tor Browser, your downloads from can be monitored (and perhaps tampered with) by inquisitive exit nodes, so be sure to use the .onion address for privacy and security. I would also recommend using "medium-high" security mode (Onion > Privacy and Security Settings).

It might also be a good idea to use the Tor Browser if you want read research articles in private, even in journals you've paid for; medical journals seem to be the worst of the bunch with respect to privacy.

If publishers begin to take Sci-Hub countermeasures seriously (Library Loon has a good summary of the horribles to expect) there will be more things to worry about. PDFs can be loaded with privacy attacks in many ways, ranging from embedded security exploits to usage-monitoring links.

This isn't going to be fun for anyone.

Monday, March 7, 2016

Inside a 2016 Big Deal Negotiation...

Dramatis Personae: 
  • A Sales Representative from STM Corporation
  • An Acquisitions Librarian at Prestige University.

STM Corp Sales Rep: It's so nice to see you! We have some exciting news about your Big Deal renewal contract!

PU Acquisitions Librarian: Actually, I'm afraid we have some bad news for you. The Acquisitions Committee has had to make some cutbacks...

Sales Rep: I'm sorry to hear that. In fact, we also have some disturbing data to show you.

Librarian: We've been studying our usage data, and STM Corp's journals aren't seeing the usage we'd expected.

Sales Rep: Funny you should mention that, because STM Corp's Big Deal service has implemented a new "Total Information Awareness (TIA)" system that will answer all your usage questions. The TIA system monitors usage of our articles however they are acquired, and pinpoints the users, whoever and where ever they are. Our customers have been wanting this information for years, and now we can provide it.

Librarian: Now that's interesting. We've been discussing whether that sort of data could improve our services, but as librarians we need to respect the privacy of our users.

Sales Rep: Of course! And as publishers, we need to protect our services from unauthorized access and piracy.

Librarian: ... and our license agreements oblige us to respond to those concerns.

Sales Rep: I'm so glad you understand! But the TIA has exposed some disturbing information about journal usage on your campus.

Librarian: Yes, usage is dropping, That's what we wanted to discuss with you.

Sales Rep: Actually, total usage is increasing. It's just licensed usage that's dropping. Illicit usage is going through the roof!

Librarian: What do you mean?

Sales Rep: Have you heard of a website called Sci-Hub?

Librarian: [suppressing smile] Why yes...

Sales Rep: It seems that students and faculty on your campus have been accessing our articles via Sci-Hub quite a lot, and have been uploading...

Librarian: [starting to worry] We would never condone that! Using articles from Sci-Hub is likely copyright infringement in our jurisdiction. And uploading articles would be a violation of our campus policies!

Sales Rep: Exactly! Which is why we wanted you to see this data.

Librarian: [scanning several pages] But.. but this is a list of hundreds of our students and faculty, including some of our most prominent scientists!

Sales Rep: [grinning] ... each of them potentially facing hundreds of thousands of dollars of statutory damages for copyright infringement. Even career-ending litigation. It's such a blessing for you that we would never pursue legal actions that would hurt a good customer like Prestige U. Now about your renewal...

Librarian: Where did this list come from?

Sales Rep: As I said before, STM Corp's "Total Information Awareness" system monitors usage of our articles and pinpoints the users. You said before you had some bad news for us?

Librarian: Umm... we need to make some cutbacks.

Sales Rep: [smug] Well, then you'll be happy to know that we're limiting your big deal price to just a 19% increase over last time.

Librarian: [non-gendered expression of profound despair] ... and our Dean who's been using Sci-Hub?

Sales Rep: Sci-Hub? never heard of it.

Librarian: [resigned] OK, send us the invoice.

[Everything in this drama is fictitious except Sci-Hub and TIA. more next time.]