Wednesday, August 13, 2014

Libraries are Giving Away the User-Privacy Store

AddThis makes some really nice widgets. Here are some for sharing this blogpost:

ShareThis is another company that does pretty much the same thing. Their share buttons are down at the end of the post. AddThis is bigger. It provides "behavioral, contextual, and interest based data that spans across hundreds of content categories and topics, reaching 1.7 billion uniques a month."

The widgets help users share your content. At the same time, AddThis and ShareThis widgets help a publisher figure out who is sharing what, while distributing the content into other websites. To do this, they track users to see what sort of websites they like. They can also work with advertising networks to improve the relevancy of ads shown to users. The user tracking works by setting user cookies or "web beacons" that enable the tracking of users across websites. In the case of AddThis, users are also tracked using "Canvas Fingerprinting", a technique that works even when a user blocks cookie tracking. ProPublica recently wrote about this technology, calling it the "Online Tracking Device that's Nearly Impossible to Block".
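
To make the cookie mechanism concrete, here's a toy simulation in Python. It is only a sketch of the general idea, not anything taken from AddThis or ShareThis: the class names, the `widgets.example` domain, and the page addresses are all made up for illustration, and real trackers do this over HTTP rather than with in-memory objects. The point is simply that one identifier, stored once in the browser, lets the widget's server line up visits to unrelated sites.

```python
import uuid
from collections import defaultdict


class WidgetTracker:
    """Toy stand-in for a third-party widget server (hypothetical, not vendor code)."""

    def __init__(self):
        # cookie id -> list of pages on which the widget was loaded
        self.history = defaultdict(list)

    def serve_widget(self, cookie_id, embedding_page):
        """Handle one widget request; return the cookie id the browser should store."""
        if cookie_id is None:
            cookie_id = uuid.uuid4().hex  # first sighting: issue a new identifier
        self.history[cookie_id].append(embedding_page)
        return cookie_id


class Browser:
    """Keeps one cookie per third-party domain, as a browser that accepts third-party cookies would."""

    def __init__(self):
        self.cookies = {}

    def visit(self, page, tracker, tracker_domain="widgets.example"):
        # Loading the page also loads the embedded widget, which sends the
        # tracker its cookie (if any) plus the address of the embedding page.
        cookie = self.cookies.get(tracker_domain)
        self.cookies[tracker_domain] = tracker.serve_widget(cookie, page)


tracker = WidgetTracker()
patron = Browser()
patron.visit("catalog.library.example/record/123", tracker)
patron.visit("news.example/politics", tracker)

# The tracker now holds both visits under a single identifier.
patron_id = patron.cookies["widgets.example"]
print(tracker.history[patron_id])
```

In this sketch the library catalog and the news site never talk to each other; the only thing they share is the embedded widget, and that is enough for the widget's server to connect the two visits.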

Here's what the ShareThis Privacy Policy says:
In some cases, if you have chosen to make PII (like your name) publicly available through third party sites like social networks, we may seek your consent to use that PII in connection with services we offer in conjunction with our partners. We will not disclose your PII without your consent.
We and our publisher, advertiser and ad network partners also use this data for other related purposes (for example, to do research regarding the results of our online advertising campaigns or to better understand the interests or activities of users of the ShareThis Services).
Similarly, AddThis says:
When an End User downloads a page that contains an AddThis Button, we may deploy a cookie on our own behalf or on behalf of our data partners, to record information about how an End User uses the web, such as the web search that landed the End User on a particular page or categories of the End User's interests. We may use the Data to target advertising toward the End User or authorize others to do the same. 
Many websites are using Google Analytics to measure usage; they let Google track their users in the same way (the website I run uses Google Analytics). However, the Analytics terms of service seem not to allow Google to share the collected data as freely as AddThis and ShareThis do.

Both AddThis and ShareThis assert in the legal terms that they mustn't collect usage information from children, so if children use your site, you're not supposed to use these services. Google Analytics does not have this restriction, which presumably means they can't use their data to advertise to children.

Together with "Cookie Syncing" and "Evercookies", the cumulative effect of all this tracking is that website users can be pretty comprehensively tracked, and if need be, identified, whether they like it or not. In exchange for deploying the trackers, websites get access to the valuable pool of information about their users.

Matt Mullenweg (of WordPress) has an interesting perspective:
services like AddThis and ShareThis will always spy on and tag your audience when you use their widgets, and you should avoid them if you care about that sort of thing.
This puts libraries in somewhat of a quandary. Traditionally, libraries have been havens of privacy for their users. Librarians have famously gone to jail for their refusal to turn over circulation records to law enforcement. But it seems that libraries are not doing much to protect their users from the sort of information gathering done by AddThis, ShareThis, and Google. For example, New York Public Library uses Google Analytics and ShareThis. OCLC and Worldcat use AddThis. My own public library catalog (hosted by BCCLS) sets cookies for AddThis. I suppose they don't consider that their websites could be directed at children. Even the American Library Association's webpage extolling the importance of privacy in libraries makes use of Google Analytics. (Ironically, the link to a website privacy policy is broken on that page!)

It's true that these trackers are very common- even has employed AddThis buttons. But it seems to me that if libraries still think that user privacy is valuable in this age of social media, they need to rethink their use of web user tracking companies. What disturbs me most is that there hasn't been much public discussion about the future role of privacy in library websites, even as it's rapidly being lost.

Update (Aug 15): AddThis says they're not using canvas fingerprinting and have terminated their test of it. I don't think this really changes the cost/benefit analysis for libraries. It remains true that libraries that use AddThis or ShareThis are allowing a third party to track their patrons' catalog browsing (not just their social sharing), under terms which permit the companies to use the data for advertising purposes. Use of Google Analytics allows Google to do the same tracking, but does not appear to permit use for advertising. Either way, libraries need to make informed choices and communicate those choices to their users. Same for Facebook "Like" buttons. Commercial sites, obviously, have different priorities and responsibilities.

Update (Aug 19): There are a number of free open-source solutions available both for social sharing and for analytics. There's a very useful discussion of these issues on Hacker News.

Thursday, July 31, 2014

Don't Bother Reading "Acts of the Apostles"

Read Biodigital instead.

After reviewing John Sundman's Biodigital, I promised to report back after reading Acts of the Apostles, which shares about 60% of its text.

It's very unusual for a lay reader to have access to two versions of a book in this way. Biodigital is partly the result of the sort of editorial work that goes on behind the scenes of publishing, and to read Acts is to become aware of sausage making that is usually invisible.

The bottom line is that Biodigital is a much better book. You won't miss anything if you skip Acts. While there's a lot of tightening here and there, there are two big changes which lead me to urge you to set aside Acts.

The first is Gordon Biersch, which has been removed from the book. Gordon Biersch opened in 1988 on Emerson Street in Palo Alto, California. I remember when it opened, it was a revelation. The beer was pretty good, and the food was designed to go with the beer. Today, this sort of place has a name: "gastro-pub", but back in 1988, that word didn't exist, at least in the vocabulary of grad students like me. Yuppies flocked to the place and by the time Sundman was writing Acts, it signified everything good and bad about Silicon Valley. But since then, Gordon Biersch has gone all Vegas. No really, the founders were bought out by money from Las Vegas. Today, there's a Gordon Biersch gastropub in 34 places where restaurants are allowed to brew beer, including 4 in Taiwan. It's owned by the same company that owns "Rock Bottom" brewpubs.

In Biodigital, the events that occurred at Gordon Biersch have been moved a mile or so southeast to Antonio's Nut House. Antonio's is still around. Like everything else in the area, it's changed, but it's not like Silicon Valley changed into Las Vegas. It's like Sun Microsystems changed into Google. I went and had a beer there when I was visiting earlier this month. I took pictures. Google maps has a walk-through view.

The other big change is the book's depiction of Bartlett Aubrey. Bartlett, the estranged wife of hero Nick Aubrey, is supposed to be a brilliant molecular biologist, but in Acts, she mostly has big breasts. It's not a realistic portrait at all, more of an adolescent fantasy character. In Biodigital, references to Bartlett's breasts are cut by 50%, and I swear that's not why I thought the character was a lot smarter than in Acts.

So, support your local author. Or your local beer bar. Better yet, do both at the same time.

Thursday, July 10, 2014

"Subtleism" is a Useful Word

Allison Kaptur has written about the last of Hacker School's lightweight social rules: "No Subtle -isms":
Our last social rule, "No subtle -isms," bans subtle racism, sexism, homophobia, transphobia, and other kinds of bias. Like the first three rules, it's targeting subtle, accidental, mildly hurtful behavior. This rule isn't targeting slurs, harassment, or threats. These kinds of severe violations would have consequences, up to and including expelling someone from Hacker School. 
Breaking the fourth social rule, like breaking any other social rule, is an accident and a small thing. In theory, someone should be able to say "Hey, that was subtly sexist," get the response "Oops, sorry!" and move on just as easily as if they'd well-actually'ed. In practice, people are less likely to point out when this rule is broken, and more likely to be defensive if they were the rule-breaker. We'd like to change this.
When this was explained to me by Hacker School Co-Founder Sonali Sridhar, I thought it was brilliant, but I heard "subtle -ism" as a single word, "subtleism". "Subtleism" conveyed to me the concept that something could be harmless by itself, but multiplied by a thousand could be oppressive. So for example, using "you guys" for the second person plural when both men and women are included, is never meant to be sexist, and is rarely taken the wrong way. But an ocean of hundreds or even thousands of tiny, insignificant locutions like "you guys" can drown even a strong swimmer.

The reason subtleism is a useful word is that it can convey forgiveness in a context of working together to create a culture that is supportive of a diverse team. Reminding someone of a subtleism doesn't need to be a "shaming ritual"; after all, everyone uses subtleisms all the time. Compare the word "micro-aggression", which is used as an accusation or a lamentation.

Also, the word we should be using for the second person plural is "youse".

Sunday, June 29, 2014

Is Freemium Really Open Access?

Should the term "Open Access" be restricted to materials with licenses that allow redistribution, like Creative Commons licenses? Or, as some advocate, only materials that allow remixing and commercial re-use, like CC-BY and CC-BY-SA?

I had lunch today with folks from OpenEditions, a French publishing organization whose ebook effort I've been admiring for a while. They're here in Las Vegas at the American Library Association Conference, hoping to get libraries interested in the 1,428 ebooks they have on their platform. (Booth 1437!)

Of those, 1,076 books are in a program they call "Open Access Freemium". You can read these books on the OpenEditions website for free, without having to register or anything. You can even embed them into your blog. So for example, here's Opinion Mining et Sentiment Analysis by Dominique Boullier and Audrey Lohard:

So is it OpenAccess™?

In this freemium model, the main product that's being sold is access to the downloadable ebook- whether PDF or EPUB. For libraries, a subscription allows for unlimited access with IP address authentication along with additional services. Creative Commons licenses, all of which allow for format conversion, wouldn't work for this business model because the free HTML could easily be converted into EPUB and PDF. They have their own license; you can read it here.

This is clearly not completely open, but there's no doubt that it's usefully open. For me, the biggest problem is that if OpenEditions goes away for some reason- business, politics, natural disaster, or stupidity, then the ebooks disappear. Similarly, if OpenEditions' policies change or URLs move, they could break the embed.

On the plus side, OpenEditions have convinced a group of normally conservative publishers of the advantages of creating usefully open versions of over a thousand books. It's a step in the right direction.

Saturday, June 28, 2014

Overdrive is Making My Crazy Dream Come True

Fifteen years ago, I had this crazy dream. I imagined that popular websites would use fancy links to let their readers get books from their local libraries. And that search engines would prefer these links because their users would love to have access to their library books. I built a linking technology and tried to get people to use it. It never took off. I went on to do other things, but it was a good dream.

Tonight, at the opening of the exhibits at the American Library Association in Las Vegas, Steve Potash, the Founder of Overdrive, pulled me aside and said he had something cool to show me.
  1. Go search for a popular book on Bing or try this one.
  2. Notice the infobox on the right. Look at the Read This Book link. Click it.
  3. Now check out this Huffington Post article. Note the embedded book sample.
  4. If the Overdrive system recognizes you, it's taken you to your library's overdrive collection. If not, when you click "Borrow" you get a list of Overdrive libraries near you.
It's an embed from Overdrive. Even works here:


The read-on-site thing works sometimes and doesn't work sometimes, so there are still a bunch of kinks for Overdrive to work out. But that's not really the point.

The reason why the Huffposts and Buzzfeeds of the world like this is not so much the customization of a link, which is what I was trying to sell, but rather the fact that the book is embedded on the host web site. This keeps people on their site longer, and they click more ads.

Embeds are the magic of the day. You heard it here first.

Speaking of dreams, I'm having a hard time in Vegas figuring out what's real and what isn't.