Wednesday, December 31, 2014

The Year Amazon Failed Calculus

In August, Amazon sent me a remarkable email containing a treatise on ebook pricing. I quote from it:
... e-books are highly price elastic. This means that when the price goes down, customers buy much more. We've quantified the price elasticity of e-books from repeated measurements across many titles. For every copy an e-book would sell at $14.99, it would sell 1.74 copies if priced at $9.99. So, for example, if customers would buy 100,000 copies of a particular e-book at $14.99, then customers would buy 174,000 copies of that same e-book at $9.99. Total revenue at $14.99 would be $1,499,000. Total revenue at $9.99 is $1,738,000. The important thing to note here is that the lower price is good for all parties involved: the customer is paying 33% less and the author is getting a royalty check 16% larger and being read by an audience that’s 74% larger. The pie is simply bigger.
As you probably know, I'm an engineer, so when I read that paragraph, my reaction was not to write an angry letter to Hachette or to Amazon - my reaction was to start a graph. And I have a third data point to add to the graph. At Unglue.it, I've been working on a rather different price point, $0.  Our "sales" rate is currently about 100,000 copies per year. Our total "sales" revenue for all these books adds up to zero dollars and zero cents. It's even less if you convert to bitcoin.

($0 is terrible for sales revenue, but it's a great price for ebooks that want to accomplish something other than generate sales revenue. Some books want more than anything to make the world a better place, and $0 can help them do that, which is why Unglue.it is trying so hard to support free ebooks.)

So here's my graph of the revenue curve combining "repeated and careful measurements" from Amazon and Unglue.it:

I've added a fit to the simplest sensible algebraic equation possible that fits the data, Ax/(1+Bx2), which suggests that the optimum price point is $8.25. Below $8.25,  the increase in unit sales won't make up for the drop in price, and even if the price drops to zero, only twice as many books are sold as at $8.25 - the market for the book saturates.

But Amazon seems to have quit calculus after the first semester, because the real problem has a lot more variables that the one Amazon has solved for. This is because they've ignored the effect of changing a book's price on sales of ALL THE OTHER BOOKS. For you math nerds out there, Amazon has measured a partial derivative when the quantity of interest is the total derivative of revenue. Sales are higher at $10 than at $15 mostly because consumers perceive $15 as expensive for an ebook when most other ebooks are $10. So maybe your pie is bigger, but everyone else is stuck with pop-tarts.

While any individual publisher will find it advantageous to price their books slightly below the prevailing price, the advantage will go away when every publisher lowers its price.

Some price-sensitive readers will read more ebooks if the price is lowered. These are the readers who spend the most on ebooks and are the same readers who patronize libraries. Amazon wants the loyalty of customers so much that they introduced the Kindle Unlimited service. Yes, Amazon is trying to help its customers spend less on their reading obsessions. And yes, Amazon is doing their best to win these customers away from those awful libraries.

But I'm pretty sure that Jeff Bezos passed calculus. He was an EECS major at Princeton (I was, too). So maybe the calculation he's doing is a different one. Maybe his price optimization for ebooks is not maximizing publisher revenue, but rather Amazon's TOTAL revenue. Supposing someone spends less to feed their book habit, doesn't that mean they'll just spend it on something else? And where are they going to spend it? Maybe the best price for Amazon is the price that keeps the customer glued to their Kindle tablet, away from the library and away from the bookstore. The best price for Amazon might be a terrible price for a publisher that wants to sell books.

Read Shatzkin on Publishers vs. Amazon. Then read Hoffelder on the impact of Kindle Unlimited. The last Amazon article you should read this year is Benedict Evans on profits.

It's too late to buy Champagne on Amazon - this New Year's at least.

Sunday, December 7, 2014

Stop Making Web Surveillance Bugs by Mistake!

Since I've been writing about library websites that leak privacy, I figured it would be a good idea to do an audit of Unglue.it to make sure it wasn't leaking privacy in ways I wasn't aware of. I knew that some pages leak some privacy via referer headers to Google, to Twitter, and to Facebook, but we force HTTPS and make sure that user accounts can be pseudonyms. We try not to use any services that push ids for advertising networks. (Facebook "Like" button, I'm looking at you!)

I've worried about using static assets loaded from third party sites. For example, we load jQuery from https://ajax.googleapis.com (it's likely to be cached, and should load faster) and Font Awesome from https://netdna.bootstrapcdn.com (ditto). I've verified that these services don't set any cookies and allow caching, which makes it unlikely that they could be used for surveillance of unglue.it users.

It turned out that my worst privacy leakage was to Creative Commons! I'd been using the button images for the various licenses served from https://i.creativecommons.org/ I was surprised to see that id cookies were being sent in the request for these images.
In theory, the folks at Creative Commons could track the usage for any CC-licensed resource that loaded button images from Creative Commons! And it could have been worse. If I had used the HTTP version of the images, anyone in the network between me and Creative Commons would be able to track what I was reading!

Now, to be clear, Creative Commons is NOT tracking anyone. The reason my browser is sending id cookies along with button image requests is that the Creative Commons website uses Google Analytics, and Google Analytics sets a domain-wide id cookie. Google Analytics doesn't see any of this traffic- it doesn't have access to server logs. But without anyone intending it, the combination of Creative Commons, Google Analytics, and websites like mine that want to promote use of Creative Commons have conspired to build a network of web surveillance bugs BY MISTAKE.

When I inquired about this to Creative Commons, I found out they were way ahead of the issue. They've put in redirects to HTTPS version of their button images. This doesn't plug any privacy leakage, but it discourages people from using the privacy spewing HTTP versions. In addition, they'd already started to process of moving static assets like button images to a special-purpose domain. The use of this domain,  licensebuttons.net, will ensure that id cookies aren't sent and nobody could use them for surveillance.

If you care about user privacy and you have a website, here's what you should do:
  1. Avoid loading images and other assets from 3rd party sites. consider self-hosting these.
  2. When you use 3rd party hosted assets, use HTTPS references only!
  3. Avoid loading static assets from domains that use Google Analytics and set id domain cookies.
For Creative Common license buttons, use the buttons from licensebuttons.net. If you use the Creative Commons license chooser, replace "i.creativecommons.org" in the code it makes for you with "licensebuttons.net". This will help the web respect user privacy. The buttons will also load faster, because the "i.creativecommons.org" requests will get redirected there anyway.

Saturday, November 22, 2014

NJ Gov. Christie Vetoes Reader Privacy Act, Asks for Stronger, Narrower Law

According to New Jersey Governor Chris Christie's conditional veto statement, "Citizens of this State should be permitted to read what they choose without unnecessary government intrusion." It's hard to argue with that! Personally, I think we should also be permitted to read what we choose without corporate surveillance.

As previously reported in The Digital Reader, the bill passed in September by wide margins in both houses of the New Jersey State Legislature and would have codified the right to read ebooks without letting the government and everybody else knowing about it.

I wrote about some problems I saw with the bill. Based on a California law focused on law enforcement, the proposed NJ law added civil penalties on booksellers who disclosed the personal information of users without a court order. As I understood it, the bill could have prevented online booksellers from participating in ad networks (they all do!).

Governor Christie's veto statement pointed out more problems. The proposed law didn't explicitly prevent the government from asking for personal reading data, it just made it against the law for a bookseller to comply. So, for example, a local sheriff could still ask Amazon for a list of people in his town reading an incriminating book. If Amazon answered, somehow the reader would have to:
  1. find out that Amazon had provided the information
  2. sue Amazon for $500.
Another problem identified by Christie was that the proposed law imposed privacy burdens on booksellers stronger than those on libraries. Under another law, library records in New Jersey are subject to subpoena, but bookseller records wouldn't be. That's just bizarre.

In New Jersey, a governor can issue a "Conditional Veto". In doing so, the governor outlines changes in a bill that would allow it to become law. Christie's revisions to the Reader Privacy Act make the following changes:
  1. The civil penalties are stripped out of the bill. This allows Gov. Christie to position himself and NJ as "business-friendly".
  2. A requirement is added preventing the government from asking for reader information without a court order or subpoena. Christie gets to be on the side of liberty. Yay!
  3. It's made clear that the law applies only to government snooping, and not to promiscuous data sharing with ad networks. Christie avoids the ire of rich ad network moguls.
  4. Child porn is carved out of the definition of "books". Being tough on child pornography is one of those politically courageous positions that all politicians love.
The resulting bill, which was quickly reintroduced in the State Assembly, is stronger but narrower. It wouldn't apply in situations like the recent Adobe Digital Editions privacy breach, but it should be more effective at stopping "unnecessary government intrusion". I expect it will quickly pass the Legislature and be signed into law. A law that properly addresses the surveillance of ebook reading by private companies will be much more complicated and difficult to achieve.

I'm not a fan of his by any means, but Chris Christie's version of the Reader Privacy Act is a solid step in the right direction and would be an excellent model for other states. We could use a law like it on the national level as well.

(Guest posted at The Digital Reader)

Wednesday, November 5, 2014

If your website still uses HTTP, the X-UIDH header has turned you into a snitch

Does your website still use HTTP? It not, you're a snitch.

As I talk to people about privacy, I've found a lot of misunderstanding. HTTPS applies encryption to the communication channel between you and the website you're looking at. It's an absolute necessity when someone's making a password or sending a credit card number, but the modern web environment has also made it important for any communication that expects privacy.

HTTP is like sending messages on a postcard. Anyone handling the message can read the whole message. Even worse, they can change the message if they want. HTTPS is like sending the message in a sealed envelope. The messengers can read the address, but they can't read or change the contents.

It used to be that network providers didn't read your web browsing traffic or insert content into it, but now they do so routinely. This week we learned that Verizon and AT&T were inserting an "X-UIDH" header into your mobile phone web traffic. So for example, if a teen was browsing a library catalog for books on "pregnancy" using a mobile phone, Verizon's advertising partners could, in theory, deliver advertising for maternity products.

The only way to stop this header insertion is for websites to use HTTPS. So do it. Or you're a snitch.

Sorry, Blogger.com doesn't support HTTPS. So if you mysteriously get ads for snitch-related products, or if the phrase "Verizon and AT&T" is not equal to "V*erizo*n and A*T*&T" without the asterisks, blame me and blame Google.

Here's more on the X-UIDH header.

Tuesday, November 4, 2014

Reading Privacy Enables Reader Sharing

Digital privacy is a weird thing. People confuse it for digital security, but it's much more than that. Privacy isn't keeping secrets, it's controlling the information we share. What we think of as privacy depends on trusting that the people we share with won't do bad things. Privacy isn't digital at all. Maybe instead of "digital privacy" we should talk about "digital discretion".

The recent revelations of how Adobe Digital Editions was spewing the users' reading activity, unencrypted, to a logging server are an instructive example of poor digital discretion. I thought that Adobe was working on an ebook synchronization system, but it now looks like ADE was doing the logging "to support new business models" rather than for ebook sync.  It got me thinking about how ebook synchronization can and should be done.

Synchronization is a useful function. I'd like to be able to start reading a book on my iPhone while on the train in the morning, then pick up reading where I left off in the evening using my iPad. But to accomplish this function, I need to trust someone with information that discloses what I'm reading. It's easy to design a centralized sync system that requires a reader to register who they are,  what book they're reading and an activity stream of what pages are being read.

But a sync system designed for privacy doesn't need all that information. The central server doesn't even need to know the identity of the book! As Jason Griffey pointed out in his article on Adobe's spyware, the book's identifier could be hashed with a password, effectively hiding its identity from the central server.

I wrote about how Bluefire is doing sync for their apps while trying their best to respect user privacy. Rather than obscuring the identity of the book, they focus on making it hard to identify users in their system.

Adobe was justifiably criticized for sending lots of information back to its central server without encryption. Although their version 4.0.1 sends less information, mostly Adobe is just encrypting the stream and claiming the privacy problem is solved. The core privacy problem remains- when a DRM ebook is read, an encrypted activity stream is sent back to Adobe. If the information is sensitive or useful, why should Adobe get the benefit of this information at all? At the very least, providing your activity stream to Adobe should be opt-in.

There's a second privacy problem that hasn't been discussed anywhere. It may seem contradictory, but central-server synchronization systems impose TOO MUCH privacy. In many situations, a reader will want to share their reading stream. Look at GoodReads - you can share your opinions with friends. Look at Kobo Reading Life - you get awards and statistics in return for your stream. In classroom situations, students could sync their readers with the instructors'. These sorts of affordances can't be developed without access to the reading-activity stream, and won't work unless everyone participating in the stream is in the same reading ecosystem, using the same central server.

If instead of encrypting the reading-event stream, encryption were applied to the events themselves, the events could be shared over most any messaging system, and distributed according to the user's choices and desired application. In fact, you could use Twitter.

Every user of a Twitter-reading-sync system would create a Twitter feed to publish their reading activity. Other users could subscribe to the event stream. Direct messages could be used to send decryption data for private reading streams. The system could be engineered so that even Twitter would be unable to know what's being read privately. And the whole world would have access to reading that's being done publicly. In addition to page turning,  bookmarking and annotation activity could be of interest.

It's interesting to think about what might happen in a reading ecosystem where readers, not corporations, control the access to their reading activity streams. Publishers and authors might provide incentives to readers who share their reading-events with them. Social networks might match users reading the same page of the same book. Libraries could learn how to meet the needs of their communities. Teachers might be alerted to passages that students find to be difficult. Ironically, these public uses are enabled by a system design which puts a premium on privacy for the reader.

Dave Egger's novel "the Circle" gave us the expression "Privacy is Theft". The novel imagines a social norms that consider privacy to be a reflection of selfishness. But in the real world, it's the lack of discretion by companies building up vast private collections of personal information that's the true threat to social sharing. Too bad that theft is not a crime.