Saturday, May 2, 2015

Ranganathan and the 5 Blind Librarians

It's "Choose Privacy Week". To celebrate, the American Library Association is publishing a series of blog posts; today they're running mine! I wanted to write something special, so I decided to have a little fun with a parable. I'm reprinting it here:

I've heard it told that after formulating his famous "Five Laws of Library Science", the great Indian librarian S. R. Ranganathan began thinking about privacy in libraries. Here's what I remember of the tale:

In India at the time, there were five librarians renowned far and wide for their tremendous organizational skills, formidable bibliographic canny, and the coincidental fact that each of them was blind. It was said that "S" could identify books by their smell. "H" could classify a book just by the sound of the footfalls of a person carrying it. "T" was famous for leading patrons by the hand to exactly the book they wanted; the feel of a person's fingernails told him all he needed to know. "P" knew everything there was to know about paper and ink. "C" was quick with her fingers on a keyboard and there was hardly a soul in her city she had not corresponded with. But these 5 were also sought out for their discretion; powerful leaders would consult them, thinking that their blindness made them immune to passing on their secrets of affairs and of state.

So of course, Ranganathan asked the five blind librarians to come to him so he could benefit from their wisdom and experience with privacy. The great librarians began talking among themselves as they sat outside Ranganathan's house.

"On my way through the countryside I encountered a strange beast", said librarian H. "I can't say what he was, but he had a distinctive call like a horn: Toot-to-to-toooot..." and librarian H reproduced a complicated sound that must have had at least 64 toots.

"By that sound, I think I encountered the same beast", said librarian T. "I reached out to touch him. He was hard and smooth, and ended in a point, like a great long sword."

"No, you are wrong", said librarian P. "I heard the same sound, and the strange beast is like a thick parchment. I could feel the wind when it fluttered."

"You fellows are so mistaken", said librarian C. "You touch for a second and you think you know everything. I spent 15 minutes playing with the beast; she is like a great squirming snake."

"I know nothing of the beast except the smell of his droppings," said librarian S.  "But what I do know is that the beast had recently eaten a huge feast of bananas."

At this, a poacher who had been eavesdropping on the five librarians picked up his shotgun and ran off.

Just then, Ranganathan emerged through his door. Surprised at seeing the poacher run off, he asked the librarians what they had been talking about.

The librarians each repeated what they had told the others. When librarian S finally recounted the banana smell, Ranganathan became alarmed. The poacher had run in the direction of a grove of banana trees. Before he could do anything, they heard the sound of a powerful shotgun in the distance, and then the final roar of a dying elephant. 

With tears in his eyes, Ranganathan thanked the 5 librarians for their trouble, and sent them home. Though Ranganathan's manuscript on privacy has been lost to time, it is said that Ranganathan's 1st law of library privacy went something like this:


"Library Spies Don't Need Eyes".


Wednesday, April 1, 2015

Suggested improvements for a medical journal privacy policy


After I gave the New England Journal of Medicine a failing grade for user privacy noting that their website used more trackers than any other scholarly journal website I looked at, the Massachusetts Medical Society asked me to review the privacy policy for NEJM.com and make changes that would improve its transparency. On the whole their website privacy policy is more informative and less misleading than most privacy policies I've looked at. Still, there's always room for improvement. They've kindly allowed me to show you the changes I recommended:


Last updated: April 1, 2015

Governing Principles 

NEJM.org is owned and operated by the Massachusetts Medical Society (“MMS”). We take privacy issues seriously and are committed to protecting your personal information. We want to say that up front because it sounds nice and is legally meaningless. Please take a moment to review our privacy policy, which explains how we collect, use, and safeguard information you enter at NEJM.org and any of our digital applications (such as our iPhone and iPad applications). This privacy policy applies only to information collected by MMS through NEJM.org and our digital applications. This privacy policy does not govern personal information furnished to MMS through any other means.


WHAT INFORMATION DO WE COLLECT?

Information You Provide to Us
We will request information from you if you establish a personal profile to gain access to certain content or services, if you ask to be notified by e-mail about online content, or if you participate in surveys we conduct. This requires the input of personal information and preferences that may include, but is not limited to, details such as your name, address (postal and e-mail), telephone number, or demographic information. You can't use secure communications to give us this information, so you should consider anything you tell us to be public information. If you request paid content from NEJM.org, including subscriptions, we will also ask for payment information such as credit card type and number. Our payment providers won't actually let us see your credit card number, because there are federal regulations and such.
Information That Is Automatically Collected
Log Files
We use log files to collect general data about the movement of visitors through our site and digital applications. This may include some or includes all of the following information: the Internet Protocol Address (IP Address) of your computer or other digital device, host name, domain name, browser version and platform, date and time of requests, and the files downloaded or viewed. We use this information to track what you read and to measure and analyze traffic and usage of NEJM.org and our digital applications. We build our site in such a way that this information is leaked to our advertisers, our widget providers, our analytics partners, the advertising partners of our widget providers, all the ISPs that connect us, and government entities such as the NSA, the Great Firewall of China, and the "Five Eyes" group.
Cookies
We use cookies to collect information and help personalize your user experience us make more money. We store minimal personally identifying information ten tracking identifiers in cookies and protect allow our partners to access this information. We do not store complete records or credit card numbers in cookies. We don't put chocolate chips in cookies either. Even if they're the other kind of cookies. Because we read about the health effects of fatty foods, in NEJM of course. You can find out more about how we use cookies at our Cookie Information page which is a separate page because it's more confusing that way.
Most web browsers automatically accept cookies. Browsers can be configured to prevent this, but if you do not accept any cookies from www.NEJM.org, you will not be able to use the site. The site will function if you block third party cookies.
In some cases we also work with receive services or get paid by third party vendors (such as Google, Google's DoubleClick Ad Network, Checkm8, Scorecard Research, Unica, AddThis, Crazy Egg, Flashtalking, Monetate, DoubleVerify, and SLI Systems) who help deliver advertisements on our behalf across the Internet, and vendors like Coremetrics, Chartbeat and Mii Solutions, who provide flashy dashboards for our managers. These vendors may use cookies to collect information about your activity at our site (i.e., the pages you have visited) in order to help deliver particular ads that they believe you would find most relevant. You can opt out of those vendors' use of cookies to tailor advertising to you by visiting http://www.networkadvertising.org/managing/opt_out.asp. Except for Checkm8, Scorecard Research, Unica, Crazy Egg, Monetate, Coremetrics, Chartbeat, Mii Solutions and SLI Systems. And even if you opt out of advertising customization, these companies still get all the information. We have no idea how long they retain the information or what they do with the information other than ad targeting and data dashboarding.
Clear Gifs (Web Beacons/Web Bugs)
We may also use clear gifs which are tiny graphics with unique identifiers that function similarly to cookies to help us to track site activity. We do not use these to collect personally identifying information, because that's impossible. We also do not use clear gifs to shovel snow, even though we've had a whole mess of it. Oh and by the way, some of our partners have used "flash cookies", which you can't delete. And maybe even "canvas fingerprints". But they pay us money or give us services, so we don't want to interfere.




HOW IS THIS INFORMATION USED?

Information that you provide to us will be used to process, fulfill, and deliver your requests for content and services. We may send you information about our products and services, unless you have indicated you do not wish to receive further information.

Information that is automatically collected is used to monitor usage patterns at NEJM.org and at our digital applications in order to help us improve our service offerings. We do not sell or rent your e-mail address to any third party. You may unsubscribe from our e-mail services at any time. Life is short. You may have a heart attack at any time, or get run over by a truck. For additional information on how to unsubscribe from our e-mail services, please refer to the How to Make Changes to Your Information section of this Privacy Policy.

We may report aggregate information about usage to third parties, including our service vendors and advertisers. These advertisers may include your competitors, so be careful. For additional information, please also see our Internet Advertising Policy. We may also disclose personal and demographic information about your use of NEJM.org and our digital applications to the countless companies and individuals we engage to perform functions on our behalf. Examples may include hosting our Web servers, analyzing data, and providing marketing assistance. These companies and individuals are obligated to maintain your personal information as confidential and may have access to your personal information only as necessary to perform their requested function on our behalf, which is usually to earn us more money, except as detailed in their respective privacy policies. So of course, these companies may sell the data collected in the course of your interaction with us.
Advertisers
We contract with third-party advertisers and their agents to post banner and other advertisements at our site and digital applications. These advertisements may link to Web sites not under our control. These third-party advertisers may use cookie technology or similar means i.e. Flash to measure the effectiveness of their ads or may otherwise collect personally identifying information from you when you leave our site or digital applications. We are not responsible or liable for any content, advertising, products or other materials offered from such advertisers and their agents. Transactions that occur between you and the third-party advertisers are strictly between you and the third party and are not our responsibility. You should review the privacy policy of any third-party advertiser and its agent, as their policies may differ from ours.
Advertisement Servers
In addition to advertising networks run by Google, which know everything about you already, we use a third-party ad server, CheckM8, to serve advertising at NEJM.org. Using an advertising network diminishes our ability to control what advertising is shown on the NEJM website. Instead, auctions are held between advertisers that want to show you ads. Complicated algorithms decide which ads you are most likely to click on and generate the most revenue for us. We're thinking of outsourcing our peer-review process for our article content to similar sorts of software agents, as it will save us a whole lot of money. Anyway, if you see ads for miracle drugs on our site, it's because we really need these advertising dollars to continue our charitable work of publicizing top quality medical research, not because these drugs have been validated by top quality medical research. CheckM8 does not collect any personally identifiable information regarding consumers who view or interact with CheckM8 advertisements. CheckM8 solely collects non-personally identifiable ad delivery and reporting data. For further information, see CheckM8’s privacy policy. Please note that the opt-out website we mentioned above doesn't cover CheckM8, and there's not a good way to opt out of CheckM8, so there. The Massachusetts Medical Society takes in about $25 million per year in advertising revenue, so we really don't want you to opt out of our targeted advertising.



WHAT SECURITY MEASURES ARE USED?

When you submit personal information via NEJM.org or our digital applications, your information is protected both online and offline with what we believe to be appropriate physical, electronic, and managerial procedures to safeguard and secure the information we collect. For information submitted via NEJM.org, we use the latest Secure Socket Layer (SSL) technology to encrypt your credit card and personal information. But other information is totally up for grabs.

USER-GENERATED CONTENT FORUMS
Any data or personal information that you submit to us as user-generated content becomes public and may be used by MMS in connection with NEJM.org, our digital applications, and other MMS publications in any and all media. For more information, see our User-Generated Content Guidelines. We'll have the right to publish your name and location worldwide forever if you do so, and we can sue you if you try to use a pseudonym.

OTHER INFORMATION

Do Not Track Signals
Like most web services, at this time we do not alter our behavior or change our services in response to do not track signals. In other words, our website tracks you, even if you use technical means to tell us you do not want us to track you.
Compliance with Legal Process
We may disclose personally identifying information if we are required to do so by law or we in good faith believe that such action is necessary to (1) comply with the law or legal process; (2) protect our rights and property; (3) protect against misuse or the unauthorized use of our Web site; or (4) protect the personal safety or property of our users or the public. So, for example, if you are involved in a divorce proceeding, we can help your spouse verify that you weren't staying late at your office reading up on the latest research like you said you were.

Children
NEJM.org is not intended for children under 13 years of age. We do not knowingly collect or store any personal information from children under 13. If we did not have this disclaimer, our lawyer would not let us do things we want to do. If you are under 13, we're really impressed, you should spend more time outside getting fresh air.

Changes to This Policy
This privacy policy may be periodically updated. We will post a notice that this policy has been amended by revising the “Last updated” date at the top of this page. Use of NEJM.org constitutes consent to any policy then in effect. So basically, what we say here is totally meaningless with respect to your ability to rely on it. Oh well.


Thursday, March 12, 2015

16 of the top 20 Research Journals Let Ad Networks Spy on Their Readers

A recent query to the "LibLicense" listserv asked:
Is there any kind of organization that has put together a website or list of database providers/publishers that indicate the extent to which they respect patron privacy?
The answer is "no", but I thought it would be useful to look at the top journal publishers to see if their websites are built with an orientation towards reader privacy.

I came up with a list of 20 top journals. I took the 10 journals with the most citations and the 10 journals with the most citations per published article, according to the SCImago journal rankings.

I used Ghostery to count the number of trackers present on the web page for an article in each journal. Each of these trackers gets a feed of each user's browsing behavior. I looked at the trackers to see if user browsing behavior was being sent to advertising networks. I also determined whether the journal supported secure connections. Based on these results, I assigned a letter grade for each journal.

Passing, Grade A

None of the scholarly journals I looked at earned excellent grades for reader privacy.

Passing, Grade B

Two journals, both published by the American Physical Society, earned good grades for reader privacy. They use a social sharing widget that respects privacy.

Reviews of Modern Physics.  Ranked #2 in citations/article. 1 Tracker (Google Analytics). No advertising networks. Supports HTTPS, but allows insecure connections.
Physical Review Letters. Ranked #9 in total citations, #393 in citations/article. 1 Tracker (Google Analytics). No advertising networks. Supports HTTPS, but allows insecure connections.

Passing Grade C

Two journals, both published by Annual Reviews, earned acceptable grades for reader privacy.

Annual Review of Immunology. Ranked #3 in citations/article. 1 Tracker (Google Analytics). No advertising networks. Insecure connections only.
Annual Review of Biochemistry. Ranked #5 in citations/article. 1 Tracker (Google Analytics). No advertising networks. Insecure connections only.

Failing Grade D

Failing grades are earned by publishers that allow their readers to be tracked by advertising networks. These networks get access to the full browsing history of a user and track them with cookies; it's difficult for users to maintain anonymity when most of their web browsing is exposed to tracking.

Science, published by AAAS. Ranked #5 in total citations, #49 in citations/article. 10 Trackers. Multiple advertising networks. Science gets a D rather than an F because it supports HTTPS, although it allows insecure connections.

Failing Grade F

15 journals earned failing grades because their participation in advertising networks exposes their readers to tracking and spying. Some of the publishers are more flagrant about this than others. Maybe I should have given F+ to some and F- to others. All of these journals force insecure connections.


PLoS One, published by the Public Library of Science. #1 in total citations, #1776 in citations/article. 3 trackers. One advertising network.
Proceedings of the National Academy of Sciences of the United States, published by the National Academy of Sciences. #2 in total citations, #155 in citations/article. 3 trackers. One advertising network.
Journal of Biological Chemistry, published by the American Society for Biochemistry and Molecular Biology. #8 in total citations, #513 in citations/article. 3 trackers. One advertising network.
Quarterly Journal of Economics, published by Oxford Journals. #6 in citations/article. 4 trackers. One advertising network.
Chemical Communications, published by the Royal Society of Chemistry. #10 in total citations, #680 in citations/article. 6 trackers. Multiple advertising networks.
Journal of the American Chemical Society, published by the American Chemical Society. #4 in total citations, #185 in citations/article. 7 trackers. Multiple advertising networks.
Chemical Reviews, published by the American Chemical Society. #10 in citations/article. 8 trackers. Multiple advertising networks.
CA: A Cancer Journal for Clinicians, published by Wiley. #1 in citations/article. 9 trackers. Multiple advertising networks.
Cell, published by Elsevier. #4 in citations/article. 9 trackers. Multiple advertising networks.
Angewandte Chemie - International Edition, published by Wiley. #6 in total citations, #202 in citations/article. 11 trackers. Multiple advertising networks.
Nature Genetics, published by Nature Publishing Group. #7 in citations/article. 11 trackers. Multiple advertising networks.
Nature, published by Nature Publishing Group. #3 in total citations, #11 in citations/article. 11 trackers. One advertising network.
Nature Reviews Genetics, published by Nature Publishing Group. #8 in citations/article. 12 trackers. Multiple advertising networks.
Nature Reviews Molecular Cell Biology, published by Nature Publishing Group. #9 in citations/article. 13 trackers. Multiple advertising networks.
New England Journal of Medicine, published by the Massachusetts Medical Society. #7 in total citations, #41 in citations/article. 14 trackers. Multiple advertising networks.
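
Read together, the grades above appear to depend on just two observations per journal: whether any advertising network is present, and whether HTTPS is available at all. The following is a rough sketch of that reading, not a stated rubric. The function name and parameters are hypothetical, and grade A is left out because the post does not say what would have earned one.

```python
def privacy_grade(ad_networks: int, supports_https: bool) -> str:
    """Approximate the letter grades used above (B, C, D, F only).

    ad_networks    -- number of advertising networks seen on the article page
    supports_https -- True if the site can be reached over HTTPS at all
    """
    if ad_networks > 0:
        # Advertising networks always mean a failing grade;
        # HTTPS support softens an F to a D.
        return "D" if supports_https else "F"
    # No advertising networks: passing, with HTTPS separating B from C.
    return "B" if supports_https else "C"


# Examples taken from the listings above.
print(privacy_grade(0, True))    # Reviews of Modern Physics
print(privacy_grade(0, False))   # Annual Review of Immunology
print(privacy_grade(2, True))    # Science (any count above zero behaves the same)
print(privacy_grade(1, False))   # PLoS One
```

The raw tracker count is deliberately not an input: in the listings, a journal with 3 trackers and one with 14 both fail, so the count only matters insofar as some of those trackers are advertising networks.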

Remarks

I'm particularly concerned about the medical journals that participate in advertising networks. Imagine that someone is researching clinical trials for a deadly disease. A smart insurance company could target such users with ads that mark them for higher premiums. A pharmaceutical company could use advertising targeting researchers at competing companies to find clues about their research directions. Most journal users (and probably most journal publishers) don't realize how easily online ads can be used to gain intelligence as well as to sell products.

In defense of the publishers, it should be noted that the web advertising business has developed very rapidly over the past few years due to intense competition. A few years ago, the attacks on user privacy enabled by the ad networks' massive data collection were mostly theoretical. But competition has led the networks to increase their targeting ability and scoop up more and more "demographic" data. What was theory a few years ago is today's reality. We still have time to prevent tomorrow's privacy disaster, but change will only happen if the institutions that purchase and fund these journals learn what's really going on and start to demand the privacy that readers deserve.

Thursday, February 26, 2015

"Free" can help a book do its job



(Note: I wrote this article for NZCommons, based on my presentation at the 2015 PSP Annual Conference in February.)

Every book has a job to do. For many books, that job is to make money for its creators. But a lot of books have other jobs to do. Sometimes the fact that people pay for books helps that job, but other times the book would be able to do its job better if it was free for everyone.

That's why Creative Commons licensing is so important. But while CC addresses the licensing problem nicely, free ebooks face many challenges that make it difficult for them to do their jobs.

Let's look at some examples.

When Oral Literature in Africa was first published in 1970, its number one job was to earn tenure for the author, a rising academic. It succeeded, and then some. The book became a classic, elevating an obscure topic and creating an entire field of scholarly inquiry in cultural anthropology. But in 2012, it was failing to do any job at all. The book was out of print and unavailable to young scholars on the very continent whose culture it documented. Ruth Finnegan, the author, considered it her life's work and hoped it would continue to stimulate original research and new insights. To accomplish that, the book needed to be free. It needed to be translatable, it needed to be extendable.


Nga Reanga Youth Development: Maori Styles, an Open Access book by Josie Keelan, is another example of an academic book with important jobs to do. While its primary job is a local one, the advancement of understanding and practice in Maori youth development, it has another job, a global one. Being free helps it speak to scholars and researchers around the world.

Leanne Brown's Good and Cheap is a very different book. It's a cookbook. But the job she wanted it to do made it more than your usual cookbook. She wanted to improve the lives of people who receive "nutrition assistance"- food stamps, by providing recipes for nutritious and healthy meals that can be made without spending much money. By being free, Good and Cheap helps more people in need eat well.

My last example is Casey Fiesler's Barbie™ I Can Be A Computer Engineer The Remix! Now With Less Sexism! The job of this book is to poke fun at the original Barbie™ I Can Be A Computer Engineer, in which Barbie needs boys to do the actual computer coding. But because Fiesler uses material from the original under "fair use", anything other than free, non-commercial distribution isn't legal. Barbie, remixed can ONLY be a free ebook.

But there's a problem with free ebooks. The book industry runs on a highly evolved and optimized cradle-to-grave supply chain, comprising publishers, printers, production houses, distributors, wholesalers, retailers, aggregators, libraries, publicists, developers, cataloguers, database suppliers, reviewers, used-book dealers, even pulpers. And each entity in this supply chain takes its percentage. The entire chain stops functioning when an ebook is free. Even libraries (most of them) lack the processes that would enable them to include free ebooks in their collections.

At Unglue.it, we ran smack into this problem when we set out to bring books into the creative commons. We helped Open Book Publishers crowd fund a new ebook edition of Oral Literature in Africa. The ebook was then freely available, but it wasn't easy to make it free on Amazon, which dominates the ebook market. We couldn't get the big ebook aggregators that serve libraries to add it to their platforms. We realized that someone had to do the work that the supply chain didn't want to do.

Over the past year, we've worked to turn Unglue.it into a "bookstore for free books". The transformation isn't done yet, but we've built a database of over 1200 downloadable ebooks, licensed under Creative Commons or other free licenses. We have a long way to go, but we're distributing over 10,000 ebooks per month. We're providing syndication feeds, developing relationships with distributors, improving metadata, and promoting wonderful books that happen to be free.

The creators of these books still need to find support. To help them, we've developed three revenue programs. For books that already have free licenses, we help the creators ask for financial support in the one place where readers are most appreciative of their work- inside the books themselves. We call this "thanks for ungluing".

For books that exist as ebooks but need to recoup production costs, we offer "buy-to-unglue". We'll sell these books until they reach a revenue target, after which they'll become open access. For books that exist in print but need funding for conversion to open access ebook, we offer "pledge-to-unglue", which is a way of crowd-funding the conversion.

After a book has finished its job, it can look forward to a lengthy retirement. There's no need for books to die anymore, but we can help them enjoy retirement, and maybe even enjoy a second life. Project Gutenberg has over 50,000 books that have "retired" into the public domain. We're starting to think about the care these books need. Formats change along with the people that use them, and the book industry's supply chain does its best to turn them back into money-earners to pay for that care.

Recently we received a grant from the Knight Foundation to work on ways to provide the long-term care that these books need to be productive AND free in their retirements. GITenberg, a collaboration between the folks at Unglue.it and ebook technologist Seth Woodward, is exploring the use of Github for free ebook maintenance. Github is a website that supports collaborative software development with source control and workflow tools. Our hope is that the ingredients that have made Github wildly successful in the open source software world will prove to be similarly effective in supporting ebooks.

It wasn't so long ago that printing costs made free ebooks impossible. So it's no wonder that free ebooks haven't realized their full potential. But with cooperation and collaboration, we can really make wonderful things happen.

Monday, February 9, 2015

"Passwords are stored in plain text."

Many states have "open records" laws which mandate public disclosure of business proposals submitted to state agencies. When a state library or university requests proposals for library systems or databases, the vendor responses can be obtained and reviewed. When I was in the library software business, it was routine to use these laws to do "competitor intelligence". These disclosures can often reveal the inner workings of proprietary vendor software which implicate information privacy and security.

Consider, for example, this request for "eResources for Minitex". Minitex is a "publicly supported network of academic, public, state government, and special libraries working cooperatively to improve library service for their users in Minnesota, North Dakota and South Dakota" and it negotiates database licenses for libraries throughout the three states.

Question number 172 in this Request for Proposals (RFP) was: "Password storage. Indicate how passwords are stored (e.g., plain text, hash, salted hash, etc.)."

To provide context for this question, you need to know just a little bit of security and cryptography.

I'll admit to having written code 15 years ago that saved passwords as plain text. This is a dangerous thing to do, because if someone were to get unauthorized access to the computer where the passwords were stored, they would have a big list of passwords. Since people tend to use the same password on multiple systems, the breached password list could be used, not only to gain access to the service that leaked the password file, but also to other services, which might include banks, stores and other sites of potential interest to thieves.

As a result, web developers are now strongly admonished never to save passwords as plain text. Doing so in a new system should be considered negligent, and could easily result in liability for the developer if the system security is breached. Unfortunately, many businesses would rather risk paying lawyers a lot of money to defend themselves should something go wrong than bite the bullet and pay some engineers a little money now to patch up the older systems.

To prevent the disclosure of passwords, the current standard practice is to "salt and hash" them.

A cryptographic hash function mixes up a password so that the password cannot be reconstructed. So, for example, the hash of 'my_password' is 'a865a7e0ddbf35fa6f6a232e0893bea4'. When a user enters their password, the hash of the password is recalculated and compared to the saved hash to determine whether the password is correct.
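
The store-then-compare flow can be sketched in a few lines of Python. This is a minimal illustration, assuming MD5 only because the examples in this post use it; the function names are made up for the sketch, and MD5 should not be used for passwords in a real system.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Return the hex MD5 digest of a password (illustration only)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def check_password(attempt: str, stored_hash: str) -> bool:
    """Re-hash the attempt and compare it with the saved hash."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(hash_password(attempt), stored_hash)


# At sign-up, only the hash is saved; the password itself is discarded.
stored = hash_password("my_password")

# At login, the attempt is hashed again and the two hashes are compared.
print(check_password("my_password", stored))
print(check_password("not_my_password", stored))
```

Nothing in `stored` lets the service hand the original password back to anyone, which is the property the next paragraph relies on.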

As a result of this strategy, the password can't be recovered. But it can be reset, and the fact that no one can recover the password eliminates a whole bunch of "social engineering" attacks on the security of the service.

Given a LOT of computer power, there are brute force attacks on the hash, but the easiest attack is to compute the hashes for the most common passwords. In a large file of passwords, you should be able to find some accounts that are breachable, even with the hashing. And so a "salt" is added to the password before the hash is applied. In the example above, a hash would be computed for 'SOME_CLEVER_SALTmy_password'. Which, of course, is '52b71cb6d37342afa3dd5b4cc9ab4846'.
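
To make the common-password attack concrete, here is a small sketch of it against an unsalted and a salted password file. The account names, passwords and word list are invented for the illustration, and the salt is simply prepended as in the example above.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# A tiny stand-in for an attacker's list of common passwords.
COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein"]


def build_lookup(candidates, salt=""):
    """Precompute hash -> password for every candidate password."""
    return {md5_hex(salt + pw): pw for pw in candidates}


def crack(password_file, lookup):
    """Return the accounts whose stored hash appears in the lookup table."""
    return {user: lookup[h] for user, h in password_file.items() if h in lookup}


SALT = "SOME_CLEVER_SALT"
accounts = {"alice": "password", "bob": "correct horse battery staple"}

unsalted_file = {user: md5_hex(pw) for user, pw in accounts.items()}
salted_file = {user: md5_hex(SALT + pw) for user, pw in accounts.items()}

precomputed = build_lookup(COMMON_PASSWORDS)
cracked_unsalted = crack(unsalted_file, precomputed)  # alice's weak password is found
cracked_salted = crack(salted_file, precomputed)      # the same table finds nothing

print(cracked_unsalted)
print(cracked_salted)
```

The salt does not make a weak password strong. An attacker who learns the salt can rebuild the table with `build_lookup(COMMON_PASSWORDS, SALT)` and find the same account again; what the salt prevents is reusing one precomputed table against every password file.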

To attack the salted password file, you'd need to know that salt. And since every application uses a different salt, each file of salted passwords is completely different. A successful attack on one hashed password file won't compromise any of the others.
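
A common refinement goes one step further than a single salt per application: a fresh random salt for each user, fed into a deliberately slow key-derivation function instead of a single fast hash. The sketch below uses PBKDF2 from Python's standard library. It is a swapped-in technique rather than the one described above, and the iteration count and function names are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for the hardware in use


def new_record(password: str):
    """Create a (salt, digest) pair to store for one user."""
    salt = os.urandom(16)  # random and unique per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


# Two users with the same password end up with different stored digests.
salt_a, digest_a = new_record("my_password")
salt_b, digest_b = new_record("my_password")

print(digest_a != digest_b)
print(verify("my_password", salt_a, digest_a))
print(verify("wrong_guess", salt_a, digest_a))
```

With a per-user salt, the salt is stored next to the digest and is not a secret. Its purpose is to force an attacker to repeat the slow computation separately for every account and every guess.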

Another standard practice for user-facing password management is to never send passwords unencrypted. The best way to do this is to use HTTPS, since web browser software alerts the user that their information is secure. Otherwise, any server between the user and the destination server (there might be 20-40 of these for typical web traffic) could read and store the user's password.

The Minitex RFP covers reference databases. For this reason, only a small subset of services offered to libraries are covered here. The authentication for these sorts of systems typically doesn't depend on the user creating a password; user accounts are used to save the results of a search, or to provide customization features. A Minitex patron can use many of the offered databases without providing any sort of password.

So here are the verbatim responses received for the Minitex RFP:

LearningExpress, LLC
Response: "All passwords are stored using a salted hash. The salt is randomly generated and unique for each user."
My comment: This is a correct answer. However, the LearningExpress login sends passwords in the clear over HTTP.

OCLC
Response: "Passwords are md5 hashed."
My comment: MD5 is the hash algorithm I used in my examples above. It's not considered very secure (see comments). OCLC Firstsearch does not force HTTPS and can send login passwords in the clear.

Credo
Response: "N/A"
My comment: This just means that no passwords are used in the service.

Infogroup Library Division
Response: "Passwords are currently stored as plain text. This may change once we develop the customization for users within ReferenceUSA. Currently the only passwords we use are for libraries to access usage stats."
My comment: The user customization now available for ReferenceUSA appears at first glance to be done correctly.

EBSCO Information Services
Response: "EBSCOhost passwords in EBSCOadmin are stored in plain text."
My comment: I should note that EBSCOadmin is not an end-user-facing system. So if the EBSCO systems were compromised, only library administrator credentials would be exposed.

Encyclopaedia Britannica, Inc.
Response: "Passwords are stored as plain text."
My comment: I wonder if EB has an article on network security?

ProQuest
Response: "We store all passwords as plain text."
My comment: The ProQuest service available through my library creates passwords over HTTP but uses some client-side encryption. I have not evaluated the security of this encryption.

Scholastic Library Publishing, Inc.
Response: "Passwords are not stored. FreedomFlix offers a digital locker feature and is the only digital product that requires a login and password. The user creates the login and password. Scholastic Library Publishing, Inc does not have access to this information."
My comment: The "FreedomFlix" service not only sends user passwords unencrypted over HTTP, it sends them in a GET query string. This means that not only can anyone see the user passwords in transit, but log files will capture and save them for long-term perusal. Third-party sites will be sent the password in referrer headers. When used on a shared computer, subsequent users will easily see the passwords. "Scholastic Library Publishing" may not have access to user passwords, but everyone else will have them.

Cengage Learning
Response: "Passwords are stored in plain text."
My comment: Like FreedomFlix, the Gale Infotrac service from Cengage sends user passwords in the clear in a GET query string. But it asks the user to enter their library barcode in the password field, so users probably wouldn't be exposing their personal passwords.
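For anyone wondering why a GET query string is so much worse, here's a small sketch. The URL and the field names `user` and `password` are hypothetical; I have not copied them from any vendor's login form.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

LOGIN_URL = "http://example.org/login"  # hypothetical endpoint


def build_get_login(base_url: str, username: str, password: str) -> str:
    """Credentials go into the URL itself."""
    return base_url + "?" + urlencode({"user": username, "password": password})


def build_post_login(base_url: str, username: str, password: str):
    """Credentials go into the request body; the URL stays clean."""
    body = urlencode({"user": username, "password": password}).encode("ascii")
    return base_url, body


get_url = build_get_login(LOGIN_URL, "patron", "s3cret")
post_url, post_body = build_post_login(LOGIN_URL, "patron", "s3cret")

# The GET URL is what ends up in server logs, browser history and
# referrer headers, and the password can be read straight out of it.
print(get_url)
print(parse_qs(urlsplit(get_url).query)["password"])

# The POST URL carries no credentials.
print(post_url)
```

Moving the password into a POST body keeps it out of logs, history and referrers, but over plain HTTP it still travels in the clear; only HTTPS protects it in transit.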

So, to sum up, adoption of up-to-date security practices is far from complete in the world of library databases. I hope that the laggards have improved since the submission date of this RFP (roughly a year ago) or at least have plans in place to get with the program. I would welcome comments to this post that provide updates. Libraries themselves deserve a lot of the blame, because for the most part the vendors that serve them respond to their requirements and priorities.

I think libraries issuing RFPs for new systems and databases should include specific questions about security and privacy practices, and make sure that contracts properly assign liability for data breaches with the answers to these questions in mind.

Note: This post is based on information shared by concerned librarians on the LITA Patron Privacy Technologies Interest Group list. Join if you care about this.