The nuts and bolts of building digital libraries: an interview with Kyle Banerjee and Terry Reese, Jr.

The new second edition of Building Digital Libraries takes readers step by step through the conceptual and technical challenges of constructing a digital library. With several decades of experience between them, authors Kyle Banerjee and Terry Reese, Jr. are the ideal tour guides for navigating this complex landscape. Here they discuss the importance of knowing your community's priorities, the need to gather stakeholders for decision-making, and the difference between the Smithsonian Institution and a landfill (spoiler alert: it's metadata!).

About thirteen years have passed since the first edition of your book was published. What have been the biggest changes in the world of digital libraries since then, and how did they influence your approach to the new edition?

The biggest change is that the line between resources and the platforms they're used on is blurring and the resources themselves are linked to other things on the Internet. When we wrote the first edition, the things people were most interested in preserving were relatively simple static files. You could make things reasonably safe by storing them in certain formats, keeping track of certain information, and following certain procedures. Today, preservation takes on different forms and the conversation around what needs to be preserved and the information people want to work with has become more diverse and dynamic. For example, preservation can now move far beyond the digital items into the user experience itself. For resources like digital exhibits or many of the new and emerging publication methods, capturing the user experience and engagement is often as important as capturing the content itself. These kinds of dynamic resources are inherently problematic because preservation seeks to maintain things as they are, not as they shift and change in relation to the user’s experience. Transforming materials into archival formats also presents challenges because doing so can sometimes prevent the very use that preservation seeks to protect; for example, converting five-dimensional microscopy images into TIFFs renders them virtually useless. And this doesn’t even take into account the potential impact semantic information could have on the preservation of resources: the notion that information and relationships can be inferred through a larger network that may evolve or disappear over time. By far, the interconnected and dynamic aspects of resource creation (however you define a resource) have significantly redefined what it means to create a digital library. The fundamental problems we're trying to solve might be the same, but navigating them has become much more difficult.
We don’t have answers to many of these issues, but we hope the ideas we offer help people with the challenges they face.

In your introduction you write, “Understanding digital libraries is as much a matter of recognizing what you don’t need to know as it is about learning what you do need to know.” Would you elaborate?

Librarians need to focus on meeting needs that naturally fall in their domain. This means understanding the communities that they serve and where their engagement lies. Libraries naturally serve the broader community and the outstanding strength librarians have is structuring information so it can be used again. As such, libraries should focus on methods and tools that serve the needs of their broad communities. This means you need to understand where you add value and focus on that.  For most libraries, this will mean focusing on serving their communities broadly. For other special libraries, it might be giving up these kinds of generic services to specialize in areas that are unique to them.  As organizations, we simply cannot do everything, so finding partners and understanding your community’s priorities is vitally important.

With regard to getting started, one of the key questions you ask readers to consider is whether a “new” repository is needed at all. In your view, what stakeholders need to be part of that discussion?

The most critical stakeholders are those expected to use it, those expected to maintain it, and those who secure resources for it. The repository must serve a genuine need, and the library should serve needs only when it is the best entity to do so. For example, many communities already have repositories designed around the needs of specific types of materials which typically require specialized platforms and skillsets to use. A repository represents a huge commitment of money and staff; the career of a single FTE can be a multimillion-dollar investment. Libraries should always invest resources where they'll deliver the most good.

Some experts have described our society’s attitude towards digital preservation as a sort of crisis in slow motion. LIS professionals ought to know better, right? So why is digital preservation so often ignored? Do you think that’s changing?

The LIS community has been concerned with digital preservation for a long time, but the scope of the problem is much bigger and more complicated than most people realize. With physical items, you need space and space is hard to come by, so we curate items. This just doesn’t happen in digital environments. Once something is “preserved” it’s often not touched again because there isn’t an ongoing effort to assess content. This means that data is virtually invisible until it isn’t, and at that point, it’s too late. This isn’t a problem for just the library community to solve. We have a role, but one of the mistakes that we continually make as a community is this idea that we’ll develop this solution only within the library community. Instead, we should be engaging far beyond the library community to meet a need shared in many diverse spaces.

In your book you implore libraries to begin encoding their data now. Why is that undertaking so essential, and how do you envision the future of metadata?

If people can't find time to encode data now, that situation won’t be easier later when there is more to encode with fewer people while the context of the data fades with time. Metadata gives information context and value — you can’t use resources you can’t find. The main difference between the Smithsonian Institution and a landfill is that the former organizes relevant items into meaningful contexts and the latter is just a dumping ground for stuff. Without metadata, all we're doing is building digital landfills. It’s important to recognize that metadata doesn't necessarily need to be created manually. Rather, the important thing is that users have the access points they need.

Learn more at the ALA Store.