Modern civilization will not collapse from a lack of resources or a war--it will collapse from the crush of information. I am only half joking.
Information overload has become nearly intractable as content growth has accelerated. According to a study by IBM, this growth is attributable largely to rapid increases in the number of objects about which we capture information. Consequently, we are having more and more trouble finding information and seeing connections between related information. There is too much unorganized content. The result: inefficiency, ineffectiveness, and lost opportunity, amounting to millions of dollars annually in large enterprises.
In response, many companies are enlisting corporate librarians to put their digital house in order through content curation. The term content curation refers to content management activities that make content more accessible to the right people at the right time. (Some people, especially Web marketers, use content curation to describe the process of semi-manually selecting and organizing content for Web page presentation in response to the shifting interests of site visitors.)
Librarians are well suited to content curation because they possess skills related not only to labeling and organization but also, in their reference role, to understanding the information needs of their constituencies. In this article, I will discuss three key aspects of content curation that are driving organizations to look to librarians for help. These aspects are as follows:
1. Taxonomy and content tagging;
2. Governance; and
3. Content life cycle management.
Taxonomies and Metadata
Taxonomies and metadata drive information organization. Taxonomies, narrowly understood, refer to controlled vocabularies (authority lists) that are organized into meaningful groups of terms and frequently are structured into "is a" hierarchies. An example of an "is a" hierarchy is Vehicles/Cars/Makes/Models (a Camry is a Toyota is a car is a vehicle).
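To make the "is a" idea concrete, here is a minimal Python sketch of the vehicle example. The `BROADER` table and the `lineage` and `is_a` helpers are hypothetical names chosen for illustration; a real taxonomy tool would store far more than a parent pointer for each term.

```python
# Hypothetical "is a" hierarchy: each term maps to its broader (parent) term.
BROADER = {
    "Camry": "Toyota",
    "Toyota": "Cars",
    "Cars": "Vehicles",
    "Vehicles": None,  # top of the hierarchy
}


def lineage(term: str) -> list[str]:
    """Return the term followed by each broader term up to the root."""
    path = []
    current = term
    while current is not None:
        path.append(current)
        current = BROADER[current]
    return path


def is_a(term: str, ancestor: str) -> bool:
    """True if `ancestor` is the term itself or sits above it in the hierarchy."""
    return ancestor in lineage(term)


print(" > ".join(reversed(lineage("Camry"))))  # Vehicles > Cars > Toyota > Camry
```

Because every term knows its broader term, a search for "Vehicles" can be widened to include anything tagged "Camry" without anyone tagging the document twice.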
I like to define taxonomy programs more broadly. In my view, a taxonomy program is not simply a navigational hierarchy or folder structure; rather, it is a means of putting in place the blueprint for how information architectures are developed, managed, applied, and maintained throughout an organization. A key aspect of this process is the definition and application of metadata.
The term metadata refers to keywords associated with an attribute of a document or other type of digital object. For instance, author is an attribute of a white paper, while sentiment might be an attribute of a gift card.
When metadata use a controlled vocabulary, it is easier to find things. For instance, as a marketer in a greeting card company, I might want to find illustrations associated with a particular sentiment. This is easier to accomplish if my company tags each illustration with one or more "sentiment" words from an enterprise taxonomy.
It is also easier to find information that has attributes that make sense in a specific business environment. "Sentiment," for example, is probably not an attribute of engineering design documents. Librarians can play important roles identifying document types, appropriate attributes, and vocabularies to be used for tagging.
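The greeting card scenario can be sketched in a few lines of Python. The vocabulary terms, file names, and function names below are all invented for illustration. The point is only that tags are checked against a controlled list before they are stored, so a later search on a vocabulary term is reliable.

```python
# Hypothetical controlled vocabulary for the "sentiment" attribute.
SENTIMENT_VOCABULARY = {"sympathy", "congratulations", "gratitude", "romance"}

# Hypothetical catalog: illustration file name -> set of sentiment tags.
illustrations: dict[str, set[str]] = {}


def tag_illustration(name: str, sentiments: set[str]) -> None:
    """Store tags, rejecting any term outside the controlled vocabulary."""
    unknown = sentiments - SENTIMENT_VOCABULARY
    if unknown:
        raise ValueError(f"Not in the sentiment vocabulary: {sorted(unknown)}")
    illustrations[name] = set(sentiments)


def find_by_sentiment(sentiment: str) -> list[str]:
    """Return the names of illustrations tagged with the given sentiment."""
    return sorted(name for name, tags in illustrations.items() if sentiment in tags)


tag_illustration("balloons.png", {"congratulations"})
tag_illustration("roses.png", {"romance", "gratitude"})
tag_illustration("lilies.png", {"sympathy"})

print(find_by_sentiment("gratitude"))  # ['roses.png']
```

A free-text tag such as "thanks" would be rejected here, which is exactly what keeps "gratitude" and "thanks" from splitting the same set of illustrations into two piles.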
Moving into the world of taxonomy and tagging often requires learning new tools. Content curation involves using tools for the following tasks:
* Semantic analysis: determining what terms should be included in the taxonomy;
* Taxonomy management: maintaining the currency and organization of the taxonomy;
* Auto-categorization: automatically tagging documents with controlled vocabulary or discovered terms, based on business rules (which you may need to craft); and
* Indexing: indexing documents based on tags and text for more efficient search.
Taken together, these tools make it possible to implement the correct structures (e.g., metadata models and content objects) across the enterprise. They incorporate tagging mechanisms and content review processes to present an overall user experience that contributes to improved "findability." While these tools are powerful, keep in mind that they are only as good as the human-crafted taxonomies and business rules they use.
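As a rough illustration of how auto-categorization and indexing fit together, consider the Python sketch below. The business rules, trigger phrases, and document texts are made up, and commercial tools use far richer matching than a substring test. Treat it as a toy model of the idea, not a description of any particular product.

```python
import re
from collections import defaultdict

# Hypothetical business rules: controlled term -> trigger phrases.
RULES = {
    "Contracts": ["agreement", "terms and conditions"],
    "Product Safety": ["recall", "hazard"],
}


def auto_categorize(text: str) -> set[str]:
    """Assign every controlled term whose trigger phrase appears in the text."""
    lowered = text.lower()
    return {
        term
        for term, phrases in RULES.items()
        if any(phrase in lowered for phrase in phrases)
    }


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each tag ("tag:<term>") and each word to the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in auto_categorize(text):
            index[f"tag:{term}"].add(doc_id)
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index


documents = {
    "doc1": "Supplier agreement for packaging materials",
    "doc2": "Notice of recall: battery hazard",
}
index = build_index(documents)
print(sorted(index["tag:Contracts"]))  # ['doc1']
print(sorted(index["hazard"]))         # ['doc2']
```

Notice that the quality of the result rests entirely on the `RULES` table, which a person has to write and maintain.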
Governance
Developing taxonomies and metadata schema for documents can't be done in a vacuum or by working with a single business unit. The whole point is to create a structure that works across the organization. This means that processes are needed for obtaining appropriate input from different business units and making decisions that carry authority.
Governance refers to how decision rights are allocated and decisions are made. When governance processes are operationalized, there is executive sponsorship, clearly identified business objectives, and processes and mechanisms to enforce policies and procedures. There are also metrics and measurement programs that ensure accountability.
Most organizations have developed processes and governance around structured data. They may not have solved all of their information overload problems, but they have put mechanisms in place to address data management. These mechanisms, however, are often not fully applicable to unstructured information.
One of the challenges of working with unstructured information is that meaning and nuance are variable. Content is not clearly defined. Terminology is ambiguous. As a result, it is often more difficult to identify when to add new business vocabulary to a taxonomy than, say, when to add a new customer name or part.
There is a tendency to say, "Content management requires metadata, and metadata are technical. Let the IT Department handle it." The problem with this line of thinking is that not all metadata are the same. Data architects are better at dealing with unambiguous structured data; librarians are better at dealing with ambiguous data.
Nevertheless, we need business input and subject matter expertise when dealing with unstructured content and the terminology used to describe that content. Thus, metadata and taxonomy processes need to include representation from various stakeholders and subject matter experts. Librarians and other unstructured information management specialists need to be the arbiters and curators of the overall taxonomy.
Life Cycle Management
Life cycle management builds on both metadata and governance disciplines. It results in processes to reduce distraction from unnecessary content and ensures that content is managed appropriately, from initial authoring to final disposition. Overall life cycle management requires that organizations take control of content with good governance processes that emphasize ownership and accountability for reducing document duplication, maintaining document status, and optimizing document organization. Librarians can contribute to the definition, implementation, and monitoring of these highly distributed processes.
In many organizations, for example, up to 50 percent of content has little or no business value and does not need to be kept for regulatory or compliance reasons. How does such content accumulate? People and organizations are pack rats and hold on to everything. Drafts, interim deliverables, works in progress, and other incomplete documents are saved, along with multiple versions. Backup copies are made inadvertently or intentionally and never deleted. Working documents with no residual value junk up workspaces and file-sharing systems.
Reviewing content sources for so-called R.O.T. (redundant, outdated and trivial) content and deleting or archiving content that does not have business value will automatically improve search results and content findability. This should be done in cooperation with records managers to ensure compliance with records retention processes. Some organizations can eliminate as much as half of their content through a review process that removes duplicate and near-duplicate information as well as content that is not needed for business or compliance purposes. Conducting a content audit will reveal much about the nature of enterprise content and identify owners of information to participate in a clean-up process.
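A first pass at finding duplicate and near-duplicate documents might look like the following Python sketch. The file names, sample text, and the 0.85 similarity threshold are assumptions for illustration. Comparing every pair of documents is fine for a small sample, but a real audit over a large repository would need a more scalable technique.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def find_redundant(documents: dict[str, str], threshold: float = 0.85):
    """Return (exact_duplicates, near_duplicates) as lists of name pairs."""
    exact, near = [], []
    for (name_a, text_a), (name_b, text_b) in combinations(documents.items(), 2):
        a, b = normalize(text_a), normalize(text_b)
        if a == b:
            exact.append((name_a, name_b))
        elif SequenceMatcher(None, a, b).ratio() >= threshold:
            near.append((name_a, name_b))
    return exact, near


# Hypothetical sample of a shared folder.
documents = {
    "plan_v1.doc": "Q3 marketing plan: launch the spring card line in March.",
    "plan_v1 copy.doc": "Q3 marketing plan:  launch the spring card line in March.",
    "plan_v2.doc": "Q3 marketing plan: launch the spring card line in April.",
    "memo.doc": "Cafeteria hours change next week.",
}

exact, near = find_redundant(documents)
print(exact)  # pairs that match after normalization
print(near)   # pairs that are similar but not identical
```

The output of such a pass is a list of candidates for the content owner and the records manager to review. It should not be treated as a list of files to delete.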
Content audits are easier to perform when metadata and governance policies and procedures are in place. Organizations that have processes for tagging documents as "final" or "draft" are ahead of the game, as are those that tag their documents to enable automated application of retention schedules and disposition policies.
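Here is a minimal Python sketch of what "automated application of retention schedules" could mean once documents carry type and status tags. The document types and retention periods are placeholders, not recommendations. Actual schedules come from records managers and legal counsel.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention schedule: (document type, status) -> days to keep.
RETENTION_DAYS = {
    ("contract", "final"): 7 * 365,
    ("contract", "draft"): 90,
    ("meeting notes", "final"): 365,
    ("meeting notes", "draft"): 30,
}


@dataclass
class Document:
    name: str
    doc_type: str
    status: str  # "draft" or "final"
    last_modified: date


def disposition(doc: Document, today: date) -> str:
    """Return "retain" or "dispose", or "review" when no rule covers the tags."""
    days = RETENTION_DAYS.get((doc.doc_type, doc.status))
    if days is None:
        return "review"
    expires = doc.last_modified + timedelta(days=days)
    return "dispose" if today >= expires else "retain"


today = date(2024, 1, 1)
draft = Document("supplier_draft.doc", "contract", "draft", date(2023, 6, 1))
final = Document("supplier_signed.pdf", "contract", "final", date(2023, 6, 1))
photo = Document("team_photo.jpg", "photo", "final", date(2023, 6, 1))

print(disposition(draft, today))  # dispose
print(disposition(final, today))  # retain
print(disposition(photo, today))  # review
```

The untagged photo falls through to "review," which shows why tagging matters: a policy cannot be applied automatically to content that no rule can recognize.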
Another example of life cycle management is tagging documents when they are initially authored. The author has the best handle on content, but he or she may not have an enterprise perspective on how the content will be used. One approach is to identify which metadata fields are populated by authors and which are populated by content curators (either manually or through auto-categorization).
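One way to record that split is a simple table of field ownership, sketched here in Python. The field names, the two roles, and the sample record are assumptions for illustration only.

```python
# Hypothetical metadata model: field name -> who populates it.
FIELD_OWNERS = {
    "title": "author",
    "author": "author",
    "status": "author",
    "document_type": "curator",
    "subject_terms": "curator",
}


def missing_fields(metadata: dict[str, str], role: str) -> list[str]:
    """List the fields assigned to `role` that are still empty or absent."""
    return [
        field
        for field, owner in FIELD_OWNERS.items()
        if owner == role and not metadata.get(field)
    ]


record = {"title": "Spring line brief", "author": "J. Rivera", "status": "draft"}
print(missing_fields(record, "author"))   # []
print(missing_fields(record, "curator"))  # ['document_type', 'subject_terms']
```

The author is asked only for what he or she knows best, and the remaining fields form a work queue for a curator or an auto-categorization step.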
Perhaps the biggest challenge lies in organizing content for business areas that generate and share content among knowledge workers involved in collaborative projects. The nature of knowledge work is that people collaborate, solve problems, send e-mail messages, share documents, make calls, get together, and have conversations. When those chaotic processes are brought online, there is inherently less structure.
Organizing and tagging content can enhance structure, but employees are often so overloaded with work that they do not feel they have time to spend on anything that is "off task" or takes extra effort. This leads to a cycle of not organizing information, which makes it more difficult for people to do their jobs and leads to rework and inefficiencies, which prevent people from having the time to organize their information.
Content curators can be extremely useful under these conditions. They can provide project teams or departments with useful content organization approaches (information architecture) or participate in document management roles to ensure that appropriate version control, tagging, and location guidelines are applied.
A Process, Not an Outcome
Although information overload is bringing us to the brink of chaos, all hope is not lost. Using the approaches discussed in this article can improve the outcomes of content management projects. What is most important is to develop processes to define and apply content organizing principles.
Notice I did not say "develop a taxonomy." This is a process, not an outcome. Taxonomies and all of their related deliverables--content models, metadata schemas, tagging processes, wireframes, search mechanisms, and training--are part of an ongoing program, not a project with a defined start and finish. When people ask me when their taxonomy will be done, I tell them it will be finished when sales are finished ... when manufacturing is finished ... when finance and operations are finished.
One of the most important motivations for getting your content house in order today is to prepare for tomorrow's business needs and capabilities. Many industries are being transformed in ways not anticipated even a few years ago. The media and entertainment, publishing, education, health care, and other sectors are being forced to monetize and organize content in new ways. If foundational capabilities for content management are not in place, some enterprises will be left behind their competitors. The organization that can get its internal content in order and speed its "information metabolism" will be the organization that survives and thrives.
SETH EARLEY is chief executive officer of Earley & Associates, an information management consulting firm in Massachusetts. He can be reached at firstname.lastname@example.org.