The Knowledge Management Connection   

   Communication is the common thread of knowledge management.
 

Faceted Classification of Information

NOTE: As of December, 2007, this web site will no longer be updated.

Please go to Phil Murray's The Semantic Advantage web site or his Semantic Advantage blog for up-to-date information and opinion from Phil Murray.

 


 
 

 

 

Given the significant difficulties in categorizing books, papers, and articles using traditional library classification techniques, it would seem next to impossible for humans to classify the small chunks of rapidly changing information that characterize information-intensive business environments. But it’s not. Library and information science professionals have already provided the foundations of an alternative to traditional classification techniques: faceted classification.


Defining faceted classification

Wynar describes faceted classification as follows:

A faceted classification differs from a traditional one in that it does not assign fixed slots to subjects in sequence, but uses clearly defined, mutually exclusive, and collectively exhaustive aspects, properties, or characteristics of a class or specific subject. Such aspects, properties, or characteristics are called facets of a class or subject, a term introduced into classification theory and given this new meaning by the Indian librarian and classificationist S.R. Ranganathan and first used in his Colon Classification in the early 1930s.

Wynar, Bohdan S. Introduction to cataloging and classification. 8th edition. p. 320

(NOTE: In my opinion, Wynar's insistence on exhaustiveness is not a requirement in the business environment ... and perhaps not for any kind of faceted classification.  See below.)

Additionally,

Thus, a faceted structure  relieves a classification scheme from the procrustean bed of rigid hierarchical and excessively enumerative subdivision that resulted in the assignment of fixed "pigeonholes" for subjects that happened to be known or were foreseen when a system was designed but often left no room for future developments and made no provision for the expression of complex relationships and their subsequent retrieval.

Wynar, p. 321

And further,

... individual facets can be accessed and retrieved either alone or in any desired combination. This feature is especially important for computerized retrieval, which has been successfully applied to faceted classification, and in online retrieval as a complement to verbal retrieval by subject headings or keywords.

. . .

Since the 1960s, all major classification schemes (with the exception of LCC) either have been partially restructured on a faceted basis or display a fully faceted structure.

Wynar, p. 322

NOTE: This edition of Wynar's book was published in 1992.

IMPORTANT: The model for faceted classification proposed by KMconnection for use in categorizing organizational knowledge resources does not rigorously follow library science definitions of faceted classification or implementations of those definitions. We in the knowledge management community are not all library scientists. The author of this article came to these principles without any familiarity with the term faceted classification and without knowledge of the literature or practices of faceted classification in the library science community.

We are simply using the library science definition of faceted classification as our starting point, and readers should not interpret the KM Connection model and framework as a canonical form of faceted classification ... if there is such a thing. The SCM is both more and less than that. In addition, faceted classification by itself has specific weaknesses: it gives poor support for quickly grasping the scope of the domain and for getting to popular topics. (See Conventional Categorization -- a Complement to Faceted Categorization.)

Faceted classification in library science and other formal classification systems

Although Ranganathan applied faceted classification principles in a formal way to classifying and retrieving information, he didn't "invent" the method.  The Dewey Decimal System shows early evidence of faceted approaches, and as early as 1992, Wynar noted the growing usefulness and applications of faceted classification outside the library environment:

Special faceted classifications have been designed for broad fields such as education or business management, as well as for more specialized ones such as occupational safety, the diamond industry, library and information science, and many others.

Wynar, p. 322

You can even see some characteristics of faceted classification in Yahoo! and similar portals. The top-level categories are popularity-based groupings of topics, within which “hot” subcategories (like MP3 and Open software lately) are hoisted to the top. But the primary listings for Web sites can be found under loosely faceted hierarchies (for example, Software), and cross-references point to those primary listings in the facet hierarchies. (For more about faceted knowledge classification applied to software reusability, see R. Prieto-Diaz and P. Freeman. "Classifying software for reusability." IEEE Software, 4(1), Jan. 1987.)

However, Yahoo! is about finding companies (or other organizations) and their Web pages. Organizational knowledgebases have a finer granularity and, ideally, serve as something more than a repository for documents. And you can’t hire hundreds of classification specialists for those classification requirements. The good news is that you don’t need to, because faceted knowledge classification can be implemented as a simple and sufficient way of meeting the requirements of creating and managing shared knowledgebases.

Not just a library science technique

Faceted classification isn't an artificial system created by the library science community. It is a formalization of a communication technique we use in a wide variety of circumstances. We can see faceted approaches in technology and in everyday life: from “personal information managers” (especially in PIMs like ECCO, InfoCentral, and Lotus' Agenda), to customer-support technology (for example, Primus’ SolutionBuilder and the Solution-Centered Support standard), to the parlor game “Twenty Questions.”

It is unlikely that the designers of most technologies that employ a faceted model of knowledge organization were aware of the formal library-science technique of faceted classification. For example, KK Aw was completely unaware of the formal name for faceted classification when he created his MultiCentrix product.  It’s just a “natural” technique for categorizing and finding fine-grained, rapidly changing information.

We can also find evidence of faceted approaches in other disciplines. Most classification experts and professional indexers think of back-of-the-book indexes as very different from classification schema, but a close look at good indexes shows that they exhibit many of the characteristics associated with faceted classification. The two approaches can be reconciled, at least partially.

Some differences between faceted classification and "traditional" library science classification

In traditional library classification schemas like the Dewey Decimal System and Library of Congress, each document has a "correct" (or, at least, agreed upon) place somewhere in a single, large, hierarchically organized classification system ... and in the case of books in a physical library, one "correct" place in the stacks. Of course, cross-referencing among the terms in classification systems helps information seekers.

By contrast, a faceted classification system has the following important characteristics:

  • A faceted system focuses on the important, essential, or persistent characteristics of content objects, which makes it useful for fine-grained, rapidly changing repositories.
  • You don't have to know the name of the category (or categories) into which a document is placed. In a business world in which terminology changes faster than you can blink, this is a big asset.
  • The absence of polyhierarchy is implied, at least, by having mutually orthogonal facet hierarchies. The ordering principle in a facet is not necessarily hierarchical (general-specific, whole-part, etc.), although that will be true in most cases. It might even be alphabetical.
  • It's easy to add a new facet at any time.
  • A faceted schema is flexible in general. It makes few assumptions about the scope and organization of the domain, so it's hard to "break" a faceted classification schema.
  • The facet hierarchies should be easier to construct than a single, large hierarchy.
  • Combining elements from separate facets using a defined syntax -- for example, to express the functions of a product, make assertions, or frame questions in a structured way -- is an extremely powerful method of precise retrieval. (This topic will be discussed in an upcoming article.)
  • Adding persistent (even "typed") relationships between elements in different facets -- for example, John Smith (in the Person facet) <relationship: is an employee of> Generic Company (Organization facet) -- provides a substantial, useful representation of knowledge in the faceted classification schema itself.
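
As a rough illustration of the last characteristic, typed relationships between elements in different facets could be stored as simple records. This is a minimal sketch in Python, not part of the KM Connection model: the `FacetElement` and `Relationship` classes and the `related` helper are hypothetical names, and the data reuses the John Smith example from the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FacetElement:
    """One element in one facet, e.g. 'John Smith' in the Person facet."""
    facet: str
    name: str


@dataclass(frozen=True)
class Relationship:
    """A typed, persistent link between two facet elements."""
    source: FacetElement
    kind: str
    target: FacetElement


def related(relationships, element, kind):
    """Return the targets linked from `element` by relationships of type `kind`."""
    return [r.target for r in relationships
            if r.source == element and r.kind == kind]


# Sample data taken from the example in the text.
john = FacetElement("Person", "John Smith")
generic = FacetElement("Organization", "Generic Company")
relationships = [Relationship(john, "is an employee of", generic)]

employers = related(relationships, john, "is an employee of")
```

Because each relationship names its type, the same store can answer questions in either facet (who works for an organization, which organizations a person belongs to) without changing the facet hierarchies themselves.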

Comments on my interpretations of the differences are most welcome. I'm not a library scientist.

Why faceted classification is appropriate for managing organizational knowledge

One of the primary benefits of faceted classification is that even if you don’t know the name of an object, you can achieve a very accurate shared understanding of what it is by describing it in terms of several mutually exclusive categories of information. If you’re trying to describe a refrigerator, for example, you can convey what it is very accurately by specifying its size, the substance it is usually made of, its color, its typical location in a house, and its primary function.

Especially in the computer-based retrieval environment, the set of facets does not have to be exhaustive, either during classification or retrieval by information seekers. Creators of a shared knowledgebase can add a new facet at any time, and users can select elements from as few or as many facets as suits them.
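
The refrigerator example can be sketched as a small lookup in which the information seeker supplies values for whichever facets they happen to know. This is an illustrative sketch only; the `find` function, the facet names, and the sample objects are assumptions made for the example.

```python
def find(items, **selected):
    """Return names of items whose facet values match every selected facet.

    Facets that the caller does not mention are ignored, so a query may
    use as few or as many facets as the information seeker likes.
    """
    return [
        name for name, facets in items.items()
        if all(facets.get(facet) == value for facet, value in selected.items())
    ]


# Hypothetical facet values for three household objects.
items = {
    "refrigerator": {"size": "large", "location": "kitchen", "function": "cooling food"},
    "toaster": {"size": "small", "location": "kitchen", "function": "heating food"},
    "wardrobe": {"size": "large", "location": "bedroom", "function": "storing clothes"},
}

in_kitchen = find(items, location="kitchen")
large_in_kitchen = find(items, location="kitchen", size="large")
```

Adding a new facet later (say, color) only means adding a key to the items it applies to; existing queries keep working because they never mention the new facet.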

In rapidly changing high-tech businesses, mutually exclusive categories (or facets) might include:

  • Products -- A hierarchical description of each of the organization's products (almost an “ontology”), from the product as a whole down to individual commands or, in a GUI-based product, a field or dialog box.
  • Applications -- The real-world uses to which a product or its features apply.
  • Organizations -- Businesses and other groups, which would include a company's customers and prospects.
  • People -- Persons both within and outside the organization, potentially organized in several ways and associated with elements in other facets: organizations, events, publications, etc.
  • Domain objects -- For example, the technologies applied in the marketplace in which the organization participates.
  • Events -- Not just conferences, but anything that is primarily time-based.
  • Publications -- Documents, Web pages, etc.

Units of information can be associated with elements from many different facet hierarchies. Conversely, information-seekers will often want to traverse the hierarchies themselves, so they can see which units of information are associated with a particular element -- for example, a particular command or interface object used in a particular product.
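
One way to picture this two-way association is a facet hierarchy stored as child-to-parent links, plus a set of facet elements attached to each unit of information. The sketch below is an assumption-laden illustration: the product name `WidgetPro`, the unit ids, and the helper functions are all hypothetical.

```python
# Hypothetical Products facet: each element maps to its parent (None = root).
parent = {
    "WidgetPro": None,
    "WidgetPro/Export dialog": "WidgetPro",
    "WidgetPro/Export dialog/Format field": "WidgetPro/Export dialog",
    "WidgetPro/Print command": "WidgetPro",
}

# Units of information, each associated with elements from one or more facets.
units = {
    "note-1": {"WidgetPro/Export dialog/Format field", "Generic Company"},
    "note-2": {"WidgetPro/Print command"},
    "note-3": {"WidgetPro"},
}


def descendants(parent, element):
    """Return `element` plus every element below it in the hierarchy."""
    found = {element}
    changed = True
    while changed:
        changed = False
        for child, up in parent.items():
            if up in found and child not in found:
                found.add(child)
                changed = True
    return found


def units_under(units, parent, element):
    """Return sorted ids of units associated with `element` or anything below it."""
    scope = descendants(parent, element)
    return sorted(uid for uid, tags in units.items() if tags & scope)


export_units = units_under(units, parent, "WidgetPro/Export dialog")
all_product_units = units_under(units, parent, "WidgetPro")
```

Treating a selection as "this element or anything beneath it" is a design choice made for this sketch; a system could equally return only the units attached directly to the selected element.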

New facet hierarchies can be added at any time, as needed. They simply represent a perspective of interest to at least one audience. In fact, because facets do not have a strong historical/cultural bias in long-standing domains of knowledge, they easily accommodate many different perspectives.

Classification in business organizations vs. classification in libraries

The KM Connection model of knowledge organization is different from the library science perspective on faceted classification. In the KM Connection model, the emphasis is on getting the right answers quickly in a rapidly changing environment.

  • Systematization, exhaustiveness, and "purity" are far less important than immediate effectiveness.
  • Feedback and patterns of use are more vital to evaluating the effectiveness of the schema than a priori design assumptions. Gathering feedback from users of large-scale library classification systems and running it through an approval process is extremely difficult and time-consuming. In the closed, networked environment of the organization, leveraging such feedback is possible and highly desirable.
  • The domain (organization) whose knowledge is being organized may not have existed until last week.
  • Meeting business objectives is the driving force -- from overall business mission to performing specific tasks -- not scholarly research.
  • Information-seekers are looking for answers; they need a high level of precision with rapid response. They're not looking for "documents."
  • The need for rapid, precise access to important information trumps the requirement for depth and breadth of the resource.
  • From the perspective of syntactic queries, the focus is on the functions of artifacts and services -- not all possible ways of posing a query.
  • People are indexed, too.

Tools for supporting faceted classification

Until recently, there were no tools designed to support faceted classification. The good news is that tools and standards for creating faceted access are on the way.

Travis Wilson is creating and testing FacetMap. See http://facetmap.com. He writes:

My project, http://facetmap.com, supports faceted queries with an arbitrary number of facets, and perhaps that's what you were looking for. The results are generated at query time and reflect true faceted classification. There is a demo on the site (or you can create a demo there by uploading your own faceted data). The interface is such that you construct the query as you go, but the cumulative query is run for every page. I would be interested to know whether my system meets your needs.

I'm also working on the release of a standalone tool that implements the system. Of course, formal standards for formatting the facet structures would be useful, so I would love to hear about any progress in that area.

Try his online demo and give him some feedback.
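
The interaction Wilson describes, in which the query is built up one selection at a time and the cumulative query is re-run for every page, might look roughly like the following. This is not FacetMap's code or API; it is a minimal sketch with hypothetical records, facet names, and function names.

```python
from collections import Counter

# Hypothetical records, each with one value per facet.
records = [
    {"id": 1, "product": "WidgetPro", "type": "FAQ", "audience": "customer"},
    {"id": 2, "product": "WidgetPro", "type": "manual", "audience": "customer"},
    {"id": 3, "product": "WidgetLite", "type": "FAQ", "audience": "support"},
    {"id": 4, "product": "WidgetPro", "type": "FAQ", "audience": "support"},
]


def run_query(records, query):
    """Re-run the whole cumulative query and return the matching records."""
    return [r for r in records
            if all(r.get(facet) == value for facet, value in query.items())]


def remaining_choices(matches, query):
    """Count values in facets not yet selected, to offer as the next refinements."""
    counts = {}
    for record in matches:
        for facet, value in record.items():
            if facet != "id" and facet not in query:
                counts.setdefault(facet, Counter())[value] += 1
    return counts


query = {}
query["product"] = "WidgetPro"   # first selection
step1 = run_query(records, query)
choices = remaining_choices(step1, query)
query["type"] = "FAQ"            # second selection narrows the same query
step2 = run_query(records, query)
```

Showing counts for the facets that have not been selected yet lets the user see where each further selection would lead before making it.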

Also take a look at the Flamenco Search System project (http://bailando.sims.berkeley.edu/flamenco-interface.html).

MultiCentrix, which we use for the KMconnection Knowledge Management Product Guide, does support faceted knowledge organization. You can also construct a faceted knowledge access system and generate a Web site that reflects principles of faceted classification -- for example, using such low-cost thesaurus construction tools as MultiTes. However, neither product is specifically designed to support faceted classification.

Exchange standards for faceted metadata

XFML (the eXchangable Faceted Metadata Language) is described as "an open XML format for publishing and connecting faceted metadata between websites." See http://xfml.org.

XFML is based on the Topic Map standard, but uses only a limited subset of the data representation capabilities provided by the Topic Map standard.

I have not yet evaluated this emerging standard.

Limitations of faceted classification

Faceted classification does have its limitations, especially at this time. For example, most information seekers choose the simplest methods of querying possible, even when they can substantially increase the effectiveness of retrieval by using Boolean queries. (It doesn't help that there are so many different Web site interfaces for such "advanced" search operations.) Most people don't want to be bothered with learning new systems, and advanced systems of classification -- faceted and otherwise -- certainly seem painfully abstract and artificial to most people.

In addition:

  • Although there are formal standards for construction of thesauri, to the best of my knowledge there are no formal standards for construction of faceted thesauri. (However, an exchange format for faceted metadata is emerging. See Exchange standards for faceted metadata.)
  • Faceted classification schemas don't respond well to the requirement for easy grasp of the scope of the knowledgebase or quick access to popular topics. (See Popularity-based categorization.)

One of the primary purposes of this Web site is to promote the development of low-cost tools that overcome these constraints. I believe that most of these difficulties can be eliminated easily.

Related information


 

The impact of “managing knowledge” must be more than measurable; it must be predictable.

   


 

Interested in faceted classification of information? Take a look at the Faceted Classification Discussion (FCD) mailing list.