Using a Thesaurus to Categorize and Classify Your Data

Integrate a thesaurus as a controlled vocabulary to organize knowledge for later retrieval.

Managing data is like herding cats: it is very difficult to control, coordinate, and organize. Fortunately, there are some mechanisms to assist with the data part. For cats, sorry, you're on your own.

One way to help organize data is to use thesauri (the plural of thesaurus). Without getting too esoteric or detailed, the simplest definition of a thesaurus is "a controlled vocabulary used to organize knowledge for subsequent retrieval." Emphasis on "controlled." Essentially, this means that concepts or keywords must adhere to a predefined set of words (i.e., a vocabulary). The problem that thesauri solve is unorganized, on-the-fly, or ad hoc categorization and classification. One individual might apply one keyword to a topic, while another individual applies a different keyword to the same topic. Over time, this creates an unorganized data set, which makes it very difficult to understand and search through your data. The primary benefit of using thesauri is adherence to standards across a field of knowledge.

A great example of a controlled vocabulary (i.e. thesaurus) is the Getty Art & Architecture Thesaurus (AAT)®. This thesaurus is used by librarians, museums, archivists and catalogers to describe items of art and architecture. The thesaurus can be found here: https://www.getty.edu/research/tools/vocabularies/aat/

To illustrate a use of the Getty Art & Architecture Thesaurus, consider a researcher who is studying how ceramics were used throughout history. Ceramics are generally made by taking mixtures of clay and water and shaping them into desired forms, such as pots or statues. The researcher may want to describe the type of clay used for a ceramic item and limit those descriptions to a standard vocabulary. Here is an abbreviated thesaurus record from AAT for ceramic materials, specifically "clay."


Materials Facet
… Materials (hierarchy name)
…… materials (substances)
……… <materials by composition>
………… inorganic material
…………… clay
……………… <clay by composition or origin>
………………… gault
………………… kaolin
………………… ball clay
………………… native clay
………………… pipe clay
………………… terre de Lorraine


The controlled vocabulary in this case is a standard set of data values to describe clay materials (e.g. gault, kaolin, ball clay, native clay, pipe clay, terre de Lorraine). This is a simple example, but it illustrates the value of standards and consistency.
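
To make the idea of "controlled" concrete, here is a minimal Python sketch that accepts a keyword only if it belongs to the clay vocabulary listed above. The names CLAY_TERMS and normalize_term are hypothetical and chosen for illustration; they are not part of AAT or of any particular product.

```python
# Controlled vocabulary for clay types, taken from the AAT excerpt above.
CLAY_TERMS = {
    "gault",
    "kaolin",
    "ball clay",
    "native clay",
    "pipe clay",
    "terre de Lorraine",
}

# Lookup keyed by a normalized form, so that "Kaolin " still matches "kaolin".
_NORMALIZED = {term.casefold(): term for term in CLAY_TERMS}


def normalize_term(raw: str) -> str:
    """Return the canonical vocabulary term, or raise ValueError if unknown."""
    # Collapse runs of whitespace and ignore letter case before the lookup.
    key = " ".join(raw.split()).casefold()
    if key not in _NORMALIZED:
        raise ValueError(f"{raw!r} is not in the controlled vocabulary")
    return _NORMALIZED[key]
```

Rejecting unknown keywords at entry time, rather than cleaning them up later, is what keeps two people from describing the same material with two different words.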

Thesauri typically also show relationships between concepts, in the form of related concepts, broader concepts, or narrower concepts. These relationships are formulated as hierarchical trees, with a main term as the heading in the tree structure.
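
As a rough illustration of broader and narrower concepts, the sketch below stores part of the clay hierarchy as a mapping from each term to its broader term. It is a simplification made for this example: the angle-bracket guide terms from the AAT record are omitted, and the names BROADER, broader_path, and narrower_terms are hypothetical.

```python
# Each term maps to its broader (parent) term; None marks the top of the tree.
# Simplified from the AAT excerpt: guide terms in angle brackets are left out.
BROADER = {
    "materials (substances)": None,
    "inorganic material": "materials (substances)",
    "clay": "inorganic material",
    "gault": "clay",
    "kaolin": "clay",
    "ball clay": "clay",
}


def broader_path(term: str) -> list[str]:
    """List the broader concepts of a term, nearest first."""
    path = []
    current = BROADER[term]
    while current is not None:
        path.append(current)
        current = BROADER[current]
    return path


def narrower_terms(term: str) -> list[str]:
    """List the terms directly beneath the given term, sorted alphabetically."""
    return sorted(t for t, parent in BROADER.items() if parent == term)
```

With this data, broader_path("kaolin") walks up through "clay" and "inorganic material" to "materials (substances)", and narrower_terms("clay") returns the three clay types stored beneath it.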

eyebase includes a "Thesaurus Editor" that can import a thesaurus or create a new one. The thesaurus is then used in drop-down lists to facilitate the entry of thesaurus terms for a specific record or group of records. The depth of hierarchies and the number of terms are technically unlimited. Thesauri can be exported as CSV files, and any thesaurus can be displayed in a different language (multilingual).
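
The column layout of an eyebase CSV export is not described here, so the following Python sketch assumes a hypothetical two-column layout (term, broader_term) purely to show how a hierarchy can survive a round trip through a flat CSV file. The function names to_csv and from_csv and the sample rows are likewise assumptions.

```python
import csv
import io

# Sample rows as (term, broader_term); an empty string marks a top-level term.
ROWS = [("clay", ""), ("kaolin", "clay"), ("gault", "clay")]


def to_csv(rows):
    """Serialize (term, broader_term) pairs to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["term", "broader_term"])  # hypothetical column names
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text in the same layout back into (term, broader_term) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["term"], row["broader_term"]) for row in reader]
```

Because every row names its broader term, a hierarchy of any depth fits in this flat layout, and the tree can be rebuilt after import.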