Semantic Search and the Semantic Web

Introduction

While Semantic Web and Semantic Search are not the same thing, the two concepts are often confused. This lesson will briefly define Semantic Web and Semantic Search and then explain how the two may be used together.

Objectives

After completing this lesson, you will know:

What is meant by the term Semantic Search.
What is meant by the term Semantic Web.
How and why these two technologies might be effectively combined to perform complicated tasks and solve difficult problems.

Today’s Lesson

The fact that these two families of technologies share the word semantic has led to some confusion about the difference between them. According to Merriam-Webster, semantic means “of or related to meaning.” Both of these kinds of technologies attempt to retrieve and present information based on its meaning rather than on its structure or intended usage, as more traditional technologies do. Although they are related, the two technologies in fact solve different problems.

In brief, Semantic Search is useful for searching on a single type of data in a single domain, whereas Semantic Web technologies are useful for querying across many types of related information. Consider a few examples of each kind of technology.

Semantic Search

We are all familiar with standard, Google-style search, which is often referred to as keyword search. In keyword searching, you enter some text, and the search engine returns documents containing the text that you entered, typically ranked according to a relevancy calculation.
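To make the idea concrete, here is a minimal sketch of keyword search in Python. It is not how Google or any other engine works; the scoring rule (count how often the query terms occur) and the sample documents are assumptions chosen only for illustration.

```python
from collections import Counter

PUNCTUATION = ".,!?;:\"'()"

def tokenize(text):
    """Lowercase the text and split it into words without surrounding punctuation."""
    words = (w.strip(PUNCTUATION) for w in text.lower().split())
    return [w for w in words if w]

def keyword_search(query, documents):
    """Return (doc_id, score) pairs for documents that contain every query term."""
    terms = tokenize(query)
    results = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        if terms and all(counts[t] > 0 for t in terms):
            # Assumed relevancy score: total occurrences of the query terms.
            results.append((doc_id, sum(counts[t] for t in terms)))
    # Highest score first; ties are broken by doc_id so the order is stable.
    return sorted(results, key=lambda r: (-r[1], r[0]))

# Made-up documents for illustration only.
docs = {
    "a": "Jaguar dealership opens downtown. Jaguar sales rise.",
    "b": "The jaguar is a large cat.",
    "c": "Boston has many restaurants.",
}

print(keyword_search("jaguar", docs))  # [('a', 2), ('b', 1)]
```

Notice that the function has no idea whether "jaguar" means a car or a cat. It simply matches text, which is exactly the limitation discussed next.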

Although Google generally does a good job in ranking web pages, most of us know that this kind of search completely fails in other contexts. For example, searching your own computer for a document by relying on keywords can be very frustrating—not to mention searching a data store the size of your corporate intranet!

In such cases, you will not succeed unless you know exactly what you are looking for. This shortfall is not the fault of the technology itself; relevancy ranking without something like PageRank is hard. In fact, Google’s early success in large part stems from its discovery of an effective ranking strategy for the web.

This is where Semantic Search comes in. Rather than blindly returning anything that contains the text you typed into the search bar, Semantic Search takes into account the context of your search as well as the underlying meaning of the documents to be searched.

As a simple example, let’s say you go to Google and search for “jaguar”, as you see below.

However, what if you were searching for the jaguar, the big cat? Or Jaguar, the Mac OS X 10.2 operating system? Or Jaguar, the Atari video game console? Even on Google, straightforward keyword searching does not take into account the context of your search, nor does it understand the meaning of the documents.

In an attempt to do a better job, Semantic Search technologies employ various methods (NLP, statistical modeling, etc.) to categorize and/or cluster related documents to ease searching.
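As a rough illustration of categorizing documents by meaning, the sketch below assigns each document to one sense of "jaguar" by counting overlapping cue words, then uses the searcher's context to filter the results. Real Semantic Search systems rely on NLP and statistical models rather than hand-written word lists; the cue words, the function names, and the sample documents here are all assumptions.

```python
# Hypothetical cue words for three senses of "jaguar".
# A real system would learn such associations from data.
SENSE_CUES = {
    "animal": {"cat", "feline", "predator", "jungle", "prey"},
    "operating system": {"mac", "os", "software", "apple", "install"},
    "game console": {"atari", "console", "games", "cartridge"},
}

def categorize(text):
    """Return the sense whose cue words overlap the text most, or None."""
    words = {w.strip(".,") for w in text.lower().split()}
    overlaps = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(overlaps, key=overlaps.get)
    return best if overlaps[best] > 0 else None

def semantic_search(query, context, documents):
    """Keyword match, narrowed to documents in the same category as the context."""
    wanted = categorize(context)
    hits = []
    for doc_id, text in documents.items():
        if query.lower() not in text.lower():
            continue
        # With no usable context, fall back to plain keyword matching.
        if wanted is None or categorize(text) == wanted:
            hits.append(doc_id)
    return hits

# Made-up documents for illustration only.
docs = {
    "d1": "The jaguar is a feline predator that stalks prey in the jungle.",
    "d2": "Jaguar was the name of version 10.2 of the Mac OS software from Apple.",
    "d3": "The Atari Jaguar console shipped with games on cartridge.",
}

print(semantic_search("jaguar", "which games run on the atari console", docs))  # ['d3']
```

The same query, "jaguar", returns different documents depending on the context supplied, which is the behavior that plain keyword search cannot offer.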

Google and Bing have both already been doing this on the Web for specific topics. For example, if you go to Google and search for “Boston restaurants”, as you see below, Google will not simply return documents with the words Boston restaurants in them; instead, Google will show you some restaurants.

Google not only returns a list of restaurant names, but also provides addresses, a map, phone numbers, and even reviews, which is far more useful than a simple list of documents. From the context of your search, the algorithm makes an educated guess that you are not looking for information about the “history of restaurants in Boston” or a random blog post arguing that “Boston restaurants offer too much seafood.” Google guesses that you’re looking for a place to eat and tries to be helpful.

This is the essence of Semantic Search: go beyond keyword search—taking into account both user context and assumptions about the underlying meaning of data—in an effort to return more relevant results and present them in a more appropriate way, always aiming for greater success in information-seeking endeavors.

Semantic Web

The Semantic Web is a set of technologies for representing, storing, and querying information. Although these technologies can be used to store textual data—such as text in a Word document or PDF file—they typically are used to store smaller bits of data. Thus, while Semantic Search focuses largely on textual information, the Semantic Web also includes numbers, dates, figures, and other data in addition to text.

For example, if you wanted to represent a collection of information about the White House using RDF (the data model for the Semantic Web), you would not store the entire page as a single value in the Semantic Web, as you would on your computer or on a single web page. Instead, you would store the Address, Date Constructed, Designer, Current Resident, etc., as specific factual values.
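The idea of storing individual facts can be sketched with plain Python tuples. RDF represents each fact as a subject, predicate, object triple; the identifiers below (ex:WhiteHouse, ex:address, and so on) are hypothetical names invented for this example, not terms from a published vocabulary, and a real application would use an RDF library and store.

```python
# Each fact is one (subject, predicate, object) triple.
# All "ex:" identifiers are made up for this sketch.
triples = [
    ("ex:WhiteHouse", "ex:address", "1600 Pennsylvania Avenue NW, Washington, DC"),
    ("ex:WhiteHouse", "ex:constructionStarted", 1792),
    ("ex:WhiteHouse", "ex:designer", "ex:JamesHoban"),
    ("ex:JamesHoban", "ex:name", "James Hoban"),
]

def objects(subject, predicate):
    """Return every value recorded for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Follow a link from the building to its designer, then to the designer's name.
designer = objects("ex:WhiteHouse", "ex:designer")[0]
print(objects(designer, "ex:name"))  # ['James Hoban']
```

Because the designer is stored as its own entity rather than as a string buried in a page, further facts about that person can be attached and queried independently.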

The purpose of this flexibility is to help us find answers to very complex questions, such as, “Which Presidents who lived in the White House had at least one child who did not live with him in the White House?” The Semantic Web excels at answering such complex questions involving multiple types of data from multiple sources in multiple formats.
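A question like this can be expressed as a pattern over triples. The sketch below uses placeholder data (ex:PresidentA, ex:ChildB2, and so on are invented, not real people) and a tiny pattern matcher written for this example; in practice you would write a SPARQL query against a triple store. It also treats a missing fact as "did not live there", which is a simplifying assumption.

```python
# Placeholder facts; every identifier is hypothetical.
triples = {
    ("ex:PresidentA", "ex:type", "ex:President"),
    ("ex:PresidentA", "ex:livedIn", "ex:WhiteHouse"),
    ("ex:PresidentA", "ex:hasChild", "ex:ChildA1"),
    ("ex:ChildA1", "ex:livedIn", "ex:WhiteHouse"),
    ("ex:PresidentB", "ex:type", "ex:President"),
    ("ex:PresidentB", "ex:livedIn", "ex:WhiteHouse"),
    ("ex:PresidentB", "ex:hasChild", "ex:ChildB1"),
    ("ex:PresidentB", "ex:hasChild", "ex:ChildB2"),
    ("ex:ChildB1", "ex:livedIn", "ex:WhiteHouse"),
    # No livedIn fact is recorded for ex:ChildB2.
}

def match(s=None, p=None, o=None):
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for triple in sorted(triples):
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple

def presidents_with_child_elsewhere():
    """Presidents who lived in the White House and have a child who did not."""
    found = set()
    for president, _, _ in match(p="ex:type", o="ex:President"):
        if (president, "ex:livedIn", "ex:WhiteHouse") not in triples:
            continue
        for _, _, child in match(s=president, p="ex:hasChild"):
            # Assumption: an absent livedIn fact means the child lived elsewhere.
            if (child, "ex:livedIn", "ex:WhiteHouse") not in triples:
                found.add(president)
    return sorted(found)

print(presidents_with_child_elsewhere())  # ['ex:PresidentB']
```

The query combines three kinds of facts (who is a president, who lived where, and who is whose child), which is difficult to express as a keyword search.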

Semantic Web and Semantic Search Combined

Generally speaking, anything that can be accomplished with Semantic Search can be represented as a Semantic Web query. That is, Semantic Web technologies are sufficiently broad to encompass all Semantic Search capabilities.

That said, for some problems Semantic Web technologies might be overkill. SPARQL offers a more complicated interface than simple keyword search, and even visual SPARQL query builders might be more than you need for a specific solution.

A simple way to think about which family of technologies might be useful for a specific problem is to ask yourself whether your users are searching on only one kind of information (e.g., restaurants or a flight number), or whether they are searching on many kinds of information (e.g., which presidents had children who did not live in the White House).

As an illustration of this concept, consider that Semantic Search vendors focus on specific subject domains in order to provide targeted user experiences. For example, Google does something special with restaurants, as we’ve seen. Furthermore, companies like Kayak, Orbitz, and Hipmunk in the travel space can be classified as vertical search companies, meaning that they specialize in one single domain. You can imagine similar situations in a number of different fields, such as medicine and finance.

Conclusion

Semantic Web vendors focus on solving problems using many different kinds of information. Instead of simply storing data about restaurants, a Semantic Web application would have access to information about the chefs, the cities, the menus, the cuisine styles, the décor, the wine list, the wineries that produced the wine on the wine list, etc.

If you need to answer a question such as, “What restaurants in Boston have several wines that were produced in the Alsace region between 1998 and 2001?” then Semantic Search will not be able to help you; instead, you will need the Semantic Web.
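To show why this question needs more than one kind of information, the following sketch joins three separate data sets: restaurants, their wine lists, and facts about each wine. All of the data is invented, and reading "several" as "at least two" is an assumption made for the example. A Semantic Web application would hold these facts as RDF and express the join in SPARQL.

```python
# Three hypothetical sources, each describing a different kind of thing.
restaurant_city = {
    "ex:RestaurantA": "Boston",
    "ex:RestaurantB": "Boston",
    "ex:RestaurantC": "Chicago",
}
wine_lists = {
    "ex:RestaurantA": ["ex:Wine1", "ex:Wine2", "ex:Wine3"],
    "ex:RestaurantB": ["ex:Wine1", "ex:Wine4"],
    "ex:RestaurantC": ["ex:Wine1", "ex:Wine2"],
}
wine_facts = {  # wine -> (region, year produced)
    "ex:Wine1": ("Alsace", 1999),
    "ex:Wine2": ("Alsace", 2001),
    "ex:Wine3": ("Alsace", 2005),
    "ex:Wine4": ("Bordeaux", 2000),
}

def restaurants_with_wines(city, region, first_year, last_year, min_wines=2):
    """Restaurants in a city listing at least min_wines wines from the region and years."""
    matches = []
    for restaurant, location in sorted(restaurant_city.items()):
        if location != city:
            continue
        qualifying = [
            wine
            for wine in wine_lists.get(restaurant, [])
            if wine_facts[wine][0] == region
            and first_year <= wine_facts[wine][1] <= last_year
        ]
        if len(qualifying) >= min_wines:
            matches.append(restaurant)
    return matches

print(restaurants_with_wines("Boston", "Alsace", 1998, 2001))  # ['ex:RestaurantA']
```

Each condition in the question (city, region, year range, number of wines) touches a different data set, so the answer only emerges once the sources are linked together.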

Next Lesson

Semantic Technologies Compared