Assumptions About Search

Some results are usually better than no results.

A search engine may return no results for one of two reasons: a valid query correctly finds nothing relevant, or the query itself was invalid (for example, over-specified, misspelled, or syntactically malformed). On a commerce site, returning zero results is usually a bad thing because (a) the customer may become frustrated and leave, and (b) customers can’t buy products they can’t find.

If the user’s query was bad, the search engine should first try to compensate with back-end logic, such as fuzzy matching that generates parallel queries through letter substitution, word stemming, wildcarding, or a similar approach. If the query still fails because it was malformed or violated some search rule, the search interface should explain why it failed and how it should be rephrased or resubmitted.

If the query correctly fails because no matching results exist, the interface should make that clear as well.
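The fallback behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catalogue, the function names, and the 0.75 similarity cutoff are all hypothetical, and the standard library's difflib stands in for whatever fuzzy-matching logic a real search engine would use.

```python
from difflib import get_close_matches

# Hypothetical toy catalogue: product id -> title.
CATALOGUE = {
    1: "red running shoes",
    2: "blue running shorts",
    3: "leather walking boots",
}


def build_vocabulary(catalogue):
    """Collect every distinct word that appears in a product title."""
    return sorted({word for title in catalogue.values() for word in title.split()})


def exact_search(terms, catalogue):
    """Return ids of products whose title contains every query term."""
    return [
        pid
        for pid, title in catalogue.items()
        if all(term in title.split() for term in terms)
    ]


def search(query, catalogue=CATALOGUE):
    """Try the query as typed, then retry with near-miss terms corrected.

    Returns (product ids, message). The message is None when the query
    worked as typed; otherwise it tells the user what happened.
    """
    terms = query.lower().split()
    hits = exact_search(terms, catalogue)
    if hits:
        return hits, None

    # Replace each term with its closest catalogue word, if one is close enough.
    vocabulary = build_vocabulary(catalogue)
    corrected = []
    for term in terms:
        close = get_close_matches(term, vocabulary, n=1, cutoff=0.75)
        corrected.append(close[0] if close else term)

    if corrected != terms:
        hits = exact_search(corrected, catalogue)
        if hits:
            shown = " ".join(corrected)
            return hits, f'No results for "{query}". Showing results for "{shown}".'

    # Nothing matched even after correction: say so plainly.
    return [], f'No products match "{query}".'
```

With this sketch, search("runing shoes") finds the running shoes and reports the substitution, while a query with no plausible match returns an empty list and a message saying that nothing matched. Either way, the user is told what the engine did with the query.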

Relevant results are better than irrelevant results.

Appropriate results (i.e., results that are useful to the user) are better than indiscriminate or irrelevant results. Yes, this does seem to contradict the assumption above, but quality is more important than quantity. The quality of search results functions as feedback to the user about the accuracy and success of our search engine. If users don’t see what they want, regardless of what they actually searched for, they may quit. If they see many products that have no readily apparent relevance to what they wanted, they may quit.

Users shouldn’t have to learn a new language to find what they want.

This is simple common sense: users have trouble with search already, so forcing them to use unfamiliar query languages or constructs is likely to cause them more difficulty while searching.

Searches shouldn’t fail because of bad data.

Inconsistencies or errors in the data or data architecture should not surface in user queries; for example, if the search is case-sensitive, failures caused by case mismatches should be clearly explained. Another example of inconsistent data or search system architecture is the following set of searches on the Second World War:

  • “world war II”
  • “world war 2”
  • “world war two”

Three different ways of saying the same thing, but each will return different results on most, if not all, search engines.
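One common way to make such variants behave alike is to normalise text the same way at index time and at query time. The sketch below assumes a small, hypothetical lookup table (NUMBER_FORMS) and a hypothetical normalize function; a real system would need a far fuller table and some care with ambiguous tokens such as “I” or “IV”.

```python
import re

# Hypothetical mapping from Roman-numeral and spelled-out forms to digits.
# Deliberately tiny: ambiguous tokens such as "i" are left out.
NUMBER_FORMS = {
    "ii": "2",
    "two": "2",
    "iii": "3",
    "three": "3",
}


def normalize(text):
    """Lower-case the text, drop punctuation, and unify number forms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(NUMBER_FORMS.get(token, token) for token in tokens)


queries = ["world war II", "world war 2", "world war two"]
normalized = {normalize(query) for query in queries}
```

Here all three queries collapse to the single form "world war 2". If document titles are passed through the same function before indexing, the three searches would match the same documents, and case differences disappear as a side effect.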

Searches should not fail because a query was correctly entered but the data was misspelled. This problem is understandable in document collections, such as with web searches, when the error was made by the document’s author. It is annoying that a potentially useful document remains out of reach, but understandable. If the error is caused by the indexing or compilation of the document, however, the problem is a violation of trust: such an error is an added barrier blocking information retrieval.

Commerce sites can allow no such errors in their product catalogues, because any barriers that keep customers from finding products violate business rules: you can’t sell what your customers can’t find.
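One possible way to catch misspelled catalogue data before customers run into it is to flag words that occur only once but closely resemble a more common word. The following is a rough sketch, not an established method: the suspect_terms function, the sample titles, and the 0.85 cutoff are all assumptions made for illustration, and any flagged word would still need human review.

```python
from collections import Counter
from difflib import get_close_matches


def suspect_terms(titles, cutoff=0.85):
    """Pair each one-off word with a more common word it closely resembles."""
    counts = Counter(word for title in titles for word in title.lower().split())
    common = [word for word, n in counts.items() if n > 1]

    suspects = {}
    for word, n in counts.items():
        if n == 1:
            close = get_close_matches(word, common, n=1, cutoff=cutoff)
            if close:
                suspects[word] = close[0]
    return suspects


# Hypothetical product titles; the third contains a misspelling.
titles = [
    "stainless steel kettle",
    "stainless steel toaster",
    "stainles steel saucepan",
]
flagged = suspect_terms(titles)
```

In this sample, the only flagged word is "stainles", paired with "stainless". A customer searching for the correctly spelled term would otherwise never find the saucepan.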