Assumptions About User Search Behavior

Search engines, or rather the combination of search interfaces and the invisible black-box functionality of the search engine, are a source of great frustration for many users. The following assumptions about user interaction with search are a useful guide to designing and evaluating the usability of a site’s search functions.

Users don’t fully understand general search methodology.

Users may not understand the schemas or processes involved in searching. The report What’s Wrong with Internet Searching, a study involving novice web users, noted that:

Nearly all participants from both trials had difficulty formulating good searching keywords even when they had all the information they needed. In the real world, users go into a library or a shop and express their requirements in verbose or imprecise terms, or alternatively they browse through items on offer. They are not used to elaborating an artificial text string to match their requirements.

Users may not understand that searching is an iterative process, often requiring refinement of search queries and winnowing of search results. Moreover, users may interpret an empty response to their search query as proof that no valid results exist.

When a user underspecifies a query, for example searching on AltaVista for “quality”, the extremely large number of returned hits and the concomitant difficulty of wading through them may discourage the user from searching further. With any large amount of information, if the first few results aren’t valuable, users may conclude either that none of the information is of value, or that the energy required to find something of value is too great.

Likewise, users may not understand that when they overspecify a query and then don’t get valuable results, they need to make their query more general.
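A search system can do some of this generalizing on the user’s behalf. What follows is only a minimal sketch, assuming a hypothetical search callable that takes a list of terms and returns a list of hits, and an invented policy of dropping the last term first. The catalogue and the matcher are toy stand-ins, not any particular engine’s behavior.

```python
from typing import Callable, List, Tuple


def search_with_relaxation(
    terms: List[str],
    search: Callable[[List[str]], List[str]],
) -> Tuple[List[str], List[str]]:
    """Try the full query, then drop trailing terms until something matches.

    Returns the terms actually used along with the hits, so the interface
    can tell the user that the query was broadened.
    """
    remaining = list(terms)
    while remaining:
        hits = search(remaining)
        if hits:
            return remaining, hits
        remaining.pop()  # assumed policy: give up the last term first
    return [], []


# Toy catalogue and AND-style matcher, both invented for illustration.
CATALOGUE = ["trek road bike", "trek mountain bike", "giant road bike"]


def toy_search(terms: List[str]) -> List[str]:
    return [item for item in CATALOGUE if all(t in item.split() for t in terms)]


used, hits = search_with_relaxation(["trek", "road", "titanium"], toy_search)
print(used, hits)
```

Returning the terms that were actually used matters as much as returning the hits: if the system quietly drops “titanium”, the user should be told, or the results will look like another unexplained black-box decision.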

Users don’t fully understand the search interface.

Not all search interfaces are the same, and there are few standards on search interface construction, so there are no guarantees that a user will be able to successfully use any given search interface.

Pages with multiple search forms or form submit buttons may confuse users because they won’t know which button to click.

Users may not understand what they are searching against; if a search interface is querying against a subset of documents, the search form’s context and placement may suggest that it is searching against the superset of documents.

Confusion over search interfaces can go both ways, however. Cameron Barrett of CamWorld tells me that the logs for his URL submission form (a way for his readers to tell him about interesting URLs) routinely show keywords and phrases that are obviously search strings. This suggests that many people are so used to the web convention of a text input field plus a submit button being a search form that they overgeneralize ALL such interfaces as search. On the other hand, when I examine the search logs for Borders.com, I frequently find fully qualified URL strings, indicating that some users are confusing the search form with their browser’s location field.

Note: My wife argues that the data I see in the Borders.com logs indicates a problem with either setting or locating the cursor’s focus before the user tries to type a new URL into the browser’s location bar. If true, this would indicate a failed exit strategy, and would also dilute my point, so I think I’ll deal with this later ;)
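Measuring how often URLs turn up in a search log is straightforward. The sketch below assumes the log is available as a plain list of query strings, and it uses a deliberately narrow, made-up definition of “URL-like” (an explicit scheme or a leading “www.”). The sample log is invented.

```python
import re

# Assumption: "URL-like" means an explicit scheme or a leading "www.".
URL_LIKE = re.compile(r"^\s*(?:https?://|ftp://|www\.)\S+\s*$", re.IGNORECASE)


def looks_like_url(query: str) -> bool:
    """True if the whole query is a single URL-like token."""
    return URL_LIKE.match(query) is not None


def url_share(queries) -> float:
    """Fraction of logged queries that look like URLs (0.0 for an empty log)."""
    if not queries:
        return 0.0
    return sum(looks_like_url(q) for q in queries) / len(queries)


# Invented sample log for illustration.
sample_log = [
    "http://www.example.com/",
    "civil war books",
    "www.example.org",
    "john dogs",
]
print(url_share(sample_log))
```

A count like this says how often the confusion happens, not why, so it cannot by itself settle the cursor-focus question raised in the note above.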

Users don’t know the best type of search for their needs.

If your site uses a search form with more than one input box, users may not understand the different functions or roles of the different inputs. If you have a music album search that has a field that queries against title information, and another that queries against performer information, and these fields are part of the same form, then you are presenting users with a remarkably complicated form. Users must differentiate between what they know about the title and what they know about the performer; users must predict what they should do if they only know information for one of the fields; users may not know which field is more valuable to the success of the search.

The complexity increases as more disparate types of information are requested on a search form, so a search form that has fields for title, author and (for example) price becomes significantly more difficult for the user.

Multiple search forms present similar problems, even when each form is optimized for a certain kind of query or for certain kinds of search parameters. For example, if a site has a basic keyword search form that queries against multiple fields in the catalogue or document collection, as well as a form that queries only against product titles, the user may not understand that one form may be better — in terms of accuracy or performance — than the other form.

Users may be confused about the scope of a search.

If your site provides the ability to specify the collection against which a search will be performed, some of your customers are going to miss the distinction and will be confused when they get back results from a query against a “wrong” or unexpected data set. Moreover, users may not be able to identify which collection might hold the answer to their query.

If your site provides search forms on most pages and your information architecture design includes a sectional breakdown for different topics, such as a sporting-goods web site with different sections for roller blades and bicycles, then the user may misunderstand what set of information a search query will run against. For example, a user contemplating the search form on a page about roller blades may assume that the search will only be against roller blades.

If your site provides context-sensitive search, users must be aware of the context, and understand that their search is contextual, to avoid confusion. Likewise, if you don’t offer contextual searching, and your site has distinct sections or topical areas, make that clear as well.

Users have difficulty formulating queries.

Users often can’t translate what they know about a product into a successful query. Sometimes this is just a matter of phrasing, as when users enter natural language queries — “I am looking for books about the civil war”.

Other times this may be a problem of clarity, as when the customer can’t clearly enunciate what they know about the product or information they want — “I don’t remember the exact title, but it was a book about dogs and the author’s first name was John.” With clarity problems, the quality of results can’t be predicted because of the uncertain validity of the search parameters.

This issue may be about conversion; for example the customer knows something about the product that doesn’t translate into a parameter that can be included in the search query — if the customer remembers the color of the cover, that characteristic is unusable by most bookstore search engines.

A more severe problem is when the user knows how to formulate a good query, but that query doesn’t match the terminology or syntax required by the current search interface. In other words, users can frame and state their query correctly with regard to the subject domain, but not correctly with regard to the search interface or engine they are trying to use.

According to a report titled Mapping Entry Vocabulary to Unfamiliar Metadata Vocabularies,

The explosive increase in heterogeneity assures that the lack of familiarity required for efficient, effective searching is an increasing problem. When an index or categorization scheme is encountered, how is one to know what word or value has been assigned to the topic that one is interested in? Expert human search assistance is often needed, but a sufficient population of human expert search intermediaries is unaffordable. The challenge, therefore, is to provide automatically the kind of expert prompting that an expert human search intermediary would provide. It has been argued that the most cost-effective single investment for improving effectiveness in the searching of repositories would be technology to assist the searcher in coping with unfamiliar metadata vocabularies [Buckland 1992].

Users may have trouble using differentiators to make the query specific enough to return the desired information. The report What’s Wrong with Internet Searching observed many such problems in its user sample:

The central problem was that users did not seem to understand what were likely to be good quality differentiators. Thus one participant looking for Reebok trainers searched for ‘sports shoes’ rather than the more discriminatory ‘Reebok’. Another entered ‘competitive market share’ when trying to find information about competitors for the Rover car company, not realising that he had to enter at least some contextual element.
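One rough way to put a number on a “good quality differentiator” is inverse document frequency: a term that appears in few documents narrows the result set more than a term that appears in nearly all of them. The sketch below is an assumption-laden illustration, not something the report describes. The four-document collection is invented, and matching is on whole lowercase words only.

```python
import math


def inverse_document_frequency(term, documents):
    """log(N / df): higher values mean the term appears in fewer documents."""
    term = term.lower()
    df = sum(1 for doc in documents if term in doc.lower().split())
    if df == 0:
        return None  # the term does not occur in the collection at all
    return math.log(len(documents) / df)


# Invented collection for illustration.
DOCS = [
    "reebok classic sports shoes",
    "nike running sports shoes",
    "adidas football sports shoes",
    "sports shoes care guide",
]

for term in ("reebok", "sports", "shoes"):
    print(term, inverse_document_frequency(term, DOCS))
```

In this toy collection “sports” and “shoes” appear in every document and so score zero, while “reebok” appears in one and scores highest, which matches the report’s observation about which term discriminates better.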

Users may not understand the logic applied to their query.

A user types a string of characters or words into a search form, then submits the query; the user may not understand what happens to what s/he typed. The search system may apply logic to the search parameters to derive word stems or to wildcard the string, or apply any of a range of possible parameter tweaks designed to increase the effectiveness of the search or the number of query “hits”. The problem with this logical processing of the search parameters is that the user may not understand why certain results are returned, and why other “obvious” hits are not.
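One mitigation is for the system to report what it did to the query. The sketch below is hypothetical throughout: the two transformations (case folding and prefix wildcarding of words of four or more letters) and the wording of the notes are invented for illustration, and a real engine would apply its own rules.

```python
def expand_query(raw: str):
    """Return (expanded_terms, notes) so the interface can show what changed."""
    terms, notes = [], []
    for word in raw.split():
        term = word.lower()
        if term != word:
            notes.append(f'"{word}" was matched without regard to case')
        # Assumed rule: wildcard any term of four or more characters.
        if len(term) >= 4 and not term.endswith("*"):
            notes.append(f'"{term}" was also matched as a prefix ("{term}*")')
            term += "*"
        terms.append(term)
    return terms, notes


terms, notes = expand_query("Bike racing")
print(terms)
for note in notes:
    print(note)
```

Showing such notes next to the results gives the user a chance to see why an unexpected hit appeared, and to rephrase the query if the expansion went too far.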

Users don’t fully understand the subject domain.

The subject domain for a commerce site includes the collection of products that make up the site’s catalogue, product characteristics and usage assumptions unique to the market, terminology and definitions unique to the market, and schemes for organizing and categorizing the products unique to the market. That a customer understands the products, or how to use them, is no guarantee that s/he will understand how the market describes and organizes them.

For example, an avid cyclist who has purchased many bikes and is an expert rider may still lack understanding about how the bicycle industry categorizes bicycles. Bikes are made for different purposes — road racing, road touring, off-road downhill racing, cyclocross, etc — and of different materials — various kinds and qualities of steel tubing, aluminum, magnesium, titanium, fiber, etc. — in different configurations — diamond frame, front suspension, full suspension, monocoque, etc. — in different countries — USA, Japan, Taiwan, France, Italy. Every one of these characteristics is important, but how much of the full spectrum of bicycle-related information will the typical user understand? If the user is searching against a bicycle catalogue online, which of these data points is necessary or important for finding the bike they want?

In the books, music and video retail industry, the subject domain includes how titles are catalogued, indexed, grouped, etc. Unless a user is very familiar with the realm of bookselling, s/he won’t necessarily understand the relationships between products, citation information, identity, etc., so the user may not know how information in that domain is organized. The user may also have trouble understanding the patterns of copyright ownership, the passage of titles into out-of-print status, and (my favorite), the existence of product records for non-existent products.

Many users employ search as a means of navigation.

Various studies show that some users approach search — especially a search function on a given web site — not as a means of information retrieval but as a mode of site navigation. This is an indication that the design of a search system must be part of a holistic approach to designing the information architecture of a site.

Users are likely to make simple input mistakes.

Server search logs show high rates of typographical errors in search terms. Typos are dangerous to user success in two ways: first, users may not understand that a typo was the cause of negative search results; second, users may incorrectly view a failure from typos as indicating that there are no results for the query, which may lead to the abandonment of the query.
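A common remedy is a “did you mean” suggestion shown when a query returns nothing. Here is a minimal sketch using get_close_matches from Python’s standard difflib module. The vocabulary and the 0.75 similarity cutoff are assumptions chosen for illustration, and a real site would draw its vocabulary from its own index.

```python
import difflib

# Hypothetical vocabulary; a real site would build this from its index.
VOCABULARY = ["bicycle", "helmet", "titanium", "suspension", "cyclocross"]


def suggest(term, vocabulary=VOCABULARY, cutoff=0.75):
    """Return the closest known term, or None if the term is already
    known or nothing in the vocabulary is similar enough."""
    term = term.lower()
    if term in vocabulary:
        return None
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for typed in ("helmt", "titanum", "helmet", "xyzzy"):
    print(typed, "->", suggest(typed))
```

Even a crude suggestion tells the user that the failure came from the spelling of the query rather than from an empty catalogue, which addresses both dangers described above.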

Choosing the wrong form of a word is also a problem with any strict, literal search engine: if the user enters “save private ryan” when seeking Saving Private Ryan, a literal engine may fail because “save” is not the form of the word that appears in the title. The word is a perfectly valid form of the verb, just not the one that was indexed. Some engines apply stemming or other linguistic logic to expand the scope of the query, but literal engines will stop on this “error”.
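The difference between literal and stemmed matching can be sketched in a few lines. The suffix stripper below is a deliberately crude stand-in invented for this example. It is not the Porter algorithm or any production stemmer, and it will conflate some unrelated words.

```python
def naive_stem(word: str) -> str:
    """Crude suffix stripping, for illustration only."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters of the word after stripping.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Drop a trailing "e" so that "save" and "saving" share a stem.
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word


def stems(text: str) -> set:
    return {naive_stem(w) for w in text.split()}


def literal_match(query: str, title: str) -> bool:
    """Every query word must appear in the title exactly (ignoring case)."""
    return set(query.lower().split()) <= set(title.lower().split())


def stemmed_match(query: str, title: str) -> bool:
    """Every query stem must appear among the title's stems."""
    return stems(query) <= stems(title)


query, title = "save private ryan", "Saving Private Ryan"
print(literal_match(query, title), stemmed_match(query, title))
```

The literal comparison fails on “save” versus “saving”, while the stemmed comparison reduces both to the same stem and succeeds. The cost is looser matching overall, which ties back to the earlier point that users may not understand why certain results are returned.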