Thursday, April 10, 2008

Interaction Design for Recommender Systems

Typically, the effectiveness of a recommender system is judged by statistical accuracy metrics of the underlying algorithm, such as MAE (Mean Absolute Error). However, Kirsten Swearingen and Rashmi Sinha argue that interaction design is equally important in determining recommender system effectiveness. Analyzing music recommender systems, the researchers found that there are two separate models of recommender system success. The first is in terms of ecommerce (users indicating that they will buy music) and the other is usefulness (how well users are helped to explore their musical tastes).

Eleven systems were tested (including the likes of Amazon, MovieCritic, Media Unbound and CDNow). Recommendations provided by 6 of those systems were compared to recommendations provided by the users' friends. Study 1 involved 20 participants and Study 2 involved 12, all of whom were regular Internet users in the 19 to 44 age range. Users provided input to the system and received a set of recommendations. Users were then asked to rate 10 recommendations from each system, evaluating aspects such as liking, action towards the item (buy/download/do nothing), transparency (do they understand why the system recommended that item) and familiarity (any previous experience of the item). Users were also asked to rate the system as a whole on a number of dimensions: usefulness, trustworthiness, and ease of use. At the end of the session, users were asked to name the system they preferred and explain their rationale.
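For readers unfamiliar with the MAE metric mentioned above, here is a minimal sketch of how it is computed: the average absolute difference between the ratings a system predicts and the ratings users actually give. The function name and the sample ratings are made up for illustration and are not taken from the study.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual ratings."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one rating is required")
    total_error = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total_error / len(predicted)


# Hypothetical ratings on a 1-5 scale (illustrative only, not study data).
predicted_ratings = [4.0, 3.5, 2.0, 5.0]
actual_ratings = [5.0, 3.0, 2.0, 4.0]

print(mean_absolute_error(predicted_ratings, actual_ratings))
```

A lower MAE means the predictions sit closer to what users actually rated. Note that the number says nothing about whether users trusted, understood, or enjoyed the system, which is the gap the interaction design argument addresses.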

Findings
The goal of most recommender systems is to replace or at least augment the social recommendation process. Study results showed that users preferred recommendations made by their friends over those made by online systems, but there was a high level of overall satisfaction with the systems, which users found useful for suggesting items they had not previously heard of. Users like the breadth that online systems offer, which gives them a unique opportunity to explore their tastes and learn about new items.

Effective recommenders inspire trust, and users are willing to provide more input to the system in return for more accurate recommendations. Designers often try to balance ease of use against accuracy. Of the participants studied, 67% didn't think the 4-20 input ratings Amazon requires are sufficient to generate accurate recommendations. Conversely, MediaUnbound requires 34 input ratings and 75% thought that number was just right. Users also commented on the rating process, noting that some mechanisms such as genre selection, labeling your favorite artist, and rating scales are either too restrictive or redundant and boring. Users did like the rating bar/slider scale since they could click anywhere to indicate their degree of liking. They expressed interest in varying the rating process to one that is engaging and offers mixed questions and continuous feedback.

Participants also like receiving information about recommended items. Knowing why an item was recommended and when it was released, along with seeing album covers and reviews by others, is useful. When RatingZone's Quick Picks was adjusted to offer more detail about recommendations, the result was a 20% increase in usefulness!

In addition, users like and prefer to buy recommendations they are already familiar with. It helps build trust. For example, participants said 72% of Amazon's, 60% of MediaUnbound's and 45% of MoodLogic's recommendations were familiar. There is a greater willingness to buy familiar than unfamiliar recommended items. This makes sense since a familiar item is a less risky purchase decision. Users expressed a willingness to buy only 7% of the items recommended by MediaUnbound. While users show a preference for familiar items, they do express frustration over recommendations that are albums by the same artists they had input into the system. Amazon might remind users about a favorite song not heard recently, but it did not help users expand their tastes in new directions. MediaUnbound includes a slider bar for users to indicate how familiar the suggested music should be, and participants stated they liked this feature.

Again, while the algorithm used to generate the recommendations matters in determining effectiveness, interaction factors must be weighed equally. Depending on the definition of success used, ecommerce techniques may be more important than usefulness and vice versa. Of course, if the goal is to get the best of both worlds, a hybrid system that uses the strengths of each approach is ideal!

Interaction Design for Recommender Systems (2002)
HUBRIS: Human Benchmarking of Recommender Systems (2002)
Kirsten Swearingen
Rashmi Sinha
Marti Hearst
