
The "Starfire" Video Prototype Project: A Case History

By Bruce Tognazzini

Published in the Proceedings of CHI94

(Note: this is a rather long, if useful, paper. You may want to print it out, rather than attempting to read it online.)


ABSTRACT

Developing a new working computer system can now cost hundreds of millions of dollars, all expended at great risk. Company managers who must take responsibility for making development decisions are loath to do so without being able to see and understand the system they will be "buying."

When SunSoft launched the Starfire project to develop a next-generation interface, we turned to video prototyping, in the form of a short 35 mm film delivered in video. Not only were we thus able to show in mature form many key specifics of our new interface design, but we were able to communicate a strong sense of the resulting overall user experience. This paper describes observations and guidelines we developed during the early stages of the film, and what our experiences were in applying them.

KEYWORDS

Film, video, video prototype, prototype, observation, guideline, drama, story, interaction, gesture, stylus, mouse, voice recognition, anthropomorphic agent, agent, feedback, social, ethics, privacy, future

INTRODUCTION

In 1992, the SunSoft division of Sun Microsystems launched a project to develop and promulgate an advanced integrated computing/communication interface named Starfire. We envisioned two end results: a phased-in implementation plan, and a video prototype which would enable us to leap forward a full decade, showing a day in the life of a typical Starfire user in the year 2004.

"Starfire" is set during the week of "InterCHI ’04," which, according to our story, will be taking place in London. It was shot in 35mm film, then transferred to tape and edited using digital video techniques [18]. In 1994, this is still a relatively expensive process; the next few years will see the prices tumble as desktop video takes greater hold.

This paper refers to "Starfire" both as a film and, because it will be distributed in the video medium, as an instance of a "video prototype." It avoids using a more complex amalgam such as "film/video prototype."

The purpose of this case-study paper is to present why we chose video prototyping, and how we attempted to avoid its limitations and draw from its strengths.

WHY A VIDEO PROTOTYPE?

Computer prototypes offer a relatively inexpensive way to visualize at least parts of future systems, but may fail to communicate the overall feel of a new user experience, either because key hardware that will support the new system simply does not exist, or because of the difficulty of creating a fluid, interactive mock-up of a large system. [16, 17, 23]

Film or video enables one to build the ultimate demo out of pure "unobtanium." Gone are hardware limitations and computer artifacts. Everything works perfectly, no matter how many times the spectator looks at the tape, and messages, both subtle and explicit, can move the user toward any conclusions the film maker had in mind. These are both the advantage and the curse of video prototyping. Will you end up with a prototype of a system that can be built, or only a slick piece of propaganda?

"STARFIRE" OBSERVATIONS AND GUIDELINES

"Starfire" was, from the beginning, under the technical and creative control of SunSoft’s Human Factors Engineering Group. "Starfire" was to be a video prototype of a real proposed system, capturing enough detail of the interaction methodology to display the concepts. The software interface had to be capable of reaching maturity within ten years, specifically by November 16, 2004, when the story in the film takes place.

"Knowledge Navigator" [6] , its successors [1], and Hewlett-Packard’s "HP 1995" video [2] demonstrated technologies, such as highly-skilled anthropomorphic agents, that may not be approachable for 100 years. AT&T’s "Connections" video [3], featured videophones that could not only perform flawless "live" translations, but could re-articulate the caller’s lip, throat, and facial movements to perfectly track the new words.

In attempting to create a believable ten-year vision, we wanted to avoid such leaps into the future, but soon found both the film and video media inexorably pulling on us to make radical compromises of our own. Over the course of the project, to keep us from straying into fantasy, we developed the following set of observations and guidelines.

Observation: Interaction techniques most easily accomplished on film may be difficult or even impossible to actually implement on the computer.

Video prototyping at once eliminates all software limitations. Want an anthropomorphic agent that is capable of carrying out your most softly-whispered voice command? No problem. Want a 30-frame-per-second, real-time fly-through of an extremely complex 3D shaded model? No problem. Want instant translation with computed re-animation of mouth and lip positions? No problem.

Such seeming technological feats are easily accomplished in film. After all, to make it "appear" that a translator is not only converting the words, but the mouth position, you need only film the actor speaking in the second language. The remarkably correct mouth position then comes for free. To do a more realistic simulation of live translation, you must go to extra trouble to not "correct" the mouth position, by filming the actor saying the words in the original language, then overdubbing the dialog in the second language.

Our Guideline: In translating the interface vision into film, continually question whether each object or action can be accomplished in ten years on a real computer.

Result: We were repeatedly tempted to lower the cost of the film by pushing the timeline out "just a little." We ended up eliminating some of the interactivity when the only other choice was to take some impossible leap into the future.

At the same time, we tried not to overlook film’s "suggestions" too quickly. In "Starfire", we show a city representing the Heroine’s information space, which contains 100 million documents. Want information on politics? Fly down to City Hall. Searching for medical facts? Try the hospital.

The level of detail shown in the computer graphics during the brief fly-over is well beyond today’s technology and may be beyond workstation technology in 2004. However, as with the curved desktop display, we felt that such an information environment was worthy of introduction. It could well be that ten years from now, we won’t have the polygon count to produce graphics as high-quality as those we show in the film, but we will have a count high enough to make effective use of the metaphor.

Observation: The "users" (actors) in a video prototype will show no distress in using the interface, regardless of its quality.

Video prototypes, while costing a tiny fraction of robust full-system prototypes, tend to be off-budget. They are not a matter of gathering a few, or a few dozen, engineers for a year or three to work quietly in a corner. They require, instead, actual cash.

When at Apple, several Starfire members, including this author, worked on a project to develop a series of vignettes showing future users accomplishing tasks with experimental interfaces. The final results were shot in-house in video with practically no budget. Managers and outsiders were unable to look past the dearth of production values and appreciate the ideas expressed. The project had virtually no impact on Apple’s future direction.

"Knowledge Navigator" was produced by Apple’s Creative Services department, the folks responsible for the corporate logo and the building signs. They had three things going for them: the creators, Hugh Dubberly and Doris Mitch, are both talented individuals; they received good input from Alan Kay and several of his colleagues, and they had a high enough budget to produce a real film. Knowledge Navigator had a profound effect on Apple and on the industry.

We were interested in "Starfire" having a profound effect. We launched a full-blown fund-raising effort, garnering support not only within engineering, but within marketing, sales, and public relations. These latter people do not intend to shell out money for a film showing people with dour expressions making errors while stumbling through a prototype system. They want happy people basking in the warm glow of a computer that always works. We wanted to do our best to ensure that those happy people would be just as happy ten years from now when they sat down at the real thing.

Our Guideline: The Starfire interface must be designed, tested, and iterated the same way any other interface would be.

Result: We fooled the marketers. "Starfire" may be the first video prototype to show an actual bug, when the computer attempts to "read" a ham on rye sandwich. Outside of this single glitch, however, "Starfire" is indeed the story of a woman finding fame, fortune, and happiness through the good graces of her perfectly-functioning machines.

Most of the interactions seen in the film were built and tested in isolation to ensure that they would work. A few, including the most advanced one, entailing a second-person virtual reality "clipboard" display, could not be built within our time and budget constraints. In these cases, we relied on extensive feedback from Sun’s senior designers and other industry leaders.

Observation: In real life, things don’t always work as planned. In film, they will unless you’re very careful.

It’s much easier to write a story where everything goes as planned, rather than one where things go wrong and people must react on the spur of the moment. In the case of a video prototype, however, such a plan will safely insulate a slow and ineffective computer system from displaying its obvious flaws. (With such a story, one also has the problem of the spectators falling asleep.)

In the "Starfire" story line, Julie, the product leader of a new sports car at a major automobile manufacturer, has discovered that one of her so-called colleagues, Mike O’Connor, has sent a memo to the CEO of the company, telling him that Julie’s car is not ready. O’Connor suggests that his new sedan should be moved up into Julie’s place in the production schedule. The CEO has called an emergency meeting, and Julie now has five hours to put together a presentation she expected to work on for a week.

She goes into the executive staff meeting with a well-put-together report, along with some back-up material. She then presents her report, which she could have done on any contemporary system tied into a video projector.

We could have ended the story there, with lots of hearts and flowers and pats on the back for what a swell job Julie had done, except for the following guideline:

Our Guideline: In "Starfire," things must go terribly, terribly wrong.

When Julie finishes her report, O’Connor offers an effective rebuttal, based on a five-year-old AutoWeek report on a different car. Armed with today’s technology, Julie would now be sunk.

Writing a story line like this forced us to look at the real needs of presenters. Things don’t always go as planned. Anyone who has ever been in an executive staff meeting at a large corporation knows that it is rare to even get halfway through a formal presentation before the questions and shouting begin.

Our story line forced us to develop a design for Starfire that would enable people to reach their information space quickly and smoothly, that would enable Julie, when O’Connor waves around his copy of AutoWeek, to not only pull up within 30 seconds the same issue—with annotations—but to retrieve subsequent materials that would help her negate everything in O’Connor’s central argument.

Results: Having to write the story as it would likely happen, rather than as it would happen in a perfect world, forced us to explore some very real issues—such as ubiquitous information space access with fast, intelligent retrieval—we might otherwise have been able to ignore.

Observation: Hardware in video prototypes can be complex to the point of impossibility and still appear to be easy to fabricate.

"Knowledge Navigator" featured a fold-in-half display that, once unfolded and started up, became a seamless, continuous surface. For engineers to actually build such a display would border on the impossible. For the film makers compositing screen animations over a still image of the opened "Knowledge Navigator" model, showing a seam would have required them going to the extra step of superimposing the image of a seam over the animation.

Our Guideline: Avoid impossible hardware designs and reintroduce hardware artifacts where needed.

Results: We developed a wide range of computing devices for "Starfire," to deliver a message we felt was important: Computer displays of the future will not only get smaller, they will get bigger. People will work with a family of devices, from tiny PDAs to wall-sized screens, choosing at any moment the devices most suitable for their work [19, 26-30].

Since we are showing a mature system only ten years from now, we generally stuck to hardware designs that are either working in laboratories today or have actually been implemented, though perhaps on a smaller scale. For example, many of our portable models have an HDTV aspect ratio of 16:9, with an implied increase in resolution. It does not require much of a leap of faith to see such portables being widely available in ten years.

The primary display device in "Starfire" features a high-definition (150 dpi or better), 24-bit color, two-square-meter curved panel that acts as both an input device (touch, stylus, and digitizing) and a display. The screen has a curving vertical area, five and a half feet long and two feet high, that sweeps downward into an eighteen-inch-deep horizontal work surface. The effect is that the user is sitting before a section of a modified sphere.
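
As a quick sanity check of these dimensions (this arithmetic is ours, not from the original paper), the two stated surfaces do total roughly the quoted area:

```python
# Back-of-the-envelope check that the stated panel surfaces add up to
# roughly two square meters. Our own arithmetic, not from the paper.
FT2_TO_M2 = 0.0929  # square feet to square meters

vertical = 5.5 * 2.0    # vertical area: 5.5 ft wide x 2 ft high
horizontal = 5.5 * 1.5  # work surface: 5.5 ft wide x 18 in (1.5 ft) deep

print((vertical + horizontal) * FT2_TO_M2)  # ~1.79 m^2, close to two square meters
```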

For purposes of the film, we assumed we would be able to fabricate such a surface out of a solid-state matrix [27]. This allowed us the greatest freedom in creating a photogenic industrial design. It could well be that in 2004, we would have to actually build the device using projection technology, as explored by Krueger [11] and Newman et al. [15].

Observation: In the film medium, the simpler and less direct the physical interaction with the interface, the less expensive it is to film.

The problem with film is that it is too good. In "Starfire," we show our Heroine producing a really slick presentation over the course of five hours. We could have had her walk into her office, announce to the computer, "whip me up a hot presentation by 4:00," and then leave for a long lunch. Five hours later, an anthropomorphic agent could have "handed" her the report, and we would have saved tens of thousands of dollars on special effects.

"Knowledge Navigator" used apparently-flawless continuous-speech voice and context recognition as extensively as it did to save money: Nothing had to be animated on the screen. No visible commands needed to be accepted and acknowledged. (Private conversations with Doris Mitch, co-creator of Knowledge Navigator.)

Our Guideline: First design the interface. Then make filming decisions based on budget limitations.

Result: In "Starfire", we show three classes of input: gestural and stylus (with the actors’ hands visible in the shots), mouse, and voice. In addition, we had two film techniques open to us to help reduce costs: the reverse angle shot, and conversion of action to monologue.

Direct Physical Interaction: Gestural and Stylus Input

Gestural and stylus input are by far the most expensive to film, requiring complex animations synchronized to the actors’ movements. This type of shooting entailed a considerable amount of special effects processing later on to make the final composite look real. The cost would have been considerably higher had we not created the animations in-house.

Indirect Physical Interaction: Mouse Input

The mouse turned out to be a real money saver, since, once we had established through a long shot that an actor was holding the mouse, we could cut away to an insert shot that showed only the animation. We still had the time and expense of producing the animations themselves, but the hand-synchronization and special effects compositing costs went away. An insert shot cost around one third as much as a gestural or stylus shot to produce, again based on our developing the animations in-house.

Fortunately, we felt we needed a mouse (or other indirect pointing device) for our giant, curved display. Using the hands on the horizontal part of such a display, with the wrists and arms resting on the surface, would be quite natural. Occasionally reaching up to the vertical surface to slide an object or two would be more productive than searching for the mouse [5, 20, 22]. However, having to suspend one’s hand and arm in the air to really accomplish work on the vertical surface would be torturous [20].

Over the course of preproduction, we moved several key sequences to the vertical part of the display primarily to save the money that hand or stylus interaction would have cost us: in real life, Julie would probably spend more time working on the flat part of her desk than we have shown.

Voice Recognition

Voice recognition was a real money saver, costing one tenth the price of hand and stylus interaction. Voice fit in well with our interface strategy of enabling a wide range of overlapping input methods, and we assumed a fairly mature voice recognition technology by 2004.

We did not assume that voice recognition would be backed up by a robust contextual recognition capability that would allow the computer to "know" whether it or another human was being spoken to. We developed a set of rather simple voice guidelines. The primary rule was that if no one else was in the room or on the videophone, the computer would assume anything the protagonist said to be directed toward the computer. That meant that the user could not talk very loudly to herself, but it did eliminate the need for any preface to communication. When someone was in the room or on the screen, then the user had to initiate an instruction with a pre-defined command word.
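
A minimal sketch of this addressing rule follows, in present-day Python; the command word and all names here are our hypothetical illustration, not anything specified in the film:

```python
# A minimal sketch of the Starfire voice-addressing rule described above.
# The command word and function names are hypothetical illustrations.

COMMAND_WORD = "computer"  # assumed pre-defined attention word

def directed_at_computer(utterance: str, others_present: bool) -> bool:
    """Decide whether an utterance is meant for the computer.

    The rule: if the user is alone (no one in the room or on the
    videophone), everything she says is treated as a command; if others
    are present, speech must begin with the command word.
    """
    if not others_present:
        return True
    return utterance.lower().startswith(COMMAND_WORD)

# Alone, any remark is a command; in company, the prefix is required.
assert directed_at_computer("pull up the AutoWeek issue", others_present=False)
assert not directed_at_computer("pull up the AutoWeek issue", others_present=True)
assert directed_at_computer("computer, pull up the AutoWeek issue", others_present=True)
```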

Reverse Angle Shots

Reverse angle shots are a standard part of film technique [9, 13]. They can be seen in their most blatant form on TV news magazines such as "60 Minutes" or "20/20," wherein periodically we switch from watching the interviewee to looking at the interviewer nodding sagely in response. (The latter shot having been filmed afterwards, then cut, often awkwardly, into place.)

In "Starfire," we used reverse shots not only during human-to-human dialog, but during human-to-computer dialog: by switching our spectators from a shot of the screen to a shot of the actress reacting to the screen, for seconds at a time our special effects budget fell to practically zero. Instead of costly and complex animations of screen elements whizzing around, we were able to substitute some simple sound effects, along with movements of the actors’ eyes, to convey motion in the spectator’s mind.

Our reverse shots also cost one tenth the price of hand and stylus interaction. Thus, at the price of voice recognition, we were able to deliver the impact of far more expensive physical interaction.

Monologue

Monologue is the bargain basement of video prototyping. In this technique, an actor delivers a soliloquy on some complex series of interactions accomplished off-screen at another time. Monologue not only costs a minimum amount, it can compress a series of interactions that might take an hour or two into the space of a few seconds. It can also put an audience to sleep if it runs much more than 15 or 20 seconds. (15 or 20 seconds is an eternity in film [10, 13].)

We avoided monologue in "Starfire."

Observation: The first goal in a video prototype is to communicate a vision to the viewing audience. The second goal is to design a usable system.

Our Guideline: Ensure that the viewing audience can see and understand the stages in complex computations, even if that requires making things visible that might be quite invisible in the actual product.

Result: Throughout the course of writing the screenplay, blocking the action, shooting, and editing, we had to constantly be aware of our two sets of "users": the fictional users parked in front of their fictional computers, and our very real spectators watching the final "Starfire" film.

Every time a new decision point arose in the project, we looked at our options both from the perspective of our future users and our immediate spectators. This process did not result in our changing the basics of the interface, but did result in our increasing the visibility of our interactivity:

User Input

Anyone who has ever been given a demo of a visual interface product by someone using short-cut keys knows what happens when, as a spectator, you can’t see every step the user is following. Things start popping up and disappearing from the screen as though by magic. We found it necessary to ensure that our spectators could see or hear every communication from user to computer.

We also found it necessary sometimes to limit the number of movements that actors had to make to accomplish a task. Too much flailing around was just as disconcerting as hidden movements. In one case, I had to come up with a simplified interaction while standing on the stage in North Hollywood while 20 highly-paid people stood around waiting. The end result was a much simpler gesture that not only made the film more watchable, but made the interface itself cleaner and more productive to use.

Computer Feedback

Both users and spectators will need to see the ultimate result of a calculation or process, but in many cases, users would only be confused by seeing the underlying algorithms that drive the process [25]. For us to communicate our vision to our spectators, however, we had to ensure that every concept was faithfully transmitted.

We developed our most complex animation strictly so that the viewing audience could understand our algorithms. In the scene, Julie wants to place a male model beside a 3D model of her car, in order to dress up her presentation. She first asks for a 3D mannequin from her Vellum 3D tool set, then chooses a male model from an existing 2D film as the source of a texture map. The system then maps the male model onto the mannequin form.

We wanted to show each step of what might be the internal process by which this 2D to 3D modeling might happen. First, to show how the male model is picked out of the film, we have her use a future version of the Photoshop wand on the model. The wand tool jogs the film back and forth, demonstrating that it is finding the edges of the object not by color or value, but by movement in time against the background. It then selects the object. Originally, we showed the selected object by removing the color from everything else in the image. Unfortunately, while test viewers could identify the object as being special, they didn’t understand that he had been selected. We had to add the familiar marquee of moving black lines for viewers to "get it." (And, of course, we increased the thickness of the marquee, so VHS viewers would be able to see what is going on.)
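
The paper does not specify the wand’s actual algorithm. As an illustration of selecting by movement in time rather than by color or value, a crude frame-differencing sketch (entirely our assumption, not the film’s technique) might look like this:

```python
# A rough sketch of picking out a moving figure by change over time
# against a static background, in the spirit of the wand shown in the
# film. Simple frame differencing is our stand-in; the film's actual
# technique is not specified.
import numpy as np

def motion_selection_mask(frames: np.ndarray, threshold: float = 12.0) -> np.ndarray:
    """Return a boolean mask of pixels that move against the background.

    frames: grayscale array of shape (num_frames, height, width).
    A pixel is selected if it deviates from the median background
    estimate by more than the threshold in any frame.
    """
    background = np.median(frames, axis=0)      # static-scene estimate
    deviation = np.abs(frames - background)     # per-frame change
    return (deviation > threshold).any(axis=0)  # moved at least once

# Synthetic example: a bright square drifting across a flat field.
frames = np.zeros((5, 64, 64))
for t in range(5):
    frames[t, 20:30, 10 + 5 * t:20 + 5 * t] = 200.0

mask = motion_selection_mask(frames)
print(mask.sum(), "pixels selected")  # the union of the square's positions
```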

Once Julie OK’s the selection, a system tool takes control, running the film forward, taking snapshots of different views of the man as he walks through the commercial and developing them into a bottled texture map of the man that Vellum can then pour out onto the articulated mannequin Julie will place beside the new car.

In a real product, the computer could carry out all its work without showing the film moving at all, and, indeed, if it could capture the texture map quickly enough, that would probably be the correct interface. For the sake of the movie, however, we wanted people to understand that we were not using magic; we were using a specific set of algorithms that communicate an even more specific message: In the Starfire system, applications—and their developers—will work in close cooperation with each other and the system to reduce the work burden on the user and to enable end results that would otherwise be difficult or impossible to achieve.

Observation: Video prototyping offers the opportunity to explore social, as well as technical issues.

Because video prototypes show systems operating in situ, they present the opportunity to explore potential impacts, negative as well as positive, on the daily lives of their users.

Our Guideline: Showcase an unresolved social issue that Starfire will raise or exacerbate.

Starfire has the characteristics of a media space, defined by Mantei et al. as "a system that uses integrated video, audio, and computers to allow individuals and groups to work together despite being distributed spatially and temporally" [14]. Such systems immediately raise the specter of violations of the right to privacy [4, 7, 8, 12, 14, 24]. We wanted to explore the ramifications of media spaces, not by solving the privacy problem, but by demonstrating it.

Result: We assumed our system would have such built-in safeguards as no one being able to see into your private space without your being able to watch them [21]. We also assumed that the mechanism for turning off the cameras showing your space would be tied to the physical structure of your space. For example, the glance camera, used by others to see if you are in, might be mounted in your door frame: close the door and it would be impossible for outsiders to see in.
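
As a toy sketch of these two safeguards, reciprocity (you can always see who is watching you) and physical gating (closing the door cuts off the glance camera), with names that are entirely our own invention:

```python
# Toy model of the two privacy safeguards described above. All class and
# function names here are hypothetical, invented for illustration.
from dataclasses import dataclass, field

@dataclass
class Office:
    door_open: bool = True
    watchers_seen: list[str] = field(default_factory=list)

def glance(office: Office, viewer: str) -> bool:
    """Try to glance into an office through the door-frame camera.

    Succeeds only while the door is physically open, and the occupant
    is always shown who looked in (the reciprocity safeguard).
    """
    if not office.door_open:
        return False                     # closed door: camera cannot see in
    office.watchers_seen.append(viewer)  # occupant sees the watcher, too
    return True

office = Office()
glance(office, "O'Connor")
office.door_open = False
assert not glance(office, "O'Connor")    # closing the door blocks outsiders
print(office.watchers_seen)              # ["O'Connor"]
```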

The privacy invasion we wanted to show in the film would be based neither on a design stressing functionality over privacy, nor on someone "getting around" the system to spy. Rather, we wanted to show the result an unexpected event could have even with a well-protected system.

We considered several unexpected events. The event would have to suddenly thrust a character into an intimate moment, inadvertently shared by everyone on the network. It would have to make the film’s spectators feel that they had ceased being viewers and had suddenly crossed over to being voyeurs. At the same time, we didn’t want a lot of letters of complaint coming in after the film was released.

We finally settled on a scene involving Molly, a character on Julie’s design team. Her boyfriend quite unexpectedly arrives at her office and proposes. We thus broadcast a tender and private scene that Molly might not have chosen for her co-workers to be watching, had she had the time to consider. We expect that some spectators may extrapolate to the possibility of less tender and more startling interactions initiated by a young lover, interactions inadvertently hurled at 186,000 miles per second around the world, leading to the necessity of updating résumés. Others may consider the less career-threatening embarrassment of being seen a continent away while hiking up one’s pantyhose or cleaning one’s nose. With or without such extrapolations, we hope the scene will engender discussion.

Film is a powerful medium, capable of either showing perfection, thereby stifling discussion, or showing imperfection, thereby promoting debate. In building a video prototype, we felt an ethical imperative to show the limitations of our designs.

CONCLUSION

Video prototyping is a powerful medium for communicating not only the functionality, but the spirit and personality of a new application or computer. It eliminates all the limitations of computer prototyping, but at the expense of introducing a number of seeming advantages that can work together to lure the prototyper away from the possible toward the land of fantasy.

We found that by adhering to the guidelines we developed, we were able to produce a drama with a strong story line, a large number of clear and definite messages, and a sprinkling of controversial elements, all wrapped in a video prototype that still demonstrated the fundamentals of an implementable new design.

High-budget video prototyping is a new field, and we are confident that those who come after us will improve greatly upon what we have done. We offer these observations and guidelines as a platform from which they can begin.

REFERENCES

