Wizard-of-Oz Testing

Interface and Task Descriptions

Both of our interfaces used Google Earth projected on a large display. Both required the user to navigate through campus locations using our navigation gestures (stepping a single foot forward or backward to walk, and turning the shoulders to rotate the perspective left or right). We wanted to see whether the gestures 1) were easy to remember and execute and 2) felt natural. We chose the Main Quad and White Plaza because we expected most students to be familiar enough with these locations that they could focus on navigating and using our gestures.

Interface 1: Annotative Recording/Explorative Playback

The first interface centered around exploring locations and annotating useful information. We set the user in the Main Quad and directed them to 1) find a multimedia artifact (a YouTube widget) and “listen” to it, and 2) find the Rodin sculptures and record a commentary that specifies how many sculptures there are.

[Figures: The Listener, The Point, The Microphone]

Task 1 had one independent variable: the “listen” gesture used to play back a multimedia artifact. Case A’s gesture was to cup a hand to an ear, imitating the gesture people often use when they need to hear something better. Case B’s gesture was simply to point at the multimedia artifact with one arm. Similarly, Task 2 had one independent variable: the gesture used to record a voice annotation. Case A involved holding a fist below one’s face, imitating the use of a microphone. Case B was again simply to point at the object of interest with one arm. Through A/B testing, we hoped to find out whether users prefer many individual gestures tailored to specific functions, each playing off a common mental mapping, or fewer simple, generic gestures such as pointing and selecting, in which case the application would decide which function to execute based on the selected item.
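To make the Case B idea concrete, here is a minimal Python sketch of how an application might choose a function from the type of item the user points at. The `Item` class, the item kinds, and the action names are hypothetical and only illustrate the idea; in our Wizard-of-Oz sessions a human operator made this decision.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """A selectable object in the virtual scene (hypothetical model)."""
    name: str
    kind: str  # assumed kinds: "media" or "landmark"


def action_for_point(item: Item) -> str:
    """Map a single generic point gesture to an action based on the item kind."""
    actions = {
        "media": "play",                  # e.g. the YouTube widget in Task 1
        "landmark": "record_annotation",  # e.g. the Rodin sculptures in Task 2
    }
    # Pointing at anything unrecognized does nothing.
    return actions.get(item.kind, "ignore")


if __name__ == "__main__":
    print(action_for_point(Item("YouTube widget", "media")))
    print(action_for_point(Item("Rodin sculptures", "landmark")))
```

With this kind of mapping, the user learns one gesture and the item determines the outcome, whereas Case A requires one distinct gesture per function.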

Task 2 also required the user to look at the Rodin display and count how many sculptures it contains. The point of this was to observe whether the user did anything interesting with their body or gestures to get a better view of the sculptures, beyond the basic walking and turning gestures we provided. We hypothesized that it might be intuitive either to walk toward the sculptures to see them better or to lean forward (or make a similar movement) to zoom in on them while remaining in one spot.

Interface 2: Immersive Tour Recording/Playback

Interface 2 centered around virtual tours. We set the user in front of Tresidder Union and asked them to 1) record and guide a tour around White Plaza to share with their parents, and 2) participate in another traveller’s recorded tour.


Both tasks involved using a clutch gesture to bring up an option menu that displays the functions available to the user. Task 1’s clutch required holding the left arm up at a right angle. Task 2’s clutch mimicked a birdwatcher’s gesture. As with Interface 1, we hoped to observe whether a simple gesture (Task 1) or one that imitates a gesture used in real situations (Task 2) is easier to remember and execute.

For Task 1, we asked the user to end by walking over to the Bookstore and turning 180 degrees to face the Claw, concluding the tour with a nice view. This was a simple way to force the user to use the shoulder-turning gesture to rotate the perspective left and right so that we could observe it more closely.

During Task 2, we asked the user to pause the tour and go off to explore on their own. This was primarily to see whether the user could remember to use the birdwatcher clutch gesture after the first part of the tour, during which the user performs no gestures.

Wizard-of-Oz Data

Interface 1

Subject 1: We noticed that while navigating through the interface, he turned and moved forward separately rather than concurrently: he would stop, turn, and then continue moving forward. We also noticed that he tried to point at the media object instead of using the gesture we provided to open it.
Thoughtless acts: scratching his chin; looking at Huyen

Subject 2: He successfully walked up to and played the YouTube video. He used very subtle shoulder movement to navigate. He turned his shoulders while stepping forward.
Thoughtless acts: rested his hands behind his back

Subject 3: His turns were much more pronounced than those of the other testers; he turned almost 90 degrees to move, and he also moved his feet while doing so. He got lost (walked into a building) but successfully found his way back using the gestures. After activating the playback button to play back a tour, he relaxed his posture and began to watch and enjoy the tour.
Thoughtless acts: scratching his nose (False positive!)

Interface 2

Subject 1: The clutch to record took time and several tries to explain. He was unsure if he had to hold the gesture to continue recording, or if he could put his arm down. Also, he did not know where the Claw was, so he could not complete the final task. He tried using the same hand to activate the clutch and point.

Subject 2: He kept the “playback” clutch while he was recording, instead of putting his arm down. We were surprised to see that he put his arm down when he wanted to end the recording (ordering effects?). Also, he did not know how to discard a recording or review his recording.
Thoughtless acts: crossing his arms; scratching his head

Subject 3: He had no problem using the clutches. While he was moving, he would subconsciously lean his body to keep straight/avoid obstacles.
Thoughtless acts: scratching his arm
His suggestions: What if users could point at an object to annotate it? What if there were gestures to re-center the camera? What if a cursor were added? What if the length of the step in the move-forward gesture determined the speed? He also wished for visual feedback indicating that recording is taking place (e.g., a timer).

Clutch Gesture: Holding the left arm up for the clutch was tiring. Subject 1 held his elbow lower, while Subject 2 let his forearm hang loosely.


Learnings and Revision Plans

We realized that users have a strong need for visual feedback. While we put a lot of emphasis on making sure that each gesture provided immediate feedback in terms of interactive navigation and movement, we did not provide visual feedback for our sub-tasks, such as recording or playing back a video tour.

In addition, we noticed that our gestures were based mostly on foot and shoulder tracking, leaving the user’s hands free for potential hand-based gestures such as pointing. Most users used their hands to point at objects while they explored the virtual space. We also realized that some users touched their faces often, which could trigger false positives and argues in favor of keeping gestures body- and foot-based. Users said that an on-screen cursor or pointer would have helped them interact with the multimedia artifacts. A help button or gesture that explains the available gestures, and a gesture to re-center the camera, are also potential features we should implement.

Regarding the clutches, users generally understood the playback clutch, but the recording clutch was more confusing. We believe the required sequence was not entirely clear to all users: the user had to keep the left hand up with the fist closed while pointing with the other hand to select an on-screen item, and then put the left hand down. We will try to come up with additional gestures that bring up an on-screen menu which disappears once the user has made a selection.
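The revised behavior we have in mind, a menu that appears on a clutch gesture and disappears as soon as a selection is made, can be sketched as a small state machine. This is only an illustration under our own assumptions: the `ClutchMenu` class, its method names, and the option labels are hypothetical, and gesture recognition itself is out of scope.

```python
class ClutchMenu:
    """On-screen menu opened by a clutch gesture and closed on selection."""

    def __init__(self, options):
        self.options = list(options)
        self.visible = False
        self.selected = None

    def on_clutch(self):
        """Clutch gesture detected: show the menu and clear any old choice."""
        self.visible = True
        self.selected = None

    def on_select(self, option):
        """Selection gesture detected: accept it only while the menu is open."""
        if not self.visible or option not in self.options:
            return False
        self.selected = option
        # The menu hides itself, so the user need not hold the clutch pose.
        self.visible = False
        return True


if __name__ == "__main__":
    menu = ClutchMenu(["record", "playback"])
    menu.on_clutch()
    menu.on_select("record")
    print(menu.visible, menu.selected)
```

Because the menu closes itself after a selection, the user would not have to remember when to lower the arm, which is the step that confused our testers.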

Most importantly, we realized that having two different interfaces for small- and large-scale recording and playback of events did not really make sense to the users. Regardless of the interface we proposed, users felt that all the gestures could be used in both scenarios, which led to confusion. For instance, they used the microphone gesture both to annotate a virtual object and to record an entire tour, and they also used the “start recording a tour” gesture/clutch combination to annotate an object. Therefore, we will revise our interfaces and gestures to arrive at a single interface that makes sense for all our proposed scenarios.

Interface Reflections

On a more general level, we polled our participants on the overall “appeal” of our interface and product. We found that our initial product model of both recording and going through tours was not as appealing as we had originally thought. Most participants were more interested in navigating through Google Earth/Street View, an experience most described as “really cool” and “awesome.”

In our conversation with our CS247 coaches, Christine Robson and Josh Weaver, we brought this up, and they shared the intuition that “tours” were unlikely to be an appealing product model. Instead, they advocated adopting a more “immersive,” gestural movement interface in a virtual world.

As such, we are taking many of the lessons learned from the Wizard-of-Oz prototyping about the suitability of gestures and applying them to a slightly refocused objective. We are now focusing our efforts on a more natural user experience for navigating a virtual world. We are also interested in exploring a closely related area that our users as well as Christine brought up: annotating a virtual world with data streams such as geo-tagged Tweets and photographs. Our goal is to design a gestural interface for navigating through a data-augmented virtual reality.