The Proximity Interface and Human Computer Interaction

2004
Mike Leggett

Abstract

The tools with which the media artist works and the infrastructure within which the artwork is
made and exhibited are critical determinants of how work is received and considered. This paper
will build upon earlier investigations by myself and others into interactive art installations as models
for informing the development of HCI. These areas of practice-based research, and the resources
available online for developing solutions based on modular electronics, suggest there
is common ground for scientists and artists to explore in revising the interface as an
experience built from components – of presence, of devices and of code.

(Presented at the Biennale of Electronic Art Perth)

Pathscape (Strangers on the Land – WT Sontel)

2000
Mike Leggett

“The Australian people are mostly newcomers. They and their land must form a bond …. otherwise we will always remain poor, confused strangers in our own lands.”
Tim Flannery ‘The Future Eaters’

Concept
Land is central to Australian culture and history. For indigenous people it is the source of spiritual as well as material nourishment, and has been for more than 40,000 years. As a predominantly urban culture, much of what Australians experience and understand about the land is conveyed and interpreted to us by a whole range of media: cinema, television, painting, photography, etc. This mediation process places a frame around the subject, whereby ‘the land’ becomes landscape, an object for distant appreciation.
“Landscapes are culture before they are nature; constructs of the imagination projected onto wood and water and rock.” Simon Schama
The series of narratives, commentaries and interactions which are encountered by the user explore the transitions that occur between people and the land, the individual and the landscape, place and memory. The interactive design allows each individual’s cognitive and assimilation processes to operate in correspondence with what is experienced.

Interface: the Experience

The interface design approach is demonstrated in the prototype and based upon three principles:

•    a rapid, experiential encounter with a familiar landscape, poetic to the senses, with different narratives and different voices speaking from various perspectives: it is vivid but unsettling.

•    a more measured pace which, like a pause during a bushwalk or a break from a task, encourages reflective thought on conjectural, even contested, information: it is didactic but in the active sense, like absorbing a well-constructed novel or examining an archaeological site.

•    a text-based point of access which enables the narratives, and the information and images they contain, to be explored using links based on sources, word associations, indexes and titles.

“I am before a moving image – it is an image of the sea, the horizon line bisecting the frame of the image, top to bottom – the surf rolls in, endlessly. ”
Interface: the Audience

The user determines the degree of their involvement in this process by being able to identify and select different ‘levels’ of immersion. The experience can be about enjoying the sound and image which construct this multimedia landscape, and it can become a resource tool for gaining knowledge and insight into the contemporary and historic environment, an interactive documentary about this time and place.

Interaction

The prototype has been developed with an interface and navigation system which enables the user to enjoy a rich visual diffusion of landscape images collected from NSW South Coast locations. The interface design provides a pleasurable experience and then, as an option, provides intuitive access to knowledge and information related to that experience, via the path through the landscape or through the text-based ‘sources’ feature.
The many stories, both historical and contemporary, which lie hidden in the landscape compel the user to piece together the real picture, often at variance with the image of landscape, a picture much richer than a simple backdrop to events.

Content

The options for interaction offer a choice between the experiential and a combination of the experiential and the knowledge-based. The choices are governed by gestures made with the mouse and may respond to questions such as:

What lies behind the beach?
What lies in the Bush?
What is obscured by what I see?
What lies beyond what the eye can see?

Motivation for the short-term encounter or sustained involvement over the long term will rely on a compelling interactive process which leads the user through a series of remarkable encounters with short, narrative sound and image sequences, offering the user, as an option, an engagement with areas of knowledge designed to intrigue and inform, linked together by the landscape in view.

The content will be conveyed through a series of discourses encountered at the various Nodes within each Zone. The two broad areas of knowledge arising from human interaction with the material circumstances of this country, the contemporary and the historical, part fact and part belief related to this landscape, will form the researched substance of this fully developed version.    

Zones

The zones form the skeletal structure for exploring the landscape, its appearances and its stories – each zone is signalled by colour-coded margins which frame the central image.
The number of zones traversed will expand from the six in the prototype to twelve in the full version:

0    Sea and Headlands        Not in prototype
1    Beach                    In prototype
2    Creek                    In prototype
3    Dunes                    In prototype
4    Light Bush               In prototype
5    Wetlands                 In prototype
6    Rainforest               In prototype
7    Highway                  Not in prototype
8    Rainforest Gulleys       Not in prototype
9    River Flood Plain        Not in prototype
10   Ranges Slopes            Not in prototype
11   Ranges Peaks             Not in prototype

Content Threads Summary

1.    Anecdotal and contemporary evidence:
    a) ‘Living on the land’: indigenous and non-indigenous anecdotal accounts.
    b) representations through popular culture: movies, publishing, advertising, etc.

2.    Historic and other empirical description:
    a) Historical: local, national and international archives, including official recorded colonial history, recorded personal history both indigenous and non-indigenous, and reports by media both local and national.
    b) Geographical: topographical, flora and fauna, farming and mining, industry and commerce, settlement, etc.

3.    Ideas and Analysis and the authorial presence: the function of the chorus or benshi – comment, conjecture, and projection – moving in scale outwards from the local and the specific to the global and the general. This will be effected in part by the use of the Web search engine feature to deliver links that will stay current with issues raised by the interactive.

Development from Prototype
 
Content development will affect three aspects of the production:

•     the density of material within the already established zones 1 – 6;
    For instance, expanding the number of stories in the beach zone could include further ‘first sightings’ narratives based on imaginative interpretations of entries in the diaries of the crew of the Endeavour during their three-year expedition in the Pacific – dreams of riches, of salvation, of the erotic. These would be set alongside narratives dealing with the lives of the Yuin and Thawal peoples who would have lived in these places at that point in time, stories based on oral histories and upon reports made by anthropologists and archaeologists.
•     further collaboration with the Budamurra Aboriginal Corporation to develop narrative sequences (such as above) that describe the land as ‘country’ from the perspective of indigenous people in this region who are its traditional custodians and who have the authority to relate stories and lore.
•     the gathering of fresh images and sounds to represent the landscape zones up to and including the coastal ranges. To characterise this development, the non-indigenous narratives encountered in zones not covered in the prototype will include:

The Road Builder’s Stories – like the surveyor, measuring and cutting the bush to liberate it from the ‘chaos’ of underdevelopment, without reference to traditional owners or proposed users, linking one urban centre to another.
The Story of the Bitumen Thread that circles the continent and the stories of the travellers who use it – the truckers, the tourists – where are they going? Why are they on the move? What will they find at their destination? These are stories which bind this region to other parts of Australia and the rest of the world – the Easter Show and the Reconciliation March in Sydney, the AFL Grand Final in Melbourne, the woolens factories of Korea, Italy and Britain.
The Motorcycle Cop’s Stories – several narratives from a retired magistrate who sat with the first Aboriginal JP in NSW and in his younger days was a patrol officer covering the whole of the South Coast.
The Marketer’s Story – the grower of beans, other vegetables and fruit, the employer of indigenous people and itinerant swagmen in the early days of colonisation, who is now part of agribusiness and using modern technologies.
The Bloodstock Breeder’s Story – the horse enabled the settlers to colonise the land, to work and exploit it and then, as motor transport and road-making replaced animals, the horse became the focus of the new industries of pastimes, hobbies and gambling.
The Riverboatman’s Story –  using the inland waterways for the movement of settlers and their goods to and from the interior during the 19th Century. River navigation changes to farming fish and oysters and meeting the demands of tourists during the late 20th Century.
The Logger’s Story – set in the coupes of the State Forests, where the struggle for sustained employment and a sustained ecology is fought out between timber workers, biologists and greenies.
The Grazier’s Story – the dairy for Sydney, and grower of beef, lamb and wool for the world.
As in the prototype, these stories would be examined from the different viewpoints of historical and contemporary polemics – see Content Threads Summary.

Indigenous People

A meeting with the Ulladulla Land Council and the Budamurra Aboriginal Corporation in Ulladulla at which the prototype was demonstrated has led to the Land Council expressing a desire to contribute stories to a full version. A non-exclusive licence to include these stories would be purchased. In addition, we are currently researching ways in which Budamurra could become the producers of the narrative sequences through a related customised training program designed with advice from Metro Screen, Sydney.
Should Budamurra not be in a position to produce all sound and image material then the project producers will provide copies of material collected during Budamurra sequences and the final production, to present to the community and to the archive of the AIATSIS library.
The aim of the project is to retain and develop the methods of consultation and collaboration with the Budamurra Aboriginal Corporation that have existed amongst the crew and copyright holders during the making of the prototype. This will preserve the integrity of stories licensed to the production, and their context, within the structure of the overall work.

Content Research and Copyright

The ‘content assets’ database has been established during Phase One of PathScape as a production and project management tool. It enables efficient storage of ‘raw’ material – images, sound, graphics and text, complete with source and copyright information – prior to selection for usage. For the Sources level of PathScape, text is transferred directly from the database into the authoring tool. The tracking of potential rights payments to, and permissions from, copyright holders will be further developed in this system, though it is envisaged that much of the historical material will be outside copyright. The active collaboration of scholars in the field (Healy, Carter, Goodall et al.) will also be sought.
Kathryn Wells, the researcher, will liaise with Budamurra to develop cultural protocols and a contract for the production or joint production of suitable material based on non-exclusive rights to stories. The project will respect Budamurra’s desire for overall product integrity and benefit, including, in the process of production, the contribution of multimedia production knowledge and skills by crew members to Budamurra community members.

Interface Programming

Following adjustments and improvements to the programming from the first prototype (Sontel), further improvements to the second prototype (PathScape phase 1 – see attached report) are proposed:

• the ‘speed’ at which the image in the central screen moves will be made variable in relation to the position of the mouse cursor, viz. starting movement slowly and then accelerating as the mouse is moved progressively toward the edge of the screen. This will assist in the ‘capturing’ of margin images and the launching of narratives – a sketch of this mapping follows the list below.

• a sample of sound from a narrative corresponding to a margin image ‘captured’ will be looped, before clicking to launch that narrative.

• margin images will indicate narrative options in the 360˚ morphing pans.

• the colour-coded buttons which offer options at the end of each narrative will ‘grow’ from dots to almost fill the screen – see images above and below.
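As an illustration of the cursor-to-speed behaviour described in the first point above, the sketch below shows one way the mapping could be computed. It is a hedged approximation in modern Java rather than the prototype’s authoring environment, and the quadratic easing curve and maximum speed are assumptions, not values taken from the prototype.

public class PanSpeed {

    /**
     * Maps a horizontal cursor position to a playback-speed multiplier:
     * movement starts slowly near the centre of the screen and accelerates
     * toward the edge, as proposed above.
     */
    public static double speedFor(int cursorX, int screenWidth) {
        double maxSpeed = 4.0;                               // assumed upper limit
        double centre = screenWidth / 2.0;
        double offset = Math.abs(cursorX - centre) / centre; // 0.0 at centre .. 1.0 at edge
        return maxSpeed * offset * offset;                   // quadratic ease-in
    }

    public static void main(String[] args) {
        int width = 800;
        for (int x : new int[] {400, 500, 700, 799}) {
            System.out.printf("cursor x=%d -> speed %.2f%n", x, speedFor(x, width));
        }
    }
}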

Index Menu amendments will include:

•    fully functioning keyword index – each narrative sequence will have between one and five keywords associated with it. After a keyword or phrase is selected, a list of Stories will be displayed which share that same word or expression – see illustration below, and the sketch following this list. Stories can then be launched by clicking on an item in the list;

•    improvements to the appearance of screen layout, use of font and colour, design of scroll bars, print option, etc;

•    pre-scripted Web Search options may include the ability of the user to amend the recommended boolean search string;

•    the ability to print material from the Transcript screen – see illustration above.
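To make the keyword mechanism concrete, the sketch below shows a minimal inverted index of the kind described: each narrative sequence is registered against its keywords, and selecting a keyword returns every story sharing it. This is an illustrative sketch in modern Java, not the Director implementation, and the story titles and keywords are placeholders drawn loosely from the narratives listed earlier.

import java.util.*;

public class KeywordIndex {

    // keyword -> set of story titles sharing that keyword
    private final Map<String, Set<String>> storiesByKeyword = new TreeMap<>();

    /** Registers a narrative sequence against each of its keywords. */
    public void addStory(String storyTitle, String... keywords) {
        for (String keyword : keywords) {
            storiesByKeyword
                .computeIfAbsent(keyword.toLowerCase(), k -> new TreeSet<>())
                .add(storyTitle);
        }
    }

    /** Returns the stories that share the selected keyword. */
    public Set<String> storiesFor(String keyword) {
        return storiesByKeyword.getOrDefault(keyword.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        KeywordIndex index = new KeywordIndex();
        index.addStory("The Logger's Story", "forest", "employment", "ecology");
        index.addStory("The Grazier's Story", "farming", "wool", "ecology");
        System.out.println("ecology -> " + index.storiesFor("ecology"));
    }
}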

Sound

A stereo track will be introduced to the sound design, running in conjunction with parallel mono tracks and ‘spot’ effects to provide a richer sound presence tied closely to zone character and ecology – volume level could also become relational to location within a Zone. Budamurra and Bruno Koenig, the sound designer, have both indicated a desire to collaborate closely in this respect. Stereo sound design will also considerably enhance the sense of place during the morphed pans, for instance.
In close collaboration with the interface programmer, sounds to indicate sequence transitions and to confirm option-taking, together with the creation of silent spaces, will extend the dynamic operation of sound throughout the process of interaction.
Spoken Text

Describing the natural world, e.g. a geological setting, or relating the events of the past, e.g. the development of the timber industry, will be communicated through the narrative form rather than the lecture form current in both prototype versions. The Geologist’s Story, The Timber Worker’s Story, The Sawyer’s Story, etc. will be related in the first person or as a dialogue between two or three people, and will be broken into shorter sections – thus the user will have the option of discovering more about the off-screen character by using the ‘circles’ menu at the end of each story, in addition to the sections being directly accessible through the index/content section. A greater variety of voices will read the prepared texts than was possible in the prototype, in order to create more surface and expression and convey the sense of many players who pass through the landscape.

Exhibition, Distribution & Marketing

The audience will have a broad set of interests in art, ecology, history, social relations, media study, communications theory, etc. They are most likely to be working in industries such as visual art, multimedia, education, conservation, tourism, government, publishing etc.

The complete version of PathScape is regarded as serving several objectives for different audiences and thus has a full range of exhibition and marketing options available, including:

•     installation in a gallery or museum with large-screen projection and surround sound to enrich the experiential aspect. The software design would enable migration to a high-end platform to improve performance and image and sound quality; with further funding and research, a sensing system based on spatial zone proximity sensors or ultrasonics could be developed, removing the need for mouse control of the interface.

•     duplication and inclusion as part of an existing teaching kit for secondary and tertiary students or within the development of a new teaching kit.

•     distribution and marketing on-line and through Websites concerned with the related themes and issues – ecology, tourism, social history, indigenous affairs etc.

•     marketing as consumer item at point-of-sale in tourist retail areas.

Careful design of the software has also made it possible for different data sets of sound and image files to be substituted, enabling other landscapes (both ‘real’ and metaphorical) to be explored. Thus the project can in effect be used as a third-party software tool with full functionality, enabling navigation of different kinds of ‘content’ – this would be achieved through a licensing arrangement and contracted management and programming services.

Summary of Interface Design Features in Prototype

Navigation in the prototype is centred on the following (a sketch of the gesture mapping follows this summary):

gesture, to control direction of travel through the landscape (Level One)

•     a forward movement with the mouse to make the central image ‘go’ forward;
•     a movement back to centre to stop the image;
•     a backward movement (with the cursor at the bottom of the screen) to ‘turn’ through 180˚ and look backwards;
•     a further backward movement to make the central image ‘go’ backwards;
•     a movement back to centre to stop the image.

selection, to take a branch from the pathway to listen to a short story or music or watch a movie (Level Two)

•     a movement of the mouse to the left (or right) to provide a 360˚ panorama of the zone through which the user is passing;
•     ‘capturing’ one of the images that appear at the edges of the frame (by halting the movement of the central moving image that conceals their central section) and then clicking to launch a story.

options, to move deeper into a story (Levels Three and Four), or return to the pathway

•     the blue, yellow and green buttons are colour-coded to correspond with Anecdotes, History and Commentary/Analysis – delivered as movies, slide shows, or audio with user control of picture framing – which develop, extend or provide background on what has gone before;
•     the red button moves back to the previous level. This move is also possible at any point in a narrative sequence by dropping the mouse/cursor to the bottom of the screen. Time-outs likewise move back through the levels to the Surf screensaver movie.

resources, that enable the material assembled for the interactive – the ‘content’ – to be accessed from the ‘backend’ (Level Five)

•     the black button provides access to data about picture and sound sources, story contents, transcripts, keyword index, Web search option and bibliography.
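In outline, the gesture mapping summarised above can be thought of as a dispatch from cursor position to navigation action. The sketch below is a simplified approximation in modern Java, not the prototype’s code: it reads screen position only (the prototype also responds to the direction of mouse movement), and the region boundaries are assumptions.

public class GestureDispatch {

    enum Action { GO_FORWARD, STOP, TURN_180, PAN_LEFT, PAN_RIGHT }

    /** Maps a cursor position to a navigation action, following the Level One and Two gestures above. */
    public static Action actionFor(int x, int y, int width, int height) {
        if (y > height * 0.9) return Action.TURN_180;   // bottom strip: turn back / drop a level
        if (x < width * 0.15) return Action.PAN_LEFT;   // left margin: 360-degree pan left
        if (x > width * 0.85) return Action.PAN_RIGHT;  // right margin: 360-degree pan right
        if (y < height * 0.4) return Action.GO_FORWARD; // upper central area: travel forward
        return Action.STOP;                             // central area: halt the image
    }

    public static void main(String[] args) {
        System.out.println(actionFor(400, 100, 800, 600)); // GO_FORWARD
        System.out.println(actionFor(50, 300, 800, 600));  // PAN_LEFT
        System.out.println(actionFor(400, 580, 800, 600)); // TURN_180
    }
}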

The various levels which the user can explore can be summarised with the following diagram:

Technical Development

The project has developed several innovative uses of the authoring software during Phase One of PathScape, including the use of .xml and .dcr files. The re-design of the folder layout and links has not only improved the handling of the interface but extended its ability to respond to input from the user – for instance, the desire to know about sources of material seen and heard, and the ability to access material from a text-based content/indexing vector.

In addition, by defining resources as external media, the project gains great flexibility when it comes to means of delivery. Besides functioning as a one-to-one CD-ROM in the home or classroom, the interactive could be re-purposed at modest cost for access on the Web, with different components on separate servers to improve performance. A forum or listserv function could be incorporated.

By developing its visual quality and user interface (e.g. spatial sensing) and utilising projection and multi-channel sound, the immersive and experiential aspects could be amplified in a large open space or gallery, for the art and museum market. This could include, given its external media design, the migration of programming and content to high-end platforms.

Production methods will continue to develop more efficient means of assembling narrative sequences from component assets, including a move away from resource-hungry QuickTime movies made in labour-intensive Premiere, towards rapid assembly in Director of more efficient .dcr file formats.

 

Hypermedia for Portable Video Players (PVP)

2007
Mike Leggett & Shigeki Amitani

Abstract: In this paper we propose the exploitation of highly mobile, battery-operated portable video players (PVPs) for the retrieval of video associated with the location in which it may be used. Reporting on an earlier interactive multimedia location-based prototype, we assess the possibilities for specific ontologies of a taxonomy of indexing procedure which avoids text-based retrieval methods, using instead the mnemonics of image association. We outline the proposed development of PVP firmware and a related user application enabling users to construct indexing procedures appropriate to their needs, using a metadesign approach.

Keywords: hypermedia, video, video player, authoring

 

 

1   Introduction
The proposal emerges from current interdisciplinary research into machine memory as a context for understanding its relation to human memory and methods for storing and retrieving movie files. It proposes an approach to indexing audio-visual media utilising an ‘index-movie’ file as the taxonomy of the indexing procedure, to which related movie files are linked. An interactive experimental prototype, PathScape, has provided initial evaluation of the concept using a real-world time-space representation as the basis for indexing. Further practice-based research approaches to user-defined storage and retrieval systems for the video iPod and other PVPs, as advanced portable video systems, will be described.
The proposal is for the PVP user to interactively navigate the linkages between movie files, either as an exploration of a creative maze, or as a means of recalling a particular series of operations, directions, sequences explained in pictures and sound, but under the direct and immediate control of the PVP user. This feature will enable complex data structures often represented visually – land surveys; mining topographies; design or biological sequences; architectural spaces; construction progress; cultural artifacts; etc – to be made accessible relationally rather than sequentially.
Whilst positioning a pointer on a visible timeline provides instant access to a particular part of a movie in a conventional computer-based movie player, this is not an option on PVPs. However, the visibility of images during high-speed spooling on a PVP could assist in locating entry points to a hyperlinked movie system utilising frame-number metadata and mnemonics. An indexing approach of this kind implies special concerns in the design of such a system, for individual, specialised and public groupings and communities, for which metadesign approaches are being developed.
2. Navigation Principles
Interface design for multimedia databases has been the subject of investigation by earlier researchers for desk-based systems, though few have managed to avoid the use of words or on-screen graphical devices to aid navigation [1, 2, 3]. Experimental approaches by artists have included Twelve of My Favourite Things, effecting navigation using a touch screen over an image composite of three movies linked to other movies related by a colour selected on the screen. [4] In the late 1990s a website appeared documenting the Exeter Cathedral Vaulting: “There are two main routes into the material, Visual and Verbal. … The Visual route is for those who are more at ease with images than text.” The ceiling, built in the 14th Century, used the vaulting bosses as a mnemonic system related to the stories, both sacred and profane, of an oral culture in the West Country of England of the time. The designers of the website echoed the memory system by using a plan of the vaulting and its bosses to access the database containing detailed photographs of each item, together with several layers of metadata. [5] More recently the Digital Songlines project at the Australasian Centre for Interaction Design in Queensland uses graphical representations familiar in game engines to map the GIS data relevant to ‘country’ and cultural artefacts related to an indigenous community. [6]
The principle of this taxonomy does not seek to index video libraries or collections, nor provide machine-based ‘importance sampling’. [7] The concept of detail-on-demand is a means of working with specific video material that avoids “…having to use a separate interface such as keyframes or a tree view”. [8] As a means of navigation it has been explored by others [3, 9, 10, 11] based on earlier experiments with video and hypermedia theory [12].
The central novelty of this approach to mnemonic movie indexing is to enable an accelerated usage of movie-based data or information. The movie being watched will provide the link to the related movie(s), without the need to return to scroll a text-based index menu at the root. It will enable PVP users to engage interactively with videos, using links to move from one movie to another according to relational rather than sequential connections.
These approaches overlap with those of the Greek orators and rhetoricians who, before the alphabet had been handed down, developed an elaborate form of artificial memory, described so fully in Yates’ Art of Memory. Ars memoria, “…a series of loci or places. The commonest, though not the only type of mnemonic place system was the architectural type … We have to think of the ancient orator as moving in imagination through his memory building whilst he is making his speech, drawing from the memorised places the images he has placed on them.” [13] It could be claimed the first movies were a conceptual model made by the Greek rhetoricians, complete with wide shots, tracking shots, panning, tilts, close-ups and flashbacks. Played in the cinema of the mind’s eye, this first ‘classic film narrative’ guided the orator from theme to theme, detail to detail, by associating each element of the speech with the loci and the objects placed there, visible only to him.
2.1   Pathscape
Our familiarity with cinema and the reading of Cartesian spatial representation is exploited in the PathScape prototype system. It explores through demonstration, a means for augmenting human memory for the purposes of storing and retrieving movie files. The detail-on-demand principle employed however, has no overarching narrative, but a series of interactive option prompts. These access movie files in the system using a taxonomy based on fragmentary images, sounds, colours and shapes. The ‘index-movie’ file (I-MF) produces apparent motion in a central image for forward direction along an X-Y axis, perceived as a movement ‘into’ the cinematic space recorded, a landscape.

Figure 1: Screen images
The movement is controlled by gesture, using a mouse in the prototype (Figure 1 & 2) to ‘move’ towards point X accessing file I-MFX; by gesturing to the central image, movement ceases; gesturing to the bottom of the screen instantly loads I-MFY movie file, swinging the image through 180˚ to return along the path previously followed towards point Y.
Figure 2: Screen area images and Cursor gesture outcomes
The taxonomy of the Path which the user traverses is ordered by three indexical devices. Two are located in the border area that surrounds a central image. The first level of indexing is within this border and seen at particular points as fragments of images, visible for short durations. These indicate a nodal junction which, when ‘captured’ by using gesture to halt movement in the central image, will enable with a click, the launch of a movie and associated sound from the database, replacing the central image movie of movement along the path.
Thus along the X-Y axis are the 1, 2, 3, … 4, 5 etc. interactive options, ‘narrative branch nodes’, which in effect are groups of movie keyframes representing a locus or location linked to an associated movie file. (Figure 3)
The second device uses changes in background colour in the border area and background sound to signify changes of zone. (In this prototype different colours represent different ecological zones). When a colour is visible in the border, gesturing to the left or right of the screen will launch the movie of a 360˚ panning movement of the landscape, (Figure 1 & 2) a movie representation of the zone through which the user is currently ‘passing’ –  gesturing to the right will pan right, to the left will pan left : AA, BB, CC … FF. (Figure 2 & 3) Within the pan will be ‘found’ further narrative branch nodes from where to launch movies set during the authoring process, associating each movie with the visible appearance of each locale.
Figure 3: Schematic for accessing database
At the completion of a narrative, the third indexical device appears as a series of circle shapes over the final frame of the movie. Blue, yellow, brown and green circles function as ‘buttons’ to linked topics, colour-coded to symbolically represent a narrowing of the index path from the broad to the specific. [14, 15]
The encounter in this prototype enables the user to orientate within a given topography in a way not dissimilar to a regular route followed in the country or the city. Similarly, interaction with the surroundings reveals hidden evidence, concealed information and comment, delivered as stories, as samples of discrete information enabling the interacting subject to put together knowledge of this place through the information gathered. The interactive process proceeds not through query structures addressed to a database, but through embodied gestures, using the relational terms “more, same, less” within an interface of mnemonic cues to linked movie files. The experience is a procedure of constructing meaning through familiarity, as part of a gathering process that adds to the individual’s knowledge base accumulated during this and subsequent visits.
2.2   Prototype Outcomes
The prototype explored the means and the cinematic syntax of creating a multi-layered representation of the landscape, through time as well as space. As a multi-voiced ‘interactive documentary’ in which the visitor has agency to ‘move’, to order the stories and to control the depth of detail retrieved, the prototype revealed four main areas of response:
visitors who wholly embraced the visual and navigational experience together with the knowledge building process;
visitors who wholly embraced the experience without much concern for the documentary and informational aspects;
visitors for whom the knowledge acquired was unacceptable and without authority or specificity;
visitors who resisted the responsibilities of interactive engagement.
The prototype demonstrated a wide range of responses from users but most acknowledged the novelty and applicability of the approach to a field of their interest. This indicated to us the need to develop an authoring tool that would enable individuals and groups to design their own system for linking their movies.
2.3   Video Acquisition
The prototype was completed in 2000 and since that time the video data stream has become more ubiquitous. Whether generated by a digital video handycam, a mobile phone, a web-based stream or download, optical media, broadcast television or video-on-demand databases, an ever-increasing amount of digital media images and sounds needs to be managed, whether for professional or recreational purposes. The PVP is an affordance for making use of the video data stream in a variety of ways in a range of ontological contexts.
2.4   Navigating the PVP
Codecs for video files and devices to handle them in creatively useful ways have developed exponentially. The Apple video iPod for instance, can store up to 3 hours of video playback and delivers high quality video using several codecs, 320 x 240 pixels at 30 frames per second with stereo audio. Interacting with the device is through gesture related to the navigational principles used in the Pathscape prototype (Figure 2) mapped to the device front panel (Click Wheel, Figure 4):

Figure 4: Click Wheel navigation controller on PVP
A simplified mapping, based upon the ‘stories in a landscape’ approach, will achieve similar outcomes (Figure 5); a sketch of such a mapping follows below:

Figure 5: Click Wheel mapped functions
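A sketch of such a simplified mapping is given below. Because the iPod’s firmware APIs are not public, the event and action names here are hypothetical; the sketch only illustrates how Click Wheel inputs might be routed to the Pathscape-style navigation functions suggested in Figure 5.

public class ClickWheelMap {

    enum WheelEvent { SCROLL_CLOCKWISE, SCROLL_ANTICLOCKWISE, CENTRE_BUTTON, MENU_BUTTON, PLAY_PAUSE }

    enum NavAction { ADVANCE_ALONG_PATH, REVERSE_ALONG_PATH, LAUNCH_LINKED_MOVIE, RETURN_TO_INDEX_MOVIE, HALT_OR_RESUME }

    /** Routes a Click Wheel event to a navigation action (hypothetical mapping). */
    public static NavAction map(WheelEvent event) {
        switch (event) {
            case SCROLL_CLOCKWISE:     return NavAction.ADVANCE_ALONG_PATH;    // scroll forward along the path
            case SCROLL_ANTICLOCKWISE: return NavAction.REVERSE_ALONG_PATH;    // scroll back along the path
            case CENTRE_BUTTON:        return NavAction.LAUNCH_LINKED_MOVIE;   // launch the movie at the current node
            case MENU_BUTTON:          return NavAction.RETURN_TO_INDEX_MOVIE; // back to the index movie
            default:                   return NavAction.HALT_OR_RESUME;        // play/pause toggles movement
        }
    }

    public static void main(String[] args) {
        System.out.println(map(WheelEvent.CENTRE_BUTTON)); // LAUNCH_LINKED_MOVIE
    }
}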
3   Metadesign and Authoring Principles
The use of consumer technology for productive as well as recreational purposes requires an adaptable design approach to the authoring process. Fischer and Giaccardi have shown that metadesign serves primarily the interests of the community of practice (CoP), the consumers, where the community of interest (CoI) is able to provide expert input to a complex design problem. Metadesign gathers potential from these convergences and becomes “…an emerging conceptual framework aimed at defining and creating social and technical infrastructures in which new forms of collaborative design can take place.” [16] The metadesigner as CoI, in working with the CoP, could advise in establishing a consistent (or even idiosyncratic) relationality for a specific collection of video files by advising on syntax, ‘a connected order or system of things’ [17], within an image-based indexing system.
In the context of using a modified consumer device to interactively produce outcomes based on relational rather than sequential ordering, it is important to determine the authoring principle of syntax to be applied in each design and authoring process. The authoring tool framework can then be applied to set the coordinates for the hyperlinking Node() governing the navigation options. Thus the design task can be seen to deal with mnemonic cues as much as the temporal aspects normally associated with ‘editing’ film or video (though duration will be part of that decision-making process).
We propose two approaches to the design of the system. The first, On-Board Authoring is effected on the device itself and is capable of setting very basic relationships between the (suitably compressed) movie files uploaded to the PVP. The second, Off-Board Authoring, is more generic and involves an application external to the device on which the files and their relationships are established using drag and drop procedures before upload to the PVP.
As APIs for iPod are not publicised, we have developed a simulation to indicate how users of iPod or similar PVPs could author and navigate movies. The system was modelled with Java v.1.4.2 on Mac OS X.
3.1   On-Board Authoring
As the PVP has a limited interface, the authoring operations need to be simple and incorporated within the device’s firmware. The prototype model has the following basic functions: (1) selecting; and (2) marking the related movies. The authoring operation is:

1. Select a file to use as the “IndexMovie”. (Figure 6)

Figure 6: Choose movie. Figure 7: Play movie.

2. Play >|| (Figure 7)
3. Push >|| to pause the movie at the point where a link is to be created.
4. Push “Menu” to see the movie list, and select another file to link to.
5. Push “Enter” to set the link, Node(), and return to the IndexMovie.
6. Play >|| to continue.
7. Repeat steps 3–6 to create additional movie links.
In this simulation the indexing information is stored as a simple text file recording the movie file name and the frame number from the IndexMovie for the PVP to reference during use. When the IndexMovie is played, a small arrowhead appears in the corner of the frame for two seconds to indicate where a linked movie can be played by pressing Enter. Otherwise the movie runs (at fast speed if desired, in either direction, as is standard on PVPs) until the next required indicator is reached. The function of the indicator becomes redundant as the user becomes familiar with ‘incidents’, or specific images in the movie. Operating as mnemonics, these enable the user to recall, and so launch, the hyperlinked movie connected to a Node() in the IndexMovie.
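The sketch below illustrates one possible form of that index file and its use at playback time. It is a minimal approximation in modern Java of the simulation described above, not its actual code: the comma-separated ‘frame,filename’ line format, the 30 frames-per-second figure behind the two-second indicator window, and the file names are assumptions for illustration only.

import java.io.*;
import java.util.*;

public class IndexMovieLinks {

    // frame number in the IndexMovie -> movie file linked at that Node()
    private final TreeMap<Integer, String> linkedMovieByFrame = new TreeMap<>();

    /** Loads "frame,linkedMovieFile" pairs from the plain-text index file. */
    public void load(File indexFile) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(indexFile))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.trim().split(",", 2);
                if (parts.length == 2) {
                    linkedMovieByFrame.put(Integer.parseInt(parts[0].trim()), parts[1].trim());
                }
            }
        }
    }

    /** Returns the movie linked at the given frame, or null if no Node() is set there. */
    public String linkedMovieAt(int frame) {
        return linkedMovieByFrame.get(frame);
    }

    /** True if the arrowhead indicator should show: a Node() occurred within the last 60 frames (~2 s at 30 fps). */
    public boolean indicatorVisible(int frame) {
        return !linkedMovieByFrame.subMap(frame - 60, frame + 1).isEmpty();
    }

    public static void main(String[] args) throws IOException {
        File temp = File.createTempFile("indexmovie", ".txt");
        try (PrintWriter out = new PrintWriter(temp)) {
            out.println("450,beach_story.mp4"); // hypothetical Node() at frame 450
        }
        IndexMovieLinks links = new IndexMovieLinks();
        links.load(temp);
        System.out.println(links.linkedMovieAt(450));    // beach_story.mp4
        System.out.println(links.indicatorVisible(480)); // true: within two seconds of the node
    }
}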
3.2   Off-Board Authoring
An off-board approach to authoring provides greater flexibility for linking, even to the extent of ‘cascading’ related movies without using one file as the key indexing file, as the Hyper-Hitchcock project has demonstrated. [8] The more recently demonstrated HyVal system uses authoring visualisation of video objects, metadata and the overall hypermedia document as parts of an Editor tool. Shot-detection algorithms effect a semi-automatic function, giving it great potential for working quickly with large video file collections or through using search-engine routines. [18]
The off-board authoring we propose for the PVP would employ a timeline similar to existing video editing applications, such as iMovie, as the receptor for the metadata associated with the linking options – a sprite dragged into position provides a pop-up window into which the linked movie thumbnail is dragged and dropped from the movie clip viewer. Following playback in the editing tool, adjustments and changes can be effected more easily than within the PVP itself.
4   Applications
Video acquired from many sources can be indexed using visual, non-text-based protocols, determined by the individual, group or corporation, at a level of complexity appropriate to the ontological context or immediate application. Practical applications would be characterised by a need for dynamic, non-linear navigation of movies – representing pedagogical issues for instance, or research data, media production study or methods, visualisation of spatial or temporal dimensions, etc. For example:
as a user-centred product design / protocol analysis / software architecture analysis aid, the PVP becomes a mobile research tool;
explaining the life-cycle of the frog: at various points in the tadpole’s development, the PVP as personal teacher is able to show the detail of a specific moment in that development;
the PVP as personal electronic tour guide enables the visitor to a place to determine, as with museum audio guides, at what point in a tour more detail is required;
for the redevelopment of a city area the PVP becomes a planning tool, capable of integrating video-based data with the location in which the data was gathered, at which it is later referred;
as the recreational device for which it was intended, the Singer Not the Song option will have the user command the iPod to view behind the scenes of the recording session and concert footage.
In the creative space of a classroom, the PVP – in the context of its well-promoted use as an entertainment and recreational device – will be promoted, in conjunction with an authoring tool, as a valuable learning system, engaging critical and creative assets amongst the student body.
5   Discussion
PVPs are ‘hard-wired’ devices with no facility at present for dynamic linking of the indexing movie(s) to external databases. Navigable media spaces of the kind described in which individual files can be accessed and / or updated from more centralised media resources and databases, become a ‘soft-wired’ installation possibility, using the appropriate protocols.
The user of the ‘mnemonic movie’ option on the PVP is also the designer. Design principles in each case will be approached according to the domain in which it will be employed. As a commercially marketable entity such as a music-based package, the design of the ‘bundle of files’ will reflect the ‘culture of connections’ of the target group. For a town planner, collecting data and compiling on-the-fly for examination by other stakeholders, the design approach will be different again. For the artist, hyperlinking will reflect a different set of issues to be explored by the interacting audience, as the mobility of the device enables the city or country environs to be used as the exhibition gallery.
6   Conclusion
PathScape, an experimental interactive prototype, provided initial opportunity to evaluate the concept of indexing audio-visual media utilising a real-world time-space representation as the taxonomy of the indexing procedure. We propose a system for the PVP user to interactively navigate the linkages between movie files as a means of recalling a particular series of operations, directions, sequences explained in pictures and sound, but under the direct and immediate control of the video iPod or other PVP. The feature will enable complex data structures often represented visually – land surveys; mining topographies; design or biological sequences; architectural spaces; construction progress; cultural artifacts; etc – to be made accessible relationally rather than sequentially.
The contemporary burgeoning usage of the video data stream – whether generated by a digital video handycam, a mobile phone, a web-based stream or download, optical media, broadcast television or video-on-demand databases – means an ever-increasing amount of digital media images and sounds to be managed, whether for professional or recreational purposes.
We have proposed two practice-based research approaches to authoring suitably prepared digital video files, either on-board the PVP or off-board, such that the hyperlinked prepared files are uploaded to the device for use ‘in the field’ by management and development professionals, or in the more familiar recreational ways for which the PVP is enjoyed.
References
[1]  Bolt, R. ‘Put That There’ Voice and Gesture at the Graphics Interface, (1980) Computer Graphics 4 (3) 262-270
[2]  Davenport, G., et al., Jerome B. Wiesner, 1915-1994: A Random Walk through the 20th Century. 1994. Accessed: 1.2.04 http://ic.media.mit.edu/projects/JBW/
[3]  Naimark, M. Place Runs Deep: Virtuality, Place and Indigenousness. in Virtual Museums Symposium. 1998. Salzburg, Austria: ARCH Foundation.
[4]  Hales, C., Portfolio Accessed 1.2.2006 from http://www.smartlabcentre.com/4people/coreres/chales.htm.
[5]  Henry, A. and A. Hulbert, Exeter Cathedral Keystones and Carvings. 1998. Accessed 1.9.04 from http://hds.essex.ac.uk/exetercath/
[6]  Leavy, B., Digital Songlines, Jones, J. Editor. 2004, Australasian Centre for Interaction Design, QUT: Brisbane.
[7]  Gatica-Perez, D. and M.-T. Sun. Linking Objects in Videos by Importance Sampling. in ICME’02 IEEE International Conference on Multimedia and Expo. 2002: IEEE.
[8]  Shipman, F., A. Girgensohn, and L. Wilcox. Hyper-Hitchcock: towards the Easy Authoring of interactive Video. in Interact 2003.
[9]  Tua, R. From Hyper-film to Hyper-web. in Electronic Imaging and the Visual Arts: EVA 2002. Florence.
[10]  Girgensohn, A., F. Shipman, and L. Wilcox. Hyper-Hitchcock: Authoring Interactive Videos and Generating Interactive Summaries. in MM’03. 2003. Berkeley, Ca.: ACM.
[11]  Girgensohn, A., et al. Designing Affordances for the Navigation of Detail-on-Demand Hypervideo. in ACM Advanced Visual Interfaces. 2004.
[12]  Tolva, J., MediaLoom: an Interactive Authoring Tool for Hypervideo. 1998, Georgia Tech: Atlanta. http://www.mindspring.com/~jntolva/medialoom/. Accessed 1.3.2006
[13]  Yates, F.A., The Art of Memory. (1992 ed) 1966: Pimlico, London.
[14]  Leggett, M. Losers and Finders: Indexing Audio-visual Digital Media. in Creativity & Cognition Conference 2005. Goldsmiths College London: ACM.
[15]  Leggett, M., Indexing Audio-visual Digital Media: the PathScape prototype, in Scan. 2005, Macquarie University: Macquarie University, Sydney. http://scan.net.au/scan/journal/index.php. Accessed 1.11.04
[16]  Fischer, G. and E. Giaccardi, Meta-Design: a Framework for the Future of End-user Development, in End User Development, H. Lieberman, F. Paterno, and V. Wulf, Editors. 2004, Kluwer Academic Publishers: Dordrecht.
[17]  OED, Oxford English Dictionary. 2004.
[18]  Zhou, T. A Structured Document Model for Authoring Video-based Hypermedia. in Proceedings of the 11th International Multimedia Modelling Conference (MMM’05). 2005. Deakin University, Melbourne: IEEE Computer Society

Losers and Finders : Indexing Audio-Visual Digital Media

2005
Mike Leggett

ABSTRACT: The contemporary burgeoning usage of digital movies, photos, audio and text, and their distribution through networks both electronic and physical, will be considered in the context of a convergence of these media with a popular interest in personal and community history and identity.
The paper introduces interdisciplinary research into human memory as a context for understanding its relation to machine memory and methods of storing and retrieval. It proposes an approach to indexing audio-visual media utilising a time-space representational system, drawing upon a real-world time-space representation as the taxonomy of the indexing procedure.
An interactive experimental prototype, PathScape, will be described and evaluated and further practice-based research approaches to author-defined storage and retrieval systems will be outlined.
Author Keywords
Interactive, digital media, taxonomy, index.
ACM Classification Keywords
H5.2 User interfaces: user-centred design.

 

 

 

INTRODUCTION
Storage of artefacts is far easier than finding them again, as any dog will tell you. (Anon)
This paper outlines some research that seeks to develop tools for storing and retrieving audio-visual digital media. The design of the system will need to accommodate the needs of the ‘memory worker’, whether as an individual, or part of a closed or open working group.
The contemporary burgeoning usage of digital movies, photos, audio and text, and their distribution through networks both electronic and physical, will be considered in the context of a convergence of these media with a popular engagement with personal and community history and identity.
Interdisciplinary research into mind and memory, perception and cognition, presence and embodiment, media representation, creativeness and meaning, will provide a context for understanding this approach to investigating machine memory. A short survey of methods of storage and retrieval of audio-visual digital media will provide the background for the further development of an existing experimental prototype.
DESPERATELY SEEKING….
“Memory is a label for a diverse set of cognitive capacities by which humans and perhaps other animals retain information and reconstruct past experiences, usually for present purposes.”(30)
Lansdale and Edmonds, in a 1992 study, investigated the design of document filing systems by developing a prototype, MEMOIRS, that treated “…documents as a particular form of event memory”, referring to it as episodic memory. (20) Sutton describes episodic memory as “personal memory for past events and experiences accompanied …by a feeling of familiarity and a reflective awareness of having had the experiences in the personal past.” (30) Semantic memory delivers to us facts about the world – Freud died in London – knowledge by association. As the MEMOIRS project observed: “It is enough that the distinction between episodic and semantic memory throws into perspective an approach to the design of filing systems based upon event memory as opposed to the associative relations between items.” (21)
With interest in and the relevance of the field increasing, interdisciplinary memory research is becoming increasingly recognised and valued (17). As Sutton points out, “It’s no accident that memory is at the heart of recent work on dynamical cognition and the embodied, embedded and extended mind…” and that the “…brain and world are often engaged in an ongoing interactive dance through which adaptive action results.” (30)
Interacting with external memory machines, such as collections and libraries of knowledge located on computer servers around the globe, is central to academic pursuit and, increasingly, to the education and edutainment of the population. The machine-based memory industries that specialise in servicing this demand by storing data and knowing how to retrieve it again are moving away from notions of information retrieval and database management towards information gathering, seeking, filtering and visualisation. (29)
DIGITAL SHOEBOXES
Another computer-based industry, growing annually, is digital video. (Notes 1) Disseminated by cable, broadcast, the internet and more recently the mobile phone into the home and the workplace, audio-visual media is ubiquitous (Notes 2) and will increasingly become the format of document that needs an advanced filing-system design. Digital media can be used simply to document an object or the appearances of an occasion, but it is also expressive. In the hands of a trusted author (or authors), visual media can inform us and reflect us in ways of which we are often unaware. Many of us have the option to gather these images, as photos, as video, as sound. In making images as records of the passing moment, we are able to display our appearance, our presence, often instantaneously, in a place, of a time. But having made the record, and following its initial consumption, what then happens to the artefact? “…there has been very little research attention given to how people organize and browse their photo collections, whether digital or non-digital.” (Notes 3) (28)
As collective or personal memory decays, whether a corporate memory or a family memory, the connectedness of events to the media artefact fades and the narrative thread is disrupted. The significance of the memory, even the meaning of the image, can be lost.
‘Episodic memory’ or personal memory is discussed by philosophers at length. Like semantic memory, episodic memory is declarative memory which sets out to represent the world, usually with the aim of truthfulness (30). Epistemologies of representational systems are debated between interdisciplinary researchers working in the fields of philosophy, cognition, perception, cultural theory and semiotics:
“Signs represent the present in its absence; they take the place of the present … when the present does not present itself, then we signify, we go through the detour of signs.“(11)
The notion of ‘memory traces’ and representations for and of recall, while remaining contested ground, form the basis of memory storage and retrieval devices, from the dictionary to the encyclopedia, from the diary to the snapshot. Autobiographical and personal memory can be prompted by what Tulving terms “synergistic ecphory” (32), whereby the emotion or the memory is evoked or revived by means of a stimulus (27). Often aided by the context of the recall, a writer, for instance, through placement of artefacts or words in spatial relationship, can create the circumstances which connect with the narrative (of a memory trace, event, object etc.). We are not unfamiliar with the use of postcards and palm cards or scraps of paper placed around the room as a way of organising complex sources in the process of synthesising thoughts and events into fresh formulations. (Notes 4)
Within the repositories of collected memory, in large public collections for instance, the stimulus relies on a common rather than private language of signs, most often expressed in a word index form.
INDEXING OPTIONS
“Indexing is a way to increase retrieval precision and accuracy by consistent application of subject terms in their preferred forms. … A taxonomy is a controlled vocabulary presented in an outline view, also called a classified view or hierarchy. Terms are organized in categories reflecting general concepts (Top Terms), major groups (Broader Terms), and more specific concepts (Narrower Terms). The final terms at the end of a branch, often called nodes, can represent any specific instance of a Broader Term, including terms from an authority file of people, organizations, places, or things.” (7)
A taxonomy of indexing enables an overview of the topography of the system, by reducing scale and quantity to proportions that can be comprehended, particularly by new or inexperienced users. In many ways ideal for text-based data such as large ICT parallel database systems (31), such an approach to audio-visual data, based upon word interpretation, is constraining, useful only when words in documents need to be illustrated. On-line picture libraries use keywords associated with location, subject, artist, colour, date, owner etc. – the AHDS Data Service Visual Art (2) image resources site is an example of this tradition, as are many photographic archives and stock-shot libraries. Whilst a word index is admirable for locating traces within written language sources, “…keyword searching is a crude and unsatisfactory method for sampling the information content of complex sources…” such as media collections. (8) Likewise, seeking images on the web with a search engine is similarly hit and miss, having to second-guess a file name or location descriptor or other aspects of the metadata, if present.
Glorianna Davenport is one of a group of researchers who have developed approaches to storing and retrieving the complex nuances of the audio-visual artefact within machine-memory database systems. One of these was developed by a research team in the Media Lab at MIT during the mid-90s, ‘Jerome B. Wiesner, 1915-1994: A Random Walk through the 20th Century’. (9) By monitoring the user’s initial selection, subsequent options are reorganised to cluster related topics, using a combination of image and words and re-shuffling their relative positioning on the screen. Each thumbnail image is able to operate as an iconographic link to play the archival media material.
At about the same time, a British artist, Chris Hales, made ‘Twelve of My Favourite Things’, an interactive diaristic installation accessed using a touch screen. In a composite of three QuickTime movies, through interaction with ‘hot spots’ based on visible colour zones, movies narrating the world of some young children – recorded talking about their favourite colours, places and people – replace one another within the composite on the screen. Contained in scope and size by the technology of the time, the work was an early model of how it could be possible to navigate a series of recollections using wholly visual means. (15) Hales’ overall project to develop an interactive cinema based on these indexing principles has currently reached fourteen iterations of the touch screen-based model.
Research projects seeking industrial objectives – visual indexing systems for the television and cable industries – have included the IBM CueVideo research project. The project measured the productiveness of automated indexing, browsing and retrieval based on different means of summarising digital video using keyframe storage, and accelerated sound reproduction employing audio-processing TSM technology. (1) Whilst the taxonomy is text-based, the final indexing stage, which locates sequence or shot, is an audio and/or visual abbreviation of content, of relevance to our current concerns. (Notes 5)
Well-established software tools, such as ArcView, are related to topography, recorded time and place, and are widely used in industries related to environmental planning, water and land management, urban layout, national parks, mining and agriculture, etc. These are specialised tool sets based on data derived from various methods of measurement. GIS satellite data and a range of plug-ins to the system enable digital images, sound and text files to be attached to specific coordinates. This enables extensive profiles to be constructed and navigated in real time from numerical data using graphical and map visualisations. Such tools have been adapted by archaeologists and social scientists. In the west of Sydney, the NSW Migrant Heritage Centre has commissioned a website (13) using an application called TimeMap that links a combination of text and map metaphors with personal oral histories and localities around the City of Fairfield in western Sydney.
Such GIS-based tools offer a plethora of styles and codes incorporating maps, diagrams, graphical and typographic devices, each inflected with current tools and fashions in interface design. The Fairfield project takes an approach closely related to the archaeologist’s inventory: it makes it possible to store and retrieve data about the past, and makes the oral and written evidence useful for archaeologists and educationalists, but the result remains uninvolving and distant as an experience for individuals in the community.
LOCI SYSTEMS
The Greek orators and rhetoricians, working before the alphabet had been handed down, developed an elaborate form of artificial memory, described so fully in Yates’ The Art of Memory. Ars memoria relied on “…a series of loci or places. The commonest, though not the only type of mnemonic place system was the architectural type ….. We have to think of the ancient orator as moving in imagination through his memory building whilst he is making his speech, drawing from the memorised places the images he has placed on them.” (34) It could be claimed that the first movies were a conceptual model made by the Greek rhetoricians, complete with wide shots, tracking shots, pans, tilts, close-ups and flashbacks, all played in the cinema of the mind’s eye – the first ‘classic film narrative’.
A study published in a 2002 edition of Nature Neuroscience included a range of tests carried out on people who were highly ranked in the World Memory Championships. Whilst their brain capacity and structure were determined to be average, functional magnetic resonance imaging (fMRI) showed that the regions associated with navigation and memory were more active than in a control group attempting the same memory tasks. The contestants confirmed that they used a strategy called the ‘method of loci’, in which the objects to be remembered are placed along an imaginary pathway that can be retraced when recalling the items in order. “The longevity and success of the method of loci in particular may point to a natural human proclivity to use spatial context – and its instantiation in the right hippocampus – as one of the most effective means to learn and recall information” (23)
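Expressed computationally, the method of loci is no more than an ordered mapping from places to items, retraced in sequence at recall. A toy Python illustration (the route and items are invented, not drawn from the study itself):

# Items are 'placed' along an imaginary route and recalled by walking it again in order.
route = ["front gate", "letterbox", "verandah", "hallway", "kitchen"]
items = ["keys", "umbrella", "library book", "train ticket", "shopping list"]

memory_palace = dict(zip(route, items))   # place each item at a locus

def recall(palace: dict, path: list) -> list:
    # Walk the route in order and read off whatever was stored at each locus.
    return [palace[locus] for locus in path]

print(recall(memory_palace, route))   # items return in their original order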
In this, the age of the rhizome (10), linearity need not structure thought within the confines of logic and rhetoric. Just as the walk from home to the station allows interventions of the everyday to structure the day itself – even enhanced by the imprecision of the visual cues that guide us during the walk – so the invention or re-invention of a visual literacy based on digital video and ‘machine memory’ technologies would enable us, with the happenstance of chance encounter, to employ indexing and classification appropriate to the task in hand.
An experiment in the late 1970s by the Architecture Machine Group at MIT, ‘Aspen Walk’, linked two videodisc players with a computer system. By interacting with a touch-screen display, the viewer could navigate the image of a drive around the town of Aspen, determining as each crossroad approached on the video screen whether to turn left or right or to proceed forward. With an appropriate touch, the video would be cued to change the image correspondingly. (25) Our familiarity with the visual cues of the urban landscape, and with the principles of physical movement through linking streets, enables us in the machine version to navigate, cognitively, the visual system representing the physical layout of the town.
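That interaction can be modelled as the traversal of a graph of intersections, with each chosen edge cueing the stretch of video recorded along that block. A minimal Python sketch, with invented street names and clip files (not the original videodisc implementation):

# Each intersection offers exits; choosing one cues the clip recorded while
# driving that block and arrives at the next intersection.
intersections = {
    "Main/1st":   {"forward": ("Main/2nd", "clips/main_1_2.mov"),
                   "left":    ("Galena/1st", "clips/galena_1.mov")},
    "Main/2nd":   {"forward": ("Main/3rd", "clips/main_2_3.mov"),
                   "right":   ("Mill/2nd", "clips/mill_2.mov")},
    "Galena/1st": {}, "Main/3rd": {}, "Mill/2nd": {},
}

def drive(start: str, turns: list) -> list:
    # Return the sequence of clips cued by a series of touch-screen choices.
    here, played = start, []
    for turn in turns:
        if turn not in intersections[here]:
            break                        # no street in that direction; stay put
        here, clip = intersections[here][turn]
        played.append(clip)
    return played

print(drive("Main/1st", ["forward", "right"]))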
Criss-crossing the virtual town would enable us gradually to install in memory first the main features of place, their relation to other features and the grid of the streets. Later, as familiarity increases, a ‘bird’s eye view’ could be constructed in the mind at the moment it becomes necessary to reckon the most direct route between two points in the town. Such a process of conceptualising would be similar whether in front of the representational system or within the town itself.
This begins to illustrate the complex way in which physiology, mind, agency and artefacts can interact to inform action, the outcomes of which can cause physical passage through a space as well as further updates from the system of representation.
“Clark (5,6) and Hutchins (18)… and others, have argued that just as basic forms of real-world success turn on the interplay between neural, bodily and environmental factors, so advanced cognition turns – in crucial respects – upon the complex interplay between individual reason, artifact and culture. …The external environment, actively structured by us, becomes a source of  cognition-enhancing ‘wideware’: external items (devices, media, notations) that scaffold and complement (but typically do not replicate) biological modes of computation and processing, creating extended cognitive systems whose computational profiles are quite different from those of the naked brain. Hutchins for example, gives a wonderful  and detailed account of the way multiple biological brains, tools (such as sextants and  alidades), and media (such as maps and charts) combine to make possible the act of ship navigation.” (5)
A final example of memory systems based on loci is the Exeter Cathedral ceiling website. Here the narrative of a learnéd treatise, the index of a catalogue and a graphical map of the ceiling are each linked to pictorial details of the magnificently restored ceiling of the structure. The authors are quite upfront: “There are two main routes into the material, Visual and Verbal. ….. The Verbal route is for those who are more at ease with text than images.” There is an elegance and appropriateness in the visual component of the site, associating a contemporary on-line database design with a medieval equivalent – the vaulting and keystones of a 700-year-old cathedral. These are pathways and nodes that actually store 15th-century arcane and local knowledge using, like their modern counterpart, visual coding and systematic method. (16)
PATHSCAPE
An interactive multimedia prototype of PathScape was developed in 1999/2000 by a small team of which I was project leader, in association with the Australian Film Commission. The prototype has an interface and navigation system giving access to ‘narratives’ through their association with a specific place, location or series of locations.
The taxonomy is represented with images of contiguous cinematic space – individual photo images are pixilated to produce apparent motion in a forward direction, perceived as a movement ‘into’ the space recorded, a landscape. The movement is achieved by gesture, using a mouse in the prototype. (Figure 1)

Figure 1: Screen Cursor Areas and Gesture Outcomes
The taxonomy of the Path is ordered sequentially by three indexical devices. These are located in the border area that surrounds the central image of movement along the Path. Within this border, fragments of images are visible for short durations at various points. Each indicates a nodal junction which, when ‘captured’ by halting all apparent forward movement, enables the launch, with a click, of a movie that replaces the image and sound of the Path. Thus along the X-Y axis are the 1, 2, 3 …. 8, 9 etc. options, or loci ‘in’ which the ‘narratives’ are stored. (Figure 2)

Figure 2: Schematic for accessing image/sound database
The second device is a change in the background colour of the border, and in the background sound, signifying a change of zone (differences in ecology along the Path in this prototype). In Figure 2 these zones lie along the AA, BB, CC …. FF etc. axes. Gesturing to the left of the screen (or to the right) launches a 360˚ panning movement, a movie representation of the zone through which the user is currently ‘passing’ – to the right pans right, to the left pans left. Within the pan are ‘found’ further nodes that launch movies storing more narratives.
At the completion of a narrative, the third indexical device appears as a series of circle shapes over the final frame of the movie. (Figure 3) Blue, yellow, brown and green circles function as ‘buttons’ to linked topics, colour-coded to represent a broad sort (in this prototype) under the descriptors Anecdotes, Historical Context, Commentary and Analysis. Each option extends and develops the background of what has gone before, in effect narrowing the index path from the broad to the specific.
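Taken together, the three devices amount to a nested index: a position along the Path, a zone, and a colour-coded topic. The following Python sketch is a hypothetical restatement of that lookup (illustrative names and frame numbers only; it is not the Macromedia Director code of the prototype):

from dataclasses import dataclass

@dataclass
class Node:
    frame: int        # position along the Path movie (first indexical device)
    zone: str         # zone signalled by border colour and sound (second device)
    narrative: str    # movie launched when the node is 'captured'
    topics: dict      # colour-coded follow-ons (third device)

nodes = [
    Node(120, "littoral", "movies/fisherman.mov",
         {"blue": "movies/fisherman_anecdote.mov",
          "yellow": "movies/littoral_history.mov",
          "brown": "movies/coast_commentary.mov",
          "green": "movies/coast_analysis.mov"}),
    Node(480, "forest", "movies/timber.mov",
         {"blue": "movies/timber_anecdote.mov"}),
]

def capture(frame: int, tolerance: int = 10):
    # Halting near a node's frame 'captures' it; a click then launches its narrative.
    for n in nodes:
        if abs(n.frame - frame) <= tolerance:
            return n
    return None

node = capture(478)
if node:
    print("launch", node.narrative, "then offer topics:", list(node.topics))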
FURTHER DEVELOPMENT
Following demonstrations of the initial prototype to several groups, the anecdotal responses received and the limited resources remaining to the development project at that stage, it was decided to implement a text-based component for PathScape. This would not compromise the initial intention of devising a visually-based indexing system, as the choice to use text would be clearly indicated and separated from the visual path.

Figure 3: Screen grab within a narrative branch, with colour-coded circles.
The grey/black circles that sit behind each of the coloured circles on screen are the route through to the traditional text-based index – the text information sits, as it were, in the shadow of its iteration as a movie. The text is organised sequentially as a series of ‘browser pages’ gathered, utilising XML protocols, from the Sources database of content, specifying the following (an illustrative sketch of one such record appears after the list):
For each narrative: Sound; Picture; Transcript; Keywords; Web Search;
For the whole prototype: More Stories (as a Table of Contents – the narratives – with the frame numbers of the Path movie listed against each item, from which the narratives could be launched); Keyword Index
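One such record might be represented along the following lines; the element names and values below are illustrative assumptions, not the XML schema actually used in the prototype:

import xml.etree.ElementTree as ET

# An illustrative 'browser page' record for one narrative, mirroring the fields listed above.
narrative = ET.Element("narrative", id="fisherman")
ET.SubElement(narrative, "sound").text = "media/fisherman.aif"
ET.SubElement(narrative, "picture").text = "media/fisherman.mov"
ET.SubElement(narrative, "transcript").text = "transcripts/fisherman.txt"
ET.SubElement(narrative, "keywords").text = "fishing, estuary, livelihood"
ET.SubElement(narrative, "websearch").text = "NSW south coast fishing history"
ET.SubElement(narrative, "pathframe").text = "120"   # frame of the Path movie it launches from

print(ET.tostring(narrative, encoding="unicode"))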
The user of the prototype therefore has a choice: to navigate the index using images and sounds, using words, or a mixture of both. The probable usefulness of this feature in an educational context was also noted.
FURTHER RESEARCH
PathScape is a project progressing through several stages and adopting several iterative forms. It could be delivered on disc (CD or DVD), via the internet or broadband cable or, conceivably, as it uses XML protocols, via a PDA or mobile phone. The software framework is dynamic, rebuilding the database interface at each launch. With further research into the development of appropriate interfaces that help the author(s) define the ontology and epistemology of personal and collective memory, the PathScape paradigm will examine models for placing and retrieving audio-visual digital media artefacts.
At a later stage it may be appropriate to consider meta-design as an approach to developing the tool further. Fischer describes how “…a fundamental objective of meta-design is to create socio-technical environments that empower users to engage in informed participation rather than being restricted to the use of existing systems.” (14) In such an event, this representational system will be open to invention by its author(s) through the placement of appropriate media into the chosen taxonomic indexing system. Different modes of taxonomic representation could be suggested in such a scenario to provide ways of thinking about the representation of memory.
RESEARCH BACKGROUND
Lansdale, Scrivener and Woodcock have shown that “useful theories of spatial memory can be developed of general utility in the design of pictorial databases” but that “…the specificity of task domain and visual material is more likely to dictate issues of design than is any generic theory of visual cognition.” (22) The prototype of PathScape is a specific model, using the familiar figure of a landscape into which we walk and from which we can return as a paradigm with which to address this conclusion. Like many aspects of contemporary interface design, the various devices and indexing systems could become options at application launch, easily switched on or off, helping users define for themselves the interface with which they feel most comfortable and productive.
Though setting out to be a storage system for movies and narratives rather than just pictures, the direction indicated by Lansdale, Scrivener and Woodcock’s research into designing such a system is in the same area as more recent thoughts by Clark about “…the challenge of tractable search and recall given an extremely large database.” (4) Though an interactive system may ameliorate the apparent size of a digital media database, at some point the ‘visitor’ to such a system will want tools that enable a meaningful encounter with it.
In addressing the problems associated with other ‘unknowable’ database resources like the web, Clark describes Kleinberg’s procedure, “..which exploits information implicit in the links between pages so as to identify patterns of connectivity indicative of ‘authoritative sources.’” Recent work on this approach to “…information-about-information (or second-order information) implicit in the link structures…” may be of value in creating “…a useful, low dimensional reflection of the high dimensional knowledge-space.” (4) A taxonomy based on making visible the connections between locations of knowledge or evidence, whether on the unordered space of the internet or the more ordered (but possibly idiosyncratic) space of an artificial topography, provides the visitor to the system with some shapes, some vectors to move within at the outset.
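Kleinberg’s procedure (the HITS algorithm) iterates mutually reinforcing ‘hub’ and ‘authority’ scores over the link graph. A compact Python sketch on a toy graph (the page names are invented):

# Pages pointed to by good hubs become authorities; pages pointing to good
# authorities become hubs.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
pages = sorted(links)
hub = {p: 1.0 for p in pages}
auth = {p: 1.0 for p in pages}

for _ in range(50):
    auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
    hub = {p: sum(auth[q] for q in links[p]) for p in pages}
    # normalise so the scores do not grow without bound
    na = sum(v * v for v in auth.values()) ** 0.5 or 1.0
    nh = sum(v * v for v in hub.values()) ** 0.5 or 1.0
    auth = {p: v / na for p, v in auth.items()}
    hub = {p: v / nh for p, v in hub.items()}

print(max(auth, key=auth.get))   # 'c' emerges as the authoritative source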
The appeal is to the user’s knowledge and experience of moving through three-dimensional space, in the urban or rural setting, and its remediation as an artificial topography. Encountering a range of spaces in representational form (loci) that engender in the user a sense of a favoured space raises the issue of motivation, particularly for the visitor to the system, or one who is not familiar with it. A ‘low dimensional reflection’ of this kind will at the very least be a means by which the scale of the database and its contents can be comprehended. But registering the presence of the user – in the space of the system, in the images and sounds it can retrieve, and within the physical space in which the system stands – will together provide reassurance and encouragement to interact, to explore, and to respond to and move through what is retrieved.
Mantovani and Riva, building on the work of Zahoric and Jenison (1998) through Heidegger and J. Gibson, proposed an ‘ecological approach’ to establishing a relational presence. Like Kleinberg’s ‘second-order information’, this is based on resources not being the ‘properties of either object or subject, but of their relation’ (24). Gibson’s image of a tree in the middle of a field on a summer’s day being only an ‘affordance’ to those who seek its cool shade is an illustration of ‘resources, which are only revealed to those who seek them’. Mantovani and Riva go on to amplify this distinction with the argument that presence is a social construction “mediated by both physical and conceptual tools which belong to a given culture”, in which there is “the emphasis of ecological approach on the primacy of action on mere perception” and that “action is not undertaken by isolated individuals but by members of a community. …. Ultimately, there are only two elements which guarantee presence: a cultural framework and the possibility of negotiation of both actions and their meaning”. (24)
This tends to support work developed a decade previously by R. S. Lazarus under the heading Cognitive-Relational Emotion Theory, which set out to propose
“..that emotions work through a set of interdependent systems including processes for cognitive appraisal, physical interaction between person and environment, coping, and emotional response itself.” (19).
Discourse around the term embodiment has ventilated many of these concerns about presence. Dourish, giving central place to Merleau-Ponty, captures “…a sense of ‘phenomenological presence’, the way that a variety of interactive phenomena arise from a direct and engaged participation in the world [which] includes both physically realized and socially situated phenomena…” Meaning and meaningfulness “…is to be found in the way in which it reveals itself to us as being available for our actions. It is only through those actions, and the possibility for actions that the world affords us, that we can come to find the world, in both its physical and social manifestations, meaningful.” (12) (Author’s emphasis)
CONCLUSION
“One of the most basic principles of plot construction is that the remembered ‘I’ traces a continuous spatio-temporal route through all the narratives of memory, a route continuous with the present and future location of the remembering subject. … This principle imposes a kind of unity on all the narratives; …” (3).
The narrative that I conclude here has briefly discussed, if not imposed unity upon, the interdisciplinary nature of the PathScape project. Thinking about ways in which the system may be further developed has unavoidably caused me to consider the often separated disciplines that are the study of mind and memory, perception and cognition, presence and embodiment, media representation, creativeness and meaning. Nor am I forgetting the Machine, and the interdisciplinarity of connecting with another or others through computational complexity and its magnetic appeal.
Increasingly, in the contemporary context of tools like the Macintosh lifestyle suite iLife, we can anticipate if not fewer words then a great many more images being digitally authored and consigned to data media, before finally being consigned to the bottoms of drawers for want of a means of retrieving their autobiographical or historical significance. PathScape and similar projects set out to extend the potential of these cultural resources, and of the authors who will provide a signifying unity from which others can make meaningful enjoyment. Enjoyment should be the key because, after all is done, and as Andy Clark has recently commented, “Memory is but constrained confabulation”.
NOTES
1. Communication of digital video signals has many aspects: content; creation; formatting; encoding for data compression and channel error control; modulation; satellite, terrestrial, cable, and networked transmission; and reception – demodulation, decoding and digital signal processing. Accompanying every signal operation is a piece of hardware to perform the task. Cameras, displays, switching arrays, servers, mass storage devices, and computers are examples of the kinds of hardware required for the generation and distribution of digital video and which will be affected by technological advances in the state of the art. (17)
2. Broadband services enable access, if not at video quality, to audio-visual digital media. Broadband subscribers increased from a base of 6.6 million in 2000 to 35.8 million in 2004. Source: US Bancorp Piper Jaffery, ‘Streaming Media Guide’, Viewcast. (33)
3. Rodden and Wood’s research produced several interesting proposals for further work. In their conclusion they went on to cast doubt on whether text-based indexing and retrieval provides the subject group with “enough extra motivation to invest the effort in annotating their photographs.” (28)
4. The author witnessed the working method of two professional scriptwriters, which involved laying out palm cards and images around a studio whilst working with a computer at the centre of the room to synthesise their content. Russell Crowe’s portrayal of the schizophrenic John Nash in the movie ‘A Beautiful Mind’ provides an image of this process in its pathological state. (20)
5. “The on-line video server is composed of our speech-based search and retrieval system, a multimedia streaming server (Real Networks, IBM’s VideoCharger and/or Apple’s Quicktime), and query processing and a process that compose and deliver the retrieved results back to the user. The search and browse system includes an Internet-based Graphical User Interface (GUI) that can be run by any browser, on different platforms, using standard plug-ins. The GUI includes a text query box and associated advanced searching options, and allows easy navigation between the different views which blend together into an advanced video browser. … The results further show that there is no difference between speed assessment of video, MSB and audio only. This means that in many cases of remote education we can replace the video with a moving storyboard, which is much smaller in size and can be streamed across low bandwidth networks. … The results also vary between people. Among the 24 subjects we have some prefer to watch the full video, some prefer to watch the MSB, and others prefer audio only. The main lesson from this diversity in preferences is not to “optimize” the system for an “average” user, but to leave him/her to decide which media and what speed to use for a given task.” (1)
6. Pixilation is defined as “A technique used in theatrical and cinematographic productions, whereby human characters move or appear to move as if artificially animated.” (26) This should not be confused with the pixel, a contraction of the term ‘picture element’, being “The smallest resolvable rectangular area of an image”. (18)
REFERENCES
1. Amir, A. Ponceleon, D. Blanchard, B. Petkovic, D. Srinivasan, S. Cohen G.  Using Audio Time Scale Modification for Video Browsing, Proceedings of the 33rd Hawaii International Conference on System Sciences  2000.
2. AHDS Arts and Humanities Data Service Visual Arts. http://vads.ahds.ac.uk/about/index.html. Accessed 1.9.04.
3. Campbell, J. The Structure of Time in Autobiographical Memory, European Journal of Philosophy 5:2, Blackwell Publishers, Oxford, 1997.
4. Clark, A. Global Abductive Inference and Authoritative Sources, or How Search Engines Can Save Cognitive Science, Cognitive Science Quarterly 2(2), 115-140, 2002.
5. Clark, A.  Being There: Putting Brain, Body and World Together Again, MIT  Press, 1997.
6. Clark, A. Where Brain, Body and World Collide Daedalus 127, 257-80, 1998.
7. Data Harmony Inc. http://www.dataharmony.com/faq.htm#b1 Accessed 1.9.04.
8. Davenport, G. Indexes Are Out, Visions & Views, MIT Media Lab, Fall 1996.
9. Davenport, G. et al  ‘Jerome B. Wiesner, 1915-1994: A Random Walk through the 20th Century’ (1994) http://ic.media.mit.edu/projects/JBW/ Accessed 1.9.04.
10. Deleuze, G & F Guattari ‘A Thousand Plateaus: Capitalism and Schizophrenia'(1994), trans B Massumi, Minneapolis: University of Minnesota Press.
11. Derrida, J. Difference. Speech and Phenomena, NW University Press 1973.
12. Dourish, P  Where the Action Is – the foundations of embodied interaction, MIT Press. 2001
13. Fairfield, City of ‘Peopling Fairfield’ website http://acl.arts.usyd.edu.au/projects/consulting/fairfield/index.html Accessed 1.9.2004.
14. Fischer, G Meta-Design: Beyond User-Centered and Participatory Design, Proceedings  of HCI International 2003, Julie Jacko and Constantine Stephanidis  (eds.), Crete, Greece, June 2003, pp. 88-92.,
15. Hales, Christopher. http://www.smartlabcentre.com/4people/coreres/chales.htm Accessed 1.9.04.
16. Henry, A. and Hulbert, A. Exeter Cathedral Keystones and Carvings. http://hds.essex.ac.uk/exetercath/ Accessed 1.9.04.
17. Hoerl, C. and McCormack, T. (eds) Time and Memory: philosophical and psychological perspectives. OUP 2001.
18. Hutchins, E.  Cognition in the Wild, MIT Press,1995
19. Huang, M  Presence as an Emotional Experience, Medicine Meets Virtual Reality ed Westwood et al, IOS Press, Amsterdam. 1999
20. IMDB, International Movie Database. http://www.imdb.com/title/tt0268978/ Accessed 1.9.04.
21. Lansdale, M Edmonds, E  Using Memory for events in the design of personal filing systems, International Journal Man-Machine Studies, 36 97-126. 1992
22. Lansdale, M Scrivener, S Woodcock, A Developing Practice with Theory in HCI: applying models of spatial cognition for the design of pictorial databases, International Journal of Human-Computer Studies, 44, 777-799, 1996.
23. Maguire, E. et al. ‘Routes to Remembering: the brains behind superior memory’, Nature Neuroscience 6(1), 2002.
24. Mantovani, G Riva, G  “Real” presence: how different ontologies generate different criteria for presence, telepresence and virtual presence’, Presence: Journal Teleoperators and Virtual Environments, 8 (5) 538-548. 1999
25. Naimark, M. Place Runs Deep: Virtuality, Place and Indigenousness. Virtual Museums Symposium, ARCH Foundation, Salzburg, Austria 1998.
26. NSF Industry/University Co-Operative, Research Center for Digital Video at Rensselaer Polytechnic Institute Troy, New York.  http://www.ecse.rpi.edu/CNGV/proposal.html#intro. Accessed 1.9.04.
27. OED, Oxford English Dictionary http://dictionary.oed.com/ Accessed 1.9.04.
28. Rodden, K. Wood, K. How Do People Manage Their Digital Photographs? Proceedings of CHI 2003, Ft Lauderdale, Florida, 2003.
29. Shneiderman, B. Designing the User Interface, Addison-Wesley, 3rd edition, 1998, pp 510-511.
30. Sutton, J. Memory, The Stanford Encyclopedia of Philosophy (Summer 2004 Edition), Edward N. Zalta (ed.), http://plato.stanford.edu/archives/sum2004/entries/memory/, 2004.
31. Taniar, D Rahayu, W. A Taxonomy of Indexing Schemes for Parallel Database Systems, Distributed and Parallel Databases, 12, 73–106, Kluwer Academic Publishers. 2002.
32. Tulving, E. Elements of Episodic memory, OUP 1983
33. US Bancorp Piper Jaffery. ‘Streaming Media Guide’ Viewcast http://www.viewcast.com/whitepaperdownload.asp# Accessed 1.9.04.
34. Yates, F A. ‘The Art of Memory’, Pimlico, London 1966 (1992 ed)
 

KMS Models for Video Files using Visual Mnemonics

2007
Mike Leggett

ABSTRACT
A series of Models was built to explore and test the precept of navigating movies using gesture to control both forward and backward movement, and to launch movie files linked by visual elements associatively and semantically related to the knowledge domain represented within a movie collection.

Author Keywords
Video, indexing, hypermedia, mnemonics, meta-design
ACM Classification Keywords
H.5.4 Information interfaces and presentation: hypermedia.

 

 

INTRODUCTION
The browse searching of digital video files using proprietary software and commercial applications relies on alphanumeric indexing and keyword selection. This is appropriate for ontologies with established taxonomies and structures for maintaining Knowledge Management Systems (KMS), including information contained in movie files. But as movie files become ubiquitous in our everyday lives as a means of conveying information for a range of purposes, the design of computer systems for storage and retrieval that employ other forms of visual mnemonics could add efficiencies of speed and ease of accurate access.
The research approach taken utilises the mnemonics contained in the motion-picture images of a movie collection and offers possibilities for non text-based interaction with a KMS. A series of Models was built to explore and test the behaviours of subjects navigating movie files, encountered as full-screen motion-picture images, using either the arrow keys on the keyboard or the mouse to effect four-way control: of the playback of the movie – up for forward, down for backward – and of the launch of movie files linked to mnemonics in the movie being viewed – left and right to link.
Linking to left and right follows a schema (from the Greek skhema, meaning shape), designed for each Model, which aids the retrieval of movies in the collection.
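The four-way control reduces to a small dispatch table. A hedged Python sketch of the interaction loop follows (the state and link structures are hypothetical; the Mnemovie tool itself was built with Macromedia Director, as described below):

# Arrow keys map onto playback and linking actions for the movie being viewed.
def handle_key(key: str, state: dict) -> dict:
    # Update the viewing state for a single key press.
    if key == "up":                      # play the forward-motion movie
        state["direction"] = "forward"
    elif key == "down":                  # play the reverse-motion movie
        state["direction"] = "backward"
    elif key in ("left", "right"):       # follow a schema link active at this second, if any
        target = state["links"].get((state["movie"], key, int(state["seconds"])))
        if target:
            state["movie"], state["seconds"] = target, 0
    return state

state = {"movie": "PZF", "seconds": 13, "direction": "forward",
         "links": {("PZF", "left", 13): "11"}}    # echoes the XML <link> sample given later
print(handle_key("left", state)["movie"])         # jumps to movie '11'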

Using visual elements associatively and semantically related to the knowledge domain represented within the movie collection, evaluation will compare Models operated by novice and expert groups.

METHODS
The software tool, Mnemovie, was developed by the researcher in collaboration with a professional multimedia developer. The approach to this initial task has been described by Fischer, Giaccardi et al. as the seeding, evolutionary growth, and reseeding (SER) process model [1] (p 492). The practice of building the software tool by Leggett and Hinshaw was guided by the fourth model described by Fischer et al. for collaboration paths in software development, where both the domain practitioner and the software professional have some knowledge of the other’s practice (p 487).
Building on related hypermedia research by Girgensohn et al. [2], the Mnemovie tool facilitated the rapid building of Models creating linkages between movie files, as a means of realising proof of concept. Following analysis of and reflection on the semantic domain of each of six movie collections, navigational schemas were designed before links were created between movie items. The tool enabled the researcher to define parameters within the code to link movie files and to iteratively develop and refine the schema for each Model.
Retrieval Schemas
Four of the six schemas were advanced further in preparation for evaluation using a specific movie collection of people talking about their research activity.
1. The segmented Loop schema, where each segment is a compression of each movie item in the collection. Linking thereby has a direct indexical relationship between a Loop segment and the item, and vice versa to return to the Loop. (Figure 1)
2. The Pathway line schema, from A to B. Linking to the researcher collection here is indirect, a particular location along the path being the mnemonic for a particular movie item.
3. A development of the Pathway line into the horizontal Grid, applied to a collection of movies shot in, and linked by the system through, the intersecting streets of an inner-city block. Linking to the collection of movies about the researchers’ work is again indirect, a particular location in the grid of streets being the mnemonic for a particular movie item.
4. The Clock face schema, which divides the passing of time by convention and links either directly and indexically to the proportional durations of a movie, or indirectly to different movies in the researcher collection. (A toy sketch of such a schema mapping follows Figure 1.)

Fig 1. Sample movie collection using Loop schema.
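Each schema is, in effect, a mapping from positions in a familiar figure (loop segment, path location, street intersection, clock position) to items in the collection. A toy Python sketch of the Clock face case, under assumed file names (not the Mnemovie implementation):

# Twelve clock positions index up to twelve researcher interviews;
# the mapping itself is the mnemonic, not a property of the movies.
clock_schema = {hour: f"movies/researcher_{hour:02d}.mov" for hour in range(1, 13)}

def link_from(hour: int) -> str:
    # Gesture at a clock position to retrieve the movie stored at that locus.
    return clock_schema[hour]

print(link_from(3))   # -> movies/researcher_03.mov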
Mnemovie Tool
Arising from an earlier prototype that used Macromedia Director to construct the system framework, conceptual and technical elements used for the PathScape project were extended to meet the requirements of the current investigation. Specifically, this moved away from hard-coded ‘content’ to a modular and externalised framework (.mov, .swf, .dcr files), subject to an “external importation routine” incorporating an XML file, “..more extensible to handle growth of later versions.” This approach has more recently been adopted by other developers, whereby a:
“…presentation engine allows content authors to describe … content through associated XML files. Interpretation of those files, content layout, and all … communication is automatically handled by the presentation engine. The content is described external to the application, creating a natural separation from the … interface.” [3]
Table 1 sets out the conceptual data model for building the Mnemovie tool: the software framework, the presentation engine and the media directory.
Root Directory of Model | Presentation Engine | Media
Mnemovie b3 | MNEMOVIE (application) • Mnemovie.dcr • movie_data.xml | MOVIES (video files directory)
Table 1: Mnemovie data model
Media files are prepared using a digital video editing application and saved with consistent resolution and frame size into the directory. The .dcr file compresses specification data for the Director application and is prepared by the software professional. The file movie_data.xml contains a description of the tags and the layout of the program source code specific to the manipulation of the movie files contained in the adjacent Movies directory. The modular construction of the source code enables the researcher to expand the scale of the instruction set according to the requirements of the interactive Model.
The XML-file structure throughout was based on each <track> having a <movie id> for the forward-motion movie and a different <movie id> for the reverse-motion movie. From each <track>, links to other movies could be created. Beta 1.0 used the following structure (sample):
<track id="PD">
    <movie id="PZF" file="movies/PulledZfore.mov" dir="F"> <!-- F -->
        <link side="L" start_time="00:00:13" end_time="00:00:14"
              movie_id="11" link_start_time="00:00:00" />
    </movie>
    <movie id="PZB" file="movies/PulledZback.mov" dir="B"> <!-- B -->
        <link side="R" start_time="00:00:12" end_time="00:00:13"
              movie_id="R11" link_start_time="00:00:00" />
    </movie>
</track>
As experimentation progressed iteratively, the group of <link> tags operating within the system was extended, offering additional linking possibilities for schema design.
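For illustration, a track description of this kind can be read with a standard XML parser to recover the link table that the presentation engine consults. A minimal Python sketch (an assumption about how such a file could be processed, not the Director importation routine itself):

import xml.etree.ElementTree as ET

SAMPLE = """
<track id="PD">
  <movie id="PZF" file="movies/PulledZfore.mov" dir="F">
    <link side="L" start_time="00:00:13" end_time="00:00:14"
          movie_id="11" link_start_time="00:00:00" />
  </movie>
  <movie id="PZB" file="movies/PulledZback.mov" dir="B">
    <link side="R" start_time="00:00:12" end_time="00:00:13"
          movie_id="R11" link_start_time="00:00:00" />
  </movie>
</track>
"""

track = ET.fromstring(SAMPLE)
for movie in track.findall("movie"):
    for link in movie.findall("link"):
        # e.g. while PZF plays between 13s and 14s, a left gesture links to movie 11
        print(movie.get("id"), link.get("side"),
              link.get("start_time"), "->", link.get("movie_id"))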
Evaluation
Each Model is in the process of predictive and operator evaluation with a movie collection, to assess interaction efficiencies between Models, the quality of the experience, and the interacting subject’s ability to:
•    navigate a movie collection using the schema approach;
•    retrieve information contained within movie files;
•    retain memory traces from the navigational process, such that subsequent interactions with the Models can demonstrate accumulated learning behaviour.
REFERENCES
1.    Fischer, G., Giaccardi, E., Eden, H., Sugimoto, M. and Yunwen, Y. Beyond binary choices: Integrating individual and social creativity. International Journal of Human-Computer Studies, 63. 482-512. 2005
2.    Girgensohn, A., et al. Designing Affordances for the Navigation of Detail-on-Demand Hypervideo. in ACM Advanced Visual Interfaces. 2004.
3.    Mentor, K. Director and SCORM 1.3 SCORM SCO Presentation Engine (S2PE) Director Developer Center, Adobe Inc, 2006.