Envisage: Investigating Design Intentions, Visual Perception through Eye Tracking of Architectural Sketches

Thesis Advisor: Takehiko Nagakura
Reader: Axel Kilian

Can we perceive an architect's intention by observing his or her sketches? Yes, but doing so requires a probing process of observation. Across time and continents, master architects have developed a repertoire of processes for expressing powerful design intentions through succinct and dynamic representations: design sketches. Different types of sketches describe, express, or gesture toward the architecture they represent. They deliver active ideas that are not limited to objects but convey a raw sense of space, enabling both perception and creation through visual thinking.

I propose a method that uses eye tracking as a translator between graphics and architects' perception of three types of intention: shape, composition, and circulation. My hypothesis is that because architects represent these intentions through graphics -- a medium that permits an ambiguous and dynamic translation between intention and sketch -- we can probe the underlying process by observing a viewer's eye movements. Furthermore, heat maps derived from those eye movements can serve as input to a machine learning algorithm, the image-conditional Generative Adversarial Network (GAN). I use this algorithm to translate the raw sense of space and visual gesture, capturing human-level information acquisition of these intentions.
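The thesis does not specify how the heat maps are computed, but a common approach is to deposit each gaze fixation's duration at its pixel location and smooth the result with a Gaussian kernel approximating the foveal spotlight. The sketch below illustrates that idea; the function name, the fixation format (x, y, duration in seconds), and the NumPy/SciPy implementation are my assumptions, not the author's method.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_heatmap(fixations, width, height, sigma=25.0):
    """Illustrative sketch (not the thesis's actual pipeline): accumulate
    gaze fixations (x, y, duration) into a normalized heat map.

    Each fixation deposits its duration at its pixel; a Gaussian blur then
    spreads that weight over neighboring pixels.
    """
    grid = np.zeros((height, width))
    for x, y, duration in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < width and 0 <= yi < height:
            grid[yi, xi] += duration  # longer fixations weigh more
    heat = gaussian_filter(grid, sigma=sigma)
    if heat.max() > 0:
        heat /= heat.max()  # scale to [0, 1] for rendering or as a GAN input channel
    return heat

# Hypothetical example: three fixations clustered on one region of a 200x150 sketch image
fixations = [(60, 80, 0.35), (62, 78, 0.50), (61, 82, 0.20)]
heat = fixation_heatmap(fixations, width=200, height=150)
```

A map like this, rendered as a color overlay, is what an image-conditional GAN such as pix2pix could take as one side of its paired training data.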

To demonstrate the work, I first discuss the history of visual power in design and the shift toward units and segmentation, tracing developments from the emergence of design drawings to innovations in parametric design. I then present an eye-tracking study in which graduate architecture students observed sketches by Louis Kahn. I study how heat maps derived from eye tracking decode the participants' perception of intentions in the sketches, given their shared educational background in architecture. Next, I propose a framework for using this representational system to train machines to predict human-level viewing patterns. Finally, I examine how effectively the system functions with an image-to-image machine learning algorithm known as the image-conditional GAN.

The study suggests that the mechanics of eye movement reveal a shared visual-thinking procedure that human designers have practiced unconsciously. Such a procedure, if learned by machines, could facilitate a creative process that exploits the informal dynamics of eye movement in visual representation for design.