On the Relationship between HCI Researchers and Creators---Or How I Became a Toolsmith


Research on creativity support tools in human-computer interaction often focuses on novel interaction design, but that is just the tip of the iceberg. Let's dive deeper and support creative activities "in the wild."

By Jun Kato, June 2023

Tags: Development frameworks and environments, Human computer interaction (HCI), Interactive systems and tools, Sound and music computing

During my doctoral studies, I specialized in human-computer interaction (HCI), particularly interaction techniques that use graphical representations to intuitively process real-world data that is difficult to handle in text-based programming environments. Rephrased in terms of this issue's theme, I researched creativity support tools that facilitate the creation of new media expressions. (Herein, I do not distinguish programming from art because I consider programming to be a kind of computer-aided art in itself.) Every year I wrote papers for top-tier international conferences, basking in the intellectual excitement of coming up with interface designs that would enable new programming experiences. Such ideas would then come to the attention of people building production-ready programming environments (creativity support tools) and would gradually be incorporated into the environments used by ordinary programmers (creators).

I still do not doubt the value of the novel technologies developed through this kind of research. The intense, short-term cycle of writing academic papers taught me many things, and that experience is irreplaceable. However, by the time I finished my doctoral studies, I had become frustrated with the lack of support for user creativity "in the wild." For example, I would conduct user studies before writing a paper, and I found it excruciating when participants criticized usability issues. Of course, I would write in the paper: "Although there were criticisms regarding usability, they are easy to solve from an engineering standpoint, and there are no inherent flaws in the interaction design we are proposing." As researchers, we can easily leave issues as future work and expect some of them to be addressed through technology transfer, in which innovative technologies created through research reach society through product development. But who is responsible for the entire process?

Frankly, I grew tired of "sampling" the creative activities of creators every year, putting them on the chopping block of research, and continually proposing new technologies that had the potential (only the potential) to improve these activities. I could imagine one consequence of this sampling approach: I might come to respect some creators and build a personal relationship with them through research, only to be forced to move on to the next subject the following year. That seemed like a loss to me, not so much as a researcher but as a human being. Therefore, since obtaining my doctorate, I have been engaged in research and development as a toolsmith who works side by side with creators. It has not been a smooth journey, and I have had obvious failures. Nevertheless, some experiences can only come from being immersed in a creative culture, and I have come to feel that this is one of the privileges of being an HCI researcher in the wild.

Building and Residing in the Creative Public Square

Let me give you a concrete example. In the past, people read lyrics on the lyric sheets that came with physical CDs, but with the rise of video streaming services, they have come to enjoy lyric videos in which the lyrics are animated in engaging ways. With help from my brilliant colleagues, I developed an integrated design environment named TextAlive, as shown in Figure 1, and presented it at the ACM Conference on Human Factors in Computing Systems (CHI) 2015 [1]. TextAlive provides both a programming interface for live programming the text motions and visual effects in lyric videos and a video authoring interface for creating lyric videos by adjusting program parameters with sliders, color palettes, and so on. By unifying the two in a single environment, we intended to make it easier for programmers, video designers, and musicians to collaborate. The paper won an award, and I still consider it a good and valuable piece of HCI research. However, when we released TextAlive as a web service,^a we did not observe much co-creation. What we observed instead were programmers who could produce beautiful videos and musicians who could learn to program fluently. A good example of such a programmer is daniwell.^b Internationally known for composing the original Nyan Cat song, he is a creator who designs, illustrates, models, programs, makes videos, and creates websites. I fell in love with his versatility and became convinced that his artistic and technical sensibilities were a good match for TextAlive. After speaking to him at a music event, I improved the TextAlive programming interface for him, and he used it to develop the style now favored by many TextAlive users (see Figure 2). A good example of a programming musician is BIGHEAD.^c He had an exceptional affinity for novel information technology and creatively pioneered the use of TextAlive to produce various music videos. He said that technology like TextAlive, which allows him to easily produce lyric videos, is indispensable for bringing music to his audience as soon as possible. Eventually, he learned to use Unity and created the first-ever 3D virtual live performance video made with TextAlive (see Figure 3).

Going beyond paper writing by releasing tools to creators and getting involved with them is, in my view, akin to creating a public square. I think this square metaphor has the following important implications for turning research into a publicly available service. First, making research outcomes accessible to the public creates new paths for creators to flourish as many of them pass through and interact. As the tool's author, I can be inspired just by looking at the variety of activities that take place. Some creators may stay in the square and become interested in the square itself. These creators can continue to grow and develop their own artistic expression, incorporate tools into their own workflows, and use them in ways far beyond the author's expectations. Second, squares are greatly influenced by their locality. TextAlive is well known in the context of Vocaloid music, a major genre in the Japanese music scene. The singing voice synthesizer software Hatsune Miku, a representative Vocaloid, is a creativity support tool that allows musicians to have their songs sung by a virtual character, and its users and audiences are highly receptive to emerging technologies. I have been invited to speak at the annual Hatsune Miku event Magical Mirai^d ("magical future" in English) every year, have contributed TextAlive to her live performances (yes, she is a humanoid character as well as a piece of software), and have experienced firsthand people's support for TextAlive. Research that supports creative activity is always implicitly embedded in creative culture. It is gratifying to be able to give back to people by explicitly stepping into that culture, and I hope more researchers will find the same pleasure in being a toolsmith.

Cultivating a New Path at the Intersection of Art and Programming

TextAlive, which we had created with the goal of fostering co-creation between programmers and musicians, had failed in that objective. Hence, my next step was to analyze the reasons for the failure. Why didn't co-creation occur? There seemed to be two reasons. First, programming was inadequately supported. For video authoring, TextAlive was only one part of the overall workflow: video authors create their videos with a combination of tools of their choice, and they could choose whether or not to use TextAlive. For coding, however, TextAlive's programming interface was the only way to develop visual effects for TextAlive. Programmers were bound by TextAlive's web-based editor, poor debugging capabilities, and limited API for drawing graphics. This contrast in the degree of freedom between video authors and programmers did not sit well with me, as I have been deeply engaged in research to improve the programming experience. More freedom for programmers! Second, the media format of lyric videos was limiting. The activity of generating creative visual expressions through code is called creative coding, and various programming environments for creative coding, such as Processing and openFrameworks, have been developed. Creative coders can choose their preferred environment and use their favorite graphics libraries to create attractive visual expressions. In these environments, adding interactivity to artistic expressions, such as responding to mouse events or camera input, is straightforward. In contrast, TextAlive at the time could not be used to create interactive content, not only because the programming experience was constrained but also because of the strong assumption that the final output would be a video. Again: more freedom and control for programmers.

So, I decided to reorganize TextAlive's features into a useful API that gives programmers more freedom and control. In 2020, we released the TextAlive App API^e for developing "lyric apps," as we call this new form of lyric-driven visual art: applications that can render different lyrical content depending on user interaction and thus overcome the limitations of static media. We designed the API to fit typical design patterns in creative coding and implemented it so that programmers can choose their favorite creative coding environments and graphics libraries. Its unveiling took place, of course, at Magical Mirai. This annual event focuses on the creative culture surrounding Hatsune Miku, and we could not have found a more appropriate place to hold a programming contest using TextAlive. Since then, every year in the months before Magical Mirai, many creative programmers have developed and submitted lyric apps while listening to music. While there were earlier examples of lyric apps, such as karaoke and music rhythm games, we believed lyric apps had more potential than these predecessors. Indeed, the contest entries exemplified these possibilities. For example, the 2022 winner "Miracle Universe =" by Misora Ryo^f (see Figure 4) beautifully displays lyrics in a three-dimensional space. The character climbs a spiral staircase as the music plays. Clicking on the floating cubes lets the user view the character from various angles, and depending on the number of cubes clicked, the cherry blossoms may be half or fully bloomed at the end. The lyrics are eye-catching, and there is also enough gameplay to make the user want to listen to the music repeatedly: a truly interactive application that can only be called a lyric app. We named the framework that supports the development of such lyric apps the "Lyric App Framework" and presented it at CHI 2023, along with findings from an analysis of 52 entries from two years of programming contests [2].
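To make the difference between a lyric video and a lyric app concrete, here is a minimal sketch of the pattern in TypeScript. It assumes a hypothetical data model (words with start and end times in milliseconds) and a hypothetical per-frame render function; none of these names come from the TextAlive App API, which this sketch does not attempt to reproduce. On every frame, the app looks up the word active at the current playback position, and accumulated user interaction changes what is drawn, loosely echoing the cherry-blossom ending described above (the click thresholds are arbitrary).

```typescript
// A minimal sketch of the lyric-app pattern. All types and functions here
// are hypothetical illustrations, NOT the TextAlive App API.

// One word of the lyrics with its timing in the song (milliseconds).
interface TimedWord {
  text: string;
  startMs: number; // inclusive
  endMs: number; // exclusive
}

// State accumulated from user interaction during playback (hypothetical).
interface InteractionState {
  clicks: number;
}

// Binary search over words sorted by time; returns the word whose
// [startMs, endMs) interval contains positionMs, or undefined between words.
function findActiveWord(
  words: TimedWord[],
  positionMs: number,
): TimedWord | undefined {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const w = words[mid];
    if (positionMs < w.startMs) {
      hi = mid - 1;
    } else if (positionMs >= w.endMs) {
      lo = mid + 1;
    } else {
      return w;
    }
  }
  return undefined;
}

// Called once per animation frame: the same playback position can yield
// different output depending on how the user has interacted so far.
// The click thresholds are arbitrary values chosen for illustration.
function renderFrame(
  words: TimedWord[],
  positionMs: number,
  state: InteractionState,
): string {
  const word = findActiveWord(words, positionMs);
  const scene =
    state.clicks >= 10 ? "full-bloom" : state.clicks >= 5 ? "half-bloom" : "bud";
  return `${word ? word.text : ""} [${scene}]`;
}

const lyrics: TimedWord[] = [
  { text: "hello", startMs: 0, endMs: 500 },
  { text: "world", startMs: 500, endMs: 1200 },
];
console.log(renderFrame(lyrics, 600, { clicks: 6 })); // world [half-bloom]
console.log(renderFrame(lyrics, 600, { clicks: 0 })); // world [bud]
```

A video fixes one output per timestamp; here the output is a function of both the timestamp and the interaction state, which is the property that distinguishes a lyric app.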

The Lyric App Framework could have been published as an academic paper as early as 2020 if the only goal had been to claim its technical novelty. However, I wanted to test it "in the wild." It took time, but by holding programming contests, we discovered eight categories of lyric apps and gained insights into new API designs. In the eight years since we published our first TextAlive paper, which supported the existing lyric video format, we continuously supported creative activities in the wild and, with direct feedback from our users, were able to discover and establish a new artistic media format: lyric apps. Computer science is a field where industry is often ahead of academia. It should therefore come as no surprise that a community of toolsmiths and progressive users can also move ahead of academia.

Artistic Outcome-Centric Design

What I have described so far is a case in which a toolsmith researcher created new tools and opened them to the public; although the initial aim missed the mark, much was gained. In other words, the researcher's original blueprint of the future was redrawn through dialogue with creators. So, what would happen if toolsmith researchers created tools to satisfy demands from people in the creative industry? Would we end up with something that fulfills users' needs, like a product that goes through the common user-centered design process? To answer these questions, I would like to present a case study I have been working on in the context of Japanese animation production.

Japanese animation is a commercial art that requires the involvement of a large number of people. The production process can be roughly divided into the pre-production stage, which involves several dozen people, and production and post-production, which involve several hundred people. In pre-production, a storyboard ("E-conte" in Japanese) is created based on a script to give visual and written instructions on when, where, and how the characters will perform. This format is a set of instructions that contains information about the pictures to be drawn in later stages of the production process. If we liken this to software development, pre-production is when specifications are created, and in production, the specifications are implemented in a waterfall manner. While many software tools have been developed to digitize these processes, the storyboarding process is still a rare case where there are no de facto standard digital tools, and many directors and storyboard artists draw on paper.

In 2018, I took a position as a technical advisor to an animation production company, Arch Inc., and oversaw the development of a storyboarding tool. We completed a prototype that generated blank storyboard frames line by line from the text of the script and let the user draw storyboards by filling in the blank frames. Unfortunately, it was a complete failure. First of all, drawing with the tool was not nearly as good as drawing with pen and paper. Moreover, navigation was paginated to mimic turning sheets of paper, but because users could not flip through the pages continuously as they could on paper, it was exceedingly difficult to grasp the storyboard as a whole. In response to this painful failure, I began re-implementing the tool from scratch in 2019 while interviewing animation directors. This was the beginning of Griffith,^g a web-based storyboarding tool that is still under active development today (see Figure 5). In 2020, we also decided to research the history of storyboards to verify what we had learned from our interviews. We learned that the Japanese storyboard style was imported from Disney by industry professionals who went to the United States in the early 1950s. However, although Disney used two types of storyboards, one of which was a wall-mounted cork board for discussions among the parties involved, only the other style, with six pictures per page, was imported to Japan. Current storyboards remain largely unchanged from that style. Why was only one style imported? Are creators making the most of storyboards? Many more questions followed, but I will stop here; if you are interested, please refer to the slides from my talk at the annual conference of the Society for Animation Studies [3].

The more seriously we try to support creativity in the wild, the more we realize how much domain knowledge is required to understand the context, which is built on top of a rich history. A tool development process that considers this background, which is unknown even to the domain experts themselves, could be a very distinct type of user-centered design. While this is very exciting for me personally, as a computer science researcher I am also at a loss. This is because none of the aforementioned domain knowledge, history, or issues related to Japanese animation storyboards have been compiled in a form that can be referenced in the computer science literature. I am an HCI researcher and would like to write a systems paper, but I feel I would fill the entire paper just summarizing the information I have gathered from my literature review and interviews. This problem remains unsolved at the time of writing. Narrowly, Griffith itself has not yet been published in an archival venue; more broadly, I sense a limitation in the current research methods of computer science.

Creativity and Cultures in Computing

In this article, I have discussed two cases of toolsmiths creating tools to support creative activities in the wild. One began as a technology-driven effort and ultimately led to new media expressions; the other began by addressing a growing need in the domain and has now enabled a novel production process. Through these examples, I have considered the tension between art and technology. In the words of John Lasseter of Pixar Inc., "The art challenges technology and the technology inspires the art." Ken Anjyo, the founder of the research and development division of OLM Digital Inc., which is famous for producing the Pokémon cartoon, has stated that "anime production is a fusion of art and technology." These are idealized views; in reality, art and technology often go back and forth.

Artists are valued for their creativity in new expressions, while researchers are valued for their creativity in new technologies (tools). We computer science researchers must always be careful that we are not cherry-picking the activities of artists just for the sake of research. To discuss relevant topics through a sociocultural lens, researchers gathered at the Special Interest Group on Creativity and Cultures in Computing (SIGCCC) meeting at CHI 2023 [4]. My journey out into the "wild" was not always smooth, but as a community of researchers, we can find a better way. These days, I even feel the need to apply research methods from cultural anthropology to understand nuanced contexts and to adopt perspectives from media theory to consider tools' impact on users. Although these are not (yet) our typical neighboring disciplines in the way cognitive science and social science are, we need their help: computers are human artifacts, and computer science is, in some sense, a study in the humanities after all. Only when researchers and artists share cultural backgrounds and acknowledge each other's values will the creation of new expressions and technologies become a shared goal and genuine co-creation become possible.

Acknowledgments

This work was supported in part by JST ACT-X Grant Number JPMJAX22A3 and JST CREST Grant Number JPMJCR20D4, Japan.

References

[1] Kato, J., Nakano, T., and Goto, M. TextAlive: Integrated design environment for kinetic typography. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI '15). ACM, New York, 2015, 3403–3412; https://doi.org/10.1145/2702123.2702140

[2] Kato, J. and Goto, M. Lyric app framework: A web-based framework for developing interactive lyric-driven musical applications. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23). ACM, New York, 2023; https://doi.org/10.1145/3544548.3580931

[3] Kato, J., Mihara, R., and Hirasawa, N. Past, present, and future of storyboarding in Japanese animation. In Society for Animation Studies 32nd Annual Conference. Non-archival paper presentation; https://research.archinc.jp/static/files/sas2021-kato-storyboarding-in-anime.pdf

[4] Kato, J., Frich, J. et al. Special interest group on creativity and cultures in computing. In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems (CHI EA '23). ACM, New York, 2023; https://doi.org/10.1145/3544549.3583175

Author

Jun Kato is a toolsmith researcher who builds creativity support tools and works at the intersection of programming and conventional art. He serves as a senior researcher at the National Institute of Advanced Industrial Science and Technology (AIST) and the technical advisor at Arch Inc.

Footnotes

a. https://textalive.jp

b. https://aidn.jp

c. https://www.bighead01.com

d. https://magicalmirai.com

e. https://developer.textalive.jp

f. https://developer.textalive.jp/events/magicalmirai10th

g. https://research.archinc.jp/en/griffith

Figures

Figure 1. TextAlive was initially prototyped as a Java application.