An experimental record of applying GAN-generated materials to visual works

Deep Learning and AI Art

Naoyuki Hirasawa
Jun 26, 2022

Are you familiar with GANs? A GAN is a machine learning technique that can create wondrous visuals such as the following:

Morphing video by StyleGAN3

As a visual artist, I have created works (and a large number of WIPs) using GANs. This article introduces them and preserves them as my memories on the Internet.

How GAN works

Structure of GAN

A GAN (generative adversarial network) is one of the techniques known as generative models in machine learning, proposed in 2014 by Ian Goodfellow et al. For example, by training on a large number of images, it can generate new images based on their features.

A GAN consists of two neural networks, a Generator and a Discriminator. When noise (Z) is fed into the Generator, it produces fake data, and the Discriminator is given both the real and the fake data and judges which is which.
By repeating this process until the Discriminator can no longer tell them apart, the Generator becomes able to produce data ever closer to the real thing.
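
As a minimal sketch of this training loop (not the exact code behind any of the videos in this article), the following PyTorch snippet uses tiny stand-in MLPs and random data purely for illustration:

```python
# Minimal GAN training loop sketch: tiny MLPs and random "real" data as placeholders.
import torch
import torch.nn as nn

z_dim, img_dim = 64, 28 * 28
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.rand(32, img_dim) * 2 - 1          # stand-in for a batch of real images
    z = torch.randn(32, z_dim)                      # noise Z fed to the Generator
    fake = G(z)                                     # Generator produces fake data

    # Discriminator: judge real data as 1 and fake data as 0
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the Discriminator judge its fakes as real
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```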

It is also possible to create mysterious morphing videos like the one at the top of this article by changing the input Z value little by little.
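
Here is a minimal sketch of that latent interpolation, assuming a pre-trained generator; a toy stand-in network is used below so the snippet runs on its own:

```python
# Latent-space morphing sketch: linearly interpolate between two latent vectors Z.
import torch
import torch.nn as nn

z_dim, steps = 512, 120
# Stand-in generator; in practice this would be a loaded pre-trained GAN (e.g. a StyleGAN model).
G = nn.Sequential(nn.Linear(z_dim, 3 * 64 * 64), nn.Tanh())

z_a = torch.randn(1, z_dim)          # start point in latent space
z_b = torch.randn(1, z_dim)          # end point in latent space

frames = []
with torch.no_grad():
    for i in range(steps):
        t = i / (steps - 1)
        z = (1 - t) * z_a + t * z_b  # move through latent space little by little
        img = G(z).reshape(3, 64, 64)
        frames.append(img)
# Writing these frames out in order produces a smooth morphing video.
```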

I find this technology very interesting and am exploring new ways to express myself based on GAN output. My goal is not to get close to the real thing; my motivation is to create unrealistic and interesting visuals. The following are some examples of my works and experiments using GAN output.

Note: Much of the information I consulted while creating these works is linked in the description section of each YouTube video. I will also add examples and information to this article when I feel like it.

Some works using GAN

Audio Reactive

This is a simple example of the kind of work we often see. The video responds to the kick, snare, and volume of the music. I often use the method of adjusting the playback speed of a high-frame-rate morphing video to match the audio.
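
A rough sketch of that idea, assuming a high-frame-rate morphing video "morph.mp4" and an audio file "track.wav" (both placeholder names), where the loudness envelope drives the step through the source frames:

```python
# Audio-reactive playback sketch: louder audio advances faster through the morphing frames.
import cv2
import librosa
import numpy as np

OUT_FPS = 30
y, sr = librosa.load("track.wav", sr=None)           # placeholder audio file
rms = librosa.feature.rms(y=y, hop_length=512)[0]    # loudness envelope
rms = rms / (rms.max() + 1e-8)                       # normalize to 0..1

cap = cv2.VideoCapture("morph.mp4")                  # placeholder high-frame-rate GAN video
frames = []
ok, frame = cap.read()
while ok:
    frames.append(frame)
    ok, frame = cap.read()
cap.release()

# resample the loudness envelope to one value per output video frame
n_out = int(len(y) / sr * OUT_FPS)
env = np.interp(np.linspace(0, len(rms) - 1, n_out), np.arange(len(rms)), rms)

h, w = frames[0].shape[:2]
writer = cv2.VideoWriter("reactive.mp4", cv2.VideoWriter_fourcc(*"mp4v"), OUT_FPS, (w, h))
pos = 0.0
for v in env:
    pos = min(pos + 0.5 + 4.0 * v, len(frames) - 1)  # louder -> bigger step -> faster morphing
    writer.write(frames[int(pos)])
writer.release()
```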

This is a video produced as an homage to Fatboy Slim’s famous music video. In addition to audio reactivity, I took advantage of BigGAN’s ability to manipulate the output category of an image.

Slit Scan

I tried combining GAN morphing with slit scan. In both, the change along the time axis is interesting, so I expected that combining them would be even more interesting.

I often use TouchDesigner for slit scan; it makes the effect very easy to achieve. There are many helpful tutorials like this one.
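
For reference, here is a rough Python/OpenCV sketch of the same slit-scan idea (the videos above were made in TouchDesigner; file names are placeholders):

```python
# Slit-scan sketch: each output column is sampled from a progressively older source frame.
import cv2
import numpy as np

cap = cv2.VideoCapture("morph.mp4")          # placeholder GAN morphing video
frames = []
ok, f = cap.read()
while ok:
    frames.append(f)
    ok, f = cap.read()
cap.release()

h, w = frames[0].shape[:2]
depth = 60                                   # how far back in time the scan reaches
out = cv2.VideoWriter("slitscan.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 30, (w, h))
for i in range(depth, len(frames)):
    frame = np.empty_like(frames[0])
    for x in range(w):
        offset = int(depth * x / w)          # columns further right come from older frames
        frame[:, x] = frames[i - offset][:, x]
    out.write(frame)
out.release()
```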

Inversion / Kaleidoscope

Simply flipping the GAN morphing video vertically and horizontally and connecting the copies together creates a pleasing visual; it is also a useful technique for compensating for the limited resolution of the GAN output.
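
The flip-and-connect step itself is simple; a minimal sketch for a single frame (placeholder file name) looks like this:

```python
# Kaleidoscope sketch: mirror one GAN frame into a 2x2 grid, doubling the effective resolution.
import cv2
import numpy as np

frame = cv2.imread("gan_frame.png")           # placeholder: one frame of a GAN morphing video
top = np.hstack([frame, frame[:, ::-1]])      # original + horizontal flip
kaleido = np.vstack([top, top[::-1]])         # that strip + its vertical flip
cv2.imwrite("kaleido.png", kaleido)
```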

Here, more inversion effects are added on top of the displace and slit-scan effects.

However, I feel that too much inversion does not take advantage of the good qualities of GAN.

Tiling

I trained a GAN on terrain images obtained from ‘地理院タイル’ (GSI Tiles), and lined up a large number of the resulting morphing videos. No special technique is used, but tiling can make the visuals look stylish.

This video is a combination of StyleGAN3’s rotation operation, slit scan, and tiling.

Datamosh / Pixel Sorting

A morphing video trained on ukiyo-e prints of the 53 Stations of the Tokaido is used here. By the way, I do not understand datamosh very well as an algorithm. This work was well received and I got some messages about exhibiting it, but the senders somehow lost touch with me.

This one uses video generated from text input as material. These are various pieces of ‘generative art’. Eventually they dissolve.
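
As a reference, here is a simple pixel-sorting sketch (one common variant, not necessarily the exact effect used in the videos above; file names are placeholders):

```python
# Pixel-sorting sketch: within each row, pixels brighter than a threshold are reordered
# by brightness, producing the characteristic smeared streaks.
import cv2
import numpy as np

img = cv2.imread("frame.png")                # placeholder frame from a morphing video
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
threshold = 120

out = img.copy()
for y in range(img.shape[0]):
    idx = np.where(gray[y] > threshold)[0]   # pixels in this row to sort
    if idx.size > 1:
        order = np.argsort(gray[y, idx])     # sort the selected pixels by brightness
        out[y, idx] = img[y, idx[order]]
cv2.imwrite("sorted.png", out)
```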

3D

A visual that is interesting on a flat surface can remain interesting when converted into three dimensions. In many cases, the RGB information is used to assign values along the Z-axis.
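
A sketch of that RGB-to-height idea, computing a displaced point grid with NumPy (the actual rendering was done in other tools; the file name is a placeholder):

```python
# Height-map sketch: pixel brightness becomes a Z value, giving a displaced 3D surface.
import cv2
import numpy as np

img = cv2.imread("gan_frame.png")                                  # placeholder GAN frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0

h, w = gray.shape
xs, ys = np.meshgrid(np.arange(w), np.arange(h))
z = gray * 50.0                                                    # scale brightness to depth
points = np.stack([xs, ys, z], axis=-1).reshape(-1, 3)             # (x, y, z) point cloud
# 'points' can then be rendered as a displaced plane or point cloud in TouchDesigner, Blender, etc.
```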

In some cases, just surrounding it with a cube frame makes it look somewhat better.

This is like an exhibition in a virtual 3D space. A cherry blossom design by Kuniyuki Takizawa from ‘NDLイメージバンク’ (NDL Image Bank) is used as the training data.

No special technique is used, but just putting several different visuals together looks cool.

VR / AR

This is a GAN morphing video pasted as a texture on a spherical surface. As it is WebVR content, you are free to move your point of view around. When I made this, I did not find it particularly interesting, but when I saw ABRA’s music video, I was very shocked. This video is very cool.

This is also a WebAR version with a texture on a spherical surface, using AR markers.

This is a morphing video of the 53 Stations of the Tokaido displayed with image tracking. The tracking accuracy is not very good.

I put it on my face as a texture. You may think it looks a little weird, but personally I find it interesting. It might be more interesting if you use morphing images of the face.

Color palette

The output from the early stages of GAN training is used as a color palette. So far I have not come up with a more interesting use for this.
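
One way to pull such a palette out programmatically is k-means clustering over the pixels of an early-epoch sample; a sketch (not necessarily how the work above was made; the file name is a placeholder):

```python
# Palette sketch: cluster the pixels of an early-training GAN output into 5 dominant colors.
import cv2
import numpy as np
from sklearn.cluster import KMeans

img = cv2.imread("early_gan_output.png")              # placeholder: blurry early-epoch sample
pixels = img.reshape(-1, 3).astype(np.float32)
km = KMeans(n_clusters=5, n_init=10).fit(pixels)
palette = km.cluster_centers_.astype(np.uint8)        # 5 dominant BGR colors
print(palette)
```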

Scaling / Rotation (StyleGAN3)

This is a feature of StyleGAN3. It allows you to apply operations such as scaling and rotation to the output of your model. I did not modify the images afterwards.
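
For reference, the official NVlabs stylegan3 code exposes this through a transform matrix on the synthesis input; the sketch below mirrors the make_transform logic in their gen_images.py and assumes G is a generator loaded from that repository (loading is not shown). Rotation and translation are shown; scaling could presumably be folded into the same matrix.

```python
# Rotation/translation sketch for StyleGAN3, based on the approach in NVlabs' gen_images.py.
import numpy as np
import torch

def make_transform(translate, angle_deg):
    # 2D affine matrix (rotation + translation) applied to the synthesis input grid.
    a = np.deg2rad(angle_deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, s, translate[0]],
                     [-s, c, translate[1]],
                     [0.0, 0.0, 1.0]])

def apply_transform(G, angle_deg=45.0, translate=(0.0, 0.0)):
    # G: a StyleGAN3 generator loaded from the official repo (assumption, not shown here).
    m = np.linalg.inv(make_transform(translate, angle_deg))
    if hasattr(G.synthesis, "input"):
        G.synthesis.input.transform.copy_(torch.from_numpy(m))
    # Images generated after this call come out rotated/translated, with no post-processing.
```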

Some visuals are fixed images

This is a technique applied at the training stage. It is difficult to explain in writing, but I use visuals that are partly changing and partly unchanged as the training data. In the video above, only the TV screen appears to morph.
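
A rough sketch of one way such training frames could be prepared (my interpretation; file names and the rectangle are placeholders): paste a changing clip into a fixed region of a still background, so that only that region varies across the dataset.

```python
# Dataset-preparation sketch: only a fixed rectangle of each training frame changes.
import cv2

bg = cv2.imread("room_still.png")               # fixed background (placeholder)
cap = cv2.VideoCapture("screen_content.mp4")    # the part that should morph (placeholder)
x, y, w, h = 200, 150, 320, 180                 # "TV screen" rectangle inside the background

i = 0
ok, clip = cap.read()
while ok:
    frame = bg.copy()
    frame[y:y + h, x:x + w] = cv2.resize(clip, (w, h))   # only this region varies
    cv2.imwrite(f"dataset/{i:05d}.png", frame)
    i += 1
    ok, clip = cap.read()
cap.release()
```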

This one used a video taken inside a cab as training data. Only the monitor and the scenery outside morph.

Estimating body parts

I ran a pose estimation model on the GAN morphing video. Incidentally, my own dance videos were used as training data. This is still very much a WIP, but it might be more interesting to add visuals driven by the body-part coordinates. Also, apart from the visual interest, I think there is conceptual interest in how an AI perceives the output of another AI.
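
As a reference, here is a sketch of running a pose-estimation model over the morphing frames; MediaPipe Pose is used as an example model (not necessarily the one in the video), and the file name is a placeholder:

```python
# Pose-estimation sketch: detect body-part coordinates on each GAN morphing frame.
import cv2
import mediapipe as mp

pose = mp.solutions.pose.Pose(static_image_mode=False)
draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture("breakgan_morph.mp4")    # placeholder: GAN morphing of dance footage
ok, frame = cap.read()
while ok:
    result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.pose_landmarks:
        # body-part coordinates live here and could drive additional visuals
        draw.draw_landmarks(frame, result.pose_landmarks, mp.solutions.pose.POSE_CONNECTIONS)
    cv2.imshow("pose on GAN output", frame)
    if cv2.waitKey(1) == 27:                    # Esc to quit
        break
    ok, frame = cap.read()
cap.release()
cv2.destroyAllWindows()
```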

A facial recognition model was also applied to the face morphing video. This is also very much a WIP.

Segmentation

A segmentation model was run on the face morphing video, and visuals were overlaid within the segmented areas. I often use segmentation because it makes it easy to come up with many ideas.
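
A sketch of that overlay, using MediaPipe Selfie Segmentation as an example model (not necessarily the one used above; file names are placeholders):

```python
# Segmentation-overlay sketch: composite a second visual only inside the segmented region.
import cv2
import numpy as np
import mediapipe as mp

seg = mp.solutions.selfie_segmentation.SelfieSegmentation(model_selection=1)
cap_face = cv2.VideoCapture("face_morph.mp4")     # placeholder: GAN face morphing video
cap_fill = cv2.VideoCapture("texture_morph.mp4")  # placeholder: visual shown inside the mask

ok1, face = cap_face.read()
ok2, fill = cap_fill.read()
while ok1 and ok2:
    result = seg.process(cv2.cvtColor(face, cv2.COLOR_BGR2RGB))
    mask = (result.segmentation_mask > 0.5)[..., None]          # per-pixel person mask
    fill_resized = cv2.resize(fill, (face.shape[1], face.shape[0]))
    out = np.where(mask, fill_resized, face)                    # overlay only inside the mask
    cv2.imshow("segmented overlay", out)
    if cv2.waitKey(1) == 27:                                    # Esc to quit
        break
    ok1, face = cap_face.read()
    ok2, fill = cap_fill.read()
cap_face.release()
cap_fill.release()
cv2.destroyAllWindows()
```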

Breakdance with GAN

These are works that use breakdance (breaking) as training data for a GAN; I call this “BreakGAN”. Breaking is a form of street dance in which originality is pursued using the entire body. I thought GAN morphing could extend those elements and help dancers create fresh movements that are not bound by stereotypes.

I have written information about each work in the description section of its YouTube video. Please check there as well.

Writing Articles with GAN

I have also written several articles using GANs. They are not technical explanations like this article; rather, they use GAN output as material in the writing itself, as material for creating a story (which is a little harder to explain). I am not that good at English, but I cannot explain it well even in my native language, Japanese.

For example, “GAN散歩” (GAN Walk) is an article about walking through an imaginary city generated by a GAN. There are many examples of GANs trained on landscapes, but the fact that this article was published in a non-technical web magazine about walking is also very significant to me.

Final Thoughts

This article has presented many examples of works made with GANs. Although it takes time to collect a large amount of data and train on it, you can enjoy a variety of outputs just by changing the data. I post updates on social media, blogs, and YouTube, so please follow me there if you like.


Written by Naoyuki Hirasawa

FollowTheDarkside (FTD) - creative coder / visual artist / writer / bboy - Portfolio: https://followthedarkside.com
