Neural Style Transfer (Student Choice Award winner)
Please refer to the Project webpage for result images
Bells & Whistles (explained in Part 4):
- Stylized images from the previous homework.
- Implemented my own cropping method.
- Tried using a feedforward network to output style transfer results directly.
Part 1 Content Reconstruction
1.1 Content loss optimization
1.2 Reconstruction from random noises
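Both subsections optimize the same content loss, only the initialization differs. A minimal sketch of reconstructing from random noise by optimizing the content loss with L-BFGS (the tensor shapes and the target features are stand-ins for the actual VGG feature maps):

```python
import torch
import torch.nn.functional as F

def content_loss(gen_features, target_features):
    # Mean squared error between the feature maps of the generated
    # image and those of the content image at a chosen conv layer.
    return F.mse_loss(gen_features, target_features)

# Stand-in for the conv features of the content image.
target = torch.randn(1, 64, 32, 32)

# Start from random noise and optimize it toward the content features.
x = torch.randn(1, 64, 32, 32, requires_grad=True)
opt = torch.optim.LBFGS([x])

def closure():
    opt.zero_grad()
    loss = content_loss(x, target)
    loss.backward()
    return loss

for _ in range(5):
    opt.step(closure)
```

In the real pipeline, `x` is the input image and both `x` and the content image are passed through a frozen VGG network to obtain the features compared above.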
Part 2 Texture Synthesis
2.1 Style loss optimization
As we can see, when the style loss is optimized at deeper layers, the synthesized texture becomes more abstract: its shapes match the style image more closely, but its colors less so. In my view, the combination of conv1, conv3, conv5, conv7, and conv9 looks best, so I chose this configuration for my style loss.
Layer configurations with the style loss optimized (the synthesized texture for each is on the project webpage):
- conv1, conv2, conv3, conv4, conv5
- conv1, conv3, conv5, conv7, conv9
- conv6, conv7, conv8, conv9, conv10
- conv11, conv12, conv13, conv14, conv15
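The style loss at each of these layers compares Gram matrices of the feature maps. A minimal sketch of the standard Gatys-style formulation (the feature shapes below are illustrative):

```python
import torch
import torch.nn.functional as F

def gram_matrix(features):
    # features: (batch, channels, height, width)
    b, c, h, w = features.size()
    flat = features.view(b * c, h * w)
    # Normalize by the number of elements so the loss scale is
    # comparable across layers with different feature-map sizes.
    return flat @ flat.t() / (b * c * h * w)

def style_loss(gen_features, style_features):
    return F.mse_loss(gram_matrix(gen_features), gram_matrix(style_features))
```

The total style loss is the sum of this term over the chosen layers, e.g. conv1, conv3, conv5, conv7, and conv9 in the configuration selected above.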
2.2 Reconstruction from random noises
Layer configurations, each run from two random noise inputs (noise1, noise2); the textures synthesized from each noise input are on the project webpage:
- conv1, conv2, conv3, conv4, conv5
- conv1, conv3, conv5, conv7, conv9
- conv6, conv7, conv8, conv9, conv10
- conv11, conv12, conv13, conv14, conv15
Part 3 Style Transfer
3.1 Tune the hyper-parameters
3.2 Optimized results
The results show that performance varies with the combination of content image and style image, but in most cases the weights 100000 and 1000000 work best.
3.3 Noise & Content image comparison
By comparison, the output images generated from noise look more texture-like: elements of the original content image are preserved, but they remain vague.
3.4 Additional results (with time compared)
Generally speaking, generating the style-transferred image from the content image is roughly a quarter faster than generating it from noise.
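A sketch of how such a timing comparison can be set up (`run_transfer` is a placeholder for the actual optimization loop, reduced here to a trivial gradient loop so the sketch stays runnable):

```python
import time
import torch

def run_transfer(init_img, steps=10):
    # Placeholder for the style-transfer optimization: the real version
    # would minimize the content + style loss starting from init_img.
    x = init_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([x])
    for _ in range(steps):
        opt.zero_grad()
        (x ** 2).mean().backward()
        opt.step()
    return x.detach()

content_img = torch.rand(1, 3, 64, 64)   # initialize from the content image
noise_img = torch.randn(1, 3, 64, 64)    # initialize from random noise

t0 = time.perf_counter()
run_transfer(content_img)
t_content = time.perf_counter() - t0

t0 = time.perf_counter()
run_transfer(noise_img)
t_noise = time.perf_counter() - t0
```

In practice the gap comes from convergence speed rather than per-step cost: starting from the content image, the content loss is already near its minimum, so fewer iterations are needed.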
I took this photo at the Palace of Versailles, near Paris, last December. It is very impressive, and I highly recommend visiting.
This photo was also taken by me! I was standing on the left bank of the Seine, in front of the Musée d'Orsay in Paris. The style image, "Starry Night Over the Rhône", is one of the Van Gogh paintings held by the Musée d'Orsay. When I saw the Seine after visiting the museum, I immediately recalled this painting.
The style image is a painting by Klimt, one of my favorite artists; I highly recommend checking out his work.
Part 4 Bells & Whistles
4.1 Stylizing images from the previous homework
I, again, became the sun.
4.2 Implementing my own cropping method
The basic idea is to keep the content image at its original size. Therefore, the first step is to check whether the style image is already large enough; if it is, we can jump directly to cropping. (If the style image is far too large, we can also set an upper limit on the ratio between the style image's size and the content image's size, and resize the style image down to a more reasonable scale.)
Then, for style images that are not large enough, we check the ratio of the style image's width to the content image's width, as well as the ratio of their heights.
When one of the ratios is >= 1, that edge is already long enough, and we enlarge the style image along the other edge so it matches the content image, preserving the aspect ratio.
When both ratios are < 1, we divide by the smaller ratio to expand the style image; the other edge is then automatically expanded to a valid length at the same time.
```python
from torchvision.transforms import functional as F

width_ratio = style_width / content_width
height_ratio = style_height / content_height
style_ratio = style_height / style_width  # aspect ratio of the style image

# Step 1: the style image is already large enough in both dimensions.
if width_ratio >= 1 and height_ratio >= 1:
    pass
# Step 2: too narrow -- scale up so the width matches the content image.
elif width_ratio < 1 and height_ratio >= 1:
    new_width = content_width
    new_height = int(new_width * style_ratio) + 1
    style_img = F.resize(style_img, (new_height, new_width))
# Step 3: too short -- scale up so the height matches the content image.
elif width_ratio >= 1 and height_ratio < 1:
    new_height = content_height
    new_width = int(new_height / style_ratio) + 1
    style_img = F.resize(style_img, (new_height, new_width))
# Step 4: too small in both dimensions -- divide by the smaller ratio.
else:
    ratio = min(width_ratio, height_ratio)
    new_height = int(style_height / ratio) + 1
    new_width = int(style_width / ratio) + 1
    style_img = F.resize(style_img, (new_height, new_width))
```
Finally, we can crop the image. We could start from the top left, or specify any valid starting point that stays within range.
```python
top, left, height, width = 0, 0, content_height, content_width
```
We could also use a random crop to get more diverse results, since the synthesized image varies with the style image input.
```python
from torchvision.transforms import RandomCrop

top, left, height, width = RandomCrop.get_params(style_img, (content_height, content_width))
```
Then, we manipulate the style image:
```python
style_img = F.crop(style_img, top=top, left=left, height=height, width=width)
```