Neural style transfer is the use of neural networks to transfer the style of an input image – e.g. a famous painting – to another input image – e.g. a backyard photograph.
Researchers have proposed a variety of techniques for performing style transfer, but which one works best? There is no right answer to that question; viewers’ opinions differ. In the results reported in previous style transfer articles, the most preferred methods rarely receive more than two-thirds of reviewers’ votes, while the least preferred methods rarely receive less than 5%.
In a paper we presented at this year’s meeting of the Association for the Advancement of Artificial Intelligence (AAAI), my colleagues and I describe a new style transfer model that can output multiple options, controlled by a model parameter that the user chooses.
We show that most previous approaches to style transfer can be rewritten in a standardized form, which we call the assign-and-mix model. The “assignment” step of the model involves an assignment matrix, which maps features in one input image to features in the other. In the paper, we show that the differences between style transfer techniques generally come down to the entropy of the assignment matrix, or the diversity of the matrix’s values.
Finally, we show that, given a user-specified setting of the input parameter, an algorithm called Sinkhorn-Knopp can efficiently compute the associated assignment matrix, enabling a diversity of outputs from the same style transfer model.
In a series of experiments, we compared our approach with its predecessors. We found that, by standard metrics, our method did a better job of preserving the content of the content input and the style of the style input, and it produced more diverse outputs. We also conducted a study with 10 human evaluators and found that, at a certain setting of our diversity parameter, images generated by our method were preferred to those produced by other methods.
Assign and mix
In style transfer, the first step is to pass both the content example and the style example to the same visual encoder, which is typically pretrained on a broad object recognition task. The encoder produces a representation of each image, where each image region has an associated feature vector.
The feature vectors will typically encode visual information – about, e.g., colors and orientations of gradients – but also semantic information – indicating, for example, that a certain image region depicts part of an eye.
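As a concrete illustration of per-region features, the toy sketch below splits an image into patches and describes each patch with simple statistics (mean color and average gradient magnitudes). This is our own stand-in, not the paper's pipeline: a real style transfer system would instead use the activations of a pretrained CNN encoder, such as VGG, which also capture semantic information.

```python
import numpy as np

def toy_region_features(image, patch=4):
    """Stand-in for a pretrained encoder: split an H x W x 3 image into
    non-overlapping patches and describe each with simple statistics.
    A real pipeline would use pretrained CNN activations instead."""
    H, W, _ = image.shape
    feats = []
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            p = image[i:i + patch, j:j + patch]
            mean_color = p.mean(axis=(0, 1))        # 3 values: mean R, G, B
            gy = np.abs(np.diff(p, axis=0)).mean()  # average vertical gradient
            gx = np.abs(np.diff(p, axis=1)).mean()  # average horizontal gradient
            feats.append(np.concatenate([mean_color, [gy, gx]]))
    return np.stack(feats)  # one 5-dimensional feature vector per region

rng = np.random.default_rng(0)
features = toy_region_features(rng.random((8, 8, 3)))  # 4 regions of an 8x8 image
```

Each row of the result plays the role of the region feature vector described above.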
Style transfer typically involves (1) reshuffling elements of the style image to reflect the contents of the content image, (2) warping the content image so that its overall statistics resemble those of the style image, or (3) a combination of the two. We assimilate all such approaches to the assign-and-mix model.
The “assign” step in assign-and-mix corresponds to approach (1). It involves the assignment matrix, which assigns feature vectors from the style representation to regions of a new image, controlled by the content representation. Although previous style transfer approaches use a variety of techniques to find correspondences between style and content features, we analyze several of them in the paper and show that they can often be assimilated to the assign-and-mix model.
The assignment for a particular point in the new image can be a single vector from the style encoding, or it can be a weighted combination of vectors. In the first case, the assignment matrix is binary: each matrix entry is either 0 or 1. This is a minimum-entropy assignment.
In contrast, if each point in the new image consists of a weighted combination of all the vectors in the style encoding, the assignment matrix has higher entropy. There are existing style transfer methods with binary assignment matrices, and there are existing approaches with high-entropy matrices; our method can approximate both.
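One simple way to see how a single parameter can sweep between these two regimes is a temperature-scaled softmax over feature similarities. This is an illustrative sketch of the idea, not the paper's exact construction: a low temperature drives each row of the assignment matrix toward one-hot (low entropy), while a high temperature spreads weight across all style vectors (high entropy).

```python
import numpy as np

def assignment_matrix(content_feats, style_feats, temperature):
    """Soft assignment of style feature vectors to content positions.

    Rows index content positions, columns index style positions; each row
    sums to 1. Low temperature -> nearly binary rows (low entropy);
    high temperature -> nearly uniform rows (high entropy)."""
    sim = content_feats @ style_feats.T          # similarity scores
    logits = sim / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def mean_row_entropy(A, eps=1e-12):
    """Average entropy of the rows of a row-stochastic matrix."""
    return float(-(A * np.log(A + eps)).sum(axis=1).mean())

rng = np.random.default_rng(0)
content = rng.standard_normal((5, 8))  # 5 content regions, 8-dim features
style = rng.standard_normal((7, 8))    # 7 style regions

sharp = assignment_matrix(content, style, temperature=0.01)  # near-binary
soft = assignment_matrix(content, style, temperature=10.0)   # high-entropy
```

At the low-temperature extreme each content region copies a single style vector; at the high-temperature extreme every region mixes all of them.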
After the assignment step, we proceed to the mixing step, which corresponds to approach (2), above. In this step, we review the encoding of the new, synthetic image, and for each image region, we measure the distance between its encoding and the corresponding encoding of the original content example. Then we mix in feature vectors from the original content encoding according to the degree of divergence. This ensures that the new image preserves the content of the original.
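The mixing logic can be sketched numerically as follows. This is our own illustration under a simple assumption (an exponential weighting of the divergence), not the paper's exact formulation: regions whose synthesized features have drifted far from the content encoding are pulled back toward it.

```python
import numpy as np

def mix_with_content(synth_feats, content_feats, scale=1.0):
    """Blend synthesized features back toward the content encoding.

    The farther a region's synthesized vector is from the content vector
    at the same position, the more content is mixed back in.
    (Illustrative weighting; the exact scheme is a design choice.)"""
    dist = np.linalg.norm(synth_feats - content_feats, axis=1, keepdims=True)
    alpha = 1.0 - np.exp(-scale * dist)  # in [0, 1): weight on content
    return alpha * content_feats + (1.0 - alpha) * synth_feats
```

Regions that already match the content pass through unchanged; regions that diverge strongly are dominated by the content features, preserving the original content.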
The computational bottleneck in this process is the creation of multiple assignment matrices with different degrees of entropy. However, we show in our paper that the Sinkhorn-Knopp algorithm, which rescales matrices into a standardized form that enables efficient solution, can be applied to the problem of constructing assignment matrices.
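The core of Sinkhorn-Knopp is simple to state: alternately normalize the rows and columns of a positive matrix until it is approximately doubly stochastic (all row sums and column sums equal 1). A minimal sketch of the algorithm itself, separate from how the paper applies it:

```python
import numpy as np

def sinkhorn_knopp(M, iters=200):
    """Rescale a positive square matrix to be approximately doubly
    stochastic by alternately normalizing its rows and columns."""
    A = np.asarray(M, dtype=float).copy()
    for _ in range(iters):
        A /= A.sum(axis=1, keepdims=True)  # make rows sum to 1
        A /= A.sum(axis=0, keepdims=True)  # make columns sum to 1
    return A

rng = np.random.default_rng(1)
A = sinkhorn_knopp(rng.random((4, 4)) + 0.1)  # strictly positive input
```

Because each iteration is just two normalizations, the procedure is cheap enough to run repeatedly, once per requested entropy setting.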
In the paper, we rewrite three previous style transfer methods in the assign-and-mix format. We chose these methods because their assignment matrices cover the full range of entropies. Our method should thus be able to approximate the output of any style transfer model whose assignment matrix entropy falls within that range.