How should I go about doing rotations? Just shove the entire dataset into ImageMagick and tell it to rotate everything by, say, 15 degrees? I know that by default it trains with mirroring on; I'm not sure if it auto-rotates as well, but I didn't see anything about it, so I'd assume not.
There is a class in TensorFlow that does this for you. If I recall correctly it is called ImageDataGenerator or something like that, and its whole shtick is that it automates all pre-processing for you. I'm not sure how much of the StyleGAN code you actually messed with, but if it is invariant to horizontal flipping I suspect it is already doing some pre-processing somewhere, possibly via ImageDataGenerator already, and if so it is just a matter of adding rotation to the constructor call.
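Just to make that concrete, here is a minimal sketch of what the constructor call could look like, assuming the Keras ImageDataGenerator is what is in play; the directory name and image size are placeholders, not anything from the repo.

```python
import tensorflow as tf

# Hedged sketch: adds rotation on top of the horizontal flipping that is
# (supposedly) already happening. All paths and sizes below are made up.
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    horizontal_flip=True,   # the mirroring that is on by default
    rotation_range=15,      # random rotations of up to +/-15 degrees
    fill_mode='nearest',    # fill the corners exposed by rotation
)

# Stream augmented batches straight from a directory of images;
# class_mode=None because a GAN doesn't need labels.
flow = datagen.flow_from_directory(
    'dataset_dir', target_size=(256, 256), batch_size=32, class_mode=None)
```

If the StyleGAN code does its augmentation elsewhere (e.g. inside its own data pipeline rather than through this class), the same rotation-range idea would just have to be replicated there.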
Still want to have a look at the code you played around with later, btw.
As for the wg sequences, my thought right now is to import each image tuple as a single entry in the dataset, with the individual images as different dimensions of that same entry (I will call that dimension z from now on, even though that is not strictly correct, since the image already has three color channels as a depth axis). I assume it won't be something you can do out of the box, but TensorFlow already works great with multi-dimensional data, so I don't think it will complain much about that; it will just require a bit of finagling to get the data in the shape we want.
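Roughly, the packing I have in mind would look something like this; the loader and filenames are placeholders, and each frame is assumed to already be a same-sized RGB image:

```python
import tensorflow as tf

def load_frame(path):
    # Placeholder loader; the real one would decode whatever format we settle on.
    img = tf.io.decode_png(tf.io.read_file(path), channels=3)
    return tf.image.convert_image_dtype(img, tf.float32)

def pack_sequence(frame_paths):
    # Stack the frames along a new leading axis, so a tuple of frames
    # becomes one (T, H, W, 3) tensor: the extra z dimension on top of
    # the usual height/width/channels.
    return tf.stack([load_frame(p) for p in frame_paths], axis=0)

# e.g. a two-frame tuple becomes a single (2, H, W, 3) entry
entry = pack_sequence(['seq_001_a.png', 'seq_001_b.png'])
```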
Then comes the more iffy part of this approach: I am not 100% sure that if you just run it buck naked (pure vanilla StyleGAN) it will be able to learn the patterns in the z dimension (the changes through time/images). I think it might, and I want to try it, since it is literally just running it as it already is. My guess is that it will either get confused about how to run the image convolutions over the extra dimension, or it might get cheeky and automatically run standard vector space convolutions, i.e. convolutions whose kernels also span the z axis. I think those might be able to learn on their own, because if you think about it, the z dimension displays patterns just the same as any 2D feature in an image. If it doesn't automatically default to vector space convolutions, we will have to teach it how to do that ourselves.
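To spell out the distinction (purely illustrative, this isn't something StyleGAN does by itself): a 2D convolution treats each frame independently, while a 3D one also slides across the z axis and can therefore pick up how the frames change.

```python
import tensorflow as tf

# One entry shaped (batch, T, H, W, channels): here, 2 frames of 64x64 RGB.
x = tf.random.normal([1, 2, 64, 64, 3])

# 2D convolution applied per frame: no mixing across the z/sequence axis.
per_frame = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv2D(16, 3, padding='same'))(x)

# 3D convolution whose kernel also spans the z axis, so it can model
# the changes between frames as well as the 2D structure within them.
across_frames = tf.keras.layers.Conv3D(16, (2, 3, 3), padding='same')(x)

print(per_frame.shape, across_frames.shape)  # both (1, 2, 64, 64, 16)
```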
Finally, there is also the possibility that the vectorial pattern recognition won't be able to do the job. This is the worst-case scenario, because then we will have to think up another way to model the data ourselves. Maybe representing the combination of both images as the difference between them would work, for instance.
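One way to cash that out, sketched below: keep the first frame as-is and store the second one as its pixel-wise difference from the first. The helper names are made up, and the frames are assumed to be float arrays of the same shape.

```python
import tensorflow as tf

def to_frame_plus_delta(frame_a, frame_b):
    # Pack the pair as (first frame, change): concatenating along channels
    # turns two (H, W, 3) frames into one (H, W, 6) array.
    return tf.concat([frame_a, frame_b - frame_a], axis=-1)

def from_frame_plus_delta(packed):
    # Invert the packing to recover both original frames.
    frame_a, delta = tf.split(packed, 2, axis=-1)
    return frame_a, frame_a + delta
```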
P.S.: I just noticed that you might have meant "how would I go about creating that dataset" with your second question. As I said before, right now the only thing we want is to have a bunch of sequences with good filenames so we don't get lost. After that we will want to manually arrange them as tuples, cropping them manually at this point if required (like those where the whole sequence is in a single file). To arrange them as tuples we have a few options: we can encode them into JSON; we can convert them to raw and then append the z dimension after an arbitrary point in the file; or we can drop each tuple into its own folder, since TF already has a few ways of automatically understanding directory structure when it is importing data (sketched below). We would have to think about which of the three is the best option for us, depending on how we choose to structure our work.
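For the folder option, something along these lines; the layout (one folder per tuple, frames as numbered PNGs) and all the names are assumptions for the sake of the example, and it assumes every tuple has the same number of frames and resolution:

```python
import tensorflow as tf

def load_sequence(seq_dir):
    # Read every frame in a tuple's folder, sorted by filename,
    # and stack them into one (T, H, W, 3) tensor.
    frame_paths = sorted(tf.io.gfile.glob(seq_dir + '/*.png'))
    frames = [tf.image.convert_image_dtype(
                  tf.io.decode_png(tf.io.read_file(p), channels=3), tf.float32)
              for p in frame_paths]
    return tf.stack(frames, axis=0)

# Assumed layout: dataset/seq_0001/0.png, dataset/seq_0001/1.png, ...
seq_dirs = sorted(tf.io.gfile.glob('dataset/seq_*'))
dataset = tf.data.Dataset.from_tensor_slices(
    [load_sequence(d) for d in seq_dirs])  # one (T, H, W, 3) entry per tuple
```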