I'm Jon Olick. I make shiny things. I simplify.


I presented Sparse Voxel Octrees at Siggraph 2008.

Thursday, March 17, 2011

Know your SPU transposes

It has come to my attention that this is needed and useful information, too useful to keep to oneself. At Naughty Dog, some years back, Cort Stratton and I compiled a pretty comprehensive list of transposes. It's unbelievably handy to have on hand when you need it. I've reconstructed the transposes as best I can and put them up here for your general use. Enjoy!

Introduction


There is more than one way to skin a cat. When converting from AOS (array of structures) to SOA (structure of arrays) and back, there are many variations. Which one you should use depends on how many instruction slots you have free in your even and odd pipes, how many spare registers you have to spend, and the latency of the combination of instructions used.

The goal is first to find all variations where you can trade even for odd or odd for even instructions, and then to minimize the number of registers used in the process (including the registers that hold shuffle masks).
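For a feel of where the cycle counts quoted in the example headings come from, here is a rough C sketch of the arithmetic. The model is a simplification assumed for illustration: instructions issue in order, at most one per cycle, dual issue across the even and odd pipes is ignored, and the latencies you feed it (4 cycles for shufb and shlqbyi, 2 for or) are assumed values that happen to reproduce the quoted counts. `Instr` and `sequence_cycles` are made-up names, not part of any SDK.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INSTRS 16

/* Hypothetical description of one instruction in a straight-line sequence. */
typedef struct {
    int latency;  /* assumed cycles from issue until the result is usable */
    int dep0;     /* index of an earlier instruction it reads, or -1 */
    int dep1;     /* index of a second earlier instruction, or -1 */
} Instr;

/* Returns the cycle at which the last result becomes available, assuming
   in-order issue of at most one instruction per cycle and no other stalls. */
int sequence_cycles(const Instr *code, size_t n)
{
    int ready[MAX_INSTRS];
    int issue = 0;
    int total = 0;

    assert(n <= MAX_INSTRS);
    for (size_t i = 0; i < n; ++i) {
        const int deps[2] = { code[i].dep0, code[i].dep1 };

        if (i > 0)
            issue += 1;  /* earliest slot after the previous instruction */

        /* Wait until every input has been produced. */
        for (int d = 0; d < 2; ++d) {
            if (deps[d] >= 0) {
                assert((size_t)deps[d] < i);
                if (ready[deps[d]] > issue)
                    issue = ready[deps[d]];
            }
        }

        ready[i] = issue + code[i].latency;
        if (ready[i] > total)
            total = ready[i];
    }
    return total;
}
```

Under these assumptions, two independent 4-cycle instructions followed by a 4-cycle instruction that reads both come out at 9 cycles, swapping that last instruction for a 2-cycle one gives 7, and three independent 4-cycle instructions give 6.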

Shuffle Masks
To specify shuffle masks, I'll use A-D to name elements 1 through 4 of the first parameter and a-d to name elements 1 through 4 of the second parameter. '0' is special: it means put a zero in the output for that element.

For example, s_ABab would take the first 2 elements in parameter 1 and the first 2 elements in parameter 2 and put them side by side into the output register.
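As a sketch of how a mask name could turn into an actual shufb control register, the C helper below expands a four-character name into 16 control bytes. It assumes 32-bit elements and the usual shufb control encoding, where bytes 0x00-0x0F select from the first parameter, 0x10-0x1F select from the second, and 0x80 produces a zero byte. The function name `make_shuffle_mask` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Expand a 4-character mask name such as "ABab" into 16 shufb control bytes.
   Returns 0 on success, or -1 if the name contains an unknown character. */
int make_shuffle_mask(const char name[4], uint8_t control[16])
{
    assert(name && control);

    for (int elem = 0; elem < 4; ++elem) {
        const char c = name[elem];

        for (int byte = 0; byte < 4; ++byte) {
            uint8_t sel;

            if (c >= 'A' && c <= 'D')
                sel = (uint8_t)((c - 'A') * 4 + byte);         /* first parameter */
            else if (c >= 'a' && c <= 'd')
                sel = (uint8_t)(0x10 + (c - 'a') * 4 + byte);  /* second parameter */
            else if (c == '0')
                sel = 0x80;                                    /* yields a 0x00 byte */
            else
                return -1;

            control[elem * 4 + byte] = sel;
        }
    }
    return 0;
}
```

With this encoding, "ABab" expands to bytes 00..07 followed by 10..17, which matches the description of s_ABab above.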


Example - AOS to SOA, 1 element - 0 even, 3 odd, 1 shuffle mask, 9 cycles
Let's first consider the simple case of 4 input registers, in1 through in4, where we are interested in combining the first element of each into a single out register.

Instructions                    In English
shufb t1, in1, in2, s_ACac      t1 = in1.x, ?, in2.x, ?
shufb t2, in3, in4, s_ACac      t2 = in3.x, ?, in4.x, ?
shufb out, t1, t2, s_ACac       out = in1.x, in2.x, in3.x, in4.x
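The three-shuffle sequence can be checked on any machine with a word-granular stand-in for shufb. This is only an emulation sketch: `Quad`, `shuffle` and `gather_x_3shuf` are hypothetical names, the mask is passed by its name rather than as a control register, and nothing here models the real byte-level instruction or its timing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit register holding 4 x 32-bit elements. */
typedef struct { uint32_t w[4]; } Quad;

/* Word-granular emulation of shufb driven by a mask name:
   A-D pick from p1, a-d pick from p2, '0' leaves the element zero. */
Quad shuffle(Quad p1, Quad p2, const char mask[4])
{
    Quad out = { { 0, 0, 0, 0 } };

    for (int i = 0; i < 4; ++i) {
        const char c = mask[i];

        if (c >= 'A' && c <= 'D')
            out.w[i] = p1.w[c - 'A'];
        else if (c >= 'a' && c <= 'd')
            out.w[i] = p2.w[c - 'a'];
        else
            assert(c == '0');
    }
    return out;
}

/* Three dependent shuffles sharing the single mask ACac. */
Quad gather_x_3shuf(Quad in1, Quad in2, Quad in3, Quad in4)
{
    Quad t1 = shuffle(in1, in2, "ACac");  /* in1.x, in1.z, in2.x, in2.z */
    Quad t2 = shuffle(in3, in4, "ACac");  /* in3.x, in3.z, in4.x, in4.z */

    return shuffle(t1, t2, "ACac");       /* in1.x, in2.x, in3.x, in4.x */
}
```

The emulation also shows what lands in the "?" slots: they pick up the z elements, which the final shuffle then discards.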


Example - AOS to SOA, 1 element - 1 even, 2 odd, 2 shuffle masks, 7 cycles
This variation on the above splits up the even/odd pipe usage a bit at the cost of more masks.

Instructions                    In English
shufb t1, in1, in2, s_Aa00      t1 = in1.x, in2.x, 0, 0
shufb t2, in3, in4, s_00Aa      t2 = 0, 0, in3.x, in4.x
or out, t1, t2                  out = in1.x, in2.x, in3.x, in4.x
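The same result via two independent shuffles and an OR can be sketched in plain C as well. Again this is a word-granular emulation with hypothetical names (`Quad`, `shuffle_Aa00`, `shuffle_00Aa`, `or_quad`, `gather_x_2shuf_or`), with each mask hard-coded as its own function for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit register holding 4 x 32-bit elements. */
typedef struct { uint32_t w[4]; } Quad;

/* Emulates shufb with mask Aa00: p1.x, p2.x, 0, 0. */
Quad shuffle_Aa00(Quad p1, Quad p2)
{
    Quad out = { { p1.w[0], p2.w[0], 0, 0 } };
    return out;
}

/* Emulates shufb with mask 00Aa: 0, 0, p1.x, p2.x. */
Quad shuffle_00Aa(Quad p1, Quad p2)
{
    Quad out = { { 0, 0, p1.w[0], p2.w[0] } };
    return out;
}

/* Emulates a 128-bit bitwise OR. */
Quad or_quad(Quad a, Quad b)
{
    Quad out;

    for (int i = 0; i < 4; ++i)
        out.w[i] = a.w[i] | b.w[i];
    return out;
}

/* Two independent shuffles merged by an OR. The zeroed halves of t1 and t2
   do not overlap, so the OR simply concatenates them. */
Quad gather_x_2shuf_or(Quad in1, Quad in2, Quad in3, Quad in4)
{
    Quad t1 = shuffle_Aa00(in1, in2);
    Quad t2 = shuffle_00Aa(in3, in4);

    assert(t1.w[2] == 0 && t1.w[3] == 0 && t2.w[0] == 0 && t2.w[1] == 0);
    return or_quad(t1, t2);
}
```

The design point is that the two shuffles no longer depend on each other's output pattern, and the final merge moves to a different instruction, at the price of a second mask register.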


Example - SOA to AOS, 1 element - 0 even, 3 odd, 0 shuffle masks, 6 cycles
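This example converts back from SOA to AOS, still working on 1 element. The in register already holds in.x in its first element, so it serves as the first output as-is and only out2 through out4 need an instruction.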

Instructions                    In English
shlqbyi out2, in, 4             out2 = in.y, in.z, in.w, 0
shlqbyi out3, in, 8             out3 = in.z, in.w, 0, 0
shlqbyi out4, in, 12            out4 = in.w, 0, 0, 0
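A quick C emulation of the byte shifts, for checking the result off-target. It assumes shlqbyi shifts zeros in on the right (a rotate such as rotqbyi would wrap the elements around instead); either way, only the first element of each output matters for a 1 element transpose. `Quad`, `shift_left_bytes` and `scatter_x` are hypothetical names, and the shift is modelled only for whole 32-bit elements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit register holding 4 x 32-bit elements. */
typedef struct { uint32_t w[4]; } Quad;

/* Word-granular emulation of a quadword shift left by bytes, zero filled.
   Only byte counts that are multiples of 4 and below 16 are modelled. */
Quad shift_left_bytes(Quad in, unsigned bytes)
{
    Quad out = { { 0, 0, 0, 0 } };
    const unsigned words = bytes / 4;

    assert(bytes % 4 == 0 && bytes < 16);
    for (unsigned i = 0; i + words < 4; ++i)
        out.w[i] = in.w[i + words];
    return out;
}

/* SOA to AOS for one element: out[k].x ends up holding element k of in. */
void scatter_x(Quad in, Quad out[4])
{
    out[0] = in;                        /* in.x is already in place */
    out[1] = shift_left_bytes(in, 4);   /* in.y moves to the first element */
    out[2] = shift_left_bytes(in, 8);   /* in.z moves to the first element */
    out[3] = shift_left_bytes(in, 12);  /* in.w moves to the first element */
}
```

The three shifts are independent of one another, which is why this variation needs no masks and no temporaries.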

Contribute
Know a transpose that I didn't list? Find a better one? Post in the comments and I'll update the post.

Next post...
... will be on 2 element transposes.
