I'm Jon Olick. I make shiny things. I simplify.


I presented Sparse Voxel Octrees at Siggraph 2008.

Friday, March 18, 2011

Know your SPU transposes - part 2

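In this part of the SPU transposes series, we will cover 2-element transposes, in the same format as last time. Masks named m_ are selb masks: F marks an all-ones element and 0 an all-zeros element. selb rt, ra, rb, rc takes each bit from rb where the mask is 1 and from ra where it is 0.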

AOS to SOA, 2 elements


4 input vectors of the format:

in1 = in1.x, in1.y, ?, ?
in2 = in2.x, in2.y, ?, ?
in3 = in3.x, in3.y, ?, ?
in4 = in4.x, in4.y, ?, ?


2 output vectors of the format:

out1 = in1.x, in2.x, in3.x, in4.x
out2 = in1.y, in2.y, in3.y, in4.y


I'll start with the simplest way to do it, all on the odd pipe, and then show some variations that trade odd instructions for even instructions.

0 even, 4 odd, 3 shuffle masks, 10 cycles

Instructions | In English
shufb t1, in1, in2, s_AaBb | t1 = in1.x, in2.x, in1.y, in2.y
shufb t2, in3, in4, s_AaBb | t2 = in3.x, in4.x, in3.y, in4.y
shufb out1, t1, t2, s_ABab | out1 = in1.x, in2.x, in3.x, in4.x
shufb out2, t1, t2, s_CDcd | out2 = in1.y, in2.y, in3.y, in4.y
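As a sanity check on the all-odd version, the C sketch below replays it on a software model of the SPU registers. `qword`, `q()`, `word()`, `shufb()` and `mask()` are hypothetical helpers, not the spu_intrinsics API. `mask()` builds a shufb control word from the s_ letter notation. The model covers data movement only, not pipe assignment or latency.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A 128-bit SPU register modeled as 16 bytes; element i is bytes 4i..4i+3. */
typedef struct { uint8_t b[16]; } qword;

/* Pack four 32-bit elements; only whole elements move, so host byte order is irrelevant. */
static qword q(uint32_t x, uint32_t y, uint32_t z, uint32_t w) {
    uint32_t v[4] = { x, y, z, w };
    qword r;
    memcpy(r.b, v, sizeof r.b);
    return r;
}

/* Read element i (0 = .x ... 3 = .w). */
static uint32_t word(qword v, int i) {
    uint32_t t;
    assert(i >= 0 && i < 4);
    memcpy(&t, v.b + 4 * i, sizeof t);
    return t;
}

/* Model of shufb: each control byte picks a byte of a:b or yields a constant. */
static qword shufb(qword a, qword b, qword c) {
    qword r;
    for (int i = 0; i < 16; i++) {
        uint8_t s = c.b[i];
        if ((s & 0xC0) == 0x80)      r.b[i] = 0x00;
        else if ((s & 0xE0) == 0xC0) r.b[i] = 0xFF;
        else if ((s & 0xE0) == 0xE0) r.b[i] = 0x80;
        else r.b[i] = (s & 0x10) ? b.b[s & 0x0F] : a.b[s & 0x0F];
    }
    return r;
}

/* Control word from names like "AaBb": A-D = first operand, a-d = second, '0' = zero. */
static qword mask(const char *s) {
    qword m;
    assert(strlen(s) == 4);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            uint8_t v = 0x80;
            if (s[i] >= 'A' && s[i] <= 'D') v = (uint8_t)((s[i] - 'A') * 4 + j);
            if (s[i] >= 'a' && s[i] <= 'd') v = (uint8_t)(0x10 + (s[i] - 'a') * 4 + j);
            m.b[4 * i + j] = v;
        }
    return m;
}

typedef struct { qword out1, out2; } soa2;

/* AOS to SOA, 2 elements, all on the odd pipe. */
static soa2 aos_to_soa_2(qword in1, qword in2, qword in3, qword in4) {
    qword s_AaBb = mask("AaBb");
    qword t1 = shufb(in1, in2, s_AaBb);     /* in1.x, in2.x, in1.y, in2.y */
    qword t2 = shufb(in3, in4, s_AaBb);     /* in3.x, in4.x, in3.y, in4.y */
    soa2 r;
    r.out1 = shufb(t1, t2, mask("ABab"));   /* in1.x, in2.x, in3.x, in4.x */
    r.out2 = shufb(t1, t2, mask("CDcd"));   /* in1.y, in2.y, in3.y, in4.y */
    return r;
}
```

With in_k = q(k1, k2, 99, 99), out1 holds 11, 21, 31, 41 and out2 holds 12, 22, 32, 42.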

1 even, 3 odd, 4 masks, 9 cycles

Instructions | In English
shufb t1, in1, in2, s_AaBb | t1 = in1.x, in2.x, in1.y, in2.y
shufb t2, in3, in4, s_BbAa | t2 = in3.y, in4.y, in3.x, in4.x
selb out1, t2, t1, m_FF00 | out1 = in1.x, in2.x, in3.x, in4.x
shufb out2, t1, t2, s_CDab | out2 = in1.y, in2.y, in3.y, in4.y
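The trick in this variant is that t2 is built with its x values already in the upper half. A single even-pipe selb can then merge them with t1, so no third shuffle is needed. The C sketch below traces that data flow. It assumes hypothetical software models `shufb()`, `selb()`, `mask()` and `selmask()`, where `selmask()` builds a selb mask from the m_ notation (F is an all-ones element). None of these are SPU intrinsics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } qword;   /* element i is bytes 4i..4i+3 */

/* Pack four 32-bit elements; only whole elements move, so host byte order is irrelevant. */
static qword q(uint32_t x, uint32_t y, uint32_t z, uint32_t w) {
    uint32_t v[4] = { x, y, z, w };
    qword r;
    memcpy(r.b, v, sizeof r.b);
    return r;
}

static uint32_t word(qword v, int i) {
    uint32_t t;
    assert(i >= 0 && i < 4);
    memcpy(&t, v.b + 4 * i, sizeof t);
    return t;
}

/* Model of shufb: each control byte picks a byte of a:b or yields a constant. */
static qword shufb(qword a, qword b, qword c) {
    qword r;
    for (int i = 0; i < 16; i++) {
        uint8_t s = c.b[i];
        if ((s & 0xC0) == 0x80)      r.b[i] = 0x00;
        else if ((s & 0xE0) == 0xC0) r.b[i] = 0xFF;
        else if ((s & 0xE0) == 0xE0) r.b[i] = 0x80;
        else r.b[i] = (s & 0x10) ? b.b[s & 0x0F] : a.b[s & 0x0F];
    }
    return r;
}

/* Control word from names like "BbAa": A-D = first operand, a-d = second, '0' = zero. */
static qword mask(const char *s) {
    qword m;
    assert(strlen(s) == 4);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            uint8_t v = 0x80;
            if (s[i] >= 'A' && s[i] <= 'D') v = (uint8_t)((s[i] - 'A') * 4 + j);
            if (s[i] >= 'a' && s[i] <= 'd') v = (uint8_t)(0x10 + (s[i] - 'a') * 4 + j);
            m.b[4 * i + j] = v;
        }
    return m;
}

/* selb mask from names like "FF00": 'F' = all-ones element, '0' = all-zeros. */
static qword selmask(const char *s) {
    qword m;
    assert(strlen(s) == 4);
    for (int i = 0; i < 16; i++) m.b[i] = (s[i / 4] == 'F') ? 0xFF : 0x00;
    return m;
}

/* Model of selb rt, ra, rb, rc: bits from rb where rc is 1, else from ra. */
static qword selb(qword a, qword b, qword c) {
    qword r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (uint8_t)((a.b[i] & ~c.b[i]) | (b.b[i] & c.b[i]));
    return r;
}

typedef struct { qword out1, out2; } soa2;

static soa2 aos_to_soa_2_selb(qword in1, qword in2, qword in3, qword in4) {
    qword t1 = shufb(in1, in2, mask("AaBb"));  /* in1.x, in2.x, in1.y, in2.y */
    qword t2 = shufb(in3, in4, mask("BbAa"));  /* in3.y, in4.y, in3.x, in4.x */
    soa2 r;
    r.out1 = selb(t2, t1, selmask("FF00"));    /* in1.x, in2.x, in3.x, in4.x */
    r.out2 = shufb(t1, t2, mask("CDab"));      /* in1.y, in2.y, in3.y, in4.y */
    return r;
}
```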

4 even, 2 odd, 4 masks, 10 cycles

Instructions | In English
selb t1, in1, in2, m_F000 | t1 = in2.x, in1.y, ?, ?
shufb t2, in3, in4, s_aABb | t2 = in4.x, in3.x, in3.y, in4.y
shufb t3, t1, t2, s_BAba | t3 = in1.y, in2.x, in3.x, in4.x
selb out2, t2, in2, m_FF00 | out2 = in2.x, in2.y, in3.y, in4.y
selb out2, out2, t3, m_F000 | out2 = in1.y, in2.y, in3.y, in4.y
selb out1, t3, in1, m_F000 | out1 = in1.x, in2.x, in3.x, in4.x
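This variant pushes most of the work onto selb. The C sketch below traces the six instructions on a software model built from hypothetical helpers (`shufb()`, `selb()`, `mask()`, `selmask()`); these are illustrative, not SPU intrinsics. Note that out2 is assembled in two selects, and the second one has to wait for t3.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } qword;   /* element i is bytes 4i..4i+3 */

/* Pack four 32-bit elements; only whole elements move, so host byte order is irrelevant. */
static qword q(uint32_t x, uint32_t y, uint32_t z, uint32_t w) {
    uint32_t v[4] = { x, y, z, w };
    qword r;
    memcpy(r.b, v, sizeof r.b);
    return r;
}

static uint32_t word(qword v, int i) {
    uint32_t t;
    assert(i >= 0 && i < 4);
    memcpy(&t, v.b + 4 * i, sizeof t);
    return t;
}

/* Model of shufb: each control byte picks a byte of a:b or yields a constant. */
static qword shufb(qword a, qword b, qword c) {
    qword r;
    for (int i = 0; i < 16; i++) {
        uint8_t s = c.b[i];
        if ((s & 0xC0) == 0x80)      r.b[i] = 0x00;
        else if ((s & 0xE0) == 0xC0) r.b[i] = 0xFF;
        else if ((s & 0xE0) == 0xE0) r.b[i] = 0x80;
        else r.b[i] = (s & 0x10) ? b.b[s & 0x0F] : a.b[s & 0x0F];
    }
    return r;
}

/* Control word from names like "aABb": A-D = first operand, a-d = second, '0' = zero. */
static qword mask(const char *s) {
    qword m;
    assert(strlen(s) == 4);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            uint8_t v = 0x80;
            if (s[i] >= 'A' && s[i] <= 'D') v = (uint8_t)((s[i] - 'A') * 4 + j);
            if (s[i] >= 'a' && s[i] <= 'd') v = (uint8_t)(0x10 + (s[i] - 'a') * 4 + j);
            m.b[4 * i + j] = v;
        }
    return m;
}

/* selb mask from names like "F000": 'F' = all-ones element, '0' = all-zeros. */
static qword selmask(const char *s) {
    qword m;
    assert(strlen(s) == 4);
    for (int i = 0; i < 16; i++) m.b[i] = (s[i / 4] == 'F') ? 0xFF : 0x00;
    return m;
}

/* Model of selb rt, ra, rb, rc: bits from rb where rc is 1, else from ra. */
static qword selb(qword a, qword b, qword c) {
    qword r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (uint8_t)((a.b[i] & ~c.b[i]) | (b.b[i] & c.b[i]));
    return r;
}

typedef struct { qword out1, out2; } soa2;

static soa2 aos_to_soa_2_even(qword in1, qword in2, qword in3, qword in4) {
    qword m_F000 = selmask("F000"), m_FF00 = selmask("FF00");
    qword t1 = selb(in1, in2, m_F000);         /* in2.x, in1.y, ?, ? */
    qword t2 = shufb(in3, in4, mask("aABb"));  /* in4.x, in3.x, in3.y, in4.y */
    qword t3 = shufb(t1, t2, mask("BAba"));    /* in1.y, in2.x, in3.x, in4.x */
    soa2 r;
    r.out2 = selb(t2, in2, m_FF00);            /* in2.x, in2.y, in3.y, in4.y */
    r.out2 = selb(r.out2, t3, m_F000);         /* in1.y, in2.y, in3.y, in4.y */
    r.out1 = selb(t3, in1, m_F000);            /* in1.x, in2.x, in3.x, in4.x */
    return r;
}
```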

SOA to AOS, 2 elements


2 input vectors of the format:

in1 = in1.x, in2.x, in3.x, in4.x
in2 = in1.y, in2.y, in3.y, in4.y


4 output vectors of the format:

out1 = in1.x, in1.y, ?, ?
out2 = in2.x, in2.y, ?, ?
out3 = in3.x, in3.y, ?, ?
out4 = in4.x, in4.y, ?, ?


Again, I'll start with the simplest way to do it, all on the odd pipe, and then show some variations that trade odd instructions for even instructions.

0 even, 4 odd, 4 shuffle masks, 7 cycles

Instructions | In English
shufb out1, in1, in2, s_Aa00 | out1 = in1.x, in1.y, 0, 0
shufb out2, in1, in2, s_Bb00 | out2 = in2.x, in2.y, 0, 0
shufb out3, in1, in2, s_Cc00 | out3 = in3.x, in3.y, 0, 0
shufb out4, in1, in2, s_Dd00 | out4 = in4.x, in4.y, 0, 0
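Every output here is an independent shuffle of the same two registers. That is why this version needs four masks but has no dependency chain, and why it finishes in 7 cycles. The C sketch below models it with hypothetical helpers (`shufb()` and `mask()` are illustrative, not SPU intrinsics). As in the post, in1 holds every vector's x and in2 every vector's y.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } qword;   /* element i is bytes 4i..4i+3 */

/* Pack four 32-bit elements; only whole elements move, so host byte order is irrelevant. */
static qword q(uint32_t x, uint32_t y, uint32_t z, uint32_t w) {
    uint32_t v[4] = { x, y, z, w };
    qword r;
    memcpy(r.b, v, sizeof r.b);
    return r;
}

static uint32_t word(qword v, int i) {
    uint32_t t;
    assert(i >= 0 && i < 4);
    memcpy(&t, v.b + 4 * i, sizeof t);
    return t;
}

/* Model of shufb: each control byte picks a byte of a:b or yields a constant. */
static qword shufb(qword a, qword b, qword c) {
    qword r;
    for (int i = 0; i < 16; i++) {
        uint8_t s = c.b[i];
        if ((s & 0xC0) == 0x80)      r.b[i] = 0x00;
        else if ((s & 0xE0) == 0xC0) r.b[i] = 0xFF;
        else if ((s & 0xE0) == 0xE0) r.b[i] = 0x80;
        else r.b[i] = (s & 0x10) ? b.b[s & 0x0F] : a.b[s & 0x0F];
    }
    return r;
}

/* Control word from names like "Bb00": A-D = first operand, a-d = second, '0' = zero. */
static qword mask(const char *s) {
    qword m;
    assert(strlen(s) == 4);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            uint8_t v = 0x80;
            if (s[i] >= 'A' && s[i] <= 'D') v = (uint8_t)((s[i] - 'A') * 4 + j);
            if (s[i] >= 'a' && s[i] <= 'd') v = (uint8_t)(0x10 + (s[i] - 'a') * 4 + j);
            m.b[4 * i + j] = v;
        }
    return m;
}

typedef struct { qword out1, out2, out3, out4; } aos4;

static aos4 soa_to_aos_2(qword in1, qword in2) {
    aos4 r;
    r.out1 = shufb(in1, in2, mask("Aa00"));   /* in1.x, in1.y, 0, 0 */
    r.out2 = shufb(in1, in2, mask("Bb00"));   /* in2.x, in2.y, 0, 0 */
    r.out3 = shufb(in1, in2, mask("Cc00"));   /* in3.x, in3.y, 0, 0 */
    r.out4 = shufb(in1, in2, mask("Dd00"));   /* in4.x, in4.y, 0, 0 */
    return r;
}
```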

0 even, 4 odd, 2 shuffle masks, 9 cycles

Instructions | In English
shufb out1, in1, in2, s_AaBb | out1 = in1.x, in1.y, in2.x, in2.y
shufb out3, in1, in2, s_CcDd | out3 = in3.x, in3.y, in4.x, in4.y
shlqbyi out2, out1, 8 | out2 = in2.x, in2.y, 0, 0
shlqbyi out4, out3, 8 | out4 = in4.x, in4.y, 0, 0
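Here each shuffle produces two outputs at once. A byte shift by 8 (shlqbyi, which fills with zeros) then moves the second pair down, which saves two masks at the cost of two extra cycles. The C sketch below models this. `shufb()`, `shlqbyi()`, `mask()` and the other helpers are hypothetical, not SPU intrinsics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } qword;   /* element i is bytes 4i..4i+3 */

/* Pack four 32-bit elements; only whole elements move, so host byte order is irrelevant. */
static qword q(uint32_t x, uint32_t y, uint32_t z, uint32_t w) {
    uint32_t v[4] = { x, y, z, w };
    qword r;
    memcpy(r.b, v, sizeof r.b);
    return r;
}

static uint32_t word(qword v, int i) {
    uint32_t t;
    assert(i >= 0 && i < 4);
    memcpy(&t, v.b + 4 * i, sizeof t);
    return t;
}

/* Model of shufb: each control byte picks a byte of a:b or yields a constant. */
static qword shufb(qword a, qword b, qword c) {
    qword r;
    for (int i = 0; i < 16; i++) {
        uint8_t s = c.b[i];
        if ((s & 0xC0) == 0x80)      r.b[i] = 0x00;
        else if ((s & 0xE0) == 0xC0) r.b[i] = 0xFF;
        else if ((s & 0xE0) == 0xE0) r.b[i] = 0x80;
        else r.b[i] = (s & 0x10) ? b.b[s & 0x0F] : a.b[s & 0x0F];
    }
    return r;
}

/* Control word from names like "CcDd": A-D = first operand, a-d = second, '0' = zero. */
static qword mask(const char *s) {
    qword m;
    assert(strlen(s) == 4);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            uint8_t v = 0x80;
            if (s[i] >= 'A' && s[i] <= 'D') v = (uint8_t)((s[i] - 'A') * 4 + j);
            if (s[i] >= 'a' && s[i] <= 'd') v = (uint8_t)(0x10 + (s[i] - 'a') * 4 + j);
            m.b[4 * i + j] = v;
        }
    return m;
}

/* Model of shlqbyi: shift the quadword left by n bytes, filling with zeros. */
static qword shlqbyi(qword a, int n) {
    qword r;
    assert(n >= 0);
    for (int i = 0; i < 16; i++) r.b[i] = (i + n < 16) ? a.b[i + n] : 0;
    return r;
}

typedef struct { qword out1, out2, out3, out4; } aos4;

static aos4 soa_to_aos_2_shift(qword in1, qword in2) {
    aos4 r;
    r.out1 = shufb(in1, in2, mask("AaBb"));   /* in1.x, in1.y, in2.x, in2.y */
    r.out3 = shufb(in1, in2, mask("CcDd"));   /* in3.x, in3.y, in4.x, in4.y */
    r.out2 = shlqbyi(r.out1, 8);              /* in2.x, in2.y, 0, 0 */
    r.out4 = shlqbyi(r.out3, 8);              /* in4.x, in4.y, 0, 0 */
    return r;
}
```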

2 even, 3 odd, 4 masks, 7 cycles

Instructions | In English
shufb out2, in1, in2, s_Ba00 | out2 = in2.x, in1.y, 0, 0
shufb out3, in1, in2, s_Cc00 | out3 = in3.x, in3.y, 0, 0
shufb out4, in1, in2, s_Dd00 | out4 = in4.x, in4.y, 0, 0
selb out1, in1, out2, m_0F00 | out1 = in1.x, in1.y, in3.x, in4.x
selb out2, out2, in2, m_0F00 | out2 = in2.x, in2.y, 0, 0
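The C sketch below models this variant with hypothetical helpers (`shufb()`, `selb()`, `mask()`, `selmask()`); these are illustrative assumptions, not SPU intrinsics. The temporary holds in2.x and in1.y together. out1 keeps in1.x from the in1 register and takes in1.y from the temporary. out2 keeps in2.x from the temporary and takes in2.y from the in2 register, which holds the y values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } qword;   /* element i is bytes 4i..4i+3 */

/* Pack four 32-bit elements; only whole elements move, so host byte order is irrelevant. */
static qword q(uint32_t x, uint32_t y, uint32_t z, uint32_t w) {
    uint32_t v[4] = { x, y, z, w };
    qword r;
    memcpy(r.b, v, sizeof r.b);
    return r;
}

static uint32_t word(qword v, int i) {
    uint32_t t;
    assert(i >= 0 && i < 4);
    memcpy(&t, v.b + 4 * i, sizeof t);
    return t;
}

/* Model of shufb: each control byte picks a byte of a:b or yields a constant. */
static qword shufb(qword a, qword b, qword c) {
    qword r;
    for (int i = 0; i < 16; i++) {
        uint8_t s = c.b[i];
        if ((s & 0xC0) == 0x80)      r.b[i] = 0x00;
        else if ((s & 0xE0) == 0xC0) r.b[i] = 0xFF;
        else if ((s & 0xE0) == 0xE0) r.b[i] = 0x80;
        else r.b[i] = (s & 0x10) ? b.b[s & 0x0F] : a.b[s & 0x0F];
    }
    return r;
}

/* Control word from names like "Ba00": A-D = first operand, a-d = second, '0' = zero. */
static qword mask(const char *s) {
    qword m;
    assert(strlen(s) == 4);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            uint8_t v = 0x80;
            if (s[i] >= 'A' && s[i] <= 'D') v = (uint8_t)((s[i] - 'A') * 4 + j);
            if (s[i] >= 'a' && s[i] <= 'd') v = (uint8_t)(0x10 + (s[i] - 'a') * 4 + j);
            m.b[4 * i + j] = v;
        }
    return m;
}

/* selb mask from names like "0F00": 'F' = all-ones element, '0' = all-zeros. */
static qword selmask(const char *s) {
    qword m;
    assert(strlen(s) == 4);
    for (int i = 0; i < 16; i++) m.b[i] = (s[i / 4] == 'F') ? 0xFF : 0x00;
    return m;
}

/* Model of selb rt, ra, rb, rc: bits from rb where rc is 1, else from ra. */
static qword selb(qword a, qword b, qword c) {
    qword r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (uint8_t)((a.b[i] & ~c.b[i]) | (b.b[i] & c.b[i]));
    return r;
}

typedef struct { qword out1, out2, out3, out4; } aos4;

static aos4 soa_to_aos_2_selb(qword in1, qword in2) {
    qword m_0F00 = selmask("0F00");
    aos4 r;
    qword t = shufb(in1, in2, mask("Ba00"));  /* in2.x, in1.y, 0, 0 */
    r.out3 = shufb(in1, in2, mask("Cc00"));   /* in3.x, in3.y, 0, 0 */
    r.out4 = shufb(in1, in2, mask("Dd00"));   /* in4.x, in4.y, 0, 0 */
    r.out1 = selb(in1, t, m_0F00);            /* in1.x, in1.y, in3.x, in4.x */
    r.out2 = selb(t, in2, m_0F00);            /* in2.x, in2.y, 0, 0 */
    return r;
}
```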

2 even, 3 odd, 3 masks, 8 cycles

Instructions | In English
shufb out2, in1, in2, s_BaCc | out2 = in2.x, in1.y, in3.x, in3.y
shufb out4, in1, in2, s_Dd00 | out4 = in4.x, in4.y, 0, 0
shlqbyi out3, out2, 8 | out3 = in3.x, in3.y, 0, 0
selb out1, in1, out2, m_0F00 | out1 = in1.x, in1.y, in3.x, in4.x
selb out2, out2, in2, m_0F00 | out2 = in2.x, in2.y, in3.x, in3.y
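This version folds out3 into the same shuffle as the out2 temporary and peels it off with a byte shift. The shift must read the temporary before the final select overwrites it. The C sketch below uses hypothetical helpers (`shufb()`, `shlqbyi()`, `selb()`, `mask()`, `selmask()`), not SPU intrinsics. As in the preceding variant, out2 takes in2.y from the in2 register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } qword;   /* element i is bytes 4i..4i+3 */

/* Pack four 32-bit elements; only whole elements move, so host byte order is irrelevant. */
static qword q(uint32_t x, uint32_t y, uint32_t z, uint32_t w) {
    uint32_t v[4] = { x, y, z, w };
    qword r;
    memcpy(r.b, v, sizeof r.b);
    return r;
}

static uint32_t word(qword v, int i) {
    uint32_t t;
    assert(i >= 0 && i < 4);
    memcpy(&t, v.b + 4 * i, sizeof t);
    return t;
}

/* Model of shufb: each control byte picks a byte of a:b or yields a constant. */
static qword shufb(qword a, qword b, qword c) {
    qword r;
    for (int i = 0; i < 16; i++) {
        uint8_t s = c.b[i];
        if ((s & 0xC0) == 0x80)      r.b[i] = 0x00;
        else if ((s & 0xE0) == 0xC0) r.b[i] = 0xFF;
        else if ((s & 0xE0) == 0xE0) r.b[i] = 0x80;
        else r.b[i] = (s & 0x10) ? b.b[s & 0x0F] : a.b[s & 0x0F];
    }
    return r;
}

/* Control word from names like "BaCc": A-D = first operand, a-d = second, '0' = zero. */
static qword mask(const char *s) {
    qword m;
    assert(strlen(s) == 4);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            uint8_t v = 0x80;
            if (s[i] >= 'A' && s[i] <= 'D') v = (uint8_t)((s[i] - 'A') * 4 + j);
            if (s[i] >= 'a' && s[i] <= 'd') v = (uint8_t)(0x10 + (s[i] - 'a') * 4 + j);
            m.b[4 * i + j] = v;
        }
    return m;
}

/* selb mask from names like "0F00": 'F' = all-ones element, '0' = all-zeros. */
static qword selmask(const char *s) {
    qword m;
    assert(strlen(s) == 4);
    for (int i = 0; i < 16; i++) m.b[i] = (s[i / 4] == 'F') ? 0xFF : 0x00;
    return m;
}

/* Model of selb rt, ra, rb, rc: bits from rb where rc is 1, else from ra. */
static qword selb(qword a, qword b, qword c) {
    qword r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (uint8_t)((a.b[i] & ~c.b[i]) | (b.b[i] & c.b[i]));
    return r;
}

/* Model of shlqbyi: shift the quadword left by n bytes, filling with zeros. */
static qword shlqbyi(qword a, int n) {
    qword r;
    assert(n >= 0);
    for (int i = 0; i < 16; i++) r.b[i] = (i + n < 16) ? a.b[i + n] : 0;
    return r;
}

typedef struct { qword out1, out2, out3, out4; } aos4;

static aos4 soa_to_aos_2_selb_shift(qword in1, qword in2) {
    qword m_0F00 = selmask("0F00");
    aos4 r;
    qword t = shufb(in1, in2, mask("BaCc"));  /* in2.x, in1.y, in3.x, in3.y */
    r.out4 = shufb(in1, in2, mask("Dd00"));   /* in4.x, in4.y, 0, 0 */
    r.out3 = shlqbyi(t, 8);                   /* in3.x, in3.y, 0, 0 */
    r.out1 = selb(in1, t, m_0F00);            /* in1.x, in1.y, in3.x, in4.x */
    r.out2 = selb(t, in2, m_0F00);            /* in2.x, in2.y, in3.x, in3.y */
    return r;
}
```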

Next post...

... 3 elements

Thursday, March 17, 2011

Know your SPU transposes

It has come to my attention that this information is needed and useful, too useful to keep to oneself. Some years back at Naughty Dog, Cort Stratton and I compiled a pretty comprehensive list of transposes. It's unbelievably handy to have on hand whenever you need one. I've reconstructed the transposes as best I can and put them up here for your general use. Enjoy!

Introduction


There is more than one way to skin a cat. When converting from AOS to SOA and back, there are many variations. Which one you should use depends on how many free issue slots you have on the even and odd pipes, how many spare registers you can spend, and the latency of the instruction sequence.

The goal is to find all the variations that trade even instructions for odd ones (or vice versa), while minimizing the number of registers used in the process, including the registers that hold shuffle masks.

Shuffle Masks
To name shuffle masks, I'll use A-D for elements 1 through 4 of the first operand and a-d for elements 1 through 4 of the second operand. '0' is special: it puts a zero in that element of the output.

For example, s_ABab takes the first 2 elements of operand 1 and the first 2 elements of operand 2 and places them side by side in the output register.
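Because each letter names a whole 4-byte element, the notation maps directly onto shufb control bytes. A-D become byte indices 0x00-0x0F into the first operand, a-d become 0x10-0x1F into the second, and '0' becomes 0x80, which shufb turns into a zero byte. The C sketch below is a software model for checking masks off-target. `qword`, `q()`, `word()`, `shufb()` and `mask()` are hypothetical helpers, not the spu_intrinsics API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A 128-bit SPU register modeled as 16 bytes; element i is bytes 4i..4i+3. */
typedef struct { uint8_t b[16]; } qword;

/* Pack four 32-bit elements; only whole elements move, so host byte order is irrelevant. */
static qword q(uint32_t x, uint32_t y, uint32_t z, uint32_t w) {
    uint32_t v[4] = { x, y, z, w };
    qword r;
    memcpy(r.b, v, sizeof r.b);
    return r;
}

/* Read element i (0 = .x ... 3 = .w). */
static uint32_t word(qword v, int i) {
    uint32_t t;
    assert(i >= 0 && i < 4);
    memcpy(&t, v.b + 4 * i, sizeof t);
    return t;
}

/* Model of shufb: each control byte selects a byte of the 32-byte
   concatenation a:b, or yields a constant
   (10xxxxxx -> 0x00, 110xxxxx -> 0xFF, 111xxxxx -> 0x80). */
static qword shufb(qword a, qword b, qword c) {
    qword r;
    for (int i = 0; i < 16; i++) {
        uint8_t s = c.b[i];
        if ((s & 0xC0) == 0x80)      r.b[i] = 0x00;
        else if ((s & 0xE0) == 0xC0) r.b[i] = 0xFF;
        else if ((s & 0xE0) == 0xE0) r.b[i] = 0x80;
        else r.b[i] = (s & 0x10) ? b.b[s & 0x0F] : a.b[s & 0x0F];
    }
    return r;
}

/* Build a control word from names like "ABab": A-D pick elements of the
   first operand, a-d elements of the second, '0' a zero element. */
static qword mask(const char *s) {
    qword m;
    assert(strlen(s) == 4);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            uint8_t v = 0x80;                                   /* '0' */
            if (s[i] >= 'A' && s[i] <= 'D') v = (uint8_t)((s[i] - 'A') * 4 + j);
            if (s[i] >= 'a' && s[i] <= 'd') v = (uint8_t)(0x10 + (s[i] - 'a') * 4 + j);
            m.b[4 * i + j] = v;
        }
    return m;
}
```

For instance, mask("ABab") yields the bytes 00 01 02 03 04 05 06 07 10 11 12 13 14 15 16 17. shufb(q(1,2,3,4), q(5,6,7,8), mask("ABab")) then holds 1, 2, 5, 6.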


Example - AOS to SOA, 1 element - 0 even, 3 odd, 1 shuffle mask, 9 cycles
Let's first consider the simple case of 4 input registers, where we want to combine the first element of each of in1-in4 into a single out register.

Instructions | In English
shufb t1, in1, in2, s_ACac | t1 = in1.x, ?, in2.x, ?
shufb t2, in3, in4, s_ACac | t2 = in3.x, ?, in4.x, ?
shufb out, t1, t2, s_ACac | out = in1.x, in2.x, in3.x, in4.x
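Replaying the sequence on a software model in C confirms that one mask register serves all three shuffles. `qword`, `q()`, `word()`, `shufb()` and `mask()` are hypothetical helpers, not the spu_intrinsics API, and mask("ACac") stands in for s_ACac.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } qword;   /* element i is bytes 4i..4i+3 */

/* Pack four 32-bit elements; only whole elements move, so host byte order is irrelevant. */
static qword q(uint32_t x, uint32_t y, uint32_t z, uint32_t w) {
    uint32_t v[4] = { x, y, z, w };
    qword r;
    memcpy(r.b, v, sizeof r.b);
    return r;
}

static uint32_t word(qword v, int i) {
    uint32_t t;
    assert(i >= 0 && i < 4);
    memcpy(&t, v.b + 4 * i, sizeof t);
    return t;
}

/* Model of shufb: each control byte picks a byte of a:b or yields a constant. */
static qword shufb(qword a, qword b, qword c) {
    qword r;
    for (int i = 0; i < 16; i++) {
        uint8_t s = c.b[i];
        if ((s & 0xC0) == 0x80)      r.b[i] = 0x00;
        else if ((s & 0xE0) == 0xC0) r.b[i] = 0xFF;
        else if ((s & 0xE0) == 0xE0) r.b[i] = 0x80;
        else r.b[i] = (s & 0x10) ? b.b[s & 0x0F] : a.b[s & 0x0F];
    }
    return r;
}

/* Control word from names like "ACac": A-D = first operand, a-d = second, '0' = zero. */
static qword mask(const char *s) {
    qword m;
    assert(strlen(s) == 4);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            uint8_t v = 0x80;
            if (s[i] >= 'A' && s[i] <= 'D') v = (uint8_t)((s[i] - 'A') * 4 + j);
            if (s[i] >= 'a' && s[i] <= 'd') v = (uint8_t)(0x10 + (s[i] - 'a') * 4 + j);
            m.b[4 * i + j] = v;
        }
    return m;
}

/* AOS to SOA, 1 element: gather .x of in1..in4 with one mask and three shufb. */
static qword aos_to_soa_1(qword in1, qword in2, qword in3, qword in4) {
    qword s_ACac = mask("ACac");
    qword t1 = shufb(in1, in2, s_ACac);   /* in1.x, in1.z, in2.x, in2.z */
    qword t2 = shufb(in3, in4, s_ACac);   /* in3.x, in3.z, in4.x, in4.z */
    return shufb(t1, t2, s_ACac);         /* in1.x, in2.x, in3.x, in4.x */
}
```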


Example - AOS to SOA, 1 element - 1 even, 2 odd, 2 shuffle masks, 7 cycles
This variation on the above splits the work between the even and odd pipes (the or runs on the even pipe) and cuts the latency from 9 to 7 cycles, at the cost of one extra mask.

Instructions | In English
shufb t1, in1, in2, s_Aa00 | t1 = in1.x, in2.x, 0, 0
shufb t2, in3, in4, s_00Aa | t2 = 0, 0, in3.x, in4.x
or out, t1, t2 | out = in1.x, in2.x, in3.x, in4.x
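The merge works because the 0 entries in s_Aa00 and s_00Aa make shufb write zero bytes, so the bitwise or cannot disturb the live elements. The C sketch below models this with hypothetical helpers (`shufb()`, `mask()`, `orq()` and the rest are illustrative assumptions, not SPU intrinsics).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } qword;   /* element i is bytes 4i..4i+3 */

/* Pack four 32-bit elements; only whole elements move, so host byte order is irrelevant. */
static qword q(uint32_t x, uint32_t y, uint32_t z, uint32_t w) {
    uint32_t v[4] = { x, y, z, w };
    qword r;
    memcpy(r.b, v, sizeof r.b);
    return r;
}

static uint32_t word(qword v, int i) {
    uint32_t t;
    assert(i >= 0 && i < 4);
    memcpy(&t, v.b + 4 * i, sizeof t);
    return t;
}

/* Model of shufb: each control byte picks a byte of a:b or yields a constant. */
static qword shufb(qword a, qword b, qword c) {
    qword r;
    for (int i = 0; i < 16; i++) {
        uint8_t s = c.b[i];
        if ((s & 0xC0) == 0x80)      r.b[i] = 0x00;
        else if ((s & 0xE0) == 0xC0) r.b[i] = 0xFF;
        else if ((s & 0xE0) == 0xE0) r.b[i] = 0x80;
        else r.b[i] = (s & 0x10) ? b.b[s & 0x0F] : a.b[s & 0x0F];
    }
    return r;
}

/* Control word from names like "Aa00": A-D = first operand, a-d = second, '0' = zero. */
static qword mask(const char *s) {
    qword m;
    assert(strlen(s) == 4);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            uint8_t v = 0x80;
            if (s[i] >= 'A' && s[i] <= 'D') v = (uint8_t)((s[i] - 'A') * 4 + j);
            if (s[i] >= 'a' && s[i] <= 'd') v = (uint8_t)(0x10 + (s[i] - 'a') * 4 + j);
            m.b[4 * i + j] = v;
        }
    return m;
}

/* Model of the even-pipe "or": bitwise OR of two quadwords. */
static qword orq(qword a, qword b) {
    qword r;
    for (int i = 0; i < 16; i++) r.b[i] = (uint8_t)(a.b[i] | b.b[i]);
    return r;
}

static qword aos_to_soa_1_or(qword in1, qword in2, qword in3, qword in4) {
    qword t1 = shufb(in1, in2, mask("Aa00"));  /* in1.x, in2.x, 0, 0 */
    qword t2 = shufb(in3, in4, mask("00Aa"));  /* 0, 0, in3.x, in4.x */
    return orq(t1, t2);                        /* in1.x, in2.x, in3.x, in4.x */
}
```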


Example - SOA to AOS, 1 element - 0 even, 3 odd, 0 shuffle masks, 6 cycles
This example converts back from SOA to AOS, still working on 1 element. out1 needs no instruction at all, since in.x is already in element 0 of in.

Instructions | In English
shlqbyi out2, in, 4 | out2 = in.y, in.z, in.w, 0
shlqbyi out3, in, 8 | out3 = in.z, in.w, 0, 0
shlqbyi out4, in, 12 | out4 = in.w, 0, 0, 0
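shlqbyi shifts in zeros rather than rotating. Only element 0 of each output is meaningful, which is all a 1-element AOS output needs. The C sketch below uses a software model of shlqbyi; `qword`, `q()`, `word()` and `shlqbyi()` are illustrative assumptions, not SPU intrinsics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } qword;   /* element i is bytes 4i..4i+3 */

/* Pack four 32-bit elements; only whole elements move, so host byte order is irrelevant. */
static qword q(uint32_t x, uint32_t y, uint32_t z, uint32_t w) {
    uint32_t v[4] = { x, y, z, w };
    qword r;
    memcpy(r.b, v, sizeof r.b);
    return r;
}

static uint32_t word(qword v, int i) {
    uint32_t t;
    assert(i >= 0 && i < 4);
    memcpy(&t, v.b + 4 * i, sizeof t);
    return t;
}

/* Model of shlqbyi: shift the quadword left by n bytes, filling with zeros. */
static qword shlqbyi(qword a, int n) {
    qword r;
    assert(n >= 0);
    for (int i = 0; i < 16; i++) r.b[i] = (i + n < 16) ? a.b[i + n] : 0;
    return r;
}

typedef struct { qword out1, out2, out3, out4; } aos4;

static aos4 soa_to_aos_1(qword in) {
    aos4 r;
    r.out1 = in;                /* in.x is already in element 0 */
    r.out2 = shlqbyi(in, 4);    /* in.y, in.z, in.w, 0 */
    r.out3 = shlqbyi(in, 8);    /* in.z, in.w, 0, 0 */
    r.out4 = shlqbyi(in, 12);   /* in.w, 0, 0, 0 */
    return r;
}
```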

Contribute
Know a transpose that I didn't list? Find a better one? Post in the comments and I'll update the post.

Next post...
... will be on 2 element transposes.