I'm Jon Olick. I make shiny things. I simplify.


I presented Sparse Voxel Octrees at Siggraph 2008.

Friday, March 18, 2011

Know your SPU transposes - part 2

In this part of the SPU transposes series, we will cover 2-element transposes. Same format as last time.

AOS to SOA, 2 elements


4 input vectors of the format:

in1 = in1.x, in1.y, ?, ?
in2 = in2.x, in2.y, ?, ?
in3 = in3.x, in3.y, ?, ?
in4 = in4.x, in4.y, ?, ?


2 output vectors of the format:

out1 = in1.x, in2.x, in3.x, in4.x
out2 = in1.y, in2.y, in3.y, in4.y


I'll start with the simplest way to do it, all on the odd pipe, then show some ways to trade odd instructions for even instructions.

0 even, 4 odd, 3 shuffle masks, 10 cycles

Instructions                    In English
shufb t1, in1, in2, s_AaBb      t1 = in1.x, in2.x, in1.y, in2.y
shufb t2, in3, in4, s_AaBb      t2 = in3.x, in4.x, in3.y, in4.y
shufb out1, t1, t2, s_ABab      out1 = in1.x, in2.x, in3.x, in4.x
shufb out2, t1, t2, s_CDcd      out2 = in1.y, in2.y, in3.y, in4.y
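
To sanity-check a sequence like this off-target, a word-level model of the mask notation is enough. The sketch below is only an illustration: the Python `shufb` helper is a hypothetical stand-in that works on lists of four words, not the real instruction or any SPU intrinsic. Uppercase letters pick words from the first source, lowercase letters pick words from the second.

```python
# Word-level model of the mask notation used in this post (illustration only).
# A-D select words 0-3 of the first source, a-d select words 0-3 of the second.

def shufb(ra, rb, mask):
    """Shuffle two 4-word vectors according to a mask such as 'AaBb'."""
    out = []
    for c in mask:
        if c == "0":
            out.append(0)                       # zero-filled word
        elif c.isupper():
            out.append(ra[ord(c) - ord("A")])   # word from the first source
        else:
            out.append(rb[ord(c) - ord("a")])   # word from the second source
    return out

# Symbolic inputs; "?" marks the don't-care words.
in1 = ["in1.x", "in1.y", "?", "?"]
in2 = ["in2.x", "in2.y", "?", "?"]
in3 = ["in3.x", "in3.y", "?", "?"]
in4 = ["in4.x", "in4.y", "?", "?"]

# The four-shufb sequence from the table above.
t1 = shufb(in1, in2, "AaBb")
t2 = shufb(in3, in4, "AaBb")
out1 = shufb(t1, t2, "ABab")
out2 = shufb(t1, t2, "CDcd")

print(out1)
print(out2)
```

Using symbolic strings instead of numbers makes the printed result directly comparable with the "In English" column.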

1 even, 3 odd, 4 shuffle masks, 9 cycles

Instructions                    In English
shufb t1, in1, in2, s_AaBb      t1 = in1.x, in2.x, in1.y, in2.y
shufb t2, in3, in4, s_BbAa      t2 = in3.y, in4.y, in3.x, in4.x
selb out1, t2, t1, m_FF00       out1 = in1.x, in2.x, in3.x, in4.x
shufb out2, t1, t2, s_CDab      out2 = in1.y, in2.y, in3.y, in4.y
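
The operand order of selb is easy to get backwards, so here is a small sketch of how I read it: where the mask is set, the result comes from the second source, otherwise from the first. The Python `selb` and `shufb` helpers are hypothetical word-level models written for this example, and the mask strings ('FF00' and so on) mirror the m_ names in the table.

```python
# Word-level models (illustration only, not SPU intrinsics).

def shufb(ra, rb, mask):
    """A-D pick words of ra, a-d pick words of rb."""
    return [ra["ABCD".index(c)] if c in "ABCD" else rb["abcd".index(c)]
            for c in mask]

def selb(ra, rb, mask):
    """Per word: 'F' in the mask takes the word from rb, '0' takes it from ra."""
    return [b if m == "F" else a for a, b, m in zip(ra, rb, mask)]

in1 = ["in1.x", "in1.y", "?", "?"]
in2 = ["in2.x", "in2.y", "?", "?"]
in3 = ["in3.x", "in3.y", "?", "?"]
in4 = ["in4.x", "in4.y", "?", "?"]

# The 1 even, 3 odd sequence from the table above.
t1 = shufb(in1, in2, "AaBb")
t2 = shufb(in3, in4, "BbAa")   # swapped halves so the x pair lands in words 2-3
out1 = selb(t2, t1, "FF00")    # words 0-1 from t1, words 2-3 from t2
out2 = shufb(t1, t2, "CDab")

print(out1)
print(out2)
```

The point of the s_BbAa mask is visible here: it parks in3.x and in4.x in the upper two words of t2, which is exactly where out1 needs them, so a select can replace a shuffle.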

4 even, 2 odd, 4 shuffle masks, 10 cycles

Instructions                    In English
selb t1, in1, in2, m_F000       t1 = in2.x, in1.y, ?, ?
shufb t2, in3, in4, s_aABb      t2 = in4.x, in3.x, in3.y, in4.y
shufb t3, t1, t2, s_BAba        t3 = in1.y, in2.x, in3.x, in4.x
selb out2, t2, in2, m_FF00      out2 = in2.x, in2.y, in3.y, in4.y
selb out2, out2, t3, m_F000     out2 = in1.y, in2.y, in3.y, in4.y
selb out1, t3, in1, m_F000      out1 = in1.x, in2.x, in3.x, in4.x
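
This variant has enough temporaries that it is worth tracing. The sketch below replays it with hypothetical word-level `shufb` and `selb` models and also tallies which pipe each instruction would go to, assuming (as the even/odd counts in this post imply) that selb issues on the even pipe and shufb on the odd pipe. It says nothing about cycle counts.

```python
from collections import Counter

pipe_use = Counter()  # instructions per pipe in this toy model

def shufb(ra, rb, mask):
    """Odd pipe. A-D pick words of ra, a-d pick words of rb."""
    pipe_use["odd"] += 1
    return [ra["ABCD".index(c)] if c in "ABCD" else rb["abcd".index(c)]
            for c in mask]

def selb(ra, rb, mask):
    """Even pipe. 'F' takes the word from rb, '0' takes it from ra."""
    pipe_use["even"] += 1
    return [b if m == "F" else a for a, b, m in zip(ra, rb, mask)]

in1 = ["in1.x", "in1.y", "?", "?"]
in2 = ["in2.x", "in2.y", "?", "?"]
in3 = ["in3.x", "in3.y", "?", "?"]
in4 = ["in4.x", "in4.y", "?", "?"]

# The 4 even, 2 odd sequence from the table above.
t1 = selb(in1, in2, "F000")
t2 = shufb(in3, in4, "aABb")
t3 = shufb(t1, t2, "BAba")
out2 = selb(t2, in2, "FF00")
out2 = selb(out2, t3, "F000")
out1 = selb(t3, in1, "F000")

print(out1)
print(out2)
print(dict(pipe_use))
```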

SOA to AOS, 2 elements


2 input vectors of the format:

in1 = in1.x, in2.x, in3.x, in4.x
in2 = in1.y, in2.y, in3.y, in4.y


4 output vectors of the format:

out1 = in1.x, in1.y, ?, ?
out2 = in2.x, in2.y, ?, ?
out3 = in3.x, in3.y, ?, ?
out4 = in4.x, in4.y, ?, ?


Again, I'll start with the simplest way to do it, all on the odd pipe, then show some ways to trade odd instructions for even instructions.

0 even, 4 odd, 4 shuffle masks, 7 cycles

Instructions                    In English
shufb out1, in1, in2, s_Aa00    out1 = in1.x, in1.y, 0, 0
shufb out2, in1, in2, s_Bb00    out2 = in2.x, in2.y, 0, 0
shufb out3, in1, in2, s_Cc00    out3 = in3.x, in3.y, 0, 0
shufb out4, in1, in2, s_Dd00    out4 = in4.x, in4.y, 0, 0
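
For anyone wondering what a mask like s_Aa00 looks like as actual data, here is a byte-level sketch. It relies on the shufb control byte encoding as I understand it from the SPU ISA: values 0x00-0x0F select bytes of the first source, 0x10-0x1F select bytes of the second, and the patterns 10xxxxxx, 110xxxxx and 111xxxxx produce the constants 0x00, 0xFF and 0x80. The helper names `word_mask` and `shufb` are hypothetical, and the vectors are big-endian as on the SPU.

```python
import struct

def word_mask(spec):
    """Expand a 4-letter word mask such as 'Aa00' into 16 control bytes."""
    ctrl = []
    for c in spec:
        if c == "0":
            ctrl += [0x80] * 4                          # 10xxxxxx -> 0x00 byte
        elif c in "ABCD":
            base = 4 * "ABCD".index(c)                  # first source: 0x00-0x0F
            ctrl += [base + i for i in range(4)]
        else:
            base = 0x10 + 4 * "abcd".index(c)           # second source: 0x10-0x1F
            ctrl += [base + i for i in range(4)]
    return bytes(ctrl)

def shufb(ra, rb, ctrl):
    """Byte-level model of shufb on 16-byte values."""
    both = ra + rb
    out = bytearray()
    for c in ctrl:
        if c & 0xC0 == 0x80:
            out.append(0x00)
        elif c & 0xE0 == 0xC0:
            out.append(0xFF)
        elif c & 0xE0 == 0xE0:
            out.append(0x80)
        else:
            out.append(both[c & 0x1F])                  # low 5 bits index ra||rb
    return bytes(out)

in1 = struct.pack(">4f", 1.0, 2.0, 3.0, 4.0)        # x components
in2 = struct.pack(">4f", 10.0, 20.0, 30.0, 40.0)    # y components

outs = [struct.unpack(">4f", shufb(in1, in2, word_mask(m)))
        for m in ("Aa00", "Bb00", "Cc00", "Dd00")]

print(word_mask("Aa00").hex())
for o in outs:
    print(o)
```

The zero words cost nothing extra here because the 0x80 control bytes make shufb generate them directly, which is why the "In English" column can promise 0 rather than "?".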

0 even, 4 odd, 2 shuffle masks, 9 cycles

Instructions                    In English
shufb out1, in1, in2, s_AaBb    out1 = in1.x, in1.y, in2.x, in2.y
shufb out3, in1, in2, s_CcDd    out3 = in3.x, in3.y, in4.x, in4.y
shlqbyi out2, out1, 8           out2 = in2.x, in2.y, 0, 0
shlqbyi out4, out3, 8           out4 = in4.x, in4.y, 0, 0
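
The trick in this version is that one shuffle produces two outputs at once, and a quadword byte shift then slides the second pair down into words 0 and 1. A rough word-level sketch follows, where the Python `shlqbyi` is a hypothetical model that only handles shifts by whole words (multiples of 4 bytes), which is all this sequence needs.

```python
# Word-level models (illustration only, not SPU intrinsics).

def shufb(ra, rb, mask):
    """A-D pick words of ra, a-d pick words of rb."""
    return [ra["ABCD".index(c)] if c in "ABCD" else rb["abcd".index(c)]
            for c in mask]

def shlqbyi(ra, nbytes):
    """Shift a 4-word vector left by nbytes, filling with zeros on the right."""
    assert nbytes % 4 == 0, "this model only handles whole-word shifts"
    n = nbytes // 4
    return (ra[n:] + [0] * n)[:4]

# SOA inputs: in1 holds the x components, in2 holds the y components.
in1 = ["in1.x", "in2.x", "in3.x", "in4.x"]
in2 = ["in1.y", "in2.y", "in3.y", "in4.y"]

# The two-mask sequence from the table above.
out1 = shufb(in1, in2, "AaBb")
out3 = shufb(in1, in2, "CcDd")
out2 = shlqbyi(out1, 8)
out4 = shlqbyi(out3, 8)

for name, v in (("out1", out1), ("out2", out2), ("out3", out3), ("out4", out4)):
    print(name, v)
```

Note that out1 and out3 keep their neighbour's pair in words 2 and 3, which is fine because those words are don't-cares in the output format.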

2 even, 3 odd, 4 masks, 7 cycles

Instructions                    In English
shufb out2, in1, in2, s_Ba00    out2 = in2.x, in1.y, 0, 0
shufb out3, in1, in2, s_Cc00    out3 = in3.x, in3.y, 0, 0
shufb out4, in1, in2, s_Dd00    out4 = in4.x, in4.y, 0, 0
selb out1, in1, out2, m_0F00    out1 = in1.x, in1.y, in3.x, in4.x
selb out2, out2, in2, m_0F00    out2 = in2.x, in2.y, 0, 0
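
Here is a word-level sketch of this shufb plus selb variant, using hypothetical Python models of the two instructions. It assumes the y word of out2 is selected from in2, the vector that holds the y components in the input layout above. It also shows that selecting into in1 leaves in3.x and in4.x in the upper words of out1, which is harmless because those words are don't-cares in the output format.

```python
# Word-level models (illustration only, not SPU intrinsics).

def shufb(ra, rb, mask):
    """A-D pick words of ra, a-d pick words of rb, '0' gives a zero word."""
    out = []
    for c in mask:
        if c == "0":
            out.append(0)
        elif c in "ABCD":
            out.append(ra["ABCD".index(c)])
        else:
            out.append(rb["abcd".index(c)])
    return out

def selb(ra, rb, mask):
    """'F' takes the word from rb, '0' takes it from ra."""
    return [b if m == "F" else a for a, b, m in zip(ra, rb, mask)]

# SOA inputs: in1 holds the x components, in2 holds the y components.
in1 = ["in1.x", "in2.x", "in3.x", "in4.x"]
in2 = ["in1.y", "in2.y", "in3.y", "in4.y"]

out2 = shufb(in1, in2, "Ba00")      # in2.x, in1.y, 0, 0
out3 = shufb(in1, in2, "Cc00")
out4 = shufb(in1, in2, "Dd00")
out1 = selb(in1, out2, "0F00")      # word 1 (in1.y) taken from out2
out2 = selb(out2, in2, "0F00")      # word 1 (in2.y) taken from the y vector

for name, v in (("out1", out1), ("out2", out2), ("out3", out3), ("out4", out4)):
    print(name, v)
```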

2 even, 3 odd, 3 masks, 8 cycles

Instructions                    In English
shufb out2, in1, in2, s_BaCc    out2 = in2.x, in1.y, in3.x, in3.y
shufb out4, in1, in2, s_Dd00    out4 = in4.x, in4.y, 0, 0
shlqbyi out3, out2, 8           out3 = in3.x, in3.y, 0, 0
selb out1, in1, out2, m_0F00    out1 = in1.x, in1.y, in3.x, in4.x
selb out2, out2, in2, m_0F00    out2 = in2.x, in2.y, in3.x, in3.y

Next post...

... 3 elements
