AOS to SOA, 2 elements
4 input vectors of the format:
in1 = in1.x, in1.y, ?, ?
in2 = in2.x, in2.y, ?, ?
in3 = in3.x, in3.y, ?, ?
in4 = in4.x, in4.y, ?, ?
2 output vectors of the format:
out1 = in1.x, in2.x, in3.x, in4.x
out2 = in1.y, in2.y, in3.y, in4.y
I'll start with the simplest way to do it, with everything on the odd pipe, and then show some ways to trade odd instructions for even instructions.
0 even, 4 odd, 3 shuffle masks, 10 cycles
Instructions | In English |
shufb t1, in1, in2, s_AaBb | t1 = in1.x, in2.x, in1.y, in2.y |
shufb t2, in3, in4, s_AaBb | t2 = in3.x, in4.x, in3.y, in4.y |
shufb out1, t1, t2, s_ABab | out1 = in1.x, in2.x, in3.x, in4.x |
shufb out2, t1, t2, s_CDcd | out2 = in1.y, in2.y, in3.y, in4.y |
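To check a sequence like this by hand, it helps to model shufb one word at a time. The sketch below is a Python model, not SPU code, and it assumes the mask naming the tables appear to use: an upper-case letter A-D picks word 0-3 of the first source, a lower-case letter a-d picks word 0-3 of the second source, and 0 produces a zero word. The helper name `shufb` and the string labels are only for illustration.

```python
def shufb(a, b, mask):
    """Word-level model of shufb driven by a mask name such as "AaBb"."""
    out = []
    for ch in mask:
        if ch == "0":
            out.append(0)                       # zero word
        elif ch.isupper():
            out.append(a[ord(ch) - ord("A")])   # word from the first source
        else:
            out.append(b[ord(ch) - ord("a")])   # word from the second source
    return out

# "?" marks the lanes we do not care about.
in1 = ["in1.x", "in1.y", "?", "?"]
in2 = ["in2.x", "in2.y", "?", "?"]
in3 = ["in3.x", "in3.y", "?", "?"]
in4 = ["in4.x", "in4.y", "?", "?"]

t1 = shufb(in1, in2, "AaBb")
t2 = shufb(in3, in4, "AaBb")
out1 = shufb(t1, t2, "ABab")
out2 = shufb(t1, t2, "CDcd")

print(out1)
print(out2)
```

Because the lanes carry labels instead of numbers, the printed lists can be compared directly against the "In English" column.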
1 even, 3 odd, 4 masks (3 shuffle, 1 select), 9 cycles
Instructions | In English |
shufb t1, in1, in2, s_AaBb | t1 = in1.x, in2.x, in1.y, in2.y |
shufb t2, in3, in4, s_BbAa | t2 = in3.y, in4.y, in3.x, in4.x |
selb out1, t2, t1, m_FF00 | out1 = in1.x, in2.x, in3.x, in4.x |
shufb out2, t1, t2, s_CDab | out2 = in1.y, in2.y, in3.y, in4.y |
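The selb in this version is easy to read backwards, so here is a small word-level check in Python. It assumes selb takes a word from its second source wherever the mask word is all ones (written F in the mask name) and from its first source wherever it is zero, and it reuses the same letter convention for shuffle masks as the tables. As far as I know the select mask name then reads the same as the immediate you would give fsmbi (one bit per byte), but treat that as my reading of the naming. The helper names are only for illustration.

```python
def shufb(a, b, mask):
    """Word-level shufb: A-D index the first source, a-d the second, 0 is zero."""
    return [0 if ch == "0"
            else a[ord(ch) - ord("A")] if ch.isupper()
            else b[ord(ch) - ord("a")]
            for ch in mask]

def selb(a, b, mask):
    """Word-level selb: 'F' takes the word from b, '0' takes it from a."""
    return [b[i] if mask[i] == "F" else a[i] for i in range(4)]

in1 = ["in1.x", "in1.y", "?", "?"]
in2 = ["in2.x", "in2.y", "?", "?"]
in3 = ["in3.x", "in3.y", "?", "?"]
in4 = ["in4.x", "in4.y", "?", "?"]

t1 = shufb(in1, in2, "AaBb")      # x values in the low half
t2 = shufb(in3, in4, "BbAa")      # x values in the high half
out1 = selb(t2, t1, "FF00")       # words 0-1 from t1, words 2-3 from t2
out2 = shufb(t1, t2, "CDab")

print(out1)
print(out2)
```

The point of the swapped s_BbAa mask is that in3.x and in4.x already sit in words 2 and 3 of t2, so a plain select can finish out1 on the even pipe.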
4 even, 2 odd, 4 masks (2 shuffle, 2 select), 10 cycles
Instructions | In English |
selb t1, in1, in2, m_F000 | t1 = in2.x, in1.y, ?, ? |
shufb t2, in3, in4, s_aABb | t2 = in4.x, in3.x, in3.y, in4.y |
shufb t3, t1, t2, s_BAba | t3 = in1.y, in2.x, in3.x, in4.x |
selb out2, t2, in2, m_FF00 | out2 = in2.x, in2.y, in3.y, in4.y |
selb out2, out2, t3, m_F000 | out2 = in1.y, in2.y, in3.y, in4.y |
selb out1, t3, in1, m_F000 | out1 = in1.x, in2.x, in3.x, in4.x |
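This one passes the don't-care lanes through a temporary, so it is worth confirming that none of them leak into the results. The Python sketch below traces the six instructions at word level under the same assumptions as the tables seem to make: selb takes words from its second source where the mask is F, and shuffle mask letters index words of the first (upper case) or second (lower case) source. The helpers are hypothetical stand-ins for the real instructions.

```python
def shufb(a, b, mask):
    """Word-level shufb: A-D index the first source, a-d the second, 0 is zero."""
    return [0 if ch == "0"
            else a[ord(ch) - ord("A")] if ch.isupper()
            else b[ord(ch) - ord("a")]
            for ch in mask]

def selb(a, b, mask):
    """Word-level selb: 'F' takes the word from b, '0' takes it from a."""
    return [b[i] if mask[i] == "F" else a[i] for i in range(4)]

in1 = ["in1.x", "in1.y", "?", "?"]
in2 = ["in2.x", "in2.y", "?", "?"]
in3 = ["in3.x", "in3.y", "?", "?"]
in4 = ["in4.x", "in4.y", "?", "?"]

t1 = selb(in1, in2, "F000")       # in2.x, in1.y, ?, ?
t2 = shufb(in3, in4, "aABb")      # in4.x, in3.x, in3.y, in4.y
t3 = shufb(t1, t2, "BAba")        # in1.y, in2.x, in3.x, in4.x
out2 = selb(t2, in2, "FF00")      # in2.x, in2.y, in3.y, in4.y
out2 = selb(out2, t3, "F000")     # replace word 0 with in1.y
out1 = selb(t3, in1, "F000")      # replace word 0 with in1.x

print(out1)
print(out2)
print("garbage leaked:", "?" in out1 or "?" in out2)
```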
SOA to AOS, 2 elements
2 input vectors of the format:
in1 = in1.x, in2.x, in3.x, in4.x
in2 = in1.y, in2.y, in3.y, in4.y
4 output vectors of the format:
out1 = in1.x, in1.y, ?, ?
out2 = in2.x, in2.y, ?, ?
out3 = in3.x, in3.y, ?, ?
out4 = in4.x, in4.y, ?, ?
Again, I'll start with the simplest way to do it, with everything on the odd pipe, and then show some ways to trade odd instructions for even instructions.
0 even, 4 odd, 4 shuffle masks, 7 cycles
Instructions | In English |
shufb out1, in1, in2, s_Aa00 | out1 = in1.x, in1.y, 0, 0 |
shufb out2, in1, in2, s_Bb00 | out2 = in2.x, in2.y, 0, 0 |
shufb out3, in1, in2, s_Cc00 | out3 = in3.x, in3.y, 0, 0 |
shufb out4, in1, in2, s_Dd00 | out4 = in4.x, in4.y, 0, 0 |
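The mask names hide the actual 16 control bytes, so here is a rough Python sketch of how a name like s_Aa00 could expand. It assumes the usual shufb control encoding: values 0x00-0x0F select a byte of the first source, 0x10-0x1F select a byte of the second source, and 0x80 produces a zero byte. The function names and the sample float values are made up for the example, and the byte-level model only covers the control values used here.

```python
import struct

def control_bytes(name):
    """Expand a word-level mask name such as "Aa00" into 16 shufb control bytes."""
    ctrl = []
    for ch in name:
        if ch == "0":
            ctrl += [0x80] * 4                       # 0x80 yields a zero byte
        else:
            base = 0x00 if ch.isupper() else 0x10    # first or second source
            word = ord(ch.upper()) - ord("A")
            ctrl += [base + 4 * word + i for i in range(4)]
    return bytes(ctrl)

def shufb_bytes(a, b, ctrl):
    """Byte-level shufb, limited to 'source byte' and 'zero' control values."""
    src = a + b                                      # 32 bytes: a then b
    return bytes(0 if c & 0x80 else src[c & 0x1F] for c in ctrl)

# The SPU is big-endian, so pack the sample floats that way.
xs = struct.pack(">4f", 1.0, 2.0, 3.0, 4.0)          # in1.x .. in4.x
ys = struct.pack(">4f", 10.0, 20.0, 30.0, 40.0)      # in1.y .. in4.y

print(control_bytes("Aa00").hex())
out = [struct.unpack(">4f", shufb_bytes(xs, ys, control_bytes(m)))
       for m in ("Aa00", "Bb00", "Cc00", "Dd00")]
for vec in out:
    print(vec)
```

Each of the four masks is its own 16-byte constant, which is why this version costs four mask registers even though it is the shortest in cycles.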
0 even, 4 odd, 2 shuffle masks, 9 cycles
Instructions | In English |
shufb out1, in1, in2, s_AaBb | out1 = in1.x, in1.y, in2.x, in2.y |
shufb out3, in1, in2, s_CcDd | out3 = in3.x, in3.y, in4.x, in4.y |
shlqbyi out2, out1, 8 | out2 = in2.x, in2.y, 0, 0 |
shlqbyi out4, out3, 8 | out4 = in4.x, in4.y, 0, 0 |
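The saving here comes from letting one shuffle build two outputs at once and then sliding the second one down with a byte shift. A word-level Python sketch of that idea follows; it assumes shlqbyi shifts the whole quadword left by the given byte count and fills with zeros, and it only models shifts that are a multiple of 4 bytes. The helper names are hypothetical.

```python
def shufb(a, b, mask):
    """Word-level shufb: A-D index the first source, a-d the second, 0 is zero."""
    return [0 if ch == "0"
            else a[ord(ch) - ord("A")] if ch.isupper()
            else b[ord(ch) - ord("a")]
            for ch in mask]

def shlqbyi(a, nbytes):
    """Word-level shlqbyi for whole-word shifts; zeros come in on the right."""
    n = nbytes // 4
    return a[n:] + [0] * n

in1 = ["in1.x", "in2.x", "in3.x", "in4.x"]   # all the x values
in2 = ["in1.y", "in2.y", "in3.y", "in4.y"]   # all the y values

out1 = shufb(in1, in2, "AaBb")    # in1.x, in1.y, in2.x, in2.y
out3 = shufb(in1, in2, "CcDd")    # in3.x, in3.y, in4.x, in4.y
out2 = shlqbyi(out1, 8)           # in2.x, in2.y, 0, 0
out4 = shlqbyi(out3, 8)           # in4.x, in4.y, 0, 0

for vec in (out1, out2, out3, out4):
    print(vec)
```

Words 2 and 3 of out1 and out3 are don't-care lanes in the output format, so leaving the next pair in them costs nothing.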
2 even, 3 odd, 4 masks, 7 cycles
Instructions | In English |
shufb out2, in1, in2, s_Ba00 | out2 = in2.x, in1.y, 0, 0 |
shufb out3, in1, in2, s_Cc00 | out3 = in3.x, in3.y, 0, 0 |
shufb out4, in1, in2, s_Dd00 | out4 = in4.x, in4.y, 0, 0 |
selb out1, in1, out2, m_0F00 | out1 = in1.x, in1.y, in3.x, in4.x |
selb out2, out2, in2, m_0F00 | out2 = in2.x, in2.y, 0, 0 |
2 even, 3 odd, 3 masks, 8 cycles
Instructions | In English |
shufb out2, in1, in2, s_BaCc | out2 = in2.x, in1.y, in3.x, in3.y |
shufb out4, in1, in2, s_Dd00 | out4 = in4.x, in4.y, 0, 0 |
shlqbyi out3, out2, 8 | out3 = in3.x, in3.y, 0, 0 |
selb out1, in1, out2, m_0F00 | out1 = in1.x, in1.y, in3.x, in4.x |
selb out2, out2, in2, m_0F00 | out2 = in2.x, in2.y, in3.x, in3.y |
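The two selb merges at the end are the subtle part: out1 borrows in1.y from word 1 of the shuffled vector, and out2 then needs in2.y in that same word. The Python sketch below traces the sequence at word level. It assumes selb takes the word from its second source where the mask is F, and it takes word 1 of the final out2 from the y vector (in2), since that is where in2.y lives. The helpers are illustrative models, not SPU code.

```python
def shufb(a, b, mask):
    """Word-level shufb: A-D index the first source, a-d the second, 0 is zero."""
    return [0 if ch == "0"
            else a[ord(ch) - ord("A")] if ch.isupper()
            else b[ord(ch) - ord("a")]
            for ch in mask]

def selb(a, b, mask):
    """Word-level selb: 'F' takes the word from b, '0' takes it from a."""
    return [b[i] if mask[i] == "F" else a[i] for i in range(4)]

def shlqbyi(a, nbytes):
    """Word-level shlqbyi for whole-word shifts; zeros come in on the right."""
    n = nbytes // 4
    return a[n:] + [0] * n

in1 = ["in1.x", "in2.x", "in3.x", "in4.x"]   # all the x values
in2 = ["in1.y", "in2.y", "in3.y", "in4.y"]   # all the y values

out2 = shufb(in1, in2, "BaCc")    # in2.x, in1.y, in3.x, in3.y
out4 = shufb(in1, in2, "Dd00")    # in4.x, in4.y, 0, 0
out3 = shlqbyi(out2, 8)           # in3.x, in3.y, 0, 0
out1 = selb(in1, out2, "0F00")    # word 1 (in1.y) from out2, rest from in1
out2 = selb(out2, in2, "0F00")    # word 1 (in2.y) from in2, rest from out2

for vec in (out1, out2, out3, out4):
    print(vec)
```

Only the first two words of each output matter, so the leftover x values in words 2 and 3 of out1 and out2 are harmless.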