## AOS to SOA, 2 elements

4 input vectors of the format:

in1 = in1.x, in1.y, ?, ?

in2 = in2.x, in2.y, ?, ?

in3 = in3.x, in3.y, ?, ?

in4 = in4.x, in4.y, ?, ?

2 output vectors of the format:

out1 = in1.x, in2.x, in3.x, in4.x

out2 = in1.y, in2.y, in3.y, in4.y

I'll start with the simplest way to do it, with every instruction on the odd pipe, and then show some ways to trade odd instructions for even instructions.

**0 even, 4 odd, 3 shuffle masks, 10 cycles**

| Instructions | In English |
| --- | --- |
| shufb t1, in1, in2, s_AaBb | t1 = in1.x, in2.x, in1.y, in2.y |
| shufb t2, in3, in4, s_AaBb | t2 = in3.x, in4.x, in3.y, in4.y |
| shufb out1, t1, t2, s_ABab | out1 = in1.x, in2.x, in3.x, in4.x |
| shufb out2, t1, t2, s_CDcd | out2 = in1.y, in2.y, in3.y, in4.y |
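
To sanity-check the sequence above, here is a minimal Python sketch that models `shufb` at word granularity and tracks each lane as a label. It assumes the mask naming that the tables imply: an uppercase letter A-D picks a word from the first source, a lowercase letter a-d picks a word from the second source, and 0 produces a zero word. The `shufb` function here is a hypothetical model for illustration, not an SPU intrinsic.

```python
def shufb(a, b, mask):
    """Word-granular model of shufb (an assumption, not the real byte-level op)."""
    out = []
    for sel in mask:
        if sel == "0":
            out.append(0)                       # zero word
        elif sel.isupper():
            out.append(a[ord(sel) - ord("A")])  # word from first source
        else:
            out.append(b[ord(sel) - ord("a")])  # word from second source
    return out

# AOS inputs: only the first two lanes carry data.
in1 = ["in1.x", "in1.y", "?", "?"]
in2 = ["in2.x", "in2.y", "?", "?"]
in3 = ["in3.x", "in3.y", "?", "?"]
in4 = ["in4.x", "in4.y", "?", "?"]

t1 = shufb(in1, in2, "AaBb")
t2 = shufb(in3, in4, "AaBb")
out1 = shufb(t1, t2, "ABab")
out2 = shufb(t1, t2, "CDcd")

print(out1)
print(out2)
```

Because the lanes are plain labels, the printed lists can be compared directly against the "In English" column.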

**1 even, 3 odd, 4 masks (3 shuffle, 1 select), 9 cycles**

| Instructions | In English |
| --- | --- |
| shufb t1, in1, in2, s_AaBb | t1 = in1.x, in2.x, in1.y, in2.y |
| shufb t2, in3, in4, s_BbAa | t2 = in3.y, in4.y, in3.x, in4.x |
| selb out1, t2, t1, m_FF00 | out1 = in1.x, in2.x, in3.x, in4.x |
| shufb out2, t1, t2, s_CDab | out2 = in1.y, in2.y, in3.y, in4.y |
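
The `selb` variant can be checked the same way. The sketch below is a word-granular Python model, assuming that `selb rt, ra, rb, rc` takes a lane from `rb` where the mask lane is F and from `ra` where it is 0, and that shuffle-mask letters pick words as uppercase = first source, lowercase = second source. Both helper functions are hypothetical models, not SPU intrinsics.

```python
def shufb(a, b, mask):
    """Word-granular shufb model: A-D from a, a-d from b, 0 gives a zero word."""
    pick = lambda s: 0 if s == "0" else (a if s.isupper() else b)[ord(s.upper()) - ord("A")]
    return [pick(s) for s in mask]

def selb(a, b, mask):
    """Word-granular selb model: lane from b where mask is 'F', else from a."""
    return [y if m == "F" else x for x, y, m in zip(a, b, mask)]

in1 = ["in1.x", "in1.y", "?", "?"]
in2 = ["in2.x", "in2.y", "?", "?"]
in3 = ["in3.x", "in3.y", "?", "?"]
in4 = ["in4.x", "in4.y", "?", "?"]

t1 = shufb(in1, in2, "AaBb")
t2 = shufb(in3, in4, "BbAa")    # y pair first, so the x pair lands in lanes 2-3
out1 = selb(t2, t1, "FF00")     # lanes 0-1 from t1, lanes 2-3 from t2
out2 = shufb(t1, t2, "CDab")

print(out1)
print(out2)
```

The point of swapping the halves in `t2` is that the x values already sit in the lanes where `out1` needs them, so a select can replace the third shuffle.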

**4 even, 2 odd, 4 masks (2 shuffle, 2 select), 10 cycles**

| Instructions | In English |
| --- | --- |
| selb t1, in1, in2, m_F000 | t1 = in2.x, in1.y, ?, ? |
| shufb t2, in3, in4, s_aABb | t2 = in4.x, in3.x, in3.y, in4.y |
| shufb t3, t1, t2, s_BAba | t3 = in1.y, in2.x, in3.x, in4.x |
| selb out2, t2, in2, m_FF00 | out2 = in2.x, in2.y, in3.y, in4.y |
| selb out2, out2, t3, m_F000 | out2 = in1.y, in2.y, in3.y, in4.y |
| selb out1, t3, in1, m_F000 | out1 = in1.x, in2.x, in3.x, in4.x |
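
This version is the hardest to follow by eye, so a lane-tracking sketch helps. It is a word-granular Python model under the same assumptions as the mask names suggest: shuffle letters pick words (uppercase = first source, lowercase = second source), and `selb` takes a lane from its second source wherever the mask lane is F. The helpers are hypothetical models, not SPU intrinsics.

```python
def shufb(a, b, mask):
    """Word-granular shufb model: A-D from a, a-d from b."""
    return [(a if s.isupper() else b)[ord(s.upper()) - ord("A")] for s in mask]

def selb(a, b, mask):
    """Word-granular selb model: lane from b where mask is 'F', else from a."""
    return [y if m == "F" else x for x, y, m in zip(a, b, mask)]

in1 = ["in1.x", "in1.y", "?", "?"]
in2 = ["in2.x", "in2.y", "?", "?"]
in3 = ["in3.x", "in3.y", "?", "?"]
in4 = ["in4.x", "in4.y", "?", "?"]

t1 = selb(in1, in2, "F000")      # in2.x, in1.y, ?, ?
t2 = shufb(in3, in4, "aABb")     # in4.x, in3.x, in3.y, in4.y
t3 = shufb(t1, t2, "BAba")       # in1.y, in2.x, in3.x, in4.x
out2 = selb(t2, in2, "FF00")     # in2.x, in2.y, in3.y, in4.y
out2 = selb(out2, t3, "F000")    # lane 0 replaced by in1.y
out1 = selb(t3, in1, "F000")     # lane 0 replaced by in1.x

print(out1)
print(out2)
```

Each final select only patches lane 0, which is why `t3` is built with in1.y parked in lane 0 and the remaining x values already in place.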

## SOA to AOS, 2 elements

2 input vectors of the format:

in1 = in1.x, in2.x, in3.x, in4.x

in2 = in1.y, in2.y, in3.y, in4.y

4 output vectors of the format:

out1 = in1.x, in1.y, ?, ?

out2 = in2.x, in2.y, ?, ?

out3 = in3.x, in3.y, ?, ?

out4 = in4.x, in4.y, ?, ?

Again, I'll start with the simplest way to do it, with every instruction on the odd pipe, and then show some ways to trade odd instructions for even instructions.

**0 even, 4 odd, 4 shuffle masks, 7 cycles**

| Instructions | In English |
| --- | --- |
| shufb out1, in1, in2, s_Aa00 | out1 = in1.x, in1.y, 0, 0 |
| shufb out2, in1, in2, s_Bb00 | out2 = in2.x, in2.y, 0, 0 |
| shufb out3, in1, in2, s_Cc00 | out3 = in3.x, in3.y, 0, 0 |
| shufb out4, in1, in2, s_Dd00 | out4 = in4.x, in4.y, 0, 0 |

**0 even, 4 odd, 2 shuffle masks, 9 cycles**

| Instructions | In English |
| --- | --- |
| shufb out1, in1, in2, s_AaBb | out1 = in1.x, in1.y, in2.x, in2.y |
| shufb out3, in1, in2, s_CcDd | out3 = in3.x, in3.y, in4.x, in4.y |
| shlqbyi out2, out1, 8 | out2 = in2.x, in2.y, 0, 0 |
| shlqbyi out4, out3, 8 | out4 = in4.x, in4.y, 0, 0 |
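
The two-mask version relies on a quadword byte shift to expose the second pair in each shuffled vector. Here is a minimal Python sketch of that idea at word granularity. It assumes the mask letters pick words (uppercase = first source, lowercase = second source) and models `shlqbyi` only for shift counts that are a multiple of 4 bytes. Both helpers are hypothetical models, not SPU intrinsics.

```python
def shufb(a, b, mask):
    """Word-granular shufb model: A-D from a, a-d from b."""
    return [(a if s.isupper() else b)[ord(s.upper()) - ord("A")] for s in mask]

def shlqbyi(a, nbytes):
    """Word-granular shlqbyi model: shift toward lane 0, fill with zero words."""
    assert nbytes % 4 == 0 and 0 <= nbytes <= 16, "model covers whole-word shifts only"
    n = nbytes // 4
    return a[n:] + [0] * n

# SOA inputs: register in1 holds the x values, register in2 holds the y values.
in1 = ["in1.x", "in2.x", "in3.x", "in4.x"]
in2 = ["in1.y", "in2.y", "in3.y", "in4.y"]

out1 = shufb(in1, in2, "AaBb")   # in1.x, in1.y, in2.x, in2.y
out3 = shufb(in1, in2, "CcDd")   # in3.x, in3.y, in4.x, in4.y
out2 = shlqbyi(out1, 8)          # in2.x, in2.y, 0, 0
out4 = shlqbyi(out3, 8)          # in4.x, in4.y, 0, 0

for name, v in (("out1", out1), ("out2", out2), ("out3", out3), ("out4", out4)):
    print(name, v)
```

Only lanes 0 and 1 of each output matter, so the leftover pair in `out1` and `out3` is harmless.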

**2 even, 3 odd, 4 masks, 7 cycles**

| Instructions | In English |
| --- | --- |
| shufb out2, in1, in2, s_Ba00 | out2 = in2.x, in1.y, 0, 0 |
| shufb out3, in1, in2, s_Cc00 | out3 = in3.x, in3.y, 0, 0 |
| shufb out4, in1, in2, s_Dd00 | out4 = in4.x, in4.y, 0, 0 |
| selb out1, in1, out2, m_0F00 | out1 = in1.x, in1.y, in3.x, in4.x |
| selb out2, out2, in2, m_0F00 | out2 = in2.x, in2.y, 0, 0 |
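
A lane-tracking sketch of this variant follows. It is a word-granular Python model with hypothetical helpers (not SPU intrinsics), assuming the mask letters pick words (uppercase = first source, lowercase = second source, 0 = zero word) and that `selb` takes a lane from its second source where the mask lane is F. It also assumes that the final select pulls lane 1 from the y vector (register in2), since under this model that is the only register holding in2.y.

```python
def shufb(a, b, mask):
    """Word-granular shufb model: A-D from a, a-d from b, 0 gives a zero word."""
    pick = lambda s: 0 if s == "0" else (a if s.isupper() else b)[ord(s.upper()) - ord("A")]
    return [pick(s) for s in mask]

def selb(a, b, mask):
    """Word-granular selb model: lane from b where mask is 'F', else from a."""
    return [y if m == "F" else x for x, y, m in zip(a, b, mask)]

# SOA inputs: register in1 holds the x values, register in2 holds the y values.
in1 = ["in1.x", "in2.x", "in3.x", "in4.x"]
in2 = ["in1.y", "in2.y", "in3.y", "in4.y"]

out2 = shufb(in1, in2, "Ba00")   # in2.x, in1.y, 0, 0
out3 = shufb(in1, in2, "Cc00")
out4 = shufb(in1, in2, "Dd00")
out1 = selb(in1, out2, "0F00")   # lane 1 (in1.y) from out2, other lanes from in1
out2 = selb(out2, in2, "0F00")   # lane 1 (in2.y) from the y vector (assumption)

for name, v in (("out1", out1), ("out2", out2), ("out3", out3), ("out4", out4)):
    print(name, v)
```

In this model the upper lanes of `out1` keep in3.x and in4.x rather than zeros, which is fine because those lanes are don't-cares in the output format.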

**2 even, 3 odd, 3 masks, 8 cycles**

| Instructions | In English |
| --- | --- |
| shufb out2, in1, in2, s_BaCc | out2 = in2.x, in1.y, in3.x, in3.y |
| shufb out4, in1, in2, s_Dd00 | out4 = in4.x, in4.y, 0, 0 |
| shlqbyi out3, out2, 8 | out3 = in3.x, in3.y, 0, 0 |
| selb out1, in1, out2, m_0F00 | out1 = in1.x, in1.y, in3.x, in4.x |
| selb out2, out2, in2, m_0F00 | out2 = in2.x, in2.y, in3.x, in3.y |
