LLVM Study Notes (14), Supplement 1

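3.3.6.4.3. Defining the composeSubRegIndicesImpl() Method

We have already emitted the data for the sub-register indices, which includes the composite sub-register indices. As we know, a composite index is the result of applying two sub-register indices to a register one after the other, and it is equivalent to that pair of indices. For example, R:a:b and R:c may both refer to the same part of register R, and we need a way to determine that they denote the same thing. For this purpose LLVM provides the method TargetRegisterInfo::composeSubRegIndices(); in the R:a:b and R:c example, composeSubRegIndices(a, b) returns c.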

unsigned composeSubRegIndices(unsigned a, unsigned b) const {
  if (!a) return b;
  if (!b) return a;
  return composeSubRegIndicesImpl(a, b);
}

The actual work is done by composeSubRegIndicesImpl(), which is emitted by the following code in RegisterInfoEmitter::runTargetDesc().

RegisterInfoEmitter::runTargetDesc (continued)

1331    std::string ClassName = Target.getName() + "GenRegisterInfo";

1332 

1333    auto SubRegIndicesSize =

1334        std::distance(SubRegIndices.begin(), SubRegIndices.end());

1335 

1336    if (!SubRegIndices.empty()) {

1337      emitComposeSubRegIndices(OS, RegBank, ClassName);

1338      emitComposeSubRegIndexLaneMask(OS, RegBank, ClassName);

1339    }

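The emitted composeSubRegIndicesImpl() is a method of class X86GenRegisterInfo (we take the X86 target as our example). In the base class TargetRegisterInfo this method is a virtual function whose default implementation is not meant to be called.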

629     void

630       RegisterInfoEmitter::emitComposeSubRegIndices(raw_ostream &OS,

631                                                   CodeGenRegBank &RegBank,

632                                                   const std::string &ClName) {

633       const auto &SubRegIndices = RegBank.getSubRegIndices();

634       OS << "unsigned " << ClName

635          << "::composeSubRegIndicesImpl(unsigned IdxA, unsigned IdxB) const {\n";

636    

637       // Many sub-register indexes are composition-compatible, meaning that

638       //

639       //   compose(IdxA, IdxB) == compose(IdxA', IdxB)

640       //

641       // for many IdxA, IdxA' pairs. Not all sub-register indexes can be composed.

642       // The illegal entries can be use as wildcards to compress the table further.

643    

644       // Map each Sub-register index to a compatible table row.

645       SmallVector<unsigned, 4> RowMap;

646       SmallVector<SmallVector<CodeGenSubRegIndex*, 4>, 4> Rows;

647    

648       auto SubRegIndicesSize =

649           std::distance(SubRegIndices.begin(), SubRegIndices.end());

650       for (const auto &Idx : SubRegIndices) {

651         unsigned Found = ~0u;

652         for (unsigned r = 0, re = Rows.size(); r != re; ++r) {

653           if (combine(&Idx, Rows[r])) {

654             Found = r;

655             break;

656           }

657         }

658         if (Found == ~0u) {

659           Found = Rows.size();

660           Rows.resize(Found + 1);

661           Rows.back().resize(SubRegIndicesSize);

662           combine(&Idx, Rows.back());

663         }

664         RowMap.push_back(Found);

665       }

666    

667       // Output the row map if there is multiple rows.

668       if (Rows.size() > 1) {

669         OS << "  static const " << getMinimalTypeForRange(Rows.size()) << " RowMap["

670            << SubRegIndicesSize << "] = {\n    ";

671         for (unsigned i = 0, e = SubRegIndicesSize; i != e; ++i)

672           OS << RowMap[i] << ", ";

673         OS << "\n  };\n";

674       }

675

676       // Output the rows.

677       OS << "  static const " << getMinimalTypeForRange(SubRegIndicesSize + 1)

678          << " Rows[" << Rows.size() << "][" << SubRegIndicesSize << "] = {\n";

679       for (unsigned r = 0, re = Rows.size(); r != re; ++r) {

680         OS << "    { ";

681         for (unsigned i = 0, e = SubRegIndicesSize; i != e; ++i)

682           if (Rows[r][i])

683             OS << Rows[r][i]->EnumValue << ", ";

684           else

685             OS << "0, ";

686         OS << "},\n";

687       }

688       OS << "  };\n\n";

689    

690       OS << "  --IdxA; assert(IdxA < " << SubRegIndicesSize << ");\n"

691          << "  --IdxB; assert(IdxB < " << SubRegIndicesSize << ");\n";

692       if (Rows.size() > 1)

693         OS << "  return Rows[RowMap[IdxA]][IdxB];\n";

694       else

695         OS << "  return Rows[0][IdxB];\n";

696       OS << "}\n\n";

697     }

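The loop at line 650 walks all sub-register indices. The container Rows stores, for each sub-register index, the mapping to its composed sub-register indices. It is a two-dimensional array in which each row is indexed by EnumValue - 1 of a sub-register index (which is also that index's position in the SubRegIndices container), so indices whose compositions are incompatible with each other need separate rows. The loop at line 652 checks whether an existing row of Rows can hold SubRegIndices[i]: the combine() function compares the compositions of that index against one row at a time. If nothing in the row conflicts, line 624 records the mappings in that row.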

610     static bool combine(const CodeGenSubRegIndex *Idx,

611                         SmallVectorImpl<CodeGenSubRegIndex*> &Vec) {

612       const CodeGenSubRegIndex::CompMap &Map = Idx->getComposites();

613       for (const auto &I : Map) {

614         CodeGenSubRegIndex *&Entry = Vec[I.first->EnumValue - 1];

615         if (Entry && Entry != I.second)

616           return false;

617       }

618    

619       // All entries are compatible. Make it so.

620       for (const auto &I : Map) {

621         auto *&Entry = Vec[I.first->EnumValue - 1];

622         assert((!Entry || Entry == I.second) &&

623                "Expected EnumValue to be unique");

624         Entry = I.second;

625       }

626       return true;

627     }

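If, on the other hand, the row already maps one of those entries to a different composite index than SubRegIndices[i] requires, combine() returns false at line 616 and the loop at line 652 moves on to the next row. If the loop ends without finding a compatible row, lines 658-663 extend Rows with a new row to hold the mapping. The container RowMap records, in order, the row number in Rows for each sub-register index (line 664). RowMap is emitted only when Rows has more than one row. The code above then emits a fragment like the following (taking ARM as the example):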

unsigned ARMGenRegisterInfo::composeSubRegIndicesImpl(unsigned IdxA, unsigned IdxB) const {

  static const uint8_t RowMap[56] = {

    0, 1, 2, 3, 4, 5, 6, 7, 0, 0, 0, 4, 0, 2, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 2,

  };

  static const uint8_t Rows[8][56] = {

    { 1, 2, 3, 4, 5, 0, 7, 0, 0, 0, 0, 0, 13, 14, 0, 0, 17, 18, 19, 20, 21, 22, 23, 24, 0, 0, 27, 28, 0, 0, 31, 32, 33, 34, 35, 36, 37, 38, 0, 0, 0, 0, 43, 0, 45, 0, 0, 0, 0, 0, 51, 0, 0, 0, 0, 0, },

    { 2, 3, 4, 5, 6, 0, 8, 0, 0, 0, 0, 0, 37, 49, 0, 0, 19, 20, 21, 22, 23, 24, 31, 32, 0, 0, 25, 26, 0, 0, 29, 30, 35, 36, 43, 44, 14, 40, 0, 0, 0, 0, 46, 0, 48, 0, 0, 0, 0, 0, 53, 0, 0, 0, 0, 0, },

    { 3, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 14, 15, 0, 0, 21, 22, 23, 24, 31, 32, 29, 30, 0, 0, 0, 0, 0, 0, 27, 28, 43, 44, 46, 47, 49, 0, 0, 0, 0, 0, 51, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },

    { 4, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 49, 55, 0, 0, 23, 24, 31, 32, 29, 30, 27, 28, 0, 0, 0, 0, 0, 0, 25, 26, 46, 47, 51, 52, 15, 0, 0, 0, 0, 0, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },

    { 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0, 15, 16, 0, 0, 31, 32, 29, 30, 27, 28, 25, 26, 0, 0, 0, 0, 0, 0, 0, 0, 51, 52, 53, 54, 55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },

    { 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 55, 0, 0, 0, 29, 30, 27, 28, 25, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 53, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },

    { 7, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 28, 25, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },

    { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },

  };

 

  --IdxA; assert(IdxA < 56);

  --IdxB; assert(IdxB < 56);

  return Rows[RowMap[IdxA]][IdxB];

}

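A 0 in Rows means that no composition exists (line 685).

LLVM also provides a composeSubRegIndexLaneMask method: given the lane mask of one sub-register index and another sub-register index, it returns the lane mask of the corresponding composite index, if one exists.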
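The row-sharing logic of emitComposeSubRegIndices() and combine() can be tried out in isolation. The following is a minimal sketch under simplifying assumptions, not the TableGen code itself: sub-register indices are plain 1-based integers, and the names CompMap, ComposeTable, combineRow(), buildTable(), compose() and exampleTable() are hypothetical, introduced only for illustration. As in the generated Rows, 0 stands for "no composition".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for CodeGenSubRegIndex::CompMap: for one index A,
// maps a second index B to the composed index compose(A, B). All 1-based.
using CompMap = std::map<unsigned, unsigned>;

struct ComposeTable {
  std::vector<unsigned> RowMap;             // index A (0-based) -> row number
  std::vector<std::vector<unsigned>> Rows;  // row -> compose results, 0 = none
};

// Mirrors combine(): merge Map into Row only if no occupied entry conflicts.
bool combineRow(const CompMap &Map, std::vector<unsigned> &Row) {
  for (const auto &KV : Map) {
    unsigned Entry = Row[KV.first - 1];
    if (Entry && Entry != KV.second)
      return false;                         // conflict: row cannot be shared
  }
  for (const auto &KV : Map)
    Row[KV.first - 1] = KV.second;          // compatible: record the mappings
  return true;
}

// Mirrors the row search: reuse the first compatible row, else append one.
ComposeTable buildTable(const std::vector<CompMap> &Composites) {
  ComposeTable T;
  const std::size_t N = Composites.size();
  for (const CompMap &Map : Composites) {
    std::size_t Found = T.Rows.size();      // "not found" marker
    for (std::size_t R = 0; R != T.Rows.size(); ++R)
      if (combineRow(Map, T.Rows[R])) { Found = R; break; }
    if (Found == T.Rows.size()) {
      T.Rows.emplace_back(N, 0u);           // new all-zero row
      combineRow(Map, T.Rows.back());
    }
    T.RowMap.push_back(static_cast<unsigned>(Found));
  }
  return T;
}

// Same lookup as the wrapper plus the generated composeSubRegIndicesImpl().
unsigned compose(const ComposeTable &T, unsigned IdxA, unsigned IdxB) {
  if (!IdxA) return IdxB;                   // 0 acts as the identity
  if (!IdxB) return IdxA;
  --IdxA; assert(IdxA < T.RowMap.size());
  --IdxB; assert(IdxB < T.RowMap.size());
  return T.Rows[T.RowMap[IdxA]][IdxB];
}

// Hypothetical four-index example: indices 1 and 4 agree on compose(_, 3),
// index 2 disagrees with them, and index 3 has no composites at all.
ComposeTable exampleTable() {
  std::vector<CompMap> C(4);
  C[0][3] = 4;
  C[1][3] = 1;
  C[3][3] = 4;
  return buildTable(C);
}
```

In this example indices 1, 3 and 4 share row 0 and only index 2 needs its own row. Index 3 has no composites, yet it is mapped to row 0 and so inherits that row's entries; this is the "illegal entries as wildcards" compression mentioned in the source comment, and it means the table is only meaningful for legal compositions.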

unsigned composeSubRegIndexLaneMask(unsigned IdxA, unsigned LaneMask) const {
  if (!IdxA)
    return LaneMask;
  return composeSubRegIndexLaneMaskImpl(IdxA, LaneMask);
}

Again, the real work is done by the TableGen-generated composeSubRegIndexLaneMaskImpl, which is emitted by the following method.

629     void

630     RegisterInfoEmitter::emitComposeSubRegIndexLaneMask(raw_ostream &OS,

631                                                         CodeGenRegBank &RegBank,

632                                                         const std::string &ClName) {

633       // See the comments in computeSubRegLaneMasks() for our goal here.

634       const auto &SubRegIndices = RegBank.getSubRegIndices();

635    

636       // Create a list of Mask+Rotate operations, with equivalent entries merged.

637       SmallVector<unsigned, 4> SubReg2SequenceIndexMap;

638       SmallVector<SmallVector<MaskRolPair, 1>, 4> Sequences;

639       for (const auto &Idx : SubRegIndices) {

640         const SmallVector<MaskRolPair, 1> &IdxSequence

641           = Idx.CompositionLaneMaskTransform;

642    

643         unsigned Found = ~0u;

644         unsigned SIdx = 0;

645         unsigned NextSIdx;

646         for (size_t s = 0, se = Sequences.size(); s != se; ++s, SIdx = NextSIdx) {

647           SmallVectorImpl<MaskRolPair> &Sequence = Sequences[s];

648           NextSIdx = SIdx + Sequence.size() + 1;

649           if (Sequence == IdxSequence) {

650             Found = SIdx;

651             break;

652           }

653         }

654         if (Found == ~0u) {

655           Sequences.push_back(IdxSequence);

656           Found = SIdx;

657         }

658         SubReg2SequenceIndexMap.push_back(Found);

659       }

660    

661       OS << "unsigned " << ClName

662          << "::composeSubRegIndexLaneMaskImpl(unsigned IdxA, unsigned LaneMask)"

663             " const {\n";

664    

665       OS << "  struct MaskRolOp {\n"

666             "    unsigned Mask;\n"

667             "    uint8_t  RotateLeft;\n"

668             "  };\n"

669             "  static const MaskRolOp Seqs[] = {\n";

670       unsigned Idx = 0;

671       for (size_t s = 0, se = Sequences.size(); s != se; ++s) {

672         OS << "    ";

673         const SmallVectorImpl<MaskRolPair> &Sequence = Sequences[s];

674         for (size_t p = 0, pe = Sequence.size(); p != pe; ++p) {

675           const MaskRolPair &P = Sequence[p];

676           OS << format("{ 0x%08X, %2u }, ", P.Mask, P.RotateLeft);

677         }

678         OS << "{ 0, 0 }";

679         if (s+1 != se)

680           OS << ", ";

681         OS << "  // Sequence " << Idx << "\n";

682         Idx += Sequence.size() + 1;

683       }

684       OS << "  };\n"

685             "  static const MaskRolOp *const CompositeSequences[] = {\n";

686       for (size_t i = 0, e = SubRegIndices.size(); i != e; ++i) {

687         OS << "    ";

688         unsigned Idx = SubReg2SequenceIndexMap[i];

689         OS << format("&Seqs[%u]", Idx);

690         if (i+1 != e)

691           OS << ",";

692         OS << " // to " << SubRegIndices[i].getName() << "\n";

693       }

694       OS << "  };\n\n";

695    

696       OS << "  --IdxA; assert(IdxA < " << SubRegIndices.size()

697          << " && \"Subregister index out of bounds\");\n"

698             "  unsigned Result = 0;\n"

699             "  for (const MaskRolOp *Ops = CompositeSequences[IdxA]; Ops->Mask != 0; ++Ops)"

700             " {\n"

701             "    unsigned Masked = LaneMask & Ops->Mask;\n"

702             "    Result |= (Masked << Ops->RotateLeft) & 0xFFFFFFFF;\n"

703             "    Result |= (Masked >> ((32 - Ops->RotateLeft) & 0x1F));\n"

704             "  }\n"

705             "  return Result;\n"

706             "}\n";

707     }

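At line 641, the CompositionLaneMaskTransform container of CodeGenSubRegIndex holds a number of MaskRolPair objects. Each object records that, starting from the lane mask of the partner index, masking with Mask and rotating left by the given number of bits yields the lane mask of the composite index (see the section on complete index information). The loop at line 639 walks all sub-register indices, and the temporary container Sequences collects the distinct MaskRolPair sequences. Note that IdxSequence can never be empty: when the method CodeGenRegBank::computeSubRegLaneMasks is called, it sets the MaskRolPair of every index that has no composite index to {~0u, 0}.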
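The way duplicate sequences are merged and turned into offsets can be sketched as follows. This is a simplified illustration rather than the emitter's code: the names Sequence, DedupResult, dedupSequences() and exampleDedup() are hypothetical, and the sample sequences are chosen only to show the offset arithmetic. Each unique sequence occupies size() + 1 slots in the flattened array because of its {0, 0} terminator.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Same shape as the MaskRolPair stored in CompositionLaneMaskTransform.
struct MaskRolPair {
  std::uint32_t Mask;
  std::uint8_t RotateLeft;
  bool operator==(const MaskRolPair &O) const {
    return Mask == O.Mask && RotateLeft == O.RotateLeft;
  }
};
using Sequence = std::vector<MaskRolPair>;

struct DedupResult {
  std::vector<Sequence> Sequences;  // unique sequences, in first-seen order
  std::vector<unsigned> StartOf;    // per index: offset of its sequence in Seqs
};

// For every index, find an identical earlier sequence or append a new one,
// and remember where that sequence starts in the flattened Seqs array.
DedupResult dedupSequences(const std::vector<Sequence> &PerIndex) {
  DedupResult R;
  for (const Sequence &IdxSeq : PerIndex) {
    unsigned Offset = 0;
    bool Found = false;
    for (const Sequence &S : R.Sequences) {
      if (S == IdxSeq) { Found = true; break; }
      Offset += static_cast<unsigned>(S.size()) + 1;  // +1 for {0, 0}
    }
    if (!Found)
      R.Sequences.push_back(IdxSeq);  // Offset now equals the total length
    R.StartOf.push_back(Offset);
  }
  return R;
}

// Five hypothetical indices that use only three distinct sequences.
DedupResult exampleDedup() {
  const Sequence Identity = {{0xFFFFFFFFu, 0}};
  const Sequence Rol2 = {{0xFFFFFFFFu, 2}};
  const Sequence Two = {{0x0000000Cu, 4}, {0x000000C0u, 10}};
  return dedupSequences({Identity, Rol2, Identity, Two, Rol2});
}
```

With these inputs the start offsets are 0, 2, 0, 4, 2, which is the same pattern as the "Sequence 0", "Sequence 2", "Sequence 4" comments in the generated table.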

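The loops at lines 671 and 674 emit the contents of the MaskRolPair sequences, declared as an array Seqs of MaskRolOp in which every sequence ends with the element {0, 0} (an empty sequence would emit only {0, 0}). The loop at line 686 declares the CompositeSequences array, which gives, for each index, the Seqs element where its sequence starts. In the end we obtain a method like this (taking ARM as the example; the X86 one is too simple to be interesting).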

unsigned ARMGenRegisterInfo::composeSubRegIndexLaneMaskImpl(unsigned IdxA, unsigned LaneMask) const {

  struct MaskRolOp {

    unsigned Mask;

    uint8_t  RotateLeft;

  };

  static const MaskRolOp Seqs[] = {

    { 0xFFFFFFFF,  0 }, { 0, 0 },   // Sequence 0

    { 0xFFFFFFFF,  2 }, { 0, 0 },   // Sequence 2

    { 0xFFFFFFFF,  4 }, { 0, 0 },   // Sequence 4

    { 0xFFFFFFFF,  6 }, { 0, 0 },   // Sequence 6

    { 0xFFFFFFFF, 14 }, { 0, 0 },   // Sequence 8

    { 0xFFFFFFFF, 12 }, { 0, 0 },   // Sequence 10

    { 0xFFFFFFFF, 10 }, { 0, 0 },   // Sequence 12

    { 0xFFFFFFFF,  8 }, { 0, 0 },   // Sequence 14

    { 0x0000000C, 14 }, { 0x00000030, 10 }, { 0x000000C0,  6 }, { 0x00000300,  2 }, { 0, 0 },   // Sequence 16

    { 0x0000000C, 14 }, { 0x00000030, 10 }, { 0, 0 },   // Sequence 21

    { 0x0000000C, 10 }, { 0x00000030,  6 }, { 0, 0 },   // Sequence 24

    { 0x000000CC,  2 }, { 0x00030000, 30 }, { 0, 0 },   // Sequence 27

    { 0x000000CC,  2 }, { 0x00033000, 30 }, { 0, 0 },   // Sequence 30

    { 0x000000FC,  2 }, { 0x00000300,  8 }, { 0, 0 },   // Sequence 33

    { 0x0000000C,  4 }, { 0x000000C0, 10 }, { 0, 0 },   // Sequence 36

    { 0x0000003C,  4 }, { 0x000000C0, 10 }, { 0, 0 },   // Sequence 39

    { 0x0000000C,  4 }, { 0x000000C0, 10 }, { 0x00030000, 28 }, { 0, 0 },   // Sequence 42

    { 0x0000000C,  6 }, { 0x000000C0,  8 }, { 0, 0 },   // Sequence 46

    { 0x0000000C,  6 }, { 0x00000030, 12 }, { 0x000000C0,  8 }, { 0, 0 },   // Sequence 49

    { 0x0000000C,  6 }, { 0x000000C0,  8 }, { 0x00030000, 26 }, { 0, 0 },   // Sequence 53

    { 0x0000000C,  6 }, { 0x00000030, 12 }, { 0, 0 },   // Sequence 57

    { 0x0000000C,  6 }, { 0x00000030, 12 }, { 0x000000C0,  8 }, { 0x00000300,  4 }, { 0, 0 },   // Sequence 60

    { 0x0000000C, 14 }, { 0x000000C0,  6 }, { 0, 0 },   // Sequence 65

    { 0x0000000C, 14 }, { 0x00000030, 10 }, { 0x000000C0,  6 }, { 0, 0 },   // Sequence 68

    { 0x0000000C, 12 }, { 0x000000C0,  4 }, { 0, 0 },   // Sequence 72

    { 0x0000000C, 12 }, { 0x00000030,  8 }, { 0x000000C0,  4 }, { 0, 0 },   // Sequence 75

    { 0x0000000C, 12 }, { 0x00000030,  8 }, { 0, 0 },   // Sequence 79

    { 0x0000003C,  4 }, { 0x000000C0, 10 }, { 0x00000300,  6 }, { 0, 0 }  // Sequence 82

  };

  static const MaskRolOp *const CompositeSequences[] = {

    &Seqs[0], // to dsub_0

    &Seqs[2], // to dsub_1

    &Seqs[4], // to dsub_2

    &Seqs[6], // to dsub_3

    &Seqs[8], // to dsub_4

    &Seqs[10], // to dsub_5

    &Seqs[12], // to dsub_6

    &Seqs[14], // to dsub_7

    &Seqs[0], // to gsub_0

    &Seqs[0], // to gsub_1

    &Seqs[0], // to qqsub_0

    &Seqs[16], // to qqsub_1

    &Seqs[0], // to qsub_0

    &Seqs[4], // to qsub_1

    &Seqs[21], // to qsub_2

    &Seqs[24], // to qsub_3

    &Seqs[0], // to ssub_0

    &Seqs[0], // to ssub_1

    &Seqs[0], // to ssub_2

    &Seqs[0], // to ssub_3

    &Seqs[0], // to dsub_2_then_ssub_0

    &Seqs[0], // to dsub_2_then_ssub_1

    &Seqs[0], // to dsub_3_then_ssub_0

    &Seqs[0], // to dsub_3_then_ssub_1

    &Seqs[0], // to dsub_7_then_ssub_0

    &Seqs[0], // to dsub_7_then_ssub_1

    &Seqs[0], // to dsub_6_then_ssub_0

    &Seqs[0], // to dsub_6_then_ssub_1

    &Seqs[0], // to dsub_5_then_ssub_0

    &Seqs[0], // to dsub_5_then_ssub_1

    &Seqs[0], // to dsub_4_then_ssub_0

    &Seqs[0], // to dsub_4_then_ssub_1

    &Seqs[0], // to dsub_0_dsub_2

    &Seqs[0], // to dsub_0_dsub_1_dsub_2

    &Seqs[2], // to dsub_1_dsub_3

    &Seqs[2], // to dsub_1_dsub_2_dsub_3

    &Seqs[2], // to dsub_1_dsub_2

    &Seqs[0], // to dsub_0_dsub_2_dsub_4

    &Seqs[0], // to dsub_0_dsub_2_dsub_4_dsub_6

    &Seqs[27], // to dsub_1_dsub_3_dsub_5

    &Seqs[30], // to dsub_1_dsub_3_dsub_5_dsub_7

    &Seqs[33], // to dsub_1_dsub_2_dsub_3_dsub_4

    &Seqs[36], // to dsub_2_dsub_4

    &Seqs[39], // to dsub_2_dsub_3_dsub_4

    &Seqs[42], // to dsub_2_dsub_4_dsub_6

    &Seqs[46], // to dsub_3_dsub_5

    &Seqs[49], // to dsub_3_dsub_4_dsub_5

    &Seqs[53], // to dsub_3_dsub_5_dsub_7

    &Seqs[57], // to dsub_3_dsub_4

    &Seqs[60], // to dsub_3_dsub_4_dsub_5_dsub_6

    &Seqs[65], // to dsub_4_dsub_6

    &Seqs[68], // to dsub_4_dsub_5_dsub_6

    &Seqs[72], // to dsub_5_dsub_7

    &Seqs[75], // to dsub_5_dsub_6_dsub_7

    &Seqs[79], // to dsub_5_dsub_6

    &Seqs[82] // to qsub_1_qsub_2

  };

 

  --IdxA; assert(IdxA < 56 && "Subregister index out of bounds");

  unsigned Result = 0;

  for (const MaskRolOp *Ops = CompositeSequences[IdxA]; Ops->Mask != 0; ++Ops) {

    unsigned Masked = LaneMask & Ops->Mask;

    Result |= (Masked << Ops->RotateLeft) & 0xFFFFFFFF;

    Result |= (Masked >> ((32 - Ops->RotateLeft) & 0x1F));

  }

  return Result;

}

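Seqs[0] corresponds to the case where no composite index exists. If CompositeSequences yields Seqs[0], Mask is 0xFFFFFFFF and RotateLeft is 0, so Masked equals LaneMask. The fifth line from the end ORs Masked into Result; on the fourth line from the end the shift amount (32 - 0) & 0x1F is also 0, so its right-hand side is Masked again and changes nothing. The method therefore returns LaneMask unchanged.

Some sequences have more than two entries. These correspond to an index that can be composed from several indices (a two-entry sequence can also correspond to this case; what matters is how many 1 bits Mask has). For a LaneMask that covers only one of those indices, just one entry in the for loop above actually takes effect; the others all produce 0.

V7.0 also introduced the reverseComposeSubRegIndexLaneMask method, the inverse of composeSubRegIndexLaneMask. Assuming Mask is a valid lane mask, the following holds: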
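To make the loop concrete, here is a small standalone sketch that applies one {0, 0}-terminated sequence to a lane mask. The function name applySequence() and the arrays Seq0 and Seq36 are hypothetical; the entries of the two arrays are copied from "Sequence 0" and "Sequence 36" of the ARM table above, and the lane masks used are illustrative values only.

```cpp
#include <cstdint>

struct MaskRolOp {
  std::uint32_t Mask;
  std::uint8_t RotateLeft;
};

// Applies one {0, 0}-terminated sequence the way the generated loop does:
// mask the input, then rotate the masked bits left within 32 bits.
std::uint32_t applySequence(const MaskRolOp *Ops, std::uint32_t LaneMask) {
  std::uint32_t Result = 0;
  for (; Ops->Mask != 0; ++Ops) {
    std::uint32_t Masked = LaneMask & Ops->Mask;
    Result |= (Masked << Ops->RotateLeft);
    // The "& 0x1F" turns a shift by 32 (RotateLeft == 0) into a shift by 0.
    Result |= (Masked >> ((32 - Ops->RotateLeft) & 0x1F));
  }
  return Result;
}

// Entries copied from Sequence 0 and Sequence 36 of the ARM table.
const MaskRolOp Seq0[] = {{0xFFFFFFFFu, 0}, {0, 0}};
const MaskRolOp Seq36[] = {{0x0000000Cu, 4}, {0x000000C0u, 10}, {0, 0}};
```

applySequence(Seq0, M) returns M for any M. With Seq36, the input 0x0C matches only the first entry and becomes 0xC0, while 0xC0 matches only the second entry and becomes 0x30000. An input such as 0xCC that overlaps both masks gets contributions from both entries.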

X0 = composeSubRegIndexLaneMask(Idx, Mask)

X1 = reverseComposeSubRegIndexLaneMask(Idx, X0)

from which it follows that X1 == Mask.

LaneBitmask reverseComposeSubRegIndexLaneMask(unsigned IdxA,
                                              LaneBitmask LaneMask) const {
  if (!IdxA)
    return LaneMask;
  return reverseComposeSubRegIndexLaneMaskImpl(IdxA, LaneMask);
}

The X86 functions generated by v7.0 look like this:

  struct MaskRolOp {

    LaneBitmask Mask;

    uint8_t  RotateLeft;

  };

  static const MaskRolOp LaneMaskComposeSequences[] = {

    { LaneBitmask(0xFFFFFFFF),  0 }, { LaneBitmask::getNone(), 0 },   // Sequence 0

    { LaneBitmask(0xFFFFFFFF),  1 }, { LaneBitmask::getNone(), 0 },   // Sequence 2

    { LaneBitmask(0xFFFFFFFF),  2 }, { LaneBitmask::getNone(), 0 },   // Sequence 4

    { LaneBitmask(0xFFFFFFFF),  3 }, { LaneBitmask::getNone(), 0 },   // Sequence 6

    { LaneBitmask(0xFFFFFFFF),  4 }, { LaneBitmask::getNone(), 0 }  // Sequence 8

  };

  static const MaskRolOp *const CompositeSequences[] = {

    &LaneMaskComposeSequences[0], // to sub_8bit

    &LaneMaskComposeSequences[2], // to sub_8bit_hi

    &LaneMaskComposeSequences[4], // to sub_8bit_hi_phony

    &LaneMaskComposeSequences[0], // to sub_16bit

    &LaneMaskComposeSequences[6], // to sub_16bit_hi

    &LaneMaskComposeSequences[0], // to sub_32bit

    &LaneMaskComposeSequences[8], // to sub_xmm

    &LaneMaskComposeSequences[0] // to sub_ymm

  };

 

LaneBitmask X86GenRegisterInfo::composeSubRegIndexLaneMaskImpl(unsigned IdxA, LaneBitmask LaneMask) const {

  --IdxA; assert(IdxA < 8 && "Subregister index out of bounds");

  LaneBitmask Result;

  for (const MaskRolOp *Ops = CompositeSequences[IdxA]; Ops->Mask.any(); ++Ops) {

    LaneBitmask::Type M = LaneMask.getAsInteger() & Ops->Mask.getAsInteger();

    if (unsigned S = Ops->RotateLeft)

      Result |= LaneBitmask((M << S) | (M >> (LaneBitmask::BitWidth - S)));

    else

      Result |= LaneBitmask(M);

  }

  return Result;

}

 

LaneBitmask X86GenRegisterInfo::reverseComposeSubRegIndexLaneMaskImpl(unsigned IdxA,  LaneBitmask LaneMask) const {

  LaneMask &= getSubRegIndexLaneMask(IdxA);

  --IdxA; assert(IdxA < 8 && "Subregister index out of bounds");

  LaneBitmask Result;

  for (const MaskRolOp *Ops = CompositeSequences[IdxA]; Ops->Mask.any(); ++Ops) {

    LaneBitmask::Type M = LaneMask.getAsInteger();

    if (unsigned S = Ops->RotateLeft)

      Result |= LaneBitmask((M >> S) | (M << (LaneBitmask::BitWidth - S)));

    else

      Result |= LaneBitmask(M);

  }

  return Result;

}

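The v7.0 emitComposeSubRegIndexLaneMask method differs considerably from the v3.6.1 version, but the code itself is not complicated and is rather long, so it is not listed here.

3.3.6.4.4. Defining the getSubClassWithSubReg() Method

RegisterInfoEmitter::runTargetDesc() next emits the X86GenRegisterInfo method getSubClassWithSubReg(). Given a register class and a sub-register index, this method returns the largest sub-class of that register class that supports the index.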
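The inverse relationship between the two v7.0 functions comes down to a left rotation undone by a right rotation of the same amount. The following sketch shows only that core step on a plain 32-bit integer; the names BitWidth, rotateLeft(), rotateRight() and roundTrips() are hypothetical, the 32-bit width is an assumption of this sketch, and the masking that the generated code performs is left out.

```cpp
#include <cstdint>

constexpr unsigned BitWidth = 32;  // assumed lane mask width for this sketch

// Rotation as in composeSubRegIndexLaneMaskImpl; S == 0 is handled
// separately because shifting by the full width would be undefined.
std::uint32_t rotateLeft(std::uint32_t M, unsigned S) {
  return S ? (M << S) | (M >> (BitWidth - S)) : M;
}

// Opposite rotation, as in reverseComposeSubRegIndexLaneMaskImpl.
std::uint32_t rotateRight(std::uint32_t M, unsigned S) {
  return S ? (M >> S) | (M << (BitWidth - S)) : M;
}

// True when rotating left and then right by S (0 <= S < 32) restores Mask.
bool roundTrips(std::uint32_t Mask, unsigned S) {
  return rotateRight(rotateLeft(Mask, S), S) == Mask;
}
```

For example, rotateLeft(0x1, 4) gives 0x10 and rotateRight(0x10, 4) gives 0x1 again; roundTrips() holds for every mask and every S below 32.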

RegisterInfoEmitter::runTargetDesc (continued)

1341    // Emit getSubClassWithSubReg.

1342    if (!SubRegIndices.empty()) {

1343      OS << "const TargetRegisterClass *" << ClassName

1344         << "::getSubClassWithSubReg(const TargetRegisterClass *RC, unsigned Idx)"

1345         << " const {\n";

1346      // Use the smallest type that can hold a regclass ID with room for a

1347      // sentinel.

1348      if (RegisterClasses.size() < UINT8_MAX)

1349        OS << "  static const uint8_t Table[";

1350      else if (RegisterClasses.size() < UINT16_MAX)

1351        OS << "  static const uint16_t Table[";

1352      else

1353        PrintFatalError("Too many register classes.");

1354      OS << RegisterClasses.size() << "][" << SubRegIndicesSize << "] = {\n";

1355      for (const auto &RC : RegisterClasses) {

1356        OS << "    {\t// " << RC.getName() << "\n";

1357        for (auto &Idx : SubRegIndices) {

1358          if (CodeGenRegisterClass *SRC = RC.getSubClassWithSubReg(&Idx))

1359            OS << "      " << SRC->EnumValue + 1 << ",\t// " << Idx.getName()

1360               << " -> " << SRC->getName() << "\n";

1361          else

1362            OS << "      0,\t// " << Idx.getName() << "\n";

1363        }

1364        OS << "    },\n";

1365      }

1366      OS << "  };\n  assert(RC && \"Missing regclass\");\n"

1367         << "  if (!Idx) return RC;\n  --Idx;\n"

1368         << "  assert(Idx < " << SubRegIndicesSize << " && \"Bad subreg\");\n"

1369         << "  unsigned TV = Table[RC->getID()][Idx];\n"

1370         << "  return TV ? getRegClass(TV - 1) : nullptr;\n}\n\n";

1371    }

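The SubClassWithSubReg container of CodeGenRegisterClass already records the required information (see the method CodeGenRegBank::inferSubClassWithSubReg()), so while iterating over the CodeGenRegisterClass objects at line 1355 we only need to fetch the corresponding record from that container for each sub-register index. For the X86 target, the emitted function is: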

const TargetRegisterClass *X86GenRegisterInfo::getSubClassWithSubReg(const TargetRegisterClass *RC, unsigned Idx) const {

  static const uint8_t Table[80][6] = {

    { // GR8

      0,   // sub_8bit

      0,   // sub_8bit_hi

      0,   // sub_16bit

      0,   // sub_32bit

      0,   // sub_xmm

      0,   // sub_ymm

    },

    …

    { // GR16_ABCD

      18, // sub_8bit -> GR16_ABCD

      18, // sub_8bit_hi -> GR16_ABCD

      0,   // sub_16bit

      0,   // sub_32bit

      0,   // sub_xmm

      0,   // sub_ymm

    },

    …

    { // VR512_with_sub_xmm_in_FR32

      0,   // sub_8bit

      0,   // sub_8bit_hi

      0,   // sub_16bit

      0,   // sub_32bit

      80, // sub_xmm -> VR512_with_sub_xmm_in_FR32

      80, // sub_ymm -> VR512_with_sub_xmm_in_FR32

    },

  };

  assert(RC && "Missing regclass");

  if (!Idx) return RC;

  --Idx;

  assert(Idx < 6 && "Bad subreg");

  unsigned TV = Table[RC->getID()][Idx];

  return TV ? getRegClass(TV - 1) : nullptr;

}

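This method is very large, so we have omitted part of the Table array definition. In the array definition, the comment on each entry names the corresponding sub-register index. An entry of 0 means the register class has no sub-class that supports that index; otherwise the comment also gives the name of the sub-class. For example, in the GR16_ABCD row the first two entries correspond to the indices sub_8bit and sub_8bit_hi, and both map to GR16_ABCD itself: 18 - 1 = 17 is the position of GR16_ABCD in the RegisterClasses container (the class is retrieved with the getRegClass method).

Reposted from blog.csdn.net/wuhui_gdnt/article/details/88737151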
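The "class ID plus one, with 0 as sentinel" encoding can be illustrated with a miniature table. This is a hypothetical sketch, not generated code: the classes ClsA, ClsB and ClsC, the table contents, the NoClass value and the function subClassWithSubReg() are all invented for illustration, and class IDs are returned as plain integers instead of TargetRegisterClass pointers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical miniature: 3 register classes, 2 sub-register indices.
enum ClassID : unsigned { ClsA = 0, ClsB = 1, ClsC = 2, NumClasses = 3 };
constexpr unsigned NumSubRegIndices = 2;
constexpr int NoClass = -1;  // plays the role of nullptr in the real method

// Entries hold class ID + 1 so that 0 can mean "no such sub-class".
const std::uint8_t Table[NumClasses][NumSubRegIndices] = {
    {0, 0},                // ClsA: supports no sub-register index
    {ClsB + 1, ClsC + 1},  // ClsB: index 1 -> ClsB, index 2 -> ClsC
    {ClsC + 1, ClsC + 1},  // ClsC: both indices -> ClsC
};

// Same control flow as the generated getSubClassWithSubReg().
int subClassWithSubReg(unsigned RC, unsigned Idx) {
  assert(RC < NumClasses && "Missing regclass");
  if (!Idx)
    return static_cast<int>(RC);  // index 0 means "the class itself"
  --Idx;
  assert(Idx < NumSubRegIndices && "Bad subreg");
  unsigned TV = Table[RC][Idx];
  return TV ? static_cast<int>(TV - 1) : NoClass;
}
```

Here subClassWithSubReg(ClsB, 2) yields the ID of ClsC, while any non-zero index on ClsA yields NoClass. Storing ID + 1 is what allows the emitter to pick uint8_t or uint16_t for the table while still reserving a value for "none".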