LLVM from a Go perspective

Building a compiler is an extremely difficult task. Fortunately, with the development of projects like LLVM, the problem has become much easier to solve, to the point where even a single programmer can create a new language whose performance is close to C. Working with LLVM is complicated by the fact that it is a very large system with little documentation. To help remedy this, the author of the article whose translation we are publishing today demonstrates examples of code written in Go and shows how they are translated first into Go SSA and then into LLVM IR by the TinyGo compiler. The Go SSA and LLVM IR code has been edited slightly to remove details that are not relevant to the explanations given here, to make those explanations easier to follow.


First example

The first function I'm going to look at here is a simple function that adds two numbers:

func myAdd(a, b int) int {
    return a + b
}

This function is very simple; perhaps nothing could be simpler. It translates into the following Go SSA code:

func myAdd(a int, b int) int:
entry:
    t0 = a + b                                                    int
    return t0

In this representation, type annotations are shown on the right; in most cases they can be ignored.

This small example already shows the essence of one aspect of SSA. When code is converted to SSA form, each expression is broken down into the most elementary operations it consists of. In our case, the statement return a + b actually represents two operations: adding two numbers and returning the result.
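To make the idea of breaking expressions into elementary parts concrete, here is a toy sketch that uses the standard go/parser package to flatten a Go expression into SSA-style instructions, giving each intermediate result a fresh name (t0, t1 and so on) that is assigned exactly once. The lowerer type and the lowerReturn function are hypothetical illustrations, not how golang.org/x/tools/go/ssa is actually implemented; only identifiers, literals, parentheses and binary operators are handled.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"strings"
)

// lowerer collects SSA-style instructions and hands out fresh value names.
type lowerer struct {
	instrs []string
	next   int
}

// lower emits instructions for e and returns the name holding its value.
// Operands are lowered first, so every tN is defined before it is used.
func (l *lowerer) lower(e ast.Expr) string {
	switch e := e.(type) {
	case *ast.Ident:
		return e.Name
	case *ast.BasicLit:
		return e.Value
	case *ast.ParenExpr:
		return l.lower(e.X)
	case *ast.BinaryExpr:
		x := l.lower(e.X)
		y := l.lower(e.Y)
		name := fmt.Sprintf("t%d", l.next)
		l.next++
		l.instrs = append(l.instrs, fmt.Sprintf("%s = %s %s %s", name, x, e.Op, y))
		return name
	default:
		panic(fmt.Sprintf("unsupported expression %T", e))
	}
}

// lowerReturn turns "return <src>" into one instruction per operation.
func lowerReturn(src string) (string, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return "", err
	}
	var l lowerer
	result := l.lower(expr)
	l.instrs = append(l.instrs, "return "+result)
	return strings.Join(l.instrs, "\n"), nil
}

func main() {
	out, err := lowerReturn("(a + b) * c")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

For "a + b" this produces t0 = a + b followed by return t0, the same two steps as in the Go SSA listing for myAdd.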

You can also see the program's basic blocks here. This code has only one block, the entry block. We'll talk more about blocks below.

The Go SSA code converts easily to LLVM IR:

define i64 @myAdd(i64 %a, i64 %b) {
entry:
  %0 = add i64 %a, %b
  ret i64 %0
}

Notice that although the syntax is different, the structure of the function is essentially unchanged. LLVM IR looks somewhat more like C than Go SSA does: in the function definition the return type comes first, and each argument's type is written before its name. In addition, to simplify parsing the IR, names of global entities are prefixed with @ and names of local ones with % (a function is also considered a global entity).

One thing to note about this code is that the decision on how to represent Go's int type, which can be a 32-bit or a 64-bit value depending on the compiler and the compilation target, is made when the LLVM IR is generated. This is one of many reasons why LLVM IR is not, as many people think, platform-independent. IR generated for one platform cannot simply be taken and compiled for another (unless you approach the problem with great care).

Another interesting point is that the i64 type is not a signed integer: it is neutral with respect to sign. Depending on the instruction, it can represent either signed or unsigned numbers. For addition this makes no difference, because adding signed and unsigned numbers works exactly the same way. I'd note here that in C, overflowing a signed integer variable is undefined behavior, so the Clang frontend adds the nsw (no signed wrap) flag to the operation, which tells LLVM that it may assume the addition never overflows.
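The sign-neutrality of addition is easy to check in Go: adding the same two 8-bit patterns as int8 and as uint8 yields identical bits, which is why one sign-agnostic add instruction can serve both. The addBothWays helper below is a hypothetical illustration. Note also that, unlike C, the Go specification defines signed integer overflow to wrap, so presumably a Go compiler cannot mark ordinary additions nsw the way Clang does.

```go
package main

import "fmt"

// addBothWays adds the same two 8-bit patterns once as signed (int8) and
// once as unsigned (uint8) values and returns both results as raw bits.
func addBothWays(a, b uint8) (signedBits, unsignedBits uint8) {
	s := int8(a) + int8(b) // signed addition; Go defines overflow to wrap
	u := a + b             // unsigned addition; wraps modulo 256
	return uint8(s), u
}

func main() {
	s, u := addBothWays(100, 100)
	fmt.Println(s, u, int8(s)) // 200 200 -56
}
```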

This can matter for some optimizations. For example, adding two i16 values on a 32-bit platform (with 32-bit registers) requires a sign-extension operation after the addition to stay within the i16 range. Because of this, it is often more efficient to perform integer operations at the machine's register width.
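A rough emulation of the i16 case, assuming a machine that only has 32-bit registers: the hypothetical add16On32 function widens the operands, adds at full width, then truncates and sign-extends the result so it stays within the i16 range. That last step is what an nsw guarantee would let the compiler drop, because it could then assume the sum never leaves the i16 range in the first place.

```go
package main

import "fmt"

// add16On32 emulates adding two int16 values using only 32-bit arithmetic:
// the operands are widened, added at register width, and the result is
// truncated to 16 bits and sign-extended back to 32 bits.
func add16On32(a, b int16) int32 {
	sum := int32(a) + int32(b) // full-width 32-bit add
	return int32(int16(sum))   // truncate to 16 bits, then sign-extend
}

func main() {
	fmt.Println(add16On32(1, 2))         // 3
	fmt.Println(add16On32(30000, 10000)) // -25536: wrapped within i16
}
```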

What happens to this IR code next is not of much interest to us right now. The code is optimized (although in a simple example like ours there is nothing to optimize) and then converted to machine code.

Second example

The next example is a bit more complicated: a function that sums a slice of integers:

func sum(numbers []int) int {
    n := 0
    for i := 0; i < len(numbers); i++ {
        n += numbers[i]
    }
    return n
}

This code is converted into the following Go SSA code:

func sum(numbers []int) int:
entry:
    jump for.loop
for.loop:
    t0 = phi [entry: 0:int, for.body: t6] #n                       int
    t1 = phi [entry: 0:int, for.body: t7] #i                       int
    t2 = len(numbers)                                              int
    t3 = t1 < t2                                                  bool
    if t3 goto for.body else for.done
for.body:
    t4 = &numbers[t1]                                             *int
    t5 = *t4                                                       int
    t6 = t0 + t5                                                   int
    t7 = t1 + 1:int                                                int
    jump for.loop
for.done:
    return t0

Here you can already see more constructs typical of code in SSA form. Perhaps the most striking feature of this code is that there are no structured flow-control statements. To control the flow of execution there are only conditional and unconditional jumps, plus the return instruction, if we count it as a control-flow instruction.

Notice also that the program is not divided into blocks with curly braces (as in the C family of languages). Instead it is divided by labels, reminiscent of assembly language, into what are called basic blocks. In SSA, a basic block is a contiguous sequence of instructions that begins with a label and ends with a block-terminating instruction, such as return or jump.
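Go itself has labels and goto, so the sum function can be rewritten by hand to follow the same basic-block structure as the Go SSA listing. This is only an illustration (sumGoto is a hypothetical name); idiomatic Go would use the for loop shown earlier. Since Go has no phi, the variables n and i are simply reassigned, which is exactly what SSA form does not allow:

```go
package main

import "fmt"

// sumGoto mirrors the Go SSA blocks of sum: entry, for.loop, for.body and
// for.done, using only conditional and unconditional jumps.
func sumGoto(numbers []int) int {
	n, i := 0, 0 // entry: initial values, then fall into for.loop
forLoop:
	if !(i < len(numbers)) {
		goto forDone
	}
	// for.body
	n += numbers[i]
	i++
	goto forLoop
forDone:
	return n
}

func main() {
	fmt.Println(sumGoto([]int{1, 2, 3, 4})) // 10
}
```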

Another interesting detail of this code is the phi instruction. It is rather unusual and may take some time to understand. Remember that SSA stands for Static Single Assignment. It is an intermediate representation used by compilers in which each variable is assigned a value exactly once. That works well for simple functions like our myAdd shown above, but not for more complex functions like the sum function discussed in this section. In particular, the variables i and n change as the loop executes.

SSA gets around the rule that each variable is assigned only once by using a so-called phi instruction (named after the Greek letter). In fact, producing an SSA representation of code in languages like C requires tricks like this. The result of this instruction is the current value of the variable (i or n), and its operands are a list of predecessor basic blocks, each paired with the value to use when control arrives from that block. For example, consider this instruction:

t0 = phi [entry: 0:int, for.body: t6] #n

It means the following: if the previous basic block was entry, then t0 is the constant 0, and if the previous basic block was for.body, then t0 takes the value t6 from that block. This may all seem rather mysterious, but this mechanism is what makes SSA work. From a human perspective it makes the code harder to understand, but the fact that each value is assigned only once makes many optimizations much easier.
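One way to get a feel for phi is to interpret the Go SSA listing for sum by hand while remembering which block control came from. The sketch below does that; the block type, the phi helper and sumSSA are hypothetical, and the Go variables t0 through t7 only stand in for SSA values (each has a single defining statement, even though the interpreter overwrites it on every loop iteration):

```go
package main

import "fmt"

// block identifies a basic block of the Go SSA listing for sum.
type block int

const (
	entry block = iota
	forLoop
	forBody
	forDone
)

// phi selects the incoming value associated with the block that control
// arrived from, which is what an SSA phi instruction does.
func phi(pred block, incoming map[block]int) int {
	v, ok := incoming[pred]
	if !ok {
		panic("phi: no incoming value for this predecessor")
	}
	return v
}

// sumSSA walks the SSA blocks of sum, tracking the predecessor block.
func sumSSA(numbers []int) int {
	var t0, t1, t6, t7 int
	pred, cur := entry, forLoop // entry: jump for.loop
	for {
		switch cur {
		case forLoop:
			t0 = phi(pred, map[block]int{entry: 0, forBody: t6}) // #n
			t1 = phi(pred, map[block]int{entry: 0, forBody: t7}) // #i
			t2 := len(numbers)
			t3 := t1 < t2
			pred = forLoop
			if t3 {
				cur = forBody
			} else {
				cur = forDone
			}
		case forBody:
			t4 := &numbers[t1] // address computation
			t5 := *t4          // dereference
			t6 = t0 + t5
			t7 = t1 + 1
			pred, cur = forBody, forLoop
		case forDone:
			return t0
		}
	}
}

func main() {
	fmt.Println(sumSSA([]int{1, 2, 3, 4})) // 10
}
```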

Note that if you write your own compiler, you usually won't have to deal with this kind of thing. Even Clang does not generate all these phi instructions; it uses alloca instead (which resembles working with ordinary local variables). Then, when the LLVM optimization pass called mem2reg runs, the alloca instructions are converted to SSA form. TinyGo, however, takes its input from Go SSA, which, conveniently, is already in SSA form.

Another novelty in this fragment of intermediate code is that indexing into a slice is represented as an address-computation operation followed by a dereference of the resulting pointer. You can also see constants written directly into the IR (for example, 1:int), which did not happen in the myAdd example. Now that we have these features out of the way, let's look at what this code becomes when converted to LLVM IR:

define i64 @sum(i64* %ptr, i64 %len, i64 %cap) {
entry:
  br label %for.loop

for.loop:                                         ; preds = %for.body, %entry
  %0 = phi i64 [ 0, %entry ], [ %5, %for.body ]
  %1 = phi i64 [ 0, %entry ], [ %6, %for.body ]
  %2 = icmp slt i64 %1, %len
  br i1 %2, label %for.body, label %for.done

for.body:                                         ; preds = %for.loop
  %3 = getelementptr i64, i64* %ptr, i64 %1
  %4 = load i64, i64* %3
  %5 = add i64 %0, %4
  %6 = add i64 %1, 1
  br label %for.loop

for.done:                                         ; preds = %for.loop
  ret i64 %0
}

As before, we see the same structure expressed in a different syntax. For example, in the phi instructions the order of values and labels is swapped. However, there is something here that deserves special attention.

First, the function signature is quite different. LLVM does not support slices, so, as an optimization, the TinyGo compiler that generated this intermediate code split the slice into its parts. It could have represented the three slice fields (ptr, len and cap) as a single struct, but representing them as three separate values enables some optimizations. Other compilers may represent slices differently, depending on the calling conventions of the target platform.
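As a sketch of what splitting a slice into three values means, assuming Go 1.17 or later for unsafe.Slice: the hypothetical splitSlice function produces the (ptr, len, cap) triple, and sumExpanded takes the slice in the same shape as the IR signature @sum(i64* %ptr, i64 %len, i64 %cap) and reassembles it. TinyGo does this splitting inside the compiler; the Go code only mirrors the shape.

```go
package main

import (
	"fmt"
	"unsafe"
)

// splitSlice breaks a slice into the three scalars that make up its header.
func splitSlice(numbers []int) (ptr *int, length, capacity int) {
	if cap(numbers) > 0 {
		// Reslice to full capacity so that element 0 can be addressed
		// even when len(numbers) == 0.
		ptr = &numbers[:cap(numbers)][0]
	}
	return ptr, len(numbers), cap(numbers)
}

// sumExpanded receives the slice as three separate values, like the IR
// function, and rebuilds a slice from them before summing.
func sumExpanded(ptr *int, length, capacity int) int {
	numbers := unsafe.Slice(ptr, capacity)[:length]
	n := 0
	for i := 0; i < len(numbers); i++ {
		n += numbers[i]
	}
	return n
}

func main() {
	backing := []int{1, 2, 3, 4, 5}
	fmt.Println(sumExpanded(splitSlice(backing[:3]))) // 6
}
```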

Another interesting feature of this code is the use of the getelementptr instruction (often abbreviated GEP).

This instruction works with pointers and is used to obtain a pointer to a slice element. For example, compare it with the following C code:

int* sliceptr(int *ptr, int index) {
    return &ptr[index];
}

Or with this equivalent:

int* sliceptr(int *ptr, int index) {
    return ptr + index;
}

The most important thing here is that getelementptr does not dereference anything. It only computes a new pointer from an existing one. At the hardware level it can be thought of as a mul followed by an add. You can read more about the GEP instruction here.
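The same address arithmetic can be written in Go with the unsafe package (unsafe.Add requires Go 1.17 or later). The hypothetical elemPtr function computes the element address as base + index*sizeof(element), much like GEP, and leaves the load to a separate dereference. It is an illustration only and skips bounds checking on index, so it is not something to use in real code:

```go
package main

import (
	"fmt"
	"unsafe"
)

// elemPtr computes &numbers[index] by hand: base + index*sizeof(element).
// It only computes an address; nothing is loaded. Unlike numbers[index],
// it does not check index (numbers must still be non-empty for &numbers[0]).
func elemPtr(numbers []int, index int) *int {
	base := unsafe.Pointer(&numbers[0])
	offset := uintptr(index) * unsafe.Sizeof(numbers[0]) // the "mul"
	return (*int)(unsafe.Add(base, offset))               // the "add"
}

func main() {
	numbers := []int{10, 20, 30}
	p := elemPtr(numbers, 2) // address computation only
	fmt.Println(*p)          // the separate load: prints 30
}
```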

Another interesting feature of this intermediate code is the use of the icmp instruction. This is a general-purpose instruction for integer comparisons. Its result is always a value of type i1, that is, a boolean. In this case the comparison uses the slt keyword (signed less than), because we are comparing two numbers that were previously of type int. To compare two unsigned integers we would still use icmp, but with the ult keyword. Floating-point numbers are compared with a different instruction, fcmp, which works in a similar way.

Conclusion

I believe this article has covered the most important features of LLVM IR. Of course, there is much more to it. In particular, the intermediate representation can carry many annotations that let optimization passes take into account facts about the code that the compiler knows but that cannot otherwise be expressed in IR. Examples include the inbounds flag on the GEP instruction and the nsw and nuw flags that can be added to the add instruction. The same goes for the private keyword, which tells the optimizer that the function it marks will not be referenced from outside the current compilation unit. This enables many interesting interprocedural optimizations, such as eliminating unused arguments.

You can read more about LLVM in its documentation, which you will refer to often when developing your own LLVM-based compiler. There is also a tutorial that walks through developing a compiler for a very simple language. Both sources will be useful when you build your own compiler.

Dear readers! Do you use LLVM?


source: www.habr.com
