Recovering data from XtraDB tables without a structure file using byte-by-byte analysis of the ibd file

Background

It so happened that a server was attacked by a ransomware virus which, by a "lucky accident", left some of the .ibd files (the raw InnoDB table data files) untouched while fully encrypting the .frm files (the structure files). In this situation the .ibd files fall into three groups:

  • files that can be recovered with standard tools and guides. There is an excellent article covering such cases;
  • partially encrypted tables. These are mostly large tables for which (as I understand it) the attackers did not have enough RAM for full encryption;
  • and fully encrypted tables, which cannot be recovered.

You can determine which group a table belongs to by opening it in any text editor with the appropriate encoding (UTF8 in my case) and simply looking through the file for readable text fields.

Also, at the beginning of the file you can see a large number of 0 bytes, and viruses that use a block encryption algorithm (the most common kind) usually affect those as well.

In my case, the attackers left a 4-byte string (1, 0, 0, 0) at the end of every encrypted file, which simplified the task. A short script was enough to find the uninfected files:

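import os

def opened(path):
    # Yield the full path of every regular file in the directory.
    for name in os.listdir(path):
        full_path = os.path.join(path, name)
        if os.path.isfile(full_path):
            yield full_path

for full_path in opened("C:\\some\\path"):
    with open(full_path, "rb") as file:
        # Read only the last 4 bytes (or the whole file if it is shorter).
        file.seek(max(os.path.getsize(full_path) - 4, 0))
        tail = file.read()
    # Encrypted files end with the marker bytes 1, 0, 0, 0.
    if tail != bytes([1, 0, 0, 0]):
        print(full_path)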

This gave me the files of the first group. The second group involves a lot of manual work, but what had been found was already enough. Everything would have been fine, except that you need to know the exact table structure, and (of course) a case came up where I had to work with a frequently changing table. Nobody remembered whether a field type had been changed or a new column had been added.

The depths of the internet, unfortunately, could not help with such a case, which is why this article is being written.

Getting to the point

There is a table structure from 3 months ago that does not match the current one (it may differ by one field, possibly more). The table structure:

CREATE TABLE `table_1` (
    `id` INT (11),
    `date` DATETIME ,
    `description` TEXT ,
    `id_point` INT (11),
    `id_user` INT (11),
    `date_start` DATETIME ,
    `date_finish` DATETIME ,
    `photo` INT (1),
    `id_client` INT (11),
    `status` INT (1),
    `lead__time` TIME ,
    `sendstatus` TINYINT (4)
); 

In this case, the following fields need to be extracted:

  • id_point INT (11);
  • id_user INT (11);
  • date_start DATETIME;
  • date_finish DATETIME.

The recovery relies on a byte-by-byte analysis of the .ibd file, followed by converting the bytes into a more readable form. Since finding what we need only requires analysing the int and datetime data types, the article describes only these, although it occasionally refers to other data types, which may help in similar incidents.

Problem 1: fields of type DATETIME and TEXT contained NULL values, and such values are simply skipped in the file. Because of this, the structure to restore could not be determined in my case. In the new columns the default value was NULL, and part of a transaction could be lost because of the setting innodb_flush_log_at_trx_commit = 0, so determining the structure would have taken additional time.

Problem 2: keep in mind that rows deleted with DELETE all remain in the ibd file, but after an ALTER TABLE their structure is not updated. As a result, the data structure can differ between the beginning of the file and its end. If you run OPTIMIZE TABLE often, you are unlikely to run into this problem.

Note that the DBMS version affects how data is stored, and this example may not work for other major versions. In my case the Windows build of MariaDB 10.1.24 was used. Also, although in MariaDB you work with InnoDB tables, they are in fact XtraDB, which rules out applying this method to MySQL's InnoDB.

File analysis

In Python, the bytes() data type displays printable bytes as characters instead of as an ordinary set of numbers. Although you can view the file in this form, for convenience you can convert the bytes to numeric form by turning the byte array into an ordinary list (list(example_byte_array)). Either way, both forms are suitable for analysis.
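To illustrate the difference between the two views, here is a minimal sketch. The sample bytes are made up for the example and do not come from a real .ibd file.

```python
# Made-up sample: the word "infimum" followed by four arbitrary bytes.
example_byte_array = b"infimum" + bytes([128, 0, 75, 108])

# bytes shows printable values as characters and the rest as \x escapes.
print(example_byte_array)

# list() exposes every byte as a plain integer in the range 0..255.
as_numbers = list(example_byte_array)
print(as_numbers)

# The comma-separated text form used by the helper functions in this article.
as_text = ", ".join(str(b) for b in as_numbers)
print(as_text)
```

The text form is convenient because it can be searched with ordinary regular expressions.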

After looking through several ibd files, you can spot two recurring keywords: infimum and supremum.

Moreover, if you split the file by these keywords, you get mostly even blocks of data. We will use infimum as the separator.

table = table.split("infimum".encode())
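As a rough sketch of this step, the snippet below splits raw content on the separator and renders every block as a list of numbers. The helper name split_blocks and the sample bytes are hypothetical; with a real table the content would come from reading the .ibd file in binary mode.

```python
def split_blocks(data: bytes, separator: bytes = b"infimum") -> list:
    """Split raw .ibd content into blocks and render each as a list of ints."""
    return [list(block) for block in data.split(separator)]

# Synthetic stand-in for file content; for a real file one would use
# open(path, "rb").read() instead, with a path chosen by the reader.
sample = (b"\x00\x00header" + b"infimum" + bytes([1, 0, 2])
          + b"infimum" + bytes([128, 0, 0, 1]))
blocks = split_blocks(sample)

# Print the index, length and first few bytes of every block.
for index, block in enumerate(blocks):
    print(index, len(block), block[:8])
```

Printing the block lengths is a quick way to check whether the blocks really are of roughly equal size.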

Another interesting observation: for tables with a small amount of data, there is a pointer to the number of rows in the block between infimum and supremum.

[Image: test table with 1 row]

[Image: test table with 2 rows]

The block table[0] can be skipped. After looking through it, I still could not find any raw table data. Most likely this block is used to store indexes and keys.
Starting with table[1] and converting it to a numeric array, you can already notice some patterns, namely recurring 4-byte groups that begin with 128.

These are int values stored in the row. The first byte indicates whether the number is positive or negative. In my case, all numbers are positive. From the remaining 3 bytes, you can determine the number using the following function. Script:

def find_int(val: str) -> int:  # example '128, 1, 2, 3'
    # The first byte carries the sign; the other three are read big-endian.
    val = [int(v) for v in val.split(", ")]
    return val[1] * 256**2 + val[2] * 256 + val[3]

For example, 128, 0, 0, 1 = 1, and 128, 0, 75, 108 = 19308.
The table has an auto-increment primary key, and it can be found in the file in the same way.
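The first byte can also be decoded rather than ignored. As far as I know, InnoDB stores a signed INT as 4 big-endian bytes with the top (sign) bit inverted, which would explain why small positive numbers begin with 128. The sketch below assumes that layout; decode_int is a hypothetical helper and not part of the original script.

```python
def decode_int(raw: bytes) -> int:
    """Decode a 4-byte signed INT, assuming big-endian order with a flipped sign bit."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    unsigned = int.from_bytes(raw, "big")
    # Undo the sign-bit flip: 0x80000000 on disk corresponds to 0.
    return unsigned - 0x80000000

print(decode_int(bytes([128, 0, 0, 1])))
print(decode_int(bytes([128, 0, 75, 108])))
print(decode_int(bytes([127, 255, 255, 255])))
```

Unlike find_int, this version also uses the low 7 bits of the first byte, so under the stated assumption it would handle values above 16777215 as well as negative numbers.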

After comparing the data from the test table, it turned out that a DATETIME value consists of 5 bytes and begins with 153 (most likely indicating the year range). Since the DATETIME range is '1000-01-01' to '9999-12-31', I assume the number of bytes may vary, but in my case the data falls within the period from 2016 to 2019, so we will assume that 5 bytes are enough.

To determine the time without seconds, the following functions were written. Script:

day_ = lambda x: x % 64 // 2  # {x,x,X,x,x }

def hour_(x1, x2):  # {x,x,X1,X2,x}
    if x1 % 2 == 0:
        return x2 // 16
    elif x1 % 2 == 1:
        return x2 // 16 + 16
    else:
        raise ValueError

min_ = lambda x1, x2: (x1 % 16) * 4 + (x2 // 64)  # {x,x,x,X1,X2}

I did not manage to write a working function for the year and month, so I had to resort to a hack. Script:

ym_list = {'2016, 1': '153, 152, 64', '2016, 2': '153, 152, 128', 
           '2016, 3': '153, 152, 192', '2016, 4': '153, 153, 0',
           '2016, 5': '153, 153, 64', '2016, 6': '153, 153, 128', 
           '2016, 7': '153, 153, 192', '2016, 8': '153, 154, 0', 
           '2016, 9': '153, 154, 64', '2016, 10': '153, 154, 128', 
           '2016, 11': '153, 154, 192', '2016, 12': '153, 155, 0',
           '2017, 1': '153, 155, 128', '2017, 2': '153, 155, 192', 
           '2017, 3': '153, 156, 0', '2017, 4': '153, 156, 64',
           '2017, 5': '153, 156, 128', '2017, 6': '153, 156, 192',
           '2017, 7': '153, 157, 0', '2017, 8': '153, 157, 64',
           '2017, 9': '153, 157, 128', '2017, 10': '153, 157, 192', 
           '2017, 11': '153, 158, 0', '2017, 12': '153, 158, 64', 
           '2018, 1': '153, 158, 192', '2018, 2': '153, 159, 0',
           '2018, 3': '153, 159, 64', '2018, 4': '153, 159, 128', 
           '2018, 5': '153, 159, 192', '2018, 6': '153, 160, 0',
           '2018, 7': '153, 160, 64', '2018, 8': '153, 160, 128',
           '2018, 9': '153, 160, 192', '2018, 10': '153, 161, 0', 
           '2018, 11': '153, 161, 64', '2018, 12': '153, 161, 128',
           '2019, 1': '153, 162, 0', '2019, 2': '153, 162, 64', 
           '2019, 3': '153, 162, 128', '2019, 4': '153, 162, 192', 
           '2019, 5': '153, 163, 0', '2019, 6': '153, 163, 64',
           '2019, 7': '153, 163, 128', '2019, 8': '153, 163, 192',
           '2019, 9': '153, 164, 0', '2019, 10': '153, 164, 64', 
           '2019, 11': '153, 164, 128', '2019, 12': '153, 164, 192',
           '2020, 1': '153, 165, 64', '2020, 2': '153, 165, 128',
           '2020, 3': '153, 165, 192','2020, 4': '153, 166, 0', 
           '2020, 5': '153, 166, 64', '2020, 6': '153, 166, 128',
           '2020, 7': '153, 166, 192', '2020, 8': '153, 167, 0', 
           '2020, 9': '153, 167, 64','2020, 10': '153, 167, 128',
           '2020, 11': '153, 167, 192', '2020, 12': '153, 168, 0'}

def year_month(x1, x2):  # {x,X,X,x,x }
    # Look up year and month by byte 2 and the top two bits of byte 3.
    for key, value in ym_list.items():
        key = [int(k) for k in key.split(", ")]
        value = [int(v) for v in value.split(", ")]
        if x1 == value[1] and x2 // 64 == value[2] // 64:
            return key
    return 0, 0

I am sure that with some amount of time spent, this shortcoming can be fixed.
Next, a function that returns a datetime object from a string. Script:

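from datetime import datetime

def find_data_time(val: str) -> datetime:
    # val holds the 5 DATETIME bytes as text, for example '153, 160, 30, 168, 0'.
    val = [int(v) for v in val.split(", ")]
    day = day_(val[2])
    hour = hour_(val[2], val[3])
    minutes = min_(val[3], val[4])
    year, month = year_month(val[1], val[2])
    return datetime(year, month, day, hour, minutes)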
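The year and month can probably be computed directly instead of being looked up. The observed bytes appear to match the packed 5-byte DATETIME layout documented for MySQL 5.6.4 and later: 1 sign bit, 17 bits holding year * 13 + month, then 5 bits for the day, 5 for the hour, 6 for the minute and 6 for the second. The sketch below assumes that layout applies to this file; the function names are hypothetical.

```python
from datetime import datetime

def decode_datetime_fields(raw: bytes) -> tuple:
    """Unpack 5 DATETIME bytes into (year, month, day, hour, minute, second).

    Assumed bit layout, most significant bit first: 1 sign bit,
    17 bits of year * 13 + month, 5 bits day, 5 bits hour,
    6 bits minute, 6 bits second.
    """
    if len(raw) != 5:
        raise ValueError("expected exactly 5 bytes")
    value = int.from_bytes(raw, "big")
    value &= (1 << 39) - 1  # drop the sign bit
    year_month = value >> 22
    return (
        year_month // 13,
        year_month % 13,
        (value >> 17) & 0x1F,
        (value >> 12) & 0x1F,
        (value >> 6) & 0x3F,
        value & 0x3F,
    )

def decode_datetime(raw: bytes) -> datetime:
    """Build a datetime object from the 5 packed bytes."""
    return datetime(*decode_datetime_fields(raw))

# Compare with two ym_list entries; only year and month are meaningful here.
print(decode_datetime_fields(bytes([153, 152, 64, 0, 0]))[:2])
print(decode_datetime_fields(bytes([153, 168, 0, 0, 0]))[:2])
```

The factor 13 (rather than 12) leaves a slot for month 0, which would explain the gap in ym_list between December of one year and January of the next.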

A frequently repeated sequence of values of the form int, int, datetime, datetime was found, which looks like exactly what we need. Moreover, such a sequence does not occur twice within a single row.

Using a regular expression over the text form of the block (int_array, a string such as "128, 0, 0, 1, ..."), we extract the required data:

import re
fined = re.findall(r'128, \d*, \d*, \d*, 128, \d*, \d*, \d*, 153, 1[3-6]\d, \d*, \d*, \d*, 153, 1[3-6]\d, \d*, \d*, \d*', int_array)

Note that when searching with this expression, NULL values in the required fields cannot be detected, but in my case this is not critical. Then we go through what was found in a loop. Script:

result = []
for val in fined:
    pre_result = []
    # Pull the two ints and the two datetimes out of each matched sequence.
    bd_int = re.findall(r"128, \d*, \d*, \d*", val)
    bd_date = re.findall(r"(153, 1[3-6]\d, \d*, \d*, \d*)", val)
    for it in bd_int:
        pre_result.append(find_int(it))
    for bd in bd_date:
        pre_result.append(find_data_time(bd))
    result.append(pre_result)
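The article does not show how int_array is built, so here is a small end-to-end sketch on synthetic data. It assumes int_array is simply the block's bytes joined with ", "; the sample record and the variable names are made up for illustration, and the pattern uses \d+ instead of \d* so that empty matches are excluded.

```python
import re

# Synthetic record: int, int, datetime, datetime, surrounded by filler bytes.
record = bytes([128, 0, 0, 7, 128, 0, 75, 108,
                153, 160, 30, 168, 0, 153, 160, 32, 168, 0])
block = bytes([0, 5, 99]) + record + bytes([0, 0])

# Render the block in the "n, n, n" text form that the expressions expect.
int_array = ", ".join(str(b) for b in block)

pattern = (r"128, \d+, \d+, \d+, 128, \d+, \d+, \d+, "
           r"153, 1[3-6]\d, \d+, \d+, \d+, 153, 1[3-6]\d, \d+, \d+, \d+")
found = re.findall(pattern, int_array)

for match in found:
    numbers = [int(n) for n in match.split(", ")]
    # Slice by position: two 4-byte ints followed by two 5-byte datetimes.
    ints = [numbers[0:4], numbers[4:8]]
    dates = [numbers[8:13], numbers[13:18]]
    print(ints, dates)
```

Slicing each match by position, rather than running a second regular expression over it, avoids accidentally picking up a 128 or 153 that happens to occur inside a neighbouring field.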

That is essentially it: the data in the result list is the data we need.

P.S.
I understand that this method will not suit everyone, but the main goal of the article is to prompt you to act rather than to solve all your problems. I think the most correct solution would be to study the MariaDB source code yourself, but because of limited time, the method described here seemed the fastest.

In some cases, after analysing the file, you will be able to determine the approximate structure and restore the table using one of the standard methods from the links above. That will be far more correct and cause fewer problems.

source: www.habr.com
