The scientific poke method, or how to choose a database configuration using benchmarks and an optimization algorithm

Hello.

I decided to share what I found: the fruit of thought, trial and error.
By and large, this is no discovery, of course. All of this has long been known to those who deal with applied statistical data processing and the optimization of any systems, not specifically DBMSs.
And yes, they know it and write interesting articles about their research, for example (UPD: in the comments, a very interesting project was pointed out: ottertune).
On the other hand, I do not see this approach widely mentioned or used online among IT specialists and DBAs.

So, to the point.

Suppose we have a task: to configure some service system to handle some kind of workload.

What is known about this workload: what it is, how the quality of the work is measured, and what the criterion for measuring that quality is.

Let us also assume that it is more or less known and understood how exactly the work is performed in (or with) this service system.

"More or less" means that it is possible to prepare (or obtain somewhere) a tool, utility or service that can be applied to the system to generate a test load sufficiently close to what will happen in production, under conditions sufficiently close to production.

Well, let us assume that the set of tuning parameters of this service system is known, that is, the parameters that can be used to configure the system in terms of its productivity.

And here is the problem: there is no sufficiently complete understanding of this service system, one that would let you expertly configure it for the future load on the given platform and get the required productivity.

Well. This is almost always the case.

What can you do here?

Well, the first thing that comes to mind is to look into the system's documentation. Understand what ranges are allowed for the values of the tuning parameters. And then, for example, using the coordinate descent method, select values for the system parameters in tests.

That is, give the system some configuration, in the form of a specific set of values of its configuration parameters.

Apply the test load to it, using that very load-generating tool.
And look at the value: the response, or the system's quality metric.

The second thought may be the conclusion that this takes a very long time.

That is: if there are many tuning parameters, if their value ranges are large, and if each individual load test takes a long time to complete, then yes, all of this may take an unacceptably long time.

Well, here is what you can understand and recall.

You can notice that the set of values of the service system's settings is a vector, an ordered sequence of values.

Each such vector, other things being equal (those not affected by this vector), corresponds to a well-defined value of the metric: an indicator of the quality of the system's operation under the test load.

That is:

Let us denote the system configuration vector as $x = (x_1, x_2, \dots, x_n)$, where $x_i$ is the value of the $i$-th parameter and $n$ is the number of configuration parameters of the system, i.e. how many of these parameters there are.

And let us denote the value of the metric corresponding to this $x$ as $y$; then we get a function: $y = f(x)$

Well, then everything immediately comes down, in my case, to something almost forgotten since my student days: algorithms for finding the extremum of a function.

Fine, but here an organizational and applied question arises: which algorithm to use.

  1. In the sense of: so as to write less code by hand.
  2. And so that it works, i.e. finds the extremum (if there is one), and does so at least faster than coordinate descent.

The first point hints that we should look toward environments in which such algorithms are already implemented and are, in some form, ready for use in code.
Well, I know python and cran-r.

The second point means that you need to read about the algorithms themselves: what they are, what their requirements are, and what the features of their work are.

And what they give can be useful side effects: results that come directly from the algorithm itself.

Or they can be obtained from the results of the algorithm's work.

Much depends on the input conditions.

For example, if for some reason you need to get a result faster, then you need to look toward gradient descent algorithms and choose one of them.

Or, if time is not so important, you can, for example, use stochastic optimization methods, such as a genetic algorithm.
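To make the idea of a stochastic search concrete, here is a minimal sketch of a real-valued genetic algorithm in Python. It is not the GA package used later in this article: the function `genetic_maximize`, its parameters and the toy fitness function `toy_fitness` are assumptions made purely for illustration.

```python
import random


def genetic_maximize(fitness, lower, upper, pop_size=20, generations=40,
                     p_crossover=0.8, p_mutation=0.1, elitism=1, seed=0):
    """Maximize fitness(vector) inside the box [lower, upper] (illustrative sketch)."""
    rng = random.Random(seed)
    dim = len(lower)

    def random_vector():
        return [rng.uniform(lower[i], upper[i]) for i in range(dim)]

    def tournament(scored):
        # Pick two individuals at random and keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [random_vector() for _ in range(pop_size)]
    best_score, best_vector = float("-inf"), None
    for _ in range(generations):
        scored = sorted(((fitness(v), v) for v in population),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best_vector = scored[0][0], list(scored[0][1])
        # Elitism: the best individuals survive unchanged.
        next_population = [list(v) for _, v in scored[:elitism]]
        while len(next_population) < pop_size:
            child = list(tournament(scored))
            if rng.random() < p_crossover:
                # Blend crossover: a convex combination stays inside the box.
                other = tournament(scored)
                w = rng.random()
                child = [w * c + (1 - w) * o for c, o in zip(child, other)]
            for i in range(dim):
                if rng.random() < p_mutation:
                    # Mutation: redraw one gene uniformly from its range.
                    child[i] = rng.uniform(lower[i], upper[i])
            next_population.append(child)
        population = next_population
    return best_score, best_vector


def toy_fitness(v):
    # Hypothetical stand-in for a load test: the maximum is at (3, -1).
    return -((v[0] - 3.0) ** 2) - ((v[1] + 1.0) ** 2)


score, vector = genetic_maximize(toy_fitness, [-10.0, -10.0], [10.0, 10.0])
print(score, vector)
```

In the real task each call to the fitness function is a full load test, so the population size and the number of generations directly determine how long the search takes.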

I propose to look at how this approach works, selecting a system configuration using a genetic algorithm, in the following, so to speak, laboratory work.

Initial conditions:

  1. Let the service system be: oracle xe 18c
  2. Let it serve transactional activity, and the goal is to get the highest possible throughput of the database, in transactions per second.
  3. Transactions can differ greatly in the nature of their work with data and in the context of that work.
    Let us agree that these are transactions that do not process a large amount of table data.
    In the sense that they do not generate more undo data than redo data, and do not process large percentages of rows or large tables.

These are transactions that change one row in a more or less large table, with a small number of indexes on that table.

In this situation, the productivity of the database in processing transactions will, with a caveat, be determined by the quality of redo data processing.

The caveat: if we talk specifically about the database settings.

Because, in the general case, there may be, for example, transactional locks between SQL sessions, caused by the design of the user's work with table data and/or by the table model.

Which, of course, will have a depressing effect on the TPS metric, and this will be an exogenous factor relative to the database: that is simply how the table model and the work with data in it were designed, so that locks occur.

Therefore, for the purity of the experiment, we will exclude this factor, and below I will clarify exactly how.

  1. Let us assume, for definiteness, that 100% of the SQL commands submitted to the database are DML commands.
    Let the characteristics of the user's work with the database be the same across tests.
    Namely: the number of SQL sessions, the table data, and how the SQL sessions work with it.
  2. The database runs in FORCE LOGGING and ARCHIVELOG modes. Flashback database mode is turned off at the database level.
  3. Redo logs: located in a separate file system, on a separate "disk";
    The rest of the physical components of the database: in another, separate file system, on a separate "disk":

More details about the physical layout of the laboratory database components

SQL> select status||' '||name from v$controlfile;
 /db/u14/oradata/XE/control01.ctl
SQL> select GROUP#||' '||MEMBER from v$logfile;
1 /db/u02/oradata/XE/redo01_01.log
2 /db/u02/oradata/XE/redo02_01.log
SQL> select FILE_ID||' '||TABLESPACE_NAME||' '||round(BYTES/1024/1024,2)||' '||FILE_NAME as col from dba_data_files;
4 UNDOTBS1 2208 /db/u14/oradata/XE/undotbs1_01.dbf
2 SLOB 128 /db/u14/oradata/XE/slob01.dbf
7 USERS 5 /db/u14/oradata/XE/users01.dbf
1 SYSTEM 860 /db/u14/oradata/XE/system01.dbf
3 SYSAUX 550 /db/u14/oradata/XE/sysaux01.dbf
5 MONITOR 128 /db/u14/oradata/XE/monitor.dbf
SQL> !cat /proc/mounts | egrep "/db/u[0-2]"
/dev/vda1 /db/u14 ext4 rw,noatime,nodiratime,data=ordered 0 0
/dev/mapper/vgsys-ora_redo /db/u02 xfs rw,noatime,nodiratime,attr2,nobarrier,inode64,logbsize=256k,noquota 0 0

Initially, for these load conditions, I wanted to use the transactional database load utility SLOB.
It has a wonderful feature; I will quote the author:

At the heart of SLOB is the "SLOB method." The SLOB Method aims to test platforms
without application contention. One cannot drive maximum hardware performance
using application code that is, for example, bound by application locking or even
sharing Oracle Database blocks. That's right - there is overhead when sharing data
in data blocks! But SLOB - in its default deployment - is immune to such contention.

This declaration holds; it really is so.
It is convenient to regulate the degree of parallelism of the SQL sessions: this is the -t key of the runit.sh utility from SLOB.
The percentage of DML commands among the SQL statements that each SQL session sends to the database is regulated by the UPDATE_PCT parameter.
Separately, and very conveniently: SLOB itself, before and after the load session, prepares a statspack or AWR snapshots (whichever is configured).

However, it turned out that SLOB does not support SQL sessions with a duration of less than 30 seconds.
So first I wrote my own rough-and-ready version of the loader, and then it simply stayed in use.

Let me clarify what the loader does and how, for clarity.
Essentially the loader looks like this:

Worker code

function dotx()
{
local v_period="$2"
[ -z "$v_period" ] && v_period="0"
source "/home/oracle/testingredotracе/config.conf"

$ORACLE_HOME/bin/sqlplus -S system/${v_system_pwd} << __EOF__
whenever sqlerror exit failure
set verify off
set echo off
set feedback off

define wnum="$1"
define period="$v_period"
set appinfo worker_&&wnum

declare
 v_upto number;
 v_key  number;
 v_tots number;
 v_cts  number;
begin
 select max(col1) into v_upto from system.testtab_&&wnum;
 SELECT (( SYSDATE - DATE '1970-01-01' ) * 86400 ) into v_cts FROM DUAL;
 v_tots := &&period + v_cts;
 while v_cts <= v_tots
 loop
  v_key:=abs(mod(dbms_random.random,v_upto));
  if v_key=0 then
   v_key:=1;
  end if;
  update system.testtab_&&wnum t
  set t.object_name=translate(dbms_random.string('a', 120), 'abcXYZ', '158249')
  where t.col1=v_key
  ;
  commit;
  SELECT (( SYSDATE - DATE '1970-01-01' ) * 86400 ) into v_cts FROM DUAL;
 end loop;
end;
/

exit
__EOF__
}
export -f dotx

The workers are launched like this:

Launching the workers

echo "starting test, duration: ${TEST_DURATION}" >> "$v_logfile"
for((i=1;i<="$SQLSESS_COUNT";i++))
do
 echo "sql-session: ${i}" >> "$v_logfile"
 dotx "$i" "${TEST_DURATION}" &
done
echo "waiting..." >> "$v_logfile"
wait

And the tables for the workers are prepared like this:

Creating the tables

function createtable() {
source "/home/oracle/testingredotracе/config.conf"
$ORACLE_HOME/bin/sqlplus -S system/${v_system_pwd} << __EOF__
whenever sqlerror continue
set verify off
set echo off
set feedback off

define wnum="$1"
define ts_name="slob"

begin
 execute immediate 'drop table system.testtab_&&wnum';
exception when others then null;
end;
/

create table system.testtab_&&wnum tablespace &&ts_name as
select rownum as col1, t.*
from sys.dba_objects t
where rownum<1000
;
create index testtab_&&wnum._idx on system.testtab_&&wnum (col1);
--alter table system.testtab_&&wnum nologging;
--alter index system.testtab_&&wnum._idx nologging;
exit
__EOF__
}
export -f createtable

seq 1 1 "$SQLSESS_COUNT" | xargs -P 4 -I {} -t bash -c 'createtable "{}"' | tee -a "$v_logfile"
echo "createtable done" >> "$v_logfile"

That is, for each worker (in practice: a separate SQL session in the database) a separate table is created, and the worker works with that table.

This ensures the absence of transactional locks between worker sessions.
Each worker does the same thing, with its own table; the tables are all identical.
All workers do the work for the same amount of time.
Moreover, for long enough that, for example, a log switch definitely occurs, and more than once.
Accordingly, the associated costs and effects arise.
In my case, I set the duration of the workers' work to 8 minutes.

A fragment of the statspack report describing the operation of the database under load

Database    DB Id    Instance     Inst Num  Startup Time   Release     RAC
~~~~~~~~ ----------- ------------ -------- --------------- ----------- ---
          2929910313 XE                  1 07-Sep-20 23:12 18.0.0.0.0  NO

Host Name             Platform                CPUs Cores Sockets   Memory (G)
~~~~ ---------------- ---------------------- ----- ----- ------- ------------
     billing.izhevsk1 Linux x86 64-bit           2     2       1         15.6

Snapshot       Snap Id     Snap Time      Sessions Curs/Sess Comment
~~~~~~~~    ---------- ------------------ -------- --------- ------------------
Begin Snap:       1630 07-Sep-20 23:12:27       55        .7
  End Snap:       1631 07-Sep-20 23:20:29       62        .6
   Elapsed:       8.03 (mins) Av Act Sess:       8.4
   DB time:      67.31 (mins)      DB CPU:      15.01 (mins)

Cache Sizes            Begin        End
~~~~~~~~~~~       ---------- ----------
    Buffer Cache:     1,392M              Std Block Size:         8K
     Shared Pool:       288M                  Log Buffer:   103,424K

Load Profile              Per Second    Per Transaction    Per Exec    Per Call
~~~~~~~~~~~~      ------------------  ----------------- ----------- -----------
      DB time(s):                8.4                0.0        0.00        0.20
       DB CPU(s):                1.9                0.0        0.00        0.04
       Redo size:        7,685,765.6              978.4
   Logical reads:           60,447.0                7.7
   Block changes:           47,167.3                6.0
  Physical reads:                8.3                0.0
 Physical writes:              253.4                0.0
      User calls:               42.6                0.0
          Parses:               23.2                0.0
     Hard parses:                1.2                0.0
W/A MB processed:                1.0                0.0
          Logons:                0.5                0.0
        Executes:           15,756.5                2.0
       Rollbacks:                0.0                0.0
    Transactions:            7,855.1

Returning to the laboratory work.
We will, other things being equal, vary the values of the following parameters of the laboratory database:

  1. Size of the database redo log groups. Value range: [32, 1024] MB;
  2. Number of redo log groups in the database. Value range: [2, 32];
  3. log_archive_max_processes value range: [1, 8];
  4. commit_logging two values are allowed: batch|immediate;
  5. commit_wait two values are allowed: wait|nowait;
  6. log_buffer value range: [2, 128] MB;
  7. log_checkpoint_timeout value range: [60, 1200] seconds;
  8. db_writer_processes value range: [1, 4];
  9. undo_retention value range: [30, 300] seconds;
  10. transactions_per_rollback_segment value range: [1, 8];
  11. disk_asynch_io two values are allowed: true|false;
  12. filesystemio_options the following values are allowed: none|setall|directIO|asynch;
  13. db_block_checking the following values are allowed: OFF|LOW|MEDIUM|FULL;
  14. db_block_checksum the following values are allowed: OFF|TYPICAL|FULL;

A person with experience in maintaining Oracle databases can certainly already say which of the listed parameters should be set, and to which of their acceptable values, in order to get greater database productivity for the work with data described by the application code above.

But.

The point of the laboratory work is to show that the optimization algorithm itself will clarify this for us, and relatively quickly.

For us, all that remains is to look into the documentation of the system being tuned, just enough to find out which parameters to change and in what ranges.
And also: to write the code that the chosen optimization algorithm will use to work with the system being tuned.

So, now about the code.
I mentioned cran-r above, i.e.: all manipulations with the system being tuned are orchestrated in the form of an R script.

The actual task, the analysis, the selection of system state vectors by metric value: this is the GA package (documentation).
The package, in this case, is not quite suitable, in the sense that it expects vectors (chromosomes, in the package's terms) to be specified as sequences of numbers with a fractional part.

And my vector, made of the values of the settings parameters, consists of 14 quantities: integers and string values.

The problem, of course, is easily avoided by assigning specific numbers to the string values.
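As an illustration of that mapping, the sketch below turns a real-valued gene into one of several string values by splitting the gene's range into equal-width bins. This is an assumption made for the example: the helper `decode_gene` is hypothetical, and the author's own fitness function uses fixed thresholds instead of equal bins.

```python
def decode_gene(value, lower, upper, choices):
    """Map a real-valued gene from [lower, upper] onto one of the categorical choices."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    # Clamp first so that out-of-range genes still decode to a valid choice.
    clamped = min(max(value, lower), upper)
    width = (upper - lower) / len(choices)
    # The top edge of the range belongs to the last bin.
    index = min(int((clamped - lower) / width), len(choices) - 1)
    return choices[index]


io_options = ["none", "setall", "directIO", "asynch"]
print(decode_gene(15.0, 0.0, 40.0, io_options))
print(decode_gene(7.3, 1.0, 10.0, ["WAIT", "NOWAIT"]))
```

Rounding the numeric genes before decoding, as the author's code does, additionally makes nearby chromosomes map to the same database configuration.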

So, in the end, the main piece of the R script looks like this:

Calling GA::ga

cat( "", file=v_logfile, sep="\n", append=F)

pSize = 10
elitism_value=1
pmutation_coef=0.8
pcrossover_coef=0.1
iterations=50

gam=GA::ga(type="real-valued", fitness=evaluate,
lower=c(32,2, 1,1,1,2,60,1,30,1,0,0, 0,0), upper=c(1024,32, 8,10,10,128,800,4,300,8,10,40, 40,30),
popSize=pSize,
pcrossover = pcrossover_coef,
pmutation = pmutation_coef,
maxiter=iterations,
run=4,
keepBest=T)
cat( "GA-session is done" , file=v_logfile, sep="\n", append=T)
gam@solution

Here, the lower and upper arguments of the ga subroutine essentially define the region of the search space within which the search is performed for the vector (or vectors) that gives the maximum value of the fitness function.

The ga subroutine performs a search that maximizes the fitness function.

Well, then it turns out that, in this case, the fitness function, interpreting the vector as a set of values for certain database parameters, must obtain a metric from the database.

That is: how many transactions per second the database processes with the given database configuration and the given load on the database.

That is, unfolding it, the following multi-step procedure must be performed inside the fitness function:

  1. Processing the input vector of numbers: converting it into values for the database parameters.
  2. An attempt to create the given number of redo groups of the given size. This attempt may fail.
    The redo log groups that already existed in the database, in some number and of some size, must be deleted for the purity of the experiment.
  3. If the previous step succeeded: setting the values of the configuration parameters in the database (again: this may fail).
  4. If the previous step succeeded: stopping the database and starting it again so that the newly specified parameter values take effect (again: this may fail).
  5. If the previous step succeeded: perform the load test and get the metric from the database.
  6. Return the database to its original state, i.e. delete the additional log groups and restore the original database configuration.
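The control flow of these steps can be sketched in Python as follows. This is only an outline under stated assumptions: every callable passed in (`decode`, `apply_settings`, `restart`, `run_load_test`, `restore`) is a hypothetical placeholder for the real database operations, and the author's actual R implementation is the one shown in this article.

```python
def evaluate_configuration(vector, decode, apply_settings, restart,
                           run_load_test, restore):
    """Return the measured metric for one vector, or 0.0 if any step fails."""
    settings = decode(vector)            # step 1: numbers -> parameter values
    try:
        if not apply_settings(settings):  # steps 2-3: redo groups and parameters
            return 0.0
        if not restart():                 # step 4: bounce the database
            return 0.0
        return float(run_load_test())     # step 5: load test, read the metric
    finally:
        restore()                         # step 6: always return to the baseline


# Usage with stub callables standing in for the real operations.
metric = evaluate_configuration(
    [32, 2],
    decode=lambda v: {"rg_size": v[0], "rg_count": v[1]},
    apply_settings=lambda settings: True,
    restart=lambda: True,
    run_load_test=lambda: 7855.1,
    restore=lambda: None,
)
print(metric)
```

Returning 0.0 for a configuration that cannot be applied mirrors the idea of giving unusable chromosomes the worst possible fitness, so the search moves away from them.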

Fitness function code

evaluate=function(p_par) {
v_module="evaluate"
v_metric=0
opn=NULL
opn$rg_size=round(p_par[1],digit=0)
opn$rg_count=round(p_par[2],digit=0)
opn$log_archive_max_processes=round(p_par[3],digit=0)
opn$commit_logging="BATCH"
if ( round(p_par[4],digit=0) > 5 ) {
 opn$commit_logging="IMMEDIATE"
}
opn$commit_logging=paste("'", opn$commit_logging, "'",sep="")

opn$commit_wait="WAIT"
if ( round(p_par[5],digit=0) > 5 ) {
 opn$commit_wait="NOWAIT"
}
opn$commit_wait=paste("'", opn$commit_wait, "'",sep="")

opn$log_buffer=paste(round(p_par[6],digit=0),"m",sep="")
opn$log_checkpoint_timeout=round(p_par[7],digit=0)
opn$db_writer_processes=round(p_par[8],digit=0)
opn$undo_retention=round(p_par[9],digit=0)
opn$transactions_per_rollback_segment=round(p_par[10],digit=0)
opn$disk_asynch_io="true"
if ( round(p_par[11],digit=0) > 5 ) {
 opn$disk_asynch_io="false"
} 

opn$filesystemio_options="none"
if ( round(p_par[12],digit=0) > 10 && round(p_par[12],digit=0) <= 20 ) {
 opn$filesystemio_options="setall"
}
if ( round(p_par[12],digit=0) > 20 && round(p_par[12],digit=0) <= 30 ) {
 opn$filesystemio_options="directIO"
}
if ( round(p_par[12],digit=0) > 30 ) {
 opn$filesystemio_options="asynch"
}

opn$db_block_checking="OFF"
if ( round(p_par[13],digit=0) > 10 && round(p_par[13],digit=0) <= 20 ) {
 opn$db_block_checking="LOW"
}
if ( round(p_par[13],digit=0) > 20 && round(p_par[13],digit=0) <= 30 ) {
 opn$db_block_checking="MEDIUM"
}
if ( round(p_par[13],digit=0) > 30 ) {
 opn$db_block_checking="FULL"
}

opn$db_block_checksum="OFF"
if ( round(p_par[14],digit=0) > 10 && round(p_par[14],digit=0) <= 20 ) {
 opn$db_block_checksum="TYPICAL"
}
if ( round(p_par[14],digit=0) > 20 ) {
 opn$db_block_checksum="FULL"
}

v_vector=paste(round(p_par[1],digit=0),round(p_par[2],digit=0),round(p_par[3],digit=0),round(p_par[4],digit=0),round(p_par[5],digit=0),round(p_par[6],digit=0),round(p_par[7],digit=0),round(p_par[8],digit=0),round(p_par[9],digit=0),round(p_par[10],digit=0),round(p_par[11],digit=0),round(p_par[12],digit=0),round(p_par[13],digit=0),round(p_par[14],digit=0),sep=";")
cat( paste(v_module," try to evaluate vector: ", v_vector,sep="") , file=v_logfile, sep="\n", append=T)

rc=make_additional_rgroups(opn)
if ( rc!=0 ) {
 cat( paste(v_module," make_additional_rgroups failed",sep="") , file=v_logfile, sep="\n", append=T)
 return (0)
}

v_rc=0
rc=set_db_parameter("log_archive_max_processes", opn$log_archive_max_processes)
if ( rc != 0 ) {  v_rc=1 }
rc=set_db_parameter("commit_logging", opn$commit_logging )
if ( rc != 0 ) {  v_rc=1 }
rc=set_db_parameter("commit_wait", opn$commit_wait )
if ( rc != 0 ) {  v_rc=1 }
rc=set_db_parameter("log_buffer", opn$log_buffer )
if ( rc != 0 ) {  v_rc=1 }
rc=set_db_parameter("log_checkpoint_timeout", opn$log_checkpoint_timeout )
if ( rc != 0 ) {  v_rc=1 }
rc=set_db_parameter("db_writer_processes", opn$db_writer_processes )
if ( rc != 0 ) {  v_rc=1 }
rc=set_db_parameter("undo_retention", opn$undo_retention )
if ( rc != 0 ) {  v_rc=1 }
rc=set_db_parameter("transactions_per_rollback_segment", opn$transactions_per_rollback_segment )
if ( rc != 0 ) {  v_rc=1 }
rc=set_db_parameter("disk_asynch_io", opn$disk_asynch_io )
if ( rc != 0 ) {  v_rc=1 }
rc=set_db_parameter("filesystemio_options", opn$filesystemio_options )
if ( rc != 0 ) {  v_rc=1 }
rc=set_db_parameter("db_block_checking", opn$db_block_checking )
if ( rc != 0 ) {  v_rc=1 }
rc=set_db_parameter("db_block_checksum", opn$db_block_checksum )
if ( rc != 0 ) {  v_rc=1 }

if ( v_rc!=0 ) {
 cat( paste(v_module," can not startup db with that vector of settings",sep="") , file=v_logfile, sep="\n", append=T)
 rc=stop_db("immediate")
 rc=create_spfile()
 rc=start_db("")
 rc=remove_additional_rgroups(opn)
 return (0)
}

rc=stop_db("immediate")
rc=start_db("")
if ( rc!=0 ) {
 cat( paste(v_module," can not startup db with that vector of settings",sep="") , file=v_logfile, sep="\n", append=T)
 rc=stop_db("abort")
 rc=create_spfile()
 rc=start_db("")
 rc=remove_additional_rgroups(opn)
 return (0)
}

rc=run_test()
v_metric=getmetric()

rc=stop_db("immediate")
rc=create_spfile()
rc=start_db("")
rc=remove_additional_rgroups(opn)

cat( paste("result: ",v_metric," ",v_vector,sep="") , file=v_logfile, sep="\n", append=T)
return (v_metric)
}

That is, all the work is done in the fitness function.

The ga subroutine processes vectors, or, more correctly, chromosomes.
What matters most to us there is the selection of chromosomes with genes for which the fitness function produces large values.

This, in essence, is the process of searching for the optimal set of chromosomes using a vector in an N-dimensional search space.

The operation of a genetic algorithm is explained very clearly and in detail, with examples of R code.

I would like to separately note two technical points.

Auxiliary calls from the evaluate function, for example stop/start or setting the value of a database parameter, are made with the cran-r function system2

With its help, some bash script or command is called.

For example:

set_db_parameter

set_db_parameter=function(p1, p2) {
v_module="set_db_parameter"
v_cmd="/home/oracle/testingredotracе/set_db_parameter.sh"
v_args=paste(p1," ",p2,sep="")

x=system2(v_cmd, args=v_args, stdout=T, stderr=T, wait=T)
if ( length(attributes(x)) > 0 ) {
 cat(paste(v_module," failed with: ",attributes(x)$status," ",v_cmd," ",v_args,sep=""), file=v_logfile, sep="\n", append=T)
 return (attributes(x)$status)
}
else {
 cat(paste(v_module," ok: ",v_cmd," ",v_args,sep=""), file=v_logfile, sep="\n", append=T)
 return (0)
}
}

The second point is the line in the evaluate function that saves the specific metric value and its corresponding tuning vector to a log file:

cat( paste("result: ",v_metric," ",v_vector,sep="") , file=v_logfile, sep="\n", append=T)

This is important, because from this data array it will be possible to obtain additional information about which of the components of the tuning vector has a greater or lesser effect on the metric value.

That is: it will be possible to carry out an attribute-importance analysis.
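Before such an analysis the log has to be turned into a table. A minimal sketch of that step in Python, assuming the log lines have exactly the form written by the line above ("result: ", the metric, a space, then the vector joined with ";"); the function name `parse_result_lines` and the sample values are illustrative only:

```python
def parse_result_lines(lines):
    """Extract (metric, vector) pairs from 'result: <metric> <v1;v2;...>' lines."""
    prefix = "result: "
    rows = []
    for line in lines:
        line = line.strip()
        if not line.startswith(prefix):
            continue  # skip every other kind of log message
        metric_text, _, vector_text = line[len(prefix):].partition(" ")
        if not vector_text:
            continue  # malformed line without a vector
        vector = [float(item) for item in vector_text.split(";")]
        rows.append((float(metric_text), vector))
    return rows


sample_log = [
    "evaluate try to evaluate vector: 32;2;1",
    "result: 7855.1 32;2;1",
    "result: 0 64;4;2",
]
for metric, vector in parse_result_lines(sample_log):
    print(metric, vector)
```

Rows with a metric of 0 correspond to configurations that could not be applied; whether to keep them in the analysis is a separate decision.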

So what can happen?

In graph form, if you order the tests in ascending order of the metric, the picture is as follows:

[Figure: metric value (tps) for the tests, sorted in ascending order of the metric]

Some data corresponding to the extreme values of the metric:
[Figure: screenshot of the tuning vectors with the extreme metric values]
Here, in the screenshot with the results, I will clarify: the values of the tuning vector are given in terms of the fitness function code, not in terms of the list of parameters and parameter value ranges formulated above in the text.

Well. Is ~8 thousand tps a lot or a little? That is a separate question.
Within the framework of the laboratory work, this figure is not important; what matters is the dynamics, how this value changes.

The dynamics here are good.
It is obvious that at least one factor significantly affects the value of the metric, and the ga algorithm, sorting through the chromosome vectors, has covered it.
Judging by the fairly vigorous dynamics of the curve values, there is at least one more factor that, although significantly smaller, also has an influence.

This is where you need attribute-importance analysis, to understand which attributes (in this case, components of the tuning vector) affect the metric value, and how much.
And from this information: to understand which factors were affected by the changes in the significant attributes.

Attribute-importance analysis can be performed in different ways.

For these purposes, I like the randomForest algorithm, from the R package of the same name (documentation).
randomForest, as I understand its work in general and its approach to assessing the importance of attributes in particular, builds a certain model of the dependence of the response variable on the attributes.

In our case, the response variable is the metric obtained from the database in load tests: tps;
And the attributes are the components of the tuning vector.

So here randomForest evaluates the importance of each model attribute with two numbers: %IncMSE, which shows how the presence or absence of this attribute in the model changes the MSE (Mean Squared Error) quality of this model;

And IncNodePurity, a number that reflects how well, based on the values of this attribute, a dataset with observations can be split so that one part contains data with one value of the metric being explained, and the other part contains data with another value of the metric.
Well, that is: to what extent this is a classifying attribute (I saw the clearest, Russian-language explanation of RandomForest here).
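The intuition behind %IncMSE can be shown with a small permutation sketch in Python: shuffle one attribute, and see how much the model's MSE grows. This is a rough analogue only, not the randomForest package's exact computation (which works on out-of-bag data per tree); the functions `mse` and `permutation_importance` and the toy model are assumptions made for illustration.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of model(row) against the targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, column, seed=0):
    """Increase in MSE after shuffling one column of the data."""
    rng = random.Random(seed)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    # Rebuild each row with the shuffled value in place of the original one.
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return mse(model, permuted, targets) - mse(model, rows, targets)


# Toy data: the response depends on column 0 only.
rows = [[float(i), float(i % 3)] for i in range(20)]
model = lambda r: 10.0 * r[0]
targets = [model(r) for r in rows]
print(permutation_importance(model, rows, targets, 0))
print(permutation_importance(model, rows, targets, 1))
```

An attribute the model does not use gives an increase of zero, while an attribute the response really depends on gives a clearly positive one.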

The rough-and-ready R code for processing the data with the load test results:

x=NULL
v_data_file=paste('/tmp/data1.dat',sep="")
x=read.table(v_data_file, header = TRUE, sep = ";", dec=",", quote = "\"'", stringsAsFactors=FALSE)
colnames(x)=c('metric','rgsize','rgcount','lamp','cmtl','cmtw','lgbffr','lct','dbwrp','undo_retention','tprs','disk_async_io','filesystemio_options','db_block_checking','db_block_checksum')

idxTrain=sample(nrow(x),as.integer(nrow(x)*0.7))
idxNotTrain=which(! 1:nrow(x) %in% idxTrain )
TrainDS=x[idxTrain,]
ValidateDS=x[idxNotTrain,]

library(randomForest)
#mtry=as.integer( sqrt(dim(x)[2]-1) )
rf=randomForest(metric ~ ., data=TrainDS, ntree=40, mtry=3, replace=T, nodesize=2, importance=T, do.trace=10, localImp=F)
ValidateDS$predicted=predict(rf, newdata=ValidateDS[,colnames(ValidateDS)!="metric"], type="response")
sum((ValidateDS$metric-ValidateDS$predicted)^2)
rf$importance

You can select the hyperparameters of the algorithm directly by hand and, focusing on the quality of the model, choose the model that gives more accurate predictions on the validation dataset.
You can write some kind of function for this work (by the way, again, using some kind of optimization algorithm).

You can use the R package caret; that is not the important point.

As a result, in this case, the following result is obtained for assessing the degree of importance of the attributes:

[Figure: attribute importance of the tuning vector components as estimated by randomForest]

Well. So we can begin the global reflection:

  1. It turns out that the most significant one, under these testing conditions, was the commit_wait parameter.
    Technically, it specifies the execution mode of the io operation that writes redo data from the database log buffer to the current log group: synchronous or asynchronous.
    The value nowait, which results in an almost vertical, multiple increase in the value of the tps metric, means that asynchronous io mode is enabled for the redo groups.
    A separate question is whether or not you should do this in a production database. Here I limit myself to just stating: this is a significant factor.
  2. It is logical that the size of the database log buffer turns out to be a significant factor.
    The smaller the log buffer, the smaller its buffering capacity, the more often it overflows and/or the more often free space cannot be allocated in it for a portion of new redo data.
    This means: delays associated with allocating space in the log buffer and/or flushing redo data from it to the redo groups.
    These delays, of course, should and do affect the throughput of the database for transactions.
  3. The db_block_checksum parameter: here, too, it is generally clear. Transaction processing leads to the formation of dirty blocks in the database buffer cache.
    When checking of data block checksums is enabled, the database has to process them: calculate these checksums from the body of the data block and check them against what is written in the data block header: matches or does not match.
    Such work, again, cannot but delay data processing, and accordingly, the parameter and the mechanism that this parameter controls turn out to be significant.
    That is why the vendor offers, in the documentation for this parameter, different values for it and notes that yes, there will be an impact, but you can choose different values, up to "off", with different impacts.

Well, the global conclusion.

The approach, in general, turns out to be quite workable.

At the early stages of load testing a certain service system, it makes it possible to select the system's optimal configuration for the load without delving too deeply into the specifics of setting up the system for that load.

But it does not exclude that entirely, at least at the level of understanding: the system must be known in terms of its "adjustment knobs" and the permissible ranges of rotation of these knobs.

The approach can then relatively quickly find the optimal system configuration.
And based on the test results, it is possible to obtain information about the nature of the relationship between the system's performance metrics and the values of the system's settings parameters.

Which, of course, should contribute to the emergence of this very deep understanding of the system and its operation, at least under the given load.

In practice, this is an exchange of the costs of understanding the system being tuned for the costs of preparing such testing of the system.

I would like to note separately: in this approach, it is critically important how closely the system testing matches the operating conditions the system will have in commercial operation.

Thank you for your attention and time.

Source: www.habr.com
