Y-branches When you come to a fork in the road, take it


2023年12月23日发(作者数学题目解答扫一扫)

Y-Branches:WhenYouCometoaForkintheRoad,TakeItNicholasWangMichaelFertigSanjayPatelCenterforReliableandHigh-PerformanceComputingDepartmentofElectricalandComputerEngineeringUniversityofIllinoisatUrbana-Champaignnwang,fertig,sjp@tractInthispaper,westudytheefhtheuseofstatisticalsampling,wefindthatabout40%ofalldy-namicbranchesandabout50%ofmispredictedbranchesherexaminethisunexpectedphenomenon,weprovideaesofsuchcodiideastaty,wesuggestsometechniquesforexploitingthisbehavior,uctionControlspeculationentely,though,predictabilitybecomesalimitingfactorinimprov-ingprocessorperformance:atsomelevel,improvingtheperformanceofasinglethreadofexecutioncanbetiedtoimprovingtheprocessor’paper,aultinjection,wefindthatabout40%,thebehavioroftheapplicatcbranchesthataremispredictedex-hibitalargerdegreeofoutcome-tolerance—abouthalfofallmispredictedbranchescanproceeddownanincorrectarchitecturalpathandresultincorrectexecution.

ologyInordertocharacterizeY-branches,findallY-branchesforourinputsetswouldrequitiscomputa-tionallyexpensivetoperformthiscompleteanalysis,ctiondescribesourexperi-mentalsetupaswellastheapplicationsandinfrastructureusedtoperformtheidentifimingInjectionForeachexperiment,rminewhetherabranchisoutcome-tolerant,wefi,whenthechosenbranchisencoun-teredatruntime,,iftheconditionalbranchwasresolvedtobetaken,thentheexyoneinjectionismadeperbenchmarkexecution,thesebenchmarkexecutionsconstituteasingletrial,nditionalbranchesaretargetedinourstudy—allothercontrolinstructionsareex-ecutedasdefiachinjectionisperformed,rminetheeffectsoftheinjection,wecomparetheinjectedsimu-lationwithrchitecturalstate(PC,registers,andmemory)isfrequentlycomparedagacoveryofthispointguaranteesidenticalex-ecutionthereafteraswellasconfiplicationsdonotexecuteinisolation,soitisimportantthatallexystemcallscontainafinitesetofinputs,wetreatthemasbarriersforverifi,theinjectedpro-grammustreconvergetotheoriginalprogram’-strictionreducesreconvergencelikelihood,however,italsoreducessimulationtimeandcomplexity.

domselectionisdonebeforethetrialbeginsbygeneratingapseudo-randomnumberbetweenoneandthenumberofe-tolerantmispredictedbranchesaredeter-minedbyrestrictingthesetofdynamicbranchestothosethataremispredictedbyan18-bitgshareconditionalbranchpredictor[9].Again,edacrossallbenchmarks,approxi-mately30%imately40%ofalldynamicbranchesareoutcome-tolerant,indicatingthatdynamicallyfry,50%ofdynamicallymispredictedbranchesexhibitoutcome-tolerance,indicatingthathalfofallmisnificantriseinY-branchesinthecaseofmispredicchmarkbzip2containsanunrolledlooptr,theloopend-ingbranchesoftranchescansafelyperformearlrly,theincreaseinoutcome-toleranceinthebenchmarkgzipisduetothreebranchesthatalsoaccountforasmallportionofthepro-gramexecution,butrepresentap(instructions)239028962116

ally,iftheinjectedbranchisaY-branch,thetwostreamsconvergeandexecutethesameinstructionsfortheremainderoftheprogram—nininstructionsbetweenthepointofinjectionandthecontrolreconvergencepointinthein-jectedexecutioniswhatwecallthecontrolreconvergencelatency,whichislabeledinthefimoreinstructionslater,iftheinjectedbranchisaY-branch,bothexecutionsarriveatthesamearchitecturalstate,lstatereconver-gencelatencyislabeledinthefisactuallyanotherinterestingpoint:livestatereconvergence,thatissomeofthefullstateoftherunningprocessisactuallydaterecon-vergencewillation,livesta,thesetofbranchesthatareoutcome-tolerantwithrespecttolivestateisasupersetofthunately,livestatereconvergenceisdifficultandex-pensivetomeasureprecisely,nce ExecutionInjected ExecutionInjection PointrstControl ConvergenceFull State idedataonthespan(ininstructions)betweentheinjectionofaY-branchandthesubsequentcontrolre-convergence(withrespecttotheinjectedsimulation),ethedistributionsofthedataareverywide,wehaveonlyplottedthemiddle80%,mple,80%ofY-branchesinbzianY-branchinbzip2reconvergesoncontrolin41instructions,raphswithmedians,isinmind,theaveragebenfersslightlyfromtheotherbenchmarksinthatmansonforthisisthaingthisbranchusuallyresultsinanearlylooetheshortcontrolreconver-gencelatency,eon’evethisoccursbecausetheearlyloopexitsskipanumberofdeadstores,wherethedeadvaluestheywouldhavewrittenhavearelativelylonglifetime.1012ynpc4plotsthefullstatereconvergencespansininstructionsforthemiddle80%thatfullstatereconvergencespansaremuchwiderrangesthanforcontrolreconvergence,rangingfrom1instruallbenchmarks,themedianY-branchreachesfullstatereconvergenceafterapproximately1300instructions.1ynpc5presentsthediff,eofthenatureofthebranchbeinginjected,sometimestheinjectedexecutionrunsforfewerinstructionsthanthereferenceexecution,

sometimeslonger(weexplainthisinSection3.3).Therangesareprovidedinthisfigure(positivenumbersindi-catethattheinjectedexecutionranforfewerinstructions).Inthemediancasefortheaveragebenchmark,theinjectedexecatinterestingisthefactthatinroughlyhalfthecases,theinjectedexecutionrequiredfentsatahigherthanexpecteddegreeofredundancyindynamiccodestreams,indicatingthatad-ditionaloptimizationopportunitiesexist.22080400-40-80-120-160-200-6062yrkfsofOutcomeToleranceOutcome-tolerantbranchesareedundanciesoccurinvariousforms,suchassuper-fluousloopiterations,performance-relatedbranches,short-circuits,theseY-branchesarearesultofpartiallydeadcontrol,meaningthatthebranchesarsection,weperformacarefuldissectionofasamplingofoutcome-tolerantbranchesinordertofidesireisanabilitytoexaminethecontrolflowgraphandpossiblythesourcecodeofadetectedY-branchinordertochisautomatically,weexaminethedifferences,re-ferringtoFigure2,weexaminethedifferencesinPCsex-ecutedbetweentheshadedregionsofeachexecution,icsituationsarisewhenweexaminethediffer-encebetweentheinjectedandreferenceexecutions,tmostsubfi,theinjectedexecutionandthertuationisprimarilyduetotheY-branchbeinganif-elseconstruct,wherethedecisionmadebytheparticulardynamicinstanceofthebranchisirrelevant(weprovideaspecificexampleofthisattheendofthissection).Sometimesthereferenceexecutionexecutesmoreinstruc-tions;ion AExecution BExecution AExecution BInjection PointInjection PointControl ConvergenceControl ConvergenceDisjoint CaseSkip erbasicsituationisrepresentedintherightsubfi,oneoftheexecutionsskipsoverinstruc-t,ExecutionAmightexecuteblocksX,Y,ZbeforehittingthecontrolreconvergencepointwhereasExecutionBflmple,iftheinjec-tiontargetsaY-branchthatisaloop-endingbranchcausingthelooptoeitherterminateearlyorexecuteadditionaliter-ations, (A || B) { X;}Y;-typeY-branchescanbefurtherclassifi,noneofthebranchesintheshadedregionareback-wardsbranches),thecodeconstructresponsiblemple,Figure7demonstratesthecontrolflnchattheendofblockAisanearlyjumptoblockX

80%70%[Other] Unclassified60%[Disjoint]50%[Skip] Complex40%[Skip] Loop30%20%10%0%vortexaverageparserperlbmkcraftyeongapgzipbzip2twolfgccmvprcf[Skip] anchispotentiallyaY-branchoftheSkipvY-branchesofthiscategorySkip/atsomeneskip-typeY-branchinvolvesabackwardsbranch,thenweclassifythebranchaseitheraSkip/LooporSkip//LoopbranchesarecaseswhereweareconfidentthattheY-branchisaloop-terminalconditionalbranch,whereastheSkip/Complexsituationisindeterminate:abackwardsbranchisencountered,unately,itisnotclearfallcasesaresocleanasthecategorizationsofSkip/Simple,Skip/Loop,Skip/Complex,onally,theregionbetweentheieswherethisregionisover1Minstructions,rmore,casesarisewheretheY-branchinjectionhaslingeringeffectsonpro-gramstate,causingcontroltorandomlywaverbeforefi-isoccurs,weseverthetracesatthefiasesarecalledSkip/Loop+Slop,Skip/Simple+Slop,aincasesitisdifficulttofindapointatwhichtotruncatesowegiveup(Unclassified).Figure8providesadetailedclassifiisfigure,itispossibletoidentage,thecontributionsto-proximately10%ofmispredictedbranchesareoutcome-tolerantandduetoshort-circuitlikecodeconstructs(Skip/SimpleandSkip/Simple+Slop).Approximately20%ofmispredictionsareY-branchesthatinvolveloops(Skip/Loop,Skip/Complex,...).Approximately15%oftheminvolveif-else-typeconstructs(Skip/Disjoint,Skip/Disjoint+Slop).ficexamplesTofurtherillustratethenatureoftheoutcome-tolerancephenomenon,weexaminespecificexamples,takenfromSPEC,tatementiswithinado-whileloop,fthemevaluatetotrue,thereinary,thecompilerimplementsthisifstatementasfourconditionalbranches,eachofwhichtestsoneoftheconditionsand,iftrue,rvethatthefirstbranchaccountsforabout15%ofallmispredictionsingzipandabout90%ral,thefullstateandcontrolreconvergencelatenciestendtobeshortforshort-circuitlikecodeconstructs,2isafilecompres-sionprogramthatemploysmove-to-frontcodingafterap-plyingatransformationonthedatatobecompressed[4].Whenanelementisusedduringencodingordecoding,itismovedtothefrontofalist,ftingofelementsisim-plementedinatightloop,rcecodeisshowninFigure10.

do { ... if (match[best_len] != scan_end || match[best_len−1] != scan_end1 || *match != *scan || *++match != scan[1]) continue; ...} while ( ... );pilerhasfurtherunrolledeachofthesetwoloopsfourtimes,resultingina16xunrolledloop,two4xunrolledloops,ofaclean-uploopallorationsnotperformedbytheunrolledloopswillbecompletedbysubsequentclean-upcode,stingly,inputset,thenumberofelementsthatneedtobeshifted(andthusthenumberoforiginalloopiterationsrequired)ingthisloopexacerbatesthisprob-lembyaddingmorestaticbranches,nchesassociated == yy[nextSym−1];j = nextSym − 1;for (; j > 3; j −= 4) { yy[j] = yy[j−1]; yy[j−1] = yy[j−2]; yy[j−2] = yy[j−3]; yy[j−3] = yy[j−4];}for (; j > 0; j−−) yy[j] = yy[j−1];yy[0] = tmp;opisfromafunctionthatdeallocatesahashbatforeachloopit-eration,ifthecurrentlyexaminedhashtableentryisNULL,thennooemainderofthehashtableentriesareempty,thentheoutsideloop(theloopiteratingthroughthehashtable)cansafelyexitearly,anchaccountsforapproximately6%oftheexecutedconditionalbranchesinparser,andaboutone-firal,earlylo,thedifferenceininstructionsexecutedisusuallyfor (i=0; inext; xfree(t, sizeof(Table_connector)); }}tetohigh,ctionfirstdeterminentherelativepositioningoftheseblocks,itexecutesoneoftwoloops:onecopiesmemorystart-ingfromthelowestaddresstothehighest,esegmentsofmemorytobecopieddonotoverlap(whichiscommonlythecaseforourinputset),tion,thebranchisoftenmispredicted—likelyduetolackofastrongpatteranchaloneaccountsforapproximately16%enote,byidentifyingthisY-branch,wewereabletomodifyhemodificationsweremade,ral,thistypeoftransformationcanbeusedtoexploitpoorlypre-dictedY-brancheswhensuffi (from − to >= 0) { while (len−−) *to++ = *from++;} else { to += len; from += len; while (len−−) *(−−to) = *(−−from);}seenthatif-elsetypeconstructsshowawidedisnvaryfromshortstatementstorelativelylongloops.

tingOutcomeToleranceDespitethesurprisingfrequencyofoutcome-tolerantdynamicbranches,capitaliziduetothebroadvarietyofcodingconstructsthatgiverisetothephenomenon,andpartlyduetothetrade-offsassoci-atedwithexploitingthem,designandcodeoptimsection,wediscusspossibilitiesonlever-agingoutcome-tolerancetoimproveprogramperformanceandexecutioneffieoftheheavyrelianceoncontrolspeculation(broadlyencompassingbranchprediction,speculativecodemotion,andthread-levelspeculation)forachievinghighperformanceoncontrol-intensiveapplications,thecostofamisspeculationcanbequitehigh,ead-parallelprocessors[10,12]andothermachinesthatmoreaggressivelyevaluateaprogram’scontrolflowgraph,therearelikelytobeadditionalcostsduetore-execution,re-rename,tingoutcome-toleranceinsomefashionmightbeaninterestingnewangletoreducetheoverallcostofmisspeculation,ngthephenomenontothethreebasiccodecon-structsdiscussedintheprevioussection(loopexits,if-elsebranches,andshortcircuits),weidentifywaysthatspecifieloopterminalbranchesfromtheunrollediterationsweresafelyguardedbytheclean-uploop(forearlyloopexits),particularsituation,ifthecostofmispredictingtheloop-endingbranchisgreaterthanthesub-parexecutionresultingfromexecutingthenon-unrolledloop,hervein,someloop-endingY-branchesareweaklyspecified,meaningthataspecificinstanceoftheloopcaniteratefosetypesofbranches,aninjectionthatcausesanearlyloopexitmayresultinsignificantlyfewerinstruc-tionsexecutedthaninthereferenceexecution,indicatingttingoutcome-tolerantloopexitsinameaning-fulmannermightbeaccomplishedbytheprogrammer,provse,compileroptimiza-tionsmightbeabletomoretightlyboundloopiterationsandperhapsreformulatetheiterationboundsforaloopbasedondynamicinstance,asaccomplishedbydynamiccodespecializationtechniques[5].Threadingapproachesthatuseaseparatethreadorspecializedhardwaretoval-idatethearchitecturalstateofanapproximateexecutionthread[13,14]tingIf-ElseNearlyathirdofoutcome-tolerantbranchesarebasedonif-elseconstructs(theDisjointandDisjoint+SlopcategoriesinFigure8).AparticularY-branchoftheif-elsevarietytypicallyperformsanunnecessarycontrolbetweentwoacceptableoptions,suchasthechecaseforperlbmk,knowingthebranchishighlymispredictedandoutcome-tolerantegrammerorcompiler,onceawareofanoutcome-tolerantif-elsebranch,tingShortCircuitsOneverypromisingareaofexploitiranchesarisefromtheuseofshort-circuitingopera-tors(lANDandOR)inCandC++,hesemanticsoftheseoperators,acompilerwillinsertearlyexitbranchesintheevaluationofsuchexpressionswhenaterminalogicalconditionimplicatestheentireexpression,suchasthecaseof(A||B)pilercanelimi-natethisbranchincaseswherenosideeffectsarepossibleinevaluatingB,butonlywhenanyvariablereationtechniquesandconditionalmoveinstruc-tionscanbeusedtohandleshortformsofthiscontrolflowbutattcome-toleranceindicatesisthatforsuchY-branches,icular,ompilercannotsafelyper-formBwithoutsideeffects,nchattheendofA

nthedatapresentedinFigure8,about10%ofmispredictedbranches(nearly20%inbenchmarkscrafty,gcc,andgzip)areSkip/SimplevarietyY-branches,estanarchitecturalmechanisminwhichmis-predictedwedbranch,amispre-dictiondoesnotcauseamisspeculationrecoveryproventialcon-trolfleconstructcorrespondingtothisstruc-turemightbe,forexample,(A||B||C).Supposebranrueandthecontroleventu-allyreachesX,tneedtoactonthemispredictionofbhshort-circuitbranches,ewedbranches,amisspe,mispredictingA(orB)tobrwords,forconditionalstructureswhere(A||B||C)islikelytobetrue,butanyoneofAorBarehardtopredict,eoftheshort-circuitingrestrictions,ifAistrue,ofcontrolspeculation-likemaskinginEPIC[1]andde-layederrorreportterestingtonotethatthedifferencebetweenus-ingskewedbranchesinaregionsuchastheoneinFig-ure13versuspredicationisthatwithskewedbranches,notallblocksarefetchedandevaluated,whichsavesationontheotherhwenotethatcompilertransformations(partic-ularlyinthecompilerweused)attempttoremoveshort-circuitingbranchesbyestablishingthatevaluationsofsub-sequentconditionswillnotgenerateside-effects(inef-fectmirroringthepredicationapproach).Withtheuseofskewedbranches,thecostofmispredictingabranchisre-duced,manceApproximationInthissectionweprovideacursoryperformanceanal-retwomainmethodsbywhichY-branchescanincreaseprogramefficiency:byremovingthetcome-tolerantbranchismispre-dicted,flushingthepipelineandrestartingfetchatthecor-rectaddressisnotnecessary,,ifthenewsetofinstructionsexecutedcanbecompletedfasterthantheoriginal“cor-rect”set,section,weuseasetofheuristicstoidentifyandtargetY-branchmispredictionsduringaprogram’portanttokeepinmindthatthisanalysispro-videsn,itservesasaperfor-manceapproximatiy,itoffersaframingofpely,techniquestoexploitY-branchesmayhavesignificant(possiblynegative)impactonothermicroarchitecturalcomponents,udyoffersremanymispredictedcondi-tionalbranchesintheexecutionofaprogram,anddeter-miningwhedertogetareason-ableapproximationonperformance,weemployasimpleheuristicfiltertofirforamispredictedbranchtobeidentifiedasoutcome-tolerantinthisexperiment,itmust(A)achievefullstateconver-gencewithin1000instructionsand(B)reducethenumberofinstructionsexecutedoverthecorrectarchitecturalpath.

Furthermore,pertainingtocondition(B),weallowtheal-ternativepathanadditionalbudgetof40ins’veobservedthat40instructionportanttonotethateventhoughwedonotex-plicitlyselectwhichY-branchcategoriestotakeadvan-tageof,ouridentifimple,Skip/SimpleY-branchesusuallyexhibitshorterreconvergencelaten-cies,therhand,othertypesofY-branchessuchasSkip/Loopmayhavetheirreconvergencepointafterourlimit,whichsubjectsthemtofidecisionhasbeenmadetofollowanon-architectedpath,itispossibletoencounteoccurs,thenewlymispredictedbranctruebecausethenewbranch,althoughnotintheoriginalinstructionstream,canbetreatedassuchsinceithasalreadybeenidentifiocessoffindingandselectivpoint,anewandvalidpathhasbeendefinedthroughtheprogram,andthispathcanbefingmodelusesthebranchpredictionslaidoutbythefunctionalsimulatoranddoesnotrecoverfrommis-predictedbrancheswhenthepredicteddirectionisalongthepredefined(andguaranteedtobeequivalent)elsimulatesamodern8-widedynamicallyscheduled15-stagepipelinemicroprocessorwithsignificantbranchmispredictionpenalties(duetothedeeppipeline).14showst-age,26%ofmispredictionsinthene-fortunately,sometimesmispredictionsareintmple,ingccthenuteridentifying34%oftheseasY-branches(andthusre-movingtheassociatedmispredictionpenalty),age,6%speees38%speedup,whichisaculminationofafewmainfactors:fewermispredic-tionsalongthenewpath,theremovalof40%ofthesefewermispredictions,andareductionininstructioncountby5%.Ontheotherhand,gccsuffersa-6%speedup,esultsindicatethatourheuristicforchoosingwhethertorecoverfromabranchmispredictionortosimplyfollowthepredictedpathrequiresretooling—r,gzipandmcfshowpromisethatsignificantspeedupscanbeobtainedfromtakingadvantageoftheY-branchphe-nomenon,despitealloftherestrictionswehaveimposed.45%40%35%30%25%20%15%10%5%0%2ynpcptageofmispredictionsre-movedalongthenewexecutionpath.1.41.351.31.251.21.151.11.0510.950.9dWoratedworkfallsintothefollowingcategories:(i)analysisoffaultinjectiononthecontrolpathoftheprogram,(ii)studiesonredundantcode,and(iii)theuseofcontrolflndSiewiorek[6]performedfaultinjectionsatttheiranalysis,theydescribedsce-narioswherecontrolflowcoulddivergetoincorrect,r,theydidnotexploretheeffectsofthese,possiblyoutcome-tolerant,branches.

Guetal.[8]analyzedtheeffectofintroducingfaaultinjectioncampaignswererunthattargetedspecifihesecampaigns,reversingthedirectionofconditionalbranches,specifihorsfoundthat33%pleofsource-levelredundancyisgiventhatoffersanexplanationforthisphenomenon,how-ever,erg[11]exploredtheeffectsofremovinginef-fectualcodefromaprogram’ctualcodeisdefierfocusesontransformationstotheoriginal,architectedpaththatdonotalterthepro-gram’soriginaltraversalofthecontrolflkdiffersinthatwefocusonmodifyingtheoriginalpathtoobtainanalternate,asbeenextensiveresearchondetectingfaultsusingcontrolflecifically,EifertandShen[7]proposedamethodforidentifyingfaultsbyen-codingtheinstructiosionThispaperexploredtheconceptofoutcome-tolerantbranches:conditionalbrancheswhich,irrespectiveofout-come,erunderstandthisphenomenon,aninvestigatiodthatconstructssuchasshort-circuitlikoximationofthebenefirmore,possi-bleextensionstopredicationwereproposedtoharnesstheoutcome-tolerantbranchesthatfallintotheshort-circuitlikelogicalexpressioncategory.7AcknowledgmentsWethanktheothermembersoftheAdvancedComput-ingSystemsgroupaswellasSteveLumetta,ScottMahlke,andtheanoterialisbaseduponworksupportedbytheNationalScienceFoundationunderGrantNo.0092740andtheC2S2MarcoCenter,withsup-portfromAMD,Intel,nces[1],s,,,r,B.-,,an,ateedingsofthe25thAnnualInternationalSymposiumonComputerAr-chitecture,1998.[2]uCometoaForkintheRoad,TakeIt!InspirationandWisdomfromOneofBase-ball’on,2001.[3],,tingfuturemi-croprocessors:calReport1308,UniversityofWisconsin-MadisonTechnicalReport,July1996.[4]calReport124,DigitalEquipmentCorporation,1994.[5]eedingsofthe23rdACMSIGPLAN/SIGACTSymposiumonthePrinciplesofProgrammingLanguages,1996.[6]20thInternationalSymposiumonFaultTolerantComputing,DigestofPapers,pages236–243,June1990.[7]14thInterna-tionalSymposiumonFaultTolerantComputing,DigestofPapers,pages394–399,1984.[8],czyk,,rna-tionalConferenceonDependableSystemsandNetworks,2003.[9]calReportTN-36,DigitalWesternResearchLaboratory,June1993.[10]un,,d,,eedingsofthe7thInternationalConferenceonArchi-tecturalSupportforProgrammingLanguagesandOperat-ingSystems,1996.[11]calreport,NorthCarolinaStateUniversity,November1999.[12],,eedingsofthe22ndAnnualInterna-tionalSymposiumonComputerArchitecture,1992.[13]amoorthy,,-streamprocessors:eedingsofthe9thInternationalConfer-enceonArchitecturalSupportforProgrammingLanguagesandOperatingSystems,2000.[14]/eedingsofthe35thAnnualInternationalSymposiumonMicroarchitecture,2002.


本文发布于:2024-09-20 22:38:29,感谢您对本站的认可!

本文链接:https://www.17tex.com/fanyi/26568.html

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。

标签:题目   解答   数学   作者
留言与评论(共有 0 条评论)
   
验证码:
Copyright ©2019-2024 Comsenz Inc.Powered by © 易纺专利技术学习网 豫ICP备2022007602号 豫公网安备41160202000603 站长QQ:729038198 关于我们 投诉建议