
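Facebook, which runs the world's largest PHP application, has contributed excellent open-source tools such as HHVM and xhprof to the PHP community. xhprof has become the tool many PHP developers reach for when tracking down performance bottlenecks. This article starts from the xhprof source code and looks at how xhprof actually performs its profiling.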

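Key data structures

xhprof relies mainly on two data structures: hp_entry_t, which represents a single profiling point, and the global state held in hp_globals.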

xhprof's two profiling modes

1. XHPROF_MODE_HIERARCHICAL: this mode profiles the execution of the whole PHP program in detail. Its output looks like this:

    array(7) {
      ["main()==>load::./inc.php"]=> array(5) {……}
      ["main()==>run_init::Test/inc.php"]=> array(5) {……}
      ["bar==>echoHello"]=> array(5) {……}
      ["foo==>bar"]=> array(5) {……}
      ["main()==>foo"]=> array(5) {……}
      ["main()==>xhprof_disable"]=> array(5) {……}
      ["main()"]=> array(5) {
        ["ct"]=> int(1)
        ["wt"]=> int(390372)
        ["cpu"]=> int(392000)
        ["mu"]=> int(15040)
        ["pmu"]=> int(10024)
      }
    }

2. XHPROF_MODE_SAMPLED: this mode takes a sample every 0.1 seconds and records the call stack that is executing at that moment. Its output looks like this:

    array(5) {
      ["1460294938.300000"]=> string(30) "main()==>foo==>bar==>echoHello"
      ["1460294938.400000"]=> string(30) "main()==>foo==>bar==>echoHello"
      ["1460294938.500000"]=> string(30) "main()==>foo==>bar==>echoHello"
      ["1460294938.600000"]=> string(30) "main()==>foo==>bar==>echoHello"
      ["1460294938.700000"]=> string(30) "main()==>foo==>bar==>echoHello"
    }

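Feeding this mode's output through xhprof-flamegraphs and FlameGraph produces a flame graph (the graph from my own test code was too crude, so the figure in the original post was borrowed from xhprof-flamegraphs).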

Analysis of XHPROF_MODE_HIERARCHICAL mode

I. The second parameter of xhprof_enable([ int $flags = 0 [, array $options ]] ), $options, is used to exclude functions that should not be profiled. The filtering is implemented as follows:

1. xhprof_enable() first calls hp_get_ignored_functions_from_arg(optional_array), which stores the names of the functions to ignore in char **hp_globals.ignored_function_names.

2. It then calls hp_ignored_functions_filter_init() to initialize uint8 hp_globals.ignored_function_filter[XHPROF_IGNORED_FUNCTION_FILTER_SIZE]. The code:

    static void hp_ignored_functions_filter_init() {
      if (hp_globals.ignored_function_names != NULL) {
        int i = 0;
        for (; hp_globals.ignored_function_names[i] != NULL; i++) {
          char *str  = hp_globals.ignored_function_names[i];
          uint8 hash = hp_inline_hash(str);   /* hash the function name */
          int   idx  = INDEX_2_BYTE(hash);    /* hash >> 3 */
          hp_globals.ignored_function_filter[idx] |= INDEX_2_BIT(hash);   /* 1 << (hash & 0x7) */
        }
      }
    }

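XHPROF_IGNORED_FUNCTION_FILTER_SIZE is 32, so INDEX_2_BYTE(hash) shifts the hash right by 3 bits, filling the high bits with 0. Because hash is a uint8 (0 to 255), the resulting idx is always in the range 0 to 31 and never runs past the end of the 32-element array.

hp_globals.ignored_function_filter is an array of uint8, so INDEX_2_BIT(hash) maps the hash to one of the 8 bits inside the selected element.

In other words, a single element of hp_globals.ignored_function_filter may hold the mappings of several hash values.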
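The byte/bit mapping can be tried in isolation. The following is a minimal sketch rather than xhprof's own code: the two macros and the size of 32 mirror what is quoted in this article, while `toy_hash`, `filter_add` and `filter_maybe_contains` are hypothetical names introduced for illustration, and `toy_hash` is only a stand-in for hp_inline_hash.

```c
#include <stdint.h>

#define FILTER_SIZE 32                       /* 32 bytes * 8 bits = 256 slots */
#define INDEX_2_BYTE(h) ((h) >> 3)           /* which byte: 0..31 for an 8-bit hash */
#define INDEX_2_BIT(h)  (1 << ((h) & 0x7))   /* which bit inside that byte */

static uint8_t filter[FILTER_SIZE];

/* Stand-in 8-bit hash (an assumption, not hp_inline_hash). */
static uint8_t toy_hash(const char *s) {
  uint8_t h = 0;
  while (*s) {
    h = (uint8_t)(h * 31 + (uint8_t)*s++);
  }
  return h;
}

/* Record a function name in the filter by setting its bit. */
static void filter_add(const char *name) {
  uint8_t h = toy_hash(name);
  filter[INDEX_2_BYTE(h)] |= (uint8_t)INDEX_2_BIT(h);
}

/* Zero means "definitely not in the filter". Non-zero only means "possibly":
 * different names can share a bit, so the caller must still compare names. */
static int filter_maybe_contains(const char *name) {
  uint8_t h = toy_hash(name);
  return filter[INDEX_2_BYTE(h)] & INDEX_2_BIT(h);
}
```

As a worked example, a hash of 0x4B lands in byte 9 (0x4B >> 3) at bit 3 (0x4B & 0x7), and the largest hash, 0xFF, lands in byte 31.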

3. The decision whether to filter a function is made through hp_ignore_entry() -> hp_ignore_entry_work(). The code:

    int hp_ignored_functions_filter_collision(uint8 hash) {
      uint8 mask = INDEX_2_BIT(hash);
      return hp_globals.ignored_function_filter[INDEX_2_BYTE(hash)] & mask;
    }

    /* First check whether the hash of curr_func is present in the filter
     * hp_globals.ignored_function_filter. If it is, hash collisions are still
     * possible, so curr_func must also be looked up in
     * hp_globals.ignored_function_names. The filter exists to reduce the
     * number of direct comparisons by function name. */
    int hp_ignore_entry_work(uint8 hash_code, char *curr_func) {
      int ignore = 0;
      if (hp_ignored_functions_filter_collision(hash_code)) {
        int i = 0;
        for (; hp_globals.ignored_function_names[i] != NULL; i++) {
          char *name = hp_globals.ignored_function_names[i];
          if (!strcmp(curr_func, name)) {
            ignore++;
            break;
          }
        }
      }
      return ignore;
    }

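II. How the instrumentation that collects performance data is implemented:

In hp_begin(long level, long xhprof_flags TSRMLS_DC), xhprof replaces the Zend engine's function for executing execute_data, along with several compile functions, which in effect adds a proxy layer. Part of the code: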

    /* compiling PHP files */
    _zend_compile_file = zend_compile_file;
    zend_compile_file  = hp_compile_file;

    /* PHP's eval() */
    _zend_compile_string = zend_compile_string;
    zend_compile_string  = hp_compile_string;

    /* the function that executes execute_data */
    _zend_execute_ex = zend_execute_ex;
    zend_execute_ex  = hp_execute_ex;

    /* execution of internal (C) functions */
    _zend_execute_internal = zend_execute_internal;
    zend_execute_internal  = hp_execute_internal;

Every proxy layer calls BEGIN_PROFILING and END_PROFILING. Taking hp_execute_ex as an example:

    ZEND_DLEXPORT void hp_execute_ex(zend_execute_data *execute_data TSRMLS_DC) {
      ……
      /* mark the start before the function runs */
      BEGIN_PROFILING(&hp_globals.entries, func, hp_profile_flag);

    #if PHP_VERSION_ID < 50500
      _zend_execute(ops TSRMLS_CC);
    #else
      _zend_execute_ex(execute_data TSRMLS_CC);
    #endif

      if (hp_globals.entries) {
        /* record the statistics once the function has finished */
        END_PROFILING(&hp_globals.entries, hp_profile_flag);
      }
      efree(func);
    }
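The save-and-replace pattern behind these proxies can be reduced to a few lines of plain C. This sketch uses no Zend APIs at all: `execute_fn`, `real_execute`, `profiled_execute`, `install_hook`, `remove_hook` and the two counters are hypothetical names that merely stand in for zend_execute_ex, hp_execute_ex and the BEGIN_PROFILING/END_PROFILING pair.

```c
#include <stddef.h>

typedef int (*execute_fn)(int arg);

/* Stand-in for the engine's real executor. */
static int real_execute(int arg) {
  return arg * 2;
}

/* The "engine" always calls through this pointer. */
static execute_fn execute_hook = real_execute;

/* Saved original, playing the role of _zend_execute_ex. */
static execute_fn saved_execute = NULL;

static int begin_marks = 0;   /* bumped where BEGIN_PROFILING would run */
static int end_marks   = 0;   /* bumped where END_PROFILING would run */

/* Proxy: mark, delegate to the saved original, mark again. */
static int profiled_execute(int arg) {
  int result;
  begin_marks++;
  result = saved_execute(arg);
  end_marks++;
  return result;
}

/* Install the proxy, keeping the original so it can be restored. */
static void install_hook(void) {
  saved_execute = execute_hook;
  execute_hook  = profiled_execute;
}

/* Put the original executor back. */
static void remove_hook(void) {
  execute_hook = saved_execute;
}
```

After `install_hook()`, every call made through `execute_hook` returns the same result as before but also bumps both counters; `remove_hook()` restores the untouched behavior.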

III. How ct and wt in the xhprof_disable output are implemented

ct is the number of times the current code block was executed. In END_PROFILING->hp_globals.mode_cb.end_fn_cb->hp_mode_hier_endfn_cb->hp_mode_shared_endfn_cb:

    hp_inc_count(counts, "ct", 1 TSRMLS_CC);

Each time a code block finishes executing, its ct is incremented by 1.

wt is the total execution time (wall-clock time) of the current code block. In END_PROFILING->hp_globals.mode_cb.end_fn_cb->hp_mode_hier_endfn_cb->hp_mode_shared_endfn_cb:

    tsc_end = cycle_timer();
    hp_inc_count(counts, "wt",
                 get_us_from_tsc(tsc_end - top->tsc_start,
                                 hp_globals.cpu_frequencies[hp_globals.cur_cpu_id])
                 TSRMLS_CC);

top->tsc_start is obtained with cycle_timer() in BEGIN_PROFILING->hp_globals.mode_cb.begin_fn_cb->hp_mode_hier_beginfn_cb(). The code:

    /* Read the CPU cycle counter with the rdtsc assembly instruction */
    static inline uint64 cycle_timer() {
      uint32 __a, __d;
      uint64 val;
      asm volatile("rdtsc" : "=a" (__a), "=d" (__d));
      (val) = ((uint64)__a) | (((uint64)__d) << 32);
      return val;
    }

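hp_globals.cpu_frequencies stores the clock frequency of each CPU, and hp_globals.cur_cpu_id selects the entry for the CPU the process is currently bound to. The frequencies are measured as follows: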

    static double get_cpu_frequency() {
      struct timeval start;
      struct timeval end;

      if (gettimeofday(&start, 0)) {
        perror("gettimeofday");
        return 0.0;
      }
      uint64 tsc_start = cycle_timer();

      /* Sleep for 5 milliseconds. Compared with gettimeofday's few
       * microseconds of execution time, this should be enough. */
      usleep(5000);

      if (gettimeofday(&end, 0)) {
        perror("gettimeofday");
        return 0.0;
      }
      uint64 tsc_end = cycle_timer();

      /* cycles / elapsed time = clock frequency */
      return (tsc_end - tsc_start) * 1.0 / (get_us_interval(&start, &end));
    }

    static void get_all_cpu_frequencies() {
      int id;
      double frequency;

      hp_globals.cpu_frequencies = malloc(sizeof(double) * hp_globals.cpu_num);
      if (hp_globals.cpu_frequencies == NULL) {
        return;
      }

      /* Iterate over all cpus found on the machine. */
      for (id = 0; id < hp_globals.cpu_num; ++id) {
        /* Only get the previous cpu affinity mask for the first call. */
        /* To measure each core's clock frequency, the process must first
         * be bound to that core. */
        if (bind_to_cpu(id)) {
          clear_frequencies();
          return;
        }

        /* Make sure the current process gets scheduled to the target cpu.
         * This might not be necessary though. */
        usleep(0);

        frequency = get_cpu_frequency();
        if (frequency == 0.0) {
          clear_frequencies();
          return;
        }
        hp_globals.cpu_frequencies[id] = frequency;
      }
    }

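Once the clock frequency of every core has been measured, the process binds itself to a randomly chosen core and continues executing there.

Finally, get_us_from_tsc() divides the number of clock cycles the code block consumed by the clock frequency of the current CPU to obtain the block's execution time, wt. Because get_cpu_frequency() divides a cycle count by an interval in microseconds, the frequency is expressed in cycles per microsecond, so the quotient is already in microseconds. Measuring wt this way is more precise; for the details, read up on micro-benchmarking.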
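The arithmetic is simple enough to check by hand. Below is a minimal sketch of the two conversions, assuming the frequency is kept as cycles per microsecond as described above; `cycles_per_us_from` and `us_from_cycles` are hypothetical names, not functions from the xhprof source, and the guards against zero are additions of this sketch.

```c
#include <stdint.h>

/* Frequency estimate: cycles elapsed divided by microseconds elapsed.
 * Returns 0.0 when no time has elapsed. */
static double cycles_per_us_from(uint64_t cycles, uint64_t elapsed_us) {
  if (elapsed_us == 0) {
    return 0.0;
  }
  return (double)cycles / (double)elapsed_us;
}

/* Elapsed microseconds for a cycle count at the given frequency,
 * truncated toward zero. Returns 0 when the frequency is not positive. */
static uint64_t us_from_cycles(uint64_t cycles, double cycles_per_us) {
  if (cycles_per_us <= 0.0) {
    return 0;
  }
  return (uint64_t)((double)cycles / cycles_per_us);
}
```

For instance, if the 5,000 microsecond calibration sleep spans 12,000,000 cycles, the frequency is 2,400 cycles per microsecond, and a code block that consumed 2,400,000 cycles ran for 1,000 microseconds.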

IV. How cpu in the xhprof_disable output is implemented

In END_PROFILING->hp_globals.mode_cb.end_fn_cb->hp_mode_hier_endfn_cb:

    if (hp_globals.xhprof_flags & XHPROF_FLAGS_CPU) {
      /* Get CPU usage */
      /* system call: fetch the resource usage of the current process */
      getrusage(RUSAGE_SELF, &ru_end);

      /* Bump CPU stats in the counts hashtable */
      hp_inc_count(counts, "cpu",
                   (get_us_interval(&(top->ru_start_hprof.ru_utime), &(ru_end.ru_utime)) +
                    get_us_interval(&(top->ru_start_hprof.ru_stime), &(ru_end.ru_stime)))
                   TSRMLS_CC);
    }

top->ru_start_hprof is set with getrusage() in hp_mode_hier_beginfn_cb().

ru_utime is the user time and ru_stime is the system time; adding the two gives the CPU time.
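The same before/after measurement can be sketched outside PHP with nothing but POSIX getrusage(). The helper names `us_between`, `cpu_us_between`, `measure_cpu_us` and `busy_work` are hypothetical and chosen for this illustration; only getrusage(), RUSAGE_SELF and the ru_utime/ru_stime fields come from the system headers.

```c
#include <stdint.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Microseconds from start to end. */
static int64_t us_between(const struct timeval *start, const struct timeval *end) {
  return ((int64_t)end->tv_sec - (int64_t)start->tv_sec) * 1000000
       + ((int64_t)end->tv_usec - (int64_t)start->tv_usec);
}

/* CPU time (user + system) consumed between two getrusage() snapshots. */
static int64_t cpu_us_between(const struct rusage *start, const struct rusage *end) {
  return us_between(&start->ru_utime, &end->ru_utime)
       + us_between(&start->ru_stime, &end->ru_stime);
}

/* Example workload: burn a little CPU. */
static void busy_work(void) {
  volatile unsigned long x = 0;
  unsigned long i;
  for (i = 0; i < 1000000UL; i++) {
    x += i;
  }
}

/* Measure the CPU time spent inside work(); returns -1 if getrusage() fails. */
static int64_t measure_cpu_us(void (*work)(void)) {
  struct rusage before, after;
  if (getrusage(RUSAGE_SELF, &before) != 0) {
    return -1;
  }
  work();
  if (getrusage(RUSAGE_SELF, &after) != 0) {
    return -1;
  }
  return cpu_us_between(&before, &after);
}
```

Calling `measure_cpu_us(busy_work)` returns the user plus system microseconds charged to the process while the loop ran. The exact figure depends on the machine and on the kernel's accounting granularity, so it can be 0 for very short workloads.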

V. How mu and pmu in the xhprof_disable output are implemented

In END_PROFILING->hp_globals.mode_cb.end_fn_cb->hp_mode_hier_endfn_cb:

    if (hp_globals.xhprof_flags & XHPROF_FLAGS_MEMORY) {
      /* Get Memory usage */
      mu_end  = zend_memory_usage(0 TSRMLS_CC);
      pmu_end = zend_memory_peak_usage(0 TSRMLS_CC);

      /* Bump Memory stats in the counts hashtable */
      hp_inc_count(counts, "mu",  mu_end  - top->mu_start_hprof  TSRMLS_CC);
      hp_inc_count(counts, "pmu", pmu_end - top->pmu_start_hprof TSRMLS_CC);
    }

top->mu_start_hprof and top->pmu_start_hprof were already assigned in BEGIN_PROFILING->hp_globals.mode_cb.begin_fn_cb->hp_mode_hier_beginfn_cb, using zend_memory_usage and zend_memory_peak_usage. These two Zend functions are implemented as follows:

    ZEND_API size_t zend_memory_usage(int real_usage TSRMLS_DC) {
      if (real_usage) {
        return AG(mm_heap)->real_size;   /* system memory PHP actually occupies */
      } else {
        size_t usage = AG(mm_heap)->size;
    #if ZEND_MM_CACHE
        usage -= AG(mm_heap)->cached;
    #endif
        return usage;
      }
    }

    ZEND_API size_t zend_memory_peak_usage(int real_usage TSRMLS_DC) {
      if (real_usage) {
        return AG(mm_heap)->real_peak;
      } else {
        return AG(mm_heap)->peak;
      }
    }

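As the code shows, xhprof passes real_usage = 0, so the mu and pmu it reports count the memory currently in use and exclude memory that has been requested from the system but is not yet used.

VI. As the sections above show, every statistic is accumulated through hp_inc_count.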

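Analysis of XHPROF_MODE_SAMPLED mode

I. This mode does not support filtering out functions that should not be profiled.

II. Instrumentation works the same way as in XHPROF_MODE_HIERARCHICAL mode. The difference is that BEGIN_PROFILING calls hp_mode_sampled_beginfn_cb and END_PROFILING calls hp_mode_sampled_endfn_cb, and both of these functions do nothing but call hp_sample_check(). Its code: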

    void hp_sample_check(hp_entry_t **entries TSRMLS_DC) {
      /* Validate input */
      if (!entries || !(*entries)) {
        return;
      }

      /* See if its time to sample. While loop is to handle a single function
       * taking a long time and passing several sampling intervals. */
      /* Keep sampling as long as (current cycle count - cycle count of the
       * last sample) exceeds the sampling interval in cycles. */
      while ((cycle_timer() - hp_globals.last_sample_tsc)
             > hp_globals.sampling_interval_tsc) {

        /* bump last_sample_tsc: advance it by one sampling interval */
        hp_globals.last_sample_tsc += hp_globals.sampling_interval_tsc;

        /* bump last_sample_time - HAS TO BE UPDATED BEFORE calling hp_sample_stack */
        /* this moves the timestamp of the last sample forward */
        incr_us_interval(&hp_globals.last_sample_time, XHPROF_SAMPLING_INTERVAL);

        /* sample the stack */
        hp_sample_stack(entries TSRMLS_CC);
      }
      return;
    }

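hp_sample_stack() simply adds an entry to hp_globals.stats_count of the form: sample timestamp => function call stack, which matches the sampled-mode output shown earlier.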
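The catch-up behavior of the while loop is easier to see with the clock taken out of the picture. The sketch below is a simplified model, not xhprof's code: `sampler_t` and `sample_check` are hypothetical names, the current cycle count is passed in as a parameter instead of being read with rdtsc, and a counter stands in for the call to hp_sample_stack(). The early return when the clock appears to run backwards is an extra guard added in this sketch.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint64_t last_sample_tsc;   /* cycle count of the most recent sample slot */
  uint64_t interval_tsc;      /* sampling interval in cycles */
  uint64_t last_sample_us;    /* timestamp of that slot, in microseconds */
  uint64_t interval_us;       /* sampling interval in microseconds */
  unsigned samples_taken;     /* stand-in for the recorded samples */
} sampler_t;

/* Called at function entry and exit with the current cycle count.
 * Records one sample for every full interval that has elapsed since the
 * last slot and returns how many samples this call recorded. */
static unsigned sample_check(sampler_t *s, uint64_t now_tsc) {
  unsigned emitted = 0;
  if (s == NULL || s->interval_tsc == 0 || now_tsc < s->last_sample_tsc) {
    return 0;
  }
  while (now_tsc - s->last_sample_tsc > s->interval_tsc) {
    s->last_sample_tsc += s->interval_tsc;
    s->last_sample_us  += s->interval_us;   /* advance the timestamp first */
    s->samples_taken++;                     /* then record the sample */
    emitted++;
  }
  return emitted;
}
```

With an interval of 100 cycles and the last slot at 1000, a check at 1050 records nothing, while a check at 1350 records three samples (slots 1100, 1200 and 1300) in a single call. This is how one long-running function still yields one sample per elapsed interval.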

Some initialization is done in hp_begin->hp_init_profiler_state->hp_globals.mode_cb.init_cb->hp_mode_sampled_init_cb:

    void hp_mode_sampled_init_cb(TSRMLS_D) {
      struct timeval now;
      uint64 truncated_us;
      uint64 truncated_tsc;
      double cpu_freq = hp_globals.cpu_frequencies[hp_globals.cur_cpu_id];

      /* Init the last_sample in tsc */
      /* cycle count at which sampling starts */
      hp_globals.last_sample_tsc = cycle_timer();

      /* wall-clock time at which sampling starts */
      gettimeofday(&hp_globals.last_sample_time, 0);
      now = hp_globals.last_sample_time;

      /* XHPROF_SAMPLING_INTERVAL is 0.1 seconds. hp_trunc_time truncates
       * hp_globals.last_sample_time to a whole multiple of
       * XHPROF_SAMPLING_INTERVAL. */
      hp_trunc_time(&hp_globals.last_sample_time, XHPROF_SAMPLING_INTERVAL);

      /* the amount of time cut off by hp_trunc_time */
      truncated_us  = get_us_interval(&hp_globals.last_sample_time, &now);
      truncated_tsc = get_tsc_from_us(truncated_us, cpu_freq);

      if (hp_globals.last_sample_tsc > truncated_tsc) {
        /* just to be safe while subtracting unsigned ints */
        /* keeps last_sample_tsc in step with last_sample_time */
        hp_globals.last_sample_tsc -= truncated_tsc;
      }
      /* The case last_sample_tsc <= truncated_tsc is very unlikely, and
       * even if it does occur, only the first sample is missed. */

      hp_globals.sampling_interval_tsc =
        get_tsc_from_us(XHPROF_SAMPLING_INTERVAL, cpu_freq);
    }
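The truncation step can be illustrated on its own. This is a sketch of the idea only, assuming the interval is given in microseconds; `trunc_time` is a hypothetical helper written for this example and is not the body of hp_trunc_time.

```c
#include <stdint.h>
#include <sys/time.h>

/* Round tv down to a whole multiple of interval_us microseconds and
 * return how many microseconds were cut off. */
static uint64_t trunc_time(struct timeval *tv, uint64_t interval_us) {
  uint64_t total_us;
  uint64_t truncated;
  if (interval_us == 0) {
    return 0;
  }
  total_us  = (uint64_t)tv->tv_sec * 1000000u + (uint64_t)tv->tv_usec;
  truncated = total_us % interval_us;
  total_us -= truncated;
  tv->tv_sec  = (time_t)(total_us / 1000000u);
  tv->tv_usec = (suseconds_t)(total_us % 1000000u);
  return truncated;
}
```

With a 100,000 microsecond interval, a start time of 1460294938.345678 becomes 1460294938.300000 and 45,678 microseconds are cut off. Aligning the first slot this way is why the sample keys in the sampled-mode output fall on exact tenths of a second.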

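III. How the function call stack is built

Every hp_entry_t (that is, every profiling point) has a prev_hprof member pointing to the profiling point one level up. hp_get_function_stack(hp_entry_t *entry, int level, char *result_buf, size_t result_len) follows this pointer to join the function names on the call stack into one string. In XHPROF_MODE_SAMPLED mode the level argument is INT_MAX, so as much of the call stack as possible is joined and returned. In XHPROF_MODE_HIERARCHICAL mode the level argument is 2, so only the current function and its immediate caller are joined. The output of the two modes shows exactly this difference.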
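The effect of the level argument can be reproduced with a small linked list. The following is a minimal sketch under stated assumptions, not the xhprof implementation: `entry_t`, `append`, `get_stack` and `FULL_STACK` are hypothetical names, and `prev` plays the role of prev_hprof.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define FULL_STACK INT_MAX   /* the level the sampled mode passes */

typedef struct entry {
  const char   *name;   /* function name of this profiling point */
  struct entry *prev;   /* caller's profiling point, NULL at the root */
} entry_t;

/* Append text at offset used (used < len) and return the new length.
 * On truncation the buffer is full, so len - 1 is returned. */
static size_t append(char *buf, size_t len, size_t used, const char *text) {
  int n = snprintf(buf + used, len - used, "%s", text);
  if (n < 0) {
    buf[used] = '\0';
    return used;
  }
  if ((size_t)n >= len - used) {
    return len - 1;
  }
  return used + (size_t)n;
}

/* Write at most level frames ending at e into buf as "outer==>inner"
 * and return the length of the resulting string. */
static size_t get_stack(const entry_t *e, int level, char *buf, size_t len) {
  size_t used = 0;
  if (buf == NULL || len == 0) {
    return 0;
  }
  buf[0] = '\0';
  if (e == NULL || level <= 0) {
    return 0;
  }
  /* Visit the caller first so that outer frames are written first. */
  if (level > 1 && e->prev != NULL) {
    used = get_stack(e->prev, level - 1, buf, len);
    used = append(buf, len, used, "==>");
  }
  return append(buf, len, used, e->name);
}
```

For the chain main() -> foo -> bar -> echoHello, `get_stack` with `FULL_STACK` yields "main()==>foo==>bar==>echoHello", while a level of 2 yields "bar==>echoHello". These are the key shapes seen in the sampled and hierarchical outputs respectively.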

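Summary

The analysis above covers essentially the whole of xhprof's implementation and makes the meaning of its profiling data clearer. Even in XHPROF_MODE_HIERARCHICAL mode, xhprof only records marks and samples before and after each function executes, so its impact on performance is small.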

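--------------The great divider----------------

PHP飯米粒 (phpfamily) was founded by a group of dependable people who hope to bring PHPers some food for thought worth savoring.

phpfamily publishes only original articles or articles it has been authorized to publish, and does not repost articles from the web.

The original author of every published article can be reached for discussion.

Tips are welcome (they go to the article's author as payment), and submissions are even more welcome.

To submit an article, contact:

shenzhe163@gmail.com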
