Last active
March 24, 2016 02:59
-
-
Save gokul-uf/52c121b7f0b38ffd2b44 to your computer and use it in GitHub Desktop.
Profiler output for theano GSoC proposal
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/home/gokul/Theano/theano/compile/pfunc.py:479: UserWarning: config.profile_optimizer requires config.profile to be set to True as well | |
output_keys=output_keys) | |
Using gpu device 0: GeForce GT 750M (CNMeM is disabled, CuDNN 4004) | |
/home/gokul/Theano/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module. | |
"downsample module has been moved to the theano.tensor.signal.pool module.") | |
Function profiling | |
================== | |
Message: lasagne_googlenet.py:132 | |
Time in 0 calls to Function.__call__: 0.000000e+00s | |
Total compile time: 3.520045e+02s | |
Number of Apply nodes: 1788 | |
Theano Optimizer time: 4.472619e+01s | |
Theano validate time: 4.449729e+00s | |
Theano Linker time (includes C, CUDA code generation/compiling): 3.054581e+02s | |
Import time 1.817276e-01s | |
Time in all call to theano.grad() 7.898371e-01s | |
Time since theano import 370.526s | |
Optimizer Profile | |
----------------- | |
SeqOptimizer OPT_FAST_RUN time 44.726s for 6324/1788 nodes before/after optimization | |
13.111s for callback | |
4.450s for fgraph.validate() | |
time - (name, class, index) - validate time | |
15.334199s - ('canonicalize', 'EquilibriumOptimizer', 4) - 0.226s | |
EquilibriumOptimizer canonicalize | |
time 15.334s for 4 passes | |
nb nodes (start, end, max) 4784 3333 4784 | |
time io_toposort 0.155s | |
time in local optimizers 7.065s | |
time in global optimizers 0.000s | |
time in final optimizers 4.223s | |
time in cleanup optimizers 3.596s | |
0 - 12.757s 4318 (4.007s in global opts, 0.047s io_toposort) - 4784 nodes - ('MergeOptimizer', 1835) ('local_upcast_elemwise_constant_inputs', 904) ('local_add_canonizer', 661) ('local_subtensor_merge', 194) ('local_cut_gpu_host_gpu', 163) ... | |
1 - 1.435s 1005 (0.144s in global opts, 0.043s io_toposort) - 4271 nodes - ('local_subtensor_make_vector', 427) ('MergeOptimizer', 191) ('local_mul_canonizer', 137) ('local_intdiv_by_one', 130) ('local_add_canonizer', 80) ... | |
2 - 0.783s 276 (0.037s in global opts, 0.036s io_toposort) - 3437 nodes - ('local_add_canonizer', 138) ('MergeOptimizer', 129) ('local_cut_gpu_host_gpu', 9) | |
3 - 0.359s 0 (0.038s in global opts, 0.030s io_toposort) - 3333 nodes - | |
times - times applied - nb node created - name: | |
4.223s - 2 - 0 - topo_constant_folding | |
3.596s - 2155 - 5 - MergeOptimizer | |
1.777s - 194 - 5337 - local_subtensor_merge | |
1.442s - 879 - 1146 - local_add_canonizer | |
1.077s - 904 - 2712 - local_upcast_elemwise_constant_inputs | |
0.688s - 254 - 499 - local_mul_canonizer | |
0.285s - 39 - 0 - local_useless_switch | |
0.200s - 97 - 194 - local_mul_switch_sink | |
0.185s - 137 - 812 - local_shape_to_shape_i | |
0.185s - 172 - 0 - local_cut_gpu_host_gpu | |
0.161s - 113 - 172 - local_dimshuffle_lift | |
0.140s - 433 - 0 - local_subtensor_make_vector | |
0.112s - 1 - 6 - local_greedy_distributor | |
0.098s - 5 - 7 - local_fill_sink | |
0.090s - 58 - 0 - local_useless_elemwise | |
0.071s - 12 - 12 - local_useless_slice | |
0.061s - 130 - 0 - local_intdiv_by_one | |
0.004s - 2 - 4 - local_subtensor_lift | |
0.004s - 8 - 0 - local_useless_fill | |
0.003s - 4 - 6 - local_neg_to_mul | |
0.482s - in 62 optimization that where not used (display only those with a runtime > 0) | |
0.112s - local_mul_zero | |
0.080s - local_func_inv | |
0.066s - local_one_minus_erf2 | |
0.047s - local_one_minus_erf | |
0.037s - local_track_shape_i | |
0.027s - local_useless_elemwise_comparison | |
0.022s - local_fill_cut | |
0.020s - local_expm1 | |
0.019s - local_IncSubtensor_serialize | |
0.018s - local_cast_cast | |
0.016s - local_useless_subtensor | |
0.003s - local_div_switch_sink | |
0.002s - local_abs_lift | |
0.001s - local_sum_prod_all_to_none | |
0.001s - local_subtensor_of_alloc | |
0.001s - local_sum_prod_div_dimshuffle | |
0.001s - local_subtensor_of_dot | |
0.001s - local_incsubtensor_of_zeros | |
0.001s - local_useless_inc_subtensor | |
0.001s - local_lift_transpose_through_dot | |
0.001s - local_scalar_tensor_scalar | |
0.001s - local_cut_useless_reduce | |
0.001s - local_op_of_op | |
0.000s - local_reduce_join | |
0.000s - local_join_empty | |
0.000s - local_dimshuffle_no_inplace_at_canonicalize | |
0.000s - local_pow_canonicalize | |
0.000s - local_0_dot_x | |
0.000s - local_setsubtensor_of_constants | |
0.000s - local_join_make_vector | |
0.000s - local_useless_inc_subtensor_alloc | |
0.000s - local_useless_split | |
0.000s - f | |
0.000s - local_join_1 | |
0.000s - local_useless_alloc | |
0.000s - local_useless_reshape | |
0.000s - local_merge_alloc | |
0.000s - local_reshape_lift | |
0.000s - local_tensor_scalar_tensor | |
Global, final and clean up optimizers | |
Iter 0 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (4575, 4461, 114) | |
init io_toposort 0.0480871200562 | |
loop time 3.95878100395 | |
callback_time 0.319423437119 | |
MergeOptimizer | |
nb fail= 0 merged=13675 constant= 6450 | |
time replace=3.32 validate=0.17 callback=2.43 | |
callbacks_time | |
Iter 1 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (3442, 3437, 5) | |
init io_toposort 0.0371699333191 | |
loop time 0.103798151016 | |
callback_time 0.0362775325775 | |
MergeOptimizer | |
nb fail= 0 merged= 286 constant= 170 | |
time replace=0.17 validate=0.00 callback=0.13 | |
Iter 2 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (3333, 3333, 0) | |
init io_toposort 0.0322849750519 | |
loop time 0.00407814979553 | |
callback_time 0.0 | |
MergeOptimizer | |
nb fail= 0 merged= 248 constant= 126 | |
time replace=0.09 validate=0.00 callback=0.07 | |
Iter 3 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (3333, 3333, 0) | |
init io_toposort 0.0343580245972 | |
loop time 0.00389385223389 | |
callback_time 0.0 | |
MergeOptimizer | |
nb fail= 0 merged= 0 constant= 0 | |
time replace=0.00 validate=0.00 callback=0.00 | |
9.217939s - ('gpu_opt', 'SeqOptimizer', 14) - 0.046s | |
SeqOptimizer gpu_opt time 9.218s for 3329/2574 nodes before/after optimization | |
2.273s for callback | |
0.046s for fgraph.validate() | |
8.734742s - ('gpu_local_optimizations', 'EquilibriumOptimizer', 1) - 0.043s | |
EquilibriumOptimizer gpu_local_optimizations | |
time 8.734s for 7 passes | |
nb nodes (start, end, max) 3331 2950 3331 | |
time io_toposort 0.212s | |
time in local optimizers 8.050s | |
time in global optimizers 0.000s | |
time in final optimizers 0.298s | |
time in cleanup optimizers 0.000s | |
0 - 5.999s 1322 (0.193s in global opts, 0.032s io_toposort) - 3331 nodes - ('constant_folding', 462) ('local_gpu_elemwise_1', 444) ('local_gpu_elemwise_0', 279) ('local_gpu_dimshuffle_0', 59) ('local_dnn_convw_alpha_merge', 57) ... | |
1 - 0.554s 270 (0.008s in global opts, 0.033s io_toposort) - 3163 nodes - ('local_gpu_elemwise_1', 156) ('local_gpu_elemwise_0', 97) ('local_gpu_split', 6) ('constant_folding', 3) ('local_pool_dnn_grad_stride', 3) ... | |
2 - 1.373s 408 (0.080s in global opts, 0.030s io_toposort) - 2976 nodes - ('constant_folding', 163) ('local_gpu_elemwise_0', 104) ('local_gpu_elemwise_1', 63) ('local_gpu_careduce', 60) ('local_gpu_incsubtensor', 5) ... | |
3 - 0.431s 108 (0.014s in global opts, 0.031s io_toposort) - 3067 nodes - ('local_gpu_elemwise_1', 63) ('constant_folding', 21) ('local_gpu_join', 9) ('local_gpu_subtensor', 5) ('local_gpu_incsubtensor', 4) ... | |
4 - 0.131s 19 (0.002s in global opts, 0.031s io_toposort) - 2941 nodes - ('local_gpu_incsubtensor', 6) ('constant_folding', 3) ('local_pool_dnn_alternative', 3) ('local_gpu_contiguous_gpu_contiguous', 3) ('local_gpualloc', 2) ... | |
5 - 0.149s 21 (0.002s in global opts, 0.027s io_toposort) - 2950 nodes - ('constant_folding', 6) ('local_gpu_subtensor', 6) ('local_gpu_elemwise_1', 6) ('local_gpualloc_memset_0', 2) ('MergeOptimizer', 1) | |
6 - 0.096s 0 (0.000s in global opts, 0.027s io_toposort) - 2950 nodes - | |
times - times applied - nb node created - name: | |
5.146s - 658 - 0 - constant_folding | |
1.107s - 733 - 1698 - local_gpu_elemwise_1 | |
0.795s - 484 - 1428 - local_gpu_elemwise_0 | |
0.298s - 6 - 0 - MergeOptimizer | |
0.154s - 57 - 456 - local_dnn_convw_alpha_merge | |
0.139s - 60 - 160 - local_gpu_careduce | |
0.123s - 9 - 46 - local_gpu_split | |
0.070s - 64 - 70 - local_gpu_dimshuffle_0 | |
0.059s - 11 - 18 - local_gpu_subtensor | |
0.040s - 9 - 18 - local_gpu_join | |
0.037s - 18 - 36 - local_dnn_convi_output_merge | |
0.022s - 15 - 37 - local_gpu_incsubtensor | |
0.019s - 2 - 4 - local_gpu_reshape | |
0.012s - 3 - 12 - local_pool_dnn_alternative | |
0.008s - 3 - 21 - local_pool_dnn_grad_stride | |
0.006s - 2 - 6 - local_gpu_dot22scalar | |
0.006s - 2 - 6 - local_gpu_dot22 | |
0.005s - 4 - 12 - local_gpualloc | |
0.003s - 3 - 0 - local_gpu_contiguous_gpu_contiguous | |
0.002s - 3 - 3 - local_gpualloc_memset_0 | |
0.002s - 1 - 5 - local_gpu_crossentorpy_softmax_argmax_1hot_with_bias | |
0.001s - 1 - 5 - local_gpu_crossentorpy_softmax_1hot_with_bias_dx | |
0.292s - in 51 optimization that where not used (display only those with a runtime > 0) | |
0.039s - local_elemwise_alloc | |
0.037s - local_track_shape_i | |
0.037s - local_log_softmax_dnn | |
0.027s - local_dnn_conv_output_merge | |
0.024s - local_dnn_conv_alpha_merge | |
0.020s - local_useless_elemwise | |
0.017s - local_dnn_convw_output_merge | |
0.016s - local_dnn_convi_alpha_merge | |
0.008s - gpu_sparse_block_gemv_opt | |
0.005s - gpu_sparse_block_outer_opt | |
0.005s - local_gpu_dot_to_dot22 | |
0.005s - local_gpu_ger | |
0.005s - local_gpu_gemv | |
0.004s - local_gpu_batched_dot | |
0.004s - local_gpu_conv | |
0.004s - local_gpu_lazy_ifelse | |
0.004s - local_gpu_gemm | |
0.004s - gpuScanOptimization | |
0.004s - local_gpu_solve | |
0.004s - local_gpu_specifyShape_0 | |
0.003s - local_gpu_eye | |
0.003s - local_conv2d_gpu_conv | |
0.003s - local_gpu_flatten | |
0.003s - local_gpu_advanced_incsubtensor1 | |
0.003s - local_gpu_advanced_subtensor1 | |
0.002s - local_gpu_allocempty | |
0.001s - local_gpu_elemwise_careduce | |
0.000s - local_subtensor_make_vector | |
0.000s - f | |
0.000s - local_gpujoin_1 | |
0.000s - local_gpu_downsample_factor_max | |
0.000s - local_gpu_downsample_factor_max_grad | |
Global, final and clean up optimizers | |
Iter 0 | |
MergeOptimizer | |
nb fail= 0 merged= 854 constant= 473 | |
time replace=0.19 validate=0.01 callback=0.15 | |
Iter 1 | |
MergeOptimizer | |
nb fail= 0 merged= 80 constant= 9 | |
time replace=0.01 validate=0.00 callback=0.00 | |
Iter 2 | |
MergeOptimizer | |
nb fail= 0 merged= 202 constant= 158 | |
time replace=0.08 validate=0.00 callback=0.07 | |
Iter 3 | |
MergeOptimizer | |
nb fail= 0 merged= 42 constant= 28 | |
time replace=0.01 validate=0.00 callback=0.01 | |
Iter 4 | |
MergeOptimizer | |
nb fail= 0 merged= 12 constant= 11 | |
time replace=0.00 validate=0.00 callback=0.00 | |
Iter 5 | |
MergeOptimizer | |
nb fail= 0 merged= 6 constant= 6 | |
time replace=0.00 validate=0.00 callback=0.00 | |
Iter 6 | |
MergeOptimizer | |
nb fail= 0 merged= 0 constant= 0 | |
time replace=0.00 validate=0.00 callback=0.00 | |
0.482284s - ('gpu_cut_transfers', 'EquilibriumOptimizer', 2) - 0.003s | |
EquilibriumOptimizer gpu_cut_transfers | |
time 0.482s for 2 passes | |
nb nodes (start, end, max) 2950 2574 2950 | |
time io_toposort 0.298s | |
time in local optimizers 0.167s | |
time in global optimizers 0.000s | |
time in final optimizers 0.000s | |
time in cleanup optimizers 0.000s | |
0 - 0.199s 189 (0.000s in global opts, 0.027s io_toposort) - 2950 nodes - ('local_cut_gpu_host_gpu', 189) | |
1 - 0.283s 0 (0.000s in global opts, 0.271s io_toposort) - 2574 nodes - | |
times - times applied - nb node created - name: | |
0.158s - 189 - 0 - local_cut_gpu_host_gpu | |
0.009s - in 1 optimization that where not used (display only those with a runtime > 0) | |
0.009s - constant_folding | |
0.000894s - ('InputToGpuOptimizer', 'InputToGpuOptimizer', 0) - 0.000s | |
9.179657s - ('inplace_elemwise_optimizer', 'FromFunctionOptimizer', 38) - 1.380s | |
3.009087s - ('local_dnn_conv_inplace', 'TopoOptimizer', 33) - 2.658s | |
TopoOptimizer local_dnn_conv_inplace | |
nb_node (start, end, changed) (1760, 1788, 170) | |
init io_toposort 0.0167989730835 | |
loop time 2.99210095406 | |
callback_time 2.77951002121 | |
1.587725s - ('gpu_elemwise_fusion', 'FusionOptimizer', 18) - 0.004s | |
FusionOptimizer | |
nb_iter 3 | |
nb_replacement 352 | |
nb_inconsistency_replace 0 | |
validate_time 0.00416564941406 | |
callback_time 0.226927757263 | |
time_toposort 0.349379062653 | |
1.342049s - ('specialize', 'EquilibriumOptimizer', 11) - 0.000s | |
EquilibriumOptimizer specialize | |
time 1.342s for 3 passes | |
nb nodes (start, end, max) 3336 3329 3336 | |
time io_toposort 0.302s | |
time in local optimizers 0.389s | |
time in global optimizers 0.142s | |
time in final optimizers 0.441s | |
time in cleanup optimizers 0.000s | |
0 - 0.590s 7 (0.402s in global opts, 0.033s io_toposort) - 3336 nodes - ('local_mul_to_sqr', 2) ('local_div_to_inv', 1) ('local_softmax_grad_to_crossentropy_with_softmax_grad', 1) ('local_softmax_with_bias', 1) ('topo_constant_folding', 1) ... | |
1 - 0.500s 2 (0.112s in global opts, 0.236s io_toposort) - 3330 nodes - ('crossentropy_to_crossentropy_with_softmax_with_bias', 1) ('local_useless_crossentropy_softmax_1hot_with_bias_dx_alloc', 1) | |
2 - 0.252s 0 (0.069s in global opts, 0.033s io_toposort) - 3329 nodes - | |
times - times applied - nb node created - name: | |
0.441s - 1 - 0 - topo_constant_folding | |
0.142s - 1 - 1 - crossentropy_to_crossentropy_with_softmax_with_bias | |
0.007s - 2 - 2 - local_mul_to_sqr | |
0.001s - 1 - 1 - local_div_to_inv | |
0.001s - 1 - 1 - local_softmax_grad_to_crossentropy_with_softmax_grad | |
0.001s - 1 - 1 - local_useless_crossentropy_softmax_1hot_with_bias_dx_alloc | |
0.001s - 1 - 1 - local_softmax_with_bias | |
0.000s - 1 - 0 - local_subtensor_make_vector | |
0.378s - in 63 optimization that where not used (display only those with a runtime > 0) | |
0.093s - local_add_specialize | |
0.045s - local_mul_specialize | |
0.041s - local_one_minus_erf2 | |
0.029s - local_elemwise_alloc | |
0.027s - local_useless_elemwise | |
0.025s - local_func_inv | |
0.016s - local_track_shape_i | |
0.015s - local_one_minus_erf | |
0.015s - local_abs_merge | |
0.011s - local_mul_switch_sink | |
0.010s - local_useless_elemwise_comparison | |
0.008s - local_expm1 | |
0.007s - local_elemwise_sub_zeros | |
0.007s - local_logsoftmax | |
0.007s - local_cast_cast | |
0.007s - local_useless_switch | |
0.006s - local_alloc_unary | |
0.001s - local_sum_prod_mul_by_scalar | |
0.001s - local_useless_subtensor | |
0.001s - local_pow_specialize | |
0.001s - local_reduce_broadcastable | |
0.001s - local_sum_prod_div_dimshuffle | |
0.001s - local_useless_inc_subtensor | |
0.001s - local_dimshuffle_lift | |
0.000s - local_useless_slice | |
0.000s - local_opt_alloc | |
0.000s - local_join_empty | |
0.000s - local_grad_log_erfc_neg | |
0.000s - local_useless_alloc | |
0.000s - local_join_make_vector | |
0.000s - local_useless_inc_subtensor_alloc | |
0.000s - local_scalar_tensor_scalar | |
0.000s - local_subtensor_merge | |
0.000s - local_subtensor_of_alloc | |
0.000s - local_subtensor_of_dot | |
0.000s - local_join_1 | |
0.000s - local_useless_split | |
0.000s - local_merge_alloc | |
0.000s - local_logsoftmax_grad | |
Global, final and clean up optimizers | |
Iter 0 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (3332, 3331, 1) | |
init io_toposort 0.0324211120605 | |
loop time 0.334736824036 | |
callback_time 0.000298023223877 | |
Iter 1 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (3329, 3329, 0) | |
init io_toposort 0.0332930088043 | |
loop time 0.00426387786865 | |
callback_time 0.0 | |
Iter 2 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (3329, 3329, 0) | |
init io_toposort 0.0321750640869 | |
loop time 0.00389409065247 | |
callback_time 0.0 | |
0.627962s - ('merge1', 'MergeOptimizer', 0) - 0.030s | |
MergeOptimizer | |
nb fail= 0 merged= 3298 constant= 1703 | |
time replace=0.39 validate=0.03 callback=0.19 | |
0.546419s - ('stabilize', 'EquilibriumOptimizer', 6) - 0.000s | |
EquilibriumOptimizer stabilize | |
time 0.546s for 2 passes | |
nb nodes (start, end, max) 3333 3336 3336 | |
time io_toposort 0.065s | |
time in local optimizers 0.115s | |
time in global optimizers 0.269s | |
time in final optimizers 0.075s | |
time in cleanup optimizers 0.000s | |
0 - 0.373s 4 (0.271s in global opts, 0.032s io_toposort) - 3333 nodes - ('local_fill_to_alloc', 4) | |
1 - 0.173s 0 (0.074s in global opts, 0.033s io_toposort) - 3336 nodes - | |
times - times applied - nb node created - name: | |
0.003s - 4 - 7 - local_fill_to_alloc | |
0.457s - in 39 optimization that where not used (display only those with a runtime > 0) | |
0.269s - crossentropy_to_crossentropy_with_softmax_with_bias | |
0.075s - topo_constant_folding | |
0.039s - local_greedy_distributor | |
0.025s - local_one_minus_erf2 | |
0.024s - local_sigm_times_exp | |
0.010s - local_one_minus_erf | |
0.007s - local_useless_elemwise_comparison | |
0.007s - local_expm1 | |
0.000s - local_incsubtensor_of_zeros | |
0.000s - local_exp_over_1_plus_exp | |
0.000s - local_grad_log_erfc_neg | |
0.000s - local_0_dot_x | |
0.000s - local_setsubtensor_of_constants | |
0.000s - local_subtensor_of_dot | |
0.000s - local_useless_inc_subtensor_alloc | |
0.000s - local_useless_alloc | |
0.000s - local_merge_alloc | |
0.000s - local_useless_reshape | |
0.000s - local_reshape_lift | |
Global, final and clean up optimizers | |
Iter 0 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (3336, 3336, 0) | |
init io_toposort 0.0341119766235 | |
loop time 0.00382590293884 | |
callback_time 0.0 | |
Iter 1 | |
TopoOptimizer topo_constant_folding | |
nb_node (start, end, changed) (3336, 3336, 0) | |
init io_toposort 0.0333940982819 | |
loop time 0.00389885902405 | |
callback_time 0.0 | |
0.511182s - ('elemwise_fusion', 'SeqOptimizer', 17) - 0.001s | |
SeqOptimizer elemwise_fusion time 0.511s for 2574/2420 nodes before/after optimization | |
0.126s for callback | |
0.001s for fgraph.validate() | |
0.398408s - ('composite_elemwise_fusion', 'FusionOptimizer', 1) - 0.001s | |
FusionOptimizer | |
nb_iter 2 | |
nb_replacement 95 | |
nb_inconsistency_replace 0 | |
validate_time 0.00113773345947 | |
callback_time 0.100536823273 | |
time_toposort 0.04758477211 | |
0.112761s - ('local_add_mul_fusion', 'FusionOptimizer', 0) - 0.000s | |
FusionOptimizer | |
nb_iter 3 | |
nb_replacement 18 | |
nb_inconsistency_replace 0 | |
validate_time 0.000210523605347 | |
callback_time 0.0254974365234 | |
time_toposort 0.0722908973694 | |
0.492368s - ('add_destroy_handler', 'AddDestroyHandler', 21) - 0.000s | |
0.481988s - ('scan_eqopt2', 'EquilibriumOptimizer', 9) - 0.000s | |
EquilibriumOptimizer scan_eqopt2 | |
time 0.482s for 1 passes | |
nb nodes (start, end, max) 3336 3336 3336 | |
time io_toposort 0.035s | |
time in local optimizers 0.000s | |
time in global optimizers 0.441s | |
time in final optimizers 0.000s | |
time in cleanup optimizers 0.000s | |
0 - 0.482s 0 (0.441s in global opts, 0.035s io_toposort) - 3336 nodes - | |
Global, final and clean up optimizers | |
Iter 0 | |
TopoOptimizer constant_folding_for_scan2 | |
nb_node (start, end, changed) (3336, 3336, 0) | |
init io_toposort 0.0319931507111 | |
loop time 0.00378704071045 | |
callback_time 0.0 | |
TopoOptimizer scanOp_remove_constants_and_unused_inputs1 | |
nb_node (start, end, changed) (3336, 3336, 0) | |
init io_toposort 0.0346179008484 | |
loop time 0.0035719871521 | |
callback_time 0.0 | |
TopoOptimizer scanop_remove_constants_and_unused_inputs2 | |
nb_node (start, end, changed) (3336, 3336, 0) | |
init io_toposort 0.0306630134583 | |
loop time 0.00346493721008 | |
callback_time 0.0 | |
TopoOptimizer scanOp_merge_inouts | |
nb_node (start, end, changed) (3336, 3336, 0) | |
init io_toposort 0.0324099063873 | |
loop time 0.0039119720459 | |
callback_time 0.0 | |
TopoOptimizer scanOp_remove_constants_and_unused_inputs3 | |
nb_node (start, end, changed) (3336, 3336, 0) | |
init io_toposort 0.0334548950195 | |
loop time 0.00352096557617 | |
callback_time 0.0 | |
0.445209s - ('scan_eqopt1', 'EquilibriumOptimizer', 2) - 0.000s | |
EquilibriumOptimizer scan_eqopt1 | |
time 0.445s for 1 passes | |
nb nodes (start, end, max) 4784 4784 4784 | |
time io_toposort 0.047s | |
time in local optimizers 0.000s | |
time in global optimizers 0.391s | |
time in final optimizers 0.000s | |
time in cleanup optimizers 0.000s | |
0 - 0.445s 0 (0.391s in global opts, 0.047s io_toposort) - 4784 nodes - | |
Global, final and clean up optimizers | |
Iter 0 | |
SeqOptimizer all_pushout_opt time 0.391s for 4784/4784 nodes before/after optimization | |
0.000s for callback | |
0.000s for fgraph.validate() | |
0.191856s - ('remove_constants_and_unused_inputs_scan', 'TopoOptimizer', 0) - 0.000s | |
TopoOptimizer scanOp_remove_constants_and_unused_inputs0 | |
nb_node (start, end, changed) (4784, 4784, 0) | |
init io_toposort 0.18671298027 | |
loop time 0.00507307052612 | |
callback_time 0.0 | |
0.051415s - ('scan_pushout_dot1', 'PushOutDot1', 3) - 0.000s | |
0.049522s - ('scanOp_pushout_seqs_ops', 'PushOutSeqScan', 2) - 0.000s | |
0.049322s - ('scanOp_pushout_output', 'PushOutScanOutput', 4) - 0.000s | |
0.048480s - ('scanOp_pushout_nonseqs_ops', 'PushOutNonSeqScan', 1) - 0.000s | |
0.397820s - ('ShapeOpt', 'ShapeOptimizer', 1) - 0.000s | |
0.288505s - ('BlasOpt', 'SeqOptimizer', 10) - 0.000s | |
SeqOptimizer BlasOpt time 0.288s for 3336/3336 nodes before/after optimization | |
0.001s for callback | |
0.000s for fgraph.validate() | |
0.083254s - ('gemm_optimizer', 'GemmOptimizer', 1) - 0.000s | |
GemmOptimizer | |
nb_iter 1 | |
nb_replacement 0 | |
nb_replacement_didn_t_remove 0 | |
nb_inconsistency_make 0 | |
nb_inconsistency_replace 0 | |
time_canonicalize 0.0285625457764 | |
time_factor_can 0 | |
time_factor_list 0 | |
time_toposort 0.0311839580536 | |
validate_time 0.0 | |
callback_time 0.0 | |
0.045156s - ('use_c_blas', 'TopoOptimizer', 4) - 0.000s | |
TopoOptimizer use_c_blas | |
nb_node (start, end, changed) (3336, 3336, 0) | |
init io_toposort 0.0328800678253 | |
loop time 0.0122079849243 | |
callback_time 0.0 | |
0.042374s - ('local_dot22_to_dot22scalar', 'TopoOptimizer', 2) - 0.000s | |
TopoOptimizer local_dot22_to_dot22scalar | |
nb_node (start, end, changed) (3336, 3336, 1) | |
init io_toposort 0.0323820114136 | |
loop time 0.00990796089172 | |
callback_time 0.000487804412842 | |
0.041280s - ('local_gemm_to_gemv', 'EquilibriumOptimizer', 3) - 0.000s | |
EquilibriumOptimizer local_gemm_to_gemv | |
time 0.041s for 1 passes | |
nb nodes (start, end, max) 3336 3336 3336 | |
time io_toposort 0.035s | |
time in local optimizers 0.000s | |
time in global optimizers 0.000s | |
time in final optimizers 0.000s | |
time in cleanup optimizers 0.000s | |
0 - 0.041s 0 (0.000s in global opts, 0.035s io_toposort) - 3336 nodes - | |
0.039106s - ('use_scipy_ger', 'TopoOptimizer', 5) - 0.000s | |
TopoOptimizer scipy_blas | |
nb_node (start, end, changed) (3336, 3336, 0) | |
init io_toposort 0.0348558425903 | |
loop time 0.00417590141296 | |
callback_time 0.0 | |
0.037307s - ('local_dot_to_dot22', 'TopoOptimizer', 0) - 0.000s | |
TopoOptimizer local_dot_to_dot22 | |
nb_node (start, end, changed) (3336, 3336, 3) | |
init io_toposort 0.0324900150299 | |
loop time 0.00476217269897 | |
callback_time 0.000386953353882 | |
0.207870s - ('local_IncSubtensor_serialize', 'TopoOptimizer', 3) - 0.000s | |
TopoOptimizer pre_local_IncSubtensor_serialize | |
nb_node (start, end, changed) (4784, 4784, 20) | |
init io_toposort 0.176945924759 | |
loop time 0.0308661460876 | |
callback_time 0.00447106361389 | |
0.167285s - ('gpu_after_fusion', 'SeqOptimizer', 20) - 0.000s | |
SeqOptimizer gpu_after_fusion time 0.167s for 1760/1760 nodes before/after optimization | |
0.001s for callback | |
0.000s for fgraph.validate() | |
0.083829s - ('gpu_local_optimizations', 'EquilibriumOptimizer', 1) - 0.000s | |
EquilibriumOptimizer gpu_local_optimizations | |
time 0.083s for 1 passes | |
nb nodes (start, end, max) 1762 1762 1762 | |
time io_toposort 0.028s | |
time in local optimizers 0.040s | |
time in global optimizers 0.000s | |
time in final optimizers 0.000s | |
time in cleanup optimizers 0.000s | |
0 - 0.083s 0 (0.000s in global opts, 0.028s io_toposort) - 1762 nodes - | |
Global, final and clean up optimizers | |
Iter 0 | |
MergeOptimizer | |
nb fail= 0 merged= 0 constant= 0 | |
time replace=0.00 validate=0.00 callback=0.00 | |
0.082261s - ('gpu_cut_transfers', 'EquilibriumOptimizer', 2) - 0.000s | |
EquilibriumOptimizer gpu_cut_transfers | |
time 0.082s for 2 passes | |
nb nodes (start, end, max) 1762 1760 1762 | |
time io_toposort 0.055s | |
time in local optimizers 0.010s | |
time in global optimizers 0.000s | |
time in final optimizers 0.000s | |
time in cleanup optimizers 0.000s | |
0 - 0.042s 2 (0.000s in global opts, 0.028s io_toposort) - 1762 nodes - ('local_cut_gpu_host_gpu', 2) | |
1 - 0.040s 0 (0.000s in global opts, 0.027s io_toposort) - 1760 nodes - | |
times - times applied - nb node created - name: | |
0.001s - 2 - 0 - local_cut_gpu_host_gpu | |
0.008s - in 1 optimization that where not used (display only those with a runtime > 0) | |
0.008s - constant_folding | |
0.001180s - ('InputToGpuOptimizer', 'InputToGpuOptimizer', 0) - 0.000s | |
0.134150s - ('local_inplace_setsubtensor', 'TopoOptimizer', 26) - 0.104s | |
TopoOptimizer local_inplace_setsubtensor | |
nb_node (start, end, changed) (1760, 1760, 13) | |
init io_toposort 0.0167520046234 | |
loop time 0.117291927338 | |
callback_time 0.108938455582 | |
0.062133s - ('inplace_elemwise_optimizer', 'FromFunctionOptimizer', 41) - 0.000s | |
0.058811s - ('scanOp_make_inplace', 'ScanInplaceOptimizer', 42) - 0.000s | |
0.042339s - ('local_elemwise_alloc', 'TopoOptimizer', 8) - 0.000s | |
TopoOptimizer local_elemwise_alloc | |
nb_node (start, end, changed) (3336, 3336, 0) | |
init io_toposort 0.032870054245 | |
loop time 0.00939798355103 | |
callback_time 0.0 | |
0.040328s - ('gpu_scanOp_make_inplace', 'ScanInplaceOptimizer', 39) - 0.000s | |
0.038766s - ('uncanonicalize', 'EquilibriumOptimizer', 13) - 0.000s | |
EquilibriumOptimizer uncanonicalize | |
time 0.039s for 1 passes | |
nb nodes (start, end, max) 3329 3329 3329 | |
time io_toposort 0.033s | |
time in local optimizers 0.000s | |
time in global optimizers 0.000s | |
time in final optimizers 0.000s | |
time in cleanup optimizers 0.000s | |
0 - 0.039s 0 (0.000s in global opts, 0.033s io_toposort) - 3329 nodes - | |
0.038547s - ('local_fill_to_alloc', 'TopoOptimizer', 7) - 0.000s | |
TopoOptimizer local_fill_to_alloc | |
nb_node (start, end, changed) (3336, 3336, 0) | |
init io_toposort 0.0312049388885 | |
loop time 0.0072660446167 | |
callback_time 0.0 | |
0.037259s - ('specialize_device', 'EquilibriumOptimizer', 15) - 0.000s | |
EquilibriumOptimizer specialize_device | |
time 0.037s for 1 passes | |
nb nodes (start, end, max) 2574 2574 2574 | |
time io_toposort 0.026s | |
time in local optimizers 0.006s | |
time in global optimizers 0.000s | |
time in final optimizers 0.000s | |
time in cleanup optimizers 0.000s | |
0 - 0.037s 0 (0.000s in global opts, 0.026s io_toposort) - 2574 nodes - | |
0.032412s - ('crossentropy_to_crossentropy_with_softmax', 'FromFunctionOptimizer', 12) - 0.000s | |
0.029169s - ('AbstractConvCheck', 'TopoOptimizer', 16) - 0.000s | |
TopoOptimizer AbstractConvCheck | |
nb_node (start, end, changed) (2574, 2574, 0) | |
init io_toposort 0.0260169506073 | |
loop time 0.00309085845947 | |
callback_time 0.0 | |
0.027419s - ('gpua_elemwise_fusion', 'FusionOptimizer', 37) - 0.000s | |
FusionOptimizer | |
nb_iter 1 | |
nb_replacement 0 | |
nb_inconsistency_replace 0 | |
validate_time 0.0 | |
callback_time 0.0 | |
time_toposort 0.0257840156555 | |
0.021753s - ('InplaceGpuBlasOpt', 'TopoOptimizer', 30) - 0.000s | |
TopoOptimizer InplaceGpuBlasOpt | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.016930103302 | |
loop time 0.00467014312744 | |
callback_time 0.0 | |
0.020725s - ('local_dnna_conv_inplace', 'TopoOptimizer', 34) - 0.000s | |
TopoOptimizer local_dnna_conv_inplace | |
nb_node (start, end, changed) (1788, 1788, 0) | |
init io_toposort 0.0175051689148 | |
loop time 0.00305318832397 | |
callback_time 0.0 | |
0.020469s - ('blas_opt_inplace', 'TopoOptimizer', 29) - 0.000s | |
TopoOptimizer InplaceBlasOpt | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0158088207245 | |
loop time 0.00451397895813 | |
callback_time 0.0 | |
0.020260s - ('local_inplace_gpu_sparse_block_gemv', 'TopoOptimizer', 23) - 0.000s | |
TopoOptimizer local_inplace_gpu_sparse_block_gemv | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0183529853821 | |
loop time 0.00179409980774 | |
callback_time 0.0 | |
0.020023s - ('random_make_inplace', 'TopoOptimizer', 45) - 0.000s | |
TopoOptimizer random_make_inplace | |
nb_node (start, end, changed) (1788, 1788, 0) | |
init io_toposort 0.0180921554565 | |
loop time 0.00187206268311 | |
callback_time 0.0 | |
0.019779s - ('make_ger_destructive', 'TopoOptimizer', 36) - 0.000s | |
TopoOptimizer make_scipy_blas_destructive | |
nb_node (start, end, changed) (1788, 1788, 0) | |
init io_toposort 0.0174670219421 | |
loop time 0.00225496292114 | |
callback_time 0.0 | |
0.019312s - ('local_inplace_incsubtensor1', 'TopoOptimizer', 25) - 0.000s | |
TopoOptimizer local_inplace_incsubtensor1 | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0174140930176 | |
loop time 0.00179100036621 | |
callback_time 0.0 | |
0.019285s - ('dimshuffle_as_view', 'TopoOptimizer', 22) - 0.000s | |
TopoOptimizer dimshuffle_as_view | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0174798965454 | |
loop time 0.00175309181213 | |
callback_time 0.0 | |
0.019202s - ('gpuablas_opt_inplace', 'TopoOptimizer', 31) - 0.000s | |
TopoOptimizer InplaceGpuaBlasOpt | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0163221359253 | |
loop time 0.00271797180176 | |
callback_time 0.0 | |
0.019028s - ('local_inplace_sparse_block_gemv', 'TopoOptimizer', 27) - 0.000s | |
TopoOptimizer local_inplace_sparse_block_gemv | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0169730186462 | |
loop time 0.00194787979126 | |
callback_time 0.0 | |
0.018978s - ('local_gemm16_inplace', 'TopoOptimizer', 35) - 0.000s | |
TopoOptimizer local_gemm16_inplace | |
nb_node (start, end, changed) (1788, 1788, 0) | |
init io_toposort 0.0172350406647 | |
loop time 0.00164389610291 | |
callback_time 0.0 | |
0.018874s - ('c_blas_destructive', 'TopoOptimizer', 32) - 0.000s | |
TopoOptimizer c_blas_destructive | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0154929161072 | |
loop time 0.00330901145935 | |
callback_time 0.0 | |
0.018569s - ('local_inplace_gpu_sparse_block_outer', 'TopoOptimizer', 24) - 0.000s | |
TopoOptimizer local_inplace_gpu_sparse_block_outer | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0166771411896 | |
loop time 0.00178098678589 | |
callback_time 0.0 | |
0.018206s - ('cond_make_inplace', 'TopoOptimizer', 43) - 0.000s | |
TopoOptimizer cond_make_inplace | |
nb_node (start, end, changed) (1788, 1788, 0) | |
init io_toposort 0.0165610313416 | |
loop time 0.00158500671387 | |
callback_time 0.0 | |
0.018179s - ('local_inplace_sparse_block_outer', 'TopoOptimizer', 28) - 0.000s | |
TopoOptimizer local_inplace_sparse_block_outer | |
nb_node (start, end, changed) (1760, 1760, 0) | |
init io_toposort 0.0165140628815 | |
loop time 0.00155401229858 | |
callback_time 0.0 | |
0.018010s - ('mrg_random_make_inplace', 'TopoOptimizer', 46) - 0.000s | |
TopoOptimizer random_make_inplace_mrg | |
nb_node (start, end, changed) (1788, 1788, 0) | |
init io_toposort 0.0160892009735 | |
loop time 0.00187706947327 | |
callback_time 0.0 | |
0.017756s - ('local_destructive', 'TopoOptimizer', 44) - 0.000s | |
TopoOptimizer CURAND_destructive | |
nb_node (start, end, changed) (1788, 1788, 0) | |
init io_toposort 0.0161859989166 | |
loop time 0.00152897834778 | |
callback_time 0.0 | |
0.017310s - ('inplace_elemwise_optimizer', 'FromFunctionOptimizer', 40) - 0.000s | |
0.000910s - ('merge2', 'MergeOptimizer', 19) - 0.000s | |
MergeOptimizer | |
nb fail= 0 merged= 3 constant= 0 | |
time replace=0.00 validate=0.00 callback=0.00 | |
0.000725s - ('merge3', 'MergeOptimizer', 47) - 0.000s | |
MergeOptimizer | |
nb fail= 0 merged= 0 constant= 0 | |
time replace=0.00 validate=0.00 callback=0.00 | |
0.000049s - ('merge1.2', 'MergeOptimizer', 5) - 0.000s | |
MergeOptimizer | |
nb fail= 0 merged= 0 constant= 0 | |
time replace=0.00 validate=0.00 callback=0.00 | |
Here are tips to potentially make your code run faster | |
(if you think of new ones, suggest them on the mailing list). | |
Test them first, as they are not guaranteed to always provide a speedup. | |
Sorry, no tip for today. | |
start compile | |
(<theano.gof.toolbox.PreserveVariableAttributes object at 0x7f71e492a1d0>, 0.027862071990966797) | |
(<theano.compile.function_module.Supervisor instance at 0x7f71e45cef38>, 0.0343480110168457) | |
(<theano.gof.opt.ChangeTracker instance at 0x7f71e4dadfc8>, 0.03467535972595215) | |
(<theano.gof.toolbox.ReplaceValidate object at 0x7f71e54b9ad0>, 0.045465946197509766) | |
(<theano.tensor.opt.ShapeFeature object at 0x7f71e4af1050>, 0.1717238426208496) | |
(<theano.gof.opt.MergeFeature object at 0x7f71e45ded50>, 0.5850560665130615) | |
(<theano.gof.opt.Updater instance at 0x7f71e4858200>, 1.2420501708984375) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment