mobile-deep-learning

  Porting a modern framework to the mobile side is not itself a problem. Caffe2, for example, stays under 1 MB even while supporting the application-layer features Facebook needs and full protobuf; its core is probably only a bit over 100 KB, which looks comparable to MDL.

  The most important issue is how to optimize for the mobile side. The vast majority of math libraries are optimized for server CPUs or GPUs; they will run on a mobile device, but not fast.

  Important points to consider:

  1. Is the CPU path optimized? Does it use NEON? Is there a thread pool designed for mobile? Does the CPU backend expose tunable parameters such as the number of threads? (A minimal NEON kernel sketch follows this list.)

  2. Does it take the right algorithmic path, e.g. using Winograd for convolution, or special optimizations for small matrices (similar to libxsmm)?

  3. Are there efficient kernel implementations on the GPU, e.g. OpenCL / OpenGL ES, Metal?

  4. Does it offer features that are especially useful on mobile, such as reduced precision?
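
  As a minimal sketch of point 1, the snippet below shows the kind of ARM NEON primitive a mobile-optimized math library builds its GEMM and convolution kernels on. The function name and signature are illustrative only, not taken from MDL or Caffe2.

```cpp
#include <arm_neon.h>
#include <cstddef>

// Dot product of two float vectors, processing 4 lanes per NEON register.
float neon_dot(const float* a, const float* b, size_t n) {
    float32x4_t acc = vdupq_n_f32(0.0f);       // 4 running partial sums
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);     // load 4 floats from a
        float32x4_t vb = vld1q_f32(b + i);     // load 4 floats from b
        acc = vmlaq_f32(acc, va, vb);          // acc += va * vb, per lane
    }
    // Horizontal reduction of the 4 partial sums.
    float32x2_t sum2 = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    float result = vget_lane_f32(vpadd_f32(sum2, sum2), 0);
    // Scalar tail for lengths not divisible by 4.
    for (; i < n; ++i) result += a[i] * b[i];
    return result;
}
```

  A mobile-aware framework would then shard calls like this across a small, mobile-sized thread pool, with the thread count exposed as a tuning parameter.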

  

  How does a deep learning system reconcile heavyweight optimization with lightweight deployment?

  Separate the compiler/optimization side from the execution side. Memory-allocation optimization is done on the compiler side and the resulting plan is saved directly; the execution side no longer runs any allocation algorithm and simply executes according to the stored plan. The compiler-side optimization also packages only the ops the model needs and discards the unnecessary ones, so the execution side stays lightweight.
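
  A hypothetical sketch of that compile/execute split: the compiler side emits a static plan (an op list plus a precomputed arena size, with each op's buffer offsets already decided), and the deployed runtime just walks the plan. All type and function names here are illustrative, not from any particular framework.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

struct PlannedOp {
    // Kernel bound at compile time; its arena offsets were fixed offline.
    std::function<void(float* arena)> run;
};

struct ExecutionPlan {
    size_t arena_bytes;           // total scratch memory, computed offline
    std::vector<PlannedOp> ops;   // only the ops the model actually needs
};

// The deployed runtime: one upfront allocation, then straight-line execution.
// No graph analysis, no allocator, no unused ops.
void run_plan(const ExecutionPlan& plan, std::vector<float>& arena) {
    arena.resize(plan.arena_bytes / sizeof(float));  // single allocation
    for (const auto& op : plan.ops) {
        op.run(arena.data());     // each op reads/writes at its fixed offsets
    }
}
```

  The design choice is that all expensive decisions (memory planning, op selection, pruning) happen once on the compiler side, so the code shipped to the device stays small and its execution path stays trivial.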

Origin www.cnblogs.com/jianfeifeng/p/11040694.html