Technical Solutions

Chinese version

TNN is a high-performance, lightweight inference framework for mobile devices. It offers advanced features such as cross-platform support, model compression, and code pruning. Inspired by mainstream open-source frameworks, TNN integrates and builds on Youtu Lab’s Rapidnet and the ncnn framework. It also combines the efforts of the deep-learning framework Oteam across departments (PCG, TEG, IEG) to create an enterprise-level mobile inference engine. TNN is now deployed in a number of major businesses, where the following characteristics have been well received.

  • Computation optimization
    • The backend operators are finely optimized to make the best use of the computing power of each architecture, taking into account instruction issue, throughput, latency, cache bandwidth, cache latency, registers, etc.
    • TNN’s performance on mainstream hardware platforms (CPU: ARMv7, ARMv8; GPU: Mali, Adreno, Apple) has been extensively tuned.
    • Convolution is implemented with several algorithms, such as Winograd, Tile-GEMM, and Direct Conv, to stay efficient across different parameters and sizes.
    • Op fusion: TNN analyzes the network graph offline and fuses multiple simple operations, reducing overhead such as redundant memory access and kernel launch cost.
  • Low-precision computation acceleration
    • TNN supports INT8 and FP16 modes, which reduce model size and memory consumption and use hardware-specific low-precision instructions to accelerate computation.
    • TNN supports the INT8 Winograd algorithm (6-bit input), which further reduces computational complexity without sacrificing accuracy.
    • TNN supports mixed-precision data within one model, speeding up computation while preserving accuracy.
  • Memory optimization
    • Efficient “memory pool” implementation: based on a DAG analysis of the full network, it reuses memory between non-dependent nodes, which reduces memory cost by 90%.
    • Cross-model memory reuse: network memory can be supplied externally in real time, so that multiple models can share memory.
  • Performance comparison among mainstream models: TNN outperforms other mainstream open-source high-performance mobile frameworks.
Kirin970:

model                   cpu 1 thread (ms)   gpu time (ms)
Mobilenet_v1            88                  12
Mobilenet_v1_int8       55
Mobilenet_v2            58                  11
Mobilenet_v2_int8       41
squeezenet_v1.0         127                 20
squeezenet_v1.0_int8    82

Snapdragon 835:

model                   cpu 1 thread (ms)   gpu time (ms)
Mobilenet_v1            94                  16
Mobilenet_v1_int8       62
Mobilenet_v2            61                  14
Mobilenet_v2_int8       47
squeezenet_v1.0         122                 28
squeezenet_v1.0_int8    93

Snapdragon 845:

model                   cpu 1 thread (ms)   gpu time (ms)
Mobilenet_v1            60                  10
Mobilenet_v1_int8       37
Mobilenet_v2            39                  8
Mobilenet_v2_int8       28
squeezenet_v1.0         74                  14
squeezenet_v1.0_int8    56
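
The “memory pool” idea described under Memory optimization can be illustrated with a small lifetime-based planner. This is a minimal sketch, not TNN's implementation: the function name `plan_buffers`, the tuple layout, and the greedy first-fit strategy are all assumptions made for illustration. It assumes each tensor's first and last use are known from a topological ordering of the network DAG, and lets tensors whose lifetimes do not overlap share one buffer slot.

```python
def plan_buffers(tensors):
    """Greedy buffer sharing based on tensor lifetimes (hypothetical helper).

    tensors: list of (name, size_bytes, first_use_step, last_use_step),
             where steps index a topological order of the network DAG.
    Returns (assignment, slot_sizes): a dict mapping each tensor name to a
    slot index, and the final size of every slot.
    """
    slots = []        # each slot is [size_bytes, last_step_in_use]
    assignment = {}
    # Visit tensors in the order they are first needed.
    for name, size, first, last in sorted(tensors, key=lambda t: t[2]):
        chosen = None
        for index, (_, busy_until) in enumerate(slots):
            # A slot is reusable once its previous tenant is no longer live.
            if busy_until < first:
                chosen = index
                break
        if chosen is None:
            slots.append([size, last])
            chosen = len(slots) - 1
        else:
            # Grow the slot if the new tenant is larger, then extend its use.
            slots[chosen][0] = max(slots[chosen][0], size)
            slots[chosen][1] = last
        assignment[name] = chosen
    return assignment, [slot[0] for slot in slots]


# A four-layer chain: each tensor is written at one step and read at the next.
chain = [("a", 100, 0, 1), ("b", 100, 1, 2), ("c", 100, 2, 3), ("d", 100, 3, 4)]
assignment, slot_sizes = plan_buffers(chain)
naive_bytes = sum(size for _, size, _, _ in chain)
pooled_bytes = sum(slot_sizes)
```

For this toy chain, two slots are enough because only a layer's input and output are live at the same time. The saving on a real network depends on its graph structure and tensor sizes.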

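The op fusion mentioned under Computation optimization can be illustrated by folding a batch-norm layer into the layer before it, a widely used offline fusion. This is a sketch only: the text above does not say which operator pairs TNN fuses, so the choice of this pair is an assumption, and the function names are hypothetical. A linear layer stands in for a convolution, since a 1x1 convolution evaluated at a single pixel is exactly a linear layer.

```python
import math


def linear(weights, bias, x):
    """y[c] = sum_i weights[c][i] * x[i] + bias[c]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Per-channel inference-time batch norm."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch norm into the preceding layer's weights and bias.

    After folding, a single linear() call gives the same result as
    linear() followed by batchnorm(), so one pass over memory is saved.
    """
    fused_weights, fused_bias = [], []
    for row, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        fused_weights.append([w * scale for w in row])
        fused_bias.append((b - m) * scale + bt)
    return fused_weights, fused_bias


# Two output channels, two inputs (values chosen arbitrarily).
W = [[1.0, 2.0], [0.5, -1.0]]
B = [0.1, -0.2]
GAMMA, BETA = [1.5, 0.8], [0.0, 0.3]
MEAN, VAR = [0.2, -0.1], [4.0, 0.25]
X = [3.0, -1.0]

unfused = batchnorm(linear(W, B, X), GAMMA, BETA, MEAN, VAR)
fused = linear(*fold_batchnorm(W, B, GAMMA, BETA, MEAN, VAR), X)
```

Because the folding happens once, offline, the fused model runs one operator where the original ran two, with the same outputs up to floating-point rounding.
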
TNN Architecture Diagram:

  • TNN supports TensorFlow, PyTorch, MXNet, Caffe, and other training frameworks through ONNX, benefiting from the continuous improvement of the ONNX open-source community. TNN currently supports 55 ONNX operators and is planned to cover 80 soon, which includes most of the mainstream CNN operators.
  • TNN runs on mainstream operating systems (Android, iOS, embedded Linux, Windows) and is compatible with ARM CPU and GPU hardware platforms (Da Vinci NPU support is planned).
  • TNN has a modular design that abstracts and isolates components such as model parsing, graph construction, graph optimization, low-level hardware adaptation, and high-performance kernels. It uses the factory pattern (“Factory Mode”) to register and build devices, which minimizes the cost of supporting additional hardware and acceleration solutions.
  • TNN’s runtime does not depend on any third-party libraries. The CPU dynamic library is only around 400KB and includes basic image conversion operations, which are lightweight and convenient. TNN uses unified models and interfaces across platforms, and switching between them requires configuring just one parameter.