Technical Solutions


TNN is a high-performance, lightweight inference framework for mobile devices. It offers advanced features such as cross-platform support, model compression, and code pruning. Inspired by mainstream open-source frameworks, TNN integrates and builds on Youtu Lab's Rapidnet and the ncnn framework. It also combines the efforts of the deep-learning framework Oteam across departments (PCG, TEG, IEG) to create an enterprise-level mobile inference engine. TNN is already deployed in a number of major businesses, and the following characteristics have been widely praised.

Computation optimization
- The backend operators are highly optimized to make the best use of the computing power of each architecture, taking into account instruction issue, throughput, latency, cache bandwidth, cache latency, registers, etc.
- TNN's performance on mainstream hardware platforms (CPU: ARMv7, ARMv8; GPU: Mali, Adreno, Apple) has been extensively tuned.
- Convolution is implemented with several algorithms, such as Winograd, Tile-GEMM, and direct convolution, so that it stays efficient across different parameters and input sizes.
- Op fusion: TNN analyzes the network graph offline and fuses multiple simple operations, which reduces overhead such as redundant memory access and kernel launch cost.
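To make the op fusion idea concrete, here is a minimal sketch of one well-known fusion: folding a BatchNorm layer into the preceding layer's weights and bias, so that inference runs one operation instead of two. The source does not say which fusions TNN performs; the function names, the per-channel layout, and the sample numbers below are all assumptions chosen for illustration.

```python
import math

def linear(x, weight, bias):
    """A 1x1 convolution at a single pixel: one dot product per output channel."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Per-channel inference-time BatchNorm."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]

def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Return new (weight, bias) that compute linear + batchnorm in one step."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    fused_weight = [[w * s for w in row] for row, s in zip(weight, scale)]
    fused_bias = [(b - m) * s + bt
                  for b, m, s, bt in zip(bias, mean, scale, beta)]
    return fused_weight, fused_bias

# Made-up example: 3 input channels, 2 output channels.
x = [1.0, -2.0, 0.5]
weight = [[0.2, -0.1, 0.4], [0.7, 0.3, -0.5]]
bias = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.05, -0.3]
mean, var = [0.3, -0.1], [0.9, 1.7]

# Two passes over the data (unfused) versus one pass (fused).
unfused = batchnorm(linear(x, weight, bias), gamma, beta, mean, var)
fused_weight, fused_bias = fold_batchnorm(weight, bias, gamma, beta, mean, var)
fused = linear(x, fused_weight, fused_bias)
```

Because the folding is done once, offline, the fused network skips the intermediate tensor and the second kernel entirely at inference time.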
Low precision computation acceleration
- TNN supports INT8 and FP16 modes, which reduce model size and memory consumption and use hardware-specific low-precision instructions to accelerate computation.
- TNN supports an INT8 Winograd algorithm (with 6-bit input), which further reduces the computational complexity of the model without sacrificing accuracy.
- TNN supports mixed-precision data within one model, speeding up computation while preserving accuracy.
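As a rough illustration of how INT8 computation works in general, the sketch below quantizes two float vectors with symmetric per-tensor scales, accumulates their dot product in integers, and rescales the result back to float. This is a generic textbook scheme, not a description of TNN's quantizer; the function names and sample values are assumptions.

```python
def quantize(values, num_bits=8):
    """Symmetric per-tensor quantization: float list -> (int list, scale)."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale

def int8_dot(qa, scale_a, qb, scale_b):
    """Integer multiply-accumulate, then a single float rescale at the end."""
    acc = sum(x * y for x, y in zip(qa, qb))  # stays in integer arithmetic
    return acc * scale_a * scale_b

# Made-up example vectors.
a = [0.5, -1.0, 0.25, 0.75]
b = [1.0, 0.5, -0.5, 0.25]

qa, scale_a = quantize(a)
qb, scale_b = quantize(b)

exact = sum(x * y for x, y in zip(a, b))
approx = int8_dot(qa, scale_a, qb, scale_b)
```

The inner loop touches only small integers, which is what lets hardware low-precision instructions do the bulk of the work; the price is a small rounding error in `approx` relative to `exact`.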
Memory optimization
- Efficient "memory pool" implementation: based on an analysis of the full network DAG, the implementation reuses memory between nodes that do not depend on each other, which reduces memory cost by 90%.
- Cross-model memory reuse: the memory a network uses can be specified externally at runtime, so that multiple models can share the same memory.
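The memory-pool idea can be sketched as a lifetime analysis: once every tensor's first and last use in the execution order are known, two tensors whose lifetimes do not overlap can share one buffer. The greedy planner below is a simplified illustration under that assumption; it is not TNN's allocator, and the tensor names, sizes, and step numbers are invented.

```python
def plan_memory(tensors):
    """Assign tensors to shared buffers.

    tensors: list of (name, size_bytes, first_use_step, last_use_step).
    Returns (assignment, buffer_sizes), where assignment maps a tensor
    name to a buffer index.
    """
    buffers = []      # each entry: {"size": int, "free_after": int}
    assignment = {}
    for name, size, first, last in sorted(tensors, key=lambda t: t[2]):
        chosen = None
        for index, buf in enumerate(buffers):
            # Reuse only if the previous tenant is dead before this one is born.
            if buf["free_after"] < first:
                chosen = index
                break
        if chosen is None:
            buffers.append({"size": size, "free_after": last})
            chosen = len(buffers) - 1
        else:
            buffers[chosen]["size"] = max(buffers[chosen]["size"], size)
            buffers[chosen]["free_after"] = last
        assignment[name] = chosen
    return assignment, [buf["size"] for buf in buffers]

# A four-layer chain: each output is written at step i and read at step i + 1.
tensors = [
    ("t0", 100, 0, 1),
    ("t1", 100, 1, 2),
    ("t2", 100, 2, 3),
    ("t3", 100, 3, 4),
]
assignment, buffer_sizes = plan_memory(tensors)
naive_total = sum(size for _, size, _, _ in tensors)
pooled_total = sum(buffer_sizes)
```

In this chain the planner needs only two buffers, used alternately, instead of four separate allocations. Savings on a real network depend on its topology and tensor sizes.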
Performance comparison among mainstream models: TNN outperforms other mainstream open-source mobile high-performance frameworks.

Kirin 970:

Model	CPU, 1 thread (ms)	GPU (ms)
Mobilenet_v1	88	12
Mobilenet_v1_int8	55	-
Mobilenet_v2	58	11
Mobilenet_v2_int8	41	-
squeezenet_v1.0	127	20
squeezenet_v1.0_int8	82	-

Snapdragon 835:

Model	CPU, 1 thread (ms)	GPU (ms)
Mobilenet_v1	94	16
Mobilenet_v1_int8	62	-
Mobilenet_v2	61	14
Mobilenet_v2_int8	47	-
squeezenet_v1.0	122	28
squeezenet_v1.0_int8	93	-

Snapdragon 845:

Model	CPU, 1 thread (ms)	GPU (ms)
Mobilenet_v1	60	10
Mobilenet_v1_int8	37	-
Mobilenet_v2	39	8
Mobilenet_v2_int8	28	-
squeezenet_v1.0	74	14
squeezenet_v1.0_int8	56	-

TNN Architecture Diagram

TNN supports TensorFlow, PyTorch, MXNet, Caffe, and other training frameworks through ONNX, benefiting from the continuous improvement of the ONNX open-source community. TNN currently supports 55 ONNX operators and is expected to cover 80 shortly, which includes most of the operators that mainstream CNNs need.
TNN runs on mainstream operating systems (Android, iOS, embedded Linux, Windows) and is compatible with ARM CPU and GPU hardware platforms (Da Vinci NPU support is coming soon).
TNN has a modular design that abstracts and isolates components such as model parsing, graph construction, graph optimization, low-level hardware adaptation, and high-performance kernels. It uses the factory pattern to register and build devices, which minimizes the cost of supporting additional hardware and acceleration solutions.
The TNN runtime does not depend on any third-party libraries. The CPU dynamic library is only about 400 KB and includes basic image conversion operations, which keeps it lightweight and convenient. TNN uses the same models and interfaces on every platform, and switching between platforms requires changing a single configuration parameter.