Efficient AI Model Inference with Paddle Lite on JM9230 GPU

Introduction

The JM9230, developed by Jingjia Micro, is a domestic GPU with a peak single-precision (FP32) floating-point performance of 1.2 TFLOPS. Paddle Lite, created by Baidu, is a high-performance, lightweight deep learning inference engine designed for mobile and embedded devices. Integrating Paddle Lite with the JM9230 GPU opens up new opportunities for scheduling AI model operators on domestic platforms. The embedded laboratory team, comprising Hu Yuhao, Wang Yuan, Liu Yan, and Xie Guoqi, has completed the adaptation of the Paddle Lite inference engine to the JM9230 GPU and verified AI model inference on it.

Main Content

01

Overview of JM9230 GPU

The JM9230 graphics card is powered by a self-designed, high-performance domestic GPU. It supports four independent display outputs, multi-screen simultaneous output, four-channel video decoding, and one video encoding channel. The card is compatible with graphics programming interfaces such as OpenGL 4.0 and Vulkan 1.1, as well as the OpenCL 3.0 compute programming interface. It also accommodates four channels of 4K@60fps HDMI 2.0 external video input. The JM9230 is fully compatible with domestic CPUs, operating systems, and firmware, making it suitable for a wide range of computing devices, including PCs, servers, and graphic workstations. It meets high-performance display and AI computation needs in areas such as geographic information systems, 3D surveying, 3D mapping, media processing, computer-aided design, and rendering.
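Once the kernel driver and application libraries are installed (see Section 2.1), the card's OpenCL support can be verified from the command line. A minimal check, assuming the standard clinfo utility is available (it may need to be installed separately):

# List OpenCL platforms and devices; the JM9230 should appear as a GPU device
$ clinfo | grep -E "Platform Name|Device Name|Device Version"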

Overview of Baidu Paddle Lite Framework

Paddle Lite is a high-performance, lightweight deep learning inference engine developed by Baidu, specifically designed for mobile, embedded, and IoT devices. It supports various hardware platforms, including ARM, x86, MIPS, and RISC-V, and can operate on Android, iOS, and Linux operating systems. Paddle Lite offers a comprehensive suite of model optimization and acceleration techniques, such as quantization, pruning, and subgraph partitioning, all aimed at enhancing the inference speed and efficiency of AI models while minimizing resource consumption. As part of Baidu's PaddlePaddle open-source deep learning platform, Paddle Lite seamlessly integrates with PaddlePaddle, providing developers with end-to-end AI solutions.

02

Environment Setup

System Specifications: Feiteng S2500 CPU, Kylin V10 SP3 Operating System

2.1 Installing JM9230 Kernel Driver and Application Libraries

Note: Install the RPM package first, then insert the graphics card into the PCIe slot.

Required Installation Packages:

  • Kernel Driver RPM Package: mwv207-dkms-1.5.0-release.ky10.aarch64.rpm
  • Application Library RPM Package: mwv207-dev-1.5.0-release.ky10.aarch64.rpm

Installation Commands:

$ rpm -ivh mwv207-dkms-1.5.0-release.ky10.aarch64.rpm
$ rpm -ivh mwv207-dev-1.5.0-release.ky10.aarch64.rpm
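After installation, it is worth confirming that both packages registered correctly and that the kernel module is loaded once the card is in place. A quick check, assuming the module is named mwv207 after the package:

# Both the dkms and dev packages should be listed
$ rpm -qa | grep mwv207

# The kernel module should be loaded once the card is detected
$ lsmod | grep mwv207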

2.2 Compiling and Installing Paddle Lite

Paddle Lite Version: v2.12

Reference Tutorial:
OpenCL Guide

Clone Paddle Lite Source Code:

$ git clone https://github.com/PaddlePaddle/Paddle-Lite.git -b v2.12

Delete the Third-Party Library Directory (inside the cloned source tree):

$ cd Paddle-Lite
$ rm -rf third-party

Compile and Generate ARM64 Deployment Library:

$ ./lite/tools/build_linux.sh --arch=armv8 --with_extra=ON --with_cv=ON --with_exception=ON --with_opencl=ON full_publish

Compilation Note:
Paddle Lite defaults to using 4 cores for compilation. It is advisable to set the number of CPU cores to use, for example all available cores:

$ export LITE_BUILD_THREADS=$(nproc)

Then proceed with the compilation.
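If the build completes successfully, the deployment library should be generated under the build output directory. A quick check, with the output path inferred from the build options above:

$ ls build.lite.linux.armv8.gcc/inference_lite_lib.armlinux.armv8.opencl/cxx/lib/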

03

Model Inference & Result Demonstration

3.1 Downloading Paddle Lite Demo

Execute the following command in the Paddle-Lite source directory:

$ wget https://paddlelite-demo.bj.bcebos.com/devices/generic/PaddleLite-generic-demo_v2_12_0.tar.gz

Extract the Files:

$ tar -xvf PaddleLite-generic-demo_v2_12_0.tar.gz

3.2 Replacing the Compiled Paddle Lite Library

Execute the following commands to replace the necessary files:

# Replace include directory
$ cp -rf build.lite.linux.armv8.gcc/inference_lite_lib.armlinux.armv8.opencl/cxx/include/ PaddleLite-generic-demo/libs/PaddleLite/linux/arm64/include/

# Replace libpaddle_light_api_shared.so
$ cp -rf build.lite.linux.armv8.gcc/inference_lite_lib.armlinux.armv8.opencl/cxx/lib/libpaddle_light_api_shared.so PaddleLite-generic-demo/libs/PaddleLite/linux/arm64/lib/opencl/

# Replace libpaddle_full_api_shared.so
$ cp -rf build.lite.linux.armv8.gcc/inference_lite_lib.armlinux.armv8.opencl/cxx/lib/libpaddle_full_api_shared.so PaddleLite-generic-demo/libs/PaddleLite/linux/arm64/lib/opencl/
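To confirm that the freshly compiled libraries, rather than the prebuilt ones shipped with the demo, are now in place, the file timestamps can be inspected (paths follow the copy commands above):

$ ls -l PaddleLite-generic-demo/libs/PaddleLite/linux/arm64/lib/opencl/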

3.3 Running the Inference Model

Change to the model demo directory:

$ cd PaddleLite-generic-demo/image_classification_demo/shell/

Recompile the demo:

$ ./build.sh linux arm64

Use the JM9230 GPU to perform inference with the ResNet50 model:

$ ./run.sh resnet50_fp32_224 imagenet_224.txt test linux arm64 opencl
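For a rough backend comparison, the same demo can also be run on the CPU by swapping the last argument; based on the generic demo's documented usage, cpu selects the ARM CPU backend (treat this as an assumption to verify against the demo's run.sh):

$ ./run.sh resnet50_fp32_224 imagenet_224.txt test linux arm64 cpu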

Inference Result Demonstration:

3.4 Important Considerations

When using Paddle Lite with the OpenCL + GPU backend for model inference, it is crucial to pay attention to the dynamic library search paths. On server editions of Linux, the dynamic library path is /usr/lib64, so the Jingjia Micro GPU application library is installed to /usr/lib64/mwv207. Paddle Lite, however, searches the desktop-style path /usr/lib/aarch64-linux-gnu ahead of the server path, and may therefore fail to find the OpenCL library. It is recommended to create a symbolic link to the OpenCL library in that path:

$ ln -s /usr/lib64/mwv207/libOpenCL.so.3.0.0 /usr/lib/aarch64-linux-gnu/libOpenCL.so
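To confirm that the link resolves and the dynamic loader can now locate the library, both checks below use standard Linux tools:

# The link should point at /usr/lib64/mwv207/libOpenCL.so.3.0.0
$ ls -l /usr/lib/aarch64-linux-gnu/libOpenCL.so

# Refresh the loader cache and confirm the library is visible
$ sudo ldconfig && ldconfig -p | grep -i opencl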

Conclusion

The Embedded Laboratory at Hunan University has completed the adaptation of Baidu Paddle Lite to the Jingjia Micro JM9230 GPU on a Feiteng S2500 CPU running the Kylin V10 operating system. This work combines heterogeneous computing resources to exploit the computational strengths of each hardware component, enabling efficient and flexible scheduling of AI model operators, and provides a viable technical route for running AI applications on domestic heterogeneous computing platforms.