ARM Compute Library

ARM Compute Library
https://developer.arm.com/technologies/compute-library
An image processing and CNN library released by ARM
Available on Linux / Android / Bare Metal
2017.04.01 (Sat)
@Vengineer

Preparing the cross-compilers
AArch64 : arm64-v8a
gcc-linaro-5.3-2016.02-x86_64_aarch64-linux-gnu
ARM : armv7a
aro/gcc-linaro-5.3-2016.02-x86_64_arm-linux-gnueabihf

OpenCL support
Available only when libOpenCL.so supports the GPU (ARM Mali)
This deck covers NEON only

Image processing functions
　・Basic arithmetic, mathematical and binary operator functions
　・Colour manipulation (conversion, channel extraction, and more)
　・Convolution filters (Sobel, Gaussian, and more)
　・Canny Edge, Harris corners, optical flow and more
　・Pyramids (such as Laplacians)
　・HOG (Histogram of Oriented Gradients)
　・SVM (Support Vector Machines)
　・H/SGEMM (Half and Single precision General Matrix Multiply)

Convolutional Neural Network functions
　・Activation
　・Convolution
　・Fully connected
　・Locally connected
　・Normalization
　・Pooling
　・Soft-max

Sample code: scale (NEON)
PPMLoader ppm; // PPM file loader
Image src, dst; // image buffers
ppm.open(argv[1]); // open the file
ppm.init_image(src, Format::U8); // initialise the image from the file
constexpr int scale_factor = 2;
TensorInfo dst_tensor_info( // destination tensor info
    src.info()->dimension(0) / scale_factor,
    src.info()->dimension(1) / scale_factor,
    Format::U8);

Sample code: scale (NEON), continued
dst.allocator()->init(dst_tensor_info); // initialise dst
NEScale scale; // scale function
scale.configure(&src, &dst, // configure
    InterpolationPolicy::NEAREST_NEIGHBOR,
    BorderMode::UNDEFINED);
src.allocator()->allocate(); // allocate memory
dst.allocator()->allocate(); // allocate memory
scale.run(); // execute
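
To make the effect of the scale sample concrete, here is a minimal standalone sketch of nearest-neighbour downscaling by an integer factor on a U8 image. It does not use the ARM Compute Library: `GrayImage` and `downscale_nearest` are hypothetical names introduced for illustration, and sampling at the top-left source pixel of each block is an assumption; NEScale may use a different sampling position and is vectorised with NEON.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for illustration only (not an ARM Compute Library type).
struct GrayImage
{
    std::size_t               width;
    std::size_t               height;
    std::vector<std::uint8_t> pixels; // row-major, one byte per pixel (U8)
};

// Downscale by an integer factor with nearest-neighbour sampling:
// each destination pixel copies the source pixel at the scaled coordinate.
GrayImage downscale_nearest(const GrayImage &src, std::size_t factor)
{
    assert(factor > 0);
    assert(src.pixels.size() == src.width * src.height);

    // Same shape computation as dst_tensor_info in the sample: dimension / scale_factor.
    GrayImage dst{ src.width / factor, src.height / factor, {} };
    dst.pixels.resize(dst.width * dst.height);

    for(std::size_t y = 0; y < dst.height; ++y)
    {
        for(std::size_t x = 0; x < dst.width; ++x)
        {
            dst.pixels[y * dst.width + x] = src.pixels[(y * factor) * src.width + (x * factor)];
        }
    }
    return dst;
}
```

Calling `downscale_nearest(src, 2)` on a 4x2 image yields a 2x1 image, mirroring how the sample derives the destination dimensions from `scale_factor`.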

Sample code: convolution (NEON)
PPMLoader ppm; // PPM file loader
Image src, tmp, dst; // image buffers
ppm.open(argv[1]); // open the file
ppm.init_image(src, Format::U8); // initialise the image from the file
tmp.allocator()->init(*src.info()); // initialise tmp with the same info as src
dst.allocator()->init(*src.info()); // initialise dst with the same info as src
NEConvolution3x3 conv3x3; // 3x3 convolution
NEConvolution5x5 conv5x5; // 5x5 convolution

Sample code: convolution (NEON), continued
conv3x3.configure(&src, &tmp, // configure: src -> tmp
    gaussian3x3, 0, BorderMode::UNDEFINED);
conv5x5.configure(&tmp, &dst, // configure: tmp -> dst
    gaussian5x5, 0, BorderMode::UNDEFINED);
src.allocator()->allocate(); // allocate memory
tmp.allocator()->allocate(); // allocate memory
dst.allocator()->allocate(); // allocate memory
conv3x3.run(); // execute
conv5x5.run(); // execute
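
As a rough illustration of what a 3x3 convolution stage computes, the following standalone sketch applies a 3x3 kernel to a U8 image in plain C++. Everything here is an assumption made for illustration: `convolve3x3` and `kGaussian3x3` are hypothetical names, the coefficient values are a common Gaussian approximation rather than values quoted in this deck, and a scale of 0 is treated as "use the sum of the coefficients". Border pixels are written as 0 here, whereas with BorderMode::UNDEFINED their values should not be relied upon.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Assumed 3x3 Gaussian coefficients (row-major), for illustration only.
constexpr std::array<std::int16_t, 9> kGaussian3x3 = { { 1, 2, 1, 2, 4, 2, 1, 2, 1 } };

// Plain C++ 3x3 convolution over a row-major U8 image.
// scale == 0 is interpreted as "sum of coefficients, or 1 if that sum is 0" (assumption).
std::vector<std::uint8_t> convolve3x3(const std::vector<std::uint8_t> &src,
                                      std::size_t width, std::size_t height,
                                      const std::array<std::int16_t, 9> &kernel,
                                      int scale)
{
    assert(src.size() == width * height);
    if(scale == 0)
    {
        const int sum = std::accumulate(kernel.begin(), kernel.end(), 0);
        scale         = (sum == 0) ? 1 : sum;
    }

    std::vector<std::uint8_t> dst(src.size(), 0); // border pixels are left at 0 in this sketch
    for(std::size_t y = 1; y + 1 < height; ++y)
    {
        for(std::size_t x = 1; x + 1 < width; ++x)
        {
            int acc = 0;
            for(std::size_t ky = 0; ky < 3; ++ky)
            {
                for(std::size_t kx = 0; kx < 3; ++kx)
                {
                    acc += kernel[ky * 3 + kx] * src[(y + ky - 1) * width + (x + kx - 1)];
                }
            }
            acc /= scale;
            // Saturate to the U8 range before storing.
            dst[y * width + x] = static_cast<std::uint8_t>(std::min(255, std::max(0, acc)));
        }
    }
    return dst;
}
```

Chaining two calls, with the output of the first used as the input of the second, corresponds to the src -> tmp -> dst pipeline in the sample.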

Scheduler
arm_compute/runtime/CPP/CPPScheduler.h
arm_compute/runtime/NEON/NEScheduler.h
namespace arm_compute
{
using NEScheduler = CPPScheduler;
}
NEScheduler is simply an alias for CPPScheduler

multithread (single-threaded path)
void CPPScheduler::multithread(ICPPKernel *kernel, const size_t split_dimension)
{
    const Window &max_window = kernel->window();
    const int num_iterations = max_window.num_iterations(split_dimension);
    int num_threads = std::min(num_iterations, _num_threads);
    if(!kernel->is_parallelisable() || 1 == num_threads)
    {
        kernel->run(max_window); // run the whole window on the calling thread
    }
}

multithread (multi-threaded path)
for(int t = 0; t < num_threads; ++t)
{
    Window win = max_window.split_window(split_dimension, t, num_threads);
    win.set_thread_id(t);
    win.set_num_threads(num_threads);
    if(t != num_threads - 1)
    {
        _threads[t].start(kernel, win); // hand the sub-window to a worker thread
    }
    else
    {
        kernel->run(win); // the last sub-window runs on the calling thread
    }
}
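
The scheduling pattern in the two multithread excerpts can be sketched without the library as follows. This is a minimal illustration using std::thread, not the CPPScheduler implementation: `split_range` and `run_split` are hypothetical names, and the way the remainder is distributed across sub-ranges is an assumption rather than the behaviour of `Window::split_window`. As in the excerpt, all sub-ranges except the last go to worker threads and the last one runs on the calling thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: half-open sub-range [begin, end) for worker `id` out of `total`.
// The first (num_iterations % total) workers receive one extra iteration.
std::pair<std::size_t, std::size_t> split_range(std::size_t num_iterations, std::size_t id, std::size_t total)
{
    assert(total > 0 && id < total);
    const std::size_t base  = num_iterations / total;
    const std::size_t rem   = num_iterations % total;
    const std::size_t begin = id * base + std::min(id, rem);
    const std::size_t len   = base + (id < rem ? 1 : 0);
    return { begin, begin + len };
}

// Run kernel(begin, end) over [0, num_iterations), split across at most max_threads threads.
void run_split(std::size_t num_iterations, std::size_t max_threads,
               const std::function<void(std::size_t, std::size_t)> &kernel)
{
    const std::size_t num_threads = std::min(num_iterations, max_threads);
    if(num_threads <= 1)
    {
        kernel(0, num_iterations); // single-threaded path: run the whole range here
        return;
    }

    std::vector<std::thread> workers;
    workers.reserve(num_threads - 1);
    for(std::size_t t = 0; t < num_threads; ++t)
    {
        const std::pair<std::size_t, std::size_t> range = split_range(num_iterations, t, num_threads);
        if(t != num_threads - 1)
        {
            workers.emplace_back(kernel, range.first, range.second); // worker thread
        }
        else
        {
            kernel(range.first, range.second); // last sub-range on the calling thread
        }
    }
    for(std::thread &w : workers)
    {
        w.join(); // wait for every worker before returning
    }
}
```

Running the last sub-range on the calling thread means only num_threads - 1 workers are needed, and the caller does useful work instead of just waiting.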

Sample kernel: NEScaleKernel
void NEScaleKernel::run(const Window &window)
{
    ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
    ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(INEKernel::window(), window);
    ARM_COMPUTE_ERROR_ON(_func == nullptr);
    (this->*_func)(window); // dispatch through the member-function pointer
}
// _func is set to one of the following, depending on the interpolation policy:
_func = &NEScaleKernel::scale_nearest;
_func = &NEScaleKernel::scale_bilinear;
_func = &NEScaleKernel::scale_area;
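
The dispatch idiom used by NEScaleKernel::run, a pointer to member function chosen once and invoked with `(this->*_func)(...)`, can be shown in isolation. The class below is a hypothetical sketch and not library code: `ScaleKernelSketch`, `Policy`, and the string return values exist only to make the selected routine observable.

```cpp
#include <cassert>
#include <string>

// Hypothetical policy enum standing in for the interpolation policy.
enum class Policy
{
    Nearest,
    Bilinear,
    Area
};

// Hypothetical kernel: configure() picks the routine once, run() dispatches to it.
class ScaleKernelSketch
{
public:
    void configure(Policy policy)
    {
        switch(policy)
        {
            case Policy::Nearest:
                _func = &ScaleKernelSketch::scale_nearest;
                break;
            case Policy::Bilinear:
                _func = &ScaleKernelSketch::scale_bilinear;
                break;
            case Policy::Area:
                _func = &ScaleKernelSketch::scale_area;
                break;
        }
    }

    bool is_configured() const
    {
        return _func != nullptr;
    }

    std::string run()
    {
        assert(_func != nullptr); // mirrors the "unconfigured kernel" check
        return (this->*_func)();  // call through the member-function pointer
    }

private:
    std::string scale_nearest() { return "nearest"; }
    std::string scale_bilinear() { return "bilinear"; }
    std::string scale_area() { return "area"; }

    std::string (ScaleKernelSketch::*_func)() = nullptr;
};
```

Selecting the routine at configure time keeps the policy decision out of run(), which may be called repeatedly and on several sub-windows.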
