Open Neural Network Exchange

ONNX Runtime
Developer	Microsoft
Initial release	November 30, 2018
Latest release	v1.16.1 / October 12, 2023
Repository	github.com/microsoft/onnxruntime
Programming languages	Python, C++, C#, C, Java, JavaScript, Objective-C, WinRT
Operating systems	Windows, Linux, macOS, Android, iOS, web browsers
Platforms	x86-64, x86, ARM64, ARM32, IBM Power
License	MIT License
Website	onnxruntime.ai

Open Neural Network Exchange (ONNX)
Developer	Facebook, Microsoft
Initial release	September 7, 2017
Repository	github.com/onnx/onnx
Type	Artificial intelligence, machine learning
License	Apache License 2.0
Website	onnx.ai

Open Neural Network Exchange (ONNX) is a widely used open-source format for representing machine learning and artificial intelligence models^[1]. ONNX Runtime is developed alongside it as an execution engine.

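Overview

Machine learning models, neural networks in particular, are trained in a variety of frameworks and executed (for inference) on a variety of hardware. A model tied to one environment cannot be used in another framework or on other hardware, so interoperability is lost. Implementers also have to support each environment separately, which takes considerable effort.

ONNX addresses these problems by providing a unified interface (format) for describing models. Each framework exports its trained models in ONNX format, and each hardware platform provides an ONNX execution environment, so a model can be run for inference regardless of the framework it was trained in. ONNX is developed as an interoperable model format in this sense.

Development began in 2017.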

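Development background

ONNX was developed to provide the following properties.

Framework interoperability

Developers can move more easily between frameworks, since different frameworks suit different phases of development, such as fast training, flexibility in network architecture, or inference on mobile devices^[2].

Shared optimization

Hardware vendors and others can improve the performance of neural networks across multiple frameworks at once by targeting their optimizations at ONNX^[2].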

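History

In September 2017, Facebook and Microsoft launched the project as an effort to enable interoperability between machine learning frameworks such as PyTorch and Caffe2. IBM, Huawei, Intel, AMD, ARM, and Qualcomm subsequently announced their support for the initiative^[1].

In October 2017, Microsoft announced ONNX support in its Cognitive Toolkit and Project Brainwave platform^[1].

In November 2019, ONNX was accepted as a graduate project of Linux Foundation AI.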

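Structure

ONNX focuses on inference (evaluation) and provides definitions of an extensible computation graph model, built-in operators, and standard data types^[2].

Each dataflow graph is a list of nodes that form a directed acyclic graph. Nodes have inputs and outputs, and each node invokes an operator. Metadata documents the graph. The built-in operators are available in every framework that supports ONNX^[2].

A graph can be saved with Protocol Buffers as a binary file with the .onnx extension^[3]. This file can be read and written by a variety of machine learning libraries.

The ONNX specification consists of two sub-specifications, IR and Operator. Each is versioned independently, and an ONNX specification version designates a specific version of each. As of December 22, 2021, the latest version was 1.10.2, which comprises IR v8 and operator set versions 15, 2, and 1 (for ai.onnx, ai.onnx.ml, and ai.onnx.training respectively)^[4].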

ONNX IR

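The Open Neural Network Exchange Intermediate Representation (ONNX IR) is the sub-specification that defines ONNX's basic data types and computation graph^[5]. It defines the elements that make up a computation graph (Model, Graph, Node, and so on), the input/output types Tensor, Sequence, and Map, and basic element data types such as FLOAT, INT8, and BFLOAT16. As of December 22, 2021, the latest version was version 8^[4].

Elements defined by ONNX IR include the following.

Graph: represents a computation graph. Its properties are input/initializer^[6]^[7]/output, which specify the graph's inputs and outputs; node, which lists the computation nodes; and name/doc_string/value_info, which hold metadata.
Node: represents a computation node. Its properties are input/output, which specify the node's inputs and outputs; domain/op_type/attribute, which specify the operator and its parameters; and name/doc_string, which hold metadata.

In other words, each Node stored in a Graph is an operation with inputs and outputs (for example, Conv), and the Nodes are wired into a graph structure by matching Node and Graph input/output names.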
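The name-based wiring can be illustrated with a small pure-Python sketch. The dictionaries below are hypothetical, simplified stand-ins for Node and Graph, not the real protobuf classes, and only two operators are modeled. Real ONNX files store nodes in topological order; the sketch lists them out of order on purpose to show that the names alone determine the structure.

```python
import operator

# Hypothetical, simplified operator table (illustration only).
OPS = {"Mul": operator.mul, "Add": operator.add}


def run_graph(nodes, initializers, inputs, output_names):
    """Evaluate every node once all of its named inputs are available."""
    # Graph inputs override initializers of the same name (default-value semantics).
    values = {**initializers, **inputs}
    pending = list(nodes)
    while pending:
        ready = [n for n in pending if all(name in values for name in n["input"])]
        if not ready:
            raise ValueError("graph has a cycle or a missing input")
        for node in ready:
            args = [values[name] for name in node["input"]]
            values[node["output"][0]] = OPS[node["op_type"]](*args)
            pending.remove(node)
    return [values[name] for name in output_names]


nodes = [
    {"op_type": "Add", "input": ["AX", "B"], "output": ["Y"]},  # depends on "AX"
    {"op_type": "Mul", "input": ["A", "X"], "output": ["AX"]},  # produces "AX"
]
result = run_graph(nodes, {"A": 2.0, "B": 3.0}, {"X": 4.0}, ["Y"])
print(result)  # [11.0]
```

The Add node runs only after Mul has produced the value named "AX", even though it appears first in the list.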

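Extension operators

In addition to the standard operators defined by ONNX Operator, ONNX IR is designed to accept custom extension operators^[8]. This is what makes ONNX "extensible"^[9]. A model announces its use of extension operators to the execution engine by listing the extension operator sets in the opset_import property of the Model^[10]. An execution engine that receives an ONNX model checks opset_import; it accepts the model if it supports every listed operator set, and otherwise rejects the whole Model^[11].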
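The accept-or-reject rule can be sketched as follows. This is an illustration under stated assumptions, not the logic of any particular runtime: the function name, the table of supported sets, and the domain com.example.custom are all made up, and operator sets are reduced to (domain, version) pairs.

```python
# Hypothetical table of operator sets this imaginary engine supports.
SUPPORTED_OPSETS = {
    "": range(1, 19),                   # default ai.onnx domain, versions 1..18 (assumed)
    "com.example.custom": range(1, 2),  # made-up extension domain, version 1 only
}


def accepts_model(opset_import, supported=SUPPORTED_OPSETS):
    """Return True only if every imported operator set is supported."""
    return all(
        domain in supported and version in supported[domain]
        for domain, version in opset_import
    )


print(accepts_model([("", 15), ("com.example.custom", 1)]))  # True
print(accepts_model([("", 15), ("org.unknown", 1)]))         # False
```

A single unsupported operator set is enough to reject the whole model, which matches the all-or-nothing rule quoted from the specification.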

ONNX Operator

ONNX's built-in operators are defined by the Operator specifications sub-specification^[12]. Three operator sets (opsets) are defined: ai.onnx, ai.onnx.ml, and ai.onnx.training, of which ai.onnx is the default. As of December 12, 2022, the latest version of ai.onnx was version 18^[4].

For example, opset ai.onnx v15 defines RNN, LSTM, and GRU as its recurrent operators.

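Quantization

ONNX provides operators for quantizing inputs and outputs and for operating on quantized values. QuantizeLinear performs linear quantization based on a scale and a zero point (shift)^[13]. DynamicQuantizeLinear performs dynamic uint8 quantization based on the min/max of the input^[14]. Operators that work on 8-bit integer inputs include MatMulInteger, QLinearMatMul, ConvInteger, and QLinearConv.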
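The arithmetic behind these two operators can be sketched in plain Python. The first function follows the formula quoted in the operator documentation, y = saturate(round(x / y_scale) + y_zero_point), for the uint8 case. The second derives the scale and zero point from the data range in the way DynamicQuantizeLinear is usually described; it is a simplified illustration on Python lists, and the guard for all-zero input is an assumption of this sketch.

```python
def quantize_linear(xs, y_scale, y_zero_point=0, qmin=0, qmax=255):
    """y = saturate(round(x / y_scale) + y_zero_point), here for uint8."""
    # Python's round() rounds ties to even, the rounding mode named in the spec.
    return [min(qmax, max(qmin, round(x / y_scale) + y_zero_point)) for x in xs]


def dynamic_quantize_linear(xs, qmin=0, qmax=255):
    """Derive scale and zero point from min/max of the data, then quantize."""
    # The range is widened to include 0 so that 0.0 is exactly representable.
    lo, hi = min(0.0, min(xs)), max(0.0, max(xs))
    y_scale = (hi - lo) / (qmax - qmin) or 1.0  # guard for all-zero input (assumption)
    y_zero_point = min(qmax, max(qmin, round(qmin - lo / y_scale)))
    return quantize_linear(xs, y_scale, y_zero_point, qmin, qmax), y_scale, y_zero_point


print(quantize_linear([0.0, 1.0, 2.5, 1000.0, -5.0], y_scale=2.0, y_zero_point=10))
# [10, 10, 11, 255, 8]
quantized, scale, zero_point = dynamic_quantize_linear([-1.0, 0.0, 3.0])
print(quantized, zero_point)  # [0, 64, 255] 64
```

In the first call, 1000.0 saturates at 255, and both 0.5 and -2.5 round to the nearest even integer (0 and -2) before the zero point is added.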

ONNX Runtime

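ONNX Runtime (abbreviated ORT^[15]) is an open-source project that aims to accelerate inference and training of ONNX models across a variety of environments^[16]. ONNX models can be used through a single runtime API regardless of framework, OS, or hardware^[17]. It also automatically applies optimizations suited to the deployment environment^[18]. As a design principle, ONNX Runtime aims to combine abstraction over accelerators and runtimes with performance optimization, and achieves this by automatically partitioning an ONNX model into subgraphs and running each on the most suitable accelerator^[19].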
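The idea of partitioning a model across accelerators can be illustrated with a deliberately simplified sketch. It assigns each node to the first backend, in priority order, that supports the node's operator type. This is not ONNX Runtime's actual algorithm, which works on subgraphs rather than single nodes; the backend names, node names, and supported-operator sets are all hypothetical.

```python
def partition(nodes, providers):
    """Assign each node to the first provider (in priority order) supporting its op."""
    assignment = {}
    for node_name, op_type in nodes:
        for provider_name, supported_ops in providers:
            if op_type in supported_ops:
                assignment[node_name] = provider_name
                break
        else:
            raise ValueError(f"no provider supports {op_type}")
    return assignment


# Highest-priority backend first; the last entry acts as a general fallback.
providers = [
    ("accelerator", {"Conv", "MatMul"}),
    ("cpu", {"Conv", "MatMul", "Relu", "Softmax"}),
]
nodes = [("conv1", "Conv"), ("relu1", "Relu"), ("fc", "MatMul"), ("prob", "Softmax")]
assignment = partition(nodes, providers)
print(assignment)
# {'conv1': 'accelerator', 'relu1': 'cpu', 'fc': 'accelerator', 'prob': 'cpu'}
```

Operators the accelerator cannot handle fall through to the CPU entry, so the model as a whole still runs.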

Optimizations supported by ONNX Runtime include the following.

Model quantization: 8-bit model quantization^[20]
Graph optimization^[21]: three levels, namely Basic (redundant node elimination and some op fusions^[22]), Extended (op fusions^[23]), and Layout (NCHWc optimizer^[24])
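In the Python API, the graph optimization level is chosen through SessionOptions. The following is a minimal sketch, assuming the onnxruntime package is installed when create_session is called; the helper names and the short level names ("basic", "extended", "layout") are made up for illustration, while SessionOptions, GraphOptimizationLevel, and optimized_model_filepath are part of ONNX Runtime's API.

```python
# Map this article's level names to GraphOptimizationLevel member names.
LEVEL_TO_ENUM_NAME = {
    "none": "ORT_DISABLE_ALL",
    "basic": "ORT_ENABLE_BASIC",
    "extended": "ORT_ENABLE_EXTENDED",
    "layout": "ORT_ENABLE_ALL",  # layout optimizations are enabled together with all others
}


def enum_name_for(level):
    """Translate a short level name into the enum member name."""
    try:
        return LEVEL_TO_ENUM_NAME[level.lower()]
    except KeyError:
        raise ValueError(f"unknown optimization level: {level!r}") from None


def create_session(model_path, level="layout", optimized_model_path=None):
    """Create a CPU inference session with the requested optimization level."""
    import onnxruntime as ort  # imported lazily so the mapping above works without it

    options = ort.SessionOptions()
    options.graph_optimization_level = getattr(
        ort.GraphOptimizationLevel, enum_name_for(level)
    )
    if optimized_model_path is not None:
        # Ask ONNX Runtime to write the optimized graph to this path.
        options.optimized_model_filepath = optimized_model_path
    return ort.InferenceSession(
        model_path, sess_options=options, providers=["CPUExecutionProvider"]
    )
```

For example, create_session("model.onnx", level="basic") would load a model with only the basic optimizations applied, where model.onnx is a placeholder path.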

For supported backends, see the ONNX backends section.

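ONNX models

ONNX models can be generated from a Python script (see the Example section) or converted from other frameworks. Ways of converting from other frameworks include the following:

ONNXMLTools^[25] - converts from a variety of frameworks.
PyTorch's built-in ONNX exporter (torch^[26])
tf2onnx^[26] - converts from TensorFlow.
sklearn-onnx (skl2onnx^[26]) - converts from scikit-learn.

The following collection of ONNX models is also available:

ONNX Model Zoo^[25]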

ONNX backends

ONNX Runtime supports many backends through Execution Providers, which are built as shared libraries^[27]. These include Intel's OpenVINO backend (onnxruntime-openvino) and oneDNN backend; NVIDIA's CUDA backend (onnxruntime-gpu) and TensorRT backend (onnxruntime-gpu); AMD's ROCm and MIGraphX backends; the DirectML backend for Windows (onnxruntime-directml and others); the CoreML backend for macOS and iOS; the NNAPI backend for Android; and the Azure backend for Microsoft Azure^[28]^[27].
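Which execution providers a session uses is decided when the session is created. The sketch below shows one possible selection policy: keep the preferred providers that the installed build actually offers and always fall back to the CPU provider. The helper names are hypothetical; get_available_providers and the providers argument of InferenceSession belong to ONNX Runtime's Python API, and onnxruntime is only needed when create_session is called.

```python
CPU = "CPUExecutionProvider"


def pick_providers(available, preferred):
    """Keep preferred providers that are available, ending with the CPU fallback."""
    chosen = [p for p in preferred if p in available and p != CPU]
    chosen.append(CPU)
    return chosen


def create_session(model_path, preferred):
    """Create a session using the best available providers from `preferred`."""
    import onnxruntime as ort  # imported lazily; requires an installed onnxruntime build

    providers = pick_providers(ort.get_available_providers(), preferred)
    return ort.InferenceSession(model_path, providers=providers)


# Example with a made-up list of available providers:
print(pick_providers(
    ["CUDAExecutionProvider", "CPUExecutionProvider"],
    ["TensorrtExecutionProvider", "CUDAExecutionProvider"],
))
# ['CUDAExecutionProvider', 'CPUExecutionProvider']
```

Providers are tried in list order, so placing the CPU provider last keeps it as the fallback for operators the accelerated backends do not handle.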

NVIDIA's Polygraphy^[29] and trtexec can also generate TensorRT binaries (.trt) for NVIDIA GPUs from ONNX models^[29].

onnx-mlir compiles ONNX models using LLVM's MLIR^[30].

Example

The following Python example assumes that training a linear regression model has produced $y=2x+3$ and saves that model to an ONNX file.

import numpy as np
import onnx
from onnx import TensorProto, numpy_helper
from onnx.helper import make_model, make_node, make_graph, make_tensor_value_info

# Trained parameters, stored in the graph as initializers (constants).
A = numpy_helper.from_array(np.array(2.0), "A")
B = numpy_helper.from_array(np.array(3.0), "B")
# Scalar (rank-0) float64 graph input and output.
X = make_tensor_value_info("X", TensorProto.DOUBLE, [])
Y = make_tensor_value_info("Y", TensorProto.DOUBLE, [])

# Y = A * X + B, expressed as two nodes linked by the intermediate name "AX".
graph = make_graph([
	make_node("Mul", ["A", "X"], ["AX"]),
	make_node("Add", ["AX", "B"], ["Y"]),
], "Linear Regression", [X], [Y], [A, B])
onnx.save(make_model(graph), "2x_3.onnx")

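The following Python example runs the saved model with ONNX Runtime.

import numpy as np
import onnxruntime as ort

# Name the provider explicitly; some ONNX Runtime builds require it.
ort_sess = ort.InferenceSession("2x_3.onnx", providers=["CPUExecutionProvider"])
# run() returns a list of outputs; X = 4.0 gives y = 2 * 4 + 3 = 11.0.
y = ort_sess.run(None, {"X": np.array(4.0)})[0]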

Notes

References

[1] "Microsoft and Facebook's open AI ecosystem gains more support". Engadget. Retrieved 2017-10-11.
[2] "Microsoft and Facebook create open ecosystem for AI model interoperability - Microsoft Cognitive Toolkit". Microsoft Cognitive Toolkit. 2017-09-07. Retrieved 2017-10-11.
[3] onnx/IR.md at main. onnx/onnx. GitHub.
[4] ONNX Versioning. onnx/onnx.
[5] "1. A definition of an extensible computation graph model. 2. Definitions of standard data types. #1 and #2 together make up the ONNX Intermediate Representation, or 'IR', specification" Open Neural Network Exchange Intermediate Representation (ONNX IR) Specification. onnx/onnx.
[6] An initializer is treated either as the default value of the graph input with the same name or as a constant graph input. "When an initializer has the same name as a graph input, it specifies a default value for that input. When an initializer has a name different from all graph inputs, it specifies a constant value." Open Neural Network Exchange Intermediate Representation (ONNX IR) Specification. onnx/onnx. Retrieved 2022-06-09.
[7] "When an initializer has the same name as a graph input, it specifies a default value for that input." Open Neural Network Exchange Intermediate Representation (ONNX IR) Specification. onnx/onnx. Retrieved 2022-06-09.
[8] "An implementation MAY extend ONNX by adding operators expressing semantics beyond the standard set of operators" Open Neural Network Exchange Intermediate Representation (ONNX IR) Specification. onnx/onnx.
[9] "Extensible computation graph model ... expressing semantics beyond the standard set of operators" Open Neural Network Exchange Intermediate Representation (ONNX IR) Specification. onnx/onnx.
[10] "The mechanism for this is adding operator sets to the opset_import property in a model that depends on the extension operators." Open Neural Network Exchange Intermediate Representation (ONNX IR) Specification. onnx/onnx.
[11] "An implementation must support all operators in the set or reject the model." Open Neural Network Exchange Intermediate Representation (ONNX IR) Specification. onnx/onnx.
[12] "Operator specifications that may be referenced by a given ONNX graph." ONNX Versioning. onnx/onnx.
[13] "QuantizeLinear The linear quantization operator. It consumes a high precision tensor, a scale, and a zero point ... The quantization formula is y = saturate ((x / y_scale) + y_zero_point). ... For (x / y_scale), it's rounding to nearest ties to even." Operator Schemas. ONNX. Retrieved 2022-03-13.
[14] "DynamicQuantizeLinear for Scale, Zero Point and FP32->8Bit convertion of FP32 Input data" Operator Schemas. ONNX. Retrieved 2022-03-13.
[15] "ONNX Runtime (ORT)" Welcome to ONNX Runtime (ORT). ONNX Runtime.
[16] "ONNX Runtime is an open source project that is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms." About. ONNX Runtime.
[17] "It enables acceleration of machine learning inferencing across all of your deployment targets using a single set of API." About. ONNX Runtime.
[18] "ONNX Runtime automatically parses through your model to identify optimization opportunities and provides access to the best hardware acceleration available." About. ONNX Runtime.
[19] "Design principles ONNX Runtime abstracts custom accelerators and runtimes to maximize their benefits across an ONNX model. ... ONNX Runtime partitions the ONNX model graph into subgraphs that align with available custom accelerators and runtimes." About. ONNX Runtime.
[20] Quantize ONNX Models. ONNX Runtime.
[21] Graph Optimizations in ONNX Runtime. ONNX Runtime.
[22] "Redundant node eliminations ... Semantics-preserving node fusions" Graph Optimizations in ONNX Runtime. ONNX Runtime.
[23] "These optimizations include complex node fusions." Graph Optimizations in ONNX Runtime. ONNX Runtime.
[24] "These optimizations change the data layout ... Optimizes the graph by using NCHWc layout instead of NCHW layout." Graph Optimizations in ONNX Runtime. ONNX Runtime.
[25] ONNX models. Microsoft.
[26] Install ONNX to export the model. Microsoft.
[27] Build ONNX Runtime with Execution Providers. Microsoft.
[28] Optimize and Accelerate Machine Learning Inferencing and Training. Microsoft.
[29] NVIDIA Deep Learning TensorRT Documentation. NVIDIA.
[30] Users of MLIR. LLVM Project.
