This post introduces and walks through the AutoGluon "Object Detection" quick start, which uses a YOLOv3 model to detect motorbikes in images.

Object Detection - Quick Start — AutoGluon Documentation 0.2.0 documentation

Tabular data and image classification are covered in separate posts.

https://predora005.hatenablog.com/entry/2021/06/20/190000

[1] Downloading the dataset
[2] Training
[3] Evaluation and prediction
[4] Saving and loading the detector
Closing thoughts
Credits
Appendix

[1] Downloading the dataset

The dataset is a subset of the VOC dataset: 120 images for training, 50 for validation, and 50 for testing.

$ curl -OL 'https://autogluon.s3.amazonaws.com/datasets/tiny_motorbike.zip'
$ unzip tiny_motorbike.zip

[2] Training

After loading the dataset, we run training.

import autogluon.core as ag
from autogluon.vision import ObjectDetector

url = './tiny_motorbike/'
dataset_train = ObjectDetector.Dataset.from_voc(url, splits='trainval')
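from_voc reads data laid out in Pascal VOC format, where each image has an XML annotation listing its objects and their pixel bounding boxes. As a rough illustration of that format, the stdlib-only sketch below parses one annotation and normalizes the box coordinates to the 0 to 1 range, the same kind of values later seen in predict_rois. The sample XML and the parse_voc helper are hypothetical; this is not AutoGluon's own loader.

```python
import xml.etree.ElementTree as ET

# A made-up annotation following the standard Pascal VOC XML layout (illustration only).
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>motorbike</name>
    <bndbox><xmin>50</xmin><ymin>75</ymin><xmax>300</xmax><ymax>300</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>100</xmin><ymin>0</ymin><xmax>250</xmax><ymax>150</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Hypothetical helper: return (filename, [(class, xmin, ymin, xmax, ymax), ...])
    with pixel coordinates divided by the image width/height (0-1 range)."""
    root = ET.fromstring(xml_text.strip())
    width = float(root.findtext("size/width"))
    height = float(root.findtext("size/height"))
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            float(bb.findtext("xmin")) / width,
            float(bb.findtext("ymin")) / height,
            float(bb.findtext("xmax")) / width,
            float(bb.findtext("ymax")) / height,
        ))
    return root.findtext("filename"), boxes

filename, boxes = parse_voc(SAMPLE_XML)
print(filename)
for box in boxes:
    print(box)
```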

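Here num_trials limits hyperparameter tuning to 2 trials (that is, 2 training runs). According to the quick start, however, it is preferable to control the run time with time_limit.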

time_limit = 60*30  # at most 0.5 hour
detector = ObjectDetector()
hyperparameters = {'epochs': 5, 'batch_size': 8}
hyperparameter_tune_kwargs = {'num_trials': 2}  # number of tuning trials
detector.fit(dataset_train, time_limit=time_limit,
             hyperparameters=hyperparameters,
             hyperparameter_tune_kwargs=hyperparameter_tune_kwargs)
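Conceptually, num_trials caps how many hyperparameter configurations are tried, while time_limit caps the total wall-clock time, so tuning ends at whichever limit is reached first. The toy random search below only illustrates that interaction under this assumption; it is not AutoGluon's scheduler, and the search space, the fake objective, and the random_search helper are all hypothetical.

```python
import random
import time

def random_search(objective, space, num_trials, time_limit, seed=0):
    """Toy random search: stop after num_trials trials or once time_limit
    seconds have elapsed, whichever comes first.
    Returns (best_config, best_score, trials_run)."""
    rng = random.Random(seed)
    start = time.monotonic()
    best_config, best_score, trials_run = None, float("-inf"), 0
    for _ in range(num_trials):
        if time.monotonic() - start >= time_limit:
            break  # wall-clock budget exhausted before the next trial
        config = {name: rng.choice(values) for name, values in space.items()}
        score = objective(config)
        trials_run += 1
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score, trials_run

# Hypothetical search space and a fake objective standing in for validation mAP.
space = {"epochs": [5, 10], "batch_size": [8, 16]}

def fake_objective(config):
    return config["epochs"] / 10 - config["batch_size"] / 100

best, score, n = random_search(fake_objective, space, num_trials=2, time_limit=60 * 30)
print(n, best, score)
```

With a generous time_limit the trial count is the binding limit; with a tight one, fewer trials run than num_trials allows.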

Training finished in about 10 minutes on an r5.large instance and about 5 minutes on an m5.xlarge instance.

[3] Evaluation and prediction

evaluate runs an evaluation on the test data.

dataset_test = ObjectDetector.Dataset.from_voc(url, splits='test')

test_map = detector.evaluate(dataset_test)
print(test_map)
# (['motorbike', 'chair', 'bus', 'car', 'dog', 
#   'bicycle', 'cow', 'person', 'boat', 'pottedplant', 'mAP'], 
#  [0.5806065427876039, nan, 0.47272727272727283, 0.007751937984496126, 
#   0.0, nan, nan, nan, nan, nan, 0.26527143837484324])

Printing the mAP (mean Average Precision) shows that it is about 26%.

print("mAP on test dataset: {}".format(test_map[1][-1]))
# mAP on test dataset: 0.26527143837484324
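The mAP in the output above matches the average of the per-class AP values when the nan entries are skipped (while the 0.0 for dog is counted). The check below reproduces the printed number from those values with plain Python; that the evaluation computes mAP exactly this way is an assumption inferred from the numbers, not something the quick start states.

```python
import math

# Per-class AP values copied from the evaluate() output above (the trailing 'mAP' entry excluded).
classes = ['motorbike', 'chair', 'bus', 'car', 'dog',
           'bicycle', 'cow', 'person', 'boat', 'pottedplant']
ap = [0.5806065427876039, float('nan'), 0.47272727272727283, 0.007751937984496126,
      0.0, float('nan'), float('nan'), float('nan'), float('nan'), float('nan')]

# Average only the classes whose AP is defined (skip nan, keep 0.0).
valid = [v for v in ap if not math.isnan(v)]
mean_ap = sum(valid) / len(valid)
print(len(valid), mean_ap)
```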

Below, we take one image from the test data and run prediction on it.

image_path = dataset_test.iloc[0]['image']
result = detector.predict(image_path)
print(result)
#    predict_class  predict_score  \
# 0         person       0.972545   
# 1      motorbike       0.656974   
# 2        bicycle       0.413718   
# ..           ...            ...   
# 85        person       0.035410   
# 86        person       0.034698   
# 87       bicycle       0.034663   
# 
#                                          predict_rois  
# 0   {'xmin': 0.3991624116897583, 'ymin': 0.2802912...  
# 1   {'xmin': 0.3347971439361572, 'ymin': 0.4365810...  
# 2   {'xmin': 0.3935226500034332, 'ymin': 0.4864529...  
# ..                                                ...  
# 85  {'xmin': 0.38480135798454285, 'ymin': 0.438172...  
# 86  {'xmin': 0.8661710619926453, 'ymin': 0.4210591...  
# 87  {'xmin': 0.46432197093963623, 'ymin': 0.484336...  
# 
# [88 rows x 3 columns]

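For each detected object, the result gives its class (predict_class), its score (predict_score), and its bounding box position as coordinates normalized to the 0 to 1 range (predict_rois). Visualizing only the high-scoring detections gives the following image (the source code is in Appendix 3).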

[Figure: detections with a score of 0.7 or higher drawn on the first test image]

You can also run prediction on multiple images at once.

bulk_result = detector.predict(dataset_test)
print(bulk_result)
#      predict_class  predict_score  \
# 0           person       0.972545   
# 1        motorbike       0.656974   
# 2          bicycle       0.413718   
# ...            ...            ...   
# 4594        person       0.034718   
# 4595        person       0.034599   
# 4596        person       0.034501   
# 
#                                            predict_rois  \
# 0     {'xmin': 0.3991624116897583, 'ymin': 0.2802912...   
# 1     {'xmin': 0.3347971439361572, 'ymin': 0.4365810...   
# 2     {'xmin': 0.3935226500034332, 'ymin': 0.4864529...   
# ...                                                 ...   
# 4594  {'xmin': 0.37993040680885315, 'ymin': 0.462566...   
# 4595  {'xmin': 0.9430548548698425, 'ymin': 0.1451357...   
# 4596  {'xmin': 0.4937310814857483, 'ymin': 0.2520754...   
# 
#                                                   image  
# 0     /home/jupyter/tiny_motorbike/JPEGImages/000038...  
# 1     /home/jupyter/tiny_motorbike/JPEGImages/000038...  
# 2     /home/jupyter/tiny_motorbike/JPEGImages/000038...  
# ...                                                 ...  
# 4594  /home/jupyter/tiny_motorbike/JPEGImages/002488...  
# 4595  /home/jupyter/tiny_motorbike/JPEGImages/002488...  
# 4596  /home/jupyter/tiny_motorbike/JPEGImages/002488...  
# 
# [4597 rows x 4 columns]
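bulk_result has one row per detection across all test images, with the source file in the image column. To get, for example, the highest-scoring detection per image, the rows can be grouped by that column; in pandas, bulk_result.loc[bulk_result.groupby('image')['predict_score'].idxmax()] should do this. The plain-Python sketch below shows the same idea on a few hand-made rows; the file names, scores, and the top_detection_per_image helper are invented for illustration.

```python
# Hand-made rows mimicking bulk_result: (image, predict_class, predict_score). Values are invented.
rows = [
    ("img_a.jpg", "person", 0.97),
    ("img_a.jpg", "motorbike", 0.66),
    ("img_b.jpg", "motorbike", 0.88),
    ("img_b.jpg", "person", 0.12),
    ("img_c.jpg", "car", 0.05),
]

def top_detection_per_image(rows, min_score=0.0):
    """Keep the highest-scoring (class, score) per image, ignoring rows below min_score."""
    best = {}
    for image, cls, score in rows:
        if score < min_score:
            continue
        if image not in best or score > best[image][1]:
            best[image] = (cls, score)
    return best

print(top_detection_per_image(rows, min_score=0.1))
```

Images whose detections all fall below min_score (img_c.jpg here) simply drop out of the result.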

[4] Saving and loading the detector

The detector can also be saved and loaded. It is written to the file name given by savefile.

savefile = 'detector.ag'
detector.save(savefile)
new_detector = ObjectDetector.load(savefile)

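Closing thoughts

Training time was fairly short, yet the accuracy is reasonable (a test mAP of about 26%). What surprised me most was that object detection takes only a few lines of code, and being able to do it this easily is very welcome.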

Credits

Eye-catch image: image by Gerd Altmann from Pixabay

Appendix

[Appendix 1] ImportError: libGL.so.1

The following error may occur when using AutoGluon. The root cause is that OpenCV fails to import.

ImportError: libGL.so.1: cannot open shared object file: No such file or directory

The fix seems to vary by environment; on Amazon Linux 2, the following command resolved it.

$ sudo yum install -y mesa-libGL.x86_64
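Before installing packages, it may help to confirm that this is really the cause. The sketch below uses only the standard library: ctypes.util.find_library('GL') asks the system loader tooling for the OpenGL library, and importlib tries to import a module and returns the error message instead of crashing. Checking cv2 this way is an illustrative diagnostic, and the try_import helper is hypothetical, not part of AutoGluon.

```python
import ctypes.util
import importlib

def try_import(module_name):
    """Return None if the module imports cleanly, otherwise the error message."""
    try:
        importlib.import_module(module_name)
        return None
    except ImportError as exc:  # also covers ModuleNotFoundError
        return str(exc)

# Name of the GL library if it can be found, otherwise None.
print("libGL:", ctypes.util.find_library("GL"))
# OpenCV (cv2) is the module that fails when libGL.so.1 is missing.
print("cv2:", try_import("cv2") or "OK")
```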

[Appendix 2] Downloading and extracting the dataset with AutoGluon

The dataset can also be downloaded and extracted from within the program.

import os
import autogluon.core as ag
from autogluon.vision import ObjectDetector

root = './'
# Download the zip archive into root, then extract it there
filename_zip = ag.download('https://autogluon.s3.amazonaws.com/datasets/tiny_motorbike.zip',
                           path=root)
filename = ag.unzip(filename_zip, root=root)

# Load the training split the same way as in section [2]
data_root = os.path.join(root, filename)
dataset_train = ObjectDetector.Dataset.from_voc(data_root, splits='trainval')

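[Appendix 3] Drawing bounding boxes and scores on an image

Take an image path from the dataset and run prediction on it, then keep only the high-scoring detections (here, those with a score of 0.7 or higher).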

image0_path = dataset_test.iloc[0]['image']
result = detector.predict(image0_path).query('predict_score >= 0.7')

The visualization function expects the image as an ndarray, so we load it with Pillow and convert it with np.array.

import numpy as np
from PIL import Image

im0 = Image.open(image0_path)
width, height = im0.size
image0 = np.array(im0)

The bounding boxes are passed as an N×4 ndarray. The normalized coordinates taken from each dictionary are converted to pixel coordinates.

bboxes=[]
for rois in result['predict_rois']:
    x1, y1 = int(rois['xmin']* width), int(rois['ymin']* height)
    x2, y2 = int(rois['xmax']* width), int(rois['ymax']* height)
    bboxes.append([x1, y1, x2, y2])
    
bboxes = np.array(bboxes)
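The conversion above multiplies the normalized coordinates by the image size and truncates with int(). Pulling it into a small pure function makes it easy to test; the sketch below also clamps each value to the image bounds in case a predicted box spills slightly outside the image. The rois_to_pixels name and the clamping are additions for illustration, not part of the original code.

```python
def rois_to_pixels(rois_list, width, height):
    """Convert dicts with normalized xmin/ymin/xmax/ymax (0-1) into
    [x1, y1, x2, y2] pixel boxes clamped to [0, width-1] x [0, height-1]."""
    def clamp(v, hi):
        return max(0, min(v, hi))

    boxes = []
    for rois in rois_list:
        x1 = clamp(int(rois['xmin'] * width), width - 1)
        y1 = clamp(int(rois['ymin'] * height), height - 1)
        x2 = clamp(int(rois['xmax'] * width), width - 1)
        y2 = clamp(int(rois['ymax'] * height), height - 1)
        boxes.append([x1, y1, x2, y2])
    return boxes

# Hypothetical predictions; the second box extends slightly past the image edges.
sample = [{'xmin': 0.1, 'ymin': 0.2, 'xmax': 0.6, 'ymax': 0.8},
          {'xmin': -0.01, 'ymin': 0.5, 'xmax': 1.02, 'ymax': 1.0}]
print(rois_to_pixels(sample, width=500, height=375))
```

The returned list can be wrapped with np.array before passing it to viz.plot_bbox, as in the original code.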

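Passing the prepared arguments to viz.plot_bbox produces an image like the one in section [3].

import matplotlib.pyplot as plt
from gluoncv.utils import viz

scores = result['predict_score'].values
# factorize() returns an integer code per row plus the unique class names
labels, class_names = result['predict_class'].factorize()

ax = viz.plot_bbox(image0, bboxes=bboxes, scores=scores,
                   labels=labels, class_names=class_names)
plt.show()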

08. Finetune a pretrained detection model — gluoncv 0.11.0 documentation