Trying Out OpenAI Fine-Tuning

Presenter: K.H

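About Fine-Tuning
Fine-tuning trains an existing model on additional data to create a new model.
There are two main reasons to fine-tune an existing model:
• The existing model does not produce the expected results
• You want to shorten processing time
At present, the only models that can be used as a base are davinci, curie, babbage, ada, or a model previously created through fine-tuning.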

The Fine-Tuning Workflow
Fine-tuning is carried out in the following steps:
1. Prepare the training data
2. Format the training data with the CLI
3. Run the fine-tuning

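Preparing the Training Data
Prepare the training data in JSONL format, with one JSON object per line, as shown below.
Generally, the more training data there is, the better the quality of the resulting model. The amount needed depends on the task, but OpenAI recommends preparing at least a few hundred examples.
{"prompt":"{prompt content}","completion":"{expected result}"}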
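As a minimal sketch of producing a file in this format, the following Python uses only the standard library to write prompt/completion pairs as one JSON object per line. The function name write_jsonl, the file name, and the sample pairs are hypothetical and chosen only for illustration.

```python
import json
import os
import tempfile


def write_jsonl(pairs, path):
    """Write (prompt, completion) pairs to `path`, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            record = {"prompt": prompt, "completion": completion}
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Hypothetical sample data, for illustration only.
sample_pairs = [
    ("I loved every minute of it.", "Positive"),
    ("I could barely sit through it.", "Negative"),
]
sample_path = os.path.join(tempfile.mkdtemp(), "reviews.jsonl")
write_jsonl(sample_pairs, sample_path)
```

Writing each record with json.dumps rather than by string concatenation avoids broken lines when a prompt contains quotes or line breaks.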

Preparing the Training Data
For example, to fine-tune a model to judge a movie's rating (positive or negative) from its review, the training data looks like this. The prompt holds the movie review and the completion holds the movie's rating.
{"prompt":"This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it.","completion":"Negative"}
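A record like this can be generated from a labeled review. The sketch below assumes integer labels where 0 means negative and 1 means positive; that encoding, the LABEL_NAMES table, and the to_training_line helper are hypothetical and not taken from the slides.

```python
import json

# Assumed label encoding: 0 = negative, 1 = positive (hypothetical).
LABEL_NAMES = {0: "Negative", 1: "Positive"}


def to_training_line(review, label):
    """Return one JSONL line for a review and its integer label."""
    record = {"prompt": review, "completion": LABEL_NAMES[label]}
    # json.dumps escapes quotes and newlines inside the review text,
    # so the record always stays on a single line.
    return json.dumps(record)


line = to_training_line('He said "never again".\nI agree.', 0)
```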

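Formatting the Training Data with the CLI
Use the OpenAI CLI tool to format the training data. Running the command below prompts for several choices; in this walkthrough, "Y" was selected for all of them.
$ openai tools fine_tunes.prepare_data -f <JSONL file>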

The following four prompts require a choice:
Based on the analysis we will perform the following actions:
- [Recommended] Add a suffix separator ` ->` to all prompts [Y/n]: Y
- [Recommended] Add a whitespace character to the beginning of the completion [Y/n]: Y
- [Recommended] Would you like to split into training and validation set? [Y/n]: Y
Your data will be written to a new JSONL file. Proceed [Y/n]: Y

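When processing finishes, a training file and a validation file are created.
The data is split so that the training file contains 80% of the original data and the validation file contains 20%.
In this walkthrough, the training file is used for fine-tuning.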
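The effect of these choices can be approximated in a few lines of Python. This is only a sketch of the transformations described above (suffix separator, leading whitespace, 80/20 split), not the CLI tool's actual implementation. The prepare function, the fixed-seed shuffle, and the sample records are assumptions made for illustration.

```python
import random

SEPARATOR = " ->"  # suffix separator shown in the CLI prompt above


def prepare(records, train_ratio=0.8, seed=0):
    """Apply the separator and leading-space fixes, then split the data."""
    prepared = [
        {
            "prompt": r["prompt"] + SEPARATOR,
            # Start each completion with exactly one whitespace character.
            "completion": " " + r["completion"].lstrip(),
        }
        for r in records
    ]
    # Shuffle with a fixed seed (an assumption) so the split is repeatable.
    random.Random(seed).shuffle(prepared)
    cut = int(len(prepared) * train_ratio)
    return prepared[:cut], prepared[cut:]


# Hypothetical input: ten tiny records.
records = [{"prompt": f"review {i}", "completion": "Negative"} for i in range(10)]
train, valid = prepare(records)
```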

Running Fine-Tuning
Run fine-tuning with the following command.
$ openai api fine_tunes.create -t "traindata.jsonl" -m curie

Running Fine-Tuning
The options have the following meanings:
-t
・Specifies the training file
・Required
-m
・Specifies the base model
・Optional; defaults to curie when omitted

Running Fine-Tuning
When fine-tuning runs, information like the following is displayed in the terminal. The "Created fine-tune" line shows the fine-tuning job ID, and the "Fine-tune costs" line shows the cost of the fine-tuning.
Upload progress: 100% 102k/102k [00:00<00:00, 40.5Mit
Uploaded file from traindata.jsonl: file-vPKxxxxxxxxxxxxxxxxxx
Upload progress: 100% 24.9k/24.9k [00:00<00:00, 9.94M
Streaming events until fine-tuning is complete...
(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-02-07 08:19:12] Created fine-tune: ft-M3Ixxxxxxxxxxxxxxxxxxxxx
[2023-02-07 08:23:47] Fine-tune costs $0.27
[2023-02-07 08:23:47] Fine-tune enqueued. Queue number: 18
==== continues ====

Running Fine-Tuning
The terminal output continues as follows.
==== continued ====
[2023-02-07 08:56:15] Fine-tune is in the queue. Queue number: 0
[2023-02-07 08:57:06] Fine-tune started
[2023-02-07 08:58:07] Completed epoch 1/4
[2023-02-07 08:58:32] Completed epoch 2/4
[2023-02-07 08:58:55] Completed epoch 3/4
[2023-02-07 08:59:18] Completed epoch 4/4
[2023-02-07 08:59:38] Uploaded model: curie:ft-personal-yyyy-mm-dd-hh-mm-ss
[2023-02-07 08:59:39] Fine-tune succeeded
Job complete! Status: succeeded
Try out your fine-tuned model:
openai api completions.create -m curie:ft-personal-yyyy-mm-dd-hh-mm-ss -p <YOUR_PROMPT>
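The name on the "Uploaded model" line (curie:ft-personal-yyyy-mm-dd-hh-mm-ss) is the model that was created.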
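When the job is scripted, the job ID, cost, and model name can be pulled out of this output with regular expressions. The sketch below works on lines copied from the terminal output above. The parse_log helper and the pattern names are hypothetical, and the patterns assume the log wording stays exactly as shown.

```python
import re

# Lines copied from the terminal output shown above.
LOG = """\
[2023-02-07 08:19:12] Created fine-tune: ft-M3Ixxxxxxxxxxxxxxxxxxxxx
[2023-02-07 08:23:47] Fine-tune costs $0.27
[2023-02-07 08:59:38] Uploaded model: curie:ft-personal-yyyy-mm-dd-hh-mm-ss
[2023-02-07 08:59:39] Fine-tune succeeded
"""

PATTERNS = {
    "job_id": re.compile(r"Created fine-tune: (\S+)"),
    "cost_usd": re.compile(r"Fine-tune costs \$([\d.]+)"),
    "model": re.compile(r"Uploaded model: (\S+)"),
}


def parse_log(text):
    """Return the first match of each pattern found in the log text."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group(1)
    return found


info = parse_log(LOG)
```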

Using the Created Model
To run a completion with the created model, specify it as the model, as shown below. The prompt ends with the same " ->" separator that was added to the training data.
curl https://api.openai.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"model": "curie:ft-personal-yyyy-mm-dd-hh-mm-ss", "prompt": "{movie review} ->"}'

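The same request can be assembled in Python with the standard library. This is a sketch under stated assumptions: the endpoint is taken to be https://api.openai.com/v1/completions, the build_completion_request helper is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/completions"  # assumed endpoint


def build_completion_request(api_key, model, review):
    """Build (but do not send) a completions request for one review."""
    body = {
        "model": model,
        # End the prompt with the " ->" separator used in training.
        "prompt": review + " ->",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_completion_request(
    "YOUR_API_KEY",
    "curie:ft-personal-yyyy-mm-dd-hh-mm-ss",
    "I could barely sit through it.",
)
# To send it, pass `request` to urllib.request.urlopen with a real API key.
```
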
Accuracy Before and After Fine-Tuning
To check the effect of fine-tuning, accuracy is measured before and after fine-tuning.
The measurement uses imdb_reviews from the Python library tensorflow_datasets. As shown below, imdb_reviews contains movie reviews together with their ratings (positive or negative).
Review                                                  Rating
This was an absolutely terrible movie. Don’t…           Negative
I have been known to fall asleep during films, but…     Negative

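Accuracy Before and After Fine-Tuning
The more cases in which the IMDb rating matches the text completion result, the higher the accuracy; the more mismatches there are, the lower the accuracy.
Measuring accuracy on 50 test examples before and after fine-tuning gave the following results.
                      Matches   Mismatches
Before fine-tuning    31        19
After fine-tuning     50        0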
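The counts translate directly into an accuracy figure: 31 of 50 is 62%, and 50 of 50 is 100%. The sketch below shows one way such a comparison could be scored. The score function, the whitespace and case normalization, and the sample label lists are assumptions for illustration and are not the presenter's measurement code.

```python
def score(expected, predicted):
    """Count matches and mismatches and return them with the accuracy."""
    matches = sum(
        # Ignore surrounding whitespace and letter case when comparing
        # (an assumed normalization, since completions begin with a space).
        e.strip().lower() == p.strip().lower()
        for e, p in zip(expected, predicted)
    )
    mismatches = len(expected) - matches
    return matches, mismatches, matches / len(expected)


# Hypothetical labels that reproduce the "before" counts in the table.
expected = ["Negative"] * 50
predicted = [" Negative"] * 31 + [" Positive"] * 19
result = score(expected, predicted)
```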

Summary
This presentation introduced how to carry out fine-tuning.
Measuring accuracy before and after fine-tuning also confirmed that accuracy improved substantially.
