⼤語⾔模型 LLM 應⽤開發入⾨

RubyJam 2023/5/30
(給 Ruby 開發者的)
⼤語⾔模型 LLM
應⽤開發入⾨ (使⽤ LangChain + pycall.rb)
@ihower

About me
• 張⽂鈿 a.k.a. ihower
• https://ihower.tw
• Rails 實戰聖經作者
• Ruby developer since 2006
• 意思是從 Rails 1.1 ⽤到 Rails 7.0
• 愛好資訊科技有限公司 since 2018
• https://aihao.tw

關於這場 talk 的預期
• ⽬標對象是 Ruby Developer 開發者，如何⽤ LLM API 開發
• 就不重頭科普什麼是 ChatGPT，如何使⽤ ChatGPT 了
• 沒有神奇的 Prompt 分享
• 學習和開發的⼀點經驗分享，非課程不會太細節
• 主要分享 LangChain 可以做什麼，如何⽤ Ruby 來做
• 會給你⼀個⾺上可以跑的 Ruby 和 Rails 範例
• 這領域進展很快，若有疏漏錯誤，請多指教

Agenda
• Part 1 入⾨篇
• LLM 和 ChatGPT
• OpenAI API
• LangChain
• pycall.rb
• Ruby code demo
• Rails code demo
• Part2 進階篇
• Prompt Engineering
• Conversational Memory
• Summarization
• Retriever
• Agent
• LLM models

LLM 和 ChatGPT
• LLM (Large language model ) ⼤型語⾔模型:
• ⽤非常多語⾔資料來訓練的AI模型
• 這個模型⽤⼾輸入⼀句話(問題)，模型可以預測下⼀句話(答案)，就像接龍
• ⽣成式AI (AIGC)
• Generative AI 泛指所有可以⽣成內容的 AI，包括⽂字、影像(Midjourney, Stable Di
ff
usion 等)
• 包括 LLM，這裡我們只談 LLM
• ChatGPT 是 LLM 的殺⼿級應⽤
• 我們 App Developer 的定位是做 ChatGPT 應⽤層
• 不同的 UI 渠道、更多的資料整合、企業內應⽤等等
• Why LLM now?
• 跟以往機器學習流程不同，這⼀次使⽤⾨檻變超低

OpenAI API
https://platform.openai.com/docs/introduction

Completions API
https://platform.openai.com/docs/api-reference/completions

Completions API 參數說明
• model 選 model: text-davinci-003, text-davinci-002, text-curie-001, text-babbage-001, text-ada-001 等
• 請參考 https://platform.openai.com/docs/models/overview
• prompt 提⽰詞
• max_tokens 回應最多產⽣多少 token
• Token 是切詞出來的單位，不同 model 有不同上限
• temperature 或 top_p 控制隨機性: 0.8 有創意、0.5 平衡、0.2 精準、0 總是⼀樣的回應
• presence_penalty 若 Token 出現過則給予逞罰。預設0，數值越⼤會傾向新的內容
• frequency_penalty 若 Token 重複出現則給予逞罰。預設0，數值越⼤會傾向⽤不同表述，數值越⼩越撈叨
• logit_bias 可以設定是否某個 token 出現的機率
• stream 讓回傳⽤ server-sent events 逐字回傳，可以改進 UX 反應速度

From ChatGPT Prompt Engineering for Developers

Chat API
https://platform.openai.com/docs/api-reference/chat

Chat API
• 比較新的 Chat API
• 提供了 role 可以包裹對話
• Role 有 system, user, assistant
• OpenAI API 內部會轉換成 ChatML 格式
https://github.com/openai/openai-python/blob/main/chatml.md
• 注意: API 是 Stateless 無狀態沒有上下⽂關係的，因為每次呼叫API，都必須把對話全部傳過去
• 提供的 Model 有
• gpt-3.5-turbo 反應速度和價格較好
• gpt-4 ⽬前產出效果最好的 (GPT-4 API 要排隊，記得去排)

Completions API v.s. Chat API
prompt 參數差異
You are AI assistant. Answer as
concisely as possible.
user: How are you
assistant: well!
user: How are you now?
[
{"role": "system", "content": "You are AI assistant.
Answer as concisely as possible."},
{"role": "user", "content": "How are you"},
{"role": "assistant", "content": "well!"},
{"role": "user", "content": "How are you now?"}
]

Embeddings API
https://platform.openai.com/docs/api-reference/embeddings

Embeddings API
• 很有⽤的中間產物
• 給⼀段不超過 8192 token 的⽂字，回傳⼀個 1536 維度的向量 Vector 代表這⽂
字的語意
• 稍後我們會拿來做 “語意搜尋” (Semantic Search)
• ⽤餘弦相似性 (Cosine similarity) 可以算出最接近的兩個 Vectors，以此我們
就可以找出最相似的兩段內容
• “問題” 和 “最相關的內容”

其他 API
• Speech to text 語⾳轉⽂字
• https://platform.openai.com/docs/guides/speech-to-text
• Whisper model
• Moderations 檢查是否有不良內容
• https://platform.openai.com/docs/api-reference/moderations
• 這免費的!
• Images 圖像⽣成
• https://platform.openai.com/docs/guides/images
• DALL-E https://labs.openai.com/

Fine-tuning API
• 給 JSONL 格式的訓練資料 (⾄少上百條)，可以微調出⼀個新的 Model
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
• 但 Chat Models ⽬前都不⽀援
• ⽬前比較少⼈⽤這招

OpenAI API 的限制
• API 是 Stateless 無狀態的
• 有些 Chatroom demo 沒做記憶上下⽂，問他你第⼀句話說啥都不記得
• Token 有限制
• 對話如果太長太久，就會超過限制
• Prompt 的內容太多，會超過限制 (你無法直接塞⼀整本書的內容進去)
• Model 的資料集只有到 2021/9 限制，沒有新的內容
• 訓練⼀次要好幾百萬美⾦，因此不會⼀直重新訓練
• LLM 不會連網⾃動去抓新資料
• LLM 對於數學計算不在⾏

這些限制，代表我們 App developer 有事做
• 對話紀錄
• 你需要記憶功能，再塞進 prompt
• Token 數限制
• 你需要能夠有效縮減 prompt ⼤⼩: 做摘要、語意搜尋
• ⼯具限制
• 你需要聯網抓資料處理好，再塞進 prompt
• 你需要給他⼀個計算機⼯具

LangChain
https://github.com/hwchase17/langchain
https://python.langchain.com/en/latest/index.html

LangChain
• ⽬前最紅的 LLM ⼯具，讓你⽅便組裝使⽤各種⼯具
• 為何叫做 chain ?
• 因為可以 Chain of Calls 串接再⼀起
• 可以做 Prompt Template 樣版
• 有很多做好的⼯具，例如....

Why LangChain ?
• 整合 20+ 家不同 models 提供者
• 50 + 從不同來源載入不同⽂件格式的⽅式
• 10+ 種不同切割段落的⽅式
• 10+ 種不同 Vector 資料庫
• 15+ 種不同的外部⼯具可以讓 LLM 使⽤
• 20+ 種不同的 Chain
• 各種 Agent 代理⼈
• 稍後我們會看範例....
https://blog.langchain.dev/announcing-our-10m-seed-round-led-by-benchmark/

⽤ Ruby 的⽅案
但 LangChain 是 Python 寫的，怎麼辦?
• 如果 Python 是 AI 的語⾔，那 Ruby 可是做 webapp 後端語⾔
• 不⽤ LangChain 我們⽤ Ruby 寫?
• 直接⽤ openai gem ?
• ⽤ Boxcars gem ? https://github.com/BoxcarsAI/boxcars
• ⽤ Python + Flask 寫 Web service，然後 Ruby 再去 call
• 變成要⽤ Python 寫 web backend 了… 😣
• 採⽤ pycall.rb !

pycall.rb
PyCall: Calling Python functions from the Ruby language
https://github.com/mrkn/pycall.rb

直接看 code
https://github.com/ihower/rails-pycall-langchain
(請看 examples ⽬錄)

Examples
• OpenAI
• ChatOpenAI
• SimpleSequentialChain
• LLMMathChain
• LLMRequestsChain
• Agents

LLMMathChain 的 Prompt 長這樣

⼤語⾔模型 LLM 應⽤開發入⾨

https://github.com/pydata/numexpr

More interesting chains…
其實秘密就是 prompt，建議去看 source code
• RouterChain
• https://python.langchain.com/en/latest/modules/chains/generic/router.html
• API Chains
• https://python.langchain.com/en/latest/modules/chains/examples/api.html
• LLMBashChain
• https://python.langchain.com/en/latest/modules/chains/examples/llm_bash.html
• SQLDatabaseChain
• https://python.langchain.com/en/latest/modules/chains/examples/sqlite.html
• Prompt: https://github.com/hwchase17/langchain/blob/master/langchain/chains/sql_database/
prompt.py

SQLDatabaseChain 的 Prompt 長這樣

Boxcars gem 裡有個 ActiveRecord chain…
https://github.com/BoxcarsAI/boxcars

另⼀個 Chain 題⽬: ⾃動化寫單元測試
• Chain 1
• Input: 你的 code
• Output: 請 GPT 寫出程式規格
• Chain 2:
• Input: 程式規格
• Output: ⽣成⼀個測試計畫
• Chain 3:
• Input: 測試計畫
• Output: 測試 code

來跑 Rails 吧
https://github.com/ihower/rails-pycall-langchain

我在 lib/langchain.rb 把第⼀層的 module 包裹成 Ruby module
這樣就不需要到處 PyCall.import_module 了

範例 Rails 中⽀援的 Processing Job
• LangChainPlain Job 不記上下⽂的版本
• Ruby OpenAI Job 不記上下⽂的版本
• LangChainChat Job 會傳所有對話內容
• LangchainAgentReplJob 可以⽤ Python REPL ⼯具
• LangchainRetrievalQaJob 問答(稍後會說明這個)

Pycall.rb Tips
• 從 model 引入後
• 如果原先是類別，需要 new
• 如果原先是函式物件，需要 .call (也可以省略 call，⼀點 . 就⾏)
• pycall.rb ⽂件寫裝 python 需要 —enable-shared ?
• 先不⽤沒關係，只要沒⽤到 register_python_type_mapping ⽅法
• 這可以讓 pycall 呼叫 python 回傳的東⻄，有正確的 ruby type，也就是 class name
• 作者的 pandas, matplotlib, numpy gems 有⽤到
• 搭配 sidekiq 跟 resque background job 會有問題，會 crash….
• 範例⽤ sucker_punch，若 actioncable 沒反應請重新整理
• 上 production 請換 delyed_job 沒問題

幾個深入主題
• Prompt Engineering: 如何寫好 Prompt
• 進階⼯程解法
• 如何做 Conversational Memory (記住對話)
• 如何做 Summarization (提煉長⽂本)
• 如何做 Retriever (語意搜尋 RAG)
• 其他
• 如何做 Agent (⾃動做決策)
• 開源的 Open-Source LLM

Prompt Engineering
• ChatGPT Prompt Engineering for Developers 算是基礎必看
• 其他推薦資料:
• https://gaiconf.com/ 的 Enterprise Prompt Engineering (可以買回放)
• https://learningprompt.wiki/
• https://www.promptingguide.ai/zh
• https://github.com/promptslab/Awesome-Prompt-Engineering

ChatGPT Prompt Engineering for Developers
https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/
Fox 翻譯版: https://www.youtube.com/playlist?list=PLly8vI0gpqtpTB7mt_qi57qOKQRo4XWAQ
https://www.youtube.com/playlist?list=PLly8vI0gpqtpTB7mt_qi57qOKQRo4XWAQ

ChatGPT Prompt Engineering for Developers 快速複習
• Guidelines
• Iterative
• Summarizing
• Inferring
• Transforming
• Expanding
• Chatbot

Principle 1: Write clear and specific instruction
• clean != short
• Tactic 1: 請⽤分隔符號來明確界定輸入的不同部分
• 也可以避免 Prompt injection: 指⽤⼾輸入⼀些 prompt 讓你對模型發出你不
想做的事情
• Tactic 2: 可以指定輸出格式，例如指定 JSON 輸出
• Tactic 3: 指定滿⾜條件
• Tactic 4: 給範例 (這叫做 Few-shot prompt/learning，相比於 zero-shot)

Principle 2: Give the model time to think

• 給模型思考時間，指⽰模型對⼀個問題進⾏更長的思考時間，⽤更多算⼒
• 若模型出現推理錯誤，你應該嘗試重新設計 prompt，要求要有⼀系列的推理
• 太困難的任務，模型無法在短時間或⽤很少的詞來完成，就會亂掰
• 跟⼈思考⼀樣，⼀下給太困難的問題，⼈也會犯錯
• 這招叫做 Chain of Thought (CoT)
• 光是叫模型 Step by Step，就會⼤幅改進推理
• 因為模型在預測 token 時，給每個 token 的計算時間都是⼀樣的
Principle 2: Give the model time to think

Summarizing 摘要
可以進⼀步寫是要給誰看的摘要

Inferring 推理
以前要做這件事情，需要收集資料並且辛苦訓練你的獨立模型，現在⽤ LLM 下個 prompt 就辦到了
判斷客⼾情緒

Transforming ⽂本轉換
LLM 很擅長將輸入input的內容，轉換成另⼀種格式format，例如語⾔翻譯、拼寫和語法修正、校正

Expanding 擴寫
將⼀個短內容擴寫成長⽂、做 brainstorming

回顧⼀下 LLM API 限制
• Stateless 無狀態
• Token 有上限限制
• 資料集限制，內建只有到 2021/9 ⽉
• 這些問題如何解決?

進階⼯程解法
• 做 Conversational Memory (記住對話)
• 做 Summarization (提煉長⽂本)
• 做 Retriever (先搜尋再Prompt)

做 Memory 功能
• https://python.langchain.com/en/latest/modules/memory/how_to_guides.html
• Langchain 提供有
• ConversationBu
ff
erMemory
• ConversationBu
ff
erWindowMemory
• ConversationSummaryMemory
• ConversationSummaryBu
ff
erMemory
• Entity Memory

https://www.pinecone.io/learn/langchain-conversational-memory/

做摘要
超過 token 數量的超長⽂本
• LangChain 提供有
• https://python.langchain.com/en/latest/modules/chains/index_examples/
summarize.html
• 有三種⽅式 stu
ff
, map-reduce, re
fi
ne
• 如何讓 ChatGPT 摘要⼤量內容：不同⽅法的優缺點
• https://wylin.tw/pages/how-to-summarize-long-texts/

先搜尋再下 Prompt
學名: Retrieval-Augmented Generation
• 如何針對超長⽂本做 QA 問答? 先搜尋相關內容，然後再做 Prompt
• 準備⼯作
• 先上傳內容⽂件、拆段落 (chunks)
• 每個段落建立 index，放到 Vector Database
• ⽤⼾問題時
• 先根據問題做語意搜尋，找到最相似的內容(context)
• 把 context 和問題組出 Prompt 再問 LLM
• 進階⽤法: 若原始內容太鬆散(例如逐字稿)，也可以先摘要再做索引

LangChain 的 QA Prompt 長這樣

LangChain 的 QA 並附帶出處 Prompt 長這樣

State of GPT 演講 (2023/5/24) 也有提到
https://build.microsoft.com/en-US/sessions/db3f4859-cd30-4445-a0cd-553c3304f8e2

來看 Rails Code 的實作
• 上傳⽂件時:
• DocumentParserJob
• ⽤⼾問問提時:
• LangchainRetrievalQaJob

RAG 系統考量點
• ⽀援載入多種格式⽂件: DocumentLoaders
• https://python.langchain.com/en/latest/modules/indexes/document_loaders.html
• 各種拆 chunk 的⽅式: Text Splitters
• https://python.langchain.com/en/latest/modules/indexes/text_splitters.html
• LangChain 推薦⽤ RecursiveCharacterTextSplitter
• 各種 Vector Store (需要快速計算⼤量 vector 的 cosine 相似性)
• https://python.langchain.com/en/latest/modules/indexes/vectorstores.html
• 範例⽤ FAISS: https://github.com/facebookresearch/faiss (Facebook AI Similarity Search)
• 各種 Embedding 算法 (不只有 OpenAI Embedding)
• https://huggingface.co/spaces/mteb/leaderboard?
fbclid=IwAR2PbjMuEoYasJXrzEEOkHAWtQasnmO1rJGb_gzlu1O9ExPUOEsxHs9p8_w
• 各種 Retriever (不只有 Vector 相似性搜尋)
• https://python.langchain.com/en/latest/modules/indexes/retrievers.html

https://dante-ai.com/

做 Agent 代理⼈
• 給⼀些⼯具 (Tools，如同 ChatGPT 的 plugins) 讓 LLM ⾃⼰挑選要⽤哪些
• ⽬前兩種 Agent 策略
• ReAct
• https://python.langchain.com/en/latest/modules/agents/agents/examples/react.html
• Ruby 範例 exapmple/6-agent.rb
• Rails 範例 LangchainAgentReplJob
• Plan-and-Execute (⼜叫做 Autonomous agents 給⽬標⾃動執⾏)
• AutoGPT
• BabyGPT
• 微軟的 JARVIS
• 但⽬前仍不實⽤，可能跑很久卡住很浪費 tokens
• 所以⽬前 LLM 產品化⽅向仍是 Copilots over autonomous agents

ReAct
https://python.langchain.com/en/latest/modules/agents/agents/examples/react.html
• Action: 根據⽤⼾輸入，選擇應該⽤哪⼀個 Tool
• 就根據 tool 的⽂字描述來判斷
• Action Input: 根據需要使⽤的 tool，從⽤⼾輸入中提取參數
• Observation: 觀察 Tool 得到的結果
• Thought: 再看⼀次⽤⼾輸入，判斷接下來怎麼辦
• 是回到 Action ⽤另⼀個⼯具? 還是這就是 Final Answer?
• Final Answer: Thought 看到 Observation 後，若認為這是解答，就給出最終輸出

例如 llm_math ⼯具的描述是
若符合描述，就會呼叫 func

例如 request ⼯具的描述是

範例發出去的第⼀個 prompt:

• GPT 回傳片段
•
• 然後 langchain ⽤ requests_get ⼯具，⽤以上 Input 參數，去拿資料
• 將放到 Observation 裡⾯成為下⼀次的 Prompt

發出去的
第⼆個 prompt:

•
• 然後 langchain ⽤ Calculator ⼯具，⽤以上 Input 參數，去拿資料
• 放到 Observation 裡⾯成為下⼀次的 Prompt

發出去的
第三個 prompt:
這是⼀個
LLMMathChain

•
• 然後 langchain ⽤ Calculator ⼯具，⽤以上 Input 參數，產⽣ python code 執
⾏，算出 1230
• 1230 放到 Observation 裡⾯成為下⼀次的 Prompt

•
• 結束 (花了四個 prompt ⼀堆 tokens…. 辛苦了)

有沒有開源的 Open-Source LLM ?
• Why?
• 資安和隱私: ⼤企業需要放⾃⼰家裡，不想傳給 OpenAI
• 反應速度: OpenAI API 不總是穩定，⽽且有時候很慢

最新 LLM 模型排名(包括非開源跟開源)
https://lmsys.org/blog/2023-05-25-leaderboard/

State of GPT 演講 (2023/5/24)
https://build.microsoft.com/en-US/sessions/db3f4859-cd30-4445-a0cd-553c3304f8e2

上⼀⾴只有前四是 RLHF 模型 (回應更接近⼈類想要的答案) 其他都是 SFT 模型
⽽且基於 Facebook LLaMA 模型的都是 Non-commercial 授權

1. 預訓練階段: Base LLM 學會⽂字接龍
2. SFT 階段: LLM 學會對話，此時可以當 AI 助⼿了
3. RLHF 階段: LLM 學會符合⼈類期待

Andrej Karpathy (OpenAI co-founder) 推薦
• ⽤ GPT-4
• 把 prompt 寫好寫詳細
• ⽤ retriever 在 prompt 裡⾯加上 context 相關資訊
• 實驗各種 prompt engineering 技巧
• 實驗 few-shot example
• 實驗各種 tool 跟 plugins
• 花時間優化 chain
• 最後優化成本

我的⼀些建議
• 建議⾄少
• 看 OpenAI API ⽂件
• 看完 Prompt Engineering 那⾨課
• 接下來: ⾝為 Developer，我個⼈建議可以多看 code
• LangChain source code 了解他是怎麼做的 (他的 prompt 是可以改的!)
• 另⼀套 LlamaIndex 也想看看 https://github.com/jerryjliu/llama_index
• OpenAI cookbook https://github.com/openai/openai-cookbook
• 找⼀些題⽬下去做，真的資料跑下去做了才知道效果如何:
• Prompt 要怎麼寫才有效率?
• Retriever 選哪個? Vector Store 選哪個? Chunk 切多⼤?
• Tool 怎麼整合對⽤⼾ UX 最好?

謝謝聆聽
程式碼在 https://github.com/ihower/rails-pycall-langchain
稍後投影片會公開在 https://ihower.tw
歡迎各種 LLM 案⼦，⼀起研究開發

會後Q: 有關 text split 怎樣切比較好?
A: 需要做實驗，請參考 https://autoevaluator.langchain.com/ 這個思路

⼤語⾔模型 LLM 應⽤開發入⾨

More Related Content

What's hot (20)

Similar to ⼤語⾔模型 LLM 應⽤開發入⾨ (20)

More from Wen-Tien Chang (20)

⼤語⾔模型 LLM 應⽤開發入⾨