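Image captions

Note: Imagen version 1 and version 2 are deprecated as of June 24, 2025. The Imagen models imagegeneration@002, imagegeneration@005, and imagegeneration@006 will be removed on September 24, 2025. For more information about migrating to Imagen 3, see "Migrate to Imagen 3".

imagetext is the name of the model that supports image captioning. imagetext generates captions from an image that you provide, in the language that you specify. The model supports the following languages: English (en), German (de), French (fr), Spanish (es), and Italian (it).

To explore this model in the console, see the Image Captioning model card in Model Garden.

View the Imagen for Captioning & VQA model card

Use cases

Common use cases for image captioning include:

- Creators can generate captions for uploaded images and videos (for example, a short description of a video sequence)
- Generate captions that describe products
- Integrate captioning into an app through the API to create new experiences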

HTTP request

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/imagetext:predict

Request body

{
  "instances": [
    {
      "image": {
        // Union field can be only one of the following:
        "bytesBase64Encoded": string,
        "gcsUri": string,
        // End of list of possible types for union field.
        "mimeType": string
      }
    }
  ],
  "parameters": {
    "sampleCount": integer,
    "storageUri": string,
    "language": string,
    "seed": integer
  }
}

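Use the following parameters for the Imagen model imagetext. For more information, see "Get image descriptions using visual captioning".

Parameter	Description	Acceptable values
`instances`	An array that contains the object with the details of the image to get information about.	Array (1 image object allowed)
`bytesBase64Encoded`	The image to caption.	Base64-encoded image string (PNG or JPEG, maximum size 20 MB)
`gcsUri`	The Cloud Storage URI of the image to caption.	String URI of the image file in Cloud Storage (PNG or JPEG, maximum size 20 MB)
`mimeType`	Optional. The MIME type of the image that you specify.	String (`image/jpeg` or `image/png`)
`sampleCount`	The number of text strings to generate.	Integer: 1 to 3
`seed`	Optional. The seed for the random number generator (RNG). If the RNG seed is the same for requests with the same inputs, the prediction results are the same.	Integer
`storageUri`	Optional. The Cloud Storage location where the generated text responses are saved.	String
`language`	Optional. The language code of the generated captions.	String: `en` (default), `de`, `fr`, `it`, `es`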
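
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of the Vertex AI SDK: the helper name `build_caption_request` and its argument names are hypothetical, and the validation rules are only those listed in the table.

```python
# Hypothetical helper: builds an imagetext:predict request body and applies
# the constraints listed in the parameter table. Not an official API.
SUPPORTED_LANGUAGES = {"en", "de", "fr", "it", "es"}
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png"}


def build_caption_request(image_b64=None, gcs_uri=None, mime_type=None,
                          sample_count=1, language="en", seed=None,
                          storage_uri=None):
    """Return a request body dict with one image instance."""
    # The image source is a union field: exactly one of the two may be set.
    if (image_b64 is None) == (gcs_uri is None):
        raise ValueError("provide exactly one of image_b64 or gcs_uri")
    if not 1 <= sample_count <= 3:
        raise ValueError("sampleCount must be between 1 and 3")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")

    if image_b64 is not None:
        image = {"bytesBase64Encoded": image_b64}
    else:
        image = {"gcsUri": gcs_uri}
    if mime_type is not None:
        if mime_type not in SUPPORTED_MIME_TYPES:
            raise ValueError(f"unsupported MIME type: {mime_type}")
        image["mimeType"] = mime_type

    # Optional parameters are only included when the caller sets them.
    parameters = {"sampleCount": sample_count, "language": language}
    if seed is not None:
        parameters["seed"] = seed
    if storage_uri is not None:
        parameters["storageUri"] = storage_uri

    return {"instances": [{"image": image}], "parameters": parameters}
```

For example, `build_caption_request(gcs_uri="gs://my-bucket/photo.png", sample_count=2, language="fr")` returns a body with a single `gcsUri` instance, where the bucket path is a placeholder.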

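Sample request

REST

To test image captioning by using the Vertex AI API, send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

PROJECT_ID: Your Google Cloud project ID.
LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see "Generative AI on Vertex AI locations".
B64_IMAGE: The image to get captions for. The image must be specified as a Base64-encoded byte string. Size limit: 10 MB.
RESPONSE_COUNT: The number of image captions to generate. Accepted integer values: 1 to 3.
LANGUAGE_CODE: One of the supported language codes. Supported languages:
- English (en)
- French (fr)
- German (de)
- Italian (it)
- Spanish (es)

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict

JSON request body: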

{
  "instances": [
    {
      "image": {
        "bytesBase64Encoded": "B64_IMAGE"
      }
    }
  ],
  "parameters": {
    "sampleCount": RESPONSE_COUNT,
    "language": "LANGUAGE_CODE"
  }
}
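
One way to produce the B64_IMAGE value is to Base64-encode the raw image bytes. The following is a minimal Python sketch using only the standard library. The function names are hypothetical, and the size check assumes that the 10 MB limit stated for B64_IMAGE means 10 × 1024 × 1024 bytes of raw image data, which the source does not spell out.

```python
import base64

# Assumption: "10 MB" is interpreted as 10 MiB of raw (unencoded) image data.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def encode_image_bytes(data: bytes) -> str:
    """Return the Base64 text for raw image bytes, for use as B64_IMAGE."""
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the assumed 10 MB size limit")
    # b64encode returns bytes; JSON needs a str, and Base64 is pure ASCII.
    return base64.b64encode(data).decode("ascii")


def encode_image_file(path: str) -> str:
    """Read a PNG or JPEG file from disk and Base64-encode its contents."""
    with open(path, "rb") as image_file:
        return encode_image_bytes(image_file.read())
```

The returned string can be placed directly in the `bytesBase64Encoded` field of the request body.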

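To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: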

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict"

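PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: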

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagetext:predict" | Select-Object -Expand Content
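
The same request can be sketched in Python with only the standard library. This is an illustration rather than an official client. The function names `endpoint_url`, `make_request`, and `predict` are hypothetical, and `predict` assumes that the gcloud CLI is installed and logged in, as in the notes above.

```python
import json
import subprocess
import urllib.request


def endpoint_url(project_id: str, location: str) -> str:
    """Build the imagetext:predict URL from the placeholders used above."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/imagetext:predict"
    )


def make_request(project_id, location, body, access_token):
    """Prepare (but do not send) the POST request with the same headers as curl."""
    return urllib.request.Request(
        endpoint_url(project_id, location),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def predict(project_id, location, body):
    """Send the request and return the parsed JSON response."""
    # Assumption: the gcloud CLI is on PATH and has an active account.
    token = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    request = make_request(project_id, location, body, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `predict("PROJECT_ID", "us-central1", body)` with the JSON request body loaded as a dict would return the response as a Python dict. `urlopen` raises `urllib.error.HTTPError` on a non-2xx status.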

The following sample responses are for a request with "sampleCount": 2. The response returns two prediction strings.

English (en):

{
  "predictions": [
    "a yellow mug with a sheep on it sits next to a slice of cake",
    "a cup of coffee with a heart shaped latte art next to a slice of cake"
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID",
  "model": "projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID",
  "modelDisplayName": "MODEL_DISPLAYNAME",
  "modelVersionId": "1"
}

Spanish (es):

{
  "predictions": [
    "una taza de café junto a un plato de pastel de chocolate",
    "una taza de café con una forma de corazón en la espuma"
  ]
}

Response body

{
  "predictions": [ string ]
}

Response element	Description
`predictions`	A list of text strings that represent captions, sorted by confidence.

Sample response

{
  "predictions": [
    "text1",
    "text2"
  ]
}
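
Because `predictions` is sorted by confidence, the first entry is the highest-confidence caption. The following is a minimal Python sketch for reading a response. The function names are hypothetical, and the code relies only on the `predictions` field described above.

```python
import json


def extract_captions(response_text: str) -> list:
    """Return the caption strings from a raw JSON response body."""
    payload = json.loads(response_text)
    predictions = payload.get("predictions", [])
    if not isinstance(predictions, list):
        raise ValueError("'predictions' is expected to be a list of strings")
    return [str(caption) for caption in predictions]


def best_caption(response_text: str):
    """Return the first (highest-confidence) caption, or None if there is none."""
    captions = extract_captions(response_text)
    return captions[0] if captions else None
```

Other fields that appear in the sample responses, such as `deployedModelId`, are ignored by this sketch.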
