zozotown real time linkage infrastructure

The Real-Time Data Linkage Infrastructure Behind ZOZOTOWN
ZOZO Technologies, Inc.
SRE Department, MA Platform
Keisuke Taniguchi (谷口恵輔)
Copyright © ZOZO Technologies, Inc.

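ZOZO Technologies, Inc.
SRE Department, MA Platform
Keisuke Taniguchi (谷口恵輔)
● Joined in March 2020. My nickname had already been decided by the time I joined.
● At my previous job, an ad-tech company, I built a planning system for digital signage that used location data.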

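Contents
1. Overview of the existing data linkage infrastructure
2. Why a real-time data linkage infrastructure is needed
3. How the real-time data linkage infrastructure works, and its issues
4. Replacing the real-time data linkage infrastructure
5. Summary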


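Overview of the existing data linkage infrastructure
[Diagram: core DBs (SQL Server) → bcp → intermediate DB (SQL Server) → BigQuery]

● Workflow engine
○ Digdag
■ OSS developed by Treasure Data
■ ZOZO Technologies has seven contributors!
● ETL tools
○ BCP
■ A bulk copy tool dedicated to SQL Server
■ Extremely fast
■ Used to move data from the on-premises core DBs to the intermediate DB
○ Embulk
■ OSS developed by Treasure Data
■ Replaces personal information with NULL or hashes it
■ Moves data from the intermediate DB to BigQuery
● Data linkage frequency
○ Once a day
■ Until now, daily linkage was enough to meet the requirements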


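Why a real-time data linkage infrastructure is needed

● Delivery platform
○ Notify users at the moment an item is down to its last unit
○ Monitor the effect of campaigns in real time

● Machine learning and anomaly detection
○ More and more services use AI
○ Link product inventory in real time
○ Run fraud detection and similar checks in real time
○ Use fresh data in estimation models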


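Case study: search personalization platform

● What is search personalization?
○ Recommends products to each user in recommended order
○ Builds models from each user's behavior history and attributes
○ The search personalization platform uses the real-time data linkage infrastructure to link product inventory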

How the real-time data linkage infrastructure works
[Diagram: on-premises SQL Server instances chained by SQL replication, then Qlik Replicate, Kafka, Dataflow, and BigQuery on GCP]
● Data linkage method
○ Data is replicated in multiple stages from the on-premises environment to GCP
○ Qlik Replicate transfers the data from SQL Server to Kafka
https://techblog.zozo.com/entry/migrating-zozotown-search-platform
https://www.qlik.com/us/attunity

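Issues with the real-time data linkage infrastructure
● Data loss
○ The existing processing system appears to have a memory leak and needs periodic restarts
● Data delay
○ Replication delay
■ Replication runs in multiple stages from the on-premises environment to the cloud
■ Delays of roughly 10 to 30 minutes occur
● Cost
○ About 2 million yen per month
■ Qlik Replicate license fee
■ Confluent Cloud (Kafka)
■ Compute Engine (SQL Server/Dataflow)

We want to build a general-purpose platform that can serve differing requirements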


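Considering how to capture changed data from SQL Server
● We want to read from the source database itself
○ Because replicas suffer from replication delay
● Considered using update timestamps
○ Few tables have an update timestamp column
○ Even where the column exists, for some reason it is not always updated
● Considered using CDC
○ Almost none of the on-premises instances run a version from 2016 or later, on which CDC can be used
○ It is asynchronous, so some delay seemed likely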

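Considering how to capture changed data from SQL Server

● Adopted the Change Tracking feature
● What is Change Tracking?
○ Retrieves changed data with SQL
■ Like CDC, it can retrieve changed data from SQL Server
■ It also returns how each changed primary key was modified and the version it manages
○ It does not provide the change history of each primary key
■ The changed primary keys are joined to the current rows
■ If a row is updated more often than the polling interval, only its latest state is retrieved
■ Change history is not a requirement, so this is not a problem
○ Changes are written synchronously
■ More real-time than CDC, which is asynchronous
■ Committed transactions can be reliably read through Change Tracking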
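
The "latest state only" behavior described above can be illustrated with a small in-memory model. This is a toy sketch, not SQL Server itself: the `TrackedTable` class and its method names are hypothetical, and it simply records the most recent operation per key, which simplifies how the real feature reports operations.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedTable:
    """Toy in-memory model of a table with change tracking (hypothetical)."""
    rows: dict = field(default_factory=dict)     # primary key -> current row
    changes: dict = field(default_factory=dict)  # primary key -> (version, op)
    version: int = 0

    def upsert(self, key, row):
        # Every write bumps the table-wide version synchronously.
        self.version += 1
        op = "U" if key in self.rows else "I"
        self.rows[key] = row
        self.changes[key] = (self.version, op)   # overwrites older entries

    def changes_since(self, last_version):
        """One entry per changed key, joined to the *current* row."""
        ordered = sorted(self.changes.items(), key=lambda item: item[1][0])
        return [
            {"key": key, "op": op, "version": ver, "row": self.rows.get(key)}
            for key, (ver, op) in ordered
            if ver > last_version
        ]


table = TrackedTable()
table.upsert(1, {"stock": 5})
synced = table.version            # the poller remembers this version
table.upsert(1, {"stock": 4})     # two updates land between polls...
table.upsert(1, {"stock": 3})
table.upsert(2, {"stock": 9})
changes = table.changes_since(synced)
for change in changes:
    print(change)
```

Key 1 was updated twice between polls, yet the poller sees a single entry carrying the final row, which is acceptable here because change history is not required.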

Change Tracking
SELECT
  a.SYS_CHANGE_OPERATION AS changetrack_type,
  a.SYS_CHANGE_VERSION AS changetrack_ver,
  #{columns}
FROM
  CHANGETABLE(CHANGES #{@tablename}, @last_synchronization_version) AS a
  LEFT OUTER JOIN #{@tablename} AS b ON a.#{@primary_key} = b.#{@primary_key}
Passing the version obtained on the previous run returns the rows that have changed since then
https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-tracking-sql-server

Architecture and processing flow
[Diagram: core DB → Fluentd (Change Tracking) → Pub/Sub → Dataflow → Pub/Sub → Dataflow → BigQuery and Bigtable]

[Diagram: core DB → Fluentd (Change Tracking) → Dataflow → BigQuery and Bigtable; highlighted: Change Tracking, Unique Message Id]
● Capturing changed data
○ We wrote a Fluentd input plugin that runs Change Tracking against the on-premises environment
https://docs.fluentd.org/plugin-development
○ On the first run, it fetches the latest version from the on-premises environment
○ An existing output plugin sends the data to Pub/Sub
https://github.com/mia-0032/fluent-plugin-gcloud-pubsub-custom
● Redundant configuration
○ The plugin is deployed on two Compute Engine instances
○ The HA configuration improves availability
● Dedicated line
○ Dedicated Interconnect is used to link data from the on-premises environment to GCP at high speed
https://cloud.google.com/network-connectivity/docs/interconnect/concepts/dedicated-overview?hl=ja

Fluentd input plugin (1)
<system>
  workers '<worker count>'
</system>
<worker 1>
  <source>
    @type sql_server
    username "#{ENV['CHANGETRACK_USER']}"
    password "#{ENV['CHANGETRACK_PASSWORD']}"
    host '<host>'
    port '<port>'
    databasename '<database_name>'
    primary_key ["primary_key"]
    tablename '<table-name>'
    columns ["column_1", "column_2", "column_3"]
    changetrack_interval 60
    output_tag '<tag_name>'
  </source>
  <match <tag_name>>
    # output plugin
  </match>
</worker>
Tables are processed in parallel, one worker per table
https://docs.fluentd.org/deployment/multi-process-workers
We created a plugin that runs Change Tracking
https://docs.fluentd.org/plugin-development/api-plugin-input

Fluentd input plugin (2)
# Join condition covering every primary key column
join_condition = @primary_key.map { |key| "a.#{key} = b.#{key}" }.join(' AND ')
query = <<~SQL
  DECLARE @last_synchronization_version bigint;
  SET @last_synchronization_version = #{changetrack_ver};
  SET LOCK_TIMEOUT #{@lock_timeout};
  SELECT
    CONCAT('#{@tablename}','-',a.#{@primary_key.join(',').gsub(',', ',a.')},a.SYS_CHANGE_VERSION) AS message_unique_id,
    '#{@tablename}' AS table_name,
    '#{@changetrack_interval}' AS changetrack_interval,
    '#{Time.now.utc}' AS changetrack_start_time,
    a.SYS_CHANGE_OPERATION AS changetrack_type,
    a.SYS_CHANGE_VERSION AS changetrack_ver,
    #{columns}
  FROM
    CHANGETABLE(CHANGES #{@tablename}, @last_synchronization_version) AS a
    LEFT OUTER JOIN #{@tablename} AS b ON #{join_condition}
SQL
message_unique_id: attaches a message ID to every record
changetrack_ver: the Change Tracking version is also passed along so that the latest master data can be looked up in BigQuery

Fluentd output plugin
<system>
  workers '<worker count>'
</system>
<worker 1>
  <source>
    # input plugin
  </source>
  <match <tag_name>>
    @type gcloud_pubsub
    project "#{ENV['PROJECT_ID']}"
    key /usr/src/app/config/gcp_credential.json
    topic "projects/#{ENV['PROJECT_ID']}/topics/<topic-name>"
    autocreate_topic false
    max_messages 1000
    max_total_size 9800000
    max_message_size 4000000
    attribute_keys ["message_unique_id"]
    <buffer>
      @type memory
      flush_interval 1s
      total_limit_size 64GB
      flush_thread_count 50
      flush_thread_interval 1.0
      flush_thread_burst_interval 1.0
      chunk_limit_size 3MB
      retry_max_times 10
    </buffer>
    <format>
      @type json
    </format>
  </match>
</worker>
attribute_keys passes the message ID as a Pub/Sub attribute so that Dataflow can remove duplicate messages
https://cloud.google.com/dataflow/model/pubsub-io#using-record-ids
The Pub/Sub plugin used is the following
https://github.com/mia-0032/fluent-plugin-gcloud-pubsub-custom

[Diagram: core DB → Fluentd (Change Tracking) → Dataflow → BigQuery and Bigtable; highlighted: Unique Message Id, Remove Message Duplication]
● Removing duplicate messages in Dataflow
○ Fluentd attaches a message ID that is unique per record
○ Dataflow's idAttribute is used to remove the duplicate messages produced by the redundant configuration
○ Dataflow can automatically remove duplicates of the message ID assigned by Pub/Sub
https://cloud.google.com/dataflow/docs/concepts/streaming-with-cloud-pubsub?hl=ja#efficient_deduplication
○ When a publisher publishes the same message more than once, Pub/Sub assigns a different message ID each time, so those duplicates cannot be removed automatically
https://cloud.google.com/community/tutorials/pubsub-spring-dedup-messages

Removing duplicate messages in Dataflow
public static PipelineResult run(Options options) {
  // Create the pipeline
  Pipeline pipeline = Pipeline.create(options);
  pipeline
      .apply(
          "Read PubSub Events",
          PubsubIO.readMessagesWithAttributes()
              // Deduplicate on the ID attribute attached by the Fluentd plugin
              .withIdAttribute("message_unique_id")
              .fromSubscription(options.getInputSubscription()))
      .apply(
          "Filter Events If Enabled",
          ParDo.of(
              ExtractAndFilterEventsFn.newBuilder()
                  .withFilterKey(options.getFilterKey())
                  .withFilterValue(options.getFilterValue())
                  .build()))
      .apply("Write PubSub Events", PubsubIO.writeMessages().to(options.getOutputTopic()));
  return pipeline.run();
}
Messages whose ID duplicates one that arrived within the last 10 minutes are removed
https://cloud.google.com/dataflow/model/pubsub-io#using-record-ids
https://beam.apache.org/releases/javadoc/2.4.0/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.Read.html#withIdAttribute-java.lang.String-
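
To make the idea concrete, the following is a minimal Python sketch of ID-based deduplication over a 10-minute window. It is not how Dataflow implements `withIdAttribute` internally. The `message_unique_id` helper mirrors the CONCAT expression used by the plugin (table name, a hyphen, the primary key values, then the change version), while the `Deduplicator` class and its parameters are hypothetical.

```python
def message_unique_id(table_name, primary_key_values, change_version):
    """Table name, '-', primary key values, then the change version."""
    keys = "".join(str(value) for value in primary_key_values)
    return f"{table_name}-{keys}{change_version}"


class Deduplicator:
    """Drops messages whose ID was already seen within the window."""

    def __init__(self, window_seconds=600):
        self.window_seconds = window_seconds
        self._first_seen = {}  # message ID -> arrival time of the first copy

    def accept(self, message_id, arrival_time):
        # Forget IDs whose first copy arrived outside the window.
        self._first_seen = {
            seen_id: seen_at
            for seen_id, seen_at in self._first_seen.items()
            if arrival_time - seen_at < self.window_seconds
        }
        if message_id in self._first_seen:
            return False  # duplicate, e.g. sent by the second tracker
        self._first_seen[message_id] = arrival_time
        return True


dedup = Deduplicator()
msg_id = message_unique_id("items", [42], 7)
print(dedup.accept(msg_id, arrival_time=0))    # first copy from tracker A
print(dedup.accept(msg_id, arrival_time=30))   # same record from tracker B
```

Because both trackers derive the ID from the record itself, the two copies collide on purpose, which is what lets the downstream stage keep only one.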

[Diagram: intermediate DB → Fluentd (Change Tracking) → Dataflow → BigQuery (Table A, Table B) and Bigtable; highlighted: Dynamic Destinations]
● Writing to each BigQuery table dynamically from Dataflow
○ Dataflow reads the table name inside each message and writes to the matching BigQuery table dynamically
○ Normally one Dataflow job per table would be needed, but a single Dataflow job is enough, which keeps infrastructure costs down
○ Uses Dataflow's Dynamic Destinations feature (available in Java only)
https://beam.apache.org/documentation/io/built-in/google-bigquery/#using-dynamic-destinations

Writing to BigQuery dynamically with Dataflow
WriteResult writeResult = convertedTableRows.get(TRANSFORM_OUT)
    .apply(
        BigQueryIO.<TableRow>write()
            .to(
                new DynamicDestinations<TableRow, String>() {
                  // Read the table name sent by the plugin
                  @Override
                  public String getDestination(ValueInSingleWindow<TableRow> elem) {
                    return elem.getValue().get("table_name").toString();
                  }
                  // Determine the destination BigQuery table
                  @Override
                  public TableDestination getTable(String destination) {
                    return new TableDestination(
                        new TableReference()
                            .setProjectId("project_id")
                            .setDatasetId("dataset_name")
                            .setTableId("table_prefix" + "_" + destination), // destination: table name
                        "destination table" + destination);
                  }
                  // Determine the schema from the table name
                  @Override
                  public TableSchema getSchema(String destination) {
                    TableSchema schema = new TableSchema();
                    switch (destination) {
                      case "table_a":
                        schema.setFields(ImmutableList.of(new TableFieldSchema().setName("column").setType("STRING").setMode("NULLABLE")));
                        break;
                      case "table_b":
                        schema.setFields(ImmutableList.of(new TableFieldSchema().setName("column").setType("STRING").setMode("NULLABLE")));
                        break;
                      default:
                        break;
                    }
                    return schema;
                  }
                }));
https://beam.apache.org/documentation/io/built-in/google-bigquery/#using-dynamic-destinations
https://www.case-k.jp/entry/2020/06/25/150527
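
The routing logic itself is independent of Beam and can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Beam API: `SCHEMAS`, `destination_table`, and `route` are hypothetical names, and unlike the Java excerpt this sketch rejects unknown tables instead of returning an empty schema.

```python
from collections import defaultdict

# Hypothetical per-table schemas, keyed by the table name carried in each message.
SCHEMAS = {
    "table_a": [{"name": "column", "type": "STRING", "mode": "NULLABLE"}],
    "table_b": [{"name": "column", "type": "STRING", "mode": "NULLABLE"}],
}


def destination_table(row, project="project_id", dataset="dataset_name",
                      prefix="table_prefix"):
    """Derive the destination table from the row's table_name field."""
    return f"{project}:{dataset}.{prefix}_{row['table_name']}"


def route(rows):
    """Group rows by destination so one job can feed many tables."""
    batches = defaultdict(list)
    for row in rows:
        if row["table_name"] not in SCHEMAS:
            raise KeyError(f"no schema registered for {row['table_name']}")
        batches[destination_table(row)].append(row)
    return dict(batches)


batches = route([
    {"table_name": "table_a", "column": "x"},
    {"table_name": "table_b", "column": "y"},
    {"table_name": "table_a", "column": "z"},
])
for table, rows in batches.items():
    print(table, len(rows))
```

Adding a table then only means registering one more schema entry rather than deploying another pipeline.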

[Diagram: intermediate DB → Fluentd (Change Tracking) → Dataflow → BigQuery and Bigtable; highlighted: Window Processing, Keep Message 7days]
● Feature generation with windowing is also possible
○ Pub/Sub retains the deduplicated messages for 7 days
https://cloud.google.com/pubsub/pricing#seek-related_message_storage
○ Dataflow windowing can also be used to generate features
■ Fixed windows
■ Sliding windows
■ Session windows
https://beam.apache.org/documentation/programming-guide/#windowing
https://cloud.google.com/blog/ja/products/data-analytics/anomaly-detection-using-streaming-analytics-and-ai
○ Output destinations other than BigQuery are also possible
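
For readers unfamiliar with the three window types, here is a small Python sketch of how event timestamps map to windows. It follows the usual definitions (half-open intervals, timestamps as integer seconds) and is not taken from the Beam implementation; the function names are hypothetical.

```python
def fixed_window(ts, size):
    """The single non-overlapping window of length `size` containing ts."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts, size, period):
    """Every window of length `size`, started every `period`, containing ts."""
    last_start = ts - ts % period
    starts = range(last_start, ts - size, -period)
    return [(start, start + size) for start in sorted(starts)]


def session_windows(timestamps, gap):
    """Merge events separated by less than `gap` into one session."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] < gap:
            sessions[-1][1] = ts          # extend the current session
        else:
            sessions.append([ts, ts])     # start a new session
    return [(first, last + gap) for first, last in sessions]


print(fixed_window(125, size=60))
print(sliding_windows(125, size=60, period=30))
print(session_windows([0, 10, 100], gap=30))
```

A feature such as "stock changes per item in the last minute" would aggregate over one of these windows, with the choice depending on whether overlapping or activity-based grouping is wanted.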

Managing messages with Pub/Sub
resource "google_pubsub_subscription" "message_hub" {
  name  = "message_hub"
  topic = google_pubsub_topic.message_hub.name
  # subscribe from multiple subscribers
  message_retention_duration = "604800s"
  retain_acked_messages      = true
  ack_deadline_seconds       = 60
}
https://cloud.google.com/pubsub/docs/replay-overview?hl=ja
https://cloud.google.com/pubsub/docs/reference/rest/v1/projects.subscriptions/modifyAckDeadline?hl=ja#request-body
Messages are retained for 7 days (604800s)
Messages are kept available so that multiple subscribers can read them

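Linking event logs (under consideration)
[Diagram: core DB → Fluentd (Change Tracking) → Dataflow → BigQuery and Bigtable; labels: User, Master Data, Event Log Data]
● Linking event logs is also being considered
○ The data in the on-premises environment consists of master tables
○ This is still at the concept stage, but we are considering sending event data directly from clients to Pub/Sub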

Handling personal information
● Handling personal information in BigQuery
○ Policy tags provide column-level access control
○ Policy tags can be attached to tables with Terraform (creating them is not yet supported)
https://cloud.google.com/bigquery/docs/column-level-security-intro?hl=ja
https://github.com/hashicorp/terraform-provider-google/issues/6075
resource "google_bigquery_table" "table-name" {
  dataset_id = google_bigquery_dataset.<dataset-name>.dataset_id
  table_id   = "<table-name>"
  schema     = <<EOF
[
  {
    "name": "<column-name>",
    "type": "STRING",
    "mode": "NULLABLE",
    "policyTags": {
      "names": [
        "projects/<project-id>/locations/<location>/taxonomies/<taxonomies-id>/policyTags/<policy-tag-id>"
      ]
    }
  }
]
EOF
}
The policyTags field attaches the policy tag to the column

Handling personal information
[Diagram: intermediate DB → Fluentd (Change Tracking) → Dataflow → Pub/Sub → Dataflow → BigQuery; highlighted: Masking Secret Data]
● Handling personal information in Pub/Sub
○ Consumers that do not need personal information read from a masked topic
○ Dataflow performs the NULL replacement or hashing
○ Access can be restricted per topic
https://cloud.google.com/pubsub/docs/access-control?hl=ja
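
The masking step can be pictured with a short Python sketch. The slides do not say which hash function or field names are used, so SHA-256, the optional salt, and the `mask_record` helper are all assumptions made for illustration.

```python
import hashlib


def mask_record(record, null_fields=(), hash_fields=(), salt=""):
    """Return a copy with some fields set to None and others hashed.

    SHA-256 and the salt parameter are assumptions; the source only
    states that values are replaced with NULL or hashed.
    """
    masked = dict(record)
    for name in null_fields:
        if name in masked:
            masked[name] = None
    for name in hash_fields:
        if masked.get(name) is not None:
            raw = (salt + str(masked[name])).encode("utf-8")
            masked[name] = hashlib.sha256(raw).hexdigest()
    return masked


original = {"member_id": 1, "name": "Taro", "email": "taro@example.com"}
masked = mask_record(original, null_fields=["name"], hash_fields=["email"])
print(masked)
```

Hashing keeps a column usable as a join key across tables while hiding the raw value, whereas NULL replacement removes the information entirely.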

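Build and deploy strategy
● CI/CD tool
○ CircleCI
● Build
○ Building the Tracker
■ Build the Fluentd container image and push it to Container Registry
○ Building Dataflow
■ Customize and build the templates provided by Google
https://github.com/GoogleCloudPlatform/DataflowTemplates
● Deploy
○ Deploying the Tracker
■ The two Compute Engine instances are stopped and started one at a time, so the deployment causes no downtime
■ On startup, each Compute Engine instance pulls the container image from Container Registry
○ Deploying Dataflow
■ Update the existing pipeline
https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline?hl=ja
Deployments must be carried out so that no data is lost when tables or columns are added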

Deploying the container to Compute Engine
module "gce-container" {
  source  = "terraform-google-modules/container-vm/google"
  version = "~> 2.0"
  container = {
    # Specify the Fluentd container image
    image = "gcr.io/${var.project}/<image-name>"
    tty   = true
  }
  restart_policy = "Always"
}
resource "google_compute_instance" "compute_engine" {
  name         = "name"
  machine_type = "n2-custom-4-10240"
  zone         = "asia-northeast1-a"
  boot_disk {
    initialize_params {
      image = module.gce-container.source_image
      size  = 500
    }
  }
  # Pull the latest image at startup
  metadata_startup_script = <<EOF
#!/bin/bash
/usr/bin/docker-credential-gcr configure-docker
EOF
  metadata = {
    gce-container-declaration = module.gce-container.metadata_value
    # Also deploy the Cloud Logging and Monitoring containers
    google-logging-enabled    = "true"
    google-monitoring-enabled = "true"
  }
  service_account {
    email = google_service_account.tracker_app.email
    scopes = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]
  }
}
https://github.com/terraform-google-modules/terraform-google-container-vm/blob/master/examples/simple_instance/main.tf

Building the Dataflow custom template
mvn -Pdataflow-runner compile exec:java \
  -Dexec.mainClass=com.google.cloud.teleport.templates.PubsubToPubsub \
  -Dexec.args="--project=${project_id} \
  --tempLocation=gs://${project_id}/tmp \
  --templateLocation=gs://${project_id}/templates/PubsubToPubsubWithIdAttributeTemplate \
  --experiments=enable_stackdriver_agent_metrics \
  --enableStreamingEngine \
  --runner=DataflowRunner"
--experiments=enable_stackdriver_agent_metrics: deploys Cloud Monitoring
--enableStreamingEngine: enabling Streaming Engine also allows the disk size to be reduced from 420 GB to 30 GB
https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline?hl=ja#streaming-engine
https://cloud.google.com/dataflow/quotas#compute-engine-quotas

Deploying the Dataflow custom template
def create_template_request(self, job_name, template_path, parameters, environment, update_options):
    # Build a templates.launch request; "update" decides whether an existing job is replaced
    request = self.dataflow.projects().templates().launch(
        projectId=self.project_id,
        gcsPath=template_path,
        body={
            "jobName": job_name,
            "parameters": parameters,
            "environment": environment,
            "update": update_options
        }
    )
    return request

def deploy_dynamic_destinations_datatransfer(self, active_jobs):
    job_name = 'dynamic_destinations_datatransfer'
    template_name = 'PubSubToBigQueryDynamicDestinationsTemplate'
    template_path = "gs://{}/templates/{}".format(self.project_id, template_name)
    input_subscription = 'message_hub'
    output_default_table = 'streaming_datatransfer.streaming_dynamic_changetracktransfer'
    parameters = {
        "inputSubscription": "projects/{}/subscriptions/{}".format(self.project_id, input_subscription),
        "outputTableSpec": "{}:{}".format(self.project_id, output_default_table),
        # Autoscale according to load
        "autoscalingAlgorithm": "THROUGHPUT_BASED"
    }
    environment = {
        "zone": 'us-central1-a',
        "machineType": 'n2-standard-2',
        "maxWorkers": 5
    }
    # Update the existing pipeline if the job is already running
    update_options = job_name in active_jobs
    request = self.create_template_request(job_name, template_path, parameters, environment, update_options)
    request.execute()
Updating an existing pipeline
https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline?hl=ja
https://cloud.google.com/dataflow/docs/reference/rest/v1b3/RuntimeEnvironment
Autoscaling according to CPU utilization
https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline?hl=ja#autotuning-features

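Monitoring
Monitoring is done with Cloud Logging and Cloud Monitoring
● Data loss
○ Retry logs
○ Memory utilization

● Data delay
○ CPU utilization
○ Delay time
■ Redash is used to check the time from the start of Change Tracking to the insert into BigQuery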

Monitoring retry logs
def execute_changetracking(changetrack_ver)
  try = 0
  begin
    try += 1
    query = generate_query(changetrack_ver)
    changetrack_results = execute_query(query)
    if !changetrack_results.nil?
      changetrack_results.each_slice(@batch_size) { |rows|
        es = MultiEventStream.new
        rows.each do |r|
          r["changetrack_end_time"] = Time.now.utc
          es.add(Fluent::Engine.now, r)
          # Remember the highest version seen so the next poll starts from it
          if changetrack_ver < r["changetrack_ver"] then
            changetrack_ver = r["changetrack_ver"]
          end
        end
        router.emit_stream(@output_tag, es)
      }
      update_changetrack_version(changetrack_ver)
    end
  rescue => e
    # Log every retry; Cloud Logging alerts on this message
    puts "Write Retry Cnt: #{try}, Table Name: #{@tablename}, Error Message: #{e}"
    sleep try**2
    retry if try < @retry_max_times
    raise
  end
end
The plugin writes a log entry on every retry
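
The retry pattern used by the plugin (log each failure, wait for the square of the attempt number, give up after a maximum) can be sketched in Python as follows. The function name and the injectable `sleep` and `log` parameters are hypothetical, and unlike the Ruby excerpt this sketch does not sleep before the final re-raise.

```python
import time


def run_with_retry(operation, max_attempts, sleep=time.sleep, log=print):
    """Call `operation`, retrying with a quadratic backoff on failure."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return operation()
        except Exception as exc:
            # Same message prefix the alerting filter looks for.
            log(f"Write Retry Cnt: {attempt}, Error Message: {exc}")
            if attempt >= max_attempts:
                raise
            sleep(attempt ** 2)  # wait 1s, 4s, 9s, ...


calls = {"count": 0}


def flaky_query():
    """Hypothetical operation that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("lock timeout")
    return "ok"


waits = []
result = run_with_retry(flaky_query, max_attempts=10, sleep=waits.append)
print(result, waits)
```

Passing `sleep` and `log` as parameters keeps the sketch testable without real waiting; a production plugin would more likely use its logger directly.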

Monitoring retry logs
# Create a log-based metric for alerting in Cloud Logging
resource "google_logging_metric" "retry_error_tracker_a_metric" {
  name   = "retry-error-tracker-a/metric"
  filter = "resource.type=\"gce_instance\" severity>=DEFAULT jsonPayload.message:\"Write Retry Cnt: 10\" resource.labels.instance_id:\"${google_compute_instance.streaming_datatransfer_a.instance_id}\""
  metric_descriptor {
    metric_kind = "DELTA"
    value_type  = "INT64"
  }
}
# Alert when the retry count reaches 10
resource "google_monitoring_alert_policy" "tracker_a_retry_error_alert_policy" {
  display_name = "Tracker A Retry Error"
  depends_on   = [google_logging_metric.retry_error_tracker_a_metric]
  combiner     = "OR"
  conditions {
    display_name = "condition"
    condition_threshold {
      filter     = "metric.type=\"logging.googleapis.com/user/retry-error-tracker-a/metric\" resource.type=\"gce_instance\""
      duration   = "0s"
      comparison = "COMPARISON_GT"
      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_DELTA"
      }
      trigger {
        count = 1
      }
      threshold_value = 0
    }
  }
  enabled = true
  # gcloud alpha monitoring policies list --project=streaming-datatransfer-env
  notification_channels = ["projects/${var.project}/notificationChannels/${var.slack_notification_channel_id}"]
}

Monitoring memory and CPU utilization
# Alert when memory utilization exceeds 80%
resource "google_monitoring_alert_policy" "tracker_a_memory_alert_policy" {
  display_name = "Tracker A Memory Utilization"
  combiner     = "OR"
  conditions {
    display_name = "condition"
    condition_threshold {
      filter     = "metric.type=\"agent.googleapis.com/memory/percent_used\" resource.type=\"gce_instance\" resource.labels.instance_id=\"${google_compute_instance.streaming_datatransfer_a.instance_id}\" metric.label.\"state\"=\"used\""
      duration   = "60s"
      comparison = "COMPARISON_GT"
      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_MEAN"
      }
      trigger {
        count = 1
      }
      threshold_value = 80
    }
  }
  enabled               = true
  notification_channels = ["projects/${var.project}/notificationChannels/${var.slack_notification_channel_id}"]
}
For CPU, use metric.type="agent.googleapis.com/cpu/utilization"

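Performance evaluation
● Data loss
○ Data loss has been eliminated
■ Availability improved through the redundant configuration
■ Retry handling prevents loss

● Data delay
○ A few seconds, excluding the polling interval
■ The interval can be changed per table
○ Delay increases when many records are fetched

● Cost
○ Dramatically cheaper
○ About 2 million yen/month → about 50,000 yen/month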


Summary

● We are building a real-time data platform

● We are also working on linking event data in real time

● Machine learning projects that use real-time data are increasing as well

Details
The details are also covered on our tech blog
https://techblog.zozo.com/entry/real-time-data-linkage-infrastructure

To be streamed on YouTube Live on September 8 from 19:00
Sign up here: https://zozotech-inc.connpass.com/event/185894/
