Boundary Between Pseudonymity and Anonymity
- A Case Study of the Japanese Personal Information Protection Act -
Hiroshi Nakagawa
The University of Tokyo
Updated Personal Information Protection Act in Japan
• EU: The General Data Protection Regulation was finally adopted in 2016.
• Japan: The Personal Information Protection Act (PIPA) was amended in Sep. 2015.
• De-identified Information is introduced. It must meet the following conditions:
– Processed so that the specific person cannot be identified
– Restoring the original personal information is prohibited
• Informally, the intent behind this type of information is:
– Anonymized enough that it cannot easily be de-anonymized
– Freely usable without the consent of the data subject
– Pseudonymized data are not regarded as De-identified Information
• The borderline between pseudonymity and anonymity that De-identified Information is meant to draw is a critical issue.
Why not pseudonymization?
Variations of Pseudonymization
in terms of frequency of pseudonym update
The same individual's personal data, carrying the same information in each table:

No update
pseu   weight
A123   60.0
A123   65.5
A123   70.8
A123   68.5
A123   69.0
• No pseudonym update
• Highly identifiable
• Needed in medicine and pharmacy

Update pseudonym
pseu   weight
A123   60.0
A123   65.5
B432   70.8
B432   68.5
C789   69.0
• The pseudonym is updated: records are divided into k subsets with different pseudonyms
• More frequent updating lowers both identifiability and data value

Frequent update
pseu   weight
A123   60.0
B234   65.5
C567   70.8
X321   68.5
Y654   69.0
• The pseudonym is updated record by record
• Records are regarded as distinct persons' data

Obscurity
weight
60.0
65.5
70.8
68.5
69.0
• No identifiability
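The four variations above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. Pseudonyms are random hex strings drawn by a hypothetical `new_pseudonym` helper. "Updating" is modeled as issuing a fresh pseudonym every `period` records. Neither the pseudonym format nor the `period` parameter comes from the slides.

```python
import secrets
from typing import List, Optional, Set, Tuple

# A released row: (pseudonym or None, value). The pseudonym format is an assumption.
Row = Tuple[Optional[str], float]


def new_pseudonym(used: Set[str]) -> str:
    """Hypothetical helper: draw a random pseudonym never issued before."""
    while True:
        candidate = secrets.token_hex(4).upper()  # 8 random hex characters
        if candidate not in used:
            used.add(candidate)
            return candidate


def pseudonymize(values: List[float], period: Optional[int]) -> List[Row]:
    """Attach pseudonyms to one individual's time-ordered records.

    period=None: the pseudonym is never updated.
    period=p:    a fresh pseudonym every p records, so n records fall into
                 ceil(n / p) subsets; p=1 updates the pseudonym record by record.
    """
    if period is not None and period < 1:
        raise ValueError("period must be None or a positive integer")
    used: Set[str] = set()
    current = new_pseudonym(used)
    rows: List[Row] = []
    for i, value in enumerate(values):
        if period is not None and i > 0 and i % period == 0:
            current = new_pseudonym(used)
        rows.append((current, value))
    return rows


def obscure(values: List[float]) -> List[Row]:
    """Obscurity: release the values with no pseudonym at all."""
    return [(None, value) for value in values]


if __name__ == "__main__":
    weights = [60.0, 65.5, 70.8, 68.5, 69.0]
    variants = {
        "no update": pseudonymize(weights, None),
        "every 2 records": pseudonymize(weights, 2),
        "every record": pseudonymize(weights, 1),
        "obscurity": obscure(weights),
    }
    for label, rows in variants.items():
        distinct = {p for p, _ in rows if p is not None}
        print(f"{label}: {len(distinct)} distinct pseudonym(s)")
```

The loop reports 1, 3, 5 and 0 distinct pseudonyms respectively. With `period=2` the five weights split into three subsets, which mirrors the A123/A123/B432/B432/C789 pattern in the second table.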
• Pseudonymization without updating, applied to accumulated time-sequence personal data
– Accumulated records make the data subject easy to identify.
– It is therefore reasonable to prohibit transferring such data to a third party.
– The PIPA text reads as saying that pseudonymized personal data without updating is not De-identified Information.
– In practice:
• Unless data not normally treated as a quasi-identifier, such as a location log, are used as a quasi-identifier, it is not easy to identify the person corresponding to pseudonymized personal data without updating.
• Obscurity, in which every record of the same person has a distinct pseudonym, is certainly De-identified Information, because there is no clue for aggregating the same person's data.
Is pseudonymization with updating not De-identified Information under the new Japanese PIPA?
Record length →
pseu   Loc. 1   Loc. 2   Loc. 3    …
A123   Minato   Sibuya   Azabu     …
A123   Odaiba   Toyosu   Sinbasi   …
A123   …        …        …         …
A123   xy       yz       zw        …
A123
• No pseudonym update
• High identifiability because of the long location sequence
• Even if the pseudonym is deleted, the long location sequence makes it easy to identify the specific data subject.

Transform to obscurity (pseudonym deleted):
Loc. 1   Loc. 2   Loc. 3    …
Minato   Sibuya   Azabu     …
Odaiba   Toyosu   Sinbasi   …
…        …        …         …
xy       yz       zw        …
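A simple uniqueness count shows why a long location sequence can identify its owner even with no pseudonym. For each prefix length n, the sketch computes the fraction of individuals whose first n locations form a sequence that nobody else shares. The trajectories are hypothetical toy data invented for illustration, and the function name is likewise hypothetical.

```python
from collections import Counter
from typing import Dict, List


def unique_fraction(trajectories: Dict[str, List[str]], n: int) -> float:
    """Fraction of individuals whose first n locations form a sequence no one
    else in the dataset shares.

    A unique sequence lets anyone who knows those n visits single out the
    record, even after the pseudonym column has been deleted.
    """
    if not trajectories:
        return 0.0
    prefixes = [tuple(locs[:n]) for locs in trajectories.values()]
    counts = Counter(prefixes)
    return sum(1 for p in prefixes if counts[p] == 1) / len(prefixes)


# Hypothetical toy trajectories; the keys are internal and never released.
toy = {
    "p1": ["Minato", "Sibuya", "Azabu", "Odaiba"],
    "p2": ["Minato", "Sibuya", "Toyosu", "Sinbasi"],
    "p3": ["Minato", "Odaiba", "Toyosu", "Azabu"],
    "p4": ["Odaiba", "Toyosu", "Sinbasi", "Minato"],
}

if __name__ == "__main__":
    for n in range(1, 5):
        print(n, unique_fraction(toy, n))
```

On this toy set the unique fraction rises from 0.25 (one location) to 0.5 (two) to 1.0 (three or more). A record unique at length n stays unique at every longer length, so growing the record length can only increase identifiability.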
Shuffling
• Obscurity
Loc. 1   Loc. 2   Loc. 3    …
Minato   Sibuya   Azabu     …
Odaiba   Toyosu   Sinbasi   …
…        …        …         …
xy       yz       zw        …
↓ shuffle
Loc. 1   Loc. 2   Loc. 3    …
Minato   yz       zw        …
Odaiba   Toyosu   Azabu     …
…        …        …         …
xy       Sibuya   Sinbasi   …
• Almost no clue remains to identify the same individual's records.
• But the data value is reduced.
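The shuffling step can be sketched as an independent permutation of each location column across rows. This is a minimal version under stated assumptions: the function name is hypothetical, and every column is shuffled, whereas the slide's example happens to leave Loc. 1 in place.

```python
import random
from typing import List, Optional


def shuffle_columns(table: List[List[str]], seed: Optional[int] = None) -> List[List[str]]:
    """Independently permute every column (Loc. 1, Loc. 2, ...) across rows.

    Each column keeps the same multiset of locations, so per-position counts
    survive, but a row no longer holds one person's consecutive locations.
    (Shuffling every column, including Loc. 1, is an assumption.)
    """
    if not table:
        return []
    width = len(table[0])
    if any(len(row) != width for row in table):
        raise ValueError("all rows must have the same number of columns")
    rng = random.Random(seed)
    columns = [[row[j] for row in table] for j in range(width)]
    for column in columns:
        rng.shuffle(column)
    return [[columns[j][i] for j in range(width)] for i in range(len(table))]


obscured = [
    ["Minato", "Sibuya", "Azabu"],
    ["Odaiba", "Toyosu", "Sinbasi"],
    ["xy", "yz", "zw"],
]

if __name__ == "__main__":
    for row in shuffle_columns(obscured, seed=1):
        print(row)
```

Column-wise statistics, such as how many records visited Odaiba at step 1, are unchanged. Anything that depends on the order within one row, such as typical routes, is lost. That is the reduction in data value noted on the slide.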
[Figure: The boundary between De-identified Information (DI) and non-DI. Horizontal axis: frequency of pseudonym update, from no update to update for every record. Pseudonymization without update is not DI; obscurity is DI; the boundary lies somewhere in between.]
Continuously observed personal data
has high value in medicine
• Frequent updating of pseudonyms enhances anonymity,
• but reduces data value,
– especially in medicine.
– Physicians do not require "no update of pseudonym."
– For instance, according to a medical researcher I spoke with, keeping the same pseudonym for the duration of one illness seems to be enough.
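One way to keep "the same pseudonym for one illness" is to derive the pseudonym from the pair (patient, illness episode) with a keyed hash. This is only one possible construction, not the author's method. The HMAC-SHA256 choice, the function name, and the identifiers in the example are all assumptions.

```python
import hashlib
import hmac


def episode_pseudonym(secret_key: bytes, patient_id: str, episode_id: str) -> str:
    """Pseudonym that stays fixed within one illness episode and changes
    between episodes.

    Keyed hashing (HMAC-SHA256) is an assumed construction: without the key,
    two episodes of the same patient cannot be linked through the pseudonym
    itself, but whoever holds the key can re-derive them, so the key must
    stay with the original data holder.
    """
    # A unit-separator character keeps ("ab", "c") and ("a", "bc") from colliding.
    message = f"{patient_id}\x1f{episode_id}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()[:16]


if __name__ == "__main__":
    key = b"hypothetical-secret-key"  # placeholder; use a random key in practice
    print(episode_pseudonym(key, "patient-001", "influenza-2016"))
    print(episode_pseudonym(key, "patient-001", "influenza-2016"))  # same as above
    print(episode_pseudonym(key, "patient-001", "fracture-2017"))   # different
```

All records within one episode share a pseudonym, so the course of that illness can still be analyzed. A new episode starts a new pseudonym, which corresponds to an update frequency of "once per illness."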
Updating frequency vs Data value
[Figure: data value (vertical axis) against frequency of pseudonym update (horizontal axis: no update, low, high, update record by record), plotted for location logs, purchasing logs, and medical logs.]
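A crude way to make the trade-off concrete is to count the longest history an analyst can still link under a single pseudonym. The sketch uses that count as a stand-in for data value. This proxy is an assumption for illustration and is not the measure behind the author's figure. The function names and the 365-day example are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Optional


def pseudonym_indices(n_records: int, period: Optional[int]) -> List[int]:
    """Which pseudonym (0, 1, 2, ...) each of one person's records carries.

    period=None means the pseudonym is never updated; period=p means a new
    pseudonym every p records (p=1: record by record).
    """
    if period is None:
        return [0] * n_records
    if period < 1:
        raise ValueError("period must be None or a positive integer")
    return [i // period for i in range(n_records)]


def longest_linkable_history(pseudonyms: List[Hashable]) -> int:
    """Largest number of records an analyst can link through one pseudonym."""
    return max(Counter(pseudonyms).values(), default=0)


if __name__ == "__main__":
    n = 365  # hypothetical: one record per day for a year
    for period in (None, 90, 30, 7, 1):
        print(period, longest_linkable_history(pseudonym_indices(n, period)))
```

The linkable history shrinks from 365 records (no update) to 90, 30, 7 and finally 1 (update record by record). Long-term analyses, such as following a chronic disease, need the left end of this range. Short-term statistics survive toward the right end.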
Usage by category and frequency of pseudonym updating
• Medical log
– No update: Able to analyze an individual patient's log, especially the history of chronic disease and lifestyle.
– Update: Not able to follow an individual patient's history, but able to recognize short-term epidemics.
• Driving record
– No update: If the data subject consents to its use with a personal ID, the automobile manufacturer can get the current status of his/her own car and give advice, e.g., about parts in need of repair. Without consent, nothing can be done.
– Low frequency: Long-range traffic trends, which can be used for urban design or for road traffic regulation on particular days, e.g., Sundays.
– High frequency: Only short-period traffic can be obtained.
• Purchasing record
– No update: If the data subject consents to its use with a personal ID, it can be used for targeted advertising. Without consent, it can only be used to extract sales statistics of ordinary goods.
– Low frequency: Long-range trends in an individual's purchasing behavior can be mined.
– High frequency: Short-range trends in an individual's purchasing behavior can be mined.
– Every record: Only sales statistics of specific goods can be investigated.
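The purchasing-record rows can be contrasted in code. Sales statistics per item need no linkage, so they survive any update frequency. Repeat purchases by one buyer are visible only while one pseudonym covers several purchases. The function names and the toy logs are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Purchase = Tuple[Optional[str], str]  # (pseudonym or None, item); layout is assumed


def sales_by_item(log: List[Purchase]) -> Counter:
    """Sales statistics of goods: needs no linkage, so it works even when
    the pseudonym is updated for every record or removed."""
    return Counter(item for _, item in log)


def repeat_buyers(log: List[Purchase], item: str) -> int:
    """Number of pseudonyms that bought `item` more than once. This
    individual-level trend is only visible while one pseudonym covers
    several purchases."""
    per_pseudonym: Dict[str, int] = defaultdict(int)
    for pseudonym, bought in log:
        if pseudonym is not None and bought == item:
            per_pseudonym[pseudonym] += 1
    return sum(1 for count in per_pseudonym.values() if count > 1)


# Hypothetical toy logs holding the same four purchases.
stable = [("A123", "tea"), ("A123", "tea"), ("B432", "coffee"), ("B432", "tea")]
per_record = [("P1", "tea"), ("P2", "tea"), ("P3", "coffee"), ("P4", "tea")]

if __name__ == "__main__":
    print(sales_by_item(stable) == sales_by_item(per_record))  # same statistics
    print(repeat_buyers(stable, "tea"), repeat_buyers(per_record, "tea"))
```

Both logs yield identical item counts. Only the log without per-record updating reveals that one buyer purchased tea twice.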
Summary: What usage is possible with
pseudonymization with/without updating
• As discussed so far, almost all uses of pseudonymized data are statistical processing.
• Neither targeted advertising nor profiling of individual persons is possible.
– Learning a profile classifier (only from consented data) and using it for targeted advertising to new users is possible, but it is a gray area.
• Pseudonymized data are hard to trace once they have been transferred to many organizations such as IT companies.

