A Taste of LHCb Data Analysis
Play around the LHCb Data on Kaggle with SciKit-Learn and MatPlotLib
Yuan CHAO ( 趙元 )
(National Taiwan University, Taipei, Taiwan)
PyCon2017
2017/06/09-11
Who am I?
Yuan CHAO (John)
YChao
...
Researcher
High-energy physics
Doing research with OSS ...
Worldwide LHC Computing Grid (WLCG)
How are the data analysed and processed?
https://cdsweb.cern.ch/record/1541893
https://www.youtube.com/watch?v=jDC3-QSiLB4
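Where is CERN, the European laboratory for particle physics?
Switzerland
On the outskirts of Geneva
Straddling the Swiss-French border
LHC circumference: 27 km
Located 50~150 m underground
Protons are accelerated in stages
and collide at high energy, close to the speed of light
Experiments at four collision points
General purpose: ATLAS, CMS
Special purpose: ALICE, LHCb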
The experiment I take part in: http://cms.web.cern.ch/org/cms-public
http://zh.wikipedia.org/wiki/%E7%B7%
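Particles produced in the collisions pass through the detector, leaving tracks or energy deposits as electronic signals
Proton bunches cross 40 million times per second (40 MHz)
On average there are 15 collisions per crossing
Only about one in a million collisions is really meaningful
High-speed hardware logic circuits first select one event in ten thousand
A dedicated ultra-high-speed network sends these to the "online" computer cluster
Software then coarsely selects one event in a hundred
The selection can be re-optimized at any time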
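The rates quoted above can be chained into a quick back-of-the-envelope estimate. The sketch below uses only the numbers from the slides and assumes, for illustration, that both selection fractions apply to bunch crossings; the function and variable names are hypothetical.

```python
# Numbers quoted on the slides; everything else here is illustrative.
BUNCH_CROSSING_RATE_HZ = 40_000_000   # 40 MHz bunch crossings
COLLISIONS_PER_CROSSING = 15          # average number of collisions per crossing
HARDWARE_REDUCTION = 10_000           # hardware trigger keeps 1 event in 10,000
SOFTWARE_REDUCTION = 100              # software filter keeps 1 event in 100


def trigger_rates(crossing_rate_hz, hardware_reduction, software_reduction):
    """Return the event rate (Hz) after the hardware and software stages."""
    after_hardware = crossing_rate_hz / hardware_reduction
    after_software = after_hardware / software_reduction
    return after_hardware, after_software


# Total proton-proton collisions per second before any selection.
collision_rate_hz = BUNCH_CROSSING_RATE_HZ * COLLISIONS_PER_CROSSING
after_hw, after_sw = trigger_rates(
    BUNCH_CROSSING_RATE_HZ, HARDWARE_REDUCTION, SOFTWARE_REDUCTION
)

print(f"collisions per second:   {collision_rate_hz:.0f}")
print(f"after hardware trigger:  {after_hw:.0f} Hz")
print(f"after software filter:   {after_sw:.0f} Hz")
```

The two stages together keep one event in a million, which matches the "one in a million" figure quoted for meaningful collisions.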
The data selected by each experiment are sent to the Tier-0 data centre for storage
During data taking: 7 x 24, non-stop
A Taste of LHCb Data Analysis: Play around the LHCb Data on Kaggle with SK-Learn and MatPlotLib
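Event reconstruction, long-term storage on tape
Data are distributed across 13 Tier-1 data centres
Tier-2 data centres let experimentalists run simulations and analyse data
(Formerly) the only Tier-1 data centre in Asia: the Academia Sinica Grid Computing Centre
Researcher
High-energy physics
Doing research with OSS ...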
Member of CMS Experiment
Searching for the Higgs boson
ATLAS Higgs ML Challenge
https://www.kaggle.com/c/higgs-boson
$13,000 & 876 teams
A taste of flavour physics
Search for
charged lepton
flavour violation
https://www.kaggle.com/c/flavours-of-physics
Search for new physics on lepton-flavour violation
$15,000 & 673 teams
τ, the tau?
μ, the muon??
Flavour physics?
Lepton flavour violation?
-- Please give me five minutes
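Standard Model
Length scales: ~10^-18 m; nanometre = 10^-9 m; ~10-1 m
The Scale of the Universe: http://htwins.net/scale2/
Force carriers: gluon (strong force), photon (electromagnetic force), W/Z bosons (weak force), graviton (gravity)
Matter particles: quarks, leptons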
The Big Bang: The Origin of the Universe
The Four Big Questions
The LHC was built for the following purposes:
The origin of mass
To find the origin of mass: the Higgs boson.
Dark matter and dark energy
Looking for unification: supersymmetry as well as other candidates for dark matter and dark energy.
The disappearance of antimatter
Investigating the mystery of the disappearance of antimatter.
The state of the early universe
Physics at the early stage of the universe: heavy-ion collisions and quark-gluon plasma.
Courtesy of the European Organization for Nuclear Research (CERN), Geneva, Switzerland.
Symmetry & Flavor Physics
People think the universe is symmetric? E = mc^2
Parity violation was proposed by T.D. Lee ( 李政道 ) and C.N. Yang ( 楊振寧 ) in 1956 (parity non-conservation).
Parity violation was observed in β decay by C.S. Wu ( 吳健雄 ) in 1957. Nobel Prize for Lee & Yang.
CP violation was discovered in the kaon system in 1964.
M. Kobayashi and T. Maskawa introduced CP violation into the Standard Model in 1973 (charge-parity non-conservation).
Sanda and Carter pointed out the possibility of CP violation in the B meson system in 1980.
Prof. Wu's experiment in 1956. Prof. Lee and Prof. Yang received the Nobel Prize in 1957.
http://de.wikipedia.org/wiki/Wu-Experiment
Symmetry & Flavor Physics
The KTeV experiment at FNAL established direct CP violation in the kaon system, confirmed by NA48 at CERN, in 1999.
Belle and BaBar observed indirect CP violation in the B meson system in 2002.
Belle observed direct CP violation in B → ππ in 2004, but this was not confirmed by BaBar.
Belle and BaBar presented evidence of direct CP violation in B → Kππ in 2004.
M. Kobayashi ( 小林誠 ) and T. Maskawa ( 益川敏英 ) shared the Nobel Prize in 2008 with Y. Nambu ( 南部陽一郎 ).
CP violation can't fully explain the baryon asymmetry problem.
→ People continue searching for new physics (NP)
Machine Learning is nothing new in HEP
People in the Tevatron, B-factory, LEP and LHC experiments more or less all use MVA in their studies!
(LL, LD → BDT, NN, .. → DL?)
Era of analog, ~1980
↓
Digital processing
That's it for the physics ...
The Kaggle Challenge
τ→3μ breaks lepton flavour conservation
Basic Data operations
Input variables
Signal vs. Background
Correlations
K-S test, CvM test
ROC and AUC
Machine learning algorithms
Event weight
Training and testing
AUC score calculation
Summary
https://www.kaggle.com/c/flavours-of-physics/data
Sample Events
File                   Content               Channel
training.csv           mixed MC & data       τ→3μ
test.csv               mixed MC & data       τ→3μ
check_agreement.csv    mixed MC & data       Ds→φ(μμ)π
check_correlation.csv  real background data
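As a first data operation, the training sample can be split into signal and background with pandas. This is only a sketch: the column names `id`, `mass` and `signal` are assumed here, and the three rows are made-up stand-ins for training.csv, read from memory so the snippet runs without the real files.

```python
import io

import pandas as pd


def split_signal_background(frame, label_column="signal"):
    """Split a DataFrame into (signal, background) using a 0/1 label column."""
    signal = frame[frame[label_column] == 1]
    background = frame[frame[label_column] == 0]
    return signal, background


# Toy stand-in for training.csv; the values are invented for illustration.
toy_csv = io.StringIO(
    "id,mass,signal\n"
    "1,1776.8,1\n"
    "2,1650.2,0\n"
    "3,1790.1,0\n"
)
toy = pd.read_csv(toy_csv)

sig, bkg = split_signal_background(toy)
print(f"{len(sig)} signal event(s), {len(bkg)} background event(s)")
```

With the real files, `pd.read_csv("training.csv")` would replace the in-memory string, provided the label column carries the assumed name.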
The Goal
Look for the rare events of τ→3μ
The classifier should not depend too much on differences between MC and data
The classifier should not depend too much on the τ mass
The score is counted using the weighted area under the
ROC curve (AUC)
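For orientation, the plain (unweighted) area under the ROC curve can be computed with scikit-learn as sketched below. The weighting scheme used for the competition score is not spelled out on the slide and is not reproduced here; the helper name `plain_auc` and the four toy events are illustrative only.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve


def plain_auc(labels, scores):
    """Unweighted area under the ROC curve for 0/1 labels and classifier scores."""
    false_positive_rate, true_positive_rate, _ = roc_curve(labels, scores)
    return auc(false_positive_rate, true_positive_rate)


# Four toy events: two background (0) and two signal (1).
labels = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

print(f"AUC = {plain_auc(labels, scores):.2f}")
```

A weighted variant would integrate the same ROC curve with different weights for different ranges of the true-positive rate.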
The K-S Test
The τ→3μ process has not yet been observed
Signal is made with MC simulation
Background is taken from real data
The classifier should not pick up the difference between MC and real data
A control channel, Ds→φ(μμ)π, is used to check the similarity
The Kolmogorov-Smirnov (KS) test is used to evaluate the difference, requiring KS < 0.09
KS = max |F_MC - F_data|, where F are the cumulative distribution functions for MC and real data
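A minimal sketch of the KS distance, taken as the largest gap between the empirical cumulative distributions of the classifier output on MC and on real data. Event weights are ignored here for simplicity, so this is not necessarily the exact competition formula; `ks_distance` and the toy arrays are illustrative names and values.

```python
import numpy as np


def ks_distance(pred_mc, pred_data):
    """Maximum absolute difference between two empirical CDFs."""
    pred_mc = np.sort(np.asarray(pred_mc, dtype=float))
    pred_data = np.sort(np.asarray(pred_data, dtype=float))
    # Evaluate both CDFs at every observed prediction value.
    grid = np.concatenate([pred_mc, pred_data])
    cdf_mc = np.searchsorted(pred_mc, grid, side="right") / len(pred_mc)
    cdf_data = np.searchsorted(pred_data, grid, side="right") / len(pred_data)
    return float(np.max(np.abs(cdf_mc - cdf_data)))


# Toy classifier outputs on the control channel (invented values).
toy_mc = [0.1, 0.2, 0.3, 0.4]
toy_data = [0.1, 0.2, 0.3, 0.9]

ks = ks_distance(toy_mc, toy_data)
print(f"KS = {ks:.2f}, passes KS < 0.09: {ks < 0.09}")
```

Identical samples give 0 and fully separated samples give 1, so a small value means the classifier treats simulated and real events alike.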
The CvM Test
The provided background events are not τ-free
The classifier should not depend too much on the τ mass
The τ-mass distribution could be used to extract the number of signal events
The Cramer-von Mises (CvM) test is used to test the correlation, requiring CvM-value < 0.002
F are the cumulative distribution functions of the predictions for all data and for data in a given mass interval
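One way to sketch such a check: for each mass interval, integrate the squared difference between the global and the local prediction CDF over the global distribution, then average over the intervals. The choice of equal-population mass intervals, the number of intervals, and the name `cvm_flatness` are assumptions made for this illustration, not the competition's exact recipe.

```python
import numpy as np


def cvm_flatness(predictions, masses, n_bins=5):
    """Average Cramer-von Mises distance between global and per-mass-bin CDFs."""
    predictions = np.asarray(predictions, dtype=float)
    masses = np.asarray(masses, dtype=float)
    ordered = np.sort(predictions)
    # Global CDF of the predictions, evaluated at every event.
    global_cdf = np.searchsorted(ordered, ordered, side="right") / len(ordered)
    # Mass intervals with roughly equal numbers of events (an assumption).
    edges = np.quantile(masses, np.linspace(0.0, 1.0, n_bins + 1))
    distances = []
    for i in range(n_bins):
        if i == n_bins - 1:
            in_bin = (masses >= edges[i]) & (masses <= edges[i + 1])
        else:
            in_bin = (masses >= edges[i]) & (masses < edges[i + 1])
        local = np.sort(predictions[in_bin])
        if len(local) == 0:
            continue
        local_cdf = np.searchsorted(local, ordered, side="right") / len(local)
        # Integral over dF_global becomes a mean over all events.
        distances.append(np.mean((global_cdf - local_cdf) ** 2))
    return float(np.mean(distances))


toy_masses = np.arange(100.0)
flat = cvm_flatness(np.full(100, 0.5), toy_masses)    # prediction ignores mass
correlated = cvm_flatness(toy_masses / 100.0, toy_masses)  # prediction tracks mass
print(f"flat: {flat:.4f}, correlated: {correlated:.4f}")
```

A classifier whose output tracks the mass gives a large value, while one that ignores the mass gives a value near zero.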
“Rules… Some of them can be bent, others are to
be broken” – Morpheus
https://artistotleonline.wordpress.com/category/climax/
Live DEMO with Jupyter NB
Forked from Kaggle challenge package
https://github.com/yandexdataschool/flavours-of-physics-start
Now following my derived Jupyter notebook
https://github.com/yuanchao/flavours-of-physics-start/blob/master/my_baseline.ipynb
Related URLs
LHC computing grid (LCG) and CERN overview video:
https://cds.cern.ch/record/2020780
"Higgs ML" Kaggle Challenge
https://www.kaggle.com/c/higgs-boson
“Flavours of Physics” Kaggle Challenge
https://www.kaggle.com/c/flavours-of-physics
The Scale of the Universe: http://htwins.net/scale2/
Heavy Flavour Data Mining workshop
https://indico.cern.ch/event/433556/
Official Jupyter NB:
https://github.com/yandexdataschool/flavours-of-physics-start
My derived Jupyter NB:
https://github.com/yuanchao/flavours-of-physics-start
That's all
Thank YOU!
Thank you for your attention
Installing Jupyter & SciPy
Set up a virtual environment
(you need Python installed beforehand)
Using pip:
$ pip3 install virtualenv
You can also use easy_install or apt-get instead
Open a terminal
Type in the following commands:
$ virtualenv -p python3 .scienv
$ source .scienv/bin/activate ← activate the environment!
$ pip3 install --upgrade pip
$ pip3 install jupyter
$ pip3 install scipy pandas scikit-learn matplotlib ← you get all packages
Then start the Jupyter notebook server:
$ jupyter notebook ← a web page will be loaded automatically
Here we go!
