Learning From a Few Large-Scale Partial Examples:
Computational Tools, Regularization, and Network Design
Bas Peters (b)
Eldad Haber (a,b)
Justin Granek (b)
Keegan Lensink (a)
a) UBC
b) Computational Geosciences Inc.
The Scientific Computing, Applied and Industrial Mathematics
(SCAIM) Seminar, UBC, October 2019
This talk
Learning from a single or few examples
• loss functions
• mitigating lack of data with regularization
Networks and optimization for
• large-scale inputs and outputs
Applications
• video segmentation
• seismic interpretation, aquifer mapping, mineral prospectivity
Seismic imaging
Time-domain LS-RTM: model separation and data separation.
[Figure: the true velocity model equals a smooth background model plus a model perturbation, $m_t = m_s + \delta m$ (depth vs. lateral position, velocities in km/s); the recorded data separate analogously into background and scattered components ($D_s$, $D_t$), shown as time vs. receiver panels.]
Seismic imaging
Real seismic images are not like a ‘photo’ of the earth.
Challenge - sparse labels
Very few ground-truth measurements (labels)
• available at the surface and in boreholes
• no true images of the subsurface
• maybe up to 20 data-label image pairs for training

Example - single video segmentation
Only a few slices of the mask are provided.
Opportunities
Geoscience is quite different from ‘standard’ learning tasks
(image classification, segmentation).
• Validation data is not available at training time.
• Apply the trained network to data never seen before: train, then predict.

Example application - semantic segmentation
[Figure: input data and desired output.]

Opportunities
For some applications, all data is recorded at training time; only the labels are unknown.
Goals
Design
• networks
• loss functions
• network regularization
to avoid the need for
• large data volumes & storage/access issues
• many labeled pixels or fully annotated images
• large GPU counts / long training times

Sparse labels
Historically dealt with by
• extracting a patch and classifying its central pixel - cannot learn from large-scale structure
• manually completing the label image at high cost & ambiguous quality - this is exactly what we want the machine to do
Partial Loss-Functions
Example: an $\ell_1$ loss for non-linear regression type problems:

$$l(f(y,\theta), c) = \sum_{i=1}^{N} |f(y,\theta)_i - c_i|,$$

where $f$ is the neural network, $\theta$ are the network parameters (convolutional kernels), $y$ is the vectorized input image, and $c$ is the vectorized label image.

We want to use the sparse labels directly -> partial loss-function:

$$l_\Omega(f(y,\theta), c_\Omega) = \sum_{i\in\Omega} |f(y,\theta)_i - c_i|.$$

Related, but different from point-annotations [Bearman et al., 2016].
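A minimal PyTorch sketch of this partial $\ell_1$ loss (the names `pred`, `omega`, and the sizes are illustrative, not from the slides): the network output exists everywhere, but the misfit only touches the labeled index set $\Omega$.

```python
import torch

def partial_l1_loss(prediction, labels_omega, omega):
    """l_Omega: absolute differences summed over the labeled index set Omega only.

    prediction   : flattened network output f(y, theta), shape (N,)
    labels_omega : label values c_i at the indices in Omega
    omega        : 1-D LongTensor of labeled indices
    """
    return torch.sum(torch.abs(prediction[omega] - labels_omega))

# toy usage: 100 pixels, only 7 of them labeled
pred = torch.randn(100, requires_grad=True)
omega = torch.tensor([3, 10, 42, 55, 61, 77, 90])
c_omega = torch.randn(7)
partial_l1_loss(pred, c_omega, omega).backward()
# pred.grad is nonzero only at the labeled indices
```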
Partial Loss-Functions
To evaluate the partial loss:
1) compute the full forward pass $f(y,\theta)$ using the full data
2) compute the misfit & gradient from the subsampled output $f(y,\theta)_i,\ i \in \Omega$

We can train a network on a single image using SGD and a (randomized) partial loss, as in the sketch below.

This is the same procedure as in seismic full-waveform inversion:
1. Forward propagate the wavefield from the source
2. Sample the wavefield at the receivers and compute the misfit & gradient
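A hedged sketch of that loop in PyTorch, training on a single (data, sparse-label) pair; `network` and all sizes are placeholders under stated assumptions, not the authors' implementation.

```python
import torch

def train_single_image(network, y, omega, c_omega, n_iter=500, batch=64, lr=1e-3):
    """SGD on one image with a randomized partial l1 loss.

    y       : input image, shape (1, channels, H, W)
    omega   : LongTensor with the indices of all labeled pixels (flattened output)
    c_omega : label values at those indices
    """
    opt = torch.optim.SGD(network.parameters(), lr=lr)
    for _ in range(n_iter):
        opt.zero_grad()
        f = network(y).flatten()                     # 1) full forward pass, full data
        sub = torch.randperm(omega.numel())[:batch]  # draw a random subset of Omega
        loss = torch.sum(torch.abs(f[omega[sub]] - c_omega[sub]))  # 2) partial misfit
        loss.backward()                              # gradient from the sampled labels
        opt.step()
```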
Example: label interpolation
[Figures: data image and label image.]

Example: label interpolation
[Figures: final prediction and prediction error.]

Example: label interpolation
Success using 24 images, 20 labeled columns each, for training.
But what if we only have very few known labels?

Example: label interpolation
[Figures: probability map for one of the classes; thresholded (argmax class per pixel).]
Network output regularization
What if we only have very few known labels?
• standard approach: data augmentation
• alternatively: regularization of the network output

Any network for semantic segmentation is a map

$$f(\theta, y) : \mathbb{R}^{N} \rightarrow \mathbb{R}^{N \times n_{\text{class}}}.$$

Add prior knowledge via a penalty function (per class):

$$L(y, \theta, C) = l(y, \theta, C) + \alpha \sum_{j=1}^{n_{\text{class}}} r(f(\theta, y)_j)$$

total loss = multi-class cross-entropy + regularization term; a sketch of assembling this loss follows.
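A sketch of this total loss in PyTorch, with the cross-entropy restricted to the few labeled pixels; the per-class penalty `r` is passed in as a callable (the quadratic smoothing choice appears on the next slide). All names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def total_loss(logits, omega, c_omega, r, alpha):
    """L = partial multi-class cross-entropy + alpha * sum_j r(f(theta, y)_j).

    logits  : network output, shape (n_class, H, W)
    omega   : flat indices of the labeled pixels
    c_omega : integer class labels at those pixels
    r       : penalty applied to each class's probability map
    """
    n_class = logits.shape[0]
    flat = logits.reshape(n_class, -1)
    misfit = F.cross_entropy(flat[:, omega].T, c_omega)  # labeled pixels only
    probs = torch.softmax(logits, dim=0)                 # per-class probability maps
    return misfit + alpha * sum(r(probs[j]) for j in range(n_class))
```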
Network output regularization
Add prior knowledge via a penalty function (per class).
The following example uses a quadratic smoothing function:

$$r(f(\theta,y)) = \frac{1}{2} \sum_{j=1}^{n_{\text{class}}} \left\| \begin{pmatrix} \alpha_1 (I_y \otimes D_x) \\ \alpha_2 (D_y \otimes I_x) \end{pmatrix} f(\theta,y)_j \right\|_2^2,$$

where $D_x$ and $D_y$ are finite-difference matrices and $\otimes$ is the Kronecker product.
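A NumPy sketch of exactly this penalty for one vectorized class-probability map, building $D_x$, $D_y$ as 1-D forward-difference matrices and forming the Kronecker-product operators above; sizes and the weights `a1`, `a2` are illustrative.

```python
import numpy as np

def diff_matrix(n):
    """1-D forward-difference matrix D of shape (n-1, n)."""
    return np.diff(np.eye(n), axis=0)

def smoothing_penalty(f, nx, ny, a1=1.0, a2=1.0):
    """r(f) = 1/2 || [a1 (Iy kron Dx); a2 (Dy kron Ix)] f ||_2^2
    for one vectorized (column-major) class-probability map f of length nx*ny."""
    Dx, Dy = diff_matrix(nx), diff_matrix(ny)
    Ax = a1 * np.kron(np.eye(ny), Dx)  # x-derivatives of every column
    Ay = a2 * np.kron(Dy, np.eye(nx))  # y-derivatives of every row
    A = np.vstack([Ax, Ay])
    return 0.5 * np.sum((A @ f) ** 2)

# toy usage on a 16x16 probability map
nx, ny = 16, 16
f = np.random.rand(nx * ny)
print(smoothing_penalty(f, nx, ny))
```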
Network output regularization
Compare the output penalty

$$L(y, \theta, C) = l(y, \theta, C) + \alpha \sum_{j=1}^{n_{\text{class}}} r(f(\theta, y)_j)$$

to weight decay / Tikhonov regularization on the parameters:

$$L(y, \theta, C) = l(y, \theta, C) + \alpha \sum_{k=1}^{n_{\text{kernels}}} r(\theta_k).$$
Example: label interpolation
[Figures: prediction from regularized training - probability map for one of the classes; thresholded (argmax class per pixel).]
Networks vs PDE-constrained optimization
(1) Can we train a network to track more than one horizon simultaneously? (2) How do networks deal with multiple horizons that merge and split? These two questions warrant a new look at the automatic horizon tracking/interpolation problem.

Conclusions
In this paper, we introduced DNNs from an inverse problem point of view. We have shown that the network can be considered as the forward problem and the training as the inverse problem.
Table 1. Overview of similarities and differences between geophysical forward/inverse modeling and neural networks for interpretation of geophysical data/images.
• Discrete problem structure - geophysical forward/inverse problem: discretized differential operators in a time-stepping scheme; neural network: network structure w.r.t. $Y_j$, e.g., $Y_{j+1} = Y_j - K_j^T \sigma(K_j Y_j + B_j)$.
• Model parameters - forward problem: known physical parameters; inverse problem: unknown physical parameters; neural network: unknown convolutional kernels with unclear meaning.
• Model parameter regularization - inverse problem: Tikhonov regularization on physical model parameters; neural network: weight decay on kernels and biases.
• Output/state regularization - inverse problem: would be equivalent to regularization of the final elastic/electromagnetic field; neural network: Tikhonov regularization on the final network state (probability maps).
Software
Many geoscience problems have similar structure:
• one/few large data/label images ($1000 \times 1000$ pixels)
• small number of labeled points at sparse locations
• all examples were trained on 1 GPU, in under 1 hour
Same algorithms and code used for
• seismic interpretation
• aquifer mapping
• mineral prospectivity mapping
Aquifer mapping
Delineating New Aquifers
To demonstrate the scalability and power of CGI’s new AI methodology, a number of public regional datasets, including magnetic, gravity, topography, and geology, were used as inputs to map out the large-scale aquifers of Australia’s Northern Territory. A map of the known aquifer extents was used to validate the results. The data were processed at CGI and used to train our proprietary VNet AI algorithm, using only 1% of the known aquifer map as training targets. The algorithm is a purpose-built deep neural network.

Aquifer mapping
Training on a small number of point-annotations makes it possible to identify large regional targets, and potentially to delineate new, unidentified aquifers previously hidden in complex datasets.
Mineral prospectivity
The property of interest has seen extensive exploration work over the years and contains many existing geoscientific datasets, including airborne electromagnetics, magnetics, geological mapping, structural interpretation, as well as a full suite of geochemical assaying.

Gold had previously been identified in a handful of locations via hand-samples and drilling intercepts, and the goal of the project was to use these examples to train our VNet to highlight new targets in the region.

[Figure: sample geoscience inputs from the Auryn property.]

The algorithm was trained using a variety of different parameters and input combinations, and the resulting gold prospectivity maps were shown to highlight the known mineralized locations, which generates confidence in the methodology and its usefulness for exploration targeting.

[Figure: predicted prospectivity map overlaid on the simplified geological interpretation. Known mineralized locations are plotted as gold stars.]
Networks - U-net variants [Ronneberger et al., 2015]
The examples so far used a symmetric, additive variant.

Network

$$Y_0 = D, \qquad Y_j = g(Y_{j-1}, K_j), \quad j = 1, 2, \cdots, n, \qquad \ell(X Y_n)$$

The initial state $Y_0$ is the data $D$; the loss $\ell(X Y_n)$ is measured against the label.
Network - Notation
Write the tensor-valued network state $Y \in \mathbb{R}^{n_x \times n_y \times n_z \times n_{\text{chan}}}$ as a block-vector, one block per channel:

$$Y \equiv \begin{bmatrix} Y^1 \\ Y^2 \\ \vdots \\ Y^{n_{\text{chan}}} \end{bmatrix},$$

so that the network reads $Y_0 = D$, $Y_j = g(Y_{j-1}, K_j)$, $j = 1, 2, \cdots, n$, with loss $\ell(X Y_n)$; a two-line illustration follows.
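A small NumPy illustration of the block-vector convention (all sizes illustrative): each channel is vectorized and the channels are stacked.

```python
import numpy as np

nx, ny, n_chan = 8, 8, 4                  # illustrative 2-D example
state = np.random.rand(n_chan, nx, ny)    # tensor-valued network state
# block-vector: Y = [vec(Y^1); vec(Y^2); ...; vec(Y^{n_chan})]
Y = np.concatenate([state[c].reshape(-1, order="F") for c in range(n_chan)])
assert Y.shape == (n_chan * nx * ny,)
```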
Network - Notation
Each $K(\theta_{i,j})$ is a (block-)Toeplitz matrix representing one convolution, and the layer operator couples the input and output channels [Treister et al., 2018; Ruthotto & Haber, 2018]:

$$K \equiv \begin{bmatrix}
K(\theta_{1,1}) & K(\theta_{1,2}) & \cdots & K(\theta_{1,n_{\text{chan in}}}) \\
K(\theta_{2,1}) & K(\theta_{2,2}) & \cdots & K(\theta_{2,n_{\text{chan in}}}) \\
\vdots & \vdots & \ddots & \vdots \\
K(\theta_{n_{\text{chan out}},1}) & K(\theta_{n_{\text{chan out}},2}) & \cdots & K(\theta_{n_{\text{chan out}},n_{\text{chan in}}})
\end{bmatrix}$$
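A small NumPy check of the Toeplitz view for a single 3-tap 1-D kernel: the matrix $K(\theta)$ applied to $y$ matches a zero-padded 'same'-size convolution. Purely illustrative.

```python
import numpy as np

def conv_matrix(theta, n):
    """Banded Toeplitz matrix K(theta) so that K @ y equals the 'same'-size
    correlation of y with the 3-tap stencil theta (zero boundary)."""
    K = np.zeros((n, n))
    for i in range(n):
        for j, t in enumerate(theta):  # taps at offsets -1, 0, +1
            col = i + j - 1
            if 0 <= col < n:
                K[i, col] = t
    return K

theta = np.array([0.5, -1.0, 0.5])     # one convolution kernel
y = np.random.rand(10)
K = conv_matrix(theta, 10)
# same result as convolving y with the flipped kernel ('same' mode, zero padding)
assert np.allclose(K @ y, np.convolve(y, theta[::-1], mode="same"))
```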
Network

$$Y_0 = D, \qquad Y_j = g(Y_{j-1}, K_j), \quad j = 1, 2, \cdots, n, \qquad \ell(X Y_n)$$

ResNet [He et al., 2015]:
$$g(Y_{j-1}, K_j) = Y_{j-1} + h\, f(K_j Y_{j-1})$$

Hyperbolic [Chang et al., 2018]:
$$g(Y_{j-1}, Y_{j-2}, K_j) = 2Y_{j-1} - Y_{j-2} + h^2 f(K_j Y_{j-1})$$
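A minimal NumPy sketch of the two update rules, where `K` is any linear operator (e.g., the block-Toeplitz matrix above) and the pointwise nonlinearity $f$ is taken as tanh; the step size `h` and all sizes are illustrative.

```python
import numpy as np

sigma = np.tanh  # pointwise nonlinearity f

def resnet_step(Y_prev, K, h=0.1):
    """ResNet: Y_j = Y_{j-1} + h * f(K Y_{j-1})."""
    return Y_prev + h * sigma(K @ Y_prev)

def hyperbolic_step(Y_prev, Y_prev2, K, h=0.1):
    """Hyperbolic (leapfrog): Y_j = 2 Y_{j-1} - Y_{j-2} + h^2 * f(K Y_{j-1})."""
    return 2 * Y_prev - Y_prev2 + h**2 * sigma(K @ Y_prev)

# roll a few layers forward from the data D
n = 10
D = np.random.rand(n)
Ks = [np.random.randn(n, n) * 0.1 for _ in range(5)]
Y_pp, Y_p = D, resnet_step(D, Ks[0])
for K in Ks[1:]:
    Y_pp, Y_p = Y_p, hyperbolic_step(Y_p, Y_pp, K)
```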
Network - optimization

$$\min_{\{K_j\},\{Y_j\}} \ \ell(X Y_n) \quad \text{s.t.} \quad
\begin{aligned}
Y_n &= g(Y_{n-1}, K_n) \\
Y_{n-1} &= g(Y_{n-2}, K_{n-1}) \\
&\;\;\vdots \\
Y_2 &= g(Y_1, K_2) \\
Y_1 &= D
\end{aligned}$$

The corresponding Lagrangian:

$$L(\{Y_i\},\{P_i\},\{K_i\}) = \ell(X Y_n) - \sum_{i=2}^{n} P_i^T \big(Y_i - g_i(Y_{i-1}, K_i)\big) - P_1^T (Y_1 - D)$$
Network - optimization
First-order necessary optimality conditions are satisfied if the partial derivatives of the Lagrangian vanish:

$$\nabla_{Y_n} L = \nabla_{Y_n} \ell(X Y_n) - P_n = 0$$
$$\nabla_{Y_i} L = -P_i + (\nabla_{Y_i} g_{i+1})^T P_{i+1} = 0, \quad i = n-1, \dots, 2$$
$$\nabla_{Y_1} L = -P_1 + (\nabla_{Y_1} g_{2})^T P_{2} = 0$$
$$\nabla_{K_i} L = (\nabla_{K_i} g_i)^T P_i = 0$$
$$\nabla_{P_i} L = Y_i - g_i(Y_{i-1}, K_i) = 0, \qquad \nabla_{P_1} L = Y_1 - D = 0$$
Network - optimization
Gradient w.r.t. the network parameters:

$$\nabla_{K_i} L = (\nabla_{K_i} g_i)^T P_i$$

For the hyperbolic network, with $f_i(K_i, W_{i-1} Y_{i-1}, b_i) = K_i^T \sigma(K_i W_{i-1} Y_{i-1} + b_i)$:

$$\begin{aligned}
(\nabla_{K_i} g_i)^T P_i
&= \nabla_{K_i}\big[2 W_{i-1} Y_{i-1} - V_{i-2} Y_{i-2} + h^2 f_i(K_i, W_{i-1} Y_{i-1}, b_i)\big]^T P_i \\
&= \nabla_{K_i}\big[h^2 K_i^T \sigma(K_i W_{i-1} Y_{i-1} + b_i)\big]^T P_i \\
&= h^2 \big[I \otimes \sigma(K_i W_{i-1} Y_{i-1} + b_i) + (W_{i-1} Y_{i-1} \otimes I)\,\operatorname{diag}\!\big(\sigma'(K_i W_{i-1} Y_{i-1} + b_i)\big) K_i\big]^T P_i
\end{aligned}$$

The gradient for one layer depends only on:
• the Lagrangian multiplier $P_i$
• the state $Y_{i-1}$
of that layer.
Network - optimization
Standard algorithm, known as: reduced-space/Lagrangian, adjoint-state, backpropagation.
1) Propagate forward to obtain all network states and satisfy the equality constraints:
$$\nabla_{P_i} L = 0 = Y_i - g_i(Y_{i-1}, K_i)$$
2) Propagate ‘backward’ to obtain all Lagrangian multipliers:
$$\nabla_{Y_n} L = 0 = \nabla_{Y_n} \ell(X Y_n) - P_n, \qquad \nabla_{Y_i} L = 0 = -P_i + (\nabla_{Y_i} g_{i+1})^T P_{i+1}$$
3) Compute the gradient w.r.t. the network parameters for every layer:
$$\nabla_{K_i} L = (\nabla_{K_i} g_i)^T P_i$$
A sketch of these three steps for a small ResNet chain follows.
Network	-	optimization
Standard	algorithm	known	as:		
reduced-space/Lagrangian	,	Adjoint-state,	backpropagation	
1)	Propagate	forward	to	obtain	all	network	states	and	satisfy	equality	constraints	
2)	Propagate	‘backward’	to	obtain	all	Lagrangian	multipliers	
rPi L = 0 = Yi gi(Yi 1, Ki)<latexit sha1_base64="6l/BFHtZoF/91nq1/cHPTKn5wn8=">AAAJi3icddbbThpBGMDx1Z6E1lbby95MaprYpBKoSVtNTVTwgHgA5ahryOwwK6t7cncAcbMP0l71aXrb3vZtugvfTlo+Oglh+P9mFlggrOaahi+y2d8zsw8ePnr8ZC6Vfvps/vmLhcWXdd/peYzXmGM6XlOjPjcNm9eEIUzedD1OLc3kDe0mH3ujzz3fcOyqGLr80qJXtqEbjIootRfWVZtqJm0HarXcNkJyuEGyZIOo1VbbICvkqm0sjx4ExkoufB9NS9E0fKeq6XR7YSmbyY4GwZMcTJY2M1/j8a3cXpxbUzsO61ncFsykvn+Ry7riMqCeMJjJw7Ta87lL2Q294hfR1KYW9y+D0ZsMyduodIjueNHNFmRU/94RUMv3h5YWrbSo6PqTFsdpdtET+ufLwLDdnuA2Gz+R3jOJcEh8xkjH8DgT5jCaUOYZ0WslrEs9ykR0XtPptNrhuloVXS5ooGqO2YlfhWOqoxIC5wM1fmZNJ/kkFWQqJKkkUylJRZmKkLSBTIMkDWUaJhtbMrWSVJapnKQzmc6SY/ky+UliMrFkY0OmRpLqMtWT1JSpmaSKTJXk8H2Z+v9/Q5orkxvGp93mA+ZYFrU70WnXt8IgviNbYThJ20DbmPJAeUwFoAKmHaAdTLtAu5j2gPYw7QPtYyoCFTEdAB1gKgGVMB0CHWI6AjrCdAx0jOkE6ARTGaiMqQJUwXQKdIrpDOgMUxWoiqkGVMNUB6pjagA1MDWBmphaQC1M50Dn4ZTvLwWkeJ8GpGFiQGzaITkgx/u6QF1M10DXmG6AbjAZQAYmH8jHdAd0h2kINMTUA+phugW6xeQCuZhsIBtTB6iDyQKyMHlAHiYdSMfUB+pjGgANMN0D3U/7Crjd8Qcj/5WIWu5O+YxcH62LE1rHo2o69uTaJKP14//AidXjiH/A09ZWYe3oYmMtHh/lpQWe1D9kcquZ1Up01XGsjMec8lp5oywrOeWTsqnsK2WlpjDlu/JD+an8Ss2nVlPrqS/jpbMzsOeV8s9I7fwBy8x+xw==</latexit>
rYn
L = 0 = rYn
(X, Yn) Pn
rYi
L = 0 = Pi + (rYi
gi+1)T
Pi+1<latexit sha1_base64="bs/nsBqV06vXf+PqxJ8USBSr2EY=">AAAJ53icddZbU9NAFMDx1CutN9RHX3ZkdGBQplXHywMzQBHBcmklvUmws9lu6NLcSLYtNZPP4Jvjqx/L7+KDSXuyA5yaGYbl/zublrSdxvRtEcpi8U/u2vUbN2/dnssX7ty9d//B/MNHjdAbBIzXmWd7QcukIbeFy+tSSJu3/IBTx7R50+yXU28OeRAKz9Xl2OfHDj1xhSUYlUnqzP80XGratBMZersTuXFMdsnzVVIkqwQJMfyeWDT01gtIS+QlMfRqujSMwqV5cfFML9MhQZYJWbx0VhGTk2R0uRSTpW/69FSTvwqd+YXiSnFyELwowWJBg6PaeTj3weh6bOBwVzKbhuFRqejL44gGUjCbxwVjEHKfsj494UfJ0qUOD4+jyQWMybOkdInlBcmPK8mkXtwRUScMx46ZTDpU9sKrlsZZdjSQ1vvjSLj+QHKXTR/IGthEeiR9NUhXBJxJe5wsKAtE8lwJ69GAMpm8ZoVCwehyy9Blj0saGaZnd9Nn4dnGpMTA5chIH9m0SDlLmyptZqmiUiVLOyrtQDJHKo2yNFZpnG1sq9TOUlWlapYOVTrMzhWqFGaJqcSyjU2VmllqqNTIUkulVpZqKtWy0w9VGv7/HzJ9lfw4vewuHzHPcajbTS67tR5H6S+yHsdXaQNoA1MZqIxpE2gT00egj5i2gLYwfQL6hGkbaBvTDtAOps9AnzFVgCqYdoF2Me0B7WHaB9rHdAB0gKkKVMVUA6ph+gL0BdMh0CEmHUjHVAeqY2oANTA1gZqYWkAtTG2gNqavQF/jGe9fCkjxPhPIxMSA2KxTckCO9/WAephOgU4x9YH6mASQwBQChZjOgc4xjYHGmAZAA0xnQGeYfCAfkwvkYuoCdTE5QA6mACjAZAFZmIZAQ0wjoBGm70DfZ70FkpuBCatvJWJUezNeIz9Ec2lCczyptudenc0ymp9+B16Znkb8AZ41q8Ps5GbjQ3q8VbcWeNF4tVJ6vfK69mZhbR9uO+a0J9pTbVErae+0NW1bq2p1jWl/c09zy7kXeZH/kf+Z/zUdvZaDPY+1S0f+9z/3YJn8</latexit>
Network	-	optimization
Standard	algorithm	known	as:		
reduced-space/Lagrangian	,	Adjoint-state,	backpropagation	
1)	Propagate	forward	to	obtain	all	network	states	and	satisfy	equality	constraints	
2)	Propagate	‘backward’	to	obtain	all	Lagrangian	multipliers	
3)	Compute	gradient	w.r.t.		network	parameters	for	every	layer
rPi L = 0 = Yi gi(Yi 1, Ki)<latexit sha1_base64="6l/BFHtZoF/91nq1/cHPTKn5wn8=">AAAJi3icddbbThpBGMDx1Z6E1lbby95MaprYpBKoSVtNTVTwgHgA5ahryOwwK6t7cncAcbMP0l71aXrb3vZtugvfTlo+Oglh+P9mFlggrOaahi+y2d8zsw8ePnr8ZC6Vfvps/vmLhcWXdd/peYzXmGM6XlOjPjcNm9eEIUzedD1OLc3kDe0mH3ujzz3fcOyqGLr80qJXtqEbjIootRfWVZtqJm0HarXcNkJyuEGyZIOo1VbbICvkqm0sjx4ExkoufB9NS9E0fKeq6XR7YSmbyY4GwZMcTJY2M1/j8a3cXpxbUzsO61ncFsykvn+Ry7riMqCeMJjJw7Ta87lL2Q294hfR1KYW9y+D0ZsMyduodIjueNHNFmRU/94RUMv3h5YWrbSo6PqTFsdpdtET+ufLwLDdnuA2Gz+R3jOJcEh8xkjH8DgT5jCaUOYZ0WslrEs9ykR0XtPptNrhuloVXS5ooGqO2YlfhWOqoxIC5wM1fmZNJ/kkFWQqJKkkUylJRZmKkLSBTIMkDWUaJhtbMrWSVJapnKQzmc6SY/ky+UliMrFkY0OmRpLqMtWT1JSpmaSKTJXk8H2Z+v9/Q5orkxvGp93mA+ZYFrU70WnXt8IgviNbYThJ20DbmPJAeUwFoAKmHaAdTLtAu5j2gPYw7QPtYyoCFTEdAB1gKgGVMB0CHWI6AjrCdAx0jOkE6ARTGaiMqQJUwXQKdIrpDOgMUxWoiqkGVMNUB6pjagA1MDWBmphaQC1M50Dn4ZTvLwWkeJ8GpGFiQGzaITkgx/u6QF1M10DXmG6AbjAZQAYmH8jHdAd0h2kINMTUA+phugW6xeQCuZhsIBtTB6iDyQKyMHlAHiYdSMfUB+pjGgANMN0D3U/7Crjd8Qcj/5WIWu5O+YxcH62LE1rHo2o69uTaJKP14//AidXjiH/A09ZWYe3oYmMtHh/lpQWe1D9kcquZ1Up01XGsjMec8lp5oywrOeWTsqnsK2WlpjDlu/JD+an8Ss2nVlPrqS/jpbMzsOeV8s9I7fwBy8x+xw==</latexit>
rYn
L = 0 = rYn
(X, Yn) Pn
rYi
L = 0 = Pi + (rYi
gi+1)T
Pi+1<latexit sha1_base64="bs/nsBqV06vXf+PqxJ8USBSr2EY=">AAAJ53icddZbU9NAFMDx1CutN9RHX3ZkdGBQplXHywMzQBHBcmklvUmws9lu6NLcSLYtNZPP4Jvjqx/L7+KDSXuyA5yaGYbl/zublrSdxvRtEcpi8U/u2vUbN2/dnssX7ty9d//B/MNHjdAbBIzXmWd7QcukIbeFy+tSSJu3/IBTx7R50+yXU28OeRAKz9Xl2OfHDj1xhSUYlUnqzP80XGratBMZersTuXFMdsnzVVIkqwQJMfyeWDT01gtIS+QlMfRqujSMwqV5cfFML9MhQZYJWbx0VhGTk2R0uRSTpW/69FSTvwqd+YXiSnFyELwowWJBg6PaeTj3weh6bOBwVzKbhuFRqejL44gGUjCbxwVjEHKfsj494UfJ0qUOD4+jyQWMybOkdInlBcmPK8mkXtwRUScMx46ZTDpU9sKrlsZZdjSQ1vvjSLj+QHKXTR/IGthEeiR9NUhXBJxJe5wsKAtE8lwJ69GAMpm8ZoVCwehyy9Blj0saGaZnd9Nn4dnGpMTA5chIH9m0SDlLmyptZqmiUiVLOyrtQDJHKo2yNFZpnG1sq9TOUlWlapYOVTrMzhWqFGaJqcSyjU2VmllqqNTIUkulVpZqKtWy0w9VGv7/HzJ9lfw4vewuHzHPcajbTS67tR5H6S+yHsdXaQNoA1MZqIxpE2gT00egj5i2gLYwfQL6hGkbaBvTDtAOps9AnzFVgCqYdoF2Me0B7WHaB9rHdAB0gKkKVMVUA6ph+gL0BdMh0CEmHUjHVAeqY2oANTA1gZqYWkAtTG2gNqavQF/jGe9fCkjxPhPIxMSA2KxTckCO9/WAephOgU4x9YH6mASQwBQChZjOgc4xjYHGmAZAA0xnQGeYfCAfkwvkYuoCdTE5QA6mACjAZAFZmIZAQ0wjoBGm70DfZ70FkpuBCatvJWJUezNeIz9Ec2lCczyptudenc0ymp9+B16Znkb8AZ41q8Ps5GbjQ3q8VbcWeNF4tVJ6vfK69mZhbR9uO+a0J9pTbVErae+0NW1bq2p1jWl/c09zy7kXeZH/kf+Z/zUdvZaDPY+1S0f+9z/3YJn8</latexit>
Network	-	optimization
Standard	algorithm	known	as:		
reduced-space/Lagrangian	,	Adjoint-state,	backpropagation	
	1)	Propagate	forward	to	obtain	all	network	states	and	satisfy	equality	constraints	
	2)	Propagate	‘backward’	to	obtain	Lagrangian	multipliers	
	3)	Compute	gradient	w.r.t.	network	parameters	for	every	layer	
Possible	confusion	compared	to	seismic	parameter	estimation:	
		1)	propagate	wavefield	from	source	
		2)	computation	of	Lagrangian	multipliers	is	called	backpropagation	
						(running	wave	propagation	in	reverse,	with	the	data-residual	as	source	at	receivers)
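To make the three steps concrete, here is a minimal NumPy sketch for a toy residual layer Y_{i+1} = Y_i + h K_i^T σ(K_i Y_i); the layer form, the tanh activation, and the quadratic loss are assumptions for illustration, not the exact layers used in the talk.

```python
import numpy as np

def sigma(x):  return np.tanh(x)
def dsigma(x): return 1.0 - np.tanh(x) ** 2

def forward(Y0, Ks, h=0.1):
    # Step 1: forward propagation; every state is stored (this storage is
    # exactly the memory cost that reversible networks remove later on).
    Ys = [Y0]
    for K in Ks:
        Ys.append(Ys[-1] + h * K.T @ sigma(K @ Ys[-1]))
    return Ys

def backward(Ys, Ks, C, h=0.1):
    # Steps 2-3: one reverse sweep computes the multipliers P_i and, on the
    # fly, the gradient w.r.t. every K_i (product rule, as in the derivation).
    P = Ys[-1] - C                       # P_n for the loss 0.5*||Y_n - C||_F^2
    grads = [None] * len(Ks)
    for i in reversed(range(len(Ks))):
        K, Y = Ks[i], Ys[i]
        Z = K @ Y
        T = dsigma(Z) * (K @ P)          # sigma' enters as a diagonal scaling
        grads[i] = h * (sigma(Z) @ P.T + T @ Y.T)
        P = P + h * K.T @ T              # P_i = (dg_i/dY_i)^T P_{i+1}
    return grads

# usage on random data: two layers, 8 features, 5 examples
rng = np.random.default_rng(0)
Ks = [rng.standard_normal((8, 8)) for _ in range(2)]
Ys = forward(rng.standard_normal((8, 5)), Ks)
grads = backward(Ys, Ks, C=np.zeros((8, 5)))
```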
Network - notation

The backpropagation-Lagrangian connection has been known since [Y. LeCun et al., 1988]. Yet, most neural-network presentations do not use linear-algebraic notation.

[Image from http://neuralnetworksanddeeplearning.com/chap2.html]

$$\begin{bmatrix} Y_j^1 \\ Y_j^2 \end{bmatrix}
= f\left( \begin{bmatrix} K(\theta_j^{1,1}) & K(\theta_j^{1,2}) \\ K(\theta_j^{2,1}) & K(\theta_j^{2,2}) \end{bmatrix}
\begin{bmatrix} Y_{j-1}^1 \\ Y_{j-1}^2 \end{bmatrix} \right)$$
The (block) sparsity pattern of K encodes which neurons connect.
Network - notation

Presentation changes when considering channels and skip connections. ResNet:

$$\begin{bmatrix} Y_j^1 \\ Y_j^2 \end{bmatrix}
= \begin{bmatrix} Y_{j-1}^1 \\ Y_{j-1}^2 \end{bmatrix}
+ f\left( \begin{bmatrix} K(\theta_j^{1,1}) & K(\theta_j^{1,2}) \\ K(\theta_j^{2,1}) & K(\theta_j^{2,2}) \end{bmatrix}
\begin{bmatrix} Y_{j-1}^1 \\ Y_{j-1}^2 \end{bmatrix} \right)$$

[Figure from He et al., 2015: "A deeper residual function F for ImageNet. Left: a building block (on 56×56 feature maps) as in Fig. 3 for ResNet-34. Right: a 'bottleneck' building block for ResNet-50/101/152."]

A toy sketch of this block form follows below.
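In this block notation, a ResNet step differs from a plain layer only by the identity term. A minimal sketch, with dense blocks standing in for the convolutions K(θ_j^{i,j}); all names here are illustrative:

```python
import numpy as np

f = lambda x: np.maximum(x, 0.0)         # ReLU activation

n = 4                                     # pixels per channel group
rng = np.random.default_rng(0)
# 2x2 grid of dense blocks; stand-ins for the convolutions K(theta^{i,j})
K = np.block([[rng.standard_normal((n, n)) for _ in range(2)]
              for _ in range(2)])

Y_prev = rng.standard_normal(2 * n)       # stacked channels [Y^1_{j-1}; Y^2_{j-1}]
Y_plain  = f(K @ Y_prev)                  # plain layer
Y_resnet = Y_prev + f(K @ Y_prev)         # ResNet layer: identity skip added
```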
Network - notation

[Figure from F.N. Iandola et al., 2016: "Microarchitectural view: Organization of convolution filters in the Fire module" - a squeeze layer of 1x1 convolution filters followed by an expand layer of 1x1 and 3x3 convolution filters, each followed by a ReLU.]

$$\begin{bmatrix} Y_j^1 \\ Y_j^2 \end{bmatrix}
= \begin{bmatrix} Y_{j-1}^1 \\ Y_{j-1}^2 \end{bmatrix}
+ f\left( \begin{bmatrix} K(\theta_j^{1,1}) \\ K(\theta_j^{2,1}) \end{bmatrix}
f\left( \begin{bmatrix} K(\theta_j^{1,1}) & K(\theta_j^{1,2}) \end{bmatrix}
\begin{bmatrix} Y_{j-1}^1 \\ Y_{j-1}^2 \end{bmatrix} \right) \right)$$

Here the activations f act as diagonal matrices with 0/1 entries (ReLU).
Understanding the block structure of K enables various parameterizations, including:
• block-circulant
• block-diagonal convolution + scalar off-diagonal elements [J. Ephrath et al., 2018]
Network	-	notation
$$K \equiv \begin{bmatrix}
K(\theta^{1,1}) & K(\theta^{1,2}) & \cdots & K(\theta^{1,\,n_{\mathrm{chan\,in}}}) \\
K(\theta^{2,1}) & K(\theta^{2,2}) & \cdots & K(\theta^{2,\,n_{\mathrm{chan\,in}}}) \\
\vdots & \vdots & \ddots & \vdots \\
K(\theta^{n_{\mathrm{chan\,out}},1}) & K(\theta^{n_{\mathrm{chan\,out}},2}) & \cdots & K(\theta^{n_{\mathrm{chan\,out}},\,n_{\mathrm{chan\,in}}})
\end{bmatrix}$$
[E. Treister et al., 2018]
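Each block K(θ^{i,j}) is itself a convolution matrix (circulant, for periodic boundaries), so a multi-channel convolution is one block matrix-vector product. A small 1-D NumPy sketch with hypothetical 3-point stencils:

```python
import numpy as np

def circulant_1d(stencil, n):
    # dense matrix of a 1-D periodic convolution with the given stencil
    c = np.zeros(n)
    k = len(stencil) // 2
    for t, w in enumerate(stencil):
        c[(t - k) % n] = w
    return np.stack([np.roll(c, s) for s in range(n)], axis=0)

n, c_in, c_out = 6, 2, 3
rng = np.random.default_rng(1)
stencils = rng.standard_normal((c_out, c_in, 3))  # one theta^{i,j} per block

# K as the block matrix above: c_out x c_in blocks K(theta^{i,j})
K = np.block([[circulant_1d(stencils[i, j], n) for j in range(c_in)]
              for i in range(c_out)])

Y = rng.standard_normal(c_in * n)     # input channels, stacked
out = K @ Y                           # multi-channel convolution, shape (c_out*n,)
```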
Network - optimization

Standard algorithm, known as: reduced-space/Lagrangian, adjoint-state, backpropagation.

1) Propagate forward to obtain all network states and satisfy the equality constraints
2) Propagate 'backward' to obtain the Lagrangian multipliers
3) Compute the gradient w.r.t. the network parameters for every layer

• Reverse-mode automatic differentiation implements this algorithm as well
• it stores all network states
• memory requirements therefore grow with network depth

We could avoid this storage if the states could be re-computed on the fly during step (2):
• reversible systems allow states to be computed from earlier and later states
• U-net skip-connections and ResNet blocks are not reversible
Reversible networks

Consider a leapfrog discretization of the nonlinear Telegraph equation:

$$Y_{i+1} = 2W_iY_i - W_{i-1}Y_{i-1} + f(W_iY_i, K_i)$$

Reverse propagation follows as

$$Y_i = W_i^{-1}\big( 2W_{i+1}Y_{i+1} + f(W_{i+1}Y_{i+1}, K_{i+2}) - Y_{i+2} \big)$$

[Chang et al., 2018]

A round-trip sketch of this reversibility follows below.

Changing channels and resolution (pooling) is generally not invertible; an orthogonal wavelet transform suits the purpose. [Lensink, Haber & B.P., 2019]
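A minimal NumPy round-trip sketch of this reversibility, with a toy nonlinearity f and small dense invertible W_i standing in for the actual convolutional layers (all assumptions for illustration, with the sketch's own layer indexing): the forward sweep keeps only the two most recent states, and the reverse sweep reconstructs every earlier state exactly.

```python
import numpy as np

def f(Z, K):                         # toy nonlinearity; stand-in for the conv block
    return np.tanh(K @ Z)

def step(Y_prev, Y, W_prev, W, K):
    # Y_{i+1} = 2 W_i Y_i - W_{i-1} Y_{i-1} + f(W_i Y_i, K_i)
    return 2 * W @ Y - W_prev @ Y_prev + f(W @ Y, K)

def unstep(Y, Y_next, W_prev, W, K):
    # exact inverse of step(): recover Y_{i-1} from (Y_i, Y_{i+1})
    return np.linalg.solve(W_prev, 2 * W @ Y + f(W @ Y, K) - Y_next)

# round trip: the forward sweep keeps only two states ...
rng = np.random.default_rng(0)
n, depth = 4, 6
Ws = [np.eye(n) + 0.1 * rng.standard_normal((n, n)) for _ in range(depth)]
Ks = [rng.standard_normal((n, n)) for _ in range(depth)]
Y0, Y1 = rng.standard_normal((n, 3)), rng.standard_normal((n, 3))

Ya, Yb = Y0, Y1
for i in range(1, depth):
    Ya, Yb = Yb, step(Ya, Yb, Ws[i - 1], Ws[i], Ks[i])

# ... and the reverse sweep walks back to (Y0, Y1) without stored activations
for i in range(depth - 1, 0, -1):
    Ya, Yb = unstep(Ya, Yb, Ws[i - 1], Ws[i], Ks[i]), Ya

assert np.allclose(Ya, Y0) and np.allclose(Yb, Y1)
```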
Discrete	wavelet	transform
Example:	1-level	Haar	transform	
• halves	resolution	in	each	direction	
• increases	number	of	channels	4x	(2D)	
• inverse	transform	known	in	closed-form
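A minimal sketch of the 1-level 2-D Haar transform and its closed-form inverse (orthonormal normalization and even-sized inputs assumed):

```python
import numpy as np

def haar2d(x):
    # 1-level 2-D Haar transform: (H, W) -> (H/2, W/2, 4) --
    # halves the resolution in each direction, multiplies channels by 4
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return np.stack([(a + b + c + d) / 2,    # average
                     (a - b + c - d) / 2,    # detail
                     (a + b - c - d) / 2,    # detail
                     (a - b - c + d) / 2],   # detail
                    axis=-1)

def ihaar2d(w):
    # closed-form inverse (the transform is orthogonal)
    ll, lh, hl, hh = [w[..., i] for i in range(4)]
    x = np.empty((2 * w.shape[0], 2 * w.shape[1]))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

x = np.random.default_rng(0).standard_normal((8, 8))
assert np.allclose(ihaar2d(haar2d(x)), x)
```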
Fully reversible Hyperbolic Networks
• memory becomes independent of network depth and pooling
• can train networks to map large 3D volumes to 3D volumes on 1 GPU
• train	on	a	single	video	
• 3	slices	provided
Example	-	single	video	segmentation
Training	label
Example	-	single	video	segmentation
Prediction
Example	-	single	video	segmentation
Prediction	on	top	of	data
Example	-	single	video	segmentation
Prediction	on	top	of	data
Example	-	single	video	segmentation
Prediction	on	top	of	data
Example	-	single	video	segmentation
Example	-	video	segmentation
• direct	video-to-video	maps	provide	a	simpler	approach	
• only	1	video	used,	no	pre-training	
• not	a	real-time	approach
Conclusions
We	can	work	with	sparse	labels	directly.	No	need	to	create	patches.	
Regularization	can	mitigate	a	lack	of	labels.	
We	can	interpolate	and	extrapolate	labels,	given	full	data.	
Partial losses + regularization mean we do not always need a lot of data & labels.
Fully reversible networks enable training on large 3D inputs and outputs.
References
• Automatic classification of geologic units in seismic images using partially interpreted examples
Bas Peters, Justin Granek, Eldad Haber
81st EAGE Conference and Exhibition 2019.
• Multi-resolution neural networks for tracking seismic horizons from few training images
Bas Peters, Justin Granek, Eldad Haber
Interpretation, 7, no. 3 (2019): 1-54.
• Neural-networks for geophysicists and their application to seismic data interpretation
Bas Peters, Eldad Haber, Justin Granek
The Leading Edge, 38, no. 7 (2019): 534-540
• Does shallow geological knowledge help neural-networks to predict deep units?
Bas Peters, Eldad Haber, Justin Granek
SEG Technical Program Expanded Abstracts 2019
• Fully Hyperbolic Convolutional Neural Networks
Keegan Lensink, Eldad Haber, Bas Peters
arXiv:1905.10484

More Related Content

PPT
Using Very High Resolution Satellite Images for Planning Activities in Mining
PPTX
PPT
Potential Applications for the Nikon iSpace Tracking System in Geophysical Su...
PDF
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
PPTX
Deep Learning Tomography
PDF
Lecture 4 (part c) empirical models
PPTX
IGARSS_2011_GALLOZA.pptx
PPTX
FIRE DETECTION USING VIDEO ANALYTICS
Using Very High Resolution Satellite Images for Planning Activities in Mining
Potential Applications for the Nikon iSpace Tracking System in Geophysical Su...
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
Deep Learning Tomography
Lecture 4 (part c) empirical models
IGARSS_2011_GALLOZA.pptx
FIRE DETECTION USING VIDEO ANALYTICS

What's hot (18)

PPT
IGARSS11_takaku_dsm_report.ppt
PPT
IGARSS 2011.ppt
PPTX
Multi spectral imaging sensors
PPTX
BalloonNet: A Deploying Method for a Three-Dimensional Wireless Network Surro...
PDF
Performance of waveform cross correlation using a global and regular grid of ...
PDF
"A Fast Object Detector for ADAS using Deep Learning," a Presentation from Pa...
PPT
FR4.L09 - QUANTITATIVE ASSESSMENT ON THE REQUIREMENTS OF DESDYNI MISSION FOR ...
PDF
Multispectral imaging in Forensics with VideometerLab 3
PDF
IRJET-Reversible Image Watermarking Based on Histogram Shifting Technique
PPTX
Introduction to TLS Applications Presentation
PDF
PDF
MSc Proposal Presentation: A comparison of TLS and Photogrammetry
PDF
IRJET- Fire Detection using Infrared Images for Uav-Based Forest Fire Sur...
PPTX
Extend Your Journey: Introducing Signal Strength into Location-based Applicat...
PPTX
Omid Badretale Low-Dose CT noise reduction
PPT
Real-time Implementation of Sphere Decoder-based MIMO Wireless System (EUSIPC...
PPTX
PDF
APPROVED - Raytheon - Capstone Final Poster
IGARSS11_takaku_dsm_report.ppt
IGARSS 2011.ppt
Multi spectral imaging sensors
BalloonNet: A Deploying Method for a Three-Dimensional Wireless Network Surro...
Performance of waveform cross correlation using a global and regular grid of ...
"A Fast Object Detector for ADAS using Deep Learning," a Presentation from Pa...
FR4.L09 - QUANTITATIVE ASSESSMENT ON THE REQUIREMENTS OF DESDYNI MISSION FOR ...
Multispectral imaging in Forensics with VideometerLab 3
IRJET-Reversible Image Watermarking Based on Histogram Shifting Technique
Introduction to TLS Applications Presentation
MSc Proposal Presentation: A comparison of TLS and Photogrammetry
IRJET- Fire Detection using Infrared Images for Uav-Based Forest Fire Sur...
Extend Your Journey: Introducing Signal Strength into Location-based Applicat...
Omid Badretale Low-Dose CT noise reduction
Real-time Implementation of Sphere Decoder-based MIMO Wireless System (EUSIPC...
APPROVED - Raytheon - Capstone Final Poster
Ad

Recently uploaded (20)

PDF
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
PPTX
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
PPTX
cpcsea ppt.pptxssssssssssssssjjdjdndndddd
PDF
lecture 2026 of Sjogren's syndrome l .pdf
PPTX
Derivatives of integument scales, beaks, horns,.pptx
PDF
AlphaEarth Foundations and the Satellite Embedding dataset
PPTX
The KM-GBF monitoring framework – status & key messages.pptx
PDF
Phytochemical Investigation of Miliusa longipes.pdf
PPTX
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
PPT
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
PDF
Placing the Near-Earth Object Impact Probability in Context
PPTX
Classification Systems_TAXONOMY_SCIENCE8.pptx
PPTX
Comparative Structure of Integument in Vertebrates.pptx
PPTX
neck nodes and dissection types and lymph nodes levels
PPTX
Microbiology with diagram medical studies .pptx
PDF
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
DOCX
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
PDF
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
PDF
An interstellar mission to test astrophysical black holes
PDF
. Radiology Case Scenariosssssssssssssss
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
cpcsea ppt.pptxssssssssssssssjjdjdndndddd
lecture 2026 of Sjogren's syndrome l .pdf
Derivatives of integument scales, beaks, horns,.pptx
AlphaEarth Foundations and the Satellite Embedding dataset
The KM-GBF monitoring framework – status & key messages.pptx
Phytochemical Investigation of Miliusa longipes.pdf
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
Placing the Near-Earth Object Impact Probability in Context
Classification Systems_TAXONOMY_SCIENCE8.pptx
Comparative Structure of Integument in Vertebrates.pptx
neck nodes and dissection types and lymph nodes levels
Microbiology with diagram medical studies .pptx
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
An interstellar mission to test astrophysical black holes
. Radiology Case Scenariosssssssssssssss
Ad

Learning From a Few Large-Scale Partial Examples: Computational Tools, Regularization, and Network Design

  • 2. This talk Learning from a single or few examples • loss functions • mitigating lack of data with regularization Networks and optimization for • large scare inputs-outputs Applications • Video segmentation • seismic interpretation, aquifer mapping, mineral prospectively
  • 3. Seismic imaging Time domain LS-RTM Model%separation Data%separation 0 1000 2000 3000 4000 Lateral [m] 0 500 1000 1500 2000 Depth[m] -1 -0.5 0 0.5 1 km/s 0 1000 2000 3000 4000 Lateral [m] 0 500 1000 1500 2000 Depth[m] 1.5 2 2.5 3 3.5 4 4.5 5 km/s 0 1000 2000 3000 4000 Lateral [m] 0 500 1000 1500 2000 Depth[m] 1.5 2 2.5 3 3.5 4 4.5 5 km/s = + True model Smooth background model Model perturbation mt ms m dD ime[m] 0 0.5 1 1.5 2 0 50 100 Ds ime[m] 0 0.5 1 1.5 2 0 50 100 Dt ime[m] 0 0.5 1 1.5 2 0 50 100 = +
  • 10. Opportunities all data recorded at training time for some applications only the labels are unknown
  • 11. Goals Design • networks • loss-functions • network-regularization to avoid the need for • large data volumes & storage/access issues • many labeled pixels or fully annotated images • GPU numbers/training time
  • 14. Partial Loss-Functions Example: for non-linear regression type problems: l(f(y, ✓), c) = NX i=1 |f(y, ✓)i ci| . <latexit sha1_base64="xlLzo4reQHU9IrqrTS8+bm7njP8=">AAADBHicbZHLbtQwFIY94VbCpVO6ZGOoKk2lMkpgQTeVKrGBTTWjdjpTjYfIdpwZq04c2U6rKM2WPS/AE7DjsuQ9YAsPgjOTRLTlSJZ/fef8OvY5JBVcG8/72XFu3b5z997afffBw0eP17sbT060zBRlIyqFVBOCNRM8YSPDjWCTVDEcE8HG5OxNlR+fM6W5TI5NnrJZjOcJjzjFxqKgeyh6UQ+RfBcdmwUzeGcXEboD9yHSWRwUfN8v3x9CJFhkLuHVyoDDF9BW2xspPl+Yy74bdLe8vrcMeFP4tdg6GKx/fLa9+X0QbHQ+oVDSLGaJoQJrPfW91MwKrAyngpUuyjRLMT3Dcza1MsEx07Ni+fESblsSwkgqexIDl/RfR4FjrfOY2MoYm4W+nqvg/3LTzER7s4InaWZYQleNokxAI2E1RRhyxagRuRWYKm7fCukCK0yNnbWLQhbVUyoQkSKs3iAFWpKyTo8LVPUlERzXiFy06KJBeYvyxnjaotMGHbXoqDHqFukG0RbRxjhp0aRBwxYNS9etNupf399NcfKy77/qe0O72ndgFWvgKXgOesAHr8EBeAsGYAQo+Ap+gd/gj/PB+ex8cb6tSp1O7dkEV8L58RchFfXb</latexit> Neural network `1<latexit sha1_base64="qaOsBLyU0uGok6bykghRg/MN0o0=">AAAB7XicbVDLSgNBEOz1GeMr6tHLYBA8hV0V9Bj04jGCeUCyhNlJbzJmdmaZmRVCyD948aCIV//Hm3/jJNmDJhY0FFXddHdFqeDG+v63t7K6tr6xWdgqbu/s7u2XDg4bRmWaYZ0poXQrogYFl1i33ApspRppEglsRsPbqd98Qm24kg92lGKY0L7kMWfUOqnRQSG6QbdU9iv+DGSZBDkpQ45at/TV6SmWJSgtE9SYduCnNhxTbTkTOCl2MoMpZUPax7ajkiZowvHs2gk5dUqPxEq7kpbM1N8TY5oYM0oi15lQOzCL3lT8z2tnNr4Ox1ymmUXJ5oviTBCryPR10uMamRUjRyjT3N1K2IBqyqwLqOhCCBZfXiaN80pwUfHvL8vVmzyOAhzDCZxBAFdQhTuoQR0YPMIzvMKbp7wX7937mLeuePnMEfyB9/kDNcaO4Q==</latexit>
  • 15. Partial Loss-Functions Example: for non-linear regression type problems: l(f(y, ✓), c) = NX i=1 |f(y, ✓)i ci| . <latexit sha1_base64="xlLzo4reQHU9IrqrTS8+bm7njP8=">AAADBHicbZHLbtQwFIY94VbCpVO6ZGOoKk2lMkpgQTeVKrGBTTWjdjpTjYfIdpwZq04c2U6rKM2WPS/AE7DjsuQ9YAsPgjOTRLTlSJZ/fef8OvY5JBVcG8/72XFu3b5z997afffBw0eP17sbT060zBRlIyqFVBOCNRM8YSPDjWCTVDEcE8HG5OxNlR+fM6W5TI5NnrJZjOcJjzjFxqKgeyh6UQ+RfBcdmwUzeGcXEboD9yHSWRwUfN8v3x9CJFhkLuHVyoDDF9BW2xspPl+Yy74bdLe8vrcMeFP4tdg6GKx/fLa9+X0QbHQ+oVDSLGaJoQJrPfW91MwKrAyngpUuyjRLMT3Dcza1MsEx07Ni+fESblsSwkgqexIDl/RfR4FjrfOY2MoYm4W+nqvg/3LTzER7s4InaWZYQleNokxAI2E1RRhyxagRuRWYKm7fCukCK0yNnbWLQhbVUyoQkSKs3iAFWpKyTo8LVPUlERzXiFy06KJBeYvyxnjaotMGHbXoqDHqFukG0RbRxjhp0aRBwxYNS9etNupf399NcfKy77/qe0O72ndgFWvgKXgOesAHr8EBeAsGYAQo+Ap+gd/gj/PB+ex8cb6tSp1O7dkEV8L58RchFfXb</latexit> `1<latexit sha1_base64="qaOsBLyU0uGok6bykghRg/MN0o0=">AAAB7XicbVDLSgNBEOz1GeMr6tHLYBA8hV0V9Bj04jGCeUCyhNlJbzJmdmaZmRVCyD948aCIV//Hm3/jJNmDJhY0FFXddHdFqeDG+v63t7K6tr6xWdgqbu/s7u2XDg4bRmWaYZ0poXQrogYFl1i33ApspRppEglsRsPbqd98Qm24kg92lGKY0L7kMWfUOqnRQSG6QbdU9iv+DGSZBDkpQ45at/TV6SmWJSgtE9SYduCnNhxTbTkTOCl2MoMpZUPax7ajkiZowvHs2gk5dUqPxEq7kpbM1N8TY5oYM0oi15lQOzCL3lT8z2tnNr4Ox1ymmUXJ5oviTBCryPR10uMamRUjRyjT3N1K2IBqyqwLqOhCCBZfXiaN80pwUfHvL8vVmzyOAhzDCZxBAFdQhTuoQR0YPMIzvMKbp7wX7937mLeuePnMEfyB9/kDNcaO4Q==</latexit> Network parameters: convolutional kernels
  • 16. Partial Loss-Functions Example: for non-linear regression type problems: l(f(y, ✓), c) = NX i=1 |f(y, ✓)i ci| . <latexit sha1_base64="xlLzo4reQHU9IrqrTS8+bm7njP8=">AAADBHicbZHLbtQwFIY94VbCpVO6ZGOoKk2lMkpgQTeVKrGBTTWjdjpTjYfIdpwZq04c2U6rKM2WPS/AE7DjsuQ9YAsPgjOTRLTlSJZ/fef8OvY5JBVcG8/72XFu3b5z997afffBw0eP17sbT060zBRlIyqFVBOCNRM8YSPDjWCTVDEcE8HG5OxNlR+fM6W5TI5NnrJZjOcJjzjFxqKgeyh6UQ+RfBcdmwUzeGcXEboD9yHSWRwUfN8v3x9CJFhkLuHVyoDDF9BW2xspPl+Yy74bdLe8vrcMeFP4tdg6GKx/fLa9+X0QbHQ+oVDSLGaJoQJrPfW91MwKrAyngpUuyjRLMT3Dcza1MsEx07Ni+fESblsSwkgqexIDl/RfR4FjrfOY2MoYm4W+nqvg/3LTzER7s4InaWZYQleNokxAI2E1RRhyxagRuRWYKm7fCukCK0yNnbWLQhbVUyoQkSKs3iAFWpKyTo8LVPUlERzXiFy06KJBeYvyxnjaotMGHbXoqDHqFukG0RbRxjhp0aRBwxYNS9etNupf399NcfKy77/qe0O72ndgFWvgKXgOesAHr8EBeAsGYAQo+Ap+gd/gj/PB+ex8cb6tSp1O7dkEV8L58RchFfXb</latexit> `1<latexit sha1_base64="qaOsBLyU0uGok6bykghRg/MN0o0=">AAAB7XicbVDLSgNBEOz1GeMr6tHLYBA8hV0V9Bj04jGCeUCyhNlJbzJmdmaZmRVCyD948aCIV//Hm3/jJNmDJhY0FFXddHdFqeDG+v63t7K6tr6xWdgqbu/s7u2XDg4bRmWaYZ0poXQrogYFl1i33ApspRppEglsRsPbqd98Qm24kg92lGKY0L7kMWfUOqnRQSG6QbdU9iv+DGSZBDkpQ45at/TV6SmWJSgtE9SYduCnNhxTbTkTOCl2MoMpZUPax7ajkiZowvHs2gk5dUqPxEq7kpbM1N8TY5oYM0oi15lQOzCL3lT8z2tnNr4Ox1ymmUXJ5oviTBCryPR10uMamRUjRyjT3N1K2IBqyqwLqOhCCBZfXiaN80pwUfHvL8vVmzyOAhzDCZxBAFdQhTuoQR0YPMIzvMKbp7wX7937mLeuePnMEfyB9/kDNcaO4Q==</latexit> Vectorized input image
  • 17. Partial Loss-Functions Example: for non-linear regression type problems: l(f(y, ✓), c) = NX i=1 |f(y, ✓)i ci| . <latexit sha1_base64="xlLzo4reQHU9IrqrTS8+bm7njP8=">AAADBHicbZHLbtQwFIY94VbCpVO6ZGOoKk2lMkpgQTeVKrGBTTWjdjpTjYfIdpwZq04c2U6rKM2WPS/AE7DjsuQ9YAsPgjOTRLTlSJZ/fef8OvY5JBVcG8/72XFu3b5z997afffBw0eP17sbT060zBRlIyqFVBOCNRM8YSPDjWCTVDEcE8HG5OxNlR+fM6W5TI5NnrJZjOcJjzjFxqKgeyh6UQ+RfBcdmwUzeGcXEboD9yHSWRwUfN8v3x9CJFhkLuHVyoDDF9BW2xspPl+Yy74bdLe8vrcMeFP4tdg6GKx/fLa9+X0QbHQ+oVDSLGaJoQJrPfW91MwKrAyngpUuyjRLMT3Dcza1MsEx07Ni+fESblsSwkgqexIDl/RfR4FjrfOY2MoYm4W+nqvg/3LTzER7s4InaWZYQleNokxAI2E1RRhyxagRuRWYKm7fCukCK0yNnbWLQhbVUyoQkSKs3iAFWpKyTo8LVPUlERzXiFy06KJBeYvyxnjaotMGHbXoqDHqFukG0RbRxjhp0aRBwxYNS9etNupf399NcfKy77/qe0O72ndgFWvgKXgOesAHr8EBeAsGYAQo+Ap+gd/gj/PB+ex8cb6tSp1O7dkEV8L58RchFfXb</latexit> `1<latexit sha1_base64="qaOsBLyU0uGok6bykghRg/MN0o0=">AAAB7XicbVDLSgNBEOz1GeMr6tHLYBA8hV0V9Bj04jGCeUCyhNlJbzJmdmaZmRVCyD948aCIV//Hm3/jJNmDJhY0FFXddHdFqeDG+v63t7K6tr6xWdgqbu/s7u2XDg4bRmWaYZ0poXQrogYFl1i33ApspRppEglsRsPbqd98Qm24kg92lGKY0L7kMWfUOqnRQSG6QbdU9iv+DGSZBDkpQ45at/TV6SmWJSgtE9SYduCnNhxTbTkTOCl2MoMpZUPax7ajkiZowvHs2gk5dUqPxEq7kpbM1N8TY5oYM0oi15lQOzCL3lT8z2tnNr4Ox1ymmUXJ5oviTBCryPR10uMamRUjRyjT3N1K2IBqyqwLqOhCCBZfXiaN80pwUfHvL8vVmzyOAhzDCZxBAFdQhTuoQR0YPMIzvMKbp7wX7937mLeuePnMEfyB9/kDNcaO4Q==</latexit> Vectorized Label Image
  • 18. Partial Loss-Functions Example: for non-linear regression type problems: Want to use sparse labels directly -> partial loss-function: Related, but different from point-annotations l(f(y, ✓), c) = NX i=1 |f(y, ✓)i ci| . <latexit sha1_base64="xlLzo4reQHU9IrqrTS8+bm7njP8=">AAADBHicbZHLbtQwFIY94VbCpVO6ZGOoKk2lMkpgQTeVKrGBTTWjdjpTjYfIdpwZq04c2U6rKM2WPS/AE7DjsuQ9YAsPgjOTRLTlSJZ/fef8OvY5JBVcG8/72XFu3b5z997afffBw0eP17sbT060zBRlIyqFVBOCNRM8YSPDjWCTVDEcE8HG5OxNlR+fM6W5TI5NnrJZjOcJjzjFxqKgeyh6UQ+RfBcdmwUzeGcXEboD9yHSWRwUfN8v3x9CJFhkLuHVyoDDF9BW2xspPl+Yy74bdLe8vrcMeFP4tdg6GKx/fLa9+X0QbHQ+oVDSLGaJoQJrPfW91MwKrAyngpUuyjRLMT3Dcza1MsEx07Ni+fESblsSwkgqexIDl/RfR4FjrfOY2MoYm4W+nqvg/3LTzER7s4InaWZYQleNokxAI2E1RRhyxagRuRWYKm7fCukCK0yNnbWLQhbVUyoQkSKs3iAFWpKyTo8LVPUlERzXiFy06KJBeYvyxnjaotMGHbXoqDHqFukG0RbRxjhp0aRBwxYNS9etNupf399NcfKy77/qe0O72ndgFWvgKXgOesAHr8EBeAsGYAQo+Ap+gd/gj/PB+ex8cb6tSp1O7dkEV8L58RchFfXb</latexit> `1<latexit sha1_base64="qaOsBLyU0uGok6bykghRg/MN0o0=">AAAB7XicbVDLSgNBEOz1GeMr6tHLYBA8hV0V9Bj04jGCeUCyhNlJbzJmdmaZmRVCyD948aCIV//Hm3/jJNmDJhY0FFXddHdFqeDG+v63t7K6tr6xWdgqbu/s7u2XDg4bRmWaYZ0poXQrogYFl1i33ApspRppEglsRsPbqd98Qm24kg92lGKY0L7kMWfUOqnRQSG6QbdU9iv+DGSZBDkpQ45at/TV6SmWJSgtE9SYduCnNhxTbTkTOCl2MoMpZUPax7ajkiZowvHs2gk5dUqPxEq7kpbM1N8TY5oYM0oi15lQOzCL3lT8z2tnNr4Ox1ymmUXJ5oviTBCryPR10uMamRUjRyjT3N1K2IBqyqwLqOhCCBZfXiaN80pwUfHvL8vVmzyOAhzDCZxBAFdQhTuoQR0YPMIzvMKbp7wX7937mLeuePnMEfyB9/kDNcaO4Q==</latexit> l⌦(f(y, ✓), c⌦) = X i2⌦ |f(y, ✓)i ci| . <latexit sha1_base64="SB5F5Jd6/J3GaL63CS18bq2VEqk=">AAADGXicbZFPa9RAGMZn478a/3SrRy+Di7CFuiRV0ItQ8KInu7Tb3bKzhJnJZHfoZBJm3lhCmk8i+F28qVdPfgqvejPZTYJtfWHg4fe8D+/wvixV0oLn/ew5N27eun1n66577/6Dh9v9nUcnNskMFxOeqMTMGLVCSS0mIEGJWWoEjZkSU3b2tvanH4WxMtHHkKdiEdOllpHkFCoU9LkKyIdYLOkwGhKW75FjWAmgu3uE8cbZxW8wsVkcFBITqfGGlpgoEcEFvpwLJH6O62zVa+RyBRcjN+gPvJG3Lnxd+I0YoKYOg53eZxImPIuFBq6otXPfS2FRUAOSK1G6JLMipfyMLsW8kprGwi6K9TZK/KwiIY4SUz0NeE3/TRQ0tjaPWdUZU1jZq14N/+fNM4heLwqp0wyE5ptBUaYwJLheLQ6lERxUXgnKjaz+ivmKGsqhOoBLQhE1WyoIS1RY/yFRZE3Kxp4WpJ7LIjxtEDvv0HmL8g7lbfC0Q6ctOurQURu0HbIt4h3ibXDWoVmLxh0al65bX9S/er/r4mR/5L8Y7Y9fDg7eN7fdQk/QUzREPnqFDtA7dIgmiKNv6Bf6jf44n5wvzlfn+6bV6TWZx+hSOT/+AmzM/Fo=</latexit> [Bearman et al., 2016]
  • 19. Partial Loss-Functions Example: for non-linear regression type problems: 1) compute full forward-pass using full data 2) compute misfit & grad from subsampled l(f(y, ✓), c) = NX i=1 |f(y, ✓)i ci| . <latexit sha1_base64="xlLzo4reQHU9IrqrTS8+bm7njP8=">AAADBHicbZHLbtQwFIY94VbCpVO6ZGOoKk2lMkpgQTeVKrGBTTWjdjpTjYfIdpwZq04c2U6rKM2WPS/AE7DjsuQ9YAsPgjOTRLTlSJZ/fef8OvY5JBVcG8/72XFu3b5z997afffBw0eP17sbT060zBRlIyqFVBOCNRM8YSPDjWCTVDEcE8HG5OxNlR+fM6W5TI5NnrJZjOcJjzjFxqKgeyh6UQ+RfBcdmwUzeGcXEboD9yHSWRwUfN8v3x9CJFhkLuHVyoDDF9BW2xspPl+Yy74bdLe8vrcMeFP4tdg6GKx/fLa9+X0QbHQ+oVDSLGaJoQJrPfW91MwKrAyngpUuyjRLMT3Dcza1MsEx07Ni+fESblsSwkgqexIDl/RfR4FjrfOY2MoYm4W+nqvg/3LTzER7s4InaWZYQleNokxAI2E1RRhyxagRuRWYKm7fCukCK0yNnbWLQhbVUyoQkSKs3iAFWpKyTo8LVPUlERzXiFy06KJBeYvyxnjaotMGHbXoqDHqFukG0RbRxjhp0aRBwxYNS9etNupf399NcfKy77/qe0O72ndgFWvgKXgOesAHr8EBeAsGYAQo+Ap+gd/gj/PB+ex8cb6tSp1O7dkEV8L58RchFfXb</latexit> `1<latexit sha1_base64="qaOsBLyU0uGok6bykghRg/MN0o0=">AAAB7XicbVDLSgNBEOz1GeMr6tHLYBA8hV0V9Bj04jGCeUCyhNlJbzJmdmaZmRVCyD948aCIV//Hm3/jJNmDJhY0FFXddHdFqeDG+v63t7K6tr6xWdgqbu/s7u2XDg4bRmWaYZ0poXQrogYFl1i33ApspRppEglsRsPbqd98Qm24kg92lGKY0L7kMWfUOqnRQSG6QbdU9iv+DGSZBDkpQ45at/TV6SmWJSgtE9SYduCnNhxTbTkTOCl2MoMpZUPax7ajkiZowvHs2gk5dUqPxEq7kpbM1N8TY5oYM0oi15lQOzCL3lT8z2tnNr4Ox1ymmUXJ5oviTBCryPR10uMamRUjRyjT3N1K2IBqyqwLqOhCCBZfXiaN80pwUfHvL8vVmzyOAhzDCZxBAFdQhTuoQR0YPMIzvMKbp7wX7937mLeuePnMEfyB9/kDNcaO4Q==</latexit> f(y, ✓)<latexit sha1_base64="kJHO6G9+4Z69IfUiML9FEDGsEJ0=">AAACxXicbZFNa9tAEIbX6kdS9SNOe+xliSmktBgpPTRH0x7aY0zi2MFrzO5qFC9ZacXuKEEI0z/RW6Gn9kfl31SyJdEmHVh4ed59mWFGZFo5DILbnvfg4aPHO7tP/KfPnr/Y6++/PHcmtxIm0mhjZ4I70CqFCSrUMMss8ERomIqrz7U/vQbrlEnPsMhgkfDLVMVKcqzQsr8XHzJRvGdnuALkb/1lfxAMg03R+yJsxGB0wN59vx0VJ8v93k8WGZknkKLU3Ll5GGS4KLlFJTWsfZY7yLi84pcwr2TKE3CLcjP5mr6pSERjY6uXIt3QvxMlT5wrElH9TDiu3F2vhv/z5jnGx4tSpVmOkMptozjXFA2t10AjZUGiLirBpVXVrFSuuOUSq2X5LIK4WUnJhNFRPYPRbEPWjT0tWd1XxHTaIHHToZsWFR0q2uBFhy5adNqh0zboOuRaJDsk2+CsQ7MWjTs0Xvt+fdHw7v3ui/OjYfhhGIzDwegT2dYueU0OyCEJyUcyIl/JCZkQSXLyg/wiv70vXuKhd7396vWazCvyT3nf/gDWud8Y</latexit> f(y, ✓)i, i 2 ⌦<latexit sha1_base64="ZJZndQ4xSHJI1Atq/jvaODmI0Mk=">AAAC2nicbZFNb9NAEIY35qPFfDSFI5dVA1IRVWTDoainCC7caNSmSZWNot31OFl1vWvtrqksKxduwLVnfgNX+Cf9N9iJbUHLSCu9emZezewMS6WwLgiuO96du/fub20/8B8+evxkp7v79MzqzHAYcS21mTBqQQoFIyechElqgCZMwphdfKjy489grNDq1OUpzBK6UCIWnLoSzbsv4n3C8gNy6pbg6Ku5OMDkiBxhgYlQmHxKYEH9ebcX9IN14NsirEVvsEdeX10P8uP5bucHiTTPElCOS2rtNAxSNyuocYJLWPkks5BSfkEXMC2lognYWbH+zgq/LEmEY23Kpxxe078dBU2szRNWVibULe3NXAX/l5tmLn43K4RKMweKbxrFmcRO42o3OBIGuJN5KSg3opwV8yU1lLtygz6JIK73VBCmZVTNoCVZk1WdHhek6stiPK4Ru2zRZYPyFuWN8bxF5w06adFJY7Qtsg3iLeKNcdKiSYOGLRqufL+6aHjzfrfF2Zt++LYfDMPe4D3axDZ6jvbQPgrRIRqgj+gYjRBH39BP9Av99oj3xfvqfd+Uep3a8wz9E97VH7Ag5ms=</latexit>
  • 20. Partial Loss-Functions Example: for non-linear regression type problems: 1) compute full forward-pass using full data 2) compute misfit & grad from subsampled l(f(y, ✓), c) = NX i=1 |f(y, ✓)i ci| . <latexit sha1_base64="xlLzo4reQHU9IrqrTS8+bm7njP8=">AAADBHicbZHLbtQwFIY94VbCpVO6ZGOoKk2lMkpgQTeVKrGBTTWjdjpTjYfIdpwZq04c2U6rKM2WPS/AE7DjsuQ9YAsPgjOTRLTlSJZ/fef8OvY5JBVcG8/72XFu3b5z997afffBw0eP17sbT060zBRlIyqFVBOCNRM8YSPDjWCTVDEcE8HG5OxNlR+fM6W5TI5NnrJZjOcJjzjFxqKgeyh6UQ+RfBcdmwUzeGcXEboD9yHSWRwUfN8v3x9CJFhkLuHVyoDDF9BW2xspPl+Yy74bdLe8vrcMeFP4tdg6GKx/fLa9+X0QbHQ+oVDSLGaJoQJrPfW91MwKrAyngpUuyjRLMT3Dcza1MsEx07Ni+fESblsSwkgqexIDl/RfR4FjrfOY2MoYm4W+nqvg/3LTzER7s4InaWZYQleNokxAI2E1RRhyxagRuRWYKm7fCukCK0yNnbWLQhbVUyoQkSKs3iAFWpKyTo8LVPUlERzXiFy06KJBeYvyxnjaotMGHbXoqDHqFukG0RbRxjhp0aRBwxYNS9etNupf399NcfKy77/qe0O72ndgFWvgKXgOesAHr8EBeAsGYAQo+Ap+gd/gj/PB+ex8cb6tSp1O7dkEV8L58RchFfXb</latexit> `1<latexit sha1_base64="qaOsBLyU0uGok6bykghRg/MN0o0=">AAAB7XicbVDLSgNBEOz1GeMr6tHLYBA8hV0V9Bj04jGCeUCyhNlJbzJmdmaZmRVCyD948aCIV//Hm3/jJNmDJhY0FFXddHdFqeDG+v63t7K6tr6xWdgqbu/s7u2XDg4bRmWaYZ0poXQrogYFl1i33ApspRppEglsRsPbqd98Qm24kg92lGKY0L7kMWfUOqnRQSG6QbdU9iv+DGSZBDkpQ45at/TV6SmWJSgtE9SYduCnNhxTbTkTOCl2MoMpZUPax7ajkiZowvHs2gk5dUqPxEq7kpbM1N8TY5oYM0oi15lQOzCL3lT8z2tnNr4Ox1ymmUXJ5oviTBCryPR10uMamRUjRyjT3N1K2IBqyqwLqOhCCBZfXiaN80pwUfHvL8vVmzyOAhzDCZxBAFdQhTuoQR0YPMIzvMKbp7wX7937mLeuePnMEfyB9/kDNcaO4Q==</latexit> f(y, ✓)<latexit sha1_base64="kJHO6G9+4Z69IfUiML9FEDGsEJ0=">AAACxXicbZFNa9tAEIbX6kdS9SNOe+xliSmktBgpPTRH0x7aY0zi2MFrzO5qFC9ZacXuKEEI0z/RW6Gn9kfl31SyJdEmHVh4ed59mWFGZFo5DILbnvfg4aPHO7tP/KfPnr/Y6++/PHcmtxIm0mhjZ4I70CqFCSrUMMss8ERomIqrz7U/vQbrlEnPsMhgkfDLVMVKcqzQsr8XHzJRvGdnuALkb/1lfxAMg03R+yJsxGB0wN59vx0VJ8v93k8WGZknkKLU3Ll5GGS4KLlFJTWsfZY7yLi84pcwr2TKE3CLcjP5mr6pSERjY6uXIt3QvxMlT5wrElH9TDiu3F2vhv/z5jnGx4tSpVmOkMptozjXFA2t10AjZUGiLirBpVXVrFSuuOUSq2X5LIK4WUnJhNFRPYPRbEPWjT0tWd1XxHTaIHHToZsWFR0q2uBFhy5adNqh0zboOuRaJDsk2+CsQ7MWjTs0Xvt+fdHw7v3ui/OjYfhhGIzDwegT2dYueU0OyCEJyUcyIl/JCZkQSXLyg/wiv70vXuKhd7396vWazCvyT3nf/gDWud8Y</latexit> f(y, ✓)i, i 2 ⌦<latexit sha1_base64="ZJZndQ4xSHJI1Atq/jvaODmI0Mk=">AAAC2nicbZFNb9NAEIY35qPFfDSFI5dVA1IRVWTDoainCC7caNSmSZWNot31OFl1vWvtrqksKxduwLVnfgNX+Cf9N9iJbUHLSCu9emZezewMS6WwLgiuO96du/fub20/8B8+evxkp7v79MzqzHAYcS21mTBqQQoFIyechElqgCZMwphdfKjy489grNDq1OUpzBK6UCIWnLoSzbsv4n3C8gNy6pbg6Ku5OMDkiBxhgYlQmHxKYEH9ebcX9IN14NsirEVvsEdeX10P8uP5bucHiTTPElCOS2rtNAxSNyuocYJLWPkks5BSfkEXMC2lognYWbH+zgq/LEmEY23Kpxxe078dBU2szRNWVibULe3NXAX/l5tmLn43K4RKMweKbxrFmcRO42o3OBIGuJN5KSg3opwV8yU1lLtygz6JIK73VBCmZVTNoCVZk1WdHhek6stiPK4Ru2zRZYPyFuWN8bxF5w06adFJY7Qtsg3iLeKNcdKiSYOGLRqufL+6aHjzfrfF2Zt++LYfDMPe4D3axDZ6jvbQPgrRIRqgj+gYjRBH39BP9Av99oj3xfvqfd+Uep3a8wz9E97VH7Ag5ms=</latexit> Can train (randomized) a network on single image using SGD and partial loss
  • 21. Partial Loss-Functions 1. compute full forward-pass using full data 2. compute misfit & grad from subsampled Same procedure as in seismic full-waveform inversion: 1. Forward propagate wavefield from source 2. Sample wavefield at receivers and compute misfit & gradient f(y, ✓)<latexit sha1_base64="kJHO6G9+4Z69IfUiML9FEDGsEJ0=">AAACxXicbZFNa9tAEIbX6kdS9SNOe+xliSmktBgpPTRH0x7aY0zi2MFrzO5qFC9ZacXuKEEI0z/RW6Gn9kfl31SyJdEmHVh4ed59mWFGZFo5DILbnvfg4aPHO7tP/KfPnr/Y6++/PHcmtxIm0mhjZ4I70CqFCSrUMMss8ERomIqrz7U/vQbrlEnPsMhgkfDLVMVKcqzQsr8XHzJRvGdnuALkb/1lfxAMg03R+yJsxGB0wN59vx0VJ8v93k8WGZknkKLU3Ll5GGS4KLlFJTWsfZY7yLi84pcwr2TKE3CLcjP5mr6pSERjY6uXIt3QvxMlT5wrElH9TDiu3F2vhv/z5jnGx4tSpVmOkMptozjXFA2t10AjZUGiLirBpVXVrFSuuOUSq2X5LIK4WUnJhNFRPYPRbEPWjT0tWd1XxHTaIHHToZsWFR0q2uBFhy5adNqh0zboOuRaJDsk2+CsQ7MWjTs0Xvt+fdHw7v3ui/OjYfhhGIzDwegT2dYueU0OyCEJyUcyIl/JCZkQSXLyg/wiv70vXuKhd7396vWazCvyT3nf/gDWud8Y</latexit> f(y, ✓)i, i 2 ⌦<latexit sha1_base64="ZJZndQ4xSHJI1Atq/jvaODmI0Mk=">AAAC2nicbZFNb9NAEIY35qPFfDSFI5dVA1IRVWTDoainCC7caNSmSZWNot31OFl1vWvtrqksKxduwLVnfgNX+Cf9N9iJbUHLSCu9emZezewMS6WwLgiuO96du/fub20/8B8+evxkp7v79MzqzHAYcS21mTBqQQoFIyechElqgCZMwphdfKjy489grNDq1OUpzBK6UCIWnLoSzbsv4n3C8gNy6pbg6Ku5OMDkiBxhgYlQmHxKYEH9ebcX9IN14NsirEVvsEdeX10P8uP5bucHiTTPElCOS2rtNAxSNyuocYJLWPkks5BSfkEXMC2lognYWbH+zgq/LEmEY23Kpxxe078dBU2szRNWVibULe3NXAX/l5tmLn43K4RKMweKbxrFmcRO42o3OBIGuJN5KSg3opwV8yU1lLtygz6JIK73VBCmZVTNoCVZk1WdHhek6stiPK4Ru2zRZYPyFuWN8bxF5w06adFJY7Qtsg3iLeKNcdKiSYOGLRqufL+6aHjzfrfF2Zt++LYfDMPe4D3axDZ6jvbQPgrRIRqgj+gYjRBH39BP9Av99oj3xfvqfd+Uep3a8wz9E97VH7Ag5ms=</latexit>
  • 25. Example: label interpolation Probability map for one of the classes Thresholded: argmax class per pixel
  • 27. Network output regularization What if we only have very few known labels? • standard approach: data augmentation • alternatively: regularization of network output Any network for semantic segmentation: f(✓, y) : RN ! RN⇥nclass <latexit sha1_base64="iU3v/Sa32WgU3CLbkCkO3RSck7c=">AAADGHicbZFPb9MwGMbd8G+Efx0cuVhMSJuEqmQcQJwm9QIXWNm6dmpKZTtOa82JI/sNJYoi8Tn4Fhy4c0NcuSGu8D1w0iQaG68U6dHv8WO/eV+aSmHA8372nCtXr12/sXXTvXX7zt17/e37J0ZlmvExU1LpKSWGS5HwMQiQfJpqTmIq+YSeDSt/8p5rI1RyDHnK5zFZJiISjIBFi/6baDc4hhUH8iSg+R5+gYOYwIrS4m357jUOtFiugGit1ueNwjogYm5wsgiAf4CCSWJMWbqL/o438OrCl4XfiJ2DvY+oqsPFdu9zECqWxTyB+pKZ76UwL4gGwSQv3SAzPCXsjCz5zMqE2GfnRf3nJX5sSYgjpe2XAK7p+URBYmPymNqTVffmolfB/3mzDKLn80IkaQY8YZuHokxiULgaIw6F5gxkbgVhWtheMVsRTRjYYbtByKNmrEVAlQyrHpQMalI29rDYTDTCwxZNOjRpEF13aN2ivEN5Gzzt0GmLjjp01AZNh0yLWIdYG5x2aNqiUYdGpetWS/YvrvSyONkf+E8H3shu+xXa1BZ6iB6hXeSjZ+gAvUSHaIwY+oJ+od/oj/PJ+ep8c75vjjq9JvMA/VPOj7+U8v8s</latexit>
  • 28. Network output regularization What if we only have very few known labels? • standard approach: data augmentation • alternatively: regularization of network output Add prior knowledge via a penalty function (per class): L(y, ✓, C) = l(y, ✓, C) + ↵ nclassX j=1 r(f(✓, y)j) <latexit sha1_base64="uEdpajUqLxgvsDWDscpch5lHP8M=">AAADLXicbdHNb9MwFABwN3yN8LEOjlwsKqRWQJUAElwmTfTCgcOqrWunukS246zeHCeyHUYU5W9C/BvcOSAhrly5wokkSyxYsRTl6ff8bOs9kgqujed97TlXrl67fmPrpnvr9p272/2de0c6yRRlM5qIRC0I1kxwyWaGG8EWqWI4JoLNydmkzs/fM6V5Ig9NnrJVjE8kjzjFpqKgH74dIpI/QYdmzQyu/pMRhLvwKRSb/hgiLNI1hkhncVCc7vrlu0IGyLAPpqACa12WUA2joa0i+Sg4HblBf+CNvWbBzcBvgwFo136w0/uEwoRmMZOmOXjpe6lZFVgZTgUrXZRplmJ6hk/YsgoljpleFU07SviokhBGiao+aWCjf1cUONY6j0m1M8ZmrS/navxfbpmZ6NWq4DLNDJP04qIoE9AksO4tDLli1Ii8CjBVvHorpGusMDXVBFwUsqjtTIFIIsL6DYlAjZRtelKg+l4SwUlHc0vzlsi5pfOOckt5V3hs6bijA0sHXaG2pDuilmhXuLC06GhqaVq6bj1k//JIN4OjZ2P/+dibvhjsvW7HvQUegIdgCHzwEuyBN2AfzAAFn8FP8Av8dj46X5xvzveLrU6vrbkP/lnOjz9TuAMv</latexit> Total loss
  • 29. Network output regularization What if we only have very few known labels? • standard approach: data augmentation • alternatively: regularization of network output Add prior knowledge via a penalty function (per class): L(y, ✓, C) = l(y, ✓, C) + ↵ nclassX j=1 r(f(✓, y)j) <latexit sha1_base64="uEdpajUqLxgvsDWDscpch5lHP8M=">AAADLXicbdHNb9MwFABwN3yN8LEOjlwsKqRWQJUAElwmTfTCgcOqrWunukS246zeHCeyHUYU5W9C/BvcOSAhrly5wokkSyxYsRTl6ff8bOs9kgqujed97TlXrl67fmPrpnvr9p272/2de0c6yRRlM5qIRC0I1kxwyWaGG8EWqWI4JoLNydmkzs/fM6V5Ig9NnrJVjE8kjzjFpqKgH74dIpI/QYdmzQyu/pMRhLvwKRSb/hgiLNI1hkhncVCc7vrlu0IGyLAPpqACa12WUA2joa0i+Sg4HblBf+CNvWbBzcBvgwFo136w0/uEwoRmMZOmOXjpe6lZFVgZTgUrXZRplmJ6hk/YsgoljpleFU07SviokhBGiao+aWCjf1cUONY6j0m1M8ZmrS/navxfbpmZ6NWq4DLNDJP04qIoE9AksO4tDLli1Ii8CjBVvHorpGusMDXVBFwUsqjtTIFIIsL6DYlAjZRtelKg+l4SwUlHc0vzlsi5pfOOckt5V3hs6bijA0sHXaG2pDuilmhXuLC06GhqaVq6bj1k//JIN4OjZ2P/+dibvhjsvW7HvQUegIdgCHzwEuyBN2AfzAAFn8FP8Av8dj46X5xvzveLrU6vrbkP/lnOjz9TuAMv</latexit> Multi-class cross-entropy
  • 30. Network output regularization What if we only have very few known labels? • standard approach: data augmentation • alternatively: regularization of network output Add prior knowledge via a penalty function (per class): L(y, ✓, C) = l(y, ✓, C) + ↵ nclassX j=1 r(f(✓, y)j) <latexit sha1_base64="uEdpajUqLxgvsDWDscpch5lHP8M=">AAADLXicbdHNb9MwFABwN3yN8LEOjlwsKqRWQJUAElwmTfTCgcOqrWunukS246zeHCeyHUYU5W9C/BvcOSAhrly5wokkSyxYsRTl6ff8bOs9kgqujed97TlXrl67fmPrpnvr9p272/2de0c6yRRlM5qIRC0I1kxwyWaGG8EWqWI4JoLNydmkzs/fM6V5Ig9NnrJVjE8kjzjFpqKgH74dIpI/QYdmzQyu/pMRhLvwKRSb/hgiLNI1hkhncVCc7vrlu0IGyLAPpqACa12WUA2joa0i+Sg4HblBf+CNvWbBzcBvgwFo136w0/uEwoRmMZOmOXjpe6lZFVgZTgUrXZRplmJ6hk/YsgoljpleFU07SviokhBGiao+aWCjf1cUONY6j0m1M8ZmrS/navxfbpmZ6NWq4DLNDJP04qIoE9AksO4tDLli1Ii8CjBVvHorpGusMDXVBFwUsqjtTIFIIsL6DYlAjZRtelKg+l4SwUlHc0vzlsi5pfOOckt5V3hs6bijA0sHXaG2pDuilmhXuLC06GhqaVq6bj1k//JIN4OjZ2P/+dibvhjsvW7HvQUegIdgCHzwEuyBN2AfzAAFn8FP8Av8dj46X5xvzveLrU6vrbkP/lnOjz9TuAMv</latexit> Regularization term
  • 31. Network output regularization Add prior knowledge via a penalty function (per class): Following example uses quadratic smoothing function: L(y, ✓, C) = l(y, ✓, C) + ↵ nclassX j=1 r(f(✓, y)j) <latexit sha1_base64="uEdpajUqLxgvsDWDscpch5lHP8M=">AAADLXicbdHNb9MwFABwN3yN8LEOjlwsKqRWQJUAElwmTfTCgcOqrWunukS246zeHCeyHUYU5W9C/BvcOSAhrly5wokkSyxYsRTl6ff8bOs9kgqujed97TlXrl67fmPrpnvr9p272/2de0c6yRRlM5qIRC0I1kxwyWaGG8EWqWI4JoLNydmkzs/fM6V5Ig9NnrJVjE8kjzjFpqKgH74dIpI/QYdmzQyu/pMRhLvwKRSb/hgiLNI1hkhncVCc7vrlu0IGyLAPpqACa12WUA2joa0i+Sg4HblBf+CNvWbBzcBvgwFo136w0/uEwoRmMZOmOXjpe6lZFVgZTgUrXZRplmJ6hk/YsgoljpleFU07SviokhBGiao+aWCjf1cUONY6j0m1M8ZmrS/navxfbpmZ6NWq4DLNDJP04qIoE9AksO4tDLli1Ii8CjBVvHorpGusMDXVBFwUsqjtTIFIIsL6DYlAjZRtelKg+l4SwUlHc0vzlsi5pfOOckt5V3hs6bijA0sHXaG2pDuilmhXuLC06GhqaVq6bj1k//JIN4OjZ2P/+dibvhjsvW7HvQUegIdgCHzwEuyBN2AfzAAFn8FP8Av8dj46X5xvzveLrU6vrbkP/lnOjz9TuAMv</latexit> r(f(✓, y)) = 1 2 nclassX j=1 k ✓ ↵1(Iy ⌦ Dx) ↵2(Dy ⌦ Ix) ◆ f(✓, y)jk2 2 <latexit sha1_base64="un5G130BWDWLxd9PeoVMt92kwBI=">AAADfnicbZFbT9swHMUdsgvLboU98mKt2lSkrSTdJHhBYisPY3uhgtKiukSO47QG56LYGY1MPtQe9mH2bea0ScZlliz99Ts+9pGPl3AmpG3/MdbMR4+fPF1/Zj1/8fLV69bG5pmIs5TQIYl5nI49LChnER1KJjkdJynFocfpyLvql/roJ00Fi6NTmSd0GuJZxAJGsNTIbf1OO0EHnco5lfgD8vLtbQj3IQpSTJRTqF4BkchCV13uO8WFilwk6UIqwrEQhdZuIEQenbFIJSGWKVtohnkyx64DO0duDlEsWUgFPHQX+maEYK33YOfwln5U6ohG/r+L7uZyL7X15qLn9izLclttu2svF3w4ONXQBtU6djeMX8iPSRbSSC7DTxw7kVOFU8kIp4WFMkETTK7wjE70GGEdaqqW/1vAd5r4MIhTvSMJl/S2Q+FQiDz09Ekdfi7uayX8nzbJZLA3VSxKMkkjsnooyDiUMSzLgj5LKZE81wMmKdNZIZlj3Y3UlVrIp0H1RQp5MffLDDFHS1JUcl+h8l0vgP0ajRo0qpB33aDrGuUNymvjeYPOa3TSoJPaKBokakQaRGrjuEHjGg0aNChWJTv3K304nPW6zqeuPfjcPvha1b0OtsBb0AEO2AUH4Bs4BkNAjC3ji/Hd+GEC87350dxZHV0zKs8bcGeZe38BFv0aDQ==</latexit>
  • 32. Network output regularization Add prior knowledge via a penalty function (per class): Compare to weight decay / Tikhonov regularization: L(y, ✓, C) = l(y, ✓, C) + ↵ nclassX j=1 r(f(✓, y)j) <latexit sha1_base64="uEdpajUqLxgvsDWDscpch5lHP8M=">AAADLXicbdHNb9MwFABwN3yN8LEOjlwsKqRWQJUAElwmTfTCgcOqrWunukS246zeHCeyHUYU5W9C/BvcOSAhrly5wokkSyxYsRTl6ff8bOs9kgqujed97TlXrl67fmPrpnvr9p272/2de0c6yRRlM5qIRC0I1kxwyWaGG8EWqWI4JoLNydmkzs/fM6V5Ig9NnrJVjE8kjzjFpqKgH74dIpI/QYdmzQyu/pMRhLvwKRSb/hgiLNI1hkhncVCc7vrlu0IGyLAPpqACa12WUA2joa0i+Sg4HblBf+CNvWbBzcBvgwFo136w0/uEwoRmMZOmOXjpe6lZFVgZTgUrXZRplmJ6hk/YsgoljpleFU07SviokhBGiao+aWCjf1cUONY6j0m1M8ZmrS/navxfbpmZ6NWq4DLNDJP04qIoE9AksO4tDLli1Ii8CjBVvHorpGusMDXVBFwUsqjtTIFIIsL6DYlAjZRtelKg+l4SwUlHc0vzlsi5pfOOckt5V3hs6bijA0sHXaG2pDuilmhXuLC06GhqaVq6bj1k//JIN4OjZ2P/+dibvhjsvW7HvQUegIdgCHzwEuyBN2AfzAAFn8FP8Av8dj46X5xvzveLrU6vrbkP/lnOjz9TuAMv</latexit> L(y, ✓, C) = l(y, ✓, C) + ↵ nkernelsX k=1 r(✓k) <latexit sha1_base64="2FfgD08yykDRWFRkcdQSHfA+4H0=">AAADKHicbdHNb9MwFABwN3yN8NXBkYtFhdQJqJKBtF0mTfTCgcOqrWunukSO87JacZzIdhhRlH+IG38Ed25oV65c4U7SJRGsPCnK0+/5ydZ7fiq4No5z2bNu3Lx1+87WXfve/QcPH/W3H5/qJFMMpiwRiZr7VIPgEqaGGwHzVAGNfQEzPxrX9dlHUJon8sTkKSxjei55yBk1FXl97/2Q+PlLcmJWYGj1H+9gfIBfYbHpLzChIl1RTHQWe0V04JYfCukRA59MEYGSIHRZYjVsurxox/b6A2fkrANvJm6TDFATR9527wsJEpbFIA0TVOuF66RmWVBlOBNQ2iTTkFIW0XNYVKmkMehlsZ5EiZ9XEuAwUdUnDV7r3x0FjbXOY786GVOz0tdrNf6vtshMuL8suEwzA5JdXRRmApsE12PFAVfAjMirhDLFq7ditqKKMlMN3yYBhM1QCuInIqjfkAiylrIpjwtS3+uHeNzSrKNZQ/5FRxct5R3lbeNZR2ctHXd03DbqjnRLrCPWNs47mrc06WhS2na9ZPf6SjeT092R+3q0O3kzOHzbrHsLPUXP0BC5aA8donfoCE0RQ1/RT/QL/bY+W9+s79bl1VGr1/Q8Qf+E9eMPBMMBxQ==</latexit>
  • 33. Example: label interpolation Thresholded: argmax class per pixel Prediction from regularized training for one of the classes
  • 34. Networks vs PDE-constrained optimization train a network to track more than one horizon simultaneously? (2) How do networks deal with multiple horizons that merge and split? These two questions warrant a new look at the automatic horizon tracking/interpolation problem because results with Conclusions In this paper, we introduced DNNs from an inverse p point of view. We have shown that the network can be con as the forward problem and the training as the inverse pr Geophysical forward problem Geophysical inverse problem Neural network Discrete problem structure Discretized differential opera- tors in time-stepping scheme Network structure w.r.t. Yj , e.g., Yj+1 =Yj −Kj T σ KjYj + Bj( ) Model parameters Known physical parameters Unknown physical parameters Unknown convolutional kernels with unclear meaning Model parameter regularization Tikhonov regularization on physical model parameters Weight decay on kernels and biases Output/state regularization Would be equivalent to: regu- larization of final elastic/elec- tromagnetic field Tikhonov regularization on final network state (probability maps) Table 1. Overview of similarities and differences between geophysical forward/inverse modeling and neural networks for interpretation of geophysical data/imag
  • 35. Software Many geoscience problems have similar structure: • one/few large data/label images ( pix.) • small number of labeled points at sparse locations • all examples were trained on 1 GPU, < 1 hour Same algorithms and code used for • seismic interpretation • aquifer mapping • mineral prospectively mapping 1000 ⇥ 1000<latexit sha1_base64="zYa2rcoFiRHcrdp1tLqeSdl53Sw=">AAAC3nicbdE7jxMxEABgZ3lcWF45HhWNRYREFXmPgisj0lBQJLrLJac4Cl6vN7HO+8Ce5bRapaVDtLS0UPEX+Bf8G7ybrAV3jGRp9I1HM7LDXEkDhPzueDdu3rp90L3j3713/8HD3uGjM5MVmospz1Sm5yEzQslUTEGCEvNcC5aESszCi1Fdn30U2sgsPYUyF8uErVMZS87A0qr3lK7FBxwQQjAFmQjT5KtenwxIE/h6EuyT/vDJu19j3V2MV4ednzTKeJGIFLhixiwCksOyYhokV2Lr08KInPELthYLm6bMjlpWzf5b/MJKhONM25MCbvTvjoolxpRJaG8mDDbmaq3G/9UWBcTHy0qmeQEi5btBcaEwZLh+DBxJLTio0iaMa2l3xXzDNONgn8ynkYjpKWwEsIqGmYrqHTJFG9nuy6OK1nPDGI9amjma7Sm8dHTZUumobBvPHZ23dOLopG00jkxL3BFvG+eO5i1NHE22vu/bTw6ufun15OxoELwakEnQH75Bu+iiZ+g5eokC9BoN0Vs0RlPEUYW+oe/oh/fe++R99r7srnqdfc9j9E94X/8AK+/m/g==</latexit>
  • 36. Aquifer mapping Delineating New Aquifers To demonstrate the scalability and power of CGI’s new AI methodology, a number of public regional datasets, including magnetic, gravity, topography and geology were used as inputs to map out the large-scale aquifers of Australia’s Northern Territory. A map of the known aquifer extents was used to validate the results. The data were processed at CGI and used to train our proprietary VNet AI algorithm, using only 1% of the known aquifer map as training targets. The algorithm is a purpose built deep neural
  • 37. Aquifer mapping identify large regional targets, and potentially to delineate new unidentified aquifers previously hidden in complex datasets. Prospec explora product is a co geoscie informa Training on small number of point-annotations
  • 38. Mineral prospectively structural interpretation, as well as a full suite of geochemical assaying. Gold had previously been identified in a handful of locations via hand-samples and drilling intercepts, and the goal of the project was to use these examples to train our VNet to highlight new targets in the region. Sample geoscience inputs from the Auryn property The algorithm was trained using a variety of different parameters and input combinations, and the resulting gold prospectivity maps were shown Predicted prospectivity map overlaid on the simplified geological interpretation. Known mineralized locations are plotted as gold stars. property of interest has seen extensive ation work over the years and contains many g geoscientific datasets, including airborne magnetics, magnetics, geological mapping, ral interpretation, as well as a full suite of emical assaying. ad previously been identified in a handful of ns via hand-samples and drilling intercepts, he goal of the project was to use these les to train our VNet to highlight new targets region. which generates confidence in the methodolo and its usefulness for exploration targeting. Predicted prospectivity map overlaid on the simplified geological interpretation. Known Predict
• 40. Network
Y_0 = D                                  (initial state = data)
Y_j = g(Y_{j-1}, K_j),   j = 1, 2, …, n
ℓ(X, Y_n)                                (loss; X is the label)
(a NumPy sketch of this recursion follows)
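A minimal sketch of the forward recursion, assuming a symmetric layer g(Y, K) = Y − h K^T σ(K Y) in the spirit of Table 1; the function names, sizes, and step size h are illustrative choices, not the talk's exact implementation:

```python
import numpy as np

def sigma(x):
    return np.maximum(x, 0.0)  # ReLU activation (illustrative choice)

def g(Y, K, h=0.1):
    # One layer: Y - h * K^T sigma(K Y), matching the update form in Table 1
    return Y - h * K.T @ sigma(K @ Y)

def forward(D, Ks):
    # Propagate the data D through n layers; keep all states Y_0 .. Y_n
    Ys = [D]
    for K in Ks:
        Ys.append(g(Ys[-1], K))
    return Ys

rng = np.random.default_rng(0)
D = rng.normal(size=(16, 8))                      # data = initial state Y_0
Ks = [0.1 * rng.normal(size=(16, 16)) for _ in range(5)]
Yn = forward(D, Ks)[-1]                           # final state fed to the loss
```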
• 41. Network - Notation
Write the tensor-valued network state as a block vector (a sketch follows):
Y ≡ [Y^1; Y^2; … ; Y^{n_chan}] ∈ R^{n_x × n_y × n_z × n_chan}
with, as before, Y_0 = D, Y_j = g(Y_{j-1}, K_j), j = 1, 2, …, n, and loss ℓ(X, Y_n).
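A short sketch of this flattening, assuming the state is held in a NumPy array; the shape values are illustrative:

```python
import numpy as np

nx, ny, nz, nchan = 4, 4, 2, 3
Y_tensor = np.arange(nx * ny * nz * nchan, dtype=float).reshape(nx, ny, nz, nchan)

# block c is the flattened channel c; Y = [Y^1; Y^2; ...; Y^{n_chan}]
blocks = [Y_tensor[..., c].ravel() for c in range(nchan)]
Y = np.concatenate(blocks)
assert Y.shape == (nx * ny * nz * nchan,)
```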
• 42. Network - Notation
Block-Toeplitz matrix (a sketch of its assembly follows):
K ≡ [ K(θ^{1,1})           K(θ^{1,2})           …  K(θ^{1,n_chan_in})
      K(θ^{2,1})           K(θ^{2,2})           …  K(θ^{2,n_chan_in})
      ⋮                     ⋮                        ⋮
      K(θ^{n_chan_out,1})  K(θ^{n_chan_out,2})  …  K(θ^{n_chan_out,n_chan_in}) ]
[Treister et al., 2018; Ruthotto & Haber, 2018]
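A sketch of assembling K from per-channel-pair stencils, assuming 1-D convolutions with 3-point stencils and zero padding; conv_matrix, block_K, and all sizes are illustrative names and choices:

```python
import numpy as np

def conv_matrix(stencil, n):
    # n x n Toeplitz matrix applying a 3-point stencil with zero boundaries
    T = np.zeros((n, n))
    for offset, w in zip((-1, 0, 1), stencil):
        T += w * np.eye(n, k=offset)
    return T

def block_K(theta, n):
    # theta: (n_chan_out, n_chan_in, 3) array of stencils; one Toeplitz
    # block per (output channel, input channel) pair
    n_out, n_in, _ = theta.shape
    return np.block([[conv_matrix(theta[i, j], n) for j in range(n_in)]
                     for i in range(n_out)])

rng = np.random.default_rng(1)
theta = rng.normal(size=(4, 3, 3))   # 4 output channels, 3 input channels
K = block_K(theta, n=10)             # (4*10) x (3*10) block-Toeplitz matrix
Y = rng.normal(size=3 * 10)          # stacked input channels
out = K @ Y                          # stacked output channels
```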
• 43. Network
Y_0 = D,   Y_j = g(Y_{j-1}, K_j),   j = 1, 2, …, n,   ℓ(X, Y_n)
ResNet [He et al., 2015]:
g(Y_{j-1}, K_j) = Y_{j-1} − h f(K_j Y_{j-1})
Hyperbolic [Chang et al., 2018]:
g(Y_{j-1}, Y_{j-2}, K_j) = 2 Y_{j-1} − Y_{j-2} + h² f(K_j Y_{j-1})
(a sketch of both updates follows)
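Both update rules side by side as a sketch; f, σ, and h are illustrative choices, with f taken as the symmetric K^T σ(K Y):

```python
import numpy as np

def sigma(x):
    return np.tanh(x)

def f(K, Y):
    # symmetric layer function K^T sigma(K Y), as in Table 1
    return K.T @ sigma(K @ Y)

def resnet_step(Y_prev, K, h=0.1):
    # Y_j = Y_{j-1} - h f(K_j Y_{j-1}): forward Euler step, one previous state
    return Y_prev - h * f(K, Y_prev)

def hyperbolic_step(Y_prev, Y_prev2, K, h=0.1):
    # Y_j = 2 Y_{j-1} - Y_{j-2} + h^2 f(K_j Y_{j-1}): leapfrog step for a
    # second-order (telegraph-type) equation, needs two previous states
    return 2.0 * Y_prev - Y_prev2 + h**2 * f(K, Y_prev)
```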
• 44. Network - optimization
min_{ {K_j}, {Y_j} }  ℓ(X, Y_n)
s.t.  Y_n = g(Y_{n-1}, K_n)
      Y_{n-1} = g(Y_{n-2}, K_{n-1})
      ⋮
      Y_2 = g(Y_1, K_2)
      Y_1 = D
• 45. Network - optimization
min_{ {K_j}, {Y_j} }  ℓ(X, Y_n)
s.t.  Y_n = g(Y_{n-1}, K_n),  Y_{n-1} = g(Y_{n-2}, K_{n-1}),  …,  Y_2 = g(Y_1, K_2),  Y_1 = D
Lagrangian:
L({Y_i}, {P_i}, {K_i}) = ℓ(X, Y_n) − Σ_{i=2}^{n} P_i^T (Y_i − g_i(Y_{i-1}, K_i)) − P_1^T (Y_1 − D)
• 46. Network - optimization
L({Y_i}, {P_i}, {K_i}) = ℓ(X, Y_n) − Σ_{i=2}^{n} P_i^T (Y_i − g_i(Y_{i-1}, K_i)) − P_1^T (Y_1 − D)
First-order necessary optimality conditions are satisfied if the partial derivatives of the Lagrangian vanish:
∇_{Y_n} L = ∇_{Y_n} ℓ(X, Y_n) − P_n
∇_{Y_i} L = −P_i + (∇_{Y_i} g_{i+1})^T P_{i+1},   for i = n−1, …, 2
∇_{Y_1} L = −P_1 + (∇_{Y_1} g_2)^T P_2
∇_{K_i} L = (∇_{K_i} g_i)^T P_i
∇_{P_i} L = −(Y_i − g_i(Y_{i-1}, K_i)),   ∇_{P_1} L = −(Y_1 − D)
• 47. Network - optimization
Gradient w.r.t. the network parameters:
∇_{K_i} L = (∇_{K_i} g_i)^T P_i
• 48. Network - optimization
Gradient w.r.t. the network parameters, ∇_{K_i} L = (∇_{K_i} g_i)^T P_i. For the hyperbolic network:
(∇_{K_i} g_i)^T P_i
 = (∇_{K_i} [2 W_{i-1} Y_{i-1} − V_{i-2} Y_{i-2} + h² f_i(K_i, W_{i-1} Y_{i-1}, b_i)])^T P_i
 = (∇_{K_i} [h² f_i(K_i, W_{i-1} Y_{i-1}, b_i)])^T P_i
 = (∇_{K_i} [h² K_i^T σ(K_i W_{i-1} Y_{i-1} + b_i)])^T P_i
 = h² ( ∇_{K_i}[K_i^T] σ(K_i W_{i-1} Y_{i-1} + b_i) + K_i^T ∇_{K_i}[σ(K_i W_{i-1} Y_{i-1} + b_i)] )^T P_i
 = h² [ I ⊗ σ(K_i W_{i-1} Y_{i-1} + b_i) + (W_{i-1} Y_{i-1} ⊗ I) diag(σ′(K_i W_{i-1} Y_{i-1} + b_i)) K_i ] P_i
where the last step takes the gradient w.r.t. vec(K_i) and uses the standard vec/Kronecker identities.
• 49. Network - optimization
Gradient w.r.t. the network parameters (hyperbolic network, as on the previous slide):
∇_{K_i} L = (∇_{K_i} g_i)^T P_i
 = h² [ I ⊗ σ(K_i W_{i-1} Y_{i-1} + b_i) + (W_{i-1} Y_{i-1} ⊗ I) diag(σ′(K_i W_{i-1} Y_{i-1} + b_i)) K_i ] P_i
The gradient for layer i depends on:
- the Lagrangian multiplier P_i
- the state for 1 layer only (Y_{i-1})
(a numerical check of the Kronecker form follows)
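An illustrative finite-difference check of the Kronecker-form gradient for the single term φ(K) = P^T K^T σ(K w + b), with column-major vec(K); all names and sizes here are made up for the test:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 5, 4
K, w = rng.normal(size=(m, n)), rng.normal(size=n)
b, P = rng.normal(size=m), rng.normal(size=n)
sigma = np.tanh
dsigma = lambda x: 1.0 - np.tanh(x) ** 2

u = K @ w + b
s, ds = sigma(u), dsigma(u)
# (I ⊗ sigma(u)) P == P ⊗ sigma(u);  (w ⊗ I) diag(sigma'(u)) K P == w ⊗ (sigma'(u) * (K P))
g = np.kron(P, s) + np.kron(w, ds * (K @ P))

# finite differences on phi(K) = P^T K^T sigma(K w + b), column-major ordering
phi = lambda K: P @ (K.T @ sigma(K @ w + b))
eps = 1e-6
g_fd = np.zeros(m * n)
for j in range(n):
    for i in range(m):
        E = np.zeros((m, n)); E[i, j] = eps
        g_fd[j * m + i] = (phi(K + E) - phi(K - E)) / (2 * eps)
assert np.allclose(g, g_fd, atol=1e-5)
```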
• 50. Network - optimization
Standard algorithm, known as reduced-space/Lagrangian, adjoint-state, or backpropagation:
1) Propagate forward to obtain all network states and satisfy the equality constraints:
∇_{P_i} L = 0  ⇔  Y_i = g_i(Y_{i-1}, K_i)
• 51. Network - optimization
Standard algorithm, known as reduced-space/Lagrangian, adjoint-state, or backpropagation:
1) Propagate forward to obtain all network states and satisfy the equality constraints:
∇_{P_i} L = 0  ⇔  Y_i = g_i(Y_{i-1}, K_i)
2) Propagate 'backward' to obtain all Lagrangian multipliers:
∇_{Y_n} L = 0  ⇔  P_n = ∇_{Y_n} ℓ(X, Y_n)
∇_{Y_i} L = 0  ⇔  P_i = (∇_{Y_i} g_{i+1})^T P_{i+1}
• 52. Network - optimization
Standard algorithm, known as reduced-space/Lagrangian, adjoint-state, or backpropagation:
1) Propagate forward to obtain all network states and satisfy the equality constraints:
∇_{P_i} L = 0  ⇔  Y_i = g_i(Y_{i-1}, K_i)
2) Propagate 'backward' to obtain all Lagrangian multipliers:
∇_{Y_n} L = 0  ⇔  P_n = ∇_{Y_n} ℓ(X, Y_n)
∇_{Y_i} L = 0  ⇔  P_i = (∇_{Y_i} g_{i+1})^T P_{i+1}
3) Compute the gradient w.r.t. the network parameters for every layer:
∇_{K_i} L = (∇_{K_i} g_i)^T P_i
(a NumPy sketch of these three steps follows)
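A minimal sketch of the three steps, assuming the ResNet-style layer g(Y, K) = Y − h K^T σ(K Y) and a least-squares loss ℓ = ½‖Y_n − X‖²; an assumed instantiation for illustration, not the talk's code:

```python
import numpy as np

h = 0.1
sigma = np.tanh
dsigma = lambda x: 1.0 - np.tanh(x) ** 2

def g(Y, K):
    return Y - h * K.T @ sigma(K @ Y)

def loss_and_grads(D, X, Ks):
    # 1) forward sweep: states Y_0 .. Y_n satisfying Y_i = g_i(Y_{i-1}, K_i)
    Ys = [D]
    for K in Ks:
        Ys.append(g(Ys[-1], K))
    # 2) backward sweep for the multipliers, from P_n = grad_{Y_n} loss
    P = Ys[-1] - X
    grads = [None] * len(Ks)
    for i in reversed(range(len(Ks))):
        K, Y = Ks[i], Ys[i]          # layer i+1 maps state Y_i to Y_{i+1}
        u = K @ Y
        # 3) gradient w.r.t. this layer's parameters: (grad_K g)^T P
        grads[i] = -h * (np.outer(sigma(u), P) + np.outer(dsigma(u) * (K @ P), Y))
        # multiplier recursion P_i = (grad_{Y_i} g_{i+1})^T P_{i+1}
        P = P - h * K.T @ (dsigma(u) * (K @ P))
    return 0.5 * np.sum((Ys[-1] - X) ** 2), grads

rng = np.random.default_rng(5)
D, X = rng.normal(size=8), rng.normal(size=8)
Ks = [0.1 * rng.normal(size=(12, 8)) for _ in range(4)]
val, grads = loss_and_grads(D, X, Ks)
```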
• 55. Backpropagation - Lagrangian connection known since [Y. LeCun et al., 1988]. Yet, most neural-network presentations do not use linear-algebraic notation.
Network - notation
[Y_j^1; Y_j^2] = f( [K(θ_j^{1,1})  K(θ_j^{1,2}); K(θ_j^{2,1})  K(θ_j^{2,2})] [Y_{j-1}^1; Y_{j-1}^2] )
The (block) sparsity pattern of K encodes which neurons connect.
[Image from http://neuralnetworksanddeeplearning.com/chap2.html]
• 56. Network - notation
Presentation changes when considering channels and skip connections. ResNet:
[Figure 5 from He et al., 2015: a deeper residual function F for ImageNet. Left: a building block (on 56×56 feature maps) as in Fig. 3 for ResNet-34. Right: a "bottleneck" building block for ResNet-50/101/152.]
• 57. Network - notation
Presentation changes when considering channels and skip connections. ResNet [He et al., 2015]:
[Y_j^1; Y_j^2] = [Y_{j-1}^1; Y_{j-1}^2] + f( [K(θ_j^{1,1})  K(θ_j^{1,2}); K(θ_j^{2,1})  K(θ_j^{2,2})] [Y_{j-1}^1; Y_{j-1}^2] )
• 58. Network - notation
[Figure 1 from F.N. Iandola et al., 2016: microarchitectural view, organization of convolution filters in the Fire module: a squeeze layer of 1×1 convolution filters (ReLU) feeding an expand layer of 1×1 and 3×3 convolution filters (ReLU); in the example, s1x1 = 3, e1x1 = 4, and e3x3 = 4.]
• 59. Network - notation
[Figure 1 from F.N. Iandola et al., 2016, as on the previous slide.]
The squeeze-then-expand (Fire) module in block notation (a sketch follows):
[Y_j^1; Y_j^2] = [Y_{j-1}^1; Y_{j-1}^2] + f( [K(θ_j^{1,1}); K(θ_j^{2,1})] f( [K(θ_j^{1,1})  K(θ_j^{1,2})] [Y_{j-1}^1; Y_{j-1}^2] ) )
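A sketch of one squeeze-then-expand step in the block form above; plain matrices stand in for the convolutions K(θ^{i,j}) and all names are illustrative:

```python
import numpy as np

def f(x):
    return np.maximum(x, 0.0)  # ReLU

rng = np.random.default_rng(2)
n = 10
Y1, Y2 = rng.normal(size=n), rng.normal(size=n)        # two input channels
K_sq1, K_sq2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
K_ex1, K_ex2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))

# squeeze: one row of blocks maps the two channels to one
Z = f(K_sq1 @ Y1 + K_sq2 @ Y2)
# expand: one column of blocks maps the single channel back to two,
# added to the skip connection
Y1_new = Y1 + f(K_ex1 @ Z)
Y2_new = Y2 + f(K_ex2 @ Z)
```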
• 60. Network - notation
[Figure 1 from F.N. Iandola et al., 2016, as before.]
[Y_j^1; Y_j^2] = [Y_{j-1}^1; Y_{j-1}^2] + f( [K(θ_j^{1,1}); K(θ_j^{2,1})] f( [K(θ_j^{1,1})  K(θ_j^{1,2})] [Y_{j-1}^1; Y_{j-1}^2] ) )
diagonal matrices with 0/1 entries
• 61. Network - notation
Understanding the block structure of K enables various parameterizations, including
• block-circulant
• block-diagonal convolution + scalar off-diagonal elements [J. Ephrath et al., 2018]
K ≡ [ K(θ^{1,1}) … K(θ^{1,n_chan_in}); ⋮ ; K(θ^{n_chan_out,1}) … K(θ^{n_chan_out,n_chan_in}) ]
[E. Treister et al., 2018]
• 65. Reversible networks
Consider a leapfrog discretization of the nonlinear Telegraph equation [Chang et al., 2018]:
Y_{i+1} = 2 W_i Y_i − W_{i-1} Y_{i-1} + f(W_i Y_i, K_i)
Reverse propagation follows as
Y_i = W_i^{-1} [ 2 W_{i+1} Y_{i+1} + f(W_{i+1} Y_{i+1}, K_{i+1}) − Y_{i+2} ]
• 66. Reversible networks
Consider a leapfrog discretization of the nonlinear Telegraph equation [Chang et al., 2018]:
Y_{i+1} = 2 W_i Y_i − W_{i-1} Y_{i-1} + f(W_i Y_i, K_i)
Reverse propagation follows as
Y_i = W_i^{-1} [ 2 W_{i+1} Y_{i+1} + f(W_{i+1} Y_{i+1}, K_{i+1}) − Y_{i+2} ]
W changes the channels and resolution (pooling) and is generally not invertible; an orthogonal wavelet transform suits the purpose [Lensink, Haber & B.P., 2019].
(a sketch of forward and reverse propagation follows)
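A sketch of the leapfrog forward pass and exact reverse reconstruction of the states, assuming orthogonal W_i (so W_i^{-1} = W_i^T, in the spirit of the wavelet remark); names and sizes are illustrative:

```python
import numpy as np

def f(u, K):
    return K.T @ np.tanh(K @ u)

rng = np.random.default_rng(3)
n, layers = 8, 6
# random orthogonal W_i via QR; random K_i
Ws = [np.linalg.qr(rng.normal(size=(n, n)))[0] for _ in range(layers + 1)]
Ks = [0.1 * rng.normal(size=(n, n)) for _ in range(layers + 1)]

# forward: Y_{i+1} = 2 W_i Y_i - W_{i-1} Y_{i-1} + f(W_i Y_i, K_i)
Y = [rng.normal(size=n), rng.normal(size=n)]   # Y_0, Y_1
for i in range(1, layers):
    Y.append(2 * Ws[i] @ Y[i] - Ws[i - 1] @ Y[i - 1] + f(Ws[i] @ Y[i], Ks[i]))

# reverse: recompute Y_i from (Y_{i+1}, Y_{i+2}) without having stored it
for i in range(layers - 2, 0, -1):
    rhs = 2 * Ws[i + 1] @ Y[i + 1] + f(Ws[i + 1] @ Y[i + 1], Ks[i + 1]) - Y[i + 2]
    Y_rec = Ws[i].T @ rhs                       # W^{-1} = W^T for orthogonal W
    assert np.allclose(Y_rec, Y[i])
```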
• 77. References
• Automatic classification of geologic units in seismic images using partially interpreted examples. Bas Peters, Justin Granek, Eldad Haber. 81st EAGE Conference and Exhibition, 2019.
• Multi-resolution neural networks for tracking seismic horizons from few training images. Bas Peters, Justin Granek, Eldad Haber. Interpretation, 7, no. 3 (2019): 1-54.
• Neural-networks for geophysicists and their application to seismic data interpretation. Bas Peters, Eldad Haber, Justin Granek. The Leading Edge, 38, no. 7 (2019): 534-540.
• Does shallow geological knowledge help neural-networks to predict deep units? Bas Peters, Eldad Haber, Justin Granek. SEG Technical Program Expanded Abstracts, 2019.
• Fully Hyperbolic Convolutional Neural Networks. Keegan Lensink, Eldad Haber, Bas Peters. arXiv:1905.10484.