Managing Open Source Software in the GitHub Era

  • 1. December 3rd, 2014 – Silicon Valley, CA
  • 2. Managing Open Source in the GitHub Era: Agenda
      • Introduction
      • Understanding basic issues for OSS compliance
      • Understanding new issues for OSS compliance in the GitHub Era
      • Best practices to reduce risk
      • Latest trends for process and tools to manage open source compliance
      • Questions
  • 3. Most Common OSS License Obligations
      • Copyright and license notices
      • Attribution obligations
      • “Copyleft” obligations
          – Licensing of derivative works
          – Change notices
          – Offer to provide source code
      • Carve-out for OSS in your license agreements
  • 4. Key OSS Compliance Challenges
      • Tracking acquisition and use of open source
      • Getting OSS information from suppliers
      • Delivering OSS information to customers
          – Attribution notice creation and delivery
          – Copyleft: source code packaging and delivery
  • 5. The “GitHub Era”
      • Decentralized and distributed model of Git represents many of the new OSS trends
      • More individuals engaged directly
      • Smaller projects/components with many more dependencies
      • Forking is encouraged → exponential growth in number of copies of popular components
      • Explosion in the number of distinct OSS components used in products and systems, from dozens to hundreds to thousands or more
  • 6. Growth of component repositories
      • In January 2011 there were fewer than 80K components available in the main component repositories (Maven, CPAN, PyPI, RubyGems)
      • In December 2014 there are more than 500K components and counting (including NPM, Bower, Godoc, Packagist, NuGet)
      • In 2014, new components have been added to these repositories at the rate of over 10,000 new component-versions per month.
      Source: http://www.modulecounts.com/
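To put the two component counts above on a common scale, the arithmetic below estimates an average compound monthly growth rate. This is only a rough sketch: it treats "fewer than 80K" and "more than 500K" as exact values, and it ignores that the December 2014 figure covers additional repositories, so the result is indicative at best. The function name is introduced here for illustration.

```python
def monthly_growth_rate(start_count, end_count, months):
    """Average compound growth per month between two counts."""
    return (end_count / start_count) ** (1 / months) - 1

# January 2011 to December 2014, counted in whole months.
months = (2014 - 2011) * 12 + (12 - 1)

# 80K and 500K are the rounded bounds quoted on the slide, used as-is.
rate = monthly_growth_rate(80_000, 500_000, months)
print(f"{months} months, about {rate:.1%} per month")
```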
  • 7. GitHub – more background
      • Provides Git-based services.
      • Git is a version control and content management tool from Linus Torvalds (GPL v2)
      • GitHub key attributes are easy code sharing and collaboration
      • JavaScript is dominant
          – Other languages: Ruby, Java, PHP, Python, C/C++
      • Started in 2008
          – Over 17 million repos and 7.8 million users claimed
      • Over the last 12 months, new public open source component repositories:
          – Over 350K created per month (excluding forks)
          – Over 10,000 created daily
      Source: nexB research data, GitHub API, 2013-11/2014-11
  • 8. Evolution of OSS Development
      OLD OSS
      • Centralized development
      • CVS, Subversion
      • Project leader is benevolent dictator
      • Fewer larger components
      • Push releases
      • C/C++, Java
      • SourceForge, Maven
      • L/GPL v2, BSD, MIT
      • Desktops and servers
      NEW OSS
      • Decentralized development
      • Git / GitHub
      • Each developer forks code at any time
      • More smaller components
      • Pull releases
      • JavaScript, Ruby, Scala, Go
      • RubyGems, NPM
      • MIT, Apache, L/GPL v3
      • Mobile and Cloud
  • 9. Evolution of OSS Compliance Challenges
      OLD OSS
      • Components without a license
      • OSS code downloaded to internal codebase and compiled locally (vendored)
      • Distribution means shipment or download
      • Snippets
      NEW OSS
      • Many more components without a license
      • Deep external dependencies provisioned live from the web at deployment or runtime
      • Distribution via network / Internet deployment
      • Many more snippets
  • 10. Challenges – Missing licenses
      • No license from copyright holder means that you do not have a right to copy or re-use the software
      • License at project / README level helps, but…
      • Ambiguous without notices in source files
      • License information is lost when code is partially copied
      • Not a new problem, but scale is increasing rapidly
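The point about notices in source files can be checked mechanically. The sketch below flags files whose header carries no recognizable copyright or license notice. The patterns, the 30-line header window and the function names are assumptions made for illustration; real notices vary widely, so treat a hit or a miss as a prompt for review, not a conclusion.

```python
import re

# Hypothetical patterns for this sketch; real-world notices take many forms.
NOTICE_PATTERNS = [
    re.compile(r"SPDX-License-Identifier:\s*\S+"),
    re.compile(r"Copyright\s+(\(c\)|©)?\s*\d{4}", re.IGNORECASE),
    re.compile(r"Licensed under", re.IGNORECASE),
]


def has_license_notice(source_text, header_lines=30):
    """Return True if the first `header_lines` lines contain a notice."""
    header = "\n".join(source_text.splitlines()[:header_lines])
    return any(pattern.search(header) for pattern in NOTICE_PATTERNS)


def files_missing_notices(files):
    """Given {path: text}, return the sorted paths with no notice in the header."""
    return sorted(
        path for path, text in files.items() if not has_license_notice(text)
    )
```

Passing a mapping of paths to file contents returns the paths that need a closer look, which also covers files that were partially copied without their original header.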
  • 11. Solutions – Missing licenses
  • 12. Challenges – Snippets
      • Many snippet-sharing / educational web sites have vague or no license terms
          – Someone who posts a code snippet or code example does not usually think about an explicit license
          – Terms of service are the typical default
      • StackOverflow example
          – Major source of advice about coding including code snippets
          – StackOverflow license is CC-BY-SA, which is effectively copyleft
  • 13. Challenges – JavaScript example
      • Accelerated usage on server and clients, esp. mobile
      • Very common to mash up snippets of JavaScript from multiple origins and compile/minify them in a single file for execution efficiency
          – License information often lost when extracting snippets
          – Most restrictive license applies to the JavaScript file
      • jQuery core components are MIT-licensed, but components named jquery-xxxxx may be copyleft-licensed
          – Executing JS on client could be considered distribution
          – And could have copyleft impact on server-side code
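One way to limit the loss of license information when several JavaScript files are merged is to carry each file's leading notice into the bundle and record where it came from. The sketch below does this on plain strings. The function name and the `/*! from ... */` marker are assumptions for illustration; whether a particular minifier keeps such comments must be verified against that tool's documentation.

```python
import re

# Matches a block comment at the very start of a file (leading whitespace allowed).
LEADING_COMMENT = re.compile(r"\A\s*(/\*.*?\*/)", re.DOTALL)


def bundle_with_notices(sources):
    """Concatenate {origin: js_text}, hoisting each leading notice to the top.

    Files without a leading comment get an explicit marker so the gap
    is visible in the bundle instead of silently disappearing.
    """
    notices, bodies = [], []
    for origin, text in sources.items():
        match = LEADING_COMMENT.match(text)
        if match:
            notices.append(f"/*! from {origin} */\n{match.group(1)}")
            text = text[match.end():]
        else:
            notices.append(f"/*! from {origin}: NO LICENSE NOTICE FOUND */")
        bodies.append(text.strip())
    return "\n".join(notices) + "\n" + "\n".join(bodies)
```

The sketch only handles a single leading block comment per file; a notice buried further down, or written with `//` line comments, would need additional handling.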
  • 14. Healthcare.gov case study
      • Healthcare.gov uses JavaScript code from Datatables (jQuery plug-in under BSD 3-clause or GPL v2)
      • Weekly Standard accused HHS of removing copyright & license notices from the borrowed code
      • Our analysis determined that the developers did not remove notices; they created their own Datatables.js file from snippets from other Datatables project files that did not contain license notices
      • HHS quickly corrected this case, but the error indicates poor guidance to developers
      See http://www.dejacode.org/healthcare_case_study.html
  • 15. Challenges – Managing dependencies
      • Java, JavaScript, Ruby, Go and many newer languages automate provisioning of required components, aka dependencies
      • Automation is convenient for developers, but adds risk
          – Dependent components may not be provisioned until deployment or runtime
          – Dependencies may be deep and recursive
          – Automatically provisioned components may contain “hidden” security, quality or licensing issues
          – Accurate attribution for OSS components may be very complex
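To illustrate why deep and recursive dependencies complicate tracking, the sketch below walks a dependency graph breadth-first and reports every component reachable from a product together with its depth. The graph is a plain dictionary invented for this example; in practice it would come from whatever metadata the package manager exposes.

```python
from collections import deque


def transitive_dependencies(graph, root):
    """Return {component: depth} for everything reachable from `root`.

    `graph` maps a component name to the list of its direct dependencies.
    Depth 1 means a direct dependency of `root`.
    """
    depths = {}
    queue = deque((dep, 1) for dep in graph.get(root, []))
    while queue:
        name, depth = queue.popleft()
        if name in depths or name == root:
            continue  # already visited; this also guards against cycles
        depths[name] = depth
        queue.extend((dep, depth + 1) for dep in graph.get(name, []))
    return depths
```

Every key in the result is a component whose license and attribution obligations apply to the product, even though only the depth-1 entries were chosen explicitly by a developer.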
  • 16. Solutions – Dependency Management
      • A basic solution is “vendoring”: explicitly control provisioning of third-party components
      • Soft vendoring – define exact list of third-party component-versions from known/vetted repositories
      • Hard vendoring – physically copy the third-party component-versions to a /vendor folder in your codebase
      • Different repositories / platforms provide different tools
          – Maven and others for Java
          – .gitmodules file for Git
          – Godep for Go, NPM for Node.js, Bundler for Ruby, etc.
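Soft vendoring depends on every entry naming an exact component-version. The sketch below checks a manifest for entries that are not pinned. It assumes a simple `name==version` line format with `#` comments, chosen here for illustration only; each platform has its own manifest or lock file format, and the function name is hypothetical.

```python
import re

# "name==1.2.3" style pins; this format is an assumption for the sketch.
EXACT_PIN = re.compile(r"^[A-Za-z0-9_.\-]+==[A-Za-z0-9_.\-+]+$")


def unpinned_entries(manifest_text):
    """Return manifest entries that do not pin an exact component-version."""
    problems = []
    for line in manifest_text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if entry and not EXACT_PIN.match(entry):
            problems.append(entry)
    return problems
```

An empty result means every listed component resolves to one known version, which is the precondition for reviewing licenses once instead of on every build.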
  • 17. Compliance in the GitHub Era
      • Open source code is evolving and expanding too quickly for traditional source code scanning and matching techniques
          – Number of possible matches increases with each fork
          – Many or most of the open source components may not actually be in your codebase (dependencies)
      • Risk focus on components over snippets is even more important
      • Accelerating proliferation of languages, platforms and repositories requires active management and coordination from business, engineering and legal teams
  • 18. Compliance in the GitHub Era
      • Adapt policies to specific languages and platforms upfront:
          – Define acceptable licenses in context of the technology and usage
              • Distributed as software package or Cloud-based service?
              • What does copyleft mean in context?
          – Create a lightweight process for identifying and resolving provenance gaps / issues
          – Evaluate preferred sources for provisioning components
          – Determine best dependency management approach for each technology
  • 19. Compliance in the GitHub Era
      • Embed open source provenance data in your codebase
          – As close to the code as possible
          – Adapt techniques to leverage existing tools and data from each platform / repository
          – Use simple approach to document provenance data if missing from original project
          – Instrument your build processes to identify components that you actually use in each deployed product
              • Most accurate way to track and fulfill OSS obligations
              • Fully automate attribution documentation
              • Redistribution, if applicable, has extra steps
      See also https://github.com/dejacode/about-code-tool
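Once provenance data sits next to the code, attribution documentation can be generated instead of written by hand. The sketch below renders a plain-text notice from a list of component records. The record fields (`name`, `version`, `license`, `copyright`, `homepage`) are hypothetical and do not reflect the format of any particular tool, including the one linked above.

```python
def build_attribution_notice(components):
    """Render a plain-text attribution notice from component records.

    Each record is a dict with `name`, `version` and `license`, plus
    optional `copyright` and `homepage` (field names are illustrative).
    """
    sections = []
    ordered = sorted(components, key=lambda c: (c["name"].lower(), c["version"]))
    for component in ordered:
        lines = [
            f"{component['name']} {component['version']}",
            f"License: {component['license']}",
        ]
        if component.get("copyright"):
            lines.append(component["copyright"])
        if component.get("homepage"):
            lines.append(f"Source: {component['homepage']}")
        sections.append("\n".join(lines))
    return "\n\n".join(sections) + "\n"
```

Feeding this function the list of components that the build actually deployed, not everything present in the repository, keeps the notice aligned with what is shipped.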
  • 20. Compliance in the GitHub Era
      • Establish central database of open source and third-party components
          – Collect provenance data for all products across languages and platforms
          – Document all effective component dependencies
          – Harmonize open source compliance by product across languages and platforms
      • Current solutions from several vendors, but no OSS solution is available today
      See also https://enterprise.dejacode.com/landing/
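As a rough illustration of what a central component database records, the sketch below uses Python's built-in `sqlite3` module with two tables: one for component-versions and one linking them to products. The schema, table names and helper functions are assumptions made for this example and are far simpler than any production system.

```python
import sqlite3


def open_inventory():
    """Create an in-memory inventory with a minimal, illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE component (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            version TEXT NOT NULL,
            license TEXT,  -- NULL marks a provenance gap
            UNIQUE (name, version)
        );
        CREATE TABLE product_component (
            product TEXT NOT NULL,
            component_id INTEGER NOT NULL REFERENCES component(id)
        );
    """)
    return conn


def add_usage(conn, product, name, version, license=None):
    """Record that `product` uses the given component-version."""
    conn.execute(
        "INSERT OR IGNORE INTO component (name, version, license) VALUES (?, ?, ?)",
        (name, version, license),
    )
    (component_id,) = conn.execute(
        "SELECT id FROM component WHERE name = ? AND version = ?",
        (name, version),
    ).fetchone()
    conn.execute(
        "INSERT INTO product_component VALUES (?, ?)", (product, component_id)
    )


def license_gaps(conn):
    """List (product, name, version) rows whose component has no license."""
    return conn.execute("""
        SELECT pc.product, c.name, c.version
        FROM product_component pc
        JOIN component c ON c.id = pc.component_id
        WHERE c.license IS NULL
        ORDER BY pc.product, c.name
    """).fetchall()
```

Because each component-version is stored once and linked to many products, a license decision made for one product is visible to every other product that uses the same component.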
  • 21. Questions
  • 22. About nexB Inc.
      • nexB offers:
          – DejaCode – a central business system for managing software components
          – Software analysis/audit services for products and for acquisitions
      • 350+ software audit projects completed to date
          – Aggregated audited codebases > 3 billion lines of source code
          – Aggregated value of the acquisition transactions > $5B
      • See DejaCode at www.dejacode.com
  • 23. Contacts
      • O’Melveny & Myers
        Heather Meeker, hmeeker@omm.com, +1 650 473 2635
        Subscribe to news and events alert at http://heathermeeker.squarespace.com/
      • nexB Inc.
        Michael Herzog, mjherzog@nexB.com, +1 650 380 0680
  • 24. Resources – OSS Licensing Trends
      • Neil McAllister - Study: Most projects on GitHub not open source licensed
        http://www.theregister.co.uk/2013/04/18/github_licensing_study/
      • Matt Asay - Open Source Is Old School, Says The GitHub Generation
        http://readwrite.com/2013/05/15/open-source-is-old-school-says-the-github-generation
      • Richard Fontana - Post open source software, licensing and GitHub
        http://opensource.com/law/13/8/github-poss-licensing
      • Simon Phipps - GitHub finally takes open source licenses seriously
        http://www.infoworld.com/article/2611422/open-source-software/github-finally-takes-open-source-licenses-seriously.html
      • Armin Ronacher - Licensing in a Post Copyright World
        http://lucumr.pocoo.org/2013/7/23/licensing/
  • 25. Resources – OSS Language / Repo Trends
      • GitHub growth and language trends
        http://redmonk.com/dberkholz/2013/01/21/github-will-hit-5-million-users-within-a-year/
        http://redmonk.com/dberkholz/2014/05/02/github-language-trends-and-the-fragmenting-landscape/
        http://beust.com/weblog/2014/05/03/language-popularity-on-github/
        http://redmonk.com/dberkholz/2014/09/26/githubs-vanishing-acceleration/
      • Repository package growth statistics
        http://www.modulecounts.com/
      • GitHub Users Worldwide
        http://aasen.in/github_globe/