Data Engineer
Cisco Umbrella
yeungp@cisco.com
Unified Data Platform
Pauline Yeung
ClickHouse Meetup
Dec 3, 2019
© 2019 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Agenda
1. Problems
2. Use Case: Authlog
3. Use Case: Whois Records
4. Use Case: Network Tunnels
5. Next
• Data Engineer at Cisco Umbrella, Investigate team
• M.S. Computer Engineering, Santa Clara U
• B.S. Electrical Engineering, U of Calgary
$ whois Pauline
Problems
Investigate: the most powerful way to uncover threats
Console
API
SIEM, TIP
Key points
Intelligence about domains, IPs,
and malware across the internet
Live graph of DNS requests and
other contextual data
Correlated against statistical models
Discover and predict malicious
domains and IPs
Enrich security data with global intelligence
domains, IPs, ASNs, file hashes
Investigate Backend
Whois ASN
IntelDB
Umbrella
Investigate
passive
DNS
We want
• Easy, fast, and flexible platform for ad hoc
analysis of authlog, which is stored in
passive DNS.
• Increase throughput and reduce costs for
Whois database.
• Fast access to ASN and enrich security
data.
• One datastore for multiple use cases.
Share datastore with other product teams.
DNS Authoritative Log
(authlog)
Passive DNS
Domain to IP Relationships
11 JAN 2019
domain2.com
10 JAN 2019
domain1.com
12 JAN 2019
domain3.com
12.4.0.4/32
AuthLog Examples
#  owner               name                             datacenter  name_server
1  tabiturient.ru.     tabiturient.ru.                  lon         ns4.nic.ru.
2  thefacebook.com.    certs.thefacebook.com.           sea         b.ns.facebook.com.
3  333az.net.          nbb4yd.333az.net.                yyz         ns2-09.azure-dns.net.
4  dotnetwork2.co.za.  d1000253-146.dotnetwork2.co.za.  jnb         ns3.dotnetworkdns.co.za.

#  name_server_ip                   rr                                    ttl   type  timestamp
1  194.226.96.8                     195.24.68.22                          3600  A     2019-12-02 10:56:47
2  2a03:2880:ffff:c:face:b00c:0:35  2620:10d:c0a1:10:0:0:0:35             600   AAAA  2019-12-02 12:40:46
3  2620:1ec:8ec::9                  ns1-07.azure-dns.com.                 20    NS    2019-12-02 10:34:15
4  41.223.172.166                   mail.d1000253-146.dotnetwork2.co.za.  3600  MX    2019-11-30 03:05:17
AuthLog Data Pipeline
authlog
producer
authlog
clickhouse
ingester
S3
archiver
resolvers
32 data centers
3 days
authlog
HBase
ingester
Investigate
UI
API Server
6 nodes, 1 replica
r4.4xlarge
16 vCPU, 122 GB, 2 TB disk
32 nodes
i3.2xlarge
8 vCPU, 61 GB
authlog
parquet
passive
DNS
120b requests/day
4b authlog/day
One Set of Questions
• What’s the increase in disk usage for passive DNS per day?
• What type of traffic contributes the most to the increase in disk usage?
AuthLog for Past 3 Days
CREATE TABLE IF NOT EXISTS authlog.alog_local (
owner String,
name String,
datacenter String,
name_server String,
name_server_ip String,
rr String,
ttl Int32,
type String,
timestamp DateTime)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (name, timestamp)
TTL timestamp + toIntervalDay(3)
SETTINGS index_granularity = 8192
48 golang workers write to 6 shards
ingest 1.2m rows per second
ClickHouse Kafka engine does not support Avro
AuthLog for Past 3 Days
CREATE TABLE IF NOT EXISTS authlog.alog(
owner String,
name String,
datacenter String,
name_server String,
name_server_ip String,
rr String,
ttl Int32,
type String,
timestamp DateTime)
ENGINE = Distributed(log_cluster, authlog, alog_local, cityHash64(name))
access all shards
4b rows per day
200 GB for 3 days
Payload for Domains in 2 Consecutive Days
CREATE TABLE IF NOT EXISTS authlog.a
ENGINE = MergeTree()
ORDER BY name
AS SELECT name, type, sum(length(name) + length(rr)) AS payload
FROM authlog.alog
WHERE timestamp >= toDateTime('2019-11-28 16:00:00') AND timestamp <= toDateTime('2019-11-28 19:59:59')
GROUP BY name, type
CREATE TABLE IF NOT EXISTS authlog.b
ENGINE = MergeTree()
ORDER BY name
AS SELECT name, type, sum(length(name) + length(rr)) AS payload
FROM authlog.alog
WHERE timestamp >= toDateTime('2019-11-29 16:00:00') AND timestamp <= toDateTime('2019-11-29 19:59:59')
GROUP BY name, type
took 2 minutes, 148m rows, 3.0 GB
took 2 minutes, 166m rows, 3.4 GB
4 hours, ¼ daily authlog
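The payload aggregation in the two CREATE TABLE ... AS SELECT statements can be sketched outside ClickHouse as follows. This is a minimal illustration only: the sample rows are shaped like the AuthLog examples, the second `tabiturient.ru.` record is hypothetical, and `payload_by_name_type` is a name introduced here, not part of the pipeline.

```python
from collections import defaultdict

# (name, type, rr) tuples shaped like the authlog examples.
# The second tabiturient.ru. row is hypothetical, added to show the summing.
rows = [
    ("certs.thefacebook.com.", "AAAA", "2620:10d:c0a1:10:0:0:0:35"),
    ("tabiturient.ru.", "A", "195.24.68.22"),
    ("tabiturient.ru.", "A", "195.24.68.23"),
]


def payload_by_name_type(rows):
    """Mirror of: SELECT name, type, sum(length(name) + length(rr)) GROUP BY name, type."""
    totals = defaultdict(int)
    for name, rtype, rr in rows:
        # For ASCII DNS names, len() matches the byte count ClickHouse's length() returns.
        totals[(name, rtype)] += len(name) + len(rr)
    return dict(totals)


payload = payload_by_name_type(rows)
```

The result has one entry per (name, type) pair, which is the shape of the `authlog.a` and `authlog.b` tables.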
Payload for Domains Only in Day 2
CREATE TABLE IF NOT EXISTS authlog.b_only
ENGINE = MergeTree()
ORDER BY (name)
AS SELECT
b.name as name,
b.type as type,
sum(b.payload) as payload
FROM a
RIGHT JOIN b ON a.name = b.name
WHERE a.name like ''
GROUP BY
b.name,
b.type
took 5 minutes, 108m rows, 2.5 GB
users.xml
max_memory_usage = 96GB
max_bytes_before_external_group_by = 48GB
max_bytes_before_external_sort = 48GB
4 hours
Nov 28
4 hours
Nov 29
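The RIGHT JOIN with `a.name LIKE ''` acts as an anti-join: with default settings, ClickHouse fills unmatched non-Nullable String columns with an empty string, so the filter keeps only day-2 rows whose name has no match in day 1. A rough Python equivalent, assuming both days are held as dictionaries keyed by (name, type) and using hypothetical sample values:

```python
def only_in_day2(day1, day2):
    """Keep day-2 entries whose name does not appear anywhere in day 1.

    day1 and day2 map (name, type) -> payload, like tables a and b.
    """
    day1_names = {name for name, _ in day1}
    return {key: p for key, p in day2.items() if key[0] not in day1_names}


# Hypothetical payload values for illustration.
day1 = {("tabiturient.ru.", "A"): 27}
day2 = {
    ("tabiturient.ru.", "A"): 27,       # seen on day 1, dropped
    ("nbb4yd.333az.net.", "NS"): 38,    # new on day 2, kept
    ("nbb4yd.333az.net.", "A"): 30,     # new on day 2, kept
}

b_only = only_in_day2(day1, day2)
```

Matching is on name only, as in the slide's ON clause, so a name seen on day 1 with any record type is excluded entirely.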
Second Level Domains with Highest Payload
SELECT
arrayStringConcat([splitByString('.', name)[-3], '.', splitByString('.', name)[-2], '.']) AS pname,
sum(payload) / 1024 / 1024 AS payload_MB
FROM b_only
GROUP BY pname
ORDER BY payload_MB DESC
LIMIT 100
pname payload_MB
cloudfront.net. 878.2994289398193
office.com. 719.9946641921997
clienttons.com. 608.300389289856
cnr.io. 473.1693649291992
akamaihd.net. 395.0745334625244
cedexis-radar.net. 364.29007720947266
footprintdns.com. 265.04366874694824
gstatic.com. 250.41933727264404
squarespace.com. 151.24806880950928
forter.com. 151.08679962158203
wacodenver-com.mail.protection.outlook.com. to outlook.com.
took 6 seconds
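The negative array indexes in the query pick the last two labels of a fully qualified name (the final element after splitting is empty because of the trailing dot). A small Python sketch of the same extraction and rollup; `second_level` and `rollup` are illustrative names, and the fallback for names with fewer than two labels is an assumption of this sketch:

```python
from collections import defaultdict


def second_level(name):
    """Reduce a fully qualified name such as 'a.b.example.com.' to 'example.com.'."""
    labels = name.split(".")  # trailing dot yields a final empty label
    if len(labels) < 3:
        return name  # assumption: leave very short names unchanged
    return labels[-3] + "." + labels[-2] + "."


def rollup(payload_by_name):
    """Sum payload per second-level domain, like GROUP BY pname."""
    totals = defaultdict(int)
    for name, payload in payload_by_name.items():
        totals[second_level(name)] += payload
    return dict(totals)
```

Note that this simple rule treats names under multi-label suffixes such as `co.za.` as belonging to the suffix itself, which is the same behaviour as the SQL on the slide.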
Resource Type with Highest Payload
SELECT
type,
sum(payload) / 1024 / 1024 / 1024 AS payload_GB
FROM b_only
GROUP BY type
ORDER BY payload_GB desc
type payload_GB
A 4.34150860644877
CNAME 3.289442714303732
RRSIG 1.5495505537837744
DNSKEY 0.6393735473975539
TXT 0.5898861000314355
SELECT sum(payload) / 1024 / 1024 / 1024 as payload_GB FROM b_only
payload_GB
11.225993978790939
took 483 msec
took 131 msec
Databricks Spark
• 1 day authlog, ~4b rows
• Process 4x authlog
• Took 9 minutes
pname payload_GB
office.com. 3.812719924375415
cloudfront.net. 3.3973056096583605
cnr.io. 2.608651074580848
clienttons.com. 2.2882667966187
cedexis-radar.net. 1.9318219376727939
type payload_GB
A 16.849326515570283
CNAME 14.351725150831044
RRSIG 3.1499833753332496
TXT 3.047112719155848
NS 1.328178352676332
payload_GB
41.024286944419146
took 5 seconds
took 1.3 seconds
took 490 msec
Whois Record Data
WHOIS Record Data
§ Who registered the domain
§ Contact information used
§ When/where registered
§ Expiration date
§ Historical data
§ Correlations with other
malicious domains
See relationships between
attackers’ infrastructure
$ whois facebook.com
:
Domain Name: FACEBOOK.COM
Registry Domain ID: 2320948_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.registrarsafe.com
Registrar URL: https://www.registrarsafe.com
Updated Date: 2019-10-17T18:52:06Z
Creation Date: 1997-03-29T05:00:00Z
Registrar Registration Expiration Date: 2028-03-30T04:00:00Z
Registrar: RegistrarSafe, LLC
Registrar IANA ID: 3237
Registrar Abuse Contact Email: abusecomplaints@registrarsafe.com
Registrar Abuse Contact Phone: +1.6503087004
Domain Status: clientDeleteProhibited https://www.icann.org/epp#clientDeleteProhibited
Domain Status: clientTransferProhibited https://www.icann.org/epp#clientTransferProhibited
Domain Status: serverDeleteProhibited https://www.icann.org/epp#serverDeleteProhibited
Domain Status: serverTransferProhibited https://www.icann.org/epp#serverTransferProhibited
Domain Status: clientUpdateProhibited https://www.icann.org/epp#clientUpdateProhibited
Domain Status: serverUpdateProhibited https://www.icann.org/epp#serverUpdateProhibited
API request: domainName
API response: WhoisRecord_rawText
Registry Registrant ID:
Registrant Name: Domain Admin
Registrant Organization: Facebook, Inc.
Registrant Street: 1601 Willow Rd
Registrant City: Menlo Park
Registrant State/Province: CA
Registrant Postal Code: 94025
Registrant Country: US
Registrant Phone: +1.6505434800
Registrant Phone Ext:
Registrant Fax: +1.6505434800
Registrant Fax Ext:
Registrant Email: domain@fb.com
Registry Admin ID:
Admin Name: Domain Admin
Admin Organization: Facebook, Inc.
Admin Street: 1601 Willow Rd
Admin City: Menlo Park
Admin State/Province: CA
Admin Postal Code: 94025
Admin Country: US
Admin Phone: +1.6505434800
Admin Phone Ext:
Admin Fax: +1.6505434800
Admin Fax Ext:
Admin Email: domain@fb.com
Tech Name: Domain Admin
Tech Organization: Facebook, Inc.
Tech Street: 1601 Willow Rd
Tech City: Menlo Park
Tech State/Province: CA
Tech Postal Code: 94025
Tech Country: US
Tech Phone: +1.6505434800
Tech Phone Ext:
Tech Fax: +1.6505434800
Tech Fax Ext:
Tech Email: domain@fb.com
Name Server: A.NS.FACEBOOK.COM
Name Server: B.NS.FACEBOOK.COM
DNSSEC: unsigned
:
API request: contactEmail
API response: list of domainName
API request: list of nameServer
API response: list of domainName
Questions
• Continue to maintain a 29-node cluster running Ubuntu Trusty and ElasticSearch 1.6?
• Migrate to AWS ElasticSearch 7.1?
• Migrate to AWS Aurora Postgres?
• Migrate to ClickHouse?
don’t need full text search
does not support shard for scaling
update is slow
insert efficient for bulk insert only
no secondary index
no
Whois Data Pipeline
whois
ingester
whois
indexer
ClickHouse
ElasticSearch
Investigate
UI
API Server
29 nodes, 2 replicas
2 indices
12 TB
6 nodes, 2 replicas
3 tables, 1 materialized view
2 TB
download
download
1 index
2 tables, 1 MV
Domain Table
CREATE TABLE IF NOT EXISTS whois.domain_local(
domainName String,
contactEmail String,
RegistryData_rawText String,
WhoisRecord_rawText String)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{shard}/whois.domain_local', '{replica}')
PRIMARY KEY (domainName)
ORDER BY (domainName)
SETTINGS index_granularity = 512
CREATE TABLE IF NOT EXISTS whois.domain(
domainName String,
contactEmail String,
RegistryData_rawText String,
WhoisRecord_rawText String)
ENGINE = Distributed(whois_cluster, whois, domain_local, cityHash64(domainName))
48 golang writers write to 6 shards
cityhash.Hash64([]byte(domainName)) % numHosts
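The writer-side routing rule above (hash the domain name, take it modulo the number of hosts) can be sketched as below. This is an illustration under a stated assumption: Python's standard library has no CityHash64, so a 64-bit BLAKE2b digest stands in for it here. The resulting shard numbers therefore will not match what `cityHash64(domainName)` produces in ClickHouse; only the routing pattern is the same.

```python
import hashlib

NUM_SHARDS = 6  # from the slide: writers spread rows over 6 shards


def shard_for(domain_name, num_shards=NUM_SHARDS):
    """Pick a shard index for a domain name: hash64(name) % num_shards.

    Stand-in hash: BLAKE2b truncated to 8 bytes, NOT CityHash64.
    """
    digest = hashlib.blake2b(domain_name.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_shards


def route(domain_names, num_shards=NUM_SHARDS):
    """Group domain names into per-shard batches for bulk insert."""
    batches = {i: [] for i in range(num_shards)}
    for name in domain_names:
        batches[shard_for(name, num_shards)].append(name)
    return batches
```

Because the shard depends only on the domain name, every update for a given domain lands on the same shard, which is what lets a per-shard ReplacingMergeTree collapse duplicates.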
for merging,
ClickHouse selects the last inserted row,
or if version column exists,
selects the row with the max value in the version column
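The merge rule described above can be modelled in a few lines. This is a simplified sketch of the semantics only, not of how ClickHouse implements merges; `replacing_merge` and the `version` field are names introduced for illustration, and the whois tables on these slides do not declare a version column.

```python
def replacing_merge(rows, key, version=None):
    """Collapse rows that share the same sorting key.

    rows are given in insertion order. Without a version column the last
    inserted row wins; with one, the row with the highest version wins
    (ties go to the later insert).
    """
    merged = {}
    for row in rows:
        k = row[key]
        current = merged.get(k)
        if current is None or version is None or row[version] >= current[version]:
            merged[k] = row
    return list(merged.values())


# Hypothetical rows for illustration.
rows = [
    {"domainName": "example.com", "version": 2, "text": "newer record"},
    {"domainName": "example.com", "version": 1, "text": "older record, inserted later"},
]

last_inserted = replacing_merge(rows, key="domainName")
max_version = replacing_merge(rows, key="domainName", version="version")
```

In ClickHouse the collapse happens during background merges at an unspecified time, so a query may still see duplicates unless it reads with FINAL.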
Email Table
CREATE MATERIALIZED VIEW IF NOT EXISTS whois.email_mv_local(
contactEmail String,
domainName String)
ENGINE = AggregatingMergeTree
ORDER BY (contactEmail, domainName)
POPULATE
AS SELECT contactEmail, domainName
FROM whois.domain_local
WHERE contactEmail != ''
GROUP BY contactEmail, domainName
CREATE TABLE IF NOT EXISTS whois.email_mv(
contactEmail String,
domainName String)
ENGINE = Distributed(whois_cluster, whois, email_mv_local, cityHash64(contactEmail))
domain table is 150 GB
email table is 3 GB
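Conceptually, the materialized view is a reverse index from contact email to domain name, sorted so that all domains for one email sit next to each other. A minimal Python sketch of that idea, assuming rows are plain dictionaries; the `example.org` row is hypothetical and the other values come from the query results shown in this deck:

```python
import bisect


def build_email_index(domain_rows):
    """Distinct (contactEmail, domainName) pairs, sorted by email then domain.

    Rows with an empty contactEmail are skipped, as in the view's WHERE clause.
    """
    pairs = {
        (row["contactEmail"], row["domainName"])
        for row in domain_rows
        if row["contactEmail"] != ""
    }
    return sorted(pairs)


def domains_for(index, email):
    """Binary-search to the first pair for this email, then scan its run."""
    start = bisect.bisect_left(index, (email, ""))
    result = []
    for contact, domain in index[start:]:
        if contact != email:
            break
        result.append(domain)
    return result


domain_rows = [
    {"domainName": "what3app.com", "contactEmail": "domain@fb.com"},
    {"domainName": "facebook-hardware.com", "contactEmail": "domain@fb.com"},
    {"domainName": "example.org", "contactEmail": ""},  # hypothetical, no email
]
index = build_email_index(domain_rows)
```

The index stores only two short columns, which is consistent with the email table being far smaller than the domain table that holds the raw whois text.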
NS Table
CREATE TABLE IF NOT EXISTS whois.ns_local(
nameServer String,
domainName String)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{shard}/whois.ns_local', '{replica}')
PRIMARY KEY (nameServer, domainName)
ORDER BY (nameServer, domainName)
SETTINGS index_granularity = 512
CREATE TABLE IF NOT EXISTS whois.ns(
nameServer String,
domainName String)
ENGINE = Distributed(whois_cluster, whois, ns_local, cityHash64(nameServer))
name server table is 4 GB
Whois Queries
SELECT WhoisRecord_rawText FROM domain_local FINAL WHERE domainName = 'facebook.com'
SELECT WhoisRecord_rawText FROM domain FINAL WHERE domainName = 'facebook.com'
SELECT domainName FROM email_mv WHERE contactEmail = 'domain@fb.com'
domainName
buyfbfansnow.com
facebook-hardware.com
instagram-engineering.net
pokerface-book.com
what3app.com
SELECT * FROM ns WHERE nameServer LIKE '%.facebook.com'
nameServer domainName
ns1.facebook.com djgabeholm.com
ns2.facebook.com shellpriv.com
ns3.facebook.com arabfashioncompany.com
a.ns.facebook.com zuckerberg.com
b.ns.facebook.com zuckerberg.net
took 9 msec
postgres 7 msec
took 21 msec
data selected fully merged, slower
took 16 msec, 2549 rows
took 36 msec, 7075 rows
Will add TLD column, e.g. facebook.com
ORDER BY TLD, nameServer, domainName
expect query < 10 msec
Network Tunnels
Network Tunnels
• Network tunnels deliver the
branch office traffic to the
Cisco’s cloud edge where
Umbrella runs a number of
security functions.
• Firewall, web security, DNS
security.
Provisioned per
organization
Network Tunnels
DNS
security
web security
S3
Network
Tunnels UI
API Server
Tunnel
Visibility
Sensors
states
events
ClickHouse
download
network tunnels
3 nodes, 1 replica
firewall
script
CSV
Event Table
CREATE TABLE IF NOT EXISTS tunnel_viz.event_local(
OrgID UInt32,
TunnelID UInt32,
EventTime DateTime,
EventID String,
EventType String,
PeerID String,
PeerIP IPv4,
PeerPort UInt16,
Code LowCardinality(String),
Reason String)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventTime)
PRIMARY KEY (OrgID, TunnelID, EventTime)
ORDER BY (OrgID, TunnelID, EventTime)
TTL EventTime + toIntervalDay(120)
SETTINGS index_granularity = 8196
dictionary encoding, 10 unique codes
event table, 3.3m rows, 260 MB
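Dictionary encoding, which is what `LowCardinality(String)` applies to the `Code` column, stores each distinct string once and keeps a small integer per row. The sketch below shows the idea only; it is not ClickHouse's on-disk format, and `dictionary_encode` / `dictionary_decode` are illustrative names.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): distinct values in first-seen order, plus one index per row."""
    dictionary = []
    positions = {}
    codes = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        codes.append(positions[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the per-row indexes."""
    return [dictionary[c] for c in codes]


# A few event codes of the kind stored in the Code column.
column = ["PEER_AUTH_FAILED", "RETRANSMIT_SEND", "PEER_AUTH_FAILED", "PEER_AUTH_FAILED"]
dictionary, codes = dictionary_encode(column)
```

With only 10 distinct codes across millions of rows, each row needs a small index rather than a full string, and equality filters or GROUP BY can work on the indexes.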
Event Table
CREATE TABLE IF NOT EXISTS tunnel_viz.event(
OrgID UInt32,
TunnelID UInt32,
EventTime DateTime,
EventID String,
EventType String,
PeerID String,
PeerIP IPv4,
PeerPort UInt16,
Code LowCardinality(String),
Reason String)
ENGINE = Distributed(cdfw_cluster, tunnel_viz, event_local, murmurHash3_32(OrgID))
fairly even distribution for integer
Event Queries
SELECT Code, count() as c FROM event GROUP BY Code ORDER BY c DESC
┌─Code────────────────────┬───────c─┐
│ PROPOSAL_MISMATCH_CHILD │ 2164575 │
│ PEER_AUTH_FAILED        │  871937 │
│ RETRANSMIT_SEND         │  243559 │
│ RETRANSMIT_SEND_TIMEOUT │   42829 │
│ UNIQUE_REPLACE          │   26540 │
│ PARSE_ERROR_BODY        │    2717 │
│ CERT_REVOKED            │    1004 │
│ LOCAL_AUTH_FAILED       │     186 │
│ TS_MISMATCH             │       9 │
│ VIP_FAILURE             │       1 │
└─────────────────────────┴─────────┘
took 23 msec, 10 unique codes
Next
What we learned
• Cheap
• Fast
• Flexible
• Good compression
• Cluster isolation for multiple
stores
• Ad hoc analysis for authlog
using 200 GB storage
• Lower cost and acceptable
performance for whois
database.
• Share hardware for different
types of datastores.
ClickHouse Wish List
• Support Avro in the Kafka engine.
• Rebalance the cluster and copy data after a node fails or is added.
Questions?
Backup – Other Use Cases
Threat Library
Threat Library Data Pipeline
Web UI
API Server
S3
DNS query log
threat/attack
DNS query log
blocked domains
run job
blocked domains
Kubernetes
Cluster
threat/attack feed 1..n blocked domains
threat/attack
ClickHouse
Airflow
1 pod, 256 GB disk
Blocked Domains
CREATE TABLE IF NOT EXISTS attribution.blocked(
datetime DateTime,
domain String,
threat LowCardinality(String),
attack LowCardinality(String),
count UInt32)
ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMMDD(datetime)
PRIMARY KEY (datetime, domain, threat, attack)
ORDER BY (datetime, domain, threat, attack)
TTL datetime + toIntervalDay(30)
SETTINGS index_granularity = 8196
dictionary encoding, 37 threats, 118 attacks
ASN
Autonomous Systems
• IP to ASN
ASN Attribution
Domain → IP → ASN relationships
AS 701   AS 3462   AS 12271
1.168.6.17
domain1.com
100.2.65.157 104.162.93.136
ASN Data Pipeline
data
importer
ClickHouse
Aurora
Postgres
Web UI
API Server
1 write 1 read
download
download
CSV
multiple
tables
Postgres
WITH b as (
SELECT asn, cidr FROM delta_bgp_routes
WHERE (cidr >>= CAST('104.244.42.193' AS ip4r))
AND (period && DATERANGE(CURRENT_DATE - integer '2', CURRENT_DATE,'[]')))
SELECT a.asn, b.cidr, a.description, a.creation_date AS creationDate, a.ir
FROM delta_autonomous_systems AS a, b
WHERE (a.asn = b.asn)
AND (period && DATERANGE(CURRENT_DATE - integer '2', CURRENT_DATE,'[]'));
asn | cidr | description | creationdate | ir
-------+-----------------+----------------------------------+--------------+----
13414 | 104.244.42.0/24 | TWITTER - Twitter Inc., US 86400 | 2010-07-09 | 3
Took 52 ms
/etc/clickhouse-server/asn_dictionary.xml
<yandex>
<dictionary>
<name>asn_dict</name>
<layout>
<ip_trie />
</layout>
<structure>
<key>
<attribute>
<name>prefix</name>
<type>String</type>
</attribute>
</key>
<attribute>
<name>asn</name>
<type>UInt32</type>
<null_value />
</attribute>
<attribute>
<name>country</name>
<type>String</type>
<null_value>??</null_value>
</attribute>
<attribute>
<name>created_at</name>
<type>DateTime</type>
<null_value />
</attribute>
<attribute>
<name>registry</name>
<type>UInt32</type>
<null_value />
</attribute>
<attribute>
<name>description</name>
<type>String</type>
<null_value />
</attribute>
<attribute>
<name>datastr</name>
<type>String</type>
<null_value />
</attribute>
</structure>
<source>
<file>
<path>/opt/dictionaries/asnprefixes.csv</path>
<format>CSVWithNames</format>
</file>
</source>
<lifetime>300</lifetime>
</dictionary>
</yandex>
ASN Dictionary
• 1.5 minutes in Spark job to download and generate CSV
SELECT dictGetString('asn_dict', 'datastr', tuple(IPv4StringToNum('143.202.186.23')))
143.202.186.0/24 264076 BR 1445817600 4 BREM TECHNOLOGY LTDA - ME, BR
SELECT dictGetString('asn_dict', 'datastr', tuple(IPv6StringToNum('2800:5f0:800::1')))
40.0.0.0/19 4249 US 0789782400 3 LILLY-AS - Eli Lilly and Company, US
took 2 ms
took 2 ms
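An `ip_trie` dictionary answers "which stored prefix covers this address", preferring the most specific match. The sketch below shows that longest-prefix lookup with Python's `ipaddress` module. It uses a linear scan rather than a trie, so it illustrates the result, not the data structure or its speed. The two /24 prefixes and their ASNs come from the slides; the /16 entry with private-use ASN 64512 is hypothetical, added only to show that the more specific prefix wins.

```python
import ipaddress

# prefix -> ASN. The /16 row is hypothetical (64512 is a private-use ASN).
PREFIXES = [
    (ipaddress.ip_network("143.202.186.0/24"), 264076),
    (ipaddress.ip_network("104.244.42.0/24"), 13414),
    (ipaddress.ip_network("143.202.0.0/16"), 64512),
]


def lookup_asn(ip, table=PREFIXES):
    """Return the ASN of the longest prefix containing ip, or None if nothing matches."""
    addr = ipaddress.ip_address(ip)
    best = None
    for network, asn in table:
        if addr.version != network.version or addr not in network:
            continue
        if best is None or network.prefixlen > best[0].prefixlen:
            best = (network, asn)
    return None if best is None else best[1]
```

For example, `lookup_asn("143.202.186.23")` matches both the /16 and the /24 and returns the ASN attached to the /24.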
Unified Data Platform, by Pauline Yeung of Cisco Systems

More Related Content

PDF
cLoki: Like Loki but for ClickHouse
PPT
Introduction to SSH
PDF
ClickHouse Intro
PDF
MongoDB WiredTiger Internals
PDF
Adventures in Observability - Clickhouse and Instana
PDF
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
PDF
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ODP
MySQL HA with PaceMaker
cLoki: Like Loki but for ClickHouse
Introduction to SSH
ClickHouse Intro
MongoDB WiredTiger Internals
Adventures in Observability - Clickhouse and Instana
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
MySQL HA with PaceMaker

What's hot (20)

PDF
Altinity Quickstart for ClickHouse
PDF
Logical Replication in PostgreSQL - FLOSSUK 2016
PPTX
Apache Spark Architecture
PDF
kubernetes for beginners
PPT
Dremel: Interactive Analysis of Web-Scale Datasets
PDF
Useful Linux and Unix commands handbook
PDF
RocksDB Performance and Reliability Practices
PPT
Hadoop Map Reduce
PDF
Spectrum Scale Best Practices by Olaf Weiser
PDF
Innodb에서의 Purge 메커니즘 deep internal (by 이근오)
PPTX
memcached Distributed Cache
PDF
Log Structured Merge Tree
PDF
Kvm performance optimization for ubuntu
PDF
Object storage의 이해와 활용
PDF
Présentation de Apache Zookeeper
PDF
Distributed Deep Learning with Hadoop and TensorFlow
PDF
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
PDF
ClickHouse Keeper
PDF
Thinking Big - Big data: principes et architecture
Altinity Quickstart for ClickHouse
Logical Replication in PostgreSQL - FLOSSUK 2016
Apache Spark Architecture
kubernetes for beginners
Dremel: Interactive Analysis of Web-Scale Datasets
Useful Linux and Unix commands handbook
RocksDB Performance and Reliability Practices
Hadoop Map Reduce
Spectrum Scale Best Practices by Olaf Weiser
Innodb에서의 Purge 메커니즘 deep internal (by 이근오)
memcached Distributed Cache
Log Structured Merge Tree
Kvm performance optimization for ubuntu
Object storage의 이해와 활용
Présentation de Apache Zookeeper
Distributed Deep Learning with Hadoop and TensorFlow
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
ClickHouse Keeper
Thinking Big - Big data: principes et architecture
Ad

Similar to Unified Data Platform, by Pauline Yeung of Cisco Systems (20)

PDF
Inspec one tool to rule them all
PPTX
Building an Automated Behavioral Malware Analysis Environment using Free and ...
PPTX
Cloud-based Virtualization for Test Automation
PDF
Designing ISE for Scale & High Availability.pdf
PDF
27.2.12 lab interpret http and dns data to isolate threat actor
PDF
How Cisco Provides World-Class Technology Conference Experiences Using Automa...
PDF
Passive DNS Collection – Henry Stern, Cisco
PPTX
Introdução ao Data Warehouse Amazon Redshift
PDF
Good-cyber-hygiene-at-scale-and-speed
PDF
PyGotham 2014 Introduction to Profiling
PPTX
Hacker Halted 2014 - Why Botnet Takedowns Never Work, Unless It’s a SmackDown!
PPTX
The Boring Security Talk - Azure Global Bootcamp Melbourne 2019
PDF
Atelier Technique CISCO ACSS 2018
PDF
DNSSEC Tutorial, by Champika Wijayatunga [APNIC 38]
PPT
Predictable Big Data Performance in Real-time
PDF
20150909_network_security_lecture
PPTX
[PASS Summit 2016] Blazing Fast, Planet-Scale Customer Scenarios with Azure D...
PDF
#NSD15 - Attaques DDoS Internet et comment les arrêter
PDF
Deploying Secure Converged Wired, Wireless Campus
PPTX
Patterns and Packages in PostgreSQL for Privacy Preservation
Inspec one tool to rule them all
Building an Automated Behavioral Malware Analysis Environment using Free and ...
Cloud-based Virtualization for Test Automation
Designing ISE for Scale & High Availability.pdf
27.2.12 lab interpret http and dns data to isolate threat actor
How Cisco Provides World-Class Technology Conference Experiences Using Automa...
Passive DNS Collection – Henry Stern, Cisco
Introdução ao Data Warehouse Amazon Redshift
Good-cyber-hygiene-at-scale-and-speed
PyGotham 2014 Introduction to Profiling
Hacker Halted 2014 - Why Botnet Takedowns Never Work, Unless It’s a SmackDown!
The Boring Security Talk - Azure Global Bootcamp Melbourne 2019
Atelier Technique CISCO ACSS 2018
DNSSEC Tutorial, by Champika Wijayatunga [APNIC 38]
Predictable Big Data Performance in Real-time
20150909_network_security_lecture
[PASS Summit 2016] Blazing Fast, Planet-Scale Customer Scenarios with Azure D...
#NSD15 - Attaques DDoS Internet et comment les arrêter
Deploying Secure Converged Wired, Wireless Campus
Patterns and Packages in PostgreSQL for Privacy Preservation
Ad

More from Altinity Ltd (20)

PPTX
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
PDF
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
PPTX
Building an Analytic Extension to MySQL with ClickHouse and Open Source
PDF
Fun with ClickHouse Window Functions-2021-08-19.pdf
PDF
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
PDF
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
PDF
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
PDF
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
PDF
ClickHouse ReplacingMergeTree in Telecom Apps
PDF
Adventures with the ClickHouse ReplacingMergeTree Engine
PDF
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
PDF
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
PDF
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
PDF
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
PDF
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
PDF
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
PDF
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
PDF
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
PDF
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
PDF
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Fun with ClickHouse Window Functions-2021-08-19.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
ClickHouse ReplacingMergeTree in Telecom Apps
Adventures with the ClickHouse ReplacingMergeTree Engine
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf

Recently uploaded (20)

PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
Spectral efficient network and resource selection model in 5G networks
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PPTX
MYSQL Presentation for SQL database connectivity
PDF
cuic standard and advanced reporting.pdf
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PPTX
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
PDF
Network Security Unit 5.pdf for BCA BBA.
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
Bridging biosciences and deep learning for revolutionary discoveries: a compr...
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Encapsulation_ Review paper, used for researhc scholars
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
KodekX | Application Modernization Development
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPTX
Big Data Technologies - Introduction.pptx
Building Integrated photovoltaic BIPV_UPV.pdf
20250228 LYD VKU AI Blended-Learning.pptx
Review of recent advances in non-invasive hemoglobin estimation
Spectral efficient network and resource selection model in 5G networks
Understanding_Digital_Forensics_Presentation.pptx
MYSQL Presentation for SQL database connectivity
cuic standard and advanced reporting.pdf
Unlocking AI with Model Context Protocol (MCP)
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
Network Security Unit 5.pdf for BCA BBA.
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Bridging biosciences and deep learning for revolutionary discoveries: a compr...
Chapter 3 Spatial Domain Image Processing.pdf
Encapsulation_ Review paper, used for researhc scholars
The AUB Centre for AI in Media Proposal.docx
KodekX | Application Modernization Development
Advanced methodologies resolving dimensionality complications for autism neur...
Big Data Technologies - Introduction.pptx

Unified Data Platform, by Pauline Yeung of Cisco Systems

  • 1. Data Engineer Cisco Umbrella yeungp@cisco.com Unified Data Platform Pauline Yeung ClickHouse Meetup Dec 3, 2019
  • 2. © 2019 Cisco and/or its affiliates. All rights reserved. Cisco Confidential© 2019 Cisco and/or its affiliates. All rights reserved. Cisco Confidential Agenda 1 2 3 4 5 Problems Use Case: Authlog Use Case: Whois Records Use Case: Network Tunnels Next
  • 3. © 2019 Cisco and/or its affiliates. All rights reserved. Cisco Confidential© 2019 Cisco and/or its affiliates. All rights reserved. Cisco Confidential • Data Engineer at Cisco Umbrella, Investigate team • M. S. Computer Engineering, Santa Clara U • B. S. Electrical Engineering, U of Calgary $ whois Pauline
  • 4. © 2019 Cisco and/or its affiliates. All rights reserved. Cisco Confidential Problems
  • 5. © 2019 Cisco and/or its affiliates. All rights reserved. Cisco Confidential Investigate: the most powerful way to uncover threats Console API SIEM, TIP Key points Intelligence about domains, IPs, and malware across the internet Live graph of DNS requests and other contextual data Correlated against statistical models Discover and predict malicious domains and IPs Enrich security data with global intelligence domains, IPs, ASNs, file hashes
  • 6. Investigate Backend (diagram): Umbrella Investigate backed by Whois, ASN, IntelDB, passive DNS
      We want:
      • An easy, fast, and flexible platform for ad hoc analysis of authlog data, which is stored in passive DNS.
      • Higher throughput and lower cost for the Whois database.
      • Fast access to ASN data to enrich security data.
      • One datastore for multiple use cases, shared with other product teams.
  • 7. DNS Authoritative Log (authlog)
  • 8. Passive DNS
  • 9. Domain to IP Relationships (diagram): 12.4.0.4/32 linked to domain1.com (10 JAN 2019), domain2.com (11 JAN 2019), domain3.com (12 JAN 2019)
  • 10. AuthLog Examples
      #  owner               name                             datacenter  name_server
      1  tabiturient.ru.     tabiturient.ru.                  lon         ns4.nic.ru.
      2  thefacebook.com.    certs.thefacebook.com.           sea         b.ns.facebook.com.
      3  333az.net.          nbb4yd.333az.net.                yyz         ns2-09.azure-dns.net.
      4  dotnetwork2.co.za.  d1000253-146.dotnetwork2.co.za.  jnb         ns3.dotnetworkdns.co.za.

      #  name_server_ip                   rr                                    ttl   type  timestamp
      1  194.226.96.8                     195.24.68.22                          3600  A     2019-12-02 10:56:47
      2  2a03:2880:ffff:c:face:b00c:0:35  2620:10d:c0a1:10:0:0:0:35             600   AAAA  2019-12-02 12:40:46
      3  2620:1ec:8ec::9                  ns1-07.azure-dns.com.                 20    NS    2019-12-02 10:34:15
      4  41.223.172.166                   mail.d1000253-146.dotnetwork2.co.za.  3600  MX    2019-11-30 03:05:17
  • 11. AuthLog Data Pipeline (diagram)
      Components: resolvers (32 data centers), authlog producer, authlog clickhouse ingester, S3 archiver, authlog HBase ingester, passive DNS, API Server, Investigate UI
      Labels: 3 days authlog; 6 nodes, 1 replica, r4.4xlarge (16 vCPU, 122 GB, 2 TB disk); 32 nodes, i3.2xlarge (8 vCPU, 61 GB); authlog parquet; 120b requests/day; 4b authlog/day
  • 12. One Set of Questions
      • What’s the increase in disk usage for passive DNS per day?
      • What type of traffic contributes the most to the increase in disk usage?
  • 13. AuthLog for Past 3 Days
      CREATE TABLE IF NOT EXISTS authlog.alog_local (
          owner String,
          name String,
          datacenter String,
          name_server String,
          name_server_ip String,
          rr String,
          ttl Int32,
          type String,
          timestamp DateTime)
      ENGINE = MergeTree()
      PARTITION BY toYYYYMMDD(timestamp)
      ORDER BY (name, timestamp)
      TTL timestamp + toIntervalDay(3)
      SETTINGS index_granularity = 8192
      Notes: 48 golang workers write to 6 shards; ingest 1.2m rows per second; ClickHouse kafka engine does not support avro.
  • 14. AuthLog for Past 3 Days
      CREATE TABLE IF NOT EXISTS authlog.alog (
          owner String,
          name String,
          datacenter String,
          name_server String,
          name_server_ip String,
          rr String,
          ttl Int32,
          type String,
          timestamp DateTime)
      ENGINE = Distributed(log_cluster, authlog, alog_local, cityHash64(name))
      Notes: access all shards; 4b rows per day; 200 GB for 3 days.
  • 15. Payload for Domains in 2 Consecutive Days
      CREATE TABLE IF NOT EXISTS authlog.a
      ENGINE = MergeTree() ORDER BY name AS
      SELECT name, type, sum(length(name) + length(rr)) AS payload
      FROM authlog.alog
      WHERE timestamp >= toDateTime('2019-11-28 16:00:00')
        AND timestamp <= toDateTime('2019-11-28 19:59:59')
      GROUP BY name, type

      CREATE TABLE IF NOT EXISTS authlog.b
      ENGINE = MergeTree() ORDER BY name AS
      SELECT name, type, sum(length(name) + length(rr)) AS payload
      FROM authlog.alog
      WHERE timestamp >= toDateTime('2019-11-29 16:00:00')
        AND timestamp <= toDateTime('2019-11-29 19:59:59')
      GROUP BY name, type

      Annotations (in slide order): took 2 minutes, 148m rows, 3.0 GB; took 2 minutes, 166m rows, 3.4 GB; 4 hours, ¼ daily authlog.
  • 16. Payload for Domains Only in Day 2
      CREATE TABLE IF NOT EXISTS authlog.b_only
      ENGINE = MergeTree() ORDER BY (name) AS
      SELECT b.name AS name, b.type AS type, sum(b.payload) AS payload
      FROM a RIGHT JOIN b ON a.name = b.name
      WHERE a.name LIKE ''
      GROUP BY b.name, b.type
      Venn diagram: a = 4 hours Nov 28, b = 4 hours Nov 29.
      Took 5 minutes, 108m rows, 2.5 GB.
      users.xml:
          max_memory_usage = 96GB
          max_bytes_before_external_group_by = 48GB
          max_bytes_before_external_sort = 48GB
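The two-step analysis on slides 15 and 16 (sum a payload per name and type, then keep only the names that appear on day 2 but not on day 1) can be sketched in plain Python. This is a minimal illustration of the logic, not the deck's implementation: the function names and the sample rows are hypothetical, and payload is measured as UTF-8 byte length on the assumption that this mirrors `length()` on a String column.

```python
from collections import defaultdict


def payload_by_name_type(rows):
    """Sum len(name) + len(rr) per (name, type), like the GROUP BY on slide 15.

    rows is an iterable of (name, type, rr) tuples.
    """
    totals = defaultdict(int)
    for name, rtype, rr in rows:
        totals[(name, rtype)] += len(name.encode("utf-8")) + len(rr.encode("utf-8"))
    return dict(totals)


def only_in_second(a, b):
    """Keep entries of b whose name never appears in a (an anti-join on name)."""
    names_in_a = {name for name, _ in a}
    return {key: payload for key, payload in b.items() if key[0] not in names_in_a}


# Hypothetical sample rows, not taken from the talk.
day1 = payload_by_name_type([
    ("old.example.", "A", "192.0.2.1"),
])
day2 = payload_by_name_type([
    ("old.example.", "A", "192.0.2.1"),
    ("new.example.", "A", "192.0.2.2"),
    ("new.example.", "A", "192.0.2.3"),
    ("new.example.", "TXT", "v=spf1"),
])
b_only = only_in_second(day1, day2)
```

In the SQL version the same filter is expressed as a RIGHT JOIN followed by a test for an empty `a.name`, which relies on unmatched rows being filled with default (empty) values rather than NULL.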
  • 17. Second Level Domains with Highest Payload
      SELECT arrayStringConcat([splitByString('.', name)[-3], '.', splitByString('.', name)[-2], '.']) AS pname,
             sum(payload) / 1024 / 1024 AS payload_MB
      FROM b_only
      GROUP BY pname
      ORDER BY payload_MB DESC
      LIMIT 100

      pname               payload_MB
      cloudfront.net.     878.2994289398193
      office.com.         719.9946641921997
      clienttons.com.     608.300389289856
      cnr.io.             473.1693649291992
      akamaihd.net.       395.0745334625244
      cedexis-radar.net.  364.29007720947266
      footprintdns.com.   265.04366874694824
      gstatic.com.        250.41933727264404
      squarespace.com.    151.24806880950928
      forter.com.         151.08679962158203

      Example mapping: wacodenver-com.mail.protection.outlook.com. to outlook.com.
      Took 6 seconds.
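The `splitByString` expression on slide 17 takes the last two labels of a fully qualified name (the trailing dot leaves an empty final element, so the labels sit at indexes -3 and -2). A rough Python equivalent is sketched below. The helper names are hypothetical, the input payloads are made-up values, and treating missing labels as empty strings is an assumption about how short names should be handled.

```python
from collections import defaultdict


def second_level_domain(name: str) -> str:
    """Return the last two labels of a fully qualified name, with trailing dot."""
    parts = name.split(".")  # "a.b.com." -> ["a", "b", "com", ""]
    second = parts[-3] if len(parts) >= 3 else ""
    top = parts[-2] if len(parts) >= 2 else ""
    return f"{second}.{top}."


def top_domains(payload_by_name, limit=100):
    """Sum payload per second-level domain and return the largest first."""
    totals = defaultdict(int)
    for name, payload in payload_by_name.items():
        totals[second_level_domain(name)] += payload
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]


# Hypothetical payloads in bytes; only the outlook.com name comes from the slide.
ranking = top_domains({
    "d1.cloudfront.net.": 300,
    "d2.cloudfront.net.": 200,
    "wacodenver-com.mail.protection.outlook.com.": 100,
})
```

One consequence of counting labels rather than consulting a public-suffix list: a name such as d1000253-146.dotnetwork2.co.za. is grouped under co.za.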
  • 18. Resource Type with Highest Payload
      SELECT type, sum(payload) / 1024 / 1024 / 1024 AS payload_GB
      FROM b_only
      GROUP BY type
      ORDER BY payload_GB DESC

      type    payload_GB
      A       4.34150860644877
      CNAME   3.289442714303732
      RRSIG   1.5495505537837744
      DNSKEY  0.6393735473975539
      TXT     0.5898861000314355

      SELECT sum(payload) / 1024 / 1024 / 1024 AS payload_GB FROM b_only

      payload_GB
      11.225993978790939

      Annotations (in slide order): took 483 msec; took 131 msec.
  • 19. Databricks Spark
      • 1 day authlog, ~4b rows
      • Process 4x authlog
      • Took 9 minutes

      pname               payload_GB
      office.com.         3.812719924375415
      cloudfront.net.     3.3973056096583605
      cnr.io.             2.608651074580848
      clienttons.com.     2.2882667966187
      cedexis-radar.net.  1.9318219376727939

      type   payload_GB
      A      16.849326515570283
      CNAME  14.351725150831044
      RRSIG  3.1499833753332496
      TXT    3.047112719155848
      NS     1.328178352676332

      payload_GB
      41.024286944419146

      Annotations (in slide order): took 5 seconds; took 1.3 seconds; took 490 msec.
  • 20. Whois Record Data
  • 21. WHOIS Record Data
      • Who registered the domain
      • Contact information used
      • When/where registered
      • Expiration date
      • Historical data
      • Correlations with other malicious domains
      See relationships between attackers’ infrastructure
  • 22. $ whois facebook.com
      :
      Domain Name: FACEBOOK.COM
      Registry Domain ID: 2320948_DOMAIN_COM-VRSN
      Registrar WHOIS Server: whois.registrarsafe.com
      Registrar URL: https://www.registrarsafe.com
      Updated Date: 2019-10-17T18:52:06Z
      Creation Date: 1997-03-29T05:00:00Z
      Registrar Registration Expiration Date: 2028-03-30T04:00:00Z
      Registrar: RegistrarSafe, LLC
      Registrar IANA ID: 3237
      Registrar Abuse Contact Email: abusecomplaints@registrarsafe.com
      Registrar Abuse Contact Phone: +1.6503087004
      Domain Status: clientDeleteProhibited https://www.icann.org/epp#clientDeleteProhibited
      Domain Status: clientTransferProhibited https://www.icann.org/epp#clientTransferProhibited
      Domain Status: serverDeleteProhibited https://www.icann.org/epp#serverDeleteProhibited
      Domain Status: serverTransferProhibited https://www.icann.org/epp#serverTransferProhibited
      Domain Status: clientUpdateProhibited https://www.icann.org/epp#clientUpdateProhibited
      Domain Status: serverUpdateProhibited https://www.icann.org/epp#serverUpdateProhibited

      API request: domainName
      API response: WhoisRecord_rawText
  • 23. Registry Registrant ID:
      Registrant Name: Domain Admin
      Registrant Organization: Facebook, Inc.
      Registrant Street: 1601 Willow Rd
      Registrant City: Menlo Park
      Registrant State/Province: CA
      Registrant Postal Code: 94025
      Registrant Country: US
      Registrant Phone: +1.6505434800
      Registrant Phone Ext:
      Registrant Fax: +1.6505434800
      Registrant Fax Ext:
      Registrant Email: domain@fb.com
      Registry Admin ID:
      Admin Name: Domain Admin
      Admin Organization: Facebook, Inc.
      Admin Street: 1601 Willow Rd
      Admin City: Menlo Park
      Admin State/Province: CA
      Admin Postal Code: 94025
      Admin Country: US
      Admin Phone: +1.6505434800
      Admin Phone Ext:
      Admin Fax: +1.6505434800
      Admin Fax Ext:
      Admin Email: domain@fb.com
      Tech Name: Domain Admin
      Tech Organization: Facebook, Inc.
      Tech Street: 1601 Willow Rd
      Tech City: Menlo Park
      Tech State/Province: CA
      Tech Postal Code: 94025
      Tech Country: US
      Tech Phone: +1.6505434800
      Tech Phone Ext:
      Tech Fax: +1.6505434800
      Tech Fax Ext:
      Tech Email: domain@fb.com
      Name Server: A.NS.FACEBOOK.COM
      Name Server: B.NS.FACEBOOK.COM
      DNSSEC: unsigned
      :

      API request: contactEmail
      API response: list of domainName
      API request: list of nameServer
      API response: list of domainName
  • 24. Questions
      • Continue to maintain a 29-node cluster running Ubuntu Trusty and ElasticSearch 1.6?
      • Migrate to AWS ElasticSearch 7.1?
      • Migrate to AWS Aurora Postgres?
      • Migrate to ClickHouse?
      Callouts on the slide: don’t need full text search; does not support shard for scaling; update is slow; insert efficient for bulk insert only; no secondary index; no
  • 25. Whois Data Pipeline (diagram)
      Components: whois ingester, whois indexer, ElasticSearch, ClickHouse, API Server, Investigate UI
      ElasticSearch: 29 nodes, 2 replicas, 2 indices, 12 TB
      ClickHouse: 6 nodes, 2 replicas, 3 tables, 1 materialized view, 2 TB
      Edge labels: download; download; 1 index; 2 tables, 1 MV
  • 26. Domain Table
      CREATE TABLE IF NOT EXISTS whois.domain_local (
          domainName String,
          contactEmail String,
          RegistryData_rawText String,
          WhoisRecord_rawText String)
      ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{shard}/whois.domain_local', '{replica}')
      PRIMARY KEY (domainName)
      ORDER BY (domainName)
      SETTINGS index_granularity = 512

      CREATE TABLE IF NOT EXISTS whois.domain (
          domainName String,
          contactEmail String,
          RegistryData_rawText String,
          WhoisRecord_rawText String)
      ENGINE = Distributed(whois_cluster, whois, domain_local, cityHash64(domainName))

      Notes:
      • 48 golang writers write to 6 shards, using cityhash.Hash64([]byte(domainName)) % numHosts
      • For merging, ClickHouse selects the last inserted row or, if a version column exists, the row with the max value in the version column.
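The two notes on slide 26 (writers pick a shard as hash modulo host count, and replacing merges keep one row per key) can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: `shard_for` and `replacing_merge` are hypothetical names, and `hashlib.blake2b` is only a stand-in for CityHash64 (which is not in the Python standard library), so the shard numbers it produces will not match what ClickHouse or the Go writers compute.

```python
import hashlib


def shard_for(domain_name: str, num_hosts: int) -> int:
    """Pick a shard as hash(domain) % num_hosts.

    blake2b is a stand-in hash; the slide uses cityhash.Hash64 in Go.
    """
    digest = hashlib.blake2b(domain_name.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_hosts


def replacing_merge(rows, version_index=None):
    """Collapse rows that share a key (first tuple element) to a single row.

    Without a version column the last inserted row wins; with one, the row
    holding the highest version wins, and ties go to the later insert.
    """
    kept = {}
    for row in rows:
        key = row[0]
        current = kept.get(key)
        if (current is None or version_index is None
                or row[version_index] >= current[version_index]):
            kept[key] = row
    return list(kept.values())


shard = shard_for("facebook.com", 6)
latest = replacing_merge([("a.com", "old"), ("b.com", "x"), ("a.com", "new")])
versioned = replacing_merge([("a.com", "v2", 2), ("a.com", "v1", 1)], version_index=2)
```

Hashing on the same key that the table is ordered by means every version of a domain lands on the same shard, which is what allows the duplicates to be collapsed locally.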
  • 27. Email Table
      CREATE MATERIALIZED VIEW IF NOT EXISTS whois.email_mv_local (
          contactEmail String,
          domainName String)
      ENGINE = AggregatingMergeTree
      ORDER BY (contactEmail, domainName)
      POPULATE AS
      SELECT contactEmail, domainName
      FROM db.domain_local
      WHERE contactEmail != ''
      GROUP BY contactEmail, domainName

      CREATE TABLE IF NOT EXISTS whois.email_mv (
          contactEmail String,
          domainName String)
      ENGINE = Distributed(whois_cluster, db, email_mv_local, cityHash64(contactEmail))

      Notes: domain table is 150 GB; email table is 3 GB.
  • 28. NS Table
      CREATE TABLE IF NOT EXISTS whois.ns_local (
          nameServer String,
          domainName String)
      ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{shard}/whois.ns_local', '{replica}')
      PRIMARY KEY (nameServer, domainName)
      ORDER BY (nameServer, domainName)
      SETTINGS index_granularity = 512

      CREATE TABLE IF NOT EXISTS whois.ns (
          nameServer String,
          domainName String)
      ENGINE = Distributed(whois_cluster, whois, ns_local, cityHash64(nameServer))

      Note: name server table is 4 GB.
  • 29. Whois Queries
      SELECT WhoisRecord_rawText FROM domain_local FINAL WHERE domainName = 'facebook.com'

      SELECT WhoisRecord_rawText FROM domain FINAL WHERE domainName = 'facebook.com'

      SELECT domainName FROM email_mv WHERE contactEmail = 'domain@fb.com'

      domainName
      buyfbfansnow.com
      facebook-hardware.com
      instagram-engineering.net
      pokerface-book.com
      what3app.com

      SELECT * FROM ns WHERE nameServer LIKE '%.facebook.com'

      nameServer         domainName
      ns1.facebook.com   djgabeholm.com
      ns2.facebook.com   shellpriv.com
      ns3.facebook.com   arabfashioncompany.com
      a.ns.facebook.com  zuckerberg.com
      b.ns.facebook.com  zuckerberg.net

      Annotations (in slide order): took 9 msec; postgres 7 msec; took 21 msec; data selected fully merged, slower; took 16 msec, 2549 rows; took 36 msec, 7075 rows.
      Will add TLD column, e.g. facebook.com; ORDER BY TLD, nameserver, domainName; expect query < 10 msec.
  • 30. Network Tunnels
  • 31. Network Tunnels
      • Network tunnels deliver branch office traffic to Cisco’s cloud edge, where Umbrella runs a number of security functions.
      • Firewall, web security, DNS security. Provisioned per organization.
  • 32. Network Tunnels (pipeline diagram)
      Components: network tunnels, firewall, web security, DNS security, Tunnel Visibility Sensors, S3, script, ClickHouse, API Server, Network Tunnels UI
      Labels: states; events; download; CSV; 3 nodes, 1 replica
  • 33. Event Table
      CREATE TABLE IF NOT EXISTS tunnel_viz.event_local (
          OrgID UInt32,
          TunnelID UInt32,
          EventTime DateTime,
          EventID String,
          EventType String,
          PeerID String,
          PeerIP IPv4,
          PeerPort UInt16,
          Code LowCardinality(String),
          Reason String)
      ENGINE = MergeTree()
      PARTITION BY toYYYYMM(EventTime)
      PRIMARY KEY (OrgID, TunnelID, EventTime)
      ORDER BY (OrgID, TunnelID, EventTime)
      TTL EventTime + toIntervalDay(120)
      SETTINGS index_granularity = 8196
      Notes: LowCardinality gives dictionary encoding, 10 unique codes; event table, 3.3m rows, 260 MB.
  • 34. Event Table
      CREATE TABLE IF NOT EXISTS tunnel_viz.event (
          OrgID UInt32,
          TunnelID UInt32,
          EventTime DateTime,
          EventID String,
          EventType String,
          PeerID String,
          PeerIP IPv4,
          PeerPort UInt16,
          Code LowCardinality(String),
          Reason String)
      ENGINE = Distributed(cdfw_cluster, tunnel_viz, event_local, murmurHash3_32(OrgID))
      Note: fairly even distribution for integer keys.
  • 35. Event Queries
      SELECT Code, count() AS c FROM event GROUP BY Code ORDER BY c DESC

      ┌─Code────────────────────┬───────c─┐
      │ PROPOSAL_MISMATCH_CHILD │ 2164575 │
      │ PEER_AUTH_FAILED        │  871937 │
      │ RETRANSMIT_SEND         │  243559 │
      │ RETRANSMIT_SEND_TIMEOUT │   42829 │
      │ UNIQUE_REPLACE          │   26540 │
      │ PARSE_ERROR_BODY        │    2717 │
      │ CERT_REVOKED            │    1004 │
      │ LOCAL_AUTH_FAILED       │     186 │
      │ TS_MISMATCH             │       9 │
      │ VIP_FAILURE             │       1 │
      └─────────────────────────┴─────────┘

      Took 23 msec, 10 unique codes.
  • 36. Next
  • 37. What we learned
      • Cheap
      • Fast
      • Flexible
      • Good compression
      • Cluster isolation for multiple stores
      • Ad hoc analysis for authlog using 200 GB storage
      • Lower cost and acceptable performance for the whois database
      • Share hardware for different types of datastores
  • 38. ClickHouse Wish List
      • Support avro in kafka engine.
      • Balance cluster and copy data after a failed or added node.
  • 39. Questions?
  • 40. Backup – Other Use Cases
  • 41. Threat Library
  • 44. Threat Library Data Pipeline (diagram)
      Components: S3, Kubernetes Cluster, Airflow, ClickHouse, API Server, Web UI
      Labels: DNS query log; threat/attack; blocked domains; run job; threat/attack feed 1..n; 1 pod, 256 GB disk
  • 45. Blocked Domains
      CREATE TABLE IF NOT EXISTS attribution.blocked (
          datetime DateTime,
          domain String,
          threat LowCardinality(String),
          attack LowCardinality(String),
          count UInt32)
      ENGINE = ReplacingMergeTree()
      PARTITION BY toYYYYMMDD(datetime)
      PRIMARY KEY (datetime, domain, threat, attack)
      ORDER BY (datetime, domain, threat, attack)
      TTL datetime + toIntervalDay(30)
      SETTINGS index_granularity = 8196
      Note: dictionary encoding, 37 threats, 118 attacks.
  • 46. ASN
  • 47. Autonomous Systems
      • IP to ASN
      ASN Attribution
  • 48. Domain → IP → ASN relationships (diagram)
      Labels: domain1.com; 1.168.6.17, 100.2.65.157, 104.162.93.136; AS 701, AS 3462, AS 12271
  • 49. ASN Data Pipeline (diagram)
      Components: data importer, Aurora Postgres, ClickHouse, API Server, Web UI
      Labels: 1 write, 1 read; download; download; CSV; multiple tables
  • 50. Postgres
      WITH b as (
          SELECT asn, cidr
          FROM delta_bgp_routes
          WHERE (cidr >>= CAST('104.244.42.193' AS ip4r))
            AND (period && DATERANGE(CURRENT_DATE - integer '2', CURRENT_DATE, '[]')))
      SELECT a.asn, b.cidr, a.description, a.creation_date AS creationDate, a.ir
      FROM delta_autonomous_systems AS a, b
      WHERE (a.asn = b.asn)
        AND (period && DATERANGE(CURRENT_DATE - integer '2', CURRENT_DATE, '[]'));

        asn  | cidr            | description                      | creationdate | ir
      -------+-----------------+----------------------------------+--------------+----
       13414 | 104.244.42.0/24 | TWITTER - Twitter Inc., US 86400 | 2010-07-09   | 3

      Took 52 ms
  • 51. /etc/clickhouse-server/asn_dictionary.xml
      <yandex>
        <dictionary>
          <name>asn_dict</name>
          <layout>
            <ip_trie />
          </layout>
          <structure>
            <key>
              <attribute>
                <name>prefix</name>
                <type>String</type>
              </attribute>
            </key>
            <attribute>
              <name>asn</name>
              <type>UInt32</type>
              <null_value />
            </attribute>
            <attribute>
              <name>country</name>
              <type>String</type>
              <null_value>??</null_value>
            </attribute>
            <attribute>
              <name>created_at</name>
              <type>DateTime</type>
              <null_value />
            </attribute>
            <attribute>
              <name>registry</name>
              <type>UInt32</type>
              <null_value />
            </attribute>
            <attribute>
              <name>description</name>
              <type>String</type>
              <null_value />
            </attribute>
            <attribute>
              <name>datastr</name>
              <type>String</type>
              <null_value />
            </attribute>
          </structure>
          <source>
            <file>
              <path>/opt/dictionaries/asnprefixes.csv</path>
              <format>CSVWithNames</format>
            </file>
          </source>
          <lifetime>300</lifetime>
        </dictionary>
      </yandex>
  • 52. ASN Dictionary
      • 1.5 minutes in Spark job to download and generate CSV

      SELECT dictGetString('asn_dict', 'datastr', tuple(IPv4StringToNum('143.202.186.23')))
      143.202.186.0/24 264076 BR 1445817600 4 BREM TECHNOLOGY LTDA - ME, BR
      took 2 ms

      SELECT dictGetString('asn_dict', 'datastr', tuple(IPv6StringToNum('2800:5f0:800::1')))
      40.0.0.0/19 4249 US 0789782400 3 LILLY-AS - Eli Lilly and Company, US
      took 2 ms
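The `ip_trie` dictionary on slides 51 and 52 resolves an address to the most specific prefix that contains it. The sketch below shows that longest-prefix-match idea using Python's standard `ipaddress` module. It is an illustration only: it scans linearly instead of building a trie, `longest_prefix_match` is a hypothetical name, and the /16 entry with private-use ASN 64512 is invented for the example (only the 143.202.186.0/24 to 264076 pair comes from the slide).

```python
import ipaddress


def longest_prefix_match(prefixes, address):
    """Return (network, value) for the most specific prefix containing address.

    prefixes maps CIDR strings to values; returns None if nothing matches.
    A linear scan is used for clarity; a real ip_trie walks a prefix tree.
    """
    ip = ipaddress.ip_address(address)
    best = None
    for cidr, value in prefixes.items():
        net = ipaddress.ip_network(cidr)
        if net.version != ip.version or ip not in net:
            continue
        if best is None or net.prefixlen > best[0].prefixlen:
            best = (net, value)
    return best


asn_prefixes = {
    "143.202.0.0/16": 64512,     # hypothetical covering prefix, private-use ASN
    "143.202.186.0/24": 264076,  # prefix and ASN shown on the slide
}
specific = longest_prefix_match(asn_prefixes, "143.202.186.23")
covering = longest_prefix_match(asn_prefixes, "143.202.1.1")
missing = longest_prefix_match(asn_prefixes, "198.51.100.1")
```

Because the dictionary is held in memory and the lookup depends only on prefix length, this kind of query stays in the low-millisecond range reported on the slide, compared with the range-operator join in the Postgres version.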