Exploring the Enron Email Dataset with Kiji and Hive
-- Map the Kiji table of Enron emails to a Hive external table.
-- Each column is exposed as a struct of the cell timestamp and its value.
CREATE EXTERNAL TABLE emails (
  mid STRUCT<ts: TIMESTAMP, value: STRING>,
  dateLong STRUCT<ts: TIMESTAMP, value: BIGINT>,
  fromStr STRUCT<ts: TIMESTAMP, value: STRING>,
  toStr STRUCT<ts: TIMESTAMP, value: STRING>,
  subject STRUCT<ts: TIMESTAMP, value: STRING>,
  body STRUCT<ts: TIMESTAMP, value: STRING>
) STORED BY 'org.kiji.hive.KijiTableStorageHandler'
WITH SERDEPROPERTIES (
  'kiji.columns' = 'info:mid[0],info:date[0],info:from[0],info:to[0],info:subject[0],info:body[0]'
) TBLPROPERTIES (
  'kiji.table.uri' = 'kiji://.env/enron_email/emails'
);

-- Top 10 senders by number of emails sent.
SELECT
  fromStr.value AS fromStr,
  count(1) AS count
FROM emails
GROUP BY fromStr.value
ORDER BY count DESC
LIMIT 10;
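
-- Top 10 (sender, recipient) pairs. The comma-separated "to" field is
-- split and exploded so that each recipient gets its own row.
SELECT
  fromStr.value AS fromStr,
  trim(splitToStr) AS toStr,
  count(1) AS count
FROM emails
LATERAL VIEW
  explode(split(toStr.value, ',')) tos AS splitToStr
GROUP BY fromStr.value, trim(splitToStr)
ORDER BY count DESC
LIMIT 10;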
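
The split-and-explode step in the query above can be sketched outside Hive. The following Python is only an illustration of the counting logic, not part of the original deck: the function name `top_pairs` and the sample addresses are hypothetical, and Python's `strip()` removes all surrounding whitespace, whereas Hive's `trim()` removes spaces.

```python
from collections import Counter


def top_pairs(emails, limit=10):
    """Count (sender, recipient) pairs, one per comma-separated recipient.

    `emails` is an iterable of (from_address, to_field) tuples, where
    to_field is a comma-separated string of recipients.
    """
    counts = Counter()
    for sender, to_field in emails:
        # Equivalent in spirit to explode(split(toStr.value, ',')):
        # every recipient in the "to" field produces its own row.
        for recipient in to_field.split(","):
            counts[(sender, recipient.strip())] += 1
    # Equivalent in spirit to ORDER BY count DESC LIMIT n.
    return counts.most_common(limit)


# Hypothetical sample data, not taken from the Enron dataset.
sample = [
    ("a@enron.com", "b@enron.com, c@enron.com"),
    ("a@enron.com", "b@enron.com"),
    ("d@enron.com", "a@enron.com"),
]
pairs = top_pairs(sample)
```

Here the first email contributes two pairs, so the pair (a@enron.com, b@enron.com) is counted twice across the sample.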
[Diagram: Emails Table, Sentiment, User Emails, Producer]
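-- Weekly average sentiment of emails sent from enron.com addresses.
-- Assumes a sentiment column is available on the Hive table; it is not
-- declared in the CREATE TABLE statement shown earlier.
SELECT
  ((year(datelong.ts)-1999)*52+weekofyear(datelong.ts)) AS weeknum,
  avg(sentiment.value) AS avgsentiment,
  stddev(sentiment.value) AS stddevsentiment,
  count(1) AS nummessages
FROM emails
WHERE regexp_replace(fromStr.value,".*@","")=="enron.com"
GROUP BY ((year(datelong.ts)-1999)*52+weekofyear(datelong.ts));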
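
The week-number expression used for grouping can be checked by hand with a small sketch. This Python is an illustration only: `week_bucket` is a hypothetical helper, and it assumes that Hive's `weekofyear` follows ISO 8601 week numbering, which is what `date.isocalendar()` returns.

```python
from datetime import date


def week_bucket(d, base_year=1999):
    """Mirror (year(ts) - 1999) * 52 + weekofyear(ts) for a calendar date.

    Assumption: weekofyear corresponds to the ISO 8601 week number.
    """
    iso_week = d.isocalendar()[1]
    return (d.year - base_year) * 52 + iso_week


mid_october_2001 = week_bucket(date(2001, 10, 15))
new_year_2000 = week_bucket(date(2000, 1, 1))
christmas_2000 = week_bucket(date(2000, 12, 25))
```

Under that assumption, dates in the first days of January can fall in ISO week 52 or 53 of the previous year while still carrying the new calendar year, so they land in the same bucket as late-December dates of that new year. In this sketch `new_year_2000` and `christmas_2000` are equal, which is worth keeping in mind when reading weekly charts near year boundaries.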
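
-- Total sentiment per word for emails sent from enron.com addresses,
-- most negative words first. sentences() returns an array of sentences,
-- each an array of words; [0] selects the words of the first sentence.
SELECT
  lword AS word,
  sum(sentiment) AS totalsentiment
FROM (
  SELECT
    mid.value AS mid,
    lower(word) AS lword,
    sentiment.value AS sentiment
  FROM emails
  LATERAL VIEW explode(sentences(body.value)[0]) wds AS word
  WHERE regexp_replace(fromStr.value,".*@","")=="enron.com"
) subquery
GROUP BY lword
ORDER BY totalsentiment ASC;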