Elasticsearch
in Hatena Bookmark
Shunsuke KOZAWA
About Me
● Shunsuke KOZAWA
○ Hatena id: skozawa
○ Twitter: @5kozawa
● 2007 - 2012
○ Research: Natural Language Processing
○ Ph.D. in Information Science
● 2012 -
○ Hatena Inc.
■ Hatena Bookmark
■ Ad-tech
Hatena Bookmark
Social Bookmark Service
Search Engine History in Hatena Bookmark
2005 - 2007
MySQL LIKE
2008 - 2012
Sedue (by Preferred Infrastructure)
2012 - 2014/06
Solr
2014/06 -
Elasticsearch
ref. http://bookmark.hatenastaff.com/entry/2014/06/27/180000
System Architecture
Mapping (partial) of Hatena Bookmark
{ "entry": {
  "properties": {
    "url": { "type": "string" },
    "title": { "type": "string" },
    "content": { "type": "string" },
    "count": { "type": "integer" },
    "created": { "type": "date" },
    "bookmark": {
      …
    }
  }
} }

"bookmark": {
  "type": "nested",
  "properties": {
    "user": { "type": "string" },
    "tag": { "type": "string" },
    "comment": { "type": "string" },
    "created": { "type": "date" }
  }
}
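A minimal Python sketch of how the nested `bookmark` field might be queried for Tag Search. The function names and the choice of a `match` clause on `bookmark.tag` are assumptions for illustration; the slides only show the mapping. The `string` type belongs to the Elasticsearch versions of that period; later versions split it into `text` and `keyword`.

```python
def build_mapping():
    """Return the partial mapping from the slide as a Python dict."""
    bookmark = {
        "type": "nested",
        "properties": {
            "user": {"type": "string"},
            "tag": {"type": "string"},
            "comment": {"type": "string"},
            "created": {"type": "date"},
        },
    }
    return {
        "entry": {
            "properties": {
                "url": {"type": "string"},
                "title": {"type": "string"},
                "content": {"type": "string"},
                "count": {"type": "integer"},
                "created": {"type": "date"},
                "bookmark": bookmark,
            }
        }
    }


def tag_query(tag):
    """Hypothetical tag search: match entries that have at least one
    nested bookmark whose `tag` field matches `tag`."""
    return {
        "query": {
            "nested": {
                "path": "bookmark",
                "query": {"match": {"bookmark.tag": tag}},
            }
        }
    }
```

A nested query is needed here because each bookmark is indexed as its own hidden document; a plain query on `bookmark.tag` would not see it.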
Features powered by Elasticsearch
● Entry Search
○ Tag Search
○ Title Search
○ Content Search
○ URL Search
● Related Entry
● Issue
● Topic
● Bookmark Counter
Tag/Title Search
Search by “Elasticsearch”
Sorting
Filter by the number of bookmarks
Filter by timestamp
{
  "sort": { "created": "desc" },
  "query": {
    "filtered": {
      "query": {
        "bool": { "must": [
          { "match_phrase": { "title": "elasticsearch" } }
        ] }
      },
      "filter": { "bool": { "must": [
        { "range": { "count": { "gte": 3 } } },
        { "range": { "created": {
          "from": "2015-05-01T00:00:00",
          "to": "2015-07-15T00:00:00"
        } } }
      ] } }
    }
  }
}
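The `filtered` query used above was deprecated in Elasticsearch 2.0 and removed in 5.0; a `bool` query with a `filter` clause expresses the same search. The sketch below builds that form in Python. The function name and parameters are hypothetical, not part of the original slides.

```python
def build_title_search(phrase, min_count, created_from, created_to):
    """Build a title phrase search, newest first, restricted by
    bookmark count and creation time (both bounds inclusive)."""
    return {
        "sort": [{"created": "desc"}],
        "query": {
            "bool": {
                # Scored part: the phrase must appear in the title.
                "must": [{"match_phrase": {"title": phrase}}],
                # Non-scored part: runs in filter context.
                "filter": [
                    {"range": {"count": {"gte": min_count}}},
                    {"range": {"created": {"gte": created_from,
                                           "lte": created_to}}},
                ],
            }
        },
    }


body = build_title_search(
    "elasticsearch", 3, "2015-05-01T00:00:00", "2015-07-15T00:00:00"
)
```

Keeping the count and timestamp conditions in `filter` means they do not affect the relevance score.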
Content Search
Concept Search
● Simple Content Search
○ High recall, but low precision
○ Precision is important in Hatena Bookmark
● Concept Search
○ Query Expansion
■ Use search results retrieved by tag search
■ Expand queries using TF-IDF, IDF, and RIDF
● Term Vector API
○ Retrieve using expanded queries
■ eg. "Kyoto" (京都) -> "Gion, temple, shrine, cherry blossom, Kyo, ..." (祇園、寺、神社、桜、京、...)
ref. Improving full-text search precision in Hatena Bookmark (はてなブックマークの全文検索の精度改善)
https://speakerdeck.com/takuyaa/hatenabutukumakuquan-wen-jian-suo-falsejing-du-gai-shan
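The query expansion step can be sketched roughly as follows. This is an illustration under stated assumptions, not the method from the slides: documents are pre-tokenized lists, candidate terms come from the tag search results, terms with non-positive RIDF are dropped, and the rest are ranked by TF × IDF. In production the document frequency and total term frequency could come from the Term Vectors API instead of being counted locally. RIDF (residual IDF) is the observed IDF minus the IDF expected if the term's occurrences were Poisson-distributed over documents.

```python
import math
from collections import Counter


def term_scores(corpus):
    """Compute IDF and RIDF for every term in a tokenized corpus."""
    n_docs = len(corpus)
    df, cf = Counter(), Counter()  # document / collection frequency
    for tokens in corpus:
        cf.update(tokens)
        df.update(set(tokens))
    scores = {}
    for term, doc_freq in df.items():
        idf = -math.log2(doc_freq / n_docs)
        # IDF predicted by a Poisson model with rate cf / N.
        expected_idf = -math.log2(1.0 - math.exp(-cf[term] / n_docs))
        scores[term] = {"idf": idf, "ridf": idf - expected_idf}
    return scores


def expand_query(tag_results, corpus, top_k=5):
    """Pick expansion terms from tag-search results (hypothetical rule:
    keep terms with RIDF > 0, rank by TF in the results times IDF)."""
    stats = term_scores(corpus)
    tf = Counter(t for tokens in tag_results for t in tokens)
    candidates = {
        term: count * stats[term]["idf"]
        for term, count in tf.items()
        if term in stats and stats[term]["ridf"] > 0
    }
    return sorted(candidates, key=candidates.get, reverse=True)[:top_k]
```

Terms that cluster in a few documents get a high RIDF, while terms spread evenly across the corpus get a negative one and are filtered out.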
URL Search
http://b.hatena.ne.jp/entrylist?url=http%3A%2F%2Fwww.elastic.co%2F
http://www.elastic.co/
{ "query": {
  "filtered": { "filter": {
    "bool": { "should": [
      { "prefix": {
        "url": "http://www.elastic.co/"
      } }
    ] }
  } }
} }
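A small Python sketch of the step from the `entrylist` request to the prefix filter. The function name is hypothetical, and it assumes the `url` field is indexed without analysis so that a prefix match works on the raw URL. It uses the `bool`/`filter` form that replaces `filtered` in later Elasticsearch versions.

```python
from urllib.parse import urlparse, parse_qs


def url_prefix_query(entrylist_url):
    """Extract the `url` parameter from an entrylist URL and build a
    prefix filter on the `url` field."""
    params = parse_qs(urlparse(entrylist_url).query)
    target = params["url"][0]  # parse_qs percent-decodes the value
    return {
        "query": {
            "bool": {"filter": [{"prefix": {"url": target}}]}
        }
    }


body = url_prefix_query(
    "http://b.hatena.ne.jp/entrylist?url=http%3A%2F%2Fwww.elastic.co%2F"
)
```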
URL Subdomain Search
hatenablog.com
*.hatenablog.com
Related Entry
ref. Development of related-article recommendation based on Hatena Bookmark (はてなブックマークに基づく関連記事レコメンドの開発)
http://www.slideshare.net/shunsukekozawa5/hatena-engineer-seminar-5
Issue
Made by editors in Hatena
Entries in special features
Query DSL is hard for non-engineers to write
Edit page for Issue
Friendly for non-engineers
{
  "query": {
    "bool": {
      "must": [
        { "range": { "count": { "gte": 5 } } }
      ],
      "should": [ (tags, keywords, urls) ],
      "must_not": [ (tags, keywords, urls) ],
      "minimum_should_match": 1
    }
  },
  "sort": { "created": "desc" }
}
(The edit page input is translated into the Query DSL above.)
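One way the translation from the edit page to Query DSL might look in Python. Everything about the field choices is an assumption made for illustration: tags are matched on the nested `bookmark.tag` field, keywords as phrases in `title`, and URLs by prefix on `url`. The function names are hypothetical.

```python
def _clauses(tags=(), keywords=(), urls=()):
    """Turn editor input into a flat list of query clauses."""
    clauses = []
    for tag in tags:
        clauses.append({
            "nested": {
                "path": "bookmark",
                "query": {"term": {"bookmark.tag": tag}},
            }
        })
    for keyword in keywords:
        clauses.append({"match_phrase": {"title": keyword}})
    for url in urls:
        clauses.append({"prefix": {"url": url}})
    return clauses


def build_issue_query(include, exclude=None, min_count=5):
    """Build the bool query for an Issue from include/exclude input."""
    should = _clauses(**include)
    if not should:
        raise ValueError("an issue needs at least one tag, keyword, or URL")
    return {
        "query": {
            "bool": {
                "must": [{"range": {"count": {"gte": min_count}}}],
                "should": should,
                "must_not": _clauses(**(exclude or {})),
                "minimum_should_match": 1,
            }
        },
        "sort": [{"created": "desc"}],
    }


body = build_issue_query(
    {"tags": ["elasticsearch"], "urls": ["http://www.elastic.co/"]},
    exclude={"keywords": ["solr"]},
)
```

Editors only fill in lists of tags, keywords, and URLs; the nesting of `must`, `should`, and `must_not` stays hidden from them.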
Topic
Estimate topics from entries in Hatena Bookmark
Topic Page
Entries related to the topic
Topic by Elasticsearch
● Acquire topic keywords
○ Two-layered Significant Terms Aggregation
● Acquire entries related to the topic
○ Function Score Query
○ Retrieve using topic keywords and their scores
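eg. 官邸 (Prime Minister's Office), 首相 (Prime Minister), ドローン (drone), 落下 (fall), カメラ (camera)
● Drone falls on the Prime Minister's Office, no injuries: Nikkei (日本経済新聞)
● Drone falls on the roof of the Prime Minister's Office, trace radiation detected | Reuters
ref. How to build Hatena Bookmark's topic pages (はてなブックマークのトピックページの作り方)
http://codezine.jp/article/detail/8767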
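The two request bodies could be sketched in Python as below. The field name `title`, the bucket names, the sizes, and the way keyword scores are turned into `weight` functions are all assumptions; the slides only name the Significant Terms Aggregation and the Function Score Query.

```python
def topic_keywords_request(field="title", size=5):
    """Two-layered significant_terms: the outer layer finds candidate
    topic terms, the inner layer finds terms significant within each
    outer bucket."""
    return {
        "size": 0,  # only aggregation results are needed
        "aggs": {
            "topics": {
                "significant_terms": {"field": field, "size": size},
                "aggs": {
                    "keywords": {
                        "significant_terms": {"field": field, "size": size}
                    }
                },
            }
        },
    }


def topic_entries_request(keyword_scores, field="title"):
    """Retrieve entries for a topic, scoring each entry by the summed
    weights of the topic keywords it contains."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "should": [{"term": {field: kw}}
                                   for kw in keyword_scores],
                        "minimum_should_match": 1,
                    }
                },
                "functions": [
                    {"filter": {"term": {field: kw}}, "weight": score}
                    for kw, score in keyword_scores.items()
                ],
                "score_mode": "sum",      # add up matching weights
                "boost_mode": "replace",  # ignore the base query score
            }
        }
    }
```

With `boost_mode` set to `replace`, the ranking depends only on the keyword scores, so entries matching the strongest keywords come first.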
Bookmark Counter
● Count the number of bookmarks on a website
○ Count by Sum Aggregation
○ eg. http://d.hatena.ne.jp/
{
  "query": {
    "prefix": { "url": "http://d.hatena.ne.jp/" }
  },
  "aggs": { "total_count": {
    "sum": { "field": "count" }
  } }
}
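A short Python sketch of building this request and reading the total from the response. The function names are hypothetical; the response shape (`aggregations.<name>.value`) is what a `sum` aggregation returns.

```python
def bookmark_count_request(site_prefix):
    """Sum the `count` field over all entries whose URL starts with
    `site_prefix`."""
    return {
        "size": 0,  # no hits needed, only the aggregation
        "query": {"prefix": {"url": site_prefix}},
        "aggs": {"total_count": {"sum": {"field": "count"}}},
    }


def total_bookmarks(response):
    """Extract the summed bookmark count from a search response."""
    return int(response["aggregations"]["total_count"]["value"])


body = bookmark_count_request("http://d.hatena.ne.jp/")
```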
Conclusion
● Elasticsearch in Hatena Bookmark
● Features powered by Elasticsearch
○ Tag / Title / Content / URL Search
○ Related entry
○ Issue
○ Topic
○ Bookmark Counter
