Investigating Web Defacement
Campaigns at Large
Federico Maggi, Marco Balduzzi, Ryan Flores,
Lion Gu, Vincenzo Ciancaglini
Trend Micro, Forward-Looking Threat Research
# of Records Per Reporting Site
Source Site    URL                    # Records
Zone-H         www.zone-h.org         12,303,240
Hack-CN        www.hack-cn.com           386,705
Mirror Zone    www.mirror-zone.org       195,398
Hack Mirror    www.hack-mirror.com        68,980
MyDeface       www.mydeface.com           37,843
TOTAL                                 12,992,166
Metadata
Raw content
Records Per Year
Topics Over the Years
Security Problems
Real World Events
Adoption of Malicious Content in Deface Pages
Key Observation: Deface Page Template
Process of Analyzing Deface Pages
Feature Extraction
Image
Social handle
Text
Page title
Background color
Feature Extraction
Multimedia URL
Email address
Clustering
• BIRCH (Balanced Iterative Reducing and Clustering Hierarchies)
• Cluster statistics are efficient to compute
• Quickly finds the closest cluster for each new data point
Similar Deface Pages in One Cluster
Real-World Validation
How Attackers Are Organized
50% of actors join at least one team
Various Campaigns for “Charlie Hebdo” Attacks
Campaign and Defacer Team
Campaign
Team
Team and Defacer
Campaign
Team
Defacer
Overview of “Charlie Hebdo” Attacks
Campaign
Team
Defacer
Long Term Campaigns
Aggressive Campaigns
Most Targeted TLDs
Israeli-Palestinian Conflict
Conclusion
• Conduct a large-scale measurement
• 13M records spanning 19 years
• Introduce an approach to semi-automatically detect defacement
campaigns
• Show how our approach empowers the analyst in understanding
modern defacements
• Live campaigns in the real world
• Social structure of actors
• Modus operandi
• Motives, especially political ones
THANK YOU
Q&A

Editor's Notes

  • #2: Today we are going to talk about web defacement attacks.
  • #3: Website defacement is a very common attack. Hackers attack websites every day, and once a website is compromised, they can alter its pages. Defacers usually leave messages in the deface pages, such as who they are and why they attacked. Most of the time they are politically motivated. In the picture on this slide, for example, the hackers want people to pay attention to the situation in Palestine. But defacement raises further questions. Are there defacement campaigns? What are the defacers' modus operandi, social structure and organization? And is there any way to track and investigate defacement attacks? These questions motivate our research, and in this talk we use our findings to answer them.
  • #4: Data collection is the starting point of our research. There are several defacement reporting sites on the Internet, and many defacement incidents are reported to them. Zone-H is the largest of these, so we acquired its data for research purposes. We also collected data from four other sites. In total, our dataset contains almost 13M records.
  • #5: This slide shows what our data looks like. Zone-H is our major data source, so let's take one Zone-H record as an example. A record has two parts: metadata and raw content. The metadata includes fields such as timestamp, reporter name, victim domain and victim IP address. The raw content is the cached deface page. Our dataset has both metadata and raw content.
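As an illustration of the two-part record described above, here is a minimal sketch of how one record might be modeled. The class name, the field names and the sample values are assumptions made for this example; the talk only names the kinds of fields, not a schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DefacementRecord:
    """One reported incident: metadata plus the cached deface page.

    Field names are hypothetical; the source only lists the kinds of fields.
    """
    timestamp: datetime   # when the incident was reported
    reporter: str         # reporter name as submitted to the reporting site
    victim_domain: str
    victim_ip: str
    source_site: str      # which reporting site the record came from
    raw_content: bytes    # cached copy of the deface page


# Made-up sample values (documentation IP range, example domain).
record = DefacementRecord(
    timestamp=datetime(2015, 1, 10, 12, 0, 0),
    reporter="example-reporter",
    victim_domain="victim.example",
    victim_ip="192.0.2.10",
    source_site="zone-h",
    raw_content=b"<html><title>hacked</title></html>",
)
```

Keeping the raw page as bytes leaves decoding to the analysis stage, where the text encoding itself is later used as a feature.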
  • #6: This figure shows the number of records per year in our dataset. Our collection spans almost 19 years, ending in September 2016. There is a very clear upward trend in defacement attacks: reported incidents grew from thousands to more than one million per year.
  • #7: Once our dataset was ready, we were curious about the messages defacers convey in their deface pages, so we used topic modeling to determine the subject of each page. The table shows how the topics evolved. In the early period, from 1998 to 2004, defacers were interested in website security problems, and we see related keywords such as 'security', 'backup' and 'encryption'. After 2005, the topics shift to real-world events: more deface pages react to incidents in the real world, with keywords such as 'pope', 'turkey' and 'terrorism'.
  • #8: Deface pages are not only used to convey messages. Sometimes they carry malicious content. We scanned all deface pages with Trend Micro's web security checking service and found that many contain malicious scripts, some of which may download malware to the visitor's computer. This figure summarizes the scan results and shows a generally increasing trend.
  • #9: While checking some deface pages manually, we came across these two by accident. They are quite interesting. The first impression is that the two pages look very similar: they have almost the same layout and almost the same font color. We believe they were made from one template. Reading the text, we can see that both want more people to understand Islam, so they share the same motivation. This leads to one important observation: similar deface pages almost always share the same motivation and belong to the same defacement campaign. We therefore think that clustering similar deface pages is a good approach to detecting defacement campaigns.
  • #10: Now let me introduce our approach to analyzing deface pages automatically. When deface pages enter the system, they are first de-duplicated: we keep only one copy of deface pages with the same hash. De-duplication reduces the computation needed in the rest of our pipeline.
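The de-duplication step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the talk does not say which hash function is used, so SHA-256 is an assumption, and the function and variable names are made up for the example.

```python
import hashlib
from collections import defaultdict


def deduplicate(pages):
    """Group record ids by content hash and keep one representative page each.

    `pages` is an iterable of (record_id, raw_content_bytes) pairs.
    Returns (representatives, groups):
      representatives maps hash -> one copy of the page content,
      groups maps hash -> every record id that shares that content.
    SHA-256 is an assumption; the source only says "same hash".
    """
    representatives = {}
    groups = defaultdict(list)
    for record_id, content in pages:
        digest = hashlib.sha256(content).hexdigest()
        representatives.setdefault(digest, content)
        groups[digest].append(record_id)
    return representatives, dict(groups)


pages = [(1, b"<html>A</html>"), (2, b"<html>B</html>"), (3, b"<html>A</html>")]
representatives, groups = deduplicate(pages)
```

Keeping the hash-to-ids mapping around is what makes it possible to restore the full set of records once the unique pages have been clustered.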
  • #11: The next stage is deface page analysis. In this stage, our system extracts content from deface pages using both dynamic and static analysis. All content and metadata are stored in one Elastic database.
  • #12: Then the system performs campaign detection using an unsupervised machine-learning pipeline. First, features are extracted from the deface pages. Then the feature data is normalized. After that, we use clustering to detect campaigns. Finally, we re-duplicate the data in each cluster, which gives us the "expanded" clusters with the full set of original records.
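Two of the pipeline steps above, normalization and re-duplication, can be sketched in a few lines. The talk does not say which normalization is applied, so min-max scaling here is an assumption, and all names and sample values are hypothetical.

```python
def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns become 0.0.

    Min-max scaling is an assumption, the source only says "normalized".
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def reduplicate(cluster_of_hash, ids_by_hash):
    """Expand clusters of unique pages back to the full set of record ids."""
    expanded = {}
    for digest, cluster_id in cluster_of_hash.items():
        expanded.setdefault(cluster_id, []).extend(ids_by_hash[digest])
    return expanded


# Three unique pages with two features each (made-up numbers).
features = [[0, 10.0], [5, 10.0], [10, 10.0]]
normalized = min_max_normalize(features)

# Cluster label per unique page hash, and the record ids behind each hash.
cluster_of_hash = {"h1": 0, "h2": 0, "h3": 1}
ids_by_hash = {"h1": [1, 4], "h2": [2], "h3": [3, 5, 6]}
expanded = reduplicate(cluster_of_hash, ids_by_hash)
```

Clustering only the unique pages keeps the expensive step small, while the expansion restores the per-incident metadata (time, victim, reporter) that the later analyses rely on.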
  • #13: The last stage is labeling and visualization. We built a web portal that shows the clustered deface pages. The portal is designed for security analysts to carry out in-depth manual investigation. I will introduce the web portal later.
  • #14: Feature engineering is central to any clustering problem. We found that several features can be extracted to represent a deface page. Let's look at this deface page. First, the page title is a key element of a web page, and defacers usually leave their names or core messages there. We compute the ratios of letters, digits, punctuation and whitespace in the title as features. Second, deface pages have different background colors, so we extract the average color as a feature. Third, defacers tend to put images in deface pages, so we count the number of image tags. Fourth, many defacers leave their social handles in deface pages, so the number of social handles is another important feature. Finally, most deface pages contain text, and the text encoding can be treated as a feature.
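To make two of these features concrete, here is a minimal sketch that computes the title character ratios and the image tag count from static HTML with Python's standard `html.parser`. It is an illustration under simplifying assumptions: it covers only these two features, ignores dynamically generated content, and all names are made up for the example.

```python
import string
from html.parser import HTMLParser


class _PageScanner(HTMLParser):
    """Collects the <title> text and counts <img> tags."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.image_count = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "img":
            self.image_count += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def title_ratios(title):
    """Share of letters, digits, punctuation and whitespace in the title."""
    total = len(title)
    if total == 0:
        return {"letters": 0.0, "digits": 0.0,
                "punctuation": 0.0, "whitespace": 0.0}
    return {
        "letters": sum(ch.isalpha() for ch in title) / total,
        "digits": sum(ch.isdigit() for ch in title) / total,
        "punctuation": sum(ch in string.punctuation for ch in title) / total,
        "whitespace": sum(ch.isspace() for ch in title) / total,
    }


def extract_features(html):
    scanner = _PageScanner()
    scanner.feed(html)
    features = title_ratios("".join(scanner.title_parts))
    features["image_count"] = scanner.image_count
    return features


page = ("<html><head><title>H4cked by X!</title></head>"
        "<body><img src='a.png'><img src='b.png'></body></html>")
features = extract_features(page)
```

Ratios rather than raw counts keep titles of different lengths comparable, which matters once the features are fed into a distance-based clustering step.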
  • #15: Some defacers also leave email addresses or include multimedia URLs in deface pages. These are high-quality features for representing a deface page, so we take the number of email addresses and the number of multimedia URLs as features.
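Counting these two features could look roughly like the sketch below. The regular expressions are deliberately simple and are assumptions: in particular, what counts as a "multimedia URL" is not defined in the talk, so the extension list and the single video host used here are hypothetical choices.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical notion of "multimedia": URLs ending in a common audio/video
# extension, plus links to one well-known video host.
MEDIA_RE = re.compile(
    r"https?://[^\s\"'<>]+\.(?:mp3|mp4|ogg|wav|swf|flv)\b"
    r"|https?://(?:www\.)?youtube\.com/[^\s\"'<>]+",
    re.IGNORECASE,
)


def count_contacts_and_media(html):
    """Count distinct email addresses and distinct multimedia URLs."""
    emails = set(m.lower() for m in EMAIL_RE.findall(html))
    media = set(MEDIA_RE.findall(html))
    return {"email_count": len(emails), "media_url_count": len(media)}


page = (
    '<p>contact: crew@example.org / CREW@example.org</p>'
    '<embed src="http://media.example/anthem.mp3">'
    '<iframe src="https://www.youtube.com/embed/abc123"></iframe>'
)
counts = count_contacts_and_media(page)
```

Counting distinct values rather than raw matches avoids inflating the feature when a defacer repeats the same address several times on one page.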
  • #16: For clustering, we use the BIRCH algorithm.
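The two properties listed on the clustering slide follow from BIRCH's clustering feature (CF), a triple of point count, linear sum and squared sum that can be updated in constant time per point. The sketch below shows that idea in a simplified, single-level form: it keeps a flat list of CF entries instead of the full CF tree, so it is not the complete algorithm and not the authors' setup. The threshold value and all names are assumptions.

```python
import math


class ClusteringFeature:
    """BIRCH summary of a sub-cluster: count, linear sum, squared sum."""

    def __init__(self, dim):
        self.n = 0
        self.linear_sum = [0.0] * dim
        self.squared_sum = 0.0

    def added(self, point):
        """Return a new CF equal to this one plus `point` (CFs are additive)."""
        cf = ClusteringFeature(len(point))
        cf.n = self.n + 1
        cf.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]
        cf.squared_sum = self.squared_sum + sum(x * x for x in point)
        return cf

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        """Root mean squared distance of the members from the centroid."""
        c = self.centroid()
        variance = self.squared_sum / self.n - sum(x * x for x in c)
        return math.sqrt(max(variance, 0.0))


def _distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, threshold):
    """Single-level simplification: a flat list of entries, no CF tree."""
    entries, labels = [], []
    for p in points:
        best = None
        if entries:
            best = min(range(len(entries)),
                       key=lambda i: _distance(entries[i].centroid(), p))
        if best is not None and entries[best].added(p).radius() <= threshold:
            entries[best] = entries[best].added(p)   # absorb into closest
        else:
            entries.append(ClusteringFeature(len(p)).added(p))
            best = len(entries) - 1                  # start a new sub-cluster
        labels.append(best)
    return entries, labels


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
entries, labels = assign(points, threshold=0.5)
```

Because centroid and radius are derived from the three stored statistics alone, a new point can be tested against a sub-cluster without revisiting its members. The full algorithm arranges these entries in a height-balanced tree so that finding the closest entry does not require a scan over all of them.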
  • #17: After clustering, we built a web portal to show the clusters. This is the web portal. Each row summarizes one cluster. Take the first row as an example: it shows the cluster size, keys, start time, end time, number of attackers, and so on. The number in the size column is the size of the cluster; this one has 920 deface pages. The key column contains icons, each of which represents one feature.
  • #18: In our results, some clusters contain identical deface pages, while others contain very similar pages. Here is a sample cluster with similar pages. Can you find the differences?
  • #19: The difference is highlighted here. If you are a fan of spot-the-difference games, we have some clusters for you.
  • #20: With the cluster results in hand, we wanted to know whether some clusters are connected to specific real-world events. So we selected some real-world events and searched for evidence of them in our cluster results. This figure shows a timeline of major real-world events, together with the clusters related to them. First, for each event here, we can find relevant clusters. Second, some events drew a lot of attention from defacers, for example the death of Osama Bin Laden, the Battle of Aleppo and the Charlie Hebdo shooting. So we can see that some defacement attacks are driven by real-world events.
  • #21: In this slide, we try to explain how attackers are organized. This CDF graph gives us a clue: 50% of actors have joined at least one team. That means half of the actors identify themselves with a team name and coordinate to conduct attacks.
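A CDF like the one on this slide can be computed from a mapping of actors to the team names they use. The sketch below is an illustration only: the affiliations are invented and chosen so that the result happens to be one half, and the function name is hypothetical.

```python
from collections import Counter


def teams_per_actor_cdf(teams_by_actor):
    """Empirical CDF of the number of teams each actor is affiliated with.

    Returns a list of (k, share of actors with <= k teams), sorted by k.
    """
    counts = Counter(len(teams) for teams in teams_by_actor.values())
    total = sum(counts.values())
    cdf, running = [], 0
    for k in sorted(counts):
        running += counts[k]
        cdf.append((k, running / total))
    return cdf


# Hypothetical affiliations, for illustration only.
teams_by_actor = {
    "actor_a": {"team_x"},
    "actor_b": set(),
    "actor_c": {"team_x", "team_y"},
    "actor_d": set(),
}
cdf = teams_per_actor_cdf(teams_by_actor)

# Share of actors in at least one team = 1 - CDF value at zero teams.
share_with_team = 1.0 - dict(cdf).get(0, 0.0)
```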
  • #22: Let's look at one example: the various campaigns related to Charlie Hebdo, a French magazine. A short background story: in 2015, two terrorists opened fire at the headquarters of Charlie Hebdo and killed 12 people for religious reasons. This terrorist attack triggered various defacement campaigns.
  • #23: After the shooting, we could find deface pages related to the event. This graph shows the relationship between defacer teams and campaigns. Defacers usually highlight the campaign name in their deface pages; here the campaign name is 'opcharlie'. The diamond nodes are defacer teams, such as fallaga and thameur. We can see that these three teams joined the 'opcharlie' campaign.
  • #24: This graph shows the relationship between teams and defacers. The diamond nodes are teams and the circles are defacers. A connection between a defacer and a team means that the defacer refers to the team name in deface pages. Many connections from defacers point to the fallaga team, so fallaga has a lot of members.
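The team and defacer graph described above is bipartite, and its edges come from team names mentioned in deface pages. A minimal sketch of building it and finding the largest team follows. The team names are the ones mentioned in the talk, but the defacer names and edge counts are made up for illustration.

```python
from collections import defaultdict


def build_team_graph(mentions):
    """Build team -> set of defacers from (defacer, team) pairs.

    Each pair stands for one deface page in which the defacer refers to
    the team name; repeated mentions collapse into a single edge.
    """
    members = defaultdict(set)
    for defacer, team in mentions:
        members[team].add(defacer)
    return dict(members)


# Team names are from the talk; defacer names are invented.
mentions = [
    ("defacer1", "fallaga"), ("defacer2", "fallaga"),
    ("defacer3", "fallaga"), ("defacer1", "fallaga"),
    ("defacer4", "thameur"),
]
members = build_team_graph(mentions)
largest_team = max(members, key=lambda team: len(members[team]))
```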
  • #25: This graph gives an overview of the Charlie Hebdo attacks. We can see that there are nine campaigns. Each campaign has participants, which are either teams or defacers. Some teams are big, like fallaga. We can also see that most defacers joined at least one team, and very few worked independently.
  • #26: This slide shows the longest-lasting campaigns. The table gives an idea of how many attacks each campaign conducted between 1998 and 2016. Long-lasting campaigns are campaigns that span years. For example, the r00t campaign lasted 13 years, and the redhack campaign spanned more than 10 years and caused many attacks.
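Ranking campaigns by how long they last only needs the first and last incident date per campaign. The following sketch shows one way to compute that; the campaign names and dates are invented and are not taken from the dataset.

```python
from datetime import date


def campaign_spans(incidents):
    """Summarize (campaign, date) pairs per campaign.

    Returns {campaign: (first_seen, last_seen, attack_count)}.
    """
    spans = {}
    for campaign, day in incidents:
        first, last, count = spans.get(campaign, (day, day, 0))
        spans[campaign] = (min(first, day), max(last, day), count + 1)
    return spans


def longest_running(spans, top=3):
    """Campaign names ordered by time between first and last incident."""
    return sorted(spans, key=lambda c: spans[c][1] - spans[c][0],
                  reverse=True)[:top]


# Dates are invented for illustration; they are not taken from the dataset.
incidents = [
    ("campaign_a", date(2003, 5, 1)), ("campaign_a", date(2016, 4, 30)),
    ("campaign_b", date(2014, 7, 10)), ("campaign_b", date(2014, 7, 25)),
    ("campaign_b", date(2014, 8, 1)),
]
spans = campaign_spans(incidents)
ranking = longest_running(spans)
```

Sorting the same summary by the attack count instead of the duration would rank campaigns by volume rather than longevity.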
  • #27: In contrast to long-term campaigns, we also identified the campaigns that caused the most attacks. We call these "aggressive campaigns". We found many of them, such as savegaza here. Such aggressive campaigns are geopolitical. Take savegaza as an example: it reacted to war events in Palestine.
  • #28: Let's look at another example, which shows how our system helps analysts investigate campaigns. Timeline analysis is a very useful method for analysts, so our system provides a timeline view of each campaign. The campaign on this slide spans 4 years and includes more than 60 clusters. The clusters are grouped by targeted TLD. This gives good insight into the campaign, such as how many TLDs are targeted and how long each cluster lasts.
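The data behind such a timeline view can be sketched as a grouping of incidents by targeted TLD and cluster. This is an illustration, not the portal's code: the TLD is taken naively as the last label of the victim domain, and the cluster ids, domains and dates are made up.

```python
from collections import defaultdict
from datetime import date


def tld_of(domain):
    """Naive TLD: the last dot-separated label, lowercased."""
    return domain.rstrip(".").rsplit(".", 1)[-1].lower()


def timeline_by_tld(records):
    """Group (cluster_id, victim_domain, date) records by targeted TLD.

    Returns {tld: {cluster_id: (first_seen, last_seen)}}.
    """
    timeline = defaultdict(dict)
    for cluster_id, domain, day in records:
        per_cluster = timeline[tld_of(domain)]
        first, last = per_cluster.get(cluster_id, (day, day))
        per_cluster[cluster_id] = (min(first, day), max(last, day))
    return dict(timeline)


# Invented records for illustration.
records = [
    (1, "shop.example.il", date(2013, 4, 7)),
    (1, "news.example.il", date(2013, 4, 9)),
    (2, "www.example.com", date(2014, 7, 20)),
    (3, "blog.example.IL", date(2015, 4, 7)),
]
timeline = timeline_by_tld(records)
targeted_tlds = sorted(timeline)
```

From this structure, the number of targeted TLDs is the number of keys, and the duration of each cluster is the difference between its two dates.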
  • #29: In contrast to the previous long-running campaign, this slide shows another 5 campaigns. These campaigns are quite short, lasting less than one month, and each attacked just one TLD.
  • #30: This slide shows one example of large-scale joint campaigns. Joint campaigns share common motives and objectives. The topic of this joint campaign is the Israeli-Palestinian conflict. It involves 12 campaigns and targets Israeli websites.
  • #31: FEDE: too much text, but it’s probably allowed in the conclusions – maybe use some build-out?
  • #32: FEDE: add FTR/Trend Micro branding FEDE: I know I’m asking to expand the number of slides dramatically and you might find yourself going overtime. It’s OK to just remove the least interesting part and focus on what really matters/excites: the details are on the paper, your goal is to make sure that at least one message is received and people are happy ;-)