Investigating Web Defacement Campaigns at Large

This document summarizes an analysis of nearly 13 million web defacement records spanning 19 years, collected from five defacement reporting sites. The analysis found that 50% of actors joined at least one team. It identified various campaigns responding to the 2015 "Charlie Hebdo" attack in France and showed how attackers are organized into interconnected campaigns and teams. The study also examined long-term and aggressive defacement campaigns as well as trends over time, concluding that semi-automated campaign detection can help analysts understand modern defacements and their real-world context and motivations, such as political conflicts.

Investigating Web Defacement
Campaigns at Large
Federico Maggi, Marco Balduzzi, Ryan Flores,
Lion Gu, Vincenzo Ciancaglini
Trend Micro, Forward-Looking Threat Research

Number of Records per Reporting Site
Source Site    URL                    # Records
Zone-H         www.zone-h.org         12,303,240
Hack-CN        www.hack-cn.com           386,705
Mirror Zone    www.mirror-zone.org       195,398
Hack Mirror    www.hack-mirror.com        68,980
MyDeface       www.mydeface.com           37,843
TOTAL                                 12,992,166

Topics Over the Years
Security Problems
Real World Events

Adoption of Malicious Content in Deface Pages

Feature Extraction
Image
Social handle
Text
Page title
Background color

Feature Extraction
Multimedia URL
Email address

Clustering
• BIRCH (Balanced Iterative Reducing and Clustering Hierarchies)
• Summary statistics are efficient to compute
• Quickly finds the closest cluster for each new data point

How Attackers Are Organized
50% of actors join at least one team

Various Campaigns for “Charlie Hebdo” Attacks

Overview of “Charlie Hebdo” Attacks
Campaign
Team
Defacer

Conclusion
• Conduct a large-scale measurement
• 13M records spanning 19 years
• Introduce an approach to semi-automatically detect defacement campaigns
• Show how our approach empowers the analyst in understanding modern defacements
• Live campaigns in the real world
• Social structure of actors
• Modus operandi
• Motives, especially political ones

1. Investigating Web Defacement Campaigns at Large Federico Maggi, Marco Balduzzi, Ryan Flores, Lion Gu, Vincenzo Ciancaglini Trend Micro, Forward-Looking Threat Research

3. # of Records Per Reporting Site Source Site URL #Records Zone-H www.zone-h.org 12,303,240 Hack-CN www.hack-cn.com 386,705 Mirror Zone www.mirror-zone.org 195,398 Hack Mirror www.hack-mirror.com 68,980 MyDeface www.mydeface.com 37,843 TOTAL 12,992,166

4. Metadata Raw content

5. Records Per Year

6. Topics Over the Years Security Problems Real World Events

7. Adoption of Malicious Content in Deface Pages

8. Key Observation: Deface Page Template

9. Process of Analyzing Deface Pages

10. Process of Analyzing Deface Pages

11. Process of Analyzing Deface Pages

12. Process of Analyzing Deface Pages

13. Feature Extraction Image Social handle Text Page title Background color

14. Feature Extraction Multimedia URL Email address

15. Clustering • BIRCH (Balanced Iterative Reducing and Clustering Hierarchies) • Summary statistics are efficient to compute • Quickly finds the closest cluster for each new data point

17. Similar Deface Pages in One Cluster

18. Similar Deface Pages in One Cluster

19. Real-World Validation

20. How Attackers Are Organized 50% of actors join at least one team

21. Various Campaigns for “Charlie Hebdo” Attacks

22. Campaign and Defacer Team Campaign Team

23. Team and Defacer Campaign Team Defacer

24. Overview of “Charlie Hebdo” Attacks Campaign Team Defacer

25. Long Term Campaigns

26. Aggressive Campaigns

27. Most Targeted TLDs

28. Most Targeted TLDs

29. Israeli-Palestinian Conflict

30. Conclusion • Conduct a large-scale measurement • 13M records spanning 19 years • Introduce an approach to semi-automatically detect defacement campaigns • Show how our approach empowers the analyst in understanding modern defacements • Live campaigns in the real world • Social structure of actors • Modus operandi • Motives, especially political ones

31. THANK YOU Q&A

Editor's Notes

#2: Today we are going to talk about web defacement attacks.
#3: Website defacement is a very common attack. We know that hackers attack websites every day. Once a website is compromised, the attackers can alter its pages. They usually leave messages in the deface pages saying who they are and why they attacked. Most of the time, they are politically motivated. In the picture on this slide, for example, the attackers want people to pay attention to the situation in Palestine. But defacement raises further questions. Are there defacement campaigns? What are the defacers’ modus operandi, social structure, and organization? And is there a way to track and investigate defacement attacks? These questions motivated our research, and in this talk we will use our findings to answer them.
#4: Data collection is the starting point of our research. We found several defacement reporting sites on the Internet, to which many defacement incidents are reported. Zone-H is the largest of them, so we acquired its data for research purposes. We also collected data from four other sites. In total, our dataset contains almost 13M records.
#5: This slide shows what our data looks like. Zone-H is the major data source in our research, so let us take one Zone-H record as an example. The record has two parts: metadata and raw content. The metadata includes fields such as timestamp, reporter name, victim domain, and victim IP address. The raw content is the cached deface page. Our dataset has both.
#6: This figure shows the number of records per year in our dataset. Our collection spans almost 19 years, ending in September 2016. There is a very clear upward trend in defacement attacks: reported incidents grew from thousands to more than one million per year.
#7: With the dataset ready, we were curious about what messages defacers convey in their deface pages, so we used topic modeling to determine the subjects of the pages. The table shows how the topics evolved. In the early stage, from 1998 to 2004, defacers were interested in website security problems, with keywords such as ‘security’, ‘backup’, and ‘encryption’. After 2005, the topics shifted to real-world events, and more deface pages reacted to real incidents, with keywords such as ‘pope’, ‘turkey’, and ‘terrorism’.
#8: Deface pages are not only used to convey messages. Sometimes they carry malicious content. We scanned all deface pages with a Trend Micro web security checking service and found that many contain malicious scripts, some of which may download malware to the visitor’s computer. This figure summarizes the scan results: the general trend is increasing.
#9: While checking deface pages manually, we came across these two pages by accident. They are quite interesting. The first impression is that they look very similar: they have almost the same page layout and almost the same font color. We believe they were made from one template. Reading the text, we can see that both want more people to understand Islam, so they share the same motivation. This leads to one important observation: similar deface pages tend to share the same motivation and belong to the same defacement campaign. We therefore think clustering similar deface pages is a good way to detect defacement campaigns.
#10: Now let me introduce our approach to analyzing deface pages automatically. When deface pages enter the system, they are first deduplicated: we keep only one copy of the pages that share the same hash. Deduplication reduces the computing resources our pipeline needs.
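The deduplication step described in note #10 could be sketched roughly as follows. The notes do not say which hash function or record layout the system uses, so SHA-256, the `(record_id, body)` tuples, and the function name `deduplicate` are assumptions made only for illustration.

```python
import hashlib
from collections import defaultdict


def deduplicate(pages):
    """Keep one page body per content hash and remember which records shared it."""
    groups = defaultdict(list)  # hash -> IDs of all records with identical content
    unique = {}                 # hash -> one representative page body
    for record_id, body in pages:
        # SHA-256 is an assumption; the notes only say "same hash".
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(record_id)
        unique.setdefault(digest, body)
    return unique, dict(groups)


# Hypothetical toy records: two identical pages and one different page.
pages = [(1, "<h1>hacked</h1>"), (2, "<h1>hacked</h1>"), (3, "<h1>owned</h1>")]
unique, groups = deduplicate(pages)
```

Keeping the hash-to-record mapping is what later makes it possible to restore the full set of records in each cluster.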
#11: The next stage is deface page analysis. In this stage, our system extracts content from the deface pages using both dynamic and static analysis. All content and metadata are stored in one Elastic database.
#12: Then the system performs campaign detection, using an unsupervised machine-learning pipeline. First, features are extracted from the deface pages. Then the feature data is normalized. After that, we use clustering to detect campaigns. Finally, we re-duplicate the data in each cluster, which gives us “expanded” clusters containing the full set of original records.
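Two of the steps in note #12, normalization and re-duplication, can be illustrated with a small sketch. The notes do not state which normalization is applied, so min-max scaling is an assumption here, as are the function names and the toy data.

```python
def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns become 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def expand_clusters(labels, groups):
    """Map cluster labels of unique pages back to all original record IDs."""
    expanded = {}
    for digest, label in labels.items():
        expanded.setdefault(label, []).extend(groups[digest])
    return expanded


# Hypothetical feature rows (one per unique page) and cluster labels.
normalized = min_max_normalize([[0, 10], [5, 10], [10, 10]])
labels = {"a": 0, "b": 0, "c": 1}            # page hash -> cluster label
groups = {"a": [1, 2], "b": [3], "c": [4]}   # page hash -> original record IDs
expanded = expand_clusters(labels, groups)
```

Clustering only the unique pages and expanding afterwards keeps the expensive step small while still reporting every original record.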
#13: The last stage is labeling and visualization. We created a web portal that shows the clustered deface pages. The portal is designed for security analysts to carry out in-depth manual investigation. I will introduce it later.
#14: Feature engineering is central to any clustering problem. We found several features that can be extracted to represent a deface page. Let’s look at this deface page. First, the page title is a key element of a web page, and defacers usually leave their names or core messages there. We use the ratios of letters, digits, punctuation, and whitespace in the title as features. Second, deface pages have different background colors, so we extract the average color as a feature. Third, defacers tend to put images in deface pages, so we count the number of image tags. Fourth, many defacers leave their social handles in deface pages, so the number of social handles is another important feature. Finally, most deface pages contain text, and the text encoding can be treated as a feature.
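As an illustration of two of the features in note #14, the sketch below computes the character-class ratios of a page title and counts image tags. It uses only the Python standard library; the function names and the sample title are hypothetical, and the real system may extract these features differently.

```python
import string
from html.parser import HTMLParser


def title_ratios(title):
    """Share of letters, digits, punctuation, and whitespace in a page title."""
    n = len(title)
    if n == 0:
        return {"letters": 0.0, "digits": 0.0, "punctuation": 0.0, "whitespace": 0.0}
    return {
        "letters": sum(ch.isalpha() for ch in title) / n,
        "digits": sum(ch.isdigit() for ch in title) / n,
        "punctuation": sum(ch in string.punctuation for ch in title) / n,
        "whitespace": sum(ch.isspace() for ch in title) / n,
    }


class ImageCounter(HTMLParser):
    """Counts <img> start tags while parsing an HTML document."""

    def __init__(self):
        super().__init__()
        self.images = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1


def count_images(html):
    counter = ImageCounter()
    counter.feed(html)
    return counter.images


ratios = title_ratios("H4ck3d by X!")  # hypothetical title, 12 characters
n_images = count_images('<body><img src="a.png"><img src="b.png"/></body>')
```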
#15: Some defacers also leave their email addresses or include multimedia URLs in deface pages. These are high-quality features for representing deface pages, so we use the number of email addresses and the number of multimedia URLs as features.
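A rough way to obtain the two counts from note #15 is with regular expressions. The patterns, the list of media file extensions, and the choice to count distinct matches are all assumptions for this sketch, not details taken from the talk.

```python
import re

# Simplified patterns chosen for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MEDIA_RE = re.compile(
    r"https?://[^\s\"'<>]+\.(?:mp3|mp4|swf|wav|ogg)\b", re.IGNORECASE
)


def count_contacts_and_media(html):
    """Return (number of distinct email addresses, number of distinct media URLs)."""
    emails = set(EMAIL_RE.findall(html))
    media = set(MEDIA_RE.findall(html))
    return len(emails), len(media)


sample = '<p>contact: team@example.org</p><embed src="http://example.org/anthem.mp3">'
counts = count_contacts_and_media(sample)
```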
#16: For clustering, we use the BIRCH algorithm.
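The two properties listed on the clustering slide (cheap summary statistics, fast lookup of the closest cluster) come from BIRCH's clustering feature, a triple of point count, linear sum, and squared sum. The sketch below keeps only that bookkeeping in a flat list. It is a deliberate simplification: full BIRCH organizes these summaries in a CF-tree and applies the threshold to the cluster radius after absorbing a point, whereas here the threshold is applied to the distance from the centroid. The class name, function name, and threshold value are hypothetical.

```python
import math


class ClusteringFeature:
    """BIRCH-style cluster summary: point count N, linear sum LS, squared sum SS."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = 0.0

    def add(self, point):
        # Absorbing a point only updates three running statistics.
        self.n += 1
        self.ls = [s + x for s, x in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # Root mean squared distance to the centroid: sqrt(SS/N - ||centroid||^2).
        c = self.centroid()
        variance = self.ss / self.n - sum(x * x for x in c)
        return math.sqrt(max(variance, 0.0))


def assign(point, clusters, threshold):
    """Add the point to the nearest cluster, or open a new one if it is too far."""
    best, best_dist = None, float("inf")
    for cf in clusters:
        d = math.dist(point, cf.centroid())
        if d < best_dist:
            best, best_dist = cf, d
    if best is None or best_dist > threshold:
        best = ClusteringFeature(len(point))
        clusters.append(best)
    best.add(point)
    return best


clusters = []
for p in [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]:
    assign(p, clusters, threshold=1.0)
```

Because each cluster is represented only by its summary, a new page can be placed without revisiting the pages already absorbed.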
#17: After clustering, we built a web portal to show the clusters. This is the web portal. Each row summarizes one cluster. Take the first row as an example: it shows the cluster size, keys, start time, end time, number of attackers, and so on. The size column gives the number of deface pages in the cluster, 920 here. The key column contains icons, each of which represents one feature.
#18: In our clustering results, some clusters contain identical deface pages, while others contain very similar pages. Here is a sample cluster with similar pages. Can you find the differences?
#19: The difference is highlighted here. If you are a fan of spot-the-difference games, we have some clusters for you.
#20: After getting the clustering results, we wanted to know whether some clusters are connected to specific real-world events. So we selected some real-world events and searched for evidence of them in our clusters. This figure shows a timeline of major real-world events together with the clusters related to them. First, for each real-world event here, we can find relevant clusters. Second, some events got a lot of attention from defacers, for example the death of Osama bin Laden, the Battle of Aleppo, and the Charlie Hebdo shooting. So we can see that some defacement attacks are driven by real-world events.
#21: In this slide, we try to explain how attackers are organized. This CDF graph gives us a clue: 50% of actors joined at least one team. That means half of the actors identify themselves with a team name and coordinate to conduct attacks.
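The kind of statistic read off the CDF in note #21 can be computed as in the sketch below. The actor names and team counts are invented toy data chosen so that the share happens to be 0.5; they are not values from the dataset.

```python
def share_with_team(teams_per_actor):
    """Fraction of actors that belong to at least one team."""
    counts = list(teams_per_actor.values())
    return sum(1 for c in counts if c >= 1) / len(counts)


def empirical_cdf(values):
    """Sorted (value, fraction of observations <= value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, sum(1 for x in ordered if x <= v) / n) for v in sorted(set(ordered))]


# Hypothetical actors and the number of teams each one joined.
teams_per_actor = {"actor_a": 0, "actor_b": 0, "actor_c": 1, "actor_d": 3}
share = share_with_team(teams_per_actor)
cdf = empirical_cdf(teams_per_actor.values())
```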
#22: Let’s look at one example: the various campaigns surrounding the attack on Charlie Hebdo, a French magazine. A short background story: in 2015, two terrorists opened fire at the Charlie Hebdo headquarters and killed 12 people for religious reasons. This terrorist attack triggered various defacement campaigns.
#23: After the shooting, we found deface pages related to the event. This graph shows the relationship between defacer teams and campaigns. Defacers usually highlight the campaign name in their deface pages; here it is ‘opcharlie’. The diamond nodes are defacer teams, such as fallaga and thameur. We can see that these three teams joined the ‘opcharlie’ campaign.
#24: This graph shows the relationship between teams and defacers. The diamond nodes are teams and the circles are defacers. A connection between a defacer and a team means the defacer refers to the team name in deface pages. Many connections from defacers point to the fallaga team, so fallaga has a lot of members.
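The team-defacer graph in note #24 is essentially a set of (defacer, team) edges, where an edge exists when a defacer mentions a team name in a deface page. A minimal sketch of counting members per team follows; the team names come from the notes, while the defacer names and the edge list are placeholders.

```python
from collections import defaultdict


def team_membership(mentions):
    """Group defacers by the team names they mention in their deface pages."""
    members = defaultdict(set)
    for defacer, team in mentions:
        members[team].add(defacer)  # a set ignores repeated mentions
    return members


# Hypothetical (defacer, team) edges; defacer names are placeholders.
mentions = [
    ("defacer_1", "fallaga"),
    ("defacer_2", "fallaga"),
    ("defacer_3", "thameur"),
    ("defacer_1", "fallaga"),
]
members = team_membership(mentions)
sizes = {team: len(names) for team, names in members.items()}
```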
#25: This graph gives an overview of the Charlie Hebdo attacks. There are nine campaigns, each with participants that are either teams or defacers. Some teams, like fallaga, are big. We can also see that most defacers joined at least one team, and very few worked independently.
#26: This slide shows the longest-lasting campaigns. The table shows how many attacks each campaign conducted between 1998 and 2016. Long-lasting campaigns are campaigns that span years. For example, the r00t campaign lasted 13 years, and the redhack campaign spanned 10 years and caused many attacks.
#27: In contrast to long-term campaigns, we also identified the campaigns that caused the most attacks. We call them “aggressive campaigns”. We found many of them, such as savegaza here. These aggressive campaigns are geopolitical. Take savegaza as an example: it reacted to war events in Palestine.
#28: Let’s look at another example, which shows how our system helps analysts investigate campaigns. Timeline analysis is a very useful method for analysts, so our system provides a timeline view of each campaign. The campaign on this slide spans 4 years and includes over 60 clusters, grouped by targeted TLD. The timeline gives good insight into the campaign, such as how many TLDs were targeted and how long each cluster lasted.
#29: In contrast to the previous example of a long-running campaign, this slide shows another 5 campaigns. These campaigns were quite short, lasting less than one month, and each attacked just one TLD.
#30: This slide shows one example of large-scale joint campaigns. Joint campaigns share common motives and objectives. The topic of this joint campaign is the Israeli-Palestinian conflict. It involves 12 campaigns and targets Israeli websites.
#31: FEDE: too much text, but it’s probably allowed in the conclusions – maybe use some build-out?
#32: FEDE: add FTR/Trend Micro branding FEDE: I know I’m asking to expand the number of slides dramatically and you might find yourself going overtime. It’s OK to just remove the least interesting part and focus on what really matters/excites: the details are on the paper, your goal is to make sure that at least one message is received and people are happy ;-)
