Student: Julia Y. Cheng
Primary mentor: Chris Horsley
Backup mentors: Natalia Stakhanova, Franck Guenichot, Thanh Nguyen, Claudio Guarnieri
Google Melange: http://www.google-melange.com/gsoc/project/google/gsoc2012/juliaycheng/16001
Project Overview:
The large volume of honeypot logs makes data analysis and interpretation difficult. To reduce the workload on experts and the complexity of the analysis, this GSoC project automatically builds an attack community graph that reveals attack approaches and describes the intentions behind them.
The project is divided into three stages. The first stage constructs an attack graph by extracting the relationships among criminals, victims and malicious servers from honeypot logs. For this project, I will use dionaea, glaspot and kippo logs as the first-level raw data. To derive second-level analysis data from the first-level data, I will apply Cuckoo Sandbox by Claudio Guarnieri, the PHP sandbox by Lukas Rist, Thug by Angelo Dell'Aera, Hale by Patrik Lantz, and fast-flux detection for advanced data collection and analysis. Once data collection and processing are complete, I will extract relationships from the data to build the attack graph.
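As a minimal sketch of the first stage, the following Python code turns a few log records into a directed graph of criminals (attackers), victims and malicious servers. The record fields (`sensor`, `src_ip`, `dst_ip`, `download_host`), the sample addresses and the function name `build_attack_graph` are assumptions made for illustration; the real dionaea, glaspot and kippo log formats differ and would need their own parsers.

```python
from collections import defaultdict

# Hypothetical, simplified records; real honeypot log fields differ.
RECORDS = [
    {"sensor": "dionaea", "src_ip": "198.51.100.7", "dst_ip": "192.0.2.10",
     "download_host": "203.0.113.5"},
    {"sensor": "kippo", "src_ip": "198.51.100.7", "dst_ip": "192.0.2.11",
     "download_host": None},
    {"sensor": "glaspot", "src_ip": "198.51.100.9", "dst_ip": "192.0.2.10",
     "download_host": "203.0.113.5"},
]


def build_attack_graph(records):
    """Return (nodes, edges).

    nodes maps a node id to its role; edges maps (src, dst) to a set of
    relation labels. A node seen in several roles keeps the last one.
    """
    nodes = {}
    edges = defaultdict(set)
    for rec in records:
        attacker, victim = rec["src_ip"], rec["dst_ip"]
        nodes[attacker] = "attacker"
        nodes[victim] = "victim"
        edges[(attacker, victim)].add("attacks:" + rec["sensor"])
        server = rec.get("download_host")
        if server:
            # The victim was told to fetch a payload from this server.
            nodes[server] = "malicious_server"
            edges[(victim, server)].add("fetches_from")
    return nodes, dict(edges)


nodes, edges = build_attack_graph(RECORDS)
for (src, dst), labels in sorted(edges.items()):
    print(src, "->", dst, sorted(labels))
```

Keeping the relation labels on each edge preserves which honeypot observed the relationship, which may help later when mapping compartments to intentions.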
The second stage applies centrality measures to group the graph into individual attack approach compartments. By evaluating the relative centrality of the different compartments, I will construct the attack community graph by connecting the high-density attack approach compartments. I will also map each attack approach compartment to its attack behavior intentions. The second deliverable is a Python package that expresses the attack community graph and its intentions.
The Honeynet Project uses hpfeeds by Mark Schloesser as a generic authenticated datafeed protocol to collect honeypot data from around the world. Ben Reardon has used Splunk for data analysis and visualization. The third stage is to develop an app that presents the attack community graph and integrates into the Splunk platform as the final deliverable.
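The plan does not fix a particular centrality measure. As one candidate, the sketch below computes unnormalized betweenness centrality for an unweighted directed graph using Brandes' algorithm, in line with the directed-graph measures discussed in paper [1]. The node names and the function name `betweenness_centrality` are hypothetical, and the adjacency lists are assumed to contain no duplicate successors.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted directed graph.

    adj maps each node to a list of successor nodes.
    Returns unnormalized betweenness scores per node.
    """
    nodes = set(adj)
    for targets in adj.values():
        nodes.update(targets)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        preds = {v: [] for v in nodes}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical graph: two attackers hit one victim, which contacts a server.
GRAPH = {
    "attacker_a": ["victim_1"],
    "attacker_b": ["victim_1"],
    "victim_1": ["server_x"],
}
scores = betweenness_centrality(GRAPH)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In this toy graph only `victim_1` lies on shortest paths between other nodes, so it is the only node with a non-zero score. Nodes like this are natural candidates for the hubs around which compartments form.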
Project Plan:
April 23rd - May 20th: Community Bonding Period
- Prepare the development and testing environment
- Learn how to develop a Splunk App
- Study social network graph drawing tools and libraries
- Read papers on social network centrality algorithms
May 21st: GSoC 2012 coding officially starts
May 21st - May 28th:
- Decide which honeypot logs to use
- Modify and integrate the current hpfeeds client to fetch instances (the first-level data set) and store them on the testing site
- Format and index the data, and create Splunk searches that extract relationships for graph construction
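One way to approach the formatting step is to write each first-level record as a single line of key="value" pairs, since Splunk by default extracts fields from such pairs at search time. The sketch below assumes the record is already a flat Python dict; the field names and the helper name `to_splunk_event` are hypothetical.

```python
def to_splunk_event(record):
    """Render a flat dict as one line of key="value" pairs.

    Keys are sorted so the output is stable across runs.
    """
    parts = []
    for key in sorted(record):
        # Replace double quotes so they cannot break the quoting.
        value = str(record[key]).replace('"', "'")
        parts.append('{}="{}"'.format(key, value))
    return " ".join(parts)


# Hypothetical first-level record.
event = {"sensor": "kippo", "src_ip": "198.51.100.7", "dst_ip": "192.0.2.11"}
line = to_splunk_event(event)
print(line)
```

With events in this shape, a search can group on the source and destination fields to list the attacker-to-victim pairs that become edges in the graph.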
May 29th - June 15th:
- Code the advanced processing of first-level data to produce second-level data
- Develop a Python module for attack social network construction
- Integrate it into the Splunk App and show the first draft graph, which shows only the nodes and the relationships (edges) between them
June 19th - July 7th:
- Code the graph centrality algorithm to build attack approach compartments
- Connect high-density attack compartments to identify the attack community
- Discuss whether to apply centrality-based graph reduction to decrease visualization complexity
- Integrate into the Splunk App and show the second version of the graph, which shows the attack compartments and community
- Evaluate current project results and scope; adjust scope and deliverables as needed
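How compartments are delimited and what counts as "high-density" is still open at this point in the plan. The sketch below is only one possible reading, with a simpler technique swapped in for the centrality-based grouping: it treats each weakly connected component as a compartment, computes its directed density, and keeps the compartments above a threshold. The sample graph, the threshold of 0.6 and the function names are assumptions for illustration.

```python
def weak_components(nodes, edges):
    """Group nodes into weakly connected components (direction ignored)."""
    neighbors = {n: set() for n in nodes}
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in neighbors[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(comp)
    return components


def density(comp, edges):
    """Directed density: edges inside comp / (n * (n - 1)); 0.0 if n < 2."""
    n = len(comp)
    if n < 2:
        return 0.0
    inside = sum(1 for s, d in edges if s in comp and d in comp)
    return inside / (n * (n - 1))


# Hypothetical graph with two compartments.
NODES = {"a", "b", "c", "d", "e"}
EDGES = {("a", "b"), ("b", "a"), ("b", "c"), ("a", "c"), ("d", "e")}

comps = weak_components(NODES, EDGES)
THRESHOLD = 0.6  # assumed cut-off for "high-density"
dense = [c for c in comps if density(c, EDGES) >= THRESHOLD]
for c in comps:
    print(sorted(c), round(density(c, EDGES), 2))
```

The same density function could be reused unchanged if compartments are later derived from centrality scores instead of connected components.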
July 9th - July 13th: Mid Term Assessments
- Prepare and deliver mid-term evaluation
July 14th - July 20th:
- Review current project results and methods
- Design the display UI and options
- Refactor and improve the code
July 21st - August 10th:
- Code the Splunk display UI and integrate it into the Splunk App
- Map attack approach compartments to attack behavior intention lists
August 13th: Suggested "pencils down" date, coding close to done
- Bug fixing, code commits, comments and documentation writing
August 20th: Firm "pencils down" date, coding must be done
August 24th - August 27th: Final Assessments
August 31st: Public code uploaded and available to Google
Project Deliverables:
A Splunk App that displays the attack social network graph and its attack intentions, derived from honeypot logs.
Project Source Code Repository:
not yet ready
Student Weekly Blog: https://www.honeynet.org/blog/327
Project Useful Links:
Papers:
[1] White, D. R., & Borgatti, S. P. (1994, October). Betweenness centrality measures for directed graphs. Social Networks, 16(4), 335-346.
[2] Zemljič, B., & Hlebec, V. (2005, January). Reliability of measures of centrality and prominence. Social Networks, 27(1), 73-88.
[3] Kang, S. M. (2007, January). A note on measures of similarity based on centrality. Social Networks, 29(1), 137-142.
Project Updates:
STUDENT provide any major updates not included in weekly project blog here