Data-trace-competition

Scripts in the analyse-tools directory

  1. dealShortUrl.py

Resolve the most common short URLs back to their original URLs.

  1. shortUrl.py

Resolve the remaining short URLs back to their original URLs.

  1. getDomain.py

Web scraping to get the domain.

  1. getDNSInfo.py

Web scraping to get DNS info.

  1. getWhoisInfo.py

Web scraping to get WHOIS info.

  1. simHash.py

Compute the simHash of URLs.

  1. textDistance.py

Calculate the text distances between URLs based on their simHash values.
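
As a rough illustration of what simHash.py and textDistance.py do together, here is a minimal sketch. It is not the code from those scripts: the tokenisation (alphanumeric runs), the use of MD5 to hash tokens, the 64-bit fingerprint size, and the choice of Hamming distance as the "text distance" are all assumptions, and the function names `simhash` and `hamming_distance` are hypothetical.

```python
import hashlib
import re

BITS = 64  # assumed fingerprint width


def simhash(text):
    """Return a 64-bit SimHash fingerprint of the given text."""
    # Assumed tokenisation: lower-cased runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    weights = [0] * BITS
    for token in tokens:
        # Take the first 8 bytes of the MD5 digest as a 64-bit token hash.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")
        # Each set bit votes +1 for its position, each cleared bit votes -1.
        for i in range(BITS):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    # A fingerprint bit is set where the positive votes win.
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

With this sketch, two URLs can be compared by calling `hamming_distance(simhash(url_a), simhash(url_b))`; a smaller distance suggests more shared tokens.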

  1. shang.py

Calculate the entropy of texts.
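
For reference, a minimal sketch of an entropy calculation over text. It assumes character-level Shannon entropy measured in bits; shang.py may work on different units (for example words) or a different log base, and the function name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Character-level Shannon entropy of text, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # H = -sum(p * log2(p)) over the observed character frequencies.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A string made of one repeated character has entropy 0, while a string whose characters are all distinct and equally frequent reaches the maximum of log2(number of distinct characters).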

  1. countKey.py

Count the occurrences of the keys in URLs.
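
One possible reading of this step, sketched below, is that "keys" means the query-string parameter names of each URL. That interpretation is an assumption, as is the function name `count_query_keys`; countKey.py may count something else.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def count_query_keys(urls):
    """Count how often each query-parameter name appears across the URLs."""
    counter = Counter()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values=True so keys with empty values are still counted.
        for key, _value in parse_qsl(query, keep_blank_values=True):
            counter[key] += 1
    return counter
```

The resulting `Counter` supports `most_common()`, which is convenient for listing the most frequent keys.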

Scripts in the root directory

---for trace1---

Use XGBoost to train a model and make predictions on the data

  1. trace1.py

---for trace2---

Use data mining to cluster the data

  1. ultis.py
  2. trace2-1-step.py
  3. trace2-2-step.py
  4. trace2-3-step.py
  5. trace2-4-step.py
  6. trace2-5-step.py