Building an Enterprise Threat Intelligence Dark Web Monitoring Platform


I. Introduction

The dark web should be no stranger to most of us. Hidden within it are black markets, illegal tools, the sex trade, drug trafficking, and firearms listings, a veritable tiger's den of cybercriminals. Using the Tor Browser and similar tools, we can easily reach the shallower layers of the dark web, which consist mainly of pornography and data/intelligence trading sites such as the Silk Road.

For businesses, being hacked often means losing large amounts of data, and that data is usually put up for sale on the dark web; the 12306 leak and breaches at other major Internet companies in recent years are typical examples. To respond promptly to sudden data breaches, an enterprise needs a threat intelligence platform that monitors the dark web in real time, watching for leaks of sensitive data, coupon/promotion abuse ("wool-pulling"), and other business security risks.

II. Setting Up a Proxy Server

Because of the domestic network environment, successfully accessing the dark web requires a server located overseas. This article uses Ubuntu 18.04 as the example system (other systems work too), with Tor and Privoxy installed on it to act as the proxy server.

The system version used in this article:

root@536ef99cab94:/# cat /etc/issue.net
Ubuntu 18.04.2 LTS

2.1 Overall Architecture
[Figure: overall architecture of the Privoxy + Tor proxy chain]
As the figure shows, Privoxy acts as a relay: it converts the HTTP protocol to SOCKS5, while Tor is responsible for converting SOCKS5 to the Tor protocol. The whole proxied access flow is therefore (a minimal client-side sketch follows the list):

1. The user enters an address ending in .onion, which is accessed over HTTP via port 8118 exposed by Privoxy;

2. Privoxy forwards the HTTP traffic to Tor; Tor obtains the site's public key for encryption and sends the message along the Tor circuit to a Tor node, which forwards the request to the .onion site.
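To make the flow concrete, here is a minimal client-side sketch using Python's requests library. The proxy IP (11.11.11.11, the placeholder also used later in this article) and the .onion address are assumptions, not real endpoints:

# Minimal sketch: fetch a hidden service through the Privoxy -> Tor chain.
# The proxy IP and the .onion address are placeholders.
import requests

proxies = {
    'http': 'http://11.11.11.11:8118',   # Privoxy's listen-address (see 2.4)
    'https': 'http://11.11.11.11:8118',
}

resp = requests.get('http://exampleonionaddress.onion/', proxies=proxies, timeout=60)
print(resp.status_code, len(resp.text))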

2.2 Installing Tor

Many people may start by running sudo apt-get install tor directly, but this installs a v2-era version of Tor that does not support the newer cryptographic algorithms, so some of the newest dark web addresses cannot be reached.

The main changes from Tor v2 to Tor v3 are:

1. The signature algorithms were upgraded from SHA1/DH/RSA1024 to SHA3/ed25519/curve25519;

2. An improved Tor directory protocol with better security;

3. Better onion addresses: the switch to SHA3 makes it much harder to enumerate (brute-force) an identical-looking address (v3 addresses are 56 characters long versus 16 for v2);

4. An extensible exchange protocol.

Following the official documentation, install the latest (v3) version of Tor as follows:

1. Add the following sources to /etc/apt/sources.list:

deb https://deb.torproject.org/torproject.org bionic main
deb-src https://deb.torproject.org/torproject.org bionic main

2. Add the GPG key by executing the following commands:

curl https://deb.torproject.org/torproject.org/A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89.asc | gpg --import
gpg --export A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89 | apt-key add -

3. Install Tor:

apt update
apt install tor deb.torproject.org-keyring

4. Check the installed version of Tor; this article installed Tor 0.3.5.8. (Note that tor -v prints the version banner but then fails, because the correct flag is tor --version:)

root@536ef99cab94:/# tor -v
Jun 18 14:30:43.530 [notice] Tor 0.3.5.8 running on Linux with Libevent 2.1.8-stable, OpenSSL 1.1.1, Zlib 1.2.11, Liblzma 5.2.2, and Libzstd 1.3.3.
Jun 18 14:30:43.531 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
Jun 18 14:30:43.531 [warn] Command-line option '-v' with no value. Failing.
Jun 18 14:30:43.531 [err] Reading config failed--see warnings above.

2.3 Configuring Tor

Tor's configuration files are located at /etc/tor/torrc and /etc/tor/torsocks.conf:

/etc/tor/torsocks.conf defines the port and address used to translate the SOCKS protocol into the Tor protocol;

/etc/tor/torrc is the user-facing Tor configuration; in this file we can configure the HTTP proxy (polipo, privoxy), instant messaging (pidgin, irssi), TorDNS, and so on.

To run a v3 hidden service, add the following to the /etc/tor/torrc file (note that these HiddenService* directives configure hosting an onion service; as a pure client, Tor ≥ 0.3.2 can reach v3 addresses without extra configuration):

HiddenServiceDir /var/lib/tor/other_hidden_service/
HiddenServicePort 80 127.0.0.1:80
HiddenServiceVersion 3

After editing, start Tor by running tor on the command line, or with service tor start.
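To quickly confirm that Tor came up, we can check that its default SOCKS port 9050 is accepting connections; a minimal sketch:

# Minimal sketch: verify Tor's SOCKS listener is reachable on the default port 9050.
import socket

def port_open(host='127.0.0.1', port=9050, timeout=3):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print('Tor SOCKS port reachable:', port_open())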

2.4 Installing and Configuring Privoxy

Privoxy has no particular version requirement, so you can simply run apt-get install privoxy.

After installation, to make Privoxy forward HTTP traffic on to Tor, edit /etc/privoxy/config and add:

forward-socks5 / 127.0.0.1:9050 .
listen-address 0.0.0.0:8118

After the changes, restart the service with service privoxy restart. (The trailing dot on the forward-socks5 line tells Privoxy not to chain to any further parent HTTP proxy.)

2.5 Verifying the Proxy Works

Although dark web sites can be reached with Tor Browser's meek-azure bridge, access that way is slow. To verify that the proxy server we set up works, modify Tor Browser's network settings (assume our proxy server's IP is 11.11.11.11):
[Figure: Tor Browser network settings pointing the HTTP proxy at 11.11.11.11:8118]
After applying the settings, visit a dark web site: if it loads, our proxy server works, and access is much faster than via meek-azure.

Of course, we can test even more quickly with the following command:

➜  ~ curl -x 11.11.11.11:8118 https://httpbin.org/ip
{
  "origin": "178.175.132.225, 178.175.132.225"
}

Looking up 178.175.132.225 shows it is located in Moldova; this is the Tor exit node's address rather than our server's.

III. Developing the Real-Time Monitoring Program

In the previous chapter we set up the proxy server, which is effectively the key to the dark web. Although Tor Browser plus the proxy gives faster manual access to dark web sites, manual browsing cannot provide 24-hour monitoring or detect data leaks in time, so we need to develop a crawler that monitors dark web sites in real time.

3.1 Characteristics of Dark Web Sites

Dark web sites differ from surface web sites: there is little fancy dynamic JS and no strong anti-crawling, so crawling a dark web site is relatively simple. After surveying several common dark web sites, we found that their anti-crawling measures generally fall into the following cases:

1. Referer checks;

2. Request-rate limits tied to the cookie;

3. User-Agent checks;

4. Captchas;

5. Updates to the site code that rename or move HTML tags.

3.2 Anti-Anti-Crawler Techniques for Dark Web Sites

For each anti-crawling measure in 3.1 we can find a workaround. Since exploring anti-crawling strategies is not the focus of this article, the countermeasures are only sketched briefly:

1. Set the request header's Referer to the dark web site's own domain;

2. Build a multi-account cookie pool, and use Redis to deduplicate URLs for incremental crawling, reducing the request volume;

3. Set the User-Agent to a Firefox browser: {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0'};

4. Dark web captchas are usually simple and can be recognized with basic OCR, e.g. tesseract (see the sketch after this list);

5. Update the crawler code promptly, adapting the anti-anti-crawler code to each site change.
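For item 4, a minimal OCR sketch using pytesseract (a Python wrapper for tesseract); the file name and the binarization threshold are assumptions that need tuning per site:

# Minimal captcha OCR sketch; 'captcha.png' and the threshold 140 are assumptions.
from PIL import Image
import pytesseract

img = Image.open('captcha.png').convert('L')               # grayscale
img = img.point(lambda p: 255 if p > 140 else 0)           # crude binarization
code = pytesseract.image_to_string(img, config='--psm 7')  # single line of text
print(code.strip())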

3.3 Dark Web Monitoring Crawler Architecture

Scrapy is an application framework written in Python for crawling web sites and extracting structured data. It is commonly used for data mining, information processing, and archiving historical data. With the Scrapy framework we can usually implement a crawler and scrape a site's content quite easily.
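As a reference point, a minimal Scrapy spider skeleton looks like the following; the spider name and start URL are placeholders, not the monitoring program itself:

import scrapy

class DarkwebSpider(scrapy.Spider):
    name = 'darkweb'
    # placeholder entry page; the real sites are configured elsewhere
    start_urls = ['http://exampleonionaddress.onion/index.php']

    def parse(self, response):
        # follow every link and hand each page back to this callback
        for href in response.css('a::attr(href)').getall():
            yield response.follow(href, callback=self.parse)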

On top of the anti-anti-crawler methods of Section 3.2, this article implements a real-time monitoring program in Scrapy; its architecture is as follows:
[Figure: architecture of the dark web monitoring crawler]
3.4 Implementing the Monitoring Program

The monitoring program in this article watches several common dark web sites at once; for reasons of space, only one dark web site is used as the example below.

Scrapy proxy settings:

class DarkwebSpiderDownloaderMiddleware(object):

    def process_request(self, request, spider):
        # route every request through the Privoxy HTTP proxy set up in chapter II
        request.meta['proxy'] = 'http://11.11.11.11:8118'
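For this middleware to take effect, it must be registered in the project's settings.py; a sketch, where the module path is an assumption about the project layout:

# settings.py -- the module path below is an assumption about the project layout.
DOWNLOADER_MIDDLEWARES = {
    'darkweb_spider.middlewares.DarkwebSpiderDownloaderMiddleware': 543,
}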

Core code of the multi-account login:

count = 0  # number of accounts that failed to log in
for accounts in Accounts:
    try:
        logging.info('Account %s is logging in ......' % accounts)
        cookie, sid = get_CookieSid(accounts, 'Testtest')
        __i__ = {'cookie': cookie, 'sid': sid}
        logging.info('Account %s finished logging in!' % accounts)
        __value__.append(__i__)  # add the session to the cookie pool
    except Exception as e:
        logging.error('[*] Timed out, skipping one account!')
        count += 1
        if count > 7:
            logging.error('[*] Too few accounts logged in!')
            return
        continue
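The get_CookieSid helper is not shown in the original; a hypothetical sketch of what it might do, logging in through the proxy and returning the session cookie and sid (the login URL, form fields, and proxy address are all assumptions):

# Hypothetical sketch of get_CookieSid; URL, form fields and proxy are assumptions.
import requests

def get_CookieSid(username, password, proxy='http://11.11.11.11:8118'):
    s = requests.Session()
    s.proxies = {'http': proxy, 'https': proxy}
    resp = s.post('http://exampleonionaddress.onion/login.php',
                  data={'username': username, 'password': password},
                  timeout=60)
    resp.raise_for_status()
    # serialize the cookie jar and pull out the session id
    cookie = '; '.join('%s=%s' % kv for kv in s.cookies.items())
    sid = s.cookies.get('sid', '')
    return cookie, sid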

URL-deduplication middleware, mainly used to reduce the request volume:

import hashlib

from scrapy.exceptions import IgnoreRequest


class DuplicateRequestMiddleware(object):

    # initialize the Redis connection (RedisOpera is the author's helper)
    def __init__(self):
        self.RedisQuery = RedisOpera('query')

    # deduplicate URLs against Redis
    def process_request(self, request, spider):
        spider.logger.info('duplicating >>>>>> %s' % request.url)
        u = request.url
        if 'vpic' in u:
            # hash the post id that follows '=' in the URL
            b = u.index('=') + 1
            MD5 = hashlib.md5()
            MD5.update(bytes(u[b:], 'utf-8'))
            if self.RedisQuery.query(MD5.hexdigest()):
                spider.logger.info('duplicate >>>>>> %s' % request.url)
                raise IgnoreRequest("IgnoreRequest : %s" % request.url)
        else:
            # other pages are not deduplicated
            spider.logger.info('skip dedup >>>>>> %s' % request.url)
            return None
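The RedisOpera helper is also not shown in the original; a minimal sketch of what it might look like, assuming the redis-py package and a local Redis instance:

# Hypothetical sketch of RedisOpera; host/port and set name are assumptions.
import redis

class RedisOpera(object):
    def __init__(self, name):
        self.name = name  # Redis set holding the hashes we have seen
        self.client = redis.StrictRedis(host='127.0.0.1', port=6379, db=0)

    def query(self, digest):
        # True if the digest was seen before; remember it otherwise
        if self.client.sismember(self.name, digest):
            return True
        self.client.sadd(self.name, digest)
        return False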

Specifying the Referer and User-Agent request headers (load() is the author's configuration helper holding the target domains):

LOGIN_HEADERS = {
    'Host': '%s' %load()['domain'].split(',')[index],
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Referer': 'http://%s/index.php' %load()['domain'].split(',')[index],
    'Content-Type': 'application/x-www-form-urlencoded',
    'Connection': 'keep-alive',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0'
}
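These headers are then attached when the spider issues requests; a sketch of how that might look inside a spider callback, where content_url, title, visits, and sid are assumed from context:

# Sketch: inside a spider callback, issuing a request with the headers and a
# cookie from the pool; content_url, title, visits and sid are assumptions.
yield scrapy.Request(
    url=content_url,
    headers=LOGIN_HEADERS,
    cookies={'sid': sid},
    meta={'content_url': content_url, 'title': title, 'visits': visits},
    callback=self.parse,
)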

Parsing the page, mainly extracting the fields and storing them in the database:

# requires: from bs4 import BeautifulSoup; import re, datetime, pytz
def parse(self, response):
    item = DarkwebSpiderItem()
    soup = BeautifulSoup(response.text, 'lxml')
    # transaction price
    table = soup.find_all(name='table', attrs={'class': 'v_table_1'})
    tr_list = table[0].find_all('tr')
    td_list_price = tr_list[4].find_all('td')
    price_list = re.findall(r'\d+\.?\d*', td_list_price[3].get_text())
    price = price_list[0]
    # sales volume
    td_list_volume = tr_list[6].find_all('td')
    volume_list = re.findall(r'\d+', td_list_volume[3].get_text())
    volume = volume_list[0]
    # post content
    content_list = soup.find_all(name='div', attrs={'class': 'content'})
    if len(content_list):
        content = content_list[0].get_text()
    else:
        content = ''
    # image URLs
    img_list = soup.find_all(name='img', attrs={'class': 'postimage'})
    item['image_urls'] = []
    if len(img_list) > 0:
        url_list = []
        for img_url in img_list:
            if img_url['src'].find(self.image_domain) > 0:
                download_url = img_url['src']
            else:
                download_url = self.target_url + img_url['src'].replace('./', '')
            url_list.append(download_url)
        item['image_urls'] = url_list
    # publish time
    p_author = soup.find_all('p', attrs={'class': 'author'})
    origin_publish_time = p_author[0].contents[4].strip()
    if len(origin_publish_time):
        # keep only the digits: YYYY[M]MDDHHMM
        ptr_list = [x for x in filter(str.isdigit, str(origin_publish_time))]
        pt_str = "".join(ptr_list)
        y = pt_str[:4]
        d = pt_str[-6:-4]
        h = pt_str[-4:-2]
        minute = pt_str[-2:]
        if len(pt_str) == 12:
            m = pt_str[4:6]
        else:
            m = '0' + pt_str[4]
        publish_time = y + '-' + m + '-' + d + ' ' + h + ':' + minute + ':00'
        # convert the (naive, server-local) timestamp to UTC
        dt = datetime.datetime.strptime(publish_time, "%Y-%m-%d %H:%M:%S")
        publish_time = dt.astimezone(pytz.timezone('UTC')).strftime('%Y-%m-%d %H:%M:%S')
    else:
        publish_time = ''

    b = response.meta['content_url'].index('=') + 1
    item['title_id'] = (response.meta['content_url'])[b:]
    item['title'] = response.meta['title']
    item['url'] = response.meta['content_url']
    item['content'] = content
    item['price'] = price
    item['volume'] = volume
    item['visits'] = response.meta['visits']
    item['publish_time'] = publish_time
    yield item

When a record matching a monitored keyword is found, a notification goes out by e-mail (using Django's mail utilities):

from django.conf import settings
from django.core.mail import EmailMultiAlternatives
from django.template.loader import render_to_string

def send_html_email(title, html, content, mailto, cc=None):
    html_content = render_to_string(html, content)
    # fall back to the configured recipients when no explicit address is given
    send_mail = settings.EMAIL_TO.split(',') if mailto == '' else mailto.split(',')

    msg = EmailMultiAlternatives(title, html_content,
                                 settings.EMAIL_LUCKY_NAME + '<' + settings.EMAIL_HOST_USER + '>',
                                 send_mail, cc=cc)
    msg.attach_alternative(html_content, "text/html")
    msg.send()
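A call might look like the following; the template path and context keys are assumptions:

# Hypothetical usage; template path and context keys are assumptions.
send_html_email(
    title='[Darkweb monitor] new keyword hit',
    html='email/alert.html',
    content={'keyword': 'example.com', 'url': item['url']},
    mailto='',   # empty string falls back to settings.EMAIL_TO
)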

IV. Visualizing the Crawler Data

Grafana is a cross-platform, open-source metrics analysis and visualization tool: it queries and visualizes collected data and can send timely notifications, and since version 4.3 it has supported MySQL data sources. Readers can therefore use Grafana to visualize the crawled data and configure e-mail alerts; the specific steps are left for you to look up.

In our case, however, the dark web real-time monitoring program is integrated into a security management platform, so we built our own management page, which makes searching and browsing the dark web crawler data more convenient; the rough effect is as follows:
[Figure: self-developed management page for searching and viewing the crawled dark web data]
V. Summary

To most people, dark web monitoring is shrouded in mystery. This article has lifted that veil step by step: starting with setting up the proxy server, then walking through common anti-crawling strategies, explaining how to develop a dark web site monitoring program from scratch, and finally introducing Grafana, a visual monitoring tool that can be paired with the monitoring program.
