Nginx log analysis: online spider statistics


How it was done

In fact, ChatGPT wrote it for me; it is very powerful!

So I will not explain the code and will just post it directly. If you have any questions, ask an AI to help analyze it; if you want to use one, you can find the AI robot in the upper-right corner of this website and click to use it.

Code demo

This code creates a new standalone page template in Typecho; select it yourself when creating the page.

<?php
/**
 * Spider statistics
 *
 * @package custom
 */
if (!defined('__TYPECHO_ROOT_DIR__')) exit;
$this->need('header.php');

?>
<title>Online Website Log Analysis - Online Spider Statistics</title>
<meta name="description" content="A handy tool page for counting website spiders. After uploading a log file, it quickly shows hit counts for the major search-engine spiders.">
<div style="background-color: #ffffff; padding: 20px;">
    <h1>Spider Statistics Analysis Tool</h1>
    <form method="post" enctype="multipart/form-data">
        <input type="file" name="logfile" />
        <button type="submit">Upload log file</button>
    </form>

    <?php
    if (isset($_FILES['logfile'])) {
        $file = $_FILES['logfile'];
        if ($file['error'] === 0) {
            $pathinfo = pathinfo($file['name']);
            if (isset($pathinfo['extension']) && strtolower($pathinfo['extension']) === 'log') {
                $handle = fopen($file['tmp_name'], "r");
                // per-spider counters, keyed by the substring to look for in each line
                $count = ['Baiduspider' => 0, 'Googlebot' => 0, 'bingbot' => 0, 'Sogou' => 0];
                while (($line = fgets($handle)) !== false) {
                    foreach ($count as $key => &$value) {
                        // case-insensitive substring match against the log line
                        if (stripos($line, $key) !== false) {
                            $value++;
                        }
                    }
                }
                // break the reference left by the foreach above, so the later
                // by-value foreach over $count cannot corrupt the last element
                unset($value);
                fclose($handle);

                arsort($count);
                $data = [];
                foreach ($count as $key => $value) {
                    $data[] = [$key, $value];
                }
                ?>

                <div id="chart_div" style="width: 900px; height: 500px;"></div>

                <script type="text/javascript" src="https://www.gstatic.com/charts/loader.js"></script>
                <script type="text/javascript">
                    google.charts.load('current', {'packages':['corechart']});
                    google.charts.setOnLoadCallback(drawChart);

                    function drawChart() {
                        var data = google.visualization.arrayToDataTable([
                            ['Spider', 'Visits'],
                            <?php foreach ($data as $item) { ?>
                                ['<?php echo $item[0]; ?>', <?php echo $item[1]; ?>],
                            <?php } ?>
                        ]);

                        var options = {
                            title: 'Spider Statistic',
                            pieHole: 0.4,
                            colors: ['#3366CC', '#DC3912', '#FF9900', '#109618']
                        };

                        var chart = new google.visualization.PieChart(document.getElementById('chart_div'));
                        chart.draw(data, options);
                    }
                </script>

                <?php
            } else {
                echo "<p>Please upload a log file in .log format.</p>";
            }
        } else {
            echo "<p>Error uploading the log file, please try again.</p>";
        }
    }
    ?>

</div>

<?php $this->need('footer.php'); ?>
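If you just want a quick sanity check of the numbers without the web page, the same per-line, case-insensitive counting can be sketched in the shell with grep. The sample log below is invented purely for illustration; point grep at your real access log instead:

```shell
# Build a tiny sample log purely for illustration
printf '%s\n' \
    '1.2.3.4 - - "GET / HTTP/1.1" 200 Baiduspider' \
    '5.6.7.8 - - "GET /a HTTP/1.1" 200 Googlebot' \
    '9.9.9.9 - - "GET /b HTTP/1.1" 200 baiduspider' > sample.log

# -i matches case-insensitively (like stripos() in the PHP above),
# -c counts matching lines
for bot in Baiduspider Googlebot bingbot Sogou; do
    printf '%s: %s\n' "$bot" "$(grep -ic "$bot" sample.log)"
done
# → Baiduspider: 2, Googlebot: 1, bingbot: 0, Sogou: 0
```

Note that grep -c counts matching lines, which matches the PHP logic of incrementing at most once per spider per line.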

Precautions

The maximum upload size is limited by your nginx and PHP configuration. It is best not to raise it by too much, even though this code reads the file line by line, which avoids loading the entire file into memory at once and keeps memory usage low.
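For reference, the relevant limits live in php.ini and in the nginx server block; the values below are only illustrative examples, not recommendations:

```ini
; php.ini – both values must be at least as large as the uploaded log
upload_max_filesize = 20M
post_max_size = 20M
```

On the nginx side, the matching directive is `client_max_body_size 20m;`; reload nginx and restart PHP-FPM after changing these.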

However, the logs of a typical website are very large, and manually copying today's entries out of the file is tedious, so I wrote a shell script to split the log. It stores today's entries in a separate file; if it is run multiple times, it checks whether the file already exists and rewrites today's entries into it.

#!/bin/bash
log_path="/www/wwwlogs/www.xxxx.com.log"
log_dir="/www/wwwlogs"
current_date=$(date +%Y-%m-%d)

# Check that the source log file exists
if [ ! -f "$log_path" ]; then
    echo "Log file does not exist"
    exit 1
fi

# Create the backup directory if it does not exist
if [ ! -d "$log_dir" ]; then
    mkdir -p "$log_dir"
fi

# Regenerate today's log file on every run; overwriting (rather than
# appending) keeps the file free of duplicates when the script runs repeatedly
grep "$(date +'%d/%b/%Y')" "$log_path" > "$log_dir/$current_date.log"

Remember to modify the paths, then use the scheduled-task feature of the Baota (宝塔) panel to run it several times a day, or once every few hours.
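If you prefer editing the crontab directly rather than using the panel, an equivalent scheduled task could look like this (the script path here is a made-up example):

```shell
# m h dom mon dow  command – run the log-splitting script every 6 hours
0 */6 * * * /bin/bash /www/server/scripts/split_log.sh
```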

Future directions

This is just a simple implementation. The interface could still be beautified, and the statistics only track the cumulative number of spider visits, without counting which pages were crawled or how many distinct spiders there are. I will update it later if needed.


Origin blog.csdn.net/qq_22163803/article/details/130707295