Online spider statistics: analyze website logs to count search engine spiders - Youcaiyouaiwang
How it was done
To be honest, ChatGPT wrote this for me; it is very capable!
So I won't explain the code and will simply post it. If you have any questions, ask an AI to help analyze it. If you need an AI, you can find the AI robot in the upper-right corner of this website and click it to use it.
The code
This code is a Typecho custom page template. Create a new standalone page in Typecho and select this template for it.
<?php
/**
* Spider Statistics
*
* @package custom
*/
if (!defined('__TYPECHO_ROOT_DIR__')) exit;
$this->need('header.php');
?>
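<title>Online Website Log Analysis - Online Spider Statistics</title>
<meta name="description" content="A tool page for conveniently counting website spiders. After you upload a log file, it quickly shows how many hits came from each major search engine spider.">
<div style="background-color: #ffffff; padding: 20px;">
<h1>Spider Statistics Analysis Tool</h1>
<form method="post" enctype="multipart/form-data">
<input type="file" name="logfile" />
<button type="submit">Upload log file</button>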
</form>
<?php
if (isset($_FILES['logfile'])) {
    $file = $_FILES['logfile'];
    if ($file['error'] === UPLOAD_ERR_OK) {
        // PATHINFO_EXTENSION returns '' for names without an extension, so no undefined-index notice
        $extension = strtolower(pathinfo($file['name'], PATHINFO_EXTENSION));
        if ($extension === 'log') {
            // Read the uploaded file line by line instead of loading it all into memory
            $handle = fopen($file['tmp_name'], 'r');
            $count = ['Baiduspider' => 0, 'Googlebot' => 0, 'bingbot' => 0, 'Sogou' => 0];
            while (($line = fgets($handle)) !== false) {
                foreach (array_keys($count) as $spider) {
                    // Case-insensitive search for the spider name anywhere in the line
                    if (stripos($line, $spider) !== false) {
                        // Increment by key rather than through a foreach reference: a leftover
                        // reference would corrupt the values in the by-value loop below
                        $count[$spider]++;
                    }
                }
            }
            fclose($handle);
            arsort($count); // Highest hit count first
            $data = [];
            foreach ($count as $spider => $hits) {
                $data[] = [$spider, $hits];
            }
?>
<div id="chart_div" style="width: 900px; height: 500px;"></div>
<script type="text/javascript" src="https://www.gstatic.com/charts/loader.js"></script>
<script type="text/javascript">
google.charts.load('current', {'packages':['corechart']});
google.charts.setOnLoadCallback(drawChart);
function drawChart() {
    var data = google.visualization.arrayToDataTable([
        ['Spider', 'Hits'],
        <?php foreach ($data as $item) { ?>
        [<?php echo json_encode($item[0]); ?>, <?php echo (int) $item[1]; ?>],
        <?php } ?>
    ]);
    var options = {
        title: 'Spider Statistics',
        pieHole: 0.4, // Renders the pie chart as a donut
        colors: ['#3366CC', '#DC3912', '#FF9900', '#109618']
    };
    var chart = new google.visualization.PieChart(document.getElementById('chart_div'));
    chart.draw(data, options);
}
</script>
<?php
        } else {
            echo "<p>Please upload a log file with the .log extension.</p>";
        }
    } else {
        echo "<p>An error occurred while uploading the log file. Please try again.</p>";
    }
}
?>
</div>
<?php $this->need('footer.php'); ?>
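If you have shell access to the server, a rough equivalent of the counting logic can be run without uploading anything. The following is a minimal sketch, not part of the original tool, and the function name count_spiders is hypothetical. Like the stripos() check in the PHP template, it counts the lines that contain each spider name, ignoring case.

```shell
# count_spiders: hypothetical helper, not from the original post.
# Prints "name hits" pairs, highest first, where hits is the number of
# lines that mention the spider name (case-insensitive).
count_spiders() {
    local log_file="$1" spider hits
    for spider in Baiduspider Googlebot bingbot Sogou; do
        # -c prints the number of matching lines, -i ignores case,
        # -a treats the log as text even if it contains stray binary bytes.
        # grep exits 1 when nothing matches but still prints 0.
        hits=$(grep -aci -e "$spider" "$log_file" || true)
        printf '%s %s\n' "$spider" "$hits"
    done | sort -k2,2nr
}
```

For example, count_spiders /www/wwwlogs/www.xxxx.com.log prints one line per spider, sorted by hit count.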
Precautions
The maximum upload size is capped by your nginx and PHP configuration (nginx's client_max_body_size, PHP's upload_max_filesize and post_max_size). I recommend not raising these limits too far. The code does process the upload line by line, which avoids reading the entire file into memory at once and keeps memory usage low, but the whole file still has to be uploaded first.
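One way to stay under the upload limit without raising it is to strip the log down to the lines the page actually counts before uploading. Since the PHP code only counts lines that mention one of the four spider names, dropping every other line should give the same counts from a much smaller file. A minimal sketch; the function name filter_spider_lines is hypothetical.

```shell
# filter_spider_lines: hypothetical helper, not from the original post.
# Keeps only the lines of SRC that mention one of the four tracked spiders
# (case-insensitive) and writes them to DEST.
filter_spider_lines() {
    local src="$1" dest="$2"
    # -i ignores case, -E enables the | alternation, -a treats the log as text
    grep -aiE -e 'Baiduspider|Googlebot|bingbot|Sogou' "$src" > "$dest"
}
```

For example, filter_spider_lines big.log spiders.log, then upload spiders.log, which keeps the .log extension the page requires.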
In practice, a site's access log grows very large, and copying out the relevant part by hand is tedious. So I wrote a shell script that extracts today's entries into a separate file named after the date. If the script runs several times a day, it rewrites that file with all of today's entries so far, so repeated runs do not produce duplicate lines.
#!/bin/bash
# Full access log of the site (change to your own path)
log_path="/www/wwwlogs/www.xxxx.com.log"
# Directory where the daily extract is written
log_dir="/www/wwwlogs"
current_date=$(date +%Y-%m-%d)
# Default nginx/Apache access logs write dates like 05/Mar/2024; the C locale keeps %b in English
log_day=$(LC_ALL=C date +%d/%b/%Y)
# Check that the log file exists
if [ ! -f "$log_path" ]; then
    echo "Log file does not exist"
    exit 1
fi
# Create the output directory if it does not exist
if [ ! -d "$log_dir" ]; then
    mkdir -p "$log_dir"
fi
# Extract all of today's lines from the full log. Overwrite (>) rather than append (>>):
# each run already collects the whole day, so appending would duplicate entries.
grep -F "$log_day" "$log_path" > "$log_dir/$current_date.log"
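The script above only handles the current day. If you would rather run it once shortly after midnight and capture the complete previous day, the same idea can be parameterized by date. This is a minimal sketch, assuming GNU date (for -d) as found on typical Linux servers; extract_log_day is a hypothetical name. It matches the bracketed timestamp "[05/Mar/2024:" rather than the bare date, so a date that merely appears in a requested URL is not picked up.

```shell
# extract_log_day: hypothetical, parameterized variant of the script above.
# Writes every line of LOG whose timestamp falls on DAY (YYYY-MM-DD) into OUT,
# overwriting OUT so that repeated runs never duplicate lines.
extract_log_day() {
    local log="$1" day="$2" out="$3" stamp
    # Access logs write dates like 05/Mar/2024; the C locale keeps %b in English
    stamp=$(LC_ALL=C date -d "$day" +%d/%b/%Y) || return 1
    # -F matches "[" literally; grep exits 1 when no line matches, leaving OUT empty
    grep -aF -e "[$stamp:" "$log" > "$out" || true
}
```

For example, after day=$(date -d yesterday +%Y-%m-%d), run extract_log_day /www/wwwlogs/www.xxxx.com.log "$day" "/www/wwwlogs/$day.log".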
Remember to change the paths, then add the script as a scheduled task in the Baota (BT) panel and run it several times a day, for example once every few hours.
Future updates
This is only a simple implementation. The interface could be polished, and the statistics only count the total number of spider hits; they do not yet show which pages were crawled or how many distinct spiders visited. I will update it later if needed.
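As a starting point for the two missing statistics, here is a minimal sketch of both. It assumes the combined log format that nginx uses by default, where the client IP is the first whitespace-separated field and the request path is the seventh. Both function names are hypothetical, and counting distinct client IPs is only one possible reading of "how many distinct spiders".

```shell
# top_crawled_pages: hypothetical sketch. Lists the most-requested paths
# (field 7 in the combined format) among lines mentioning SPIDER.
top_crawled_pages() {
    local log="$1" spider="$2" limit="${3:-10}"
    grep -aiF -e "$spider" "$log" \
        | awk '{print $7}' \
        | sort | uniq -c | sort -k1,1nr \
        | head -n "$limit"
}

# count_spider_ips: hypothetical sketch. Counts distinct client IPs
# (field 1) among lines mentioning SPIDER.
count_spider_ips() {
    local log="$1" spider="$2"
    grep -aiF -e "$spider" "$log" | awk '{print $1}' | sort -u | wc -l
}
```

For example, top_crawled_pages access.log Baiduspider 20 shows the 20 paths Baiduspider requested most, each prefixed with its request count.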