Crawling content by URL

Data preparation

ACTION_ID|ACTION_OBJ_ID|URL|HOST
11103|Kugou-3f04b986936e95b0e4020e05026f9a74|http://trackercdngz.kugou.com/i/v2/?album_audio_id=105339901&behavior=play&module=&cmd=26&token=44d53db5973acde2ce5aacdbb1788236ff5a278849d3afddc7cbaa5517badb3a&album_id=8227853&hash=3f04b986936e95b0e4020e05026f9a74&userid=901736044&pid=2&vipType=0&version=8969&area_code=1&appid=1005&mid=211945784553424907856312531370349722437&key=128a208d0b671b5319a1ff1e78a90ef4&pidversion=3001&with_res_tag=1|trackercdngz.kugou.com
11103|Kugou-5d640ccca02c0c37c2ef3426ef8fa9db|http://trackercdngz.kugou.com/i/v2/?album_audio_id=32135357&behavior=play&module=&cmd=26&token=b0f00fba9dee5a98dfe88d8df95f2919294123e37b9187eb4a8a67f52550f10b&album_id=970745&hash=5d640ccca02c0c37c2ef3426ef8fa9db&userid=1228335680&pid=2&vipType=0&version=8969&area_code=1&appid=1005&mid=266959710960415472701953415488837088398&key=5611023d4aa067471aa6cf483c871cf7&pidversion=3001&with_res_tag=1|trackercdngz.kugou.com
11103|Kugou-fb4bb141b3742d7f2546cca5ef9b3297|http://trackercdngz.kugou.com/i/v2/?album_id=0&version=8969&pid=2&album_audio_id=54137427&key=0ac758b6944ddddf53b8dfe2345902e1&pidversion=3001&appid=1005&behavior=play&area_code=1&cmd=25&hash=fb4bb141b3742d7f2546cca5ef9b3297&with_res_tag=1|trackercdngz.kugou.com
11103|Kugou-7d7f2794d22b2bf686e537895f7b04ac|http://trackercdngz.kugou.com/i/v2/?album_id=4060289&version=8969&pid=2&album_audio_id=88669276&key=dbb76cb19d8bd553a1d23e7590f97313&pidversion=3001&appid=1005&behavior=play&area_code=1&cmd=25&hash=7d7f2794d22b2bf686e537895f7b04ac&with_res_tag=1|trackercdngz.kugou.com
11103|Kugou-5dccdfce96ca976e66cfdef7e36bda5b|http://trackercdngz.kugou.com/i/v2/?cmd=26&pid=3&authType=1&hash=5dccdfce96ca976e66cfdef7e36bda5b&album_audio_id=64517582&key=bb6de0fdd1bc0bf7f4104a38bce6cf24&behavior=play&module=collection&appid=1000&mid=8b207915bb23fd8b340bfff2f9a61411a749cbec&userid=593182243&token=c310629842abab0d137d607caaf04b24c3307ab199884aee8ea9b750d982ecb5&version=8970&vipType=0&area_code=1&pidversion=3001&album_id=1985016|trackercdngz.kugou.com
11103|Kugou-38ca88b56d2963f88efe110424bacec0|http://trackercdngz.kugou.com/i/v2/?album_audio_id=32025730&behavior=play&module=&cmd=26&token=e9d5586b4815815aedea02e2c22c2a58abd8c4450c395f321e94d03bf0d9210e&album_id=958461&hash=38ca88b56d2963f88efe110424bacec0&userid=670771202&pid=2&vipType=0&version=8969&area_code=1&appid=1005&mid=335799197578994737406660220062504299060&key=d25130cddd43aa6d048ad571155e73b5&pidversion=3001&with_res_tag=1|trackercdngz.kugou.com
11103|Kugou-71a3828001d931b239c191144346b472|http://trackercdngz.kugou.com/i/v2/?pid=2&mid=12922136514196526362614468482428312988&cmd=26&token=23ddbe86a1332d9d41264fd1999d24f1ab9e4e9a3fde6d580e8fc0887d5b556c&hash=71a3828001d931b239c191144346b472&area_code=1&behavior=play&appid=1005&module=&vipType=0&userid=844719854&album_id=8537619&pidversion=3001&key=f6b49e3c2034a44cfeeb3a714acec88f&version=8969&album_audio_id=107476985&with_res_tag=1|trackercdngz.kugou.com
11103|Kugou-5893e073d72ee7cf8606e20a1affbe58|http://trackercdngz.kugou.com/i/v2/?pid=2&cmd=25&key=a8e62109da25eddb938966717186aaf0&hash=5893e073d72ee7cf8606e20a1affbe58&area_code=1&version=8851&behavior=play&appid=1005&with_res_tag=1|trackercdngz.kugou.com
11103|Kugou-3e42bc172f305fcf7381f462eb8a4f00|http://trackercdngz.kugou.com/i/v2/?album_audio_id=62056269&behavior=play&module=&cmd=26&token=ae1e3bbb114b624692de8cfe929ccc6e46e339ea27faacb5beab7a2dfda0aa3f&album_id=2398840&hash=3e42bc172f305fcf7381f462eb8a4f00&userid=864502613&pid=2&vipType=0&version=8988&area_code=1&appid=1005&mid=75026101713079366117306031163391060620&key=eaaa6f937a74ebcac8045668309908d2&pidversion=3001&with_res_tag=1|trackercdngz.kugou.com
11103|Kugou-c5fdf4564d33fc695175acab5757e353|http://trackercdngz.kugou.com/i/v2/?pid=2&mid=121982741990180563296354506344232692018&cmd=26&token=e485d31322c89918398ddb4fc0eb98b7e40dfbddf1275ad82e28045b21fdac2e&hash=c5fdf4564d33fc695175acab5757e353&area_code=1&behavior=play&appid=1005&module=collection&vipType=0&userid=972463139&album_id=964538&key=9fbb2869771b03950266b7fe14bf9fa5&version=8851&with_res_tag=1|trackercdngz.kugou.com
11103|Kugou-e8badf9f9ef8f91fabedbb4449123dea|http://trackercdngz.kugou.com/i/v2/?cmd=25&pid=3&authType=1&hash=e8badf9f9ef8f91fabedbb4449123dea&album_audio_id=107578621&key=9856858e50164aa15b9ad0b3c6ef54f1&behavior=play&appid=1000&version=8970&area_code=1&pidversion=3001|trackercdngz.kugou.com
11103|Kugou-718567d263c17bb3945b596cdd887c27|http://trackercdngz.kugou.com/i/v2/?cmd=25&pid=3&authType=1&hash=718567d263c17bb3945b596cdd887c27&key=1106ab1943968e4fdb64262eb8a88dcb&behavior=play&appid=1012&version=8909&area_code=1&album_id=8308163|trackercdngz.kugou.com
11103|Kugou-c3a3c8d769744b6ea96a37cfcc02df4b|http://trackercdngz.kugou.com/i/v2/?album_audio_id=107567896&behavior=play&module=&cmd=26&token=6ceafcd200afc856a79491592c7095bec23fd77efb8cd852ec1835cd0f9d60f5&album_id=8554955&hash=c3a3c8d769744b6ea96a37cfcc02df4b&userid=1297179223&pid=2&vipType=0&version=8969&area_code=1&appid=1005&mid=24971358721241061144398421486362283051&key=3bb472b625b6fc3b8466aa6dc4247a6f&pidversion=3001&with_res_tag=1|trackercdngz.kugou.com
11103|Kugou-784440d86b7cfbf4d35ad73c6fd112f0|http://trackercdngz.kugou.com/i/v2/?album_audio_id=32105083&behavior=play&module=&cmd=26&token=d0734cb92c99591e28249ba27781bd5880a715d49f339614b69b64e5b434a7b4&album_id=967328&hash=784440d86b7cfbf4d35ad73c6fd112f0&userid=121146100&pid=2&vipType=0&version=8969&area_code=1&appid=1005&mid=9465339417365908340129665871990927573&key=aa99e845dc5033d243900d2dc98d2bbe&pidversion=3001&with_res_tag=1|trackercdngz.kugou.com
11103|Kugou-303d5dcf6a7542bf4686effe496706bb|http://trackercdngz.kugou.com/i/v2/?album_id=3156195&pid=2&pidversion=3001&cmd=25&key=5f8dad07987595715f8f4ed9cb2c0994&hash=303d5dcf6a7542bf4686effe496706bb&area_code=1&version=8969&behavior=play&appid=1005&album_audio_id=71172896&with_res_tag=1|trackercdngz.kugou.com
11103|Kugou-cdb465f34e79f4f8d6422581f8d213fa|http://trackercdngz.kugou.com/i/v2/?version=8969&pid=2&album_audio_id=41396673&key=803eee38677e89eeab70b52a9017c9a9&pidversion=3001&appid=1005&behavior=play&area_code=1&cmd=25&hash=cdb465f34e79f4f8d6422581f8d213fa&with_res_tag=1|trackercdngz.kugou.com
11103|Kugou-0a4d3971c000d9862c171e3d9403dbc5|http://trackercdngz.kugou.com/i/v2/?cmd=26&pid=3&authType=1&hash=0a4d3971c000d9862c171e3d9403dbc5&key=c69f44f6dcdce308b935bd78fd81dce8&behavior=play&module=collection&appid=1000&mid=81d9d027c2055e6e151ae0e133a7baad7af6ba68&userid=1160188413&token=a93a3f8e115f57bc191061e1c422a0d11f104606e656c696f445755a73aa4a9e&version=8955&vipType=0&area_code=1&pidversion=3001&album_id=1740396|trackercdngz.kugou.com
11103|Kugou-a3f421512e5ce3c96e3c2503461d5ad6|http://trackercdngz.kugou.com/i/v2/?album_id=684257&pid=2&pidversion=3001&cmd=25&key=bd8c51d96c1d269e4138025fb7170eb9&hash=a3f421512e5ce3c96e3c2503461d5ad6&area_code=1&version=8969&behavior=play&appid=1005&album_audio_id=29507609&with_res_tag=1|trackercdngz.kugou.com
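
Each record above is a single pipe-delimited line with four fields, and the second field carries the song hash behind a "Kugou-" prefix. As a minimal sketch of that layout (the class and method names here are hypothetical, and the sample line in main is abbreviated from the data above rather than copied in full), a record could be split like this:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class RecordSketch {
    private static final String PREFIX = "Kugou-";

    /** Splits one pipe-delimited record; returns an empty list if it has fewer than 4 fields. */
    static List<String> splitRecord(String line) {
        // "|" is a regex metacharacter, so it must be escaped for String.split
        String[] fields = line.split("\\|");
        if (fields.length < 4) {
            return Collections.emptyList();
        }
        return Arrays.asList(fields);
    }

    /** Returns the song hash from the second field, or null if it lacks the "Kugou-" prefix. */
    static String hashOf(List<String> fields) {
        String objId = fields.get(1);
        return objId.startsWith(PREFIX) ? objId.substring(PREFIX.length()) : null;
    }

    public static void main(String[] args) {
        // Abbreviated sample record (query string shortened for readability)
        String line = "11103|Kugou-5893e073d72ee7cf8606e20a1affbe58"
                + "|http://trackercdngz.kugou.com/i/v2/?pid=2&cmd=25"
                + "|trackercdngz.kugou.com";
        List<String> fields = splitRecord(line);
        System.out.println(fields.size());
        System.out.println(hashOf(fields));
    }
}
```

The header line of the file also splits into four fields, but its second field has no "Kugou-" prefix, so hashOf returns null for it and it can be filtered out the same way as any other non-Kugou record.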

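Use each record's URL and ID to query the Kugou website in reverse and retrieve the useful information. The project's pom file: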

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.da</groupId>
    <artifactId>kugou</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <dependencies>
        <!-- httpclient -->
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.3</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/com.alibaba/fastjson -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.31</version>
        </dependency>
    </dependencies>
</project>

Main program

package com.da.main;

import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.da.address.Adress;
import com.da.utils.HttpClientUtil;

public class MainSpider {
    public static void main(String[] args) {
        if (args.length < 2) {
            // String filePath = "C:/Users/Administrator/Desktop/music.txt";
            // String writePath = "C:/Users/Administrator/Desktop/music2.txt";
            // args = new String[] { filePath, writePath };
            System.err.println("Usage: MainSpider <input file> <output file>");
            return;
        }
        List<String> datalist = new ArrayList<>();
        // Build the request URLs from the input text file
        Set<String> urls = Adress.getUrls(args[0]);
        for (String url : urls) {
            String html = HttpClientUtil.doGet(url);
            // System.out.println(html);
            // doGet returns "" when the request fails; skip it rather than parse nothing
            if (html == null || html.isEmpty()) {
                continue;
            }
            JSONObject obj1 = JSON.parseObject(html);
            // Skip responses that do not carry a "data" object
            Object data = obj1 == null ? null : obj1.get("data");
            if (!(data instanceof JSONObject)) {
                continue;
            }
            JSONObject datas = (JSONObject) data;
            StringBuilder sb = new StringBuilder();
            // Four populated columns, ten empty columns, then the source name
            sb.append(datas.get("hash")).append(",").append(datas.get("song_name")).append(",")
                    .append(datas.get("author_name")).append(",").append(datas.get("album_name")).append(",")
                    .append(",").append(",").append(",").append(",").append(",").append(",").append(",").append(",")
                    .append(",").append(",").append("酷狗音乐").append("\t\n");
            datalist.add(sb.toString());
        }

        Adress.writeFiles(args[1], datalist);
    }
}
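
The main program joins its fields with bare commas, so a song or album title that itself contains an ASCII comma would shift every column after it. The following is a minimal sketch of one way to guard against that, assuming the output is meant to be read as CSV (the original does not say what consumes the file). The helper names csvField and csvLine are hypothetical, and the quoting follows the common CSV convention of wrapping a field in double quotes and doubling any embedded quote:

```java
final class CsvSketch {
    /** Quotes a single value for CSV output; null becomes an empty field. */
    static String csvField(Object value) {
        if (value == null) {
            return "";
        }
        String s = value.toString();
        boolean needsQuotes = s.contains(",") || s.contains("\"")
                || s.contains("\n") || s.contains("\r");
        if (!needsQuotes) {
            return s;
        }
        // Double embedded quotes, then wrap the whole field in quotes
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    /** Joins the values into one comma-separated line (no line terminator). */
    static String csvLine(Object... values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(csvField(values[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up values chosen to exercise the quoting rules
        System.out.println(csvLine("hash", "Hello, World", null, "Say \"Hi\""));
        // prints: hash,"Hello, World",,"Say ""Hi"""
    }
}
```

Note one difference from the original code: appending a null value to a StringBuilder writes the text "null", whereas this sketch writes an empty field.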

Utility classes

package com.da.utils;

import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.http.NameValuePair;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.utils.URIBuilder;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

public class HttpClientUtil {
    private static PoolingHttpClientConnectionManager connMgr;
    private static RequestConfig requestConfig;
    private static final int MAX_TIMEOUT = 5000;

    static {
        // Set up the connection pool
        connMgr = new PoolingHttpClientConnectionManager();
        // Set the connection pool size
        connMgr.setMaxTotal(200);
        connMgr.setDefaultMaxPerRoute(connMgr.getMaxTotal());

        RequestConfig.Builder configBuilder = RequestConfig.custom();
        // Connection timeout
        configBuilder.setConnectTimeout(MAX_TIMEOUT);
        // Read (socket) timeout
        configBuilder.setSocketTimeout(MAX_TIMEOUT);
        // Timeout for obtaining a connection from the pool
        configBuilder.setConnectionRequestTimeout(MAX_TIMEOUT);
        // Check that the connection is usable before submitting a request
        // configBuilder.setStaleConnectionCheckEnabled(true);
        // Set a proxy
        // configBuilder.setProxy(new HttpHost("119.249.48.235", 80));
        requestConfig = configBuilder.build();
    }

    public static String doGet(String url, Map<String, String> param) {

        // Create the HttpClient object
        CloseableHttpClient httpclient = HttpClients.custom().setConnectionManager(connMgr).build();

        String resultString = "";
        CloseableHttpResponse response = null;
        try {
            // Build the URI
            URIBuilder builder = new URIBuilder(url);
            if (param != null) {
                for (String key : param.keySet()) {
                    builder.addParameter(key, param.get(key));
                }
            }
            URI uri = builder.build();

            // Create the HTTP GET request
            HttpGet httpGet = new HttpGet(uri);
            httpGet.setConfig(requestConfig);

            // Execute the request
            response = httpclient.execute(httpGet);
            // Read the body only if the status code is 200
            if (response.getStatusLine().getStatusCode() == 200) {
                resultString = EntityUtils.toString(response.getEntity(), "UTF-8");
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (response != null) {
                    response.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return resultString;
    }

    public static String doGet(String url) {
        return doGet(url, null);
    }

    public static String doPost(String url, Map<String, String> param) {
        // Create the HttpClient object
        CloseableHttpClient httpClient = HttpClients.custom().setConnectionManager(connMgr).build();
        CloseableHttpResponse response = null;
        String resultString = "";
        try {
            // Create the HTTP POST request
            HttpPost httpPost = new HttpPost(url);
            httpPost.setConfig(requestConfig);
            // Build the parameter list
            if (param != null) {
                List<NameValuePair> paramList = new ArrayList<>();
                for (String key : param.keySet()) {
                    paramList.add(new BasicNameValuePair(key, param.get(key)));
                }
                // Simulate a form submission
                UrlEncodedFormEntity entity = new UrlEncodedFormEntity(paramList);
                httpPost.setEntity(entity);
            }
            // Execute the HTTP request
            response = httpClient.execute(httpPost);
            resultString = EntityUtils.toString(response.getEntity(), "utf-8");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                // response is still null if execute() threw, so guard the close
                if (response != null) {
                    response.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        return resultString;
    }

    public static String doPost(String url) {
        return doPost(url, null);
    }

    public static String doPostJson(String url, String json) {
        // Create the HttpClient object
        CloseableHttpClient httpClient = HttpClients.custom().setConnectionManager(connMgr).build();
        CloseableHttpResponse response = null;
        String resultString = "";
        try {
            // Create the HTTP POST request
            HttpPost httpPost = new HttpPost(url);
            httpPost.setConfig(requestConfig);
            // Create the request body
            StringEntity entity = new StringEntity(json, ContentType.APPLICATION_JSON);
            httpPost.setEntity(entity);
            // Execute the HTTP request
            response = httpClient.execute(httpPost);
            resultString = EntityUtils.toString(response.getEntity(), "utf-8");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                // response is still null if execute() threw, so guard the close
                if (response != null) {
                    response.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        return resultString;
    }
}

package com.da.address;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Adress {
    public static Set<String> getUrls(String path) {
        Set<String> urls = new HashSet<>();
        // String baseUrl = "http://www.kugou.com/song/#hash=%s&album_id=%s";
        String baseUrl = "http://www.kugou.com/yy/index.php?r=play/getdata&hash=%s&album_id=%s";
        BufferedReader br = null;
        try {
            br = new BufferedReader(new InputStreamReader(new FileInputStream(new File(path)), "utf-8"));
            String line;
            while ((line = br.readLine()) != null) {
                String[] fields = line.split("\\|");
                // System.out.println(fields.length);
                if (fields.length < 4)
                    continue;
                if (!fields[1].startsWith("Kugou-"))
                    continue;
                String hash = fields[1].replace("Kugou-", "");
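                String album_id;
                if (fields[2].indexOf("album_id") != -1) {
                    // Skip past "album_id=" (8 characters plus '=') to the start of the value
                    album_id = fields[2].substring(fields[2].indexOf("album_id") + 9);
                    // Note: the URL is only added when another parameter follows album_id;
                    // records where album_id is the last parameter are skipped
                    if (album_id.indexOf("&") != -1) {
                        album_id = album_id.substring(0, album_id.indexOf("&"));
                        urls.add(String.format(baseUrl, hash, album_id));
                    }
                }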
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return urls;
    }

    public static void writeFiles(String path, List<String> datas) {
        OutputStreamWriter osw = null;
        try {
            osw = new OutputStreamWriter(new FileOutputStream(new File(path)), "UTF-8");
            for (String d : datas) {
                osw.write(d);
                osw.flush();
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (osw != null) {
                try {
                    osw.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
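
The substring logic in getUrls only produces a request URL when another parameter follows album_id, so records whose URL ends with album_id are dropped. As a hedged alternative, here is a small sketch of a general query-parameter lookup that does not depend on parameter order. The class and method names are hypothetical, the sample URLs in main are abbreviated from the data above, and the sketch does no percent-decoding, which is enough for the numeric album_id values seen here:

```java
final class QueryParamSketch {
    /** Returns the raw value of the named query parameter, or null if it is absent. */
    static String queryParam(String url, String name) {
        int q = url.indexOf('?');
        if (q < 0) {
            return null;
        }
        String query = url.substring(q + 1);
        // Drop a trailing fragment such as "#top" if there is one
        int frag = query.indexOf('#');
        if (frag >= 0) {
            query = query.substring(0, frag);
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            // Exact key match, so "album_audio_id" is never mistaken for "album_id"
            if (key.equals(name)) {
                return eq < 0 ? "" : pair.substring(eq + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String last = "http://trackercdngz.kugou.com/i/v2/?cmd=25&pid=3&album_id=8308163";
        String middle = "http://trackercdngz.kugou.com/i/v2/?album_id=4060289&pid=2&album_audio_id=88669276";
        String missing = "http://trackercdngz.kugou.com/i/v2/?pid=2&cmd=25";
        System.out.println(queryParam(last, "album_id"));    // prints: 8308163
        System.out.println(queryParam(middle, "album_id"));  // prints: 4060289
        System.out.println(queryParam(missing, "album_id")); // prints: null
    }
}
```

Swapping this into getUrls would change which records are fetched, so the output would no longer match the result listing at the end of this post.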

Here, too, the program fetches JSON data and then parses it.

To package the project as a jar and run it, add the following to the pom file:

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <version>2.17</version>
        </plugin>

        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.2</version>
            <configuration>
                <appendAssemblyId>false</appendAssemblyId>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <mainClass>com.da.main.MainSpider</mainClass>
                    </manifest>
                </archive>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>

        <plugin>
            <artifactId>maven-source-plugin</artifactId>
            <version>2.1</version>
            <configuration>
                <attach>true</attach>
            </configuration>
            <executions>
                <execution>
                    <phase>compile</phase>
                    <goals>
                        <goal>jar</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>

    </plugins>
</build>

Then choose Run -> Maven install.
Finally, open cmd, change to the directory that contains the jar, and enter:

java -classpath kugou.jar com.da.main.MainSpider C:/Users/Administrator/Desktop/music.txt C:/Users/Administrator/Desktop/music2.txt

Final result:

5d640ccca02c0c37c2ef3426ef8fa9db,连借口都没有,孙子涵,辞旧,,,,,,,,,,,酷狗音乐   
c3a3c8d769744b6ea96a37cfcc02df4b,一次就好 (Live),杨宗纬,嗨,唱起来! 第四期,,,,,,,,,,,酷狗音乐  
303d5dcf6a7542bf4686effe496706bb,爱你每一天,阿斯满,云朵上的家乡,,,,,,,,,,,酷狗音乐    
c5fdf4564d33fc695175acab5757e353,The Phoenix,Fall Out Boy,Save Rock And Roll,,,,,,,,,,,酷狗音乐 
7d7f2794d22b2bf686e537895f7b04ac,那个人,周延英(英子-effie),那个人,,,,,,,,,,,酷狗音乐   
a3f421512e5ce3c96e3c2503461d5ad6,You've Changed,charles mcpherson,From This Moment On!,,,,,,,,,,,酷狗音乐   
3e42bc172f305fcf7381f462eb8a4f00,Boyfriend,Ashlee Simpson,Boy Crazy,,,,,,,,,,,酷狗音乐  
3f04b986936e95b0e4020e05026f9a74,嘟啦啦慢摇 (Remix),新旭,何去何从,,,,,,,,,,,酷狗音乐   
fb4bb141b3742d7f2546cca5ef9b3297,相思的债 (DJ版),陈瑞,未知专辑,,,,,,,,,,,酷狗音乐  
38ca88b56d2963f88efe110424bacec0,还有我,任贤齐,如果没有你,,,,,,,,,,,酷狗音乐   
71a3828001d931b239c191144346b472,9277,深七,9277,,,,,,,,,,,酷狗音乐    
784440d86b7cfbf4d35ad73c6fd112f0,捕风的汉子,谭咏麟,爱的根源,,,,,,,,,,,酷狗音乐  


Reposted from blog.csdn.net/qq_35641192/article/details/80546933