Getting the src of every tag of a given type in a piece of content

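The method below collects the src values of every tag of a given type found in some content, such as a web page. htmlStr is the content and type is the tag name: img for image tags (the comments in the code use img as the example), video for video tags, and so on.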

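    import java.util.HashSet;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /**
     * Collects the src values of every tag of the given type.
     * @param htmlStr the content to scan, for example the HTML of a web page
     * @param type the tag name, for example "img" or "video"
     * @return the set of src values found (empty if there are none)
     */
    public static Set<String> getSrcStr(String htmlStr, String type) {
        Set<String> srcs = new HashSet<String>();
        // Matches one opening tag of the requested type; for type "img" this is <img\b[^>]*>.
        // [^>]* stops at the end of the tag, so a match never runs on into later tags.
        String regEx_tag = "<" + Pattern.quote(type) + "\\b[^>]*>";
        Pattern p_tag = Pattern.compile(regEx_tag, Pattern.CASE_INSENSITIVE);
        // Matches the src attribute inside a single tag; group 1 is the value
        Pattern p_src = Pattern.compile
                ("src\\s*=\\s*\"?(.*?)(\"|>|\\s+)", Pattern.CASE_INSENSITIVE);
        Matcher m_tag = p_tag.matcher(htmlStr);
        while (m_tag.find()) {
            // The whole matched tag, e.g. <img ... />
            String tag = m_tag.group();
            // Extract the src value from inside the tag
            Matcher m = p_src.matcher(tag);
            while (m.find()) {
                srcs.add(m.group(1));
            }
        }
        return srcs;
    }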

Usage example:

		String str="12345<video src=\"http://vd4.bdstatic.com/mda-jkjf1ab31ekxafc4/sc/mda-jkjf1ab31ekxafc4.mp4?playlist=%5B%22hd%22%2C%22sc%22%5D\" width=\"100px\"></video>,<video src=\"http://vd4.bdstatic.com/mda-jkjf1ab31ekxafc4/sc/mda-jkjf1ab31ekxafc4.mp4?playlist=%5B%22hd%22%2C%22sc%22%5g\"></video>12345";
        System.out.println(getSrcStr(str,"video"));

Example result (a Set of the addresses):

[http://vd4.bdstatic.com/mda-jkjf1ab31ekxafc4/sc/mda-jkjf1ab31ekxafc4.mp4?playlist=%5B%22hd%22%2C%22sc%22%5D, http://vd4.bdstatic.com/mda-jkjf1ab31ekxafc4/sc/mda-jkjf1ab31ekxafc4.mp4?playlist=%5B%22hd%22%2C%22sc%22%5g]
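
The same two-step matching (find each tag, then read its src) also works for other tag names such as img. Below is a minimal self-contained sketch of that idea, not the exact method above. The names SrcDemo and extractSrc are hypothetical. The attribute pattern is my own variant: it assumes src values may be double-quoted, single-quoted, or unquoted, and it requires whitespace before src so that attributes like data-src are skipped. A LinkedHashSet is used so the values print in the order they appear.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// SrcDemo and extractSrc are hypothetical names used only for this sketch.
class SrcDemo {

    // Step 2 pattern: a src attribute whose value is double-quoted (group 1),
    // single-quoted (group 2), or unquoted (group 3).
    private static final Pattern SRC = Pattern.compile(
            "\\ssrc\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
            Pattern.CASE_INSENSITIVE);

    static Set<String> extractSrc(String html, String tagName) {
        // LinkedHashSet drops duplicates and keeps first-seen order.
        Set<String> result = new LinkedHashSet<>();
        // Step 1 pattern: one opening tag with the requested name.
        Pattern tag = Pattern.compile(
                "<" + Pattern.quote(tagName) + "\\b[^>]*>",
                Pattern.CASE_INSENSITIVE);
        Matcher tagMatcher = tag.matcher(html);
        while (tagMatcher.find()) {
            Matcher srcMatcher = SRC.matcher(tagMatcher.group());
            while (srcMatcher.find()) {
                // Exactly one of the three groups is non-null per match.
                for (int g = 1; g <= 3; g++) {
                    if (srcMatcher.group(g) != null) {
                        result.add(srcMatcher.group(g));
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p>intro</p><img src=\"a.png\" alt=\"A\"><IMG width=10 src='b.jpg'/>"
                + "<video src=\"clip.mp4\"></video><img src=c.gif>";
        System.out.println(extractSrc(html, "img"));   // prints [a.png, b.jpg, c.gif]
        System.out.println(extractSrc(html, "video")); // prints [clip.mp4]
    }
}
```

Regular expressions only approximate HTML: a `>` inside an attribute value, or tags inside comments and scripts, can still confuse this sketch, so a real HTML parser may be the safer choice for untrusted pages.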

Reposted from blog.csdn.net/Jarbein/article/details/103614892