A simple example of crawling table data from an HTML page

As a beginner at web crawling, I relied on the all-powerful Baidu to learn the basics and put together a simple example, which I am sharing here so that everyone can learn together.

 

First of all, to get the data you want from a page, you have to fetch that page, then parse the fetched page with Jsoup into a Document object and perform a series of operations on it. My writing skills are limited, so here is the code directly:

 


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.io.*;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.List;

public class Test {

    public static void main(String[] args) throws IOException {
        System.out.println("start");
        Test d = new Test();
        String str = d.getHtml();
        d.readHtml(str);
        System.out.println("end");
    }

    // Holds the entire page returned for the URL, as a string
    StringBuffer buffer = new StringBuffer();

    // Connect to the URL and return the page source
    public String getHtml() throws IOException {
        String urlPath = "http://www.dyhjw.com/dyhjw/etf.html";
        URL url = new URL(urlPath);
        URLConnection conn = url.openConnection();
        InputStream in = conn.getInputStream();
        // Byte stream -> character stream
        InputStreamReader reader = new InputStreamReader(in, "UTF-8");
        // Read line by line
        BufferedReader breader = new BufferedReader(reader);
        // Read the data
        String line = "";
        while ((line = breader.readLine()) != null) {
            buffer.append(line); // accumulate the string of the page
        }
        breader.close();
        return buffer + "";
    }

    public void readHtml(String html) {
        // Use Jsoup to parse the HTML into a Document object
        Document document = Jsoup.parse(html);
        // Get the tr rows of the table in the page
        Elements trs = document.select("table").select("tr");
        List<Object[]> list = new ArrayList<>();
        // Output location
        File file = new File("d://xxxx.txt");
        FileWriter fWriter = null;
        // Note: the file is only written when it does not exist yet
        if (!file.exists()) {
            try {
                file.createNewFile();
                fWriter = new FileWriter(file);
                fWriter.append("Date (Beijing)\tNet position (t)\tTotal value (US $)\tDecrease (t)\tAffect (gold and silver)\r\n");
                // Start at 1 to skip the header row
                for (int i = 1; i < trs.size(); i++) {
                    Elements tds = trs.get(i).select("td");
                    Object[] obj = {
                            tds.get(0).text(),
                            Double.parseDouble(tds.get(1).text()),
                            Double.parseDouble(tds.get(2).text()),
                            tds.get(3).text(),
                            tds.get(4).text()
                    };
                    list.add(obj);
                    // Join the cell texts of this row with tabs
                    String txt = "";
                    for (int j = 0; j < tds.size(); j++) {
                        if (txt.isEmpty()) { // compare content, not references (was txt == "")
                            txt = tds.get(j).text();
                        } else {
                            txt = txt + "\t" + tds.get(j).text();
                        }
                    }
                    fWriter.append(txt + "\r\n");
                    fWriter.flush();
                }
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                try {
                    // fWriter is still null if creating the file failed
                    if (fWriter != null) {
                        fWriter.close();
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
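The row-writing loop above can probably be shortened with String.join and try-with-resources, which closes the writer automatically even when a write fails. The following is only a minimal sketch under assumptions: the class TsvWriter and its methods toLine and writeRows are hypothetical helpers that are not part of the original code, the cell texts are passed in as plain strings (in the original they would come from tds.get(j).text()), and the sample values are placeholders rather than real data.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the original post
public class TsvWriter {

    // Join the cell texts of one table row into a single tab-separated line.
    public static String toLine(List<String> cells) {
        return String.join("\t", cells);
    }

    // Write a header line followed by one line per row.
    // try-with-resources closes the writer even if a write throws.
    public static void writeRows(Path target, String header, List<List<String>> rows)
            throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(header);
            writer.write("\r\n");
            for (List<String> row : rows) {
                writer.write(toLine(row));
                writer.write("\r\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder rows for illustration only, not real page data
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("r1c1", "r1c2", "r1c3"),
                Arrays.asList("r2c1", "r2c2", "r2c3"));
        // A temp file is used here instead of the fixed path in the original
        Path out = Files.createTempFile("table", ".txt");
        writeRows(out, "col1\tcol2\tcol3", rows);
        System.out.print(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
    }
}
```

Unlike the original, this sketch overwrites the target file on every run instead of skipping the write when the file already exists; whether that is desirable depends on how the output is used.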

A little progress every day. Let's encourage each other.

 

Origin www.cnblogs.com/lmtdb/p/11598497.html