inotgo.com

Step 1 : What is XML
Step 2 : A common XML file
Step 3 : What is HTML
Step 4 : The relationship between HTML and XML
Step 5 : Parsing
Step 6 : Several ways of parsing XML
Step 7 : The jar
Step 8 : Example
Step 9 : Runnable project

XML is short for Extensible Markup Language.
For example, the following is a piece of XML:
<root>
  <e1> text 1</e1>
</root>
For example, when developing a web application you need to configure web.xml, which is a typical XML file.
It contains elements such as web-app, servlet, servlet-name and servlet-class.

Note: what is an element? Anything in the form <element-name>element content</element-name> is an element. For example, <servlet-name>HelloServlet</servlet-name> is a servlet-name element.
<web-app>
 
    <servlet>
        <servlet-name>HelloServlet</servlet-name>
        <servlet-class>HelloServlet</servlet-class>
    </servlet>
 
    <servlet-mapping>
        <servlet-name>HelloServlet</servlet-name>
        <url-pattern>/hello</url-pattern>
    </servlet-mapping>
 
</web-app>
HTML is short for HyperText Markup Language.
The following is a piece of HTML.
If you are not familiar with HTML, see the HTML tutorial series: Your first HTML code
<html>
  <body>
    <p>Hello HTML</p>
  </body>
</html>
Step 4 : The relationship between HTML and XML

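HTML can loosely be regarded as a subset of XML (strictly speaking, only XHTML is true XML; ordinary HTML is more lenient, for example about unclosed tags). HTML uses a fixed set of predefined elements, such as <html>, <a>, <body> and <table>, whereas XML lets you define any element you like, such as <a>, <b> or <aabb>.
Since HTML is treated here as a subset of XML, it can be parsed in the same way, so the rest of this tutorial simply talks about parsing XML.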
1. Java ships with its own XML parsing support in the javax.xml packages, but it is very cumbersome to use. This approach is known as SAX/DOM.
2. Because the built-in support is so awkward, the more convenient third-party tool dom4j appeared, which makes parsing work much more efficient.
3. Nowadays there is the even more convenient jsoup. This tutorial explains how to use jsoup to parse XML.
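For comparison, the first approach can be sketched with the JDK's built-in DOM parser. This is a minimal illustration rather than code from this tutorial: the class name JdkDomSketch and the helper firstText are hypothetical, and the input is the same small HTML snippet used later with jsoup, which happens to be well-formed XML.

```java
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class JdkDomSketch {

    // Hypothetical helper: returns the text of the first element with the
    // given tag name, or null if there is no such element.
    static String firstText(String xml, String tagName) throws Exception {
        // The built-in API needs a factory, then a builder, then an input stream.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        NodeList nodes = doc.getElementsByTagName(tagName);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><body><p>Hello HTML</p></body></html>";
        System.out.println(firstText(xml, "p")); // prints "Hello HTML"
    }
}
```

Note that this parser only accepts well-formed XML, so it would reject typical real-world HTML with unclosed tags.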
jsoup is also a third-party tool, so before using it you need to download its jar, available in the upper right corner: jsoup-1.12.1.jar.
The code below takes a piece of HTML:

<html><body><p>Hello HTML</p></body></html>

and extracts the content of its p element, Hello HTML.

1. Parse the text into a Document object. The Document object represents the whole XML document.

Document doc = Jsoup.parse(html);

2. Get all p elements.

Elements as = doc.getElementsByTag("p");

3. Iterate over all p elements (there is actually only one) and print the text of each.

for (Element e : as) {
    System.out.println(e.text());
}
 Example
package cn.how2j.jsoup;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Test {

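	public static void main(String[] args) throws Exception {
		// The HTML text to parse
		String html = "<html><body><p>Hello HTML</p></body></html>";
		// Parse the text into a Document that represents the whole document
		Document doc = Jsoup.parse(html);

		// Collect every p element, then print the text of each one
		Elements as = doc.getElementsByTag("p");
		for (Element e : as) {
			System.out.println(e.text());
		}
	}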
}
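The same steps can also be applied to a real XML file such as the web.xml shown earlier. The following is a minimal sketch, assuming the jsoup jar is on the classpath. The class name WebXmlSketch and the helper textsOf are hypothetical names for illustration. It passes jsoup's XML parser explicitly, so that the input is not wrapped in html and body elements the way the default HTML parser would do.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.List;

public class WebXmlSketch {

    // Hypothetical helper: returns the text of every element with the given tag name.
    static List<String> textsOf(String xml, String tagName) {
        // An empty base URI is enough here because no relative links are resolved.
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());

        List<String> texts = new ArrayList<>();
        for (Element e : doc.getElementsByTag(tagName)) {
            texts.add(e.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        String xml = "<web-app>"
                + "<servlet>"
                + "<servlet-name>HelloServlet</servlet-name>"
                + "<servlet-class>HelloServlet</servlet-class>"
                + "</servlet>"
                + "<servlet-mapping>"
                + "<servlet-name>HelloServlet</servlet-name>"
                + "<url-pattern>/hello</url-pattern>"
                + "</servlet-mapping>"
                + "</web-app>";

        System.out.println(textsOf(xml, "servlet-class"));
        System.out.println(textsOf(xml, "url-pattern"));
    }
}
```

Returning the texts as a list, instead of printing them inside the loop, keeps the helper reusable for any tag name.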
The runnable project for this topic can be downloaded from the upper right corner. If you really cannot get the code working on your own, download it, unzip it and compare it with yours.




Q & A area
2020-11-08 With this tool, can I also write a web crawler in Java?
FARO_Z

As in the title.

1 answer

now2iava
Answer time: 2021-04-21
Certainly. A crawler is just data acquisition, data filtering and data storage. You can write one in any language, not just Python.







2020-01-15 This content is comfortable to read
hx1176406648

I have studied an XML DOM course, but it used JS for the parsing. I really did not know how to do it in Java.

1 answer

West Wing 666
Answer time: 2020-07-03
Right, this is the typical JS way of accessing the DOM.










