Case 5: Formatting the output of an XML file

At work we deal with XML files fairly often. Their format is very regular, but the sheer number of tags (<>) makes them hard to read. For example, consider the following configuration fragment:

<configuration>
     <artifactItems>
          <artifactItem>
              <groupId>zzz</groupId>
              <artifactld>aaa</artifactld>
          </artifactItem>
          <artifactItem>
              <groupId>xxx</groupId>
              <artifactld>yyy</artifactld>
          </artifactItem>
     </artifactItems>
</configuration>

This case requires extracting the groupId and artifactld values from the XML text above and printing them in the following format:

artifactItem:groupId:zzz
artifactItem:artifactld:aaa
artifactItem:groupId:xxx
artifactItem:artifactld:yyy


Knowledge point one: a few notes on XML

XML stands for Extensible Markup Language. Like HTML, XML is a markup language. It is mainly used to carry and transfer data rather than to display it, so it is somewhat harder to read.


Many services use XML for their configuration files and define their settings in XML text; the file in this case is one example. XML's main job is to store data, and it does so in plain text, which gives a way of storing data that is independent of software and hardware. This makes it easier for different applications to share data. Because the XML format is fixed, it can be read on Windows, Linux, macOS, and other operating systems, so its compatibility is good.


One thing to keep in mind: although XML is a markup language, unlike HTML it is not parsed and rendered into a nice-looking web page. It exists only to structure, store, and transmit information.


Knowledge point two: extracting the lines between two keywords in a file

The requirement is to print the part of a text file that lies between abc and 123, assuming abc appears above 123. With sed, a single command does it:

# sed -n '/abc/,/123/p' 1.txt

But this output still includes the abc and 123 lines themselves. Removing them is simple:

# sed -n '/abc/,/123/p' 1.txt |sed '/abc/d;/123/d'
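The two sed invocations can also be folded into one. The following is only a minimal sketch, assuming GNU sed; the function name between and the sample file content are made up for illustration and are not part of the original article.

```shell
#!/bin/bash
# Hypothetical sample file, created only for this illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
header
xx abc xx
first
second
yy 123 yy
footer
EOF

# Inside the abc..123 range, delete the two boundary lines and print the rest.
# "d" ends the current cycle, so "p" only runs for the lines in between.
between() {
    sed -n '/abc/,/123/{/abc/d;/123/d;p;}' "$1"
}

between "$sample"   # prints "first" and "second"
rm -f "$sample"
```

Grouping the commands in braces keeps the range match and the boundary filtering in a single pass over the file.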

If the text contains several abc/123 pairs, the command above prints every matching range, all run together. Below is a clumsy, homegrown approach that handles each pair separately and is good exercise for logical thinking.

mysed.sh

#!/bin/bash
# Get the line numbers of the lines containing abc or 123
grep -En 'abc|123' 1.txt | awk -F ':' '{print $1}' > /tmp/line_number.txt

# Count the total number of lines containing abc or 123
n=$(wc -l /tmp/line_number.txt | awk '{print $1}')

# Number of abc/123 pairs
n2=$((n / 2))

for i in $(seq 1 $n2)
do
    # Each iteration handles two line numbers: 1 and 2 the first time, 3 and 4 the second, and so on
    m1=$((i * 2 - 1))
    m2=$((i * 2))

    # Line numbers of the abc line and the 123 line for this pair
    nu1=$(sed -n "${m1}p" /tmp/line_number.txt)
    nu2=$(sed -n "${m2}p" /tmp/line_number.txt)

    # Line number just below the abc line
    nu3=$((nu1 + 1))

    # Line number just above the 123 line
    nu4=$((nu2 - 1))

    # Print the lines between abc and 123 with sed
    sed -n "${nu3},${nu4}p" 1.txt

    # Add a separator line so the blocks are easy to tell apart
    echo "============="
done

Here is a test file, 1.txt, with the following content:

alskdfkjlasldkjfabalskdjflkajsd
asldkfjjk232k3jlk2
alskk2lklkkabclaksdj
skjjfk23kjalf09wlkjlah lkaswlekjl9
aksjdf
123asd232323
aaaaaaaaaa
222222222222222222
abcabc12121212
fa2klj
slkj32k3j
22233232123
bbbbbbb
ddddddddddd

Processing it with sed gives:

# sed -n '/abc/,/123/p' 1.txt | sed '/abc/d;/123/d'
skjjfk23kjalf09wlkjlah lkaswlekjl9 
aksjdf 
fa2klj 
slkj32k3j

Processing it with mysed.sh gives:

# sh mysed.sh 
skjjfk23kjalf09wlkjlah lkaswlekjl9
aksjdf
=============
fa2klj
slkj32k3j
=============
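For comparison, the same separated output can be sketched with a single awk state flag instead of computing line numbers. This is an alternative illustration, not the author's method; the function name print_blocks is hypothetical, and it assumes 1.txt from above sits in the current directory.

```shell
#!/bin/bash
# Print the lines between each abc line and the next 123 line,
# followed by a separator, using a single on/off flag.
print_blocks() {
    awk '
        # Opening line: switch printing on, but skip the line itself.
        /abc/           { inside = 1; next }
        # Closing line: switch printing off and emit the separator.
        /123/ && inside { inside = 0; print "============="; next }
        # Any other line is printed only while the flag is on.
        inside
    ' "$1"
}

# Only run when the test file from the article is present.
if [ -f 1.txt ]; then
    print_blocks 1.txt
fi
```

Because the flag is checked line by line, this version does not depend on the abc and 123 lines appearing in strictly alternating pairs.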


Case analysis

1) First, find the data between <artifactItem> and </artifactItem>; the analysis applies to this part only.

2) Find the line numbers of the lines containing <artifactItem> and </artifactItem> in the XML file, then use sed to cut out the part in between.

3) Process the extracted segment with sed and awk to pull out each keyword and its corresponding value.
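Step 3 can be illustrated on a single element line. The sketch below splits the line twice with awk, first on "<" and then on ">"; the function name split_element is hypothetical and exists only for this example.

```shell
#!/bin/bash
# Split one XML element line into "keyword value".
split_element() {
    # First split on "<": field 2 is "groupId>zzz".
    # Then split on ">": fields 1 and 2 are the keyword and its value.
    echo "$1" | awk -F '<' '{print $2}' | awk -F '>' '{print $1, $2}'
}

split_element '    <groupId>zzz</groupId>'   # prints: groupId zzz
```

This only works when each element sits on its own line with nothing but whitespace before the opening tag, which is true of the sample configuration.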


Reference script for this case

#!/bin/bash
# Print the XML content in the required format. This script is highly customized, not general-purpose.
# Author:
# Date:

# Assume the XML file to be processed is named test.xml
# Get the line numbers of the lines containing <artifactItem> and </artifactItem>
grep -n 'artifactItem>' test.xml | awk '{print $1}' | sed 's/://' > /tmp/line_number.txt

# Count the total number of lines containing <artifactItem> or </artifactItem>
n=$(wc -l /tmp/line_number.txt | awk '{print $1}')

# Function that extracts the keywords and their values
get_value() {
    # $1 and $2 are the function's two arguments: the line number just below
    # <artifactItem> and the line number just above </artifactItem>.
    # Cut out the lines in between, then write each keyword (e.g. groupId)
    # and its value to /tmp/value.txt
    sed -n "$1,$2p" test.xml | awk -F '<' '{print $2}' | awk -F '>' '{print $1, $2}' > /tmp/value.txt

    # Loop over every line of /tmp/value.txt
    while read -r line
    do
        # x is the keyword, e.g. groupId
        # y is the value of that keyword
        x=$(echo "$line" | awk '{print $1}')
        y=$(echo "$line" | awk '{print $2}')
        echo "artifactItem:$x:$y"
    done < /tmp/value.txt
}

# The line numbers in /tmp/line_number.txt come in pairs; n2 is the number of pairs
n2=$((n / 2))

# For each pair, print the keywords and their values
for j in $(seq 1 $n2)
do
    # Each iteration handles two line numbers: 1 and 2 the first time, 3 and 4 the second, and so on
    m1=$((j * 2 - 1))
    m2=$((j * 2))

    # Line numbers of <artifactItem> and </artifactItem> for this pair
    nu1=$(sed -n "${m1}p" /tmp/line_number.txt)
    nu2=$(sed -n "${m2}p" /tmp/line_number.txt)

    # Line number just below <artifactItem>
    nu3=$((nu1 + 1))

    # Line number just above </artifactItem>
    nu4=$((nu2 - 1))

    get_value $nu3 $nu4
done
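As a more compact alternative to the reference script, the whole job can be sketched as one sed and awk pipeline. This is not the author's script, only an illustration that assumes every element sits on its own line, as in the sample; the function name format_items is hypothetical.

```shell
#!/bin/bash
# Print artifactItem:<keyword>:<value> for every child element
# found between <artifactItem> and </artifactItem>.
format_items() {
    # sed keeps only the <artifactItem> ... </artifactItem> ranges.
    # awk splits each line on "<" or ">": a line such as
    # "<groupId>zzz</groupId>" yields at least 4 fields, with the keyword
    # in field 2 and the value in field 3. The bare <artifactItem> and
    # </artifactItem> lines yield only 3 fields and are skipped.
    sed -n '/<artifactItem>/,/<\/artifactItem>/p' "$1" |
        awk -F '[<>]' 'NF >= 4 { print "artifactItem:" $2 ":" $3 }'
}

# Only run when the XML file assumed by the article is present.
if [ -f test.xml ]; then
    format_items test.xml
fi
```

The pattern <artifactItem> does not match the enclosing <artifactItems> line, so the outer wrapper is ignored, just as with the grep pattern in the reference script.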


Origin blog.51cto.com/13576245/2430302