Extract data from JSON and save it in TXT format

Some time ago I needed to do text semantic matching, but the company's labeled data was not sufficient for supervised training, so we had to rely on open-source datasets. The open-source dataset had already been cleaned into JSON format, so we extract the data from the JSON and save it in TXT format for subsequent use. The JSON data format is as follows:
[Figure: sample of the source JSON data]
The processed TXT data format is as follows:
[Figure: sample of the processed TXT data]
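Each line of the processed file holds one sentence pair and its label separated by tabs, in the order `sen1`, `sen2`, `label` (this order comes from how the script writes each line). As a minimal sketch of loading that file back for the subsequent matching task, assuming nothing beyond that three-column layout, Python's standard `csv` module can read it. The function name `load_pairs` is hypothetical, and labels are kept as strings because their exact values are not shown here.

```python
import csv


def load_pairs(path):
    """Read sen1<TAB>sen2<TAB>label lines into (sen1, sen2, label) tuples.

    Hypothetical helper: labels stay as strings; convert them (e.g. int())
    if your dataset uses numeric labels.
    """
    pairs = []
    with open(path, encoding='utf-8', newline='') as f:
        # QUOTE_NONE: treat '"' inside sentences as ordinary text.
        reader = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) != 3:
                continue  # skip blank or malformed lines
            pairs.append(tuple(row))
    return pairs
```

Disabling quoting matters here: sentences often contain quotation marks, and with the default settings `csv` would try to interpret them as field quoting.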
The processing code is as follows:

```python
# Line-based extraction, assuming the JSON is pretty-printed with one key per line:
#     "sen1": "...",
#     "sen2": "...",
#     "label": "..."
sen1 = []
sen2 = []
label = []

with open('./1.json', encoding='utf-8') as f:
    for line in f:
        line = line.rstrip()  # drop the newline and any trailing spaces
        pos = line.find(':')
        if pos == -1:
            continue  # brackets, braces and other lines without a key
        key = line[:pos]  # match on the key only, not on the sentence text
        if '"sen1"' in key:
            # skip ': "' after the key and drop the trailing '",'
            sen1.append(line[pos + 3:-2])
        elif '"sen2"' in key:
            sen2.append(line[pos + 3:-2])
        elif '"label"' in key:
            # label is the last key, so only the closing '"' follows it
            label.append(line[pos + 3:-1])

# The three lists must line up one-to-one before they can be zipped.
if not (len(sen1) == len(sen2) == len(label)):
    raise ValueError(f'mismatched counts: {len(sen1)}, {len(sen2)}, {len(label)}')

# 'w' overwrites 1.txt on every run; use 'a' only to merge several inputs.
with open('./1.txt', 'w', encoding='utf-8') as write_file:
    for s1, s2, lab in zip(sen1, sen2, label):
        write_file.write(s1 + '\t' + s2 + '\t' + lab + '\n')
```
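The string-slicing approach depends on the exact layout of every line. A sketch that lets Python's standard `json` module do the parsing instead is shown here, assuming each record is an object with `sen1`, `sen2` and `label` keys and that the file is a JSON array, JSON Lines, or pretty-printed objects written back to back (the real input layout is an assumption). It uses `json.JSONDecoder.raw_decode` to read one top-level value at a time; the names `iter_records`, `clean` and `json_to_tsv` are hypothetical.

```python
import json

FIELDS = ('sen1', 'sen2', 'label')  # assumed key names, taken from the script above


def iter_records(text):
    """Yield record dicts from a JSON array, JSON Lines, or back-to-back objects."""
    decoder = json.JSONDecoder()
    idx, n = 0, len(text)
    while idx < n:
        # Skip whitespace and commas between top-level JSON values.
        while idx < n and text[idx] in ' \t\r\n,':
            idx += 1
        if idx == n:
            break
        value, idx = decoder.raw_decode(text, idx)
        if isinstance(value, list):
            yield from value  # the whole file is one JSON array
        else:
            yield value       # one object per top-level value


def clean(field):
    """Turn a value into a single TSV cell: no tabs or line breaks inside."""
    return str(field).replace('\t', ' ').replace('\r', ' ').replace('\n', ' ')


def json_to_tsv(src_path, dst_path):
    """Write sen1<TAB>sen2<TAB>label lines; return the number of records."""
    # 'utf-8-sig' also accepts files that start with a byte-order mark.
    with open(src_path, encoding='utf-8-sig') as f:
        text = f.read()
    count = 0
    with open(dst_path, 'w', encoding='utf-8') as out:
        for rec in iter_records(text):
            out.write('\t'.join(clean(rec[k]) for k in FIELDS) + '\n')
            count += 1
    return count
```

Calling `json_to_tsv('./1.json', './1.txt')` writes the same tab-separated layout and returns the number of records written. Replacing tabs and line breaks inside a sentence with spaces keeps each record on exactly one line, which plain slicing cannot guarantee, and a missing key raises a `KeyError` instead of silently misaligning the three columns.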

Source: blog.csdn.net/weixin_43228814/article/details/125923787