These are my own study notes. If you spot any mistakes, please point them out.
[doccano] Text annotation tool: labeling your own business data for attribute-level sentiment analysis
1. Description
2. Prerequisites
Make sure doccano is already installed.
For installation steps, please refer to the article:
[doccano] Text Annotation Tool - Installation and Operation Tutorial
3. Create a project in doccano
Select Sequence Labeling as the project type.
Check the option that allows overlapping spans, so that annotated spans in a text may overlap.
Check the option to use relation labeling, so that relationships between entities in a text can be labeled.
4. Add a dataset
The dataset is a plain-text (.txt) file
with one comment per line.
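A raw comment that contains a line break would be split into two documents on import, so it can help to flatten the comments first. The following is a minimal sketch of one way to do that. The helper names normalize_comments and write_textline_file and the sample reviews are hypothetical and not part of doccano.

```python
from pathlib import Path


def normalize_comments(comments):
    """Collapse each comment onto a single line and drop empty ones."""
    cleaned = []
    for comment in comments:
        # Line breaks and runs of whitespace become a single space.
        flat = " ".join(comment.split())
        if flat:
            cleaned.append(flat)
    return cleaned


def write_textline_file(comments, path):
    """Write the comments one per line as UTF-8; return how many were written."""
    lines = normalize_comments(comments)
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)


# Hypothetical sample reviews; replace them with your own business data.
raw = ["电池很耐用\n拍照也不错", "   ", "屏幕设计好看"]
print(normalize_comments(raw))
```

Calling write_textline_file(raw, "comments.txt") would then produce a file that can be uploaded in the import step.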
Select TextLine as the file format and import the file.
Import completed.
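5. Add labels
Alternatively, import custom labels from a JSON file. Each label name has the form category:polarity, where 1 is positive and -1 is negative: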
[
{
"text": "体验:1",
"background_color": "#FF0000",
"text_color": "#ffffff"
},
{
"text": "体验:-1",
"background_color": "#FF0000",
"text_color": "#ffffff"
},
{
"text": "设计:1",
"background_color": "#00FF00",
"text_color": "#000000"
},
{
"text": "设计:-1",
"background_color": "#00FF00",
"text_color": "#000000"
},
{
"text": "电池:1",
"background_color": "#0000FF",
"text_color": "#ffffff"
},
{
"text": "电池:-1",
"background_color": "#0000FF",
"text_color": "#ffffff"
},
{
"text": "性能:1",
"background_color": "#FFFF00",
"text_color": "#000000"
},
{
"text": "性能:-1",
"background_color": "#FFFF00",
"text_color": "#000000"
},
{
"text": "摄像:1",
"background_color": "#FF00FF",
"text_color": "#ffffff"
},
{
"text": "摄像:-1",
"background_color": "#FF00FF",
"text_color": "#ffffff"
},
{
"text": "通信:1",
"background_color": "#00FFFF",
"text_color": "#000000"
},
{
"text": "通信:-1",
"background_color": "#00FFFF",
"text_color": "#000000"
}
]
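Writing the label list by hand is repetitive and makes slips such as a stray trailing comma easy. As a sketch, the same list could be generated from the category names. The function build_label_config is a hypothetical helper, and it assumes the import only needs the three keys used in the list above.

```python
import json


def build_label_config(aspects, palette):
    """Return one positive (:1) and one negative (:-1) label per aspect."""
    labels = []
    for aspect, (background, text_color) in zip(aspects, palette):
        for polarity in ("1", "-1"):
            labels.append({
                "text": f"{aspect}:{polarity}",
                "background_color": background,
                "text_color": text_color,
            })
    return labels


aspects = ["体验", "设计", "电池", "性能", "摄像", "通信"]
# (background_color, text_color) pairs taken from the list above.
palette = [
    ("#FF0000", "#ffffff"), ("#00FF00", "#000000"), ("#0000FF", "#ffffff"),
    ("#FFFF00", "#000000"), ("#FF00FF", "#ffffff"), ("#00FFFF", "#000000"),
]

config = build_label_config(aspects, palette)
# ensure_ascii=False keeps the Chinese category names readable in the output.
print(json.dumps(config, ensure_ascii=False, indent=2))
```

Because json.dumps produces the text, the result is always valid JSON, and adding a category only requires one more entry in aspects and palette.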
6. Label data
7. Export the data and convert its format
Export the annotations in JSONL format, then change the file extension to .json.
Convert the file to txt format with the following script:
import json

# Read the exported file (one JSON object per line) and append the converted lines to output.txt
with open('admin.json', 'r', encoding='utf-8') as file, \
        open('output.txt', 'a', encoding='utf-8') as output_file:
    for line in file:
        data = json.loads(line)
        text = data['text']
        label = data['label']
        # Each annotation is [start, end, "category:tag"]
        for lbl in label:
            start = lbl[0]
            end = lbl[1]
            category, tag = lbl[2].split(":", 1)  # category name, polarity tag
            output_file.write(f"{tag}\t{category}#{text[start:end]}\t{text}\n")
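The script above reads a label key from each record. When relation labeling is enabled for the project, the export may instead contain an entities list whose items carry start_offset, end_offset and label fields. That layout is an assumption about the export and should be checked against your own file. A hedged sketch that accepts either layout, using a hypothetical helper named iter_spans:

```python
def iter_spans(record):
    """Yield (start, end, label_name) from either assumed export layout."""
    if "label" in record:
        # Layout read by the script above: [[start, end, "category:tag"], ...]
        for start, end, name in record["label"]:
            yield start, end, name
    else:
        # Assumed relation-style layout: a list of entity dicts.
        for entity in record.get("entities", []):
            yield entity["start_offset"], entity["end_offset"], entity["label"]


# Hypothetical records, one per layout.
plain = {"text": "电池很耐用", "label": [[2, 5, "电池:1"]]}
relational = {
    "text": "电池很耐用",
    "entities": [{"id": 7, "start_offset": 2, "end_offset": 5, "label": "电池:1"}],
    "relations": [],
}
print(list(iter_spans(plain)) == list(iter_spans(relational)))
```

In the conversion loop, iterating over iter_spans(data) instead of data['label'] would let the same script handle both kinds of export.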
Output format:
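Following the write call in the script above, each output line has three tab-separated fields: the polarity tag, then category#annotated span, then the full comment. The sketch below shows this on a single made-up record. The helper record_to_lines and the sample record are hypothetical and serve only as illustration.

```python
import json


def record_to_lines(record):
    """Turn one exported record into 'tag<TAB>category#span<TAB>text' lines."""
    text = record["text"]
    lines = []
    for start, end, name in record["label"]:
        category, tag = name.split(":", 1)
        lines.append(f"{tag}\t{category}#{text[start:end]}\t{text}")
    return lines


# Hypothetical record in the same shape the conversion script reads.
sample = json.loads('{"id": 1, "text": "电池很耐用", "label": [[2, 5, "电池:1"]]}')
for line in record_to_lines(sample):
    print(line)
```

For this record the single output line is 1, a tab, 电池#很耐用, a tab, and then the full comment 电池很耐用.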