[doccano] Text annotation tool - attribute-level sentiment analysis to label your own business data

These are my own study notes. If you spot any mistakes, please point them out.

1. Description

2. Prerequisites

Make sure doccano is installed before you start.
For installation steps, see the article:
[doccano] Text Annotation Tool - Installation and Operation Tutorial

3. Create a doccano project

Select the sequence labeling project type.

To allow overlapping spans when annotating text, check the allow overlapping spans option.
To label relationships between entities in the text, check the use relation labeling option.
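
To make the overlapping-span option concrete, here is a minimal sketch. It assumes spans are stored as [start, end, label] lists with an exclusive end offset, which is the layout the conversion script in section 7 reads. The helper name spans_overlap and the sample comment are hypothetical, not part of doccano.

```python
def spans_overlap(a, b):
    """Return True if two [start, end, label] spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


# Hypothetical comment and annotations, for illustration only
text = "电池续航很好"
battery = [0, 4, "电池:1"]      # covers text[0:4], "电池续航"
experience = [2, 6, "体验:1"]   # covers text[2:6], "续航很好"

print(text[battery[0]:battery[1]], text[experience[0]:experience[1]])
print(spans_overlap(battery, experience))
```

Two attributes often comment on the same words in a review, so a project without this option would force the annotator to pick only one of the two spans.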

4. Add a dataset

The dataset is a plain txt file with one comment per line.
Select the TextLine format and import the file.
The import is now complete.
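
Because each line of the txt file becomes one item to annotate, a comment that contains its own line breaks would be split into several items. The sketch below is one possible way to prepare the file, assuming the comments are already available as a Python list; the function name write_textline_file, the file name comments.txt, and the sample comments are hypothetical.

```python
from pathlib import Path


def write_textline_file(comments, path):
    """Write one comment per line and return the number of lines written."""
    lines = []
    for comment in comments:
        # Collapse line breaks and repeated whitespace into single spaces
        flat = " ".join(comment.split())
        if flat:  # skip comments that are empty after cleaning
            lines.append(flat)
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)


# Hypothetical comments, for illustration only
comments = ["电池很耐用，\n拍照也清晰", "   ", "信号不太好"]
count = write_textline_file(comments, "comments.txt")
print(count)
```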

5. Add tags

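Create the tags one by one, or import custom tags from a JSON file.
Each tag name below has the form category:tag, for example 电池:1. The categories are 体验 (experience), 设计 (design), 电池 (battery), 性能 (performance), 摄像 (camera), and 通信 (communication), and each has one tag for 1 and one for -1.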

[
    {
        "text": "体验:1",
        "background_color": "#FF0000",
        "text_color": "#ffffff"
    },
    {
        "text": "体验:-1",
        "background_color": "#FF0000",
        "text_color": "#ffffff"
    },
    {
        "text": "设计:1",
        "background_color": "#00FF00",
        "text_color": "#000000"
    },
    {
        "text": "设计:-1",
        "background_color": "#00FF00",
        "text_color": "#000000"
    },
    {
        "text": "电池:1",
        "background_color": "#0000FF",
        "text_color": "#ffffff"
    },
    {
        "text": "电池:-1",
        "background_color": "#0000FF",
        "text_color": "#ffffff"
    },
    {
        "text": "性能:1",
        "background_color": "#FFFF00",
        "text_color": "#000000"
    },
    {
        "text": "性能:-1",
        "background_color": "#FFFF00",
        "text_color": "#000000"
    },
    {
        "text": "摄像:1",
        "background_color": "#FF00FF",
        "text_color": "#ffffff"
    },
    {
        "text": "摄像:-1",
        "background_color": "#FF00FF",
        "text_color": "#ffffff"
    },
    {
        "text": "通信:1",
        "background_color": "#00FFFF",
        "text_color": "#000000"
    },
    {
        "text": "通信:-1",
        "background_color": "#00FFFF",
        "text_color": "#000000"
    }
]
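
Writing twelve near-identical entries by hand is error-prone, and strict JSON does not allow a trailing comma after the last element. As a rough sketch, the same file can be generated with Python's json module. The names ATTRIBUTES, build_labels, and labels.json are hypothetical; the attribute names, colors, and the three keys are copied from the tag list above.

```python
import json

# (attribute name, background color, text color), copied from the tag list above
ATTRIBUTES = [
    ("体验", "#FF0000", "#ffffff"),
    ("设计", "#00FF00", "#000000"),
    ("电池", "#0000FF", "#ffffff"),
    ("性能", "#FFFF00", "#000000"),
    ("摄像", "#FF00FF", "#ffffff"),
    ("通信", "#00FFFF", "#000000"),
]


def build_labels(attributes, tags=("1", "-1")):
    """Build one tag entry per (attribute, tag) pair, named 'attribute:tag'."""
    labels = []
    for name, background, foreground in attributes:
        for tag in tags:
            labels.append({
                "text": f"{name}:{tag}",
                "background_color": background,
                "text_color": foreground,
            })
    return labels


labels = build_labels(ATTRIBUTES)
with open("labels.json", "w", encoding="utf-8") as f:
    # ensure_ascii=False keeps the Chinese names readable in the file
    json.dump(labels, f, ensure_ascii=False, indent=4)
print(len(labels))
```

To adapt this to other business data, only the ATTRIBUTES list needs to change.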

6. Label data

7. Export the data and convert the format

Export the annotated data in JSONL format, then change the file extension to .json.

Convert it to txt format:

import json

# Read the exported file (one JSON object per line) and write one row per labeled span.
# output.txt is opened in append mode: delete it before re-running to avoid duplicate rows.
with open('admin.json', 'r', encoding='utf-8') as file, \
        open('output.txt', 'a', encoding='utf-8') as output_file:
    for line in file:
        data = json.loads(line)
        text = data['text']
        label = data['label']

        for lbl in label:
            start = lbl[0]
            end = lbl[1]
            # Tag names have the form "category:tag", e.g. "电池:1"
            category, tag = lbl[2].split(":")
            output_file.write(f"{tag}\t{category}#{text[start:end]}\t{text}\n")

Output format: one row per labeled span, with three tab-separated fields: the tag, then category#labeled text, then the full comment text.
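
As a worked example of the conversion, the sketch below applies the same logic to a single record held in memory. It assumes the export has a "text" field and a "label" field holding [start, end, name] lists, as the script above reads them; the function name to_rows and the sample record are hypothetical.

```python
def to_rows(record):
    """Turn one exported record into rows of tag<TAB>category#span<TAB>text."""
    rows = []
    text = record["text"]
    for start, end, name in record["label"]:
        category, tag = name.split(":")
        rows.append(f"{tag}\t{category}#{text[start:end]}\t{text}")
    return rows


# Hypothetical exported record, for illustration only
record = {
    "id": 1,
    "text": "电池很耐用但是拍照模糊",
    "label": [[0, 5, "电池:1"], [7, 11, "摄像:-1"]],
}

for row in to_rows(record):
    print(repr(row))  # repr makes the tab separators visible as \t
```

One comment with two labeled spans yields two rows, each repeating the full comment text in the last field.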

Origin blog.csdn.net/weixin_44319595/article/details/134667102