How to find duplicate lines using the Go language?

When programming, you sometimes need to find duplicate lines in a text file. This operation helps you identify repeated lines so they can be processed further, for example by removing them or counting how often they occur. This article introduces how to implement duplicate-line detection in the Go language and presents several commonly used algorithms and techniques.

1. Read the contents of the file

First, we need to read a file containing lines of text. The Go standard library provides the bufio package for reading file contents conveniently. We can use the Scanner type to read the file line by line and store each line in a string slice. Here is a code example that reads the contents of a file:

package main

import (
    "bufio"
    "fmt"
    "os"
)

func readFile(filename string) ([]string, error) {
    file, err := os.Open(filename)
    if err != nil {
        return nil, err
    }
    defer file.Close()

    // Read the file line by line and collect the lines into a slice.
    lines := make([]string, 0)
    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        lines = append(lines, scanner.Text())
    }

    if err := scanner.Err(); err != nil {
        return nil, err
    }

    return lines, nil
}

func main() {
    lines, err := readFile("input.txt")
    if err != nil {
        fmt.Println("Error:", err)
        return
    }

    // The following sections search these lines for duplicates.
    _ = lines // placeholder so this skeleton compiles on its own
}

In the above code, the readFile function receives a file name as a parameter and returns a string slice in which each element represents one line of text from the file. A bufio.Scanner reads the file line by line, each line is appended to the lines slice, and the slice is finally returned to the caller.
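One caveat worth noting: bufio.Scanner limits how long a single line may be (bufio.MaxScanTokenSize, 64 KB by default) and reports bufio.ErrTooLong for longer lines. If your input may contain very long lines, the scanner's buffer can be enlarged with its Buffer method. The following is a minimal sketch of such a variant (the name readFileLongLines and the 1 MB limit are illustrative choices, and the same imports as above are assumed):

func readFileLongLines(filename string) ([]string, error) {
    file, err := os.Open(filename)
    if err != nil {
        return nil, err
    }
    defer file.Close()

    scanner := bufio.NewScanner(file)
    // Raise the per-line limit from the default 64 KB to 1 MB.
    scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024)

    lines := make([]string, 0)
    for scanner.Scan() {
        lines = append(lines, scanner.Text())
    }
    if err := scanner.Err(); err != nil {
        return nil, err
    }
    return lines, nil
}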

2. Find duplicate lines

With the contents of the file read into memory, we can start looking for duplicate lines. Here are several commonly used methods:

1. Use a map to store lines and occurrence counts

A simple and efficient approach is to use a map to store each line of text together with the number of times it occurs. Traverse the lines and use each one as a map key: if the line already exists in the map, increment its count; otherwise, add the line with a count of one. Here is a code example that uses a map to count lines:

func findDuplicateLines(lines []string) map[string]int {
    // Count how many times each line occurs; entries with a count
    // greater than one are the duplicates.
    counts := make(map[string]int)

    for _, line := range lines {
        counts[line]++
    }

    return counts
}

In the above code, the findDuplicateLines function receives a string slice as a parameter and returns a map whose keys are the lines of text and whose values are the corresponding occurrence counts. By iterating over the input once, the map accumulates the number of times every line appears; the duplicates are the entries whose count is greater than one.
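Because the returned map also contains lines that occur only once, a small helper can filter out just the duplicates. The sketch below is illustrative (the name onlyDuplicates is not from the original code) and assumes the map produced by findDuplicateLines above:

// onlyDuplicates keeps only the entries whose count is greater than one.
func onlyDuplicates(counts map[string]int) map[string]int {
    result := make(map[string]int)
    for line, count := range counts {
        if count > 1 {
            result[line] = count
        }
    }
    return result
}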

2. Use sorted slices for comparison

Another approach is to sort the lines and compare adjacent entries. If two adjacent lines are identical, a duplicate has been found. Here's a code example that uses a sorted slice to find duplicate lines:

import "sort"

func findDuplicateLines(lines []string) []string {
    // Work on a copy so the caller's slice is left in its original order.
    sortedLines := make([]string, len(lines))
    copy(sortedLines, lines)

    sort.Strings(sortedLines)

    // After sorting, identical lines are adjacent; record each duplicate once.
    duplicates := make([]string, 0)
    for i := 1; i < len(sortedLines); i++ {
        if sortedLines[i] == sortedLines[i-1] &&
            (len(duplicates) == 0 || duplicates[len(duplicates)-1] != sortedLines[i]) {
            duplicates = append(duplicates, sortedLines[i])
        }
    }

    return duplicates
}

In the above code, we first make a copy of the original string slice and sort the copy. We then iterate over the sorted slice, comparing each line with the previous one; when two adjacent lines are equal, the line is recorded as a duplicate (each duplicate line is recorded only once).
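As a minimal usage sketch, assuming the readFile function from earlier and the sorted-slice variant above in the same program, the duplicates it returns can be printed like this:

func main() {
    lines, err := readFile("input.txt")
    if err != nil {
        fmt.Println("Error:", err)
        return
    }

    // Each duplicate line is reported once by the sorted-slice variant.
    for _, line := range findDuplicateLines(lines) {
        fmt.Println("Duplicate line:", line)
    }
}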

3. Example of use

Next, we can call the map-based findDuplicateLines function from main and print the result. For example, here is a complete example:

func main() {
    lines, err := readFile("input.txt")
    if err != nil {
        fmt.Println("Error:", err)
        return
    }

    // Report only the lines that occur more than once.
    counts := findDuplicateLines(lines)
    for line, count := range counts {
        if count > 1 {
            fmt.Printf("Line '%s' has %d occurrences\n", line, count)
        }
    }
}

In the above code, we first read the contents of the file, then call the findDuplicateLines function to count the lines, and finally print every line that occurs more than once.
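Note that Go does not guarantee any particular map iteration order, so the duplicates may be printed in a different order on each run. If deterministic output matters, one option is to sort the keys first. The snippet below is a minimal sketch of that idea (the name printDuplicatesSorted is illustrative; it assumes the counts map from above and that the fmt and sort packages are imported):

// printDuplicatesSorted prints duplicate lines in alphabetical order.
func printDuplicatesSorted(counts map[string]int) {
    keys := make([]string, 0, len(counts))
    for line := range counts {
        keys = append(keys, line)
    }
    sort.Strings(keys)

    for _, line := range keys {
        if counts[line] > 1 {
            fmt.Printf("Line '%s' has %d occurrences\n", line, counts[line])
        }
    }
}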

4. Summary

This article introduced how to find duplicate lines in the Go language: reading a file's contents, using a map to store lines and their occurrence counts, and comparing adjacent lines in a sorted slice. With these methods, we can easily locate duplicate lines and process them further. I hope this article helps you.

Origin blog.csdn.net/weixin_43025343/article/details/131650590