How to read the contents of one or more files in C#

Reading the contents of one or more files
In C#, the File.ReadAllLines method reads every line of a single file into a string array. To read several files, call it once per file and combine the results. For example, the following code reads all the lines from two files and merges them:

using System.IO;   // File
using System.Linq; // Concat, ToArray

string[] file1Lines = File.ReadAllLines("file1.txt");
string[] file2Lines = File.ReadAllLines("file2.txt");
string[] allLines = file1Lines.Concat(file2Lines).ToArray();

The code above first calls File.ReadAllLines on file1.txt and file2.txt and stores the lines of each file in its own string array. It then uses LINQ's Concat method to join the two arrays into a single sequence, with the lines of file1.txt first. Finally, ToArray converts the resulting IEnumerable<string> into a string[] array.
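The same idea extends to any number of files. The following is a minimal sketch, assuming .NET 6 or later with top-level statements; the helper name ReadAllLinesFromFiles and the temporary demo files are hypothetical and only there to make the example runnable.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: reads each file in turn and returns all lines as one array.
static string[] ReadAllLinesFromFiles(IEnumerable<string> paths)
{
    // SelectMany flattens the per-file arrays into one sequence, keeping file order.
    return paths.SelectMany(path => File.ReadAllLines(path)).ToArray();
}

// Demo setup: write two small files into a temporary directory.
string dir = Directory.CreateDirectory(
    Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())).FullName;
string file1 = Path.Combine(dir, "file1.txt");
string file2 = Path.Combine(dir, "file2.txt");
File.WriteAllLines(file1, new[] { "a", "b" });
File.WriteAllLines(file2, new[] { "c" });

string[] allLines = ReadAllLinesFromFiles(new[] { file1, file2 });
Console.WriteLine(string.Join(",", allLines)); // prints "a,b,c"

Directory.Delete(dir, recursive: true); // clean up the demo files
```

Like the two-file version, this still loads every line into memory, so it suits a modest number of reasonably small files.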

Reading the contents of thousands of files at a time
If you want to read the contents of thousands of files at a time, you can use the Parallel.ForEach method to speed up reading. Parallel.ForEach processes the elements of a collection in parallel.

Here is sample code that uses Parallel.ForEach to read the contents of thousands of files:

using System.Collections.Concurrent; // ConcurrentBag
using System.IO;                     // Directory, File
using System.Threading.Tasks;        // Parallel

string[] fileNames = Directory.GetFiles(@"C:\files\"); // Get the list of files to read

ConcurrentBag<string> allLines = new ConcurrentBag<string>();

Parallel.ForEach(fileNames, fileName =>
{
    string[] fileLines = File.ReadAllLines(fileName);
    foreach (string line in fileLines)
    {
        allLines.Add(line); // ConcurrentBag is safe to add to from several threads
    }
});

// Process all the lines here
foreach (string line in allLines)
{
    // Processing logic
}

The code above first calls Directory.GetFiles to get all the files in the directory. It then passes the list of file names to Parallel.ForEach so that the files are processed in parallel. For each file, File.ReadAllLines reads all the lines, and a ConcurrentBag stores them in a thread-safe way. Finally, a loop processes all the collected lines. Note that ConcurrentBag does not preserve order, so lines from different files end up interleaved.

Parallel.ForEach can make good use of multi-core processors and speed up file reading. However, multi-threading has its own overhead. If the files are small, or the disk rather than the CPU is the bottleneck, running in parallel may bring no gain and can even slow reading down. Whether to use it therefore depends on the specific situation, and it is worth measuring.
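One way to keep the overhead in check is to cap the number of concurrent workers with ParallelOptions.MaxDegreeOfParallelism. The sketch below also keys the results by file name, so each file's lines stay together and in order. It assumes .NET 6 or later with top-level statements; the cap of 4 and the temporary demo directory (a stand-in for C:\files\) are arbitrary choices for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Demo setup: a temporary directory with a few small files.
string dir = Directory.CreateDirectory(
    Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())).FullName;
for (int i = 0; i < 8; i++)
{
    File.WriteAllLines(Path.Combine(dir, $"file{i}.txt"),
        new[] { $"line {i}a", $"line {i}b" });
}

string[] fileNames = Directory.GetFiles(dir);

// Cap the number of concurrent workers; 4 is an arbitrary value for illustration.
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

// Keyed by file name, so each file's lines stay together and in their original order.
var linesByFile = new ConcurrentDictionary<string, string[]>();

Parallel.ForEach(fileNames, options, fileName =>
{
    linesByFile[fileName] = File.ReadAllLines(fileName);
});

Console.WriteLine(linesByFile.Count); // prints 8

Directory.Delete(dir, recursive: true); // clean up the demo files
```

A suitable cap depends on the storage device and the per-line work, so it is best found by measurement.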

Reading the contents of tens of thousands of files without using many resources
If you need to read the contents of tens of thousands of files without using many resources, you can read the files as streams. That is, read only part of the content at a time, process it, and then read the next part. This avoids loading all file contents into memory at once and so reduces memory usage.

The following sample code reads tens of thousands of files using streaming:

using System.IO; // Directory, StreamReader

string[] fileNames = Directory.GetFiles(@"C:\files\");

foreach (string fileName in fileNames)
{
    using (StreamReader reader = new StreamReader(fileName))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Process each line here
        }
    }
}

The code above first gets all the files in the directory and then iterates over them with a foreach loop. For each file, it uses StreamReader's ReadLine method to read the file line by line until ReadLine returns null at the end of the file. Each line can be processed inside the while loop as soon as it is read.

With this approach the program holds only one line in memory at a time, and memory for lines that have already been processed can be reclaimed. However, it may take more time than reading everything at once and then processing it.
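The same streaming pattern can be written more compactly with File.ReadLines, which returns a lazily evaluated sequence of lines, and Directory.EnumerateFiles, which yields file names one at a time instead of building the whole array first. This is a minimal sketch, assuming .NET 6 or later with top-level statements; the temporary demo directory stands in for C:\files\, and counting lines is only a placeholder for real processing.

```csharp
using System;
using System.IO;

// Demo setup: three small files with two lines each.
string dir = Directory.CreateDirectory(
    Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())).FullName;
for (int i = 0; i < 3; i++)
{
    File.WriteAllLines(Path.Combine(dir, $"file{i}.txt"), new[] { "first", "second" });
}

long lineCount = 0;

// EnumerateFiles yields paths lazily instead of returning one large array.
foreach (string fileName in Directory.EnumerateFiles(dir))
{
    // File.ReadLines reads one line at a time as the loop advances.
    foreach (string line in File.ReadLines(fileName))
    {
        lineCount++; // placeholder for real per-line processing
    }
}

Console.WriteLine(lineCount); // prints 6

Directory.Delete(dir, recursive: true); // clean up the demo files
```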

Reading the contents of tens of thousands of files quickly without using a lot of resources
If you want to read tens of thousands of files quickly without using a lot of memory, you can combine the two techniques. Parallel processing distributes the files across different threads, while stream processing reads only part of each file at a time so that a large amount of memory is never occupied at once.

Here is sample code that reads tens of thousands of files using both parallel processing and streaming:

using System.IO;              // Directory, StreamReader
using System.Threading.Tasks; // Parallel

string[] fileNames = Directory.GetFiles(@"C:\files\");

Parallel.ForEach(fileNames, fileName =>
{
    using (StreamReader reader = new StreamReader(fileName))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Process each line here
        }
    }
});

The code above uses Parallel.ForEach to distribute the files across threads so that several files are processed at the same time. Within each thread, StreamReader's ReadLine method reads the file line by line, and each line is processed as soon as it is read, so no file is ever loaded into memory in full.

Parallel processing makes better use of multi-core processors and so can increase reading speed. Stream processing, in turn, avoids occupying a large amount of memory at once. Together they make it possible to process a large number of files quickly while making it much less likely that the program runs out of memory. Keep in mind that the per-line processing now runs on several threads at once, so any shared state it touches must be thread-safe.
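When the per-line work feeds a shared result, one option is the Parallel.ForEach overload that keeps a thread-local accumulator and merges it once per worker, which avoids locking on every line. The following is a minimal sketch, assuming .NET 6 or later with top-level statements; the temporary demo directory stands in for C:\files\, and counting lines is a placeholder for real processing.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Demo setup: five small files with three lines each.
string dir = Directory.CreateDirectory(
    Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())).FullName;
for (int i = 0; i < 5; i++)
{
    File.WriteAllLines(Path.Combine(dir, $"file{i}.txt"), new[] { "a", "b", "c" });
}

long totalLines = 0;

Parallel.ForEach<string, long>(
    Directory.EnumerateFiles(dir),
    () => 0L, // each worker starts with its own local counter
    (fileName, loopState, localCount) =>
    {
        // Stream the file line by line; only the current line is held in memory.
        foreach (string line in File.ReadLines(fileName))
        {
            localCount++; // placeholder for real per-line processing
        }
        return localCount; // handed to this worker's next iteration
    },
    // Runs once per worker: merge the local counter into the shared total.
    localCount => Interlocked.Add(ref totalLines, localCount));

Console.WriteLine(totalLines); // prints 15

Directory.Delete(dir, recursive: true); // clean up the demo files
```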

Differences between Parallel.ForEach and ThreadPool in how threads are created and managed

Both Parallel.ForEach and ThreadPool are tools for multi-threaded programming in C#, but they work at different levels. The main differences are as follows:

Usage: Parallel.ForEach is generally used to process the elements of a collection in parallel, while ThreadPool manages a pool of worker threads to which you queue individual, usually fairly simple, work items.

Control granularity: With Parallel.ForEach, the unit of work is an element of the collection; the runtime partitions the elements among workers, and ParallelOptions.MaxDegreeOfParallelism can cap how many run concurrently. With ThreadPool, the unit of work is each task you queue.

Explicitness: Parallel.ForEach does not require you to create threads or hand off tasks yourself. You call it directly in the code that needs concurrency, and it blocks until every iteration has finished. With ThreadPool, the caller must explicitly hand each task to the pool for execution, and the call returns immediately without waiting for the task to run.

Thread life cycle: Parallel.ForEach does not own dedicated threads. By default it schedules its work on thread pool threads (the calling thread also takes part), and those threads go back to the pool when the loop ends. The ThreadPool keeps its worker threads alive between tasks so that they can be reused, and adjusts the number of threads as the load changes.

Operation control: Parallel.ForEach gives more direct control over the loop: ParallelLoopState can stop or break out of it early, ParallelOptions accepts a CancellationToken, and exceptions thrown by iterations are collected into an AggregateException. With ThreadPool, you must add your own coordination, for example to find out when all the queued work has finished. In both cases, access to shared state still needs to be synchronized.
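As an illustration of the loop control that Parallel.ForEach offers, the sketch below stops a loop early through ParallelLoopState. It assumes .NET 6 or later with top-level statements; the range size, the stop condition, and the cap of 2 workers are arbitrary values chosen for the example.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int processed = 0;

ParallelLoopResult result = Parallel.ForEach(
    Enumerable.Range(0, 1_000_000),
    new ParallelOptions { MaxDegreeOfParallelism = 2 },
    (number, loopState) =>
    {
        Interlocked.Increment(ref processed); // count iterations that actually ran

        if (number == 10)
        {
            // Ask the loop not to start any further iterations.
            // Iterations already running on other workers still finish.
            loopState.Stop();
        }
    });

// IsCompleted is false because the loop was stopped before processing every element.
Console.WriteLine(result.IsCompleted); // prints False
Console.WriteLine($"Iterations run: {processed}");
```

The number of iterations that run before the stop takes effect varies from run to run, so the second line of output is not fixed.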

In short, Parallel.ForEach and ThreadPool each have their own advantages and disadvantages. Developers should weigh factors such as ease of use and performance against the concurrent task at hand when choosing between them.

Here are two examples:

Use Parallel.ForEach:

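using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

List<int> numbers = Enumerable.Range(0, 1000000).ToList();

Parallel.ForEach(numbers, number =>
{
    // SomeExpensiveCalculation is a placeholder for your own method
    int result = SomeExpensiveCalculation(number);
    Console.WriteLine(result);
});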

Use ThreadPool:

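using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

List<int> numbers = Enumerable.Range(0, 1000000).ToList();

foreach (int number in numbers)
{
    // Queue one work item per element; the call returns immediately
    ThreadPool.QueueUserWorkItem(state =>
    {
        // SomeExpensiveCalculation is a placeholder for your own method
        int result = SomeExpensiveCalculation(number);
        Console.WriteLine(result);
    });
}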

The code above shows how to use Parallel.ForEach and ThreadPool, respectively, to process a list of 1,000,000 elements concurrently and perform an expensive calculation on each element. With Parallel.ForEach, you pass the list directly to the method together with the work to perform on each element, and the call returns once all elements have been processed. With ThreadPool, you wrap each task in a call to ThreadPool.QueueUserWorkItem, which adds it to the thread pool's queue. QueueUserWorkItem returns immediately, and thread pool threads are background threads, so the program must wait for the work items itself if it needs them to finish before it exits.
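One way to wait for queued ThreadPool work is a CountdownEvent that each work item signals when it finishes. This is a minimal sketch, assuming .NET 6 or later with top-level statements; SomeExpensiveCalculation is given a trivial hypothetical body (squaring) and the list is shortened to 100 elements so the example runs quickly.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for the article's SomeExpensiveCalculation.
static int SomeExpensiveCalculation(int n) => n * n;

var numbers = Enumerable.Range(0, 100).ToList();
var results = new ConcurrentBag<int>();

// The event starts at numbers.Count and is signaled once per finished work item.
using (var done = new CountdownEvent(numbers.Count))
{
    foreach (int number in numbers)
    {
        ThreadPool.QueueUserWorkItem(state =>
        {
            try
            {
                results.Add(SomeExpensiveCalculation(number));
            }
            finally
            {
                done.Signal(); // signal even if the calculation throws
            }
        });
    }

    done.Wait(); // block until every queued item has signaled
}

Console.WriteLine(results.Count); // prints 100
```

Parallel.ForEach performs this waiting for you, which is one reason it is often the simpler choice for processing a collection.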

Origin blog.csdn.net/shanniuliqingming/article/details/129325768