Performance optimization for searching files in a file system

Samarjit Baruah :

I have network-attached storage holding around 5 million .txt files related to around 3 million transactions; the total data size is around 3.5 TB. For each transaction I have to check whether the related file exists in that location, and produce two separate CSV reports: one of "available files" and one of "not available files". We are still on Java 6. The challenge I am facing is that because I have to search the location recursively, each search takes around 2 minutes on average due to the huge size (at that rate, checking all 3 million transactions one by one would take years). I am using the Java I/O API to search recursively, like below. Is there any way I can improve the performance?

File searchFile(File location, String fileName) {
    if (location.isDirectory()) {
        // listFiles() returns null if the directory cannot be read
        // (I/O error or missing permissions), so guard against it.
        File[] arr = location.listFiles();
        if (arr != null) {
            for (File f : arr) {
                File found = searchFile(f, fileName);
                if (found != null)
                    return found;
            }
        }
    } else if (location.getName().equals(fileName)) {
        return location;
    }
    return null;
}
utpal416 :
  • Searching a directory tree recursively, especially on network-attached storage, is a nightmare; it takes a long time when the tree is large or deep. Since you are on Java 6, you can follow an old-fashioned approach: dump a listing of all files into a text file first, e.g.

    find . -type f -name '*.txt' > test.csv    (on Unix)

    dir /b /s *.txt > test.csv    (on Windows)

  • Now load this listing into a Map keyed by file name so it acts as an index. Loading the file will take some time because the listing is huge, but once it is loaded, each lookup by file name is a constant-time hash lookup, which will reduce your search time drastically; see the sketch below.
