How many files can I put in a directory?

Does it matter how many files I keep in a single directory? If so, how many files in a directory is too many, and what are the effects of having too many files? (This is on a Linux server.)

Background: I have a photo album website, and every uploaded image is renamed to an eight-hex-digit ID (such as a58f375c.jpg). This is to avoid filename conflicts (for example, if lots of "IMG0001.JPG" files are uploaded). The original filename and any useful metadata are stored in a database. Right now there are approximately 1,500 files in the images directory. This makes listing the files in the directory (over FTP or an SSH client) take a few seconds. But I can't see that it has any other effect. In particular, there doesn't seem to be any impact on how quickly an image file is served to the user.

I have considered reducing the number of images per directory by making 16 subdirectories (0-9 and a-f) and moving each image into a subdirectory according to the first hex digit of its filename. But I'm not sure there is any reason to do so, apart from the occasional listing of the directory over FTP/SSH.
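
To illustrate, a rough sketch of that scheme in PHP might look like the following (the function name and base path are just placeholders for this example):

<?php
// Sketch of the scheme being considered: 16 buckets (0-9, a-f) chosen by
// the first hex digit of the image ID. Names and paths are illustrative.
function image_path($hexId, $baseDir = 'images') {
    $bucket = $hexId[0];                    // e.g. 'a' for a58f375c
    $dir = $baseDir . '/' . $bucket;
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);            // create the bucket on first use
    }
    return $dir . '/' . $hexId . '.jpg';
}

echo image_path('a58f375c') . PHP_EOL;      // images/a/a58f375c.jpg
?>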


#1st Floor

I have a directory with 88,914 files in it. Like yours, it is used for storing thumbnails on a Linux server.

Yes, listing the files via FTP or a PHP function is slow, but there is also a performance hit when displaying a file. E.g. www.website.com/thumbdir/gh3hg4h2b4h234b3h2.jpg has a wait time of 200-400 ms. By comparison, on another site where I have about 100 files in the directory, an image is displayed after only about 40 ms of waiting.

I have given this answer because most people have only written about how directory search functions perform, which you won't be using on a thumbnail folder - you will just be statically displaying files - but you will be interested in the performance of how the files are actually used.


#2nd Floor

I just created a directory on an ext4 file system containing 1,000,000 files, then randomly accessed those files through a web server. I did not notice any penalty for accessing them compared to (say) having only 10 files there.

This is radically different from my experience doing the same thing on NTFS a few years ago.
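
For what it's worth, that kind of test can be reproduced with a few lines of PHP; the file count, directory name and file contents below are arbitrary, just a sketch of the idea:

<?php
// Sketch: create many small files in one directory, then time random reads.
$dir = 'testdir';
$count = 100000;                                  // scale up or down as needed
if (!is_dir($dir)) {
    mkdir($dir);
}
for ($i = 0; $i < $count; $i++) {
    file_put_contents("$dir/file$i.txt", 'x');
}

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    $n = random_int(0, $count - 1);               // pick a random file
    file_get_contents("$dir/file$n.txt");
}
printf("1000 random reads took %.3f s\n", microtime(true) - $start);
?>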


#3rd Floor

Not an answer, just some suggestions.

Select a more suitable FS (file system). Historically, all of your issues have been central to file systems evolving over decades; I mean that more modern file systems support your scenario much better. First, make a comparison decision table based on your ultimate goal and a list of file systems.

I think it is time to change your paradigm. So I personally suggest using a distributed-system-aware FS, which means no limits at all regarding size, number of files and so on. Otherwise, sooner or later you will be challenged by new unanticipated problems.

I'm not sure it will work, but if you don't mind some experimentation, try AUFS over your current file system. I guess it has facilities to mimic multiple folders as a single virtual folder.

To overcome hardware limitations, you can use RAID-0.


#4th Floor

ext3 does in fact have directory size limits, and they depend on the block size of the file system. There is no per-directory "maximum number" of files, but rather a per-directory "maximum number of blocks used to store file entries". Specifically, the directory itself cannot grow beyond a b-tree of height 3, and the fanout of the tree depends on the block size. For more details, see this link:

https://www.mail-archive.com/[email protected]/msg01944.html

I was bitten by this recently on a file system formatted with 2K blocks, which was inexplicably logging directory-full kernel messages (warning: ext3_dx_add_entry: Directory index full!) while I was copying from another ext3 file system. In my case, a directory with a mere 480,000 files could not be copied to the destination.


#5th Floor

It isn't really "too many" as long as it does not exceed the operating system's limit. However, the more files in a directory, regardless of the OS, the longer it takes to access any individual file, and on most operating systems the performance is non-linear: finding one file out of 10,000 takes more than ten times as long as finding one out of 1,000.

Secondary problems associated with having a large number of files in a directory include wildcard expansion failures. To reduce the risks, you might consider ordering your directories by upload date or some other useful piece of metadata.


#6th Floor

"Depending on the file system"
Some users mentioned file system performance impact depends on the use. of course. Such as EXT3 file systems can be very slow. However, even with EXT4 or XFS, you can not stop by lsor findor connection (such as FTP) List Folder will become slower and slower external.

Solution
I prefer the same approach as @armandino. For that, I use this little function in PHP to convert IDs into a file path that results in 1,000 files per directory:

function dynamic_path($int) {
    // 1000 = 1000 files per dir
    // 10000 = 10000 files per dir
    // 2 = 100 dirs per dir
    // 3 = 1000 dirs per dir
    // e.g. 123456789 -> "12/34/57/"
    return implode('/', str_split(strval(ceil($int / 1000)), 2)) . '/';
}

Or, if you want to use alphanumeric characters, you can use the second version:

function dynamic_path2($str) {
    // 26 alpha + 10 num + 3 special chars (._-) = 39 combinations
    // -1 = 39^2 = 1521 files per dir
    // -2 = 39^3 = 59319 files per dir (if every combination exists)
    $left = substr($str, 0, -1);
    return implode('/', str_split($left ? $left : $str[0], 2)) . '/';
}

Result:

<?php
$files = explode(',', '1.jpg,12.jpg,123.jpg,999.jpg,1000.jpg,1234.jpg,1999.jpg,2000.jpg,12345.jpg,123456.jpg,1234567.jpg,12345678.jpg,123456789.jpg');
foreach ($files as $file) {
    echo dynamic_path(basename($file, '.jpg')) . $file . PHP_EOL;
}
?>

1/1.jpg
1/12.jpg
1/123.jpg
1/999.jpg
1/1000.jpg
2/1234.jpg
2/1999.jpg
2/2000.jpg
13/12345.jpg
12/4/123456.jpg
12/35/1234567.jpg
12/34/6/12345678.jpg
12/34/57/123456789.jpg

<?php
$files = array_merge($files, explode(',', 'a.jpg,b.jpg,ab.jpg,abc.jpg,ddd.jpg,af_ff.jpg,abcd.jpg,akkk.jpg,bf.ff.jpg,abc-de.jpg,abcdef.jpg,abcdefg.jpg,abcdefgh.jpg,abcdefghi.jpg'));
foreach ($files as $file) {
    echo dynamic_path2(basename($file, '.jpg')) . $file . PHP_EOL;
}
?>

1/1.jpg
1/12.jpg
12/123.jpg
99/999.jpg
10/0/1000.jpg
12/3/1234.jpg
19/9/1999.jpg
20/0/2000.jpg
12/34/12345.jpg
12/34/5/123456.jpg
12/34/56/1234567.jpg
12/34/56/7/12345678.jpg
12/34/56/78/123456789.jpg
a/a.jpg
b/b.jpg
a/ab.jpg
ab/abc.jpg
dd/ddd.jpg
af/_f/af_ff.jpg
ab/c/abcd.jpg
ak/k/akkk.jpg
bf/.f/bf.ff.jpg
ab/c-/d/abc-de.jpg
ab/cd/e/abcdef.jpg
ab/cd/ef/abcdefg.jpg
ab/cd/ef/g/abcdefgh.jpg
ab/cd/ef/gh/abcdefghi.jpg

As you can see, in the $int-version every folder contains up to 1,000 files and up to 99 directories, each of which again contains up to 1,000 files and 99 directories ...

But do not forget that too many directories cause the same performance problems!

Finally, you should think about how to reduce the total number of files. Depending on your goal, you can use CSS sprites to combine multiple tiny images such as avatars, icons, smilies and so on, or, if you have many small non-media files, consider combining them, e.g. in JSON format. In my case I had thousands of mini-caches, and in the end I decided to combine them in packs of 10.


#7th Floor

Most of the answers above fail to show that there is no "one size fits all" answer to the original question.

In today's environment we have a large conglomerate of different hardware and software - some 32-bit, some 64-bit, some cutting edge and some tried and true, reliable and never changing. Added to that is a variety of older and newer hardware, older and newer OSes, different vendors (Windows, Unix, Apple, etc.) and a myriad of utilities and servers. As hardware has improved and software has moved to 64-bit compatibility, there has necessarily been considerable delay in getting all the pieces of this very large and complex world to keep up with the rapid pace of change.

IMHO, there is no single way to fix the problem. The solution is to research the possibilities and then, by trial and error, find what works best for your particular needs. Each user must determine what works for their system rather than using a cookie-cutter approach.

I, for example, have a media server with a few very large files. The result is only about 400 files filling a 3 TB drive. Only 1% of the inodes are used, but 95% of the total space is used. Someone else, with a lot of smaller files, may run out of inodes before coming anywhere near filling the space. (As a rule of thumb, on ext4 file systems each file/directory uses one inode.) While theoretically the total number of files that can be contained within a directory is nearly unlimited, practicality dictates that the overall usage determines realistic limits, not just the file system's capabilities.

I hope that all the different answers above promote thought and problem solving rather than presenting an insurmountable barrier to progress.


#8th Floor

I ran into a similar issue. I was trying to access a directory with over 10,000 files in it. It was taking too long to build the file list and run any kind of command on any of the files.

I came up with a little PHP script to do this for myself and tried to figure out a way to keep it from timing out in the browser.

The following is the PHP script I wrote to resolve the issue:

Listing files in a directory with too many files over FTP

Hope it helps someone.


#9th Floor

The biggest problem I have run into was on a 32-bit system. Once you pass a certain number of files, tools like "ls" stop working.

Once you are past that barrier, trying to do anything with that directory becomes a huge problem.


#10th Floor

The question comes down to what you are going to do with the files.

Under Windows, any directory with more than 2k files tends to open slowly for me in Explorer. If they are all image files, more than 1k tends to open very slowly in thumbnail view.

At one time the system-imposed limit was 32,767. It is higher now, but even that is too many files to handle at one time under most circumstances.


#11th Floor

It depends on the specific file system used on the Linux server. Nowadays ext3 with dir_index is the default, which makes searching large directories very fast.

So speed shouldn't be an issue, other than the one you already noted, which is that listings will take longer.

There is a limit to the total number of files in one directory. I seem to remember it definitely working up to 32,000 files.


#12th Floor

It really depends on the file system used, and also on some flags.

For example, ext3 can have many thousands of files, but after a couple of thousand it used to get very slow - mostly when listing a directory, but also when opening a single file. A few years ago it gained the "htree" option, which dramatically shortened the time needed to get the inode for a given filename.

Personally, I use subdirectories to keep most levels under a thousand or so items. In your case, I would create 256 directories and use the last two hexadecimal digits of the ID. Use the last digits rather than the first, so the load is balanced.
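
A minimal sketch of that 256-bucket layout, assuming the eight-hex-digit IDs from the question (the helper name is made up for this example):

<?php
// Sketch: bucket each file into one of 256 directories named after the
// *last* two hex digits of its ID. Helper name and base path are illustrative.
function bucket_path($hexId, $baseDir = 'images') {
    $bucket = substr($hexId, -2);           // e.g. '5c' for a58f375c
    return "$baseDir/$bucket/$hexId.jpg";
}

echo bucket_path('a58f375c') . PHP_EOL;     // images/5c/a58f375c.jpg
?>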


#13th Floor

I recall running a program that created a huge number of files at its output. The files were sorted at 30,000 per directory. I do not recall having any read problems when I had to reuse the produced output. It was on a 32-bit Ubuntu Linux laptop, and even Nautilus displayed the directory contents, albeit after a few seconds.

ext3 file system: similar code on a 64-bit system dealt well with 64,000 files per directory.


#14th Floor

FAT32:

  • Maximum number of files: 268,173,300
  • Maximum number of files per directory: 2^16 - 1 (65,535)
  • Maximum file size: 2 GiB - 1 (without LFS), 4 GiB - 1 (with LFS)

NTFS:

  • Maximum number of files: 2^32 - 1 (4,294,967,295)
  • Maximum file size
    • Implementation: 2^44 - 2^6 bytes (16 TiB - 64 KiB)
    • Theory: 2^64 - 2^6 bytes (16 EiB - 64 KiB)
  • Maximum volume size
    • Implementation: 2^32 - 1 clusters (256 TiB - 64 KiB)
    • Theory: 2^64 - 1 clusters (1 YiB - 64 KiB)

ext2:

  • Maximum number of files: 10^18
  • Maximum number of files per directory: ~1.3 × 10^20 (performance issues past 10,000)
  • Maximum file size
    • 16 GiB (block size of 1 KiB)
    • 256 GiB (block size of 2 KiB)
    • 2 TiB (block size of 4 KiB)
    • 2 TiB (block size of 8 KiB)
  • Maximum volume size
    • 4 TiB (block size of 1 KiB)
    • 8 TiB (block size of 2 KiB)
    • 16 TiB (block size of 4 KiB)
    • 32 TiB (block size of 8 KiB)

ext3:

  • Maximum number of files: min(volumeSize / 2^13, numberOfBlocks)
  • Maximum file size: same as ext2
  • Maximum volume size: same as ext2

ext4:

  • Maximum number of files: 2^32 - 1 (4,294,967,295)
  • Maximum number of files per directory: unlimited
  • Maximum file size: 2^44 - 1 bytes (16 TiB - 1)
  • Maximum volume size: 2^48 - 1 bytes (256 TiB - 1)

#15th Floor

If the time it takes to implement a directory partitioning scheme is minimal, I am in favor of it. The first time you have to debug a problem that involves manipulating a 10,000-file directory via the console, you will understand.

As an example, F-Spot stores photo files as YYYY\MM\DD\filename.ext, which means the largest directory I have had to deal with while manually manipulating my ~20,000-photo collection is about 800 files. It also makes the files more easily browsable from third-party applications. Never assume that your software is the only thing that will access your software's files.
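
A sketch of that kind of date-partitioned layout in PHP (this is not F-Spot's actual code, just the idea; names and paths are illustrative):

<?php
// Sketch: store files under YYYY/MM/DD/, in the spirit of F-Spot's layout.
function dated_path($filename, $timestamp = null, $baseDir = 'photos') {
    $timestamp = $timestamp ?? time();      // default to the current time
    return $baseDir . '/' . date('Y/m/d', $timestamp) . '/' . $filename;
}

echo dated_path('IMG0001.JPG', mktime(0, 0, 0, 6, 15, 2019)) . PHP_EOL;
// photos/2019/06/15/IMG0001.JPG
?>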


#16th Floor

Keep in mind that on Linux, if a directory contains too many files, the shell may not be able to expand wildcards. I had this issue with a photo album hosted on Linux: it stores all the resized images in a single directory. While the file system can handle many files, the shell can't. Example:

-shell-3.00$ ls A*
-shell: /bin/ls: Argument list too long

Or

-shell-3.00$ chmod 644 *jpg
-shell: /bin/chmod: Argument list too long
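
Since the rest of this thread uses PHP, one way around the shell's argument-list limit is to iterate over the directory lazily instead of letting the shell expand the wildcard; a sketch:

<?php
// Sketch: chmod every .jpg in the current directory without relying on
// shell wildcard expansion, by walking the entries one at a time.
foreach (new DirectoryIterator('.') as $entry) {
    if ($entry->isFile() && strtolower($entry->getExtension()) === 'jpg') {
        chmod($entry->getPathname(), 0644);
    }
}
?>

On the command line itself, find with -exec sidesteps the same limit.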

#17th Floor

It absolutely depends on the file system. Many modern file systems use decent data structures to store the contents of a directory, but older file systems often just appended entries to a list, so retrieving a file was an O(n) operation.

Even if the file system does it right, it is still absolutely possible for programs that list directory contents to mess up and do an O(n^2) sort, so to be on the safe side, I would always limit the number of files per directory to no more than 500.


#18th Floor

I realize this doesn't totally answer your question of how many is too many, but an idea for solving the long-term problem is that, in addition to storing the original file metadata, you also store which folder on disk the file lives in - normalize out that piece of metadata. Once a folder grows beyond whatever limit you are comfortable with, for performance, aesthetic or other reasons, you simply create a second folder and start dropping files there ...
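
A tiny sketch of that idea (the folder naming, the 1,000-file limit and the helper name below are all made up for illustration; in practice the chosen folder would be written into the database row for the file):

<?php
// Sketch: pick a target folder, rolling over to a new numbered folder once
// the current one reaches a chosen limit. Limit and naming are illustrative.
function target_folder($baseDir = 'uploads', $limit = 1000) {
    $n = 0;
    while (is_dir("$baseDir/$n") && count(glob("$baseDir/$n/*")) >= $limit) {
        $n++;                               // current folder is full, move on
    }
    if (!is_dir("$baseDir/$n")) {
        mkdir("$baseDir/$n", 0755, true);
    }
    return "$baseDir/$n";                   // store this path with the file's metadata
}

echo target_folder() . PHP_EOL;             // e.g. uploads/0
?>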


#19th Floor

I am working on a similar problem right now. We have a hierarchical directory structure and use image IDs as filenames. For example, an image with id=1234567 is placed in

..../45/67/1234567_<...>.jpg

using the last four digits to determine where the file goes.

With a few thousand images, however, you could use a one-level hierarchy. Our sysadmin suggested no more than a couple of thousand files in any given directory (ext3), for efficiency, backups and whatever other reasons he had in mind.
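
For illustration, the layout described above could be produced with something like this (the suffix after the ID is omitted and the helper name is made up):

<?php
// Sketch: place image 1234567 under 45/67/, i.e. its last four digits
// split into two directory levels.
function id_path($id, $baseDir = 'images') {
    $digits = str_pad(substr((string) $id, -4), 4, '0', STR_PAD_LEFT);   // last 4 digits
    return $baseDir . '/' . substr($digits, 0, 2) . '/' . substr($digits, 2) . '/' . $id . '.jpg';
}

echo id_path(1234567) . PHP_EOL;   // images/45/67/1234567.jpg
?>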


#20th Floor

I ran into the same problem, trying to store millions of files on an Ubuntu server in ext4. I ended up running my own benchmarks and found that a flat directory performs much better while being much simpler to use:

(benchmark chart)

I wrote an article about it.


#21st Floor

I have had over 8 million files in a single ext3 directory. find, ls and most of the other methods discussed in this thread use libc readdir() to list large directories.

The reason ls and find are slow in this case is that readdir() only reads 32K of directory entries at a time, so on slow disks it takes many, many reads to list a directory. There is a solution to this speed problem. I wrote a pretty detailed article about it at: http://www.olark.com/spw/2011/08/you-can-list-a-directory-with-8-million-files-but-not-with-ls/

The key takeaway is: use getdents() directly ( http://www.kernel.org/doc/man-pages/online/pages/man2/getdents.2.html ) rather than anything based on libc readdir(), so that you can specify the buffer size when reading directory entries from disk.
