Linode Community Forums
PostPosted: Mon Jul 16, 2012 4:32 pm 
Junior Member

Joined: Mon Apr 18, 2011 1:54 pm
Posts: 45
Website: http://www.rassoc.com/gregr/weblog
I'm running some nodes with Ubuntu 10.04, 32-bit, with an ext3 file system. Is there a limit to the number of files I can have in a single directory?

I've found conflicting information on this, so I thought I'd ask here. I know the limit isn't 32K files, because I've already got 136K files in one directory. I'm trying to figure out if there is any downside, or any limitation, to letting this continue to grow...


PostPosted: Mon Jul 16, 2012 6:26 pm 
Senior Member

Joined: Sat Aug 30, 2008 1:55 pm
Posts: 1739
Location: Rochester, New York
No, although dealing with a very large directory can get cumbersome and slow. Make sure you have the "dir_index" filesystem feature enabled, particularly if ls is starting to get quite sluggish.
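A sketch of how you might check for it (the device name /dev/sda below is a placeholder for whatever volume backs your filesystem, and tune2fs needs root):

```shell
# List the filesystem's feature flags and look for dir_index.
# /dev/sda is a placeholder device; substitute your actual volume.
sudo tune2fs -l /dev/sda | grep -i 'features'

# If dir_index is missing, it can be enabled and existing
# directories re-indexed (run e2fsck on an unmounted filesystem):
#   sudo tune2fs -O dir_index /dev/sda
#   sudo e2fsck -D /dev/sda
```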

_________________
Code:
/* TODO: need to add signature to posts */


PostPosted: Mon Jul 16, 2012 9:09 pm 
Senior Member

Joined: Wed May 13, 2009 1:18 am
Posts: 681
gregr wrote:
I've found conflicting information on this, so I thought I'd ask here. I know the limit isn't 32K files, because I've already got 136K files in one directory. I'm trying to figure out if there is any downside, or any limitation, to letting this continue to grow...

You may have heard 32K mentioned because that's ext3's limit on the number of sub-directories within a single directory (ext4 raises it considerably).

There's no hard limit on files in a directory, though, as long as you have free inodes on the filesystem. So it does depend on how much inode space was allocated when the filesystem was created, but that's usually more than enough for the practical number of files that would fill up the actual data space. You can run "df -i" to see where things stand on the filesystem in question.
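For instance, to see inode usage on the filesystem holding the current directory:

```shell
# Show inode counts rather than block counts; the IUse% column
# tells you how close the filesystem is to running out of inodes.
df -i .
```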

As for downsides, it's mostly a question of performance at very large sizes, which in turn depends on the applications being used and whether they become inefficient when processing very large numbers of files.

For myself, I don't tend to let single directories get into the hundreds of thousands of files, even when it works. At that size, management becomes more cumbersome (getting efficient ls output, and so on), but there's normally a pretty easy way to slice up such storage.

For example, with such large sets of files there's usually some pattern to the naming, and if the distribution is even, you can create an extra level of sub-directory keyed on the first character or two of the filename. A file of any given name is still trivial to locate (including its containing directory), but you've divided things into much smaller chunks of files.
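A minimal sketch of that scheme (the two-character prefix is an assumption; pick a prefix length that spreads your actual filenames evenly):

```shell
# Map a filename to a sharded path using its first two characters,
# e.g. "a3f9c2.blob" ends up in subdirectory "a3/".
shard_path() {
    name="$1"
    prefix=$(printf '%s' "$name" | cut -c1-2)
    printf '%s/%s\n' "$prefix" "$name"
}

shard_path "a3f9c2.blob"   # prints a3/a3f9c2.blob
```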

-- David


PostPosted: Tue Jul 17, 2012 11:27 am 
Senior Member

Joined: Tue May 26, 2009 3:29 pm
Posts: 1691
Location: Montreal, QC
If you've got 136k files in a single directory, you might want to be asking yourself if that should be living in a database instead.


PostPosted: Wed Jul 18, 2012 6:09 pm 
Junior Member

Joined: Mon Apr 18, 2011 1:54 pm
Posts: 45
Website: http://www.rassoc.com/gregr/weblog
Awesome - thanks for the info, guys.

Quote:
There's no hard limit on files in a directory, though, as long as you have free inodes on the filesystem. So it does depend on how much inode space was allocated when the filesystem was created, but that's usually more than enough for the practical number of files that would fill up the actual data space. You can run "df -i" to see where things stand on the filesystem in question.


Running df -i shows I'm using 20% of my available inodes, while I've used 50% of my disk space, so it seems there's no immediate danger of running out. Glad I know this now, though, so I can keep an eye on it. :-)

Quote:
As for downsides, it's mostly a question of performance at very large sizes, which in turn depends on the applications being used and whether they become inefficient when processing very large numbers of files.


Makes sense. I do notice things are a little sluggish when, say, auto-completing filenames from the shell or doing an ls; however, my application never tries to list the files - it always knows the exact filename it's looking for - and performance doesn't seem to be suffering.

Quote:
For example, with such large sets of files there's usually some pattern to the naming, and if the distribution is even, you can create an extra level of sub-directory keyed on the first character or two of the filename. A file of any given name is still trivial to locate (including its containing directory), but you've divided things into much smaller chunks of files.


Yep, that's actually what I'm planning to do, and it also gives me natural partitions for spreading this across multiple servers when that becomes necessary.

Quote:
If you've got 136k files in a single directory, you might want to be asking yourself if that should be living in a database instead.


It's actually done this way on purpose - it's essentially blob data, I always need an entire blob at once, and it's not relational in any sense. In general, I've found relational databases to be the most expensive way (in terms of I/O and CPU) to store and access data like this, whereas in this particular case, plain filesystem access is efficient and cheap.

Thanks everyone!

