Tuesday, January 24, 2012

block size / file size

Today I was wondering about the optimal block size and typical file sizes.

The filesystem I am presently concerned with contains our big source tree (Java) and its compiled classes. I am trying to optimize the continuous compilation and integration system (Jenkins). This system reports to developers whether the build is broken or tests fail. We want it to run as frequently as possible.

I wrote a short Python script to count files per size range:


#!/usr/bin/python
import os

# List every regular file through find; each output line is one 'ls -ld' entry.
d = os.popen(r"find . -type f -exec ls -ld {} \; ")

# Size is the 5th field (index 4) in ls output:
# '-rwxr-xr-x 1 hudson hudson 120 Jan 23 15:56 ./file1'
sizes = [int(line.split()[4]) for line in d.readlines()]
d.close()

keys = [pow(2, n) for n in range(8, 32)]      # [256, 512, 1024, ...
NbFilesPerSizeRange = {}    # count of files per range, starting at zero
for k in keys:
    NbFilesPerSizeRange[k] = 0

for s in sizes:
    for srange in keys:       # count the file in the first range larger than its size
        if s < srange:
            NbFilesPerSizeRange[srange] += 1
            break

# One count per line, in the same order as keys
csvTxt = "".join(["%d\n" % NbFilesPerSizeRange[s] for s in keys])
print(csvTxt)
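
The same counts can also be gathered without parsing ls output. Here is a minimal sketch, assuming that walking the tree with os.walk and reading sizes with os.lstat is acceptable. The function names file_sizes and size_histogram are my own for this illustration and are not part of the script above.

```python
import os
import stat


def file_sizes(root):
    """Yield the size in bytes of every regular file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):  # regular files only, like 'find -type f'
                yield st.st_size


def size_histogram(sizes, min_exp=8, max_exp=32):
    """Count sizes per power-of-two range, using the same ranges as the script."""
    bounds = [2 ** n for n in range(min_exp, max_exp)]  # [256, 512, 1024, ...
    counts = dict((b, 0) for b in bounds)
    for s in sizes:
        for b in bounds:
            if s < b:  # first bound strictly larger than the size
                counts[b] += 1
                break
    return bounds, counts
```

Calling size_histogram(file_sizes(".")) returns the list of bounds and a dictionary of counts. As in the original script, files of 2 GiB or more fall outside the last range and are not counted.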


file sizes in Java + classes tree of our project

It turns out that the default suggested by mkfs (4096-byte blocks for that disk) is not too bad.
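
To put a number on "not too bad", one can estimate how much space is lost in the partially filled last block of each file for a few candidate block sizes. This is only a rough sketch under a simplifying assumption: every file occupies a whole number of blocks, and filesystem metadata and any tail-packing are ignored. The function names and the sample sizes are made up for illustration; they are not measurements from our tree.

```python
def allocated_bytes(sizes, block_size):
    """Total bytes allocated if each file takes a whole number of blocks."""
    # -(-s // block_size) is integer ceiling division
    return sum(-(-s // block_size) * block_size for s in sizes)


def slack_fraction(sizes, block_size):
    """Fraction of the allocated space that holds no file data."""
    allocated = allocated_bytes(sizes, block_size)
    if allocated == 0:
        return 0.0
    return (allocated - sum(sizes)) / float(allocated)


# Hypothetical file sizes in bytes, for illustration only
sample_sizes = [120, 900, 3000, 5000, 70000]
for bs in (1024, 2048, 4096):
    print("%d-byte blocks: %.1f%% slack" % (bs, 100 * slack_fraction(sample_sizes, bs)))
```

Feeding the real list of sizes from the script into slack_fraction would show how the loss grows with the block size. Smaller blocks waste less space but mean more blocks to manage per file, so this figure is only one side of the trade-off.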


While I was at it, I wondered what the typical file sizes are in a Linux distribution (here, Red Hat RHEL 5.6).
I slightly modified the script and got this result:
                          
file sizes in a Red Hat distribution






