Monday, June 30, 2014

How long will this command take to complete?

When running heavy command-line operations, you usually have no idea when they will complete...

Actually, you can probably find out... if you're on Linux and it is a file-related operation.

1) Find out how much is done


For example if I am running this restore:

$ zcat dump_MYDB_20140501_08h00.sql.gz | mysql MYDB

Under Linux I can actually find out how much of the dump file has been read, and therefore guess the speed.

Using 'top' or 'ps', I can easily find the PID of the process doing the read; here it is "zcat":

  PID USER    PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20425 phil    26   0  4172  516  348 R  2.0  0.0   0:15.18 zcat

Then, from /proc/<PID>/fd, one can read the current status of its file descriptors:

$ ls -l /proc/20425/fd

total 0
lrwx------ 1 phil g 64 May  7 14:17 0 -> /dev/pts/5
l-wx------ 1 phil g 64 May  7 14:17 1 -> pipe:[69451820]
lrwx------ 1 phil g 64 May  7 14:05 2 -> /dev/pts/5
lr-x------ 1 phil g 64 May  7 14:17 3 -> /var/dump_MYDB_20140501_08h00.sql.gz


I see that the file descriptor 3 is used to read the file. With recent Linux kernels I can also check the current position of each file descriptor, in the /proc/<PID>/fdinfo directory:

$ cat /proc/20425/fdinfo/3
pos:    353861632
flags:  0100000
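
The two lookups above (which descriptor points at a real file, and where its offset is) can be combined. Here is a minimal sketch, assuming a POSIX shell with readlink and awk available; the function name fd_positions is my own invention, not a standard tool. It takes the process directory as an argument so it can also be tried on a fake directory tree.

```shell
# fd_positions PROC_DIR: for each descriptor in PROC_DIR/fd that points at a
# regular file, print "FD POS TARGET", where POS is the offset from PROC_DIR/fdinfo/FD.
# (Hypothetical helper; PROC_DIR would normally be /proc/<PID>.)
fd_positions() {
    for link in "$1"/fd/*; do
        # Resolve the symlink; skip entries that cannot be read.
        target=$(readlink "$link") || continue
        # Skip terminals, pipes and sockets: only regular files have a useful size.
        [ -f "$target" ] || continue
        fd=${link##*/}
        pos=$(awk '/^pos:/ { print $2 }' "$1/fdinfo/$fd")
        echo "$fd $pos $target"
    done
}
```

Calling it as fd_positions /proc/20425 should print one line per regular file the process has open, so terminals and pipes do not get in the way.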


I just have to compare to the full size of my dump file:

$ ls -l /var/dump_MYDB_20140501_08h00.sql.gz
-rw-r--r-- 1 phil g 686830066 May  6 13:00 /var/dump_MYDB_20140501_08h00.sql.gz


At this point the process has read 353 861 632 bytes out of 686 830 066, so we're roughly 50% done!
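
The percentage itself is a one-line integer calculation. A small sketch, with percent_done being a name I made up for illustration:

```shell
# percent_done POS SIZE: print the integer percentage of SIZE already read.
# Multiply before dividing so that integer arithmetic does not truncate to 0.
percent_done() {
    echo $(( $1 * 100 / $2 ))
}

percent_done 353861632 686830066   # prints 51
```

Assuming GNU stat is available, the size can be fetched directly instead of being copied from ls, for example: percent_done 353861632 "$(stat -c %s /var/dump_MYDB_20140501_08h00.sql.gz)"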

2) Compute an approximate speed, and finishing time


It is now easy to do several readings and approximate the speed:

$ cat /proc/20425/fdinfo/3 ; sleep 10 ; cat /proc/20425/fdinfo/3
pos:    372080640
flags:  0100000
pos:    447578112
flags:  0100000


$ echo $(( (447578112-372080640)/10 ))
7549747


I'm processing the file at about 7.5 MB/s (Yes, pathetic, I know... I'd love to investigate this damn SAN!)

(Of course it is an approximation, because the processing speed may not be constant. In this very example the speed depends on the nature of the tables and indexes. Perhaps the end of the dump contains more complex ones.)

Since I still have to read the remainder of the file at this approximate speed, I can compute how long it would take.

$ echo $(( (686830066-447578112)/7549747 ))
31


...Well I shall be done in about 31 seconds!
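
The two calculations can be wrapped into a single function. This is only a sketch under the same assumption as above (constant speed between and after the two readings); the name eta_seconds and its argument order are my own choice.

```shell
# eta_seconds POS1 POS2 INTERVAL SIZE
#   POS1, POS2: two offsets read INTERVAL seconds apart
#   SIZE:       total size of the file in bytes
# Prints the estimated number of seconds left, using integer arithmetic.
eta_seconds() {
    speed=$(( ($2 - $1) / $3 ))
    # No progress between the two readings: an estimate is impossible.
    if [ "$speed" -le 0 ]; then
        echo "no progress measured" >&2
        return 1
    fi
    echo $(( ($4 - $2) / $speed ))
}

eta_seconds 372080640 447578112 10 686830066   # prints 31
```

Taking the readings over a longer interval smooths out short bursts, at the cost of waiting longer for the answer.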

OK back to work! Cheers!