3 bash one-liners
Carlo “zED” Caputo
carlo.caputo@gmail.com
1
So there I was, dumping a database...
$ mongoexport -d pseudo -c games | gzip > dump.json.gz
connected to: 127.0.0.1
It took more than 10 seconds...
$ mongoexport -d pseudo -c games | gzip > dump.json.gz
connected to: 127.0.0.1
^C
OK, I need to show progress...
$ mongoexport -d pseudo -c games | gzip | pv > dump.json.gz
connected to: 127.0.0.1
7.83MB 0:00:05 [1.75MB/s] [ <=> ]
Hmm... I still don't know how much is left.
$ mongoexport -d pseudo -c games | pv -c -N raw | gzip | pv -c
-N gz > dump.json.gz
connected to: 127.0.0.1
raw: 54.5MB 0:00:07 [8.28MB/s] [ <=> ]
gz: 11.1MB 0:00:07 [1.72MB/s] [ <=> ]
Ah, I know how many rows there are...
$ ( export H=localhost D=pseudo C=games; export
F="dump-$H-$D-$C-$(date +%Y%m%d_%H%M%S)";
mongoexport -h $H -d $D -c $C 2> $F.log | pv -c -N raw | pv -c
-N rows -l -s "$( echo -e "use $D;\ndb.$C.find().count();" |
mongo $H 2>> $F.log | grep "switched to db $D" -A1 | tail -1 )" |
gzip | pv -c -N gz | tee $F.gz | md5sum > $F.md5 )
raw: 78.4MB 0:00:10 [7.97MB/s] [ <=> ]
gz: 16.1MB 0:00:10 [1.69MB/s] [ <=> ]
rows: 46.7k 0:00:10 [4.59k/s] [> ] 63% ETA 0:00:05
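The row-based ETA works because the total number of lines is known before the export starts. As a rough sketch of the bookkeeping behind `pv -l -s N` (not a replacement for pv), a hypothetical helper named `line_progress` can pass lines through untouched and report a percentage on stderr. The helper name and the 1000-line reporting interval are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the slides: copies stdin to stdout and
# reports progress on stderr, given the expected number of lines.
line_progress() {
  local total=$1
  LC_ALL=C awk -v total="$total" '
    { print }                                  # pass every line through unchanged
    NR % 1000 == 0 || NR == total {            # report every 1000 lines and at the end
      pct = (total > 0) ? int(NR * 100 / total) : 0
      printf "rows: %d/%d (%d%%)\n", NR, total, pct > "/dev/stderr"
    }'
}
```

It would sit in the pipeline where `pv -c -N rows -l -s ...` sits, for example `mongoexport ... | line_progress "$count" | gzip > dump.json.gz`, with `$count` obtained the same way as in the slide.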
But it's still mostly idle...
top - 23:19:46 up 53 days, 17 min, 24 users, load average: 0.15, 0.12, 0.09
Tasks: 334 total, 2 running, 331 sleeping, 0 stopped, 1 zombie
Cpu0 : 33.3%us, 8.8%sy, 0.0%ni, 57.9%id, 0.0%wa, 0.0%hi, 0.0%si,
Cpu1 : 2.3%us, 1.0%sy, 0.3%ni, 94.2%id, 2.3%wa, 0.0%hi, 0.0%si,
Cpu2 : 85.2%us, 4.6%sy, 0.0%ni, 10.2%id, 0.0%wa, 0.0%hi, 0.0%si,
Cpu3 : 2.6%us, 1.0%sy, 0.3%ni, 94.1%id, 2.0%wa, 0.0%hi, 0.0%si,
Mem: 4107620k total, 3687764k used, 419856k free, 278800k buffers
Swap: 4192960k total, 363512k used, 3829448k free, 658816k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24448 zed 20 0 17688 4300 2880 R 98 0.1 0:10.45 mongoexport
24451 zed 20 0 2052 504 344 S 31 0.0 0:03.20 gzip
24449 zed 20 0 4056 644 568 S 10 0.0 0:01.04 pv
24450 zed 20 0 4056 656 576 S 8 0.0 0:00.81 pv
3587 mongodb 20 0 804m 120m 119m S 1 3.0 0:08.59 mongod
24456 zed 20 0 3892 508 440 S 1 0.0 0:00.07 md5sum
24454 zed 20 0 3884 548 480 S 1 0.0 0:00.05 tee
24452 zed 20 0 4056 660 568 S 0 0.0 0:00.03 pv
Let's see...
$ time --help
--help: command not found
$ type time
time is a shell keyword
$ help time
time: time [-p] pipeline
Report time consumed by pipeline's execution.
$ /usr/bin/time --help
Usage: /usr/bin/time [-apvV] [-f format] [-o file] [--append] [--verbose]
[--portability] [--format=format] [--output=file] [--version]
[--quiet] [--help] command [arg...]
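The confusion here is that `time` is a bash keyword that shadows the external `/usr/bin/time`, so options such as `--help` or `-f` only work when the full path is given. As a small sketch, assuming bash, the keyword can still produce a compact `user+sys` figure through the `TIMEFORMAT` variable. The helper name `cpu_time` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs a command, discards its output, and prints the
# CPU time it used as "user+sys" using the bash `time` keyword.
cpu_time() {
  local TIMEFORMAT='%U+%S'
  # The keyword writes its report to the stderr of the enclosing group,
  # so the group's stderr is redirected to stdout to capture it.
  { time "$@" >/dev/null 2>&1 ; } 2>&1
}
```

For example, `cpu_time gzip -9 -k dump.json` prints something of the same shape as the `t_gz=%U+%S` format used with `/usr/bin/time -f` later in the talk.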
Hey, wait, how about using xz?!
$ mongoexport -d pseudo -c games | head -c 10000 | tee >( (
/usr/bin/time -f 'gz: %E real, %U user, %S sys' gzip -9 | wc -c |
sed 's/^/gz: /' ) 1>&2) | /usr/bin/time -f 'xz: %E real, %U user,
%S sys' xz -9 | wc -c | sed 's/^/xz: /'
connected to: 127.0.0.1
gz: 0:00.24 real, 0.00 user, 0.00 sys
gz: 2078
xz: 0:00.31 real, 0.02 user, 0.04 sys
xz: 2012
Better to limit by time than by
bytes. But how to stop it?...
$ { ( mongoexport -d pseudo -c games | tee >(wc -c | sed
's/^/raw: /' 1>&2) | tee >( ( /usr/bin/time -f 'gz: %E real, %U
user, %S sys' gzip -9 | wc -c | sed 's/^/gz: /' ) 1>&2) |
/usr/bin/time -f 'xz: %E real, %U user, %S sys' xz -9 | wc -c |
sed 's/^/xz: /' ) & } ; sleep 9 ; kill -INT %%
[1] 12289
connected to: 127.0.0.1
$ raw: 14623779
gz: 0:08.99 real, 0.75 user, 0.00 sys
gz: 2706591
Command terminated by signal 2
xz: 0:09.02 real, 8.78 user, 0.17 sys
Ahh, there's the missing number!
$ { ( mongoexport -d pseudo -c games | tee >(wc -c | sed
's/^/raw: /' 1>&2) | tee >( ( /usr/bin/time -f 'gz: %E real, %U
user, %S sys' gzip -9 | wc -c | sed 's/^/gz: /' ) 1>&2) |
/usr/bin/time -f 'xz: %E real, %U user, %S sys' xz -9 | wc -c |
sed 's/^/xz: /' ) & } ; sleep 9 ; kill -INT $(pstree -p $! | sed -n
's/.*mongoexport(\([0-9]*\)).*/\1/p')
[1] 12724
connected to: 127.0.0.1
$ raw: 14497402
gz: 0:09.10 real, 0.73 user, 0.01 sys
gz: 2695783
xz: 0:09.18 real, 8.73 user, 0.16 sys
xz: 2021928
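Killing only the producer lets the compressors see end-of-file and finish normally, which is why the xz byte count shows up this time. Where GNU coreutils `timeout` is available, the same time-boxed sampling can be sketched without looking up the PID through pstree. This swaps `timeout` in for the slide's `sleep 9 ; kill -INT ...` and uses the default SIGTERM rather than SIGINT; `sample_for` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs a producer command for at most N seconds.
# Its stdout is left alone so it can feed a pipeline.
sample_for() {
  local seconds=$1
  shift
  # timeout exits with status 124 when the limit is hit, which is the
  # expected case here, so that status is not treated as a failure.
  timeout "$seconds" "$@" || true
}
```

A usage along the lines of the slide would be `sample_for 9 mongoexport -d pseudo -c games | xz -9 | wc -c`.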
But let me format this better...
$ ( { ( { mongoexport -d pseudo -c games 2>/dev/null | tee >(wc
-c | sed 's/^/s_raw=/' 1>&2) | tee >( ( /usr/bin/time -f
't_gz=%U+%S' gzip -9 | wc -c | sed 's/^/s_gz=/') 1>&2) |
/usr/bin/time -f 't_xz=%U+%S' xz -9 | wc -c | sed 's/^/s_xz=/' ;
echo 'scale=2; print "\nxz is ", t_xz/t_gz; r_sz=s_gz/s_xz;
r_gz=s_gz/s_raw; r_xz=s_xz/s_raw; scale=0; print " times
slower and ", (r_sz-1)/.01,"% smaller\ncompression ratios:
gz=",(1-r_gz)/.01,"%, xz=",(1-r_xz)/.01,"%\n"' ; } 2>&1 | bc ) & } ;
sleep 9 ; kill -INT $(pstree -p $! | sed -n
's/.*mongoexport(\([0-9]*\)).*/\1/p') 2>/dev/null )
xz is 12.02 times slower and 33% smaller
compression ratios: gz=82%, xz=87%
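The arithmetic that bc performs here is easier to follow outside the one-liner. The sketch below redoes it with awk instead of bc (a substitution, not what the slide runs), using the same formulas: time ratio `t_xz/t_gz`, size gain `s_gz/s_xz - 1`, and compression ratio `1 - s/s_raw`. The function name is hypothetical, and the sample figures are the ones printed by the earlier timed run (0.73+0.01 CPU seconds for gzip -9, 8.73+0.16 for xz -9), not the run shown on this slide.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Arguments: raw bytes, gzip bytes, xz bytes,
# gzip CPU seconds, xz CPU seconds.
compression_report() {
  LC_ALL=C awk -v s_raw="$1" -v s_gz="$2" -v s_xz="$3" -v t_gz="$4" -v t_xz="$5" 'BEGIN {
    printf "xz is %.2f times slower and %.1f%% smaller\n", t_xz / t_gz, (s_gz / s_xz - 1) * 100
    printf "compression ratios: gz=%.1f%%, xz=%.1f%%\n", (1 - s_gz / s_raw) * 100, (1 - s_xz / s_raw) * 100
  }'
}

# Byte counts and CPU times taken from the earlier 9-second sample.
compression_report 14497402 2695783 2021928 0.74 8.89
```

Note that the "smaller" percentage follows the slide's definition, which measures how much larger the gzip output is relative to the xz output.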
Ouch... :/ is there some other compression
level that's worth it?
$ ( { ( { { echo -n 'mongoexport -d pseudo -c games 2>/dev/null | tee' ; for
z in 1 2 ; do for l in $(seq 0 9); do test $z = 2 -a $l = 0 || echo -n "
>(/usr/bin/time -f 't[$z$l]=%U+%S' $(test $z == 2 && echo gzip || echo xz)
-$l | wc -c | sed 's/^/s[$z$l]=/' 1>&2)" ; done ; done ; echo -n ' | cat >
/dev/null' ; } | bash 2>&1 ; echo 'scale=4; for(p=1;p<=13;p+=6) {
for(x=1;x<=2;x++) { for(l=x-1;l<=9;l++) { i=x*10+l; s=s[i]/s[26]; t=t[i]/t[26];
print s^p*t," (s^",p,"*t) "; if(x==2) print "gzip" else print "xz"; print" -",l," is
",t,"x time and ",s,"x size of gzip -6\n" } } }' ; } | bash <( echo -n
'BC_LINE_LENGTH=0 bc | tee ' ; for p in $(seq 1 6 13) ; do echo -n "
>(grep 's^$p.t' | sort -n | head -1 | cut -d ' ' -f 2- 1>&2)" ; done ; echo -n ' >
/dev/null' ) ) & } ; sleep 9 ; kill -INT $(pstree -p $! | sed -n
's/.*mongoexport(\([0-9]*\)).*/\1/p'); while wait -n ; do : ; done )
(s^13*t) xz -6 is 17.2978x time and .5913x size of gzip -6
(s^7*t) xz -1 is 2.4042x time and .7802x size of gzip -6
(s^1*t) gzip -2 is .5957x time and 1.1931x size of gzip -6
pstree of the commands on the previous slide
bash─┬─bash───bash─┬─cat
│ ├─mongoexport
│ └─tee─┬─10*[bash─┬─sed]
│ │ ├─time───xz]
│ │ └─wc]
│ └─9*[bash─┬─sed]
│ ├─time───gzip]
│ └─wc]
└─bash─┬─bc
└─tee───3*[bash─┬─cut]
├─grep]
├─sort]
└─tail]
bash code, output of the first part
mongoexport -d pseudo -c games 2>/dev/null | tee
>(/usr/bin/time -f 't[11]=%U+%S' xz -1 | wc -c | sed 's/^/s[11]=/' 1>&2)
>(/usr/bin/time -f 't[12]=%U+%S' xz -2 | wc -c | sed 's/^/s[12]=/' 1>&2)
>(/usr/bin/time -f 't[13]=%U+%S' xz -3 | wc -c | sed 's/^/s[13]=/' 1>&2)
>(/usr/bin/time -f 't[14]=%U+%S' xz -4 | wc -c | sed 's/^/s[14]=/' 1>&2)
>(/usr/bin/time -f 't[15]=%U+%S' xz -5 | wc -c | sed 's/^/s[15]=/' 1>&2)
>(/usr/bin/time -f 't[16]=%U+%S' xz -6 | wc -c | sed 's/^/s[16]=/' 1>&2)
>(/usr/bin/time -f 't[17]=%U+%S' xz -7 | wc -c | sed 's/^/s[17]=/' 1>&2)
>(/usr/bin/time -f 't[18]=%U+%S' xz -8 | wc -c | sed 's/^/s[18]=/' 1>&2)
>(/usr/bin/time -f 't[19]=%U+%S' xz -9 | wc -c | sed 's/^/s[19]=/' 1>&2)
>(/usr/bin/time -f 't[21]=%U+%S' gzip -1 | wc -c | sed 's/^/s[21]=/' 1>&2)
>(/usr/bin/time -f 't[22]=%U+%S' gzip -2 | wc -c | sed 's/^/s[22]=/' 1>&2)
>(/usr/bin/time -f 't[23]=%U+%S' gzip -3 | wc -c | sed 's/^/s[23]=/' 1>&2)
>(/usr/bin/time -f 't[24]=%U+%S' gzip -4 | wc -c | sed 's/^/s[24]=/' 1>&2)
>(/usr/bin/time -f 't[25]=%U+%S' gzip -5 | wc -c | sed 's/^/s[25]=/' 1>&2)
>(/usr/bin/time -f 't[26]=%U+%S' gzip -6 | wc -c | sed 's/^/s[26]=/' 1>&2)
>(/usr/bin/time -f 't[27]=%U+%S' gzip -7 | wc -c | sed 's/^/s[27]=/' 1>&2)
>(/usr/bin/time -f 't[28]=%U+%S' gzip -8 | wc -c | sed 's/^/s[28]=/' 1>&2)
>(/usr/bin/time -f 't[29]=%U+%S' gzip -9 | wc -c | sed 's/^/s[29]=/' 1>&2)
| cat > /dev/null
t[29]=0.93+0.03
s[29]=1725598
t[28]=1.09+0.00
s[28]=1728327
t[27]=0.53+0.01
s[27]=1779913
t[26]=0.42+0.01
s[26]=1817520
t[25]=0.40+0.06
s[25]=1868768
t[24]=0.28+0.06
s[24]=2001968
t[23]=0.26+0.01
s[23]=2068779
t[22]=0.20+0.00
s[22]=2167255
t[21]=0.29+0.01
s[21]=2257515
scale=4;
for(p=1;p<=13;p+=6) {
for(x=1;x<=2;x++) {
for(l=x-1;l<=9;l++) {
i=x*10+l;
s=s[16]/s[i];
t=t[i]/t[16];
print s^p/t," (s^",p,"/t) ";
if(x==2) print "gzip"
else print "xz";
print" -",l," is ",t," slower
and ",s," smaller than gzip
-6\n"
} } }
t[19]=8.87+0.25
s[19]=1067488
t[18]=8.75+0.31
s[18]=1067488
t[17]=8.67+0.25
s[17]=1067488
t[16]=8.56+0.25
s[16]=1072596
t[15]=4.75+0.15
s[15]=1336496
t[14]=3.12+0.12
s[14]=1465184
t[13]=2.92+0.10
s[13]=1306504
t[12]=1.60+0.04
s[12]=1350200
t[11]=1.06+0.04
s[11]=1419720
t[10]=0.84+0.00
s[10]=1612572
bc code, output of the first and second parts
input to the last bash script
1.7356 (s^1*t) xz -0 is 1.9565x time and .8871x size of gzip -6
1.7317 (s^1*t) xz -1 is 2.2173x time and .7810x size of gzip -6
2.9388 (s^1*t) xz -2 is 3.9565x time and .7428x size of gzip -6
3.9534 (s^1*t) xz -3 is 5.5000x time and .7188x size of gzip -6
5.6251 (s^1*t) xz -4 is 6.9782x time and .8061x size of gzip -6
7.9124 (s^1*t) xz -5 is 10.7608x time and .7353x size of gzip -6
11.5197 (s^1*t) xz -6 is 19.5217x time and .5901x size of gzip -6
11.5289 (s^1*t) xz -7 is 19.6304x time and .5873x size of gzip -6
11.6693 (s^1*t) xz -8 is 19.8695x time and .5873x size of gzip -6
11.7460 (s^1*t) xz -9 is 20.0000x time and .5873x size of gzip -6
.6479 (s^1*t) gzip -1 is .5217x time and 1.2420x size of gzip -6
.4924 (s^1*t) gzip -2 is .4130x time and 1.1923x size of gzip -6
.7174 (s^1*t) gzip -3 is .6304x time and 1.1381x size of gzip -6
.8379 (s^1*t) gzip -4 is .7608x time and 1.1014x size of gzip -6
1.0056 (s^1*t) gzip -5 is .9782x time and 1.0281x size of gzip -6
1.0000 (s^1*t) gzip -6 is 1.0000x time and 1.0000x size of gzip -6
1.4688 (s^1*t) gzip -7 is 1.5000x time and .9792x size of gzip -6
2.1495 (s^1*t) gzip -8 is 2.2608x time and .9508x size of gzip -6
2.5589 (s^1*t) gzip -9 is 2.6956x time and .9493x size of gzip -6
.8457 (s^7*t) xz -0 is 1.9565x time and .8871x size of gzip -6
.3929 (s^7*t) xz -1 is 2.2173x time and .7810x size of gzip -6
.4933 (s^7*t) xz -2 is 3.9565x time and .7428x size of gzip -6
.5450 (s^7*t) xz -3 is 5.5000x time and .7188x size of gzip -6
1.5428 (s^7*t) xz -4 is 6.9782x time and .8061x size of gzip -6
1.2504 (s^7*t) xz -5 is 10.7608x time and .7353x size of gzip -6
.4860 (s^7*t) xz -6 is 19.5217x time and .5901x size of gzip -6
.4730 (s^7*t) xz -7 is 19.6304x time and .5873x size of gzip -6
.4788 (s^7*t) xz -8 is 19.8695x time and .5873x size of gzip -6
.4820 (s^7*t) xz -9 is 20.0000x time and .5873x size of gzip -6
2.3783 (s^7*t) gzip -1 is .5217x time and 1.2420x size of gzip -6
1.4146 (s^7*t) gzip -2 is .4130x time and 1.1923x size of gzip -6
1.5591 (s^7*t) gzip -3 is .6304x time and 1.1381x size of gzip -6
1.4958 (s^7*t) gzip -4 is .7608x time and 1.1014x size of gzip -6
1.1875 (s^7*t) gzip -5 is .9782x time and 1.0281x size of gzip -6
1.0000 (s^7*t) gzip -6 is 1.0000x time and 1.0000x size of gzip -6
1.2946 (s^7*t) gzip -7 is 1.5000x time and .9792x size of gzip -6
1.5879 (s^7*t) gzip -8 is 2.2608x time and .9508x size of gzip -6
1.8726 (s^7*t) gzip -9 is 2.6956x time and .9493x size of gzip -6
.4120 (s^13*t) xz -0 is 1.9565x time and .8871x size of gzip -6
.0891 (s^13*t) xz -1 is 2.2173x time and .7810x size of gzip -6
.0826 (s^13*t) xz -2 is 3.9565x time and .7428x size of gzip -6
.0748 (s^13*t) xz -3 is 5.5000x time and .7188x size of gzip -6
.4228 (s^13*t) xz -4 is 6.9782x time and .8061x size of gzip -6
.1969 (s^13*t) xz -5 is 10.7608x time and .7353x size of gzip -6
.0195 (s^13*t) xz -6 is 19.5217x time and .5901x size of gzip -6
.0176 (s^13*t) xz -7 is 19.6304x time and .5873x size of gzip -6
.0178 (s^13*t) xz -8 is 19.8695x time and .5873x size of gzip -6
.0180 (s^13*t) xz -9 is 20.0000x time and .5873x size of gzip -6
8.7297 (s^13*t) gzip -1 is .5217x time and 1.2420x size of gzip -6
4.0640 (s^13*t) gzip -2 is .4130x time and 1.1923x size of gzip -6
3.3880 (s^13*t) gzip -3 is .6304x time and 1.1381x size of gzip -6
2.6702 (s^13*t) gzip -4 is .7608x time and 1.1014x size of gzip -6
1.4024 (s^13*t) gzip -5 is .9782x time and 1.0281x size of gzip -6
1.0000 (s^13*t) gzip -6 is 1.0000x time and 1.0000x size of gzip -6
1.1413 (s^13*t) gzip -7 is 1.5000x time and .9792x size of gzip -6
1.1731 (s^13*t) gzip -8 is 2.2608x time and .9508x size of gzip -6
1.3704 (s^13*t) gzip -9 is 2.6956x time and .9493x size of gzip -6
bash code, output of the third part
BC_LINE_LENGTH=0 bc | tee
>(grep 's^1.t' | sort -n | head -1 | cut -d ' ' -f 2- 1>&2)
>(grep 's^7.t' | sort -n | head -1 | cut -d ' ' -f 2- 1>&2)
>(grep 's^13.t' | sort -n | head -1 | cut -d ' ' -f 2- 1>&2)
> /dev/null
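The selection step scores every compressor setting as `s^p * t`, where `t` is the time ratio and `s` the size ratio relative to gzip -6, and keeps the lowest score; a larger exponent `p` gives more weight to size. A minimal sketch of that selection in awk, assuming input lines of the form `<time ratio> <size ratio> <label>` (this input layout and the function name are assumptions, not the format the slide pipes around):

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the input line with the lowest s^p * t score.
# $1 is the exponent p; stdin holds "<time ratio> <size ratio> <label...>".
best_by_score() {
  LC_ALL=C awk -v p="$1" '
    {
      score = ($2 ^ p) * $1
      if (NR == 1 || score < best) { best = score; line = $0 }
    }
    END { if (NR) print line }'
}

# A few rows copied from the table above (time ratio, size ratio, setting).
rows='2.2173 .7810 xz -1
19.5217 .5901 xz -6
19.6304 .5873 xz -7
.4130 1.1923 gzip -2
1.0000 1.0000 gzip -6'

for p in 1 7 13; do
  printf 's^%s*t: %s\n' "$p" "$(printf '%s\n' "$rows" | best_by_score "$p")"
done
```

With these rows the winner moves from a fast gzip level at p=1 toward the higher xz levels as p grows, which is the trade-off the three grep/sort/head branches pick out.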
How to get rid of the race condition?
$ while true ; do { { { for d in {1,2}{1,2}{0..9} ; do echo -n " { {
echo -n "$d: " ; ls /dev/fd/ ; } 1>&2 ; " ; done ; echo -n 'seq 2000
2>/dev/null | tee' ; for z in 1 2 ; do for l in $(seq 0 9); do test $z
= 2 -a $l = 0 || echo -n " >(/usr/bin/time -f 't[$z$l]=%U+%S'
2>&2$z$l $(test $z == 2 && echo gzip || echo xz) -$l | wc -c |
sed 's/^/s[$z$l]=/' 1>&1$z$l)" ; done ; done ; echo -n '
>/dev/null' ; for d in {1,2}{1,2}{0..9} ; do echo -n " ; } $d>&1 |
grep '' " ; done ; } | tee /dev/fd/2 | bash ; echo 123 ; } | sed
's/^/b:/' | grep =.*= ; } 2>/dev/null ; done
More consistency when testing the
compression parameters
$ ( { { { { for d in {1,2}{1,2}{0..9} ; do echo -n " { " ; done ; echo -n
'mongoexport -d pseudo -c games 2>/dev/null | tee' ; for z in 1 2 ; do for l in
$(seq 0 9); do test $z = 2 -a $l = 0 || echo -n " >(/usr/bin/time -f
't[$z$l]=%U+%S' 2>&2$z$l $(test $z == 2 && echo gzip || echo xz) -$l | wc -c
| sed 's/^/s[$z$l]=/' 1>&1$z$l)" ; done ; done ; echo -n ' > /dev/null' ; for d in
{1,2}{1,2}{0..9} ; do echo -n " ; } $d>&1 | sed '' " ; done ; } | bash ; echo
'scale=4; for(p=1;p<=13;p+=6) { for(x=1;x<=2;x++) { for(l=x-1;l<=9;l++) {
i=x*10+l; s=s[i]/s[26]; t=t[i]/t[26]; print s^p*t," (s^",p,"*t) "; if(x==2) print "gzip"
else print "xz"; print" -",l," is ",t,"x time and ",s,"x size of gzip -6\n" } } }' ; } |
bash <( echo -n 'BC_LINE_LENGTH=0 bc | { tee ' ; for p in $(seq 1 6 13) ; do
echo -n " >(grep 's^$p.t' | sort -n | tail -1 | cut -d ' ' -f 2- 1>&2)" ; done ; echo
-n ' >/dev/null ; } 2>&1 | grep "" ' ) ; } & } ; A=$! ; { { sleep 9 ; kill -INT $(pstree
-laAp $A | sed -n 's/^[^,]*\<mongoexport,\([0-9][0-9]*\)\>.*/\1/p') ; } & } ; B=$! ;
wait $B || { echo "finishing..." ; kill -INT $( { pstree -laAp $A ; pstree -laAp $B ;
} | sed -n 's/^[^,]*,\([0-9][0-9]*\)\>.*/\1/p') ; } ; while wait -n ; do : ; done)
#xz-vs-gz
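The race exists because process substitutions keep running after `tee` itself has exited, so whatever reads their results may start too early. The slide's fix routes each substitution through its own file descriptor into a pipe. A reduced sketch of the same idea, assuming bash and using only gzip so it runs without xz: all substitutions write to one extra descriptor that feeds a pipe, and the reader on that pipe only sees end-of-file once every writer has finished. The descriptor number 3 and the function name are arbitrary choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compresses stdin at two gzip levels in parallel and
# prints both sizes, without racing against the process substitutions.
compare_gzip_levels() {
  {
    # Each branch reports on fd 3; tee's own stdout is discarded.
    tee >(gzip -1 | wc -c | sed 's/^ */gzip-1=/' >&3) \
        >(gzip -9 | wc -c | sed 's/^ */gzip-9=/' >&3) \
        >/dev/null
  } 3>&1 | sort   # sort reads until every holder of fd 3 has exited
}
```

For example, `seq 1 20000 | compare_gzip_levels` prints one `gzip-1=` line and one `gzip-9=` line, always both, in sorted order.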
OK, but back to the database dump;
with xz it looks like this:
$ ( export H=localhost D=pseudo C=games; export
F="dump-$H-$D-$C-$(date +%Y%m%d_%H%M%S)";
mongoexport -h $H -d $D -c $C 2> $F.log | pv -c -N raw | pv -c
-N rows -l -s "$( echo -e "use $D;\ndb.$C.find().count();" |
mongo $H 2>> $F.log | grep "switched to db $D" -A1 | tail -1 )" |
xz | pv -c -N xz | tee $F.xz | md5sum > $F.md5 )
raw: 54.7MB 0:00:38 [ 1.4MB/s] [ <=> ]
xz: 7.85MB 0:00:38 [ 176kB/s] [ <=> ]
rows: 32.7k 0:00:38 [ 818/s ] [=> ] 44% ETA 0:00:47
Since the machine is idle and xz is the bottleneck,
the way forward is to parallelize somehow
top - 12:15:16 up 53 days, 13:13, 27 users, load average: 0.69, 0.24, 0.29
Tasks: 338 total, 3 running, 334 sleeping, 0 stopped, 1 zombie
Cpu0 : 99.7%us, 0.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
Cpu1 : 2.3%us, 1.3%sy, 0.0%ni, 96.4%id, 0.0%wa, 0.0%hi, 0.0%si,
Cpu2 : 18.3%us, 1.7%sy, 0.0%ni, 79.4%id, 0.7%wa, 0.0%hi, 0.0%si,
Cpu3 : 2.0%us, 1.3%sy, 0.0%ni, 96.7%id, 0.0%wa, 0.0%hi, 0.0%si,
Mem: 4107620k total, 1334364k used, 2773256k free, 24868k buffers
Swap: 4192960k total, 2373236k used, 1819724k free, 259280k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20507 zed 20 0 99420 93m 736 R 100 2.3 1:06.74 xz
20504 zed 20 0 17688 4304 2880 S 17 0.1 0:11.88 mongoexport
20506 zed 20 0 4056 780 576 S 2 0.0 0:01.45 pv
20505 zed 20 0 4056 732 576 S 0 0.0 0:00.25 pv
pixz or parallel?
$ git clone https://github.com/vasi/pixz.git
$ cd pixz/
$ make
pixz.h:1: fatal error: lzma.h: No such file or directory
$ sudo apt-get install lzma-dev
Setting up lzma-dev (4.43-14ubuntu2) ...
$ make
pixz.h:1: fatal error: lzma.h: No such file or directory
$ grep lzma README
* liblzma 4.999.9-beta-212 or later (from the xz
distribution)
parallel? no?..
$ parallel
The program 'parallel' is currently not installed. You
can install it by typing: sudo apt-get install moreutils
$ apt-get install moreutils
...
$ parallel --help
parallel: invalid option -- '-'
parallel [OPTIONS] command -- arguments
for each argument, run command with argument, in
parallel
parallel [OPTIONS] -- commands
run specified commands in parallel
Welcome to the real world...
$ wget http://ftp.gnu.org/gnu/parallel/parallel-20110822.tar.bz2
$ tar xvf parallel-20110822.tar.bz2
$ cd parallel-20110822
$ ./configure && make -j 4
$ ./src/parallel --help
Usage:
parallel [options] [command [arguments]] < list_of_arguments
parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
cat ... | parallel --pipe [options] [command [arguments]]
-j n Run n jobs in parallel
-k Keep same order
How many jobs for compression?...
$ uptime ; for j in $(seq $(( $(grep -c ^bogomips /proc/cpuinfo) * 4
)) ) ; do echo "$j: $( ( time ( mongoexport -d pseudo -c games
2>/dev/null | head -20000 | parallel -j $j --pipe nice xz | xz -t ) )
2>&1 | tr "\n" "\t" )" ; done ; uptime
14:46:08 up 53 days, ... load average: 0.07, 1.29, 2.23
1: real 0m18.759s user 0m26.602s sys 0m2.448s
2: real 0m11.673s user 0m24.746s sys 0m1.916s
3: real 0m9.033s user 0m23.553s sys 0m1.460s
4: real 0m8.561s user 0m23.093s sys 0m1.468s
...
8: real 0m7.645s user 0m23.173s sys 0m1.356s
...
16: real 0m7.087s user 0m22.153s sys 0m1.260s
14:48:27 up 53 days, ... load average: 6.49, 2.88, 2.66
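`parallel --pipe` splits the stream into chunks, compresses each chunk separately, and concatenates the results, which works because a concatenation of xz streams is itself a valid xz file (the `xz -t` at the end of the pipeline checks exactly that). gzip has the same property, so the idea can be sketched with plain bash background jobs and gzip, with no GNU parallel or xz required. This is a stand-in for illustration, not what the slide runs: the function name, the chunk size, and the use of temporary files are all assumptions, and it starts one job per chunk with no limit on concurrency.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compresses stdin chunk by chunk in parallel and writes
# the concatenated gzip members to stdout, preserving chunk order.
parallel_gzip() {
  local lines_per_chunk=${1:-1000} tmp chunk
  tmp=$(mktemp -d) || return 1
  # split's default two-letter suffixes allow at most 676 chunks.
  split -l "$lines_per_chunk" - "$tmp/chunk."
  for chunk in "$tmp"/chunk.*; do
    [ -e "$chunk" ] || continue
    gzip -c "$chunk" > "$chunk.gz" &      # one background job per chunk
  done
  wait                                     # let every compressor finish
  # The glob sorts lexically, which matches the order split produced.
  for chunk in "$tmp"/chunk.*.gz; do
    [ -e "$chunk" ] && cat "$chunk"
  done
  rm -rf "$tmp"
}
```

`seq 1 5000 | parallel_gzip 1000 | gzip -dc` gives back the original 5000 lines, since `gzip -d` decodes all concatenated members in sequence.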
Achievement Unlocked
$ ( export H=localhost D=pseudo C=games; export
F="dump-$H-$D-$C-$(date +%Y%m%d_%H%M%S)";
mongoexport -h $H -d $D -c $C 2> $F.log | pv -c -N raw | pv -c
-N rows -l -s "$( echo -e "use $D;\ndb.$C.find().count();" |
mongo $H 2>> $F.log | grep "switched to db $D" -A1 | tail -1 )" |
parallel --pipe nice xz | pv -c -N xz | tee $F.xz | md5sum >
$F.md5 ) #dump-mongo-collection
raw: 48.9MB 0:00:10 [5.64MB/s] [ <=> ]
xz: 6.38MB 0:00:10 [ 678kB/s] [ <=> ]
rows: 29.6k 0:00:10 [3.27k/s] [==> ] 40% ETA 0:00:14
2
What if I want to send it over the network?
$ scp dump.xz pseudogames.com:
dump.xz 100% 97MB 32.3MB/s 00:03
Damn, but what about integrity?
$ ssh pseudogames.com "xz -vd > dump" < dump-*.xz
(stdin): 96.9 MiB / 228.5 MiB = 0.424, 22 MiB/s, 0:10
But I didn't want to decompress it over there...
$ F="dump.xz"; ssh pseudogames.com "tee >(xz -tv) > '$F' " <
"$F"
Huh?.. where's the output?
$ F="dump.xz"; ssh pseudogames.com "tee >(xz -tv) > '$F' " <
"$F"
$
Argh, stderr didn't make it through ssh,
of course... but that's fine.
$ time ( F="dump.xz"; ssh pseudogames.com "{ tee >(xz -vt) >
'$F' ; } 2>&1" < "$F" )
(stdin): 96.9 MiB / 228.5 MiB = 0.424, 22 MiB/s, 0:10
real 0m10.580s
user 0m2.963s
sys 0m0.243s
But decompressing the xz just to validate it
is overkill... how about using md5sum?
$ F=dump.xz; pv "$F" | tee >(printf 'size:%-10d\n' $(wc -c) )
>(echo "md5:$(md5sum)") | ssh pseudogames.com
'FIFO=$(mktemp -d); mkfifo $FIFO/{a,b}; trap "rm $FIFO/[ab];
rmdir $FIFO" 0; tee >(tail -c 56 | tee >( sed -n "s/size:\(.*\)/\1/p"
>$FIFO/a ) | sed -n "s/md5:\(.*\)/\1/p" >$FIFO/b ) | head -c $(cat
<$FIFO/a) | tee '"$F"' | md5sum -c $FIFO/b | cut -c 4-'
Damn, deadlock! Now what?...
$ F=dump.xz; pv "$F" | tee >(printf 'size:%-10d\n' $(wc -c) )
>(echo "md5:$(md5sum)") | ssh pseudogames.com
'FIFO=$(mktemp -d); mkfifo $FIFO/{a,b}; trap "rm $FIFO/[ab];
rmdir $FIFO" 0; tee >(tail -c 56 | tee >( sed -n "s/size:\(.*\)/\1/p"
>$FIFO/a ) | sed -n "s/md5:\(.*\)/\1/p" >$FIFO/b ) | head -c $(cat
<$FIFO/a) | tee '"$F"' | md5sum -c $FIFO/b | cut -c 4-'
^C
Haha, there's a much simpler way!
$ time ( F=dump.xz; pv "$F" | { tee >(md5sum 1>&2) | ssh
pseudogames.com 'tee '"$F"' | md5sum' ; } 2>&1 | sort -u | wc -l
| grep -qx 1 && echo OK || echo ERR )
96.9MB 0:00:03 [28.7MB/s] [===========>] 100%
OK
real 0m3.212s
user 0m2.373s
sys 0m0.307s
$ ### time ssh + xz:
$ #real 0m10.580s
$ #user 0m2.963s
$ #sys 0m0.243s
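The check relies on both ends printing an md5sum line for the same byte stream: if the two lines are identical, `sort -u` collapses them into one and `wc -l` reports 1. A local sketch of that comparison, assuming bash and GNU coreutils, with a plain file copy standing in for the ssh hop (the function name and the extra descriptor 3 are choices made for this illustration):

```shell
#!/usr/bin/env bash
# Hypothetical local stand-in for the ssh transfer: copies src to dst and
# prints OK only if the checksum taken while reading matches the checksum
# taken while writing.
verified_copy() {
  local src=$1 dst=$2
  {
    # Left side plays the sender: checksum the outgoing bytes onto fd 3.
    # Right side plays the receiver: store the bytes and checksum them.
    tee >(md5sum >&3) < "$src" | { tee "$dst" | md5sum ; }
  } 3>&1 | sort -u | wc -l | grep -qx 1 && echo OK || echo ERR
}
```

Replacing the right-hand group with `ssh host 'tee file | md5sum'` gives back the shape of the one-liner on the slide.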
It's already much lighter;
how about dropping ssh too?
$ { F="dump.xz" ; H="pseudogames.com" ; P=9999 ; pv "$F" |
tee >(md5sum 1>&2) >(wc -c 1>&2) >( sed -n '' | ssh "$H" "{
netcat \$(set | grep [S]SH_CLIENT | sed \"s/.*'\([^ ]*\) .*/\1/\") $P
| tee '$F' >(wc -c 1>&2) | md5sum ; } 2>&1" 1>&2 ) >(netcat -l
$P) >/dev/null ; } 2>&1 | tee >( sort -u | wc -l | grep -qx 2 &&
echo OK || echo ERR )
19405636
a84a3cd315f4ee235c5e15d82872ab5d -
19335168
85d99c783b541885b05ef263c1f35cdc -
ERR
It died before the transfer finished;
the receiver will have to authorize the end
$ { F="dump.xz" ; H="pseudogames.com" ; P=$(printf
"39%03d" $RANDOM | cut -c 1-5) ; { ssh "$H" "{ netcat \$(set |
grep [S]SH_CLIENT | sed \"s/.*'\([^ ]*\) .*/\1/\") $P | tee '$F' >(wc
-c 1>&2) | md5sum ; } 2>&1" 1>&2 ; } | { < "$F" tee >(md5sum
1>&2) >(wc -c 1>&2) >(netcat -l $P) >/dev/null ; } ; } 2>&1 | tee
>( sort -u | wc -l | grep -qx 2 && echo OK || echo ERR )
19405636
a84a3cd315f4ee235c5e15d82872ab5d -
19405636
a84a3cd315f4ee235c5e15d82872ab5d -
OK
Being careful with netcat
$ { F="dump.xz" ; H=pseudogames.com ; P=$(printf "39%03d"
$RANDOM | cut -c 1-5) ; S=$(stat -c %s "$F") ; { { pv "$F" | { tee
>(md5sum 1>&2) | { netcat -l -p $P 2>/dev/null || netcat -l $P ; } ; }
2>&1 ; } & } ; A=$! ; ssh $H "{ H=\$(set | grep [S]SH_CLIENT | sed
\"s/.*'\([^ ]*\) .*/\1/\") ; { netcat -w 5 -c \$H $P 2>/dev/null || netcat -w 5
\$H $P ; } | tee '$F' >(md5sum 1>&2) | pv -n -s $S 2>&1 >/dev/null |
while read X ; do test \"\$X\" -ge 100 && kill -INT \$(pstree -laAp \$\$ |
sed -n 's/^[^,]*\<netcat,\([0-9][0-9]*\)\>.*/\1/p') &>/dev/null ; done ; }
2>&1" ; kill -INT $(pstree -laAp $A | sed -n
's/^[^,]*\<netcat,\([0-9][0-9]*\)\>.*/\1/p') &>/dev/null ; } | uniq | wc -l |
grep -qx 1 && echo OK || echo ERR #netcat-send
537MB 0:00:04 [ 110MB/s] [=================>] 100%
OK
But was there actually any advantage?
$ time !?#netcat-send
537MB 0:00:04 [ 110MB/s] [============>]
100%
real 0m5.298s
user 0m0.063s
sys 0m0.003s
OK
$ time !?scp
dump.xz 100% 537MB 35.8MB/s 00:15
real 0m14.658s
user 0m11.123s
sys 0m1.130s
3
What if there are several files?
$ find dumps/ -type f -printf "%10s %p\n"
101632892 dumps/b/dump.xz.2
65536 dumps/b/a/dump.xz.1
0 dumps/a/3688
0 dumps/a/8958
(... 10000 files in total in dumps/a/)
0 dumps/a/4308
0 dumps/a/3338
$ scp -r dumps/* pseudogames.com:dump-backup/
(...)
3338 100% 0 0.0KB/s 00:00
dump.xz.2 100% 97MB 48.5MB/s 00:02
dump.xz.1 100% 64KB 64.0KB/s 00:00
But I want to select more precisely
$ tar c -C dumps/ --exclude=./a . | ssh pseudogames.com "tar
xv -C dump-backup/"
./
./b/
./b/dump.xz.2
./b/a/
./b/a/dump.xz.1
Progress...
$ TAR="tar c -C dumps/ --exclude=./a ." ; $TAR | pv -s $($TAR |
wc -c) | ssh pseudogames.com "tar x -C dump-backup/"
97MB 0:00:02 [32.8MB/s] [================>] 100%
Running tar twice is not an option...
$ S="dumps" ; D="dump-backup" ; X="--exclude=a" ;
H=pseudogames.com ; tar c -C "$S" $X . | pv -s $(cd "$S" &&
du --apparent-size --block-size=1 $X -s . | cut -f 1) | ssh $H "tar
x -C '$D' "
96.9MB 0:00:02 [33.5MB/s] [==============>] 100%
$ S="dumps" ; D="dump-backup" ; X="" ; H=pseudogames.com
; tar c -C "$S" $X . | pv -s $(cd "$S" && du --apparent-size
--block-size=1 $X -s . | cut -f 1) | ssh $H "tar x -C '$D' "
102MB 0:00:03 [33.7MB/s] [================] 104%
But this way it's very imprecise... :/
$ S="dumps" ; D="dump-backup" ; X="--exclude=./a
--exclude=./b/d*" ; H=pseudogames.com ; tar c -C "$S" $X . |
pv -s $(cd "$S" && du --apparent-size --block-size=1 $X -s . |
cut -f 1) | ssh $H "tar x -C '$D' "
70kB 0:00:00 [ 544kB/s] [====> ] 29%
$ S="dumps" ; D="dump-backup" ; X="--exclude=./b/d*" ;
H=pseudogames.com ; tar c -C "$S" $X . | pv -s $(cd "$S" &&
du --apparent-size --block-size=1 $X -s . | cut -f 1) | ssh $H "tar
x -C '$D' "
4.95MB 0:00:00 [7.77MB/s] [================] 1267%
I'll try to factor tar's overhead
into the total size calculation
$ for X in "('./a/*' './b/a/*')" "()" "('./a/*' './b/d*')" "('./b/d*')" ; do eval
"X=$X" ; E=""; N=""; for x in "${X[@]}" ; do test -n "$N" && N="$N -and"
; N="$N -not -path '$x'" ; E="$E --exclude='$x'" ; done ; echo "${X[@]}
=> $E => $N"; S="dumps" ; D="dump-backup" ; H=pseudogames.com
; eval "tar c -C '$S' $E ." | pv -s $(cd "$S" && eval "find . $N -printf '%y
%s %p\n'" | awk 'BEGIN { sum=0 } { s=length($0)-length($2)-1+($1 ~
/f/ ? $2 : 0) ; c=512; r=s%c; s=s+(r?c-r:0) ; sum+=s } END {
chunk=10240; remain=sum%chunk; print
sum+(remain?chunk-remain:0) }') | ssh $H "tar x -C '$D' " ; done
#tar-pipe-ssh
96.9MB 0:00:02 [34.7MB/s] [====================>] 100%
102MB 0:00:02 [41.5MB/s] [====================>] 100%
70kB 0:00:00 [ 835kB/s] [====================>] 100%
4.95MB 0:00:00 [21.2MB/s] [====================>] 100%
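The estimate lands on 100% because a tar stream is built from 512-byte blocks: every member gets a header block, file contents are padded up to the next multiple of 512, and the whole archive is padded to the record size. A simplified sketch of that accounting follows. It is not the slide's awk expression; it assumes GNU tar's default 10240-byte records, member names short enough to fit in a single header, and no hard links or sparse files. The function name and its `<type> <size>` input format are also assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: predicts the size of an uncompressed tar stream.
# stdin: one "<type> <size>" line per member, as printed by
#   find . -printf '%y %s\n'
tar_stream_size() {
  LC_ALL=C awk '
    {
      sum += 512                                           # one header block per member
      if ($1 == "f") sum += int(($2 + 511) / 512) * 512    # file data padded to 512 bytes
    }
    END {
      sum += 1024                                          # two zero blocks end the archive
      record = 10240                                       # assumed record size (20 x 512)
      printf "%.0f\n", int((sum + record - 1) / record) * record
    }'
}
```

It would be used as the argument to `pv -s`, for example `pv -s "$(cd dumps && find . -printf '%y %s\n' | tar_stream_size)"`, with the same `-not -path` filters applied to find that are passed to tar as `--exclude`.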
0
Don't try this at home
In this fantastic world of one-liners you can use the history
as a command repository; just comment the commands with
memorable tags and make the history really, really big:
$ export HISTSIZE=999999999 HISTFILESIZE=999999999
$ echo !! >> $(ls -S ~/.{,bash}{rc,{,_}profile} 2>/dev/null | head
-1)
$ ls -ltr #lista-recentes
...
$ !?#lista-re
ls -ltr #lista-recentes
...
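`!?#lista-re` is interactive history expansion: it reruns the most recent command whose text contains the given string, here the tag left in the trailing comment. Outside an interactive shell the same lookup can be approximated by searching the history file. In the sketch below the function name `recall` and the optional second argument are assumptions, and the lookup only sees commands already written to the file (bash normally writes them at exit, or on `history -a`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the most recent history line carrying a tag.
# $1: tag text without the leading '#'; $2: history file (optional).
recall() {
  local tag=$1
  local histfile=${2:-${HISTFILE:-$HOME/.bash_history}}
  grep -F -- "#$tag" "$histfile" | tail -n 1
}
```

`recall lista-re` would print `ls -ltr #lista-recentes` once that command has been saved, and `eval "$(recall lista-re)"` would run it again.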
Reference
Ubuntu 10.10
GNU bash, version 4.1.5(1)-release (i686-pc-linux-gnu)
Copyright (C) 2009 Free Software Foundation, Inc.
Arch Linux 2011-09
GNU bash, version 4.2.10(2)-release (i686-pc-linux-gnu)
Copyright (C) 2011 Free Software Foundation, Inc.
Windows 10.0.14393 (WSL)
GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2013 Free Software Foundation, Inc.
Thank you
( export M=localhost D=pseudo H=pseudogames.com ; export F="dump-$M-$D-$(date
+%Y%m%d_%H%M%S)" P=$(printf "39%03d" $RANDOM | cut -c 1-5) ; { { { { echo -e
"use $D;\nprint(db.getCollectionNames().reduce(function(a,b){ return
a+db[b].find().count()+1; }, 0)) ; db.getCollectionNames().forEach(function(a){print(a)});" |
mongo $M 2> /dev/null | grep -v " \|^\(system.indexes\|bye\)$" | { read N ; echo $N ; while
read C ; do echo "--- $C ---" ; mongoexport -h $M -d $D -c $C 2> /dev/null ; done ; } ; } | {
read N ; pv -c -l -s $N ; } | parallel --pipe nice xz | { tee >(md5sum 1>&2) | { netcat -l -p $P
2>/dev/null || netcat -l $P ; } ; } 2>&1 ; } & } ; A=$! ; ssh $H "{ H=\$(set | grep
[S]SH_CLIENT | sed \"s/.*'\([^ ]*\) .*/\1/\") ; { netcat -w 5 -c \$H $P 2>/dev/null || netcat -w
5 \$H $P ; } | tee '$F.xz' >(md5sum 1>&2) | xz -d | pv -l -n -s $N 2>&1 >/dev/null | while
read X ; do test \"\$X\" -ge 100 && kill -INT \$(pstree -laAp \$\$ | sed -n
's/^[^,]*\<netcat,\([0-9][0-9]*\)\>.*/\1/p') &>/dev/null ; done ; } 2>&1" ; kill -INT $(pstree
-laAp $A | sed -n 's/^[^,]*\<netcat,\([0-9][0-9]*\)\>.*/\1/p') &>/dev/null ; } | uniq | wc -l | grep
-qx 1 && echo "OK" || echo "ERR" ) #remote-database-backup
http://tldp.org/LDP/abs/html/
http://bashoneliners.com/
http://bash.org/

Bash tricks

  • 1. 3 bash one-liners Carlo “zED” Caputo carlo.caputo@gmail.com
  • 2. 1
  • 3. There I was, dumping a database...
      $ mongoexport -d pseudo -c games | gzip > dump.json.gz
      connected to: 127.0.0.1
  • 4. It took more than 10 seconds...
      $ mongoexport -d pseudo -c games | gzip > dump.json.gz
      connected to: 127.0.0.1
      ^C
  • 5. OK, I need to show progress...
      $ mongoexport -d pseudo -c games | gzip | pv > dump.json.gz
      connected to: 127.0.0.1
      7.83MB 0:00:05 [1.75MB/s] [ <=> ]
  • 6. Hmm... I still don't know how much is left.
      $ mongoexport -d pseudo -c games | pv -c -N raw | gzip | pv -c -N gz > dump.json.gz
      connected to: 127.0.0.1
      raw: 54.5MB 0:00:07 [8.28MB/s] [ <=> ]
      gz: 11.1MB 0:00:07 [1.72MB/s] [ <=> ]
  • 7. Ah, I know how many rows there are...
      $ ( export H=localhost D=pseudo C=games; export F="dump-$H-$D-$C-$(date +%Y%m%d_%H%M%S)"; mongoexport -h $H -d $D -c $C 2> $F.log | pv -c -N raw | pv -c -N rows -l -s "$( echo -e "use $D;\ndb.$C.find().count();" | mongo $H 2>> $F.log | grep "switched to db $D" -A1 | tail -1 )" | gzip | pv -c -N gz | tee $F.gz | md5sum > $F.md5 )
      raw: 78.4MB 0:00:10 [7.97MB/s] [ <=> ]
      gz: 16.1MB 0:00:10 [1.69MB/s] [ <=> ]
      rows: 46.7k 0:00:10 [4.59k/s] [> ] 63% ETA 0:00:05
  • 8. But it's still mostly idle...
      top - 23:19:46 up 53 days, 17 min, 24 users, load average: 0.15, 0.12, 0.09
      Tasks: 334 total, 2 running, 331 sleeping, 0 stopped, 1 zombie
      Cpu0 : 33.3%us, 8.8%sy, 0.0%ni, 57.9%id, 0.0%wa, 0.0%hi, 0.0%si,
      Cpu1 : 2.3%us, 1.0%sy, 0.3%ni, 94.2%id, 2.3%wa, 0.0%hi, 0.0%si,
      Cpu2 : 85.2%us, 4.6%sy, 0.0%ni, 10.2%id, 0.0%wa, 0.0%hi, 0.0%si,
      Cpu3 : 2.6%us, 1.0%sy, 0.3%ni, 94.1%id, 2.0%wa, 0.0%hi, 0.0%si,
      Mem: 4107620k total, 3687764k used, 419856k free, 278800k buffers
      Swap: 4192960k total, 363512k used, 3829448k free, 658816k cached
        PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
      24448 zed      20  0 17688 4300 2880 R   98  0.1  0:10.45 mongoexport
      24451 zed      20  0  2052  504  344 S   31  0.0  0:03.20 gzip
      24449 zed      20  0  4056  644  568 S   10  0.0  0:01.04 pv
      24450 zed      20  0  4056  656  576 S    8  0.0  0:00.81 pv
       3587 mongodb  20  0  804m 120m 119m S    1  3.0  0:08.59 mongod
      24456 zed      20  0  3892  508  440 S    1  0.0  0:00.07 md5sum
      24454 zed      20  0  3884  548  480 S    1  0.0  0:00.05 tee
      24452 zed      20  0  4056  660  568 S    0  0.0  0:00.03 pv
  • 9. Let's see...
      $ time --help
      --help: command not found
      $ type time
      time is a shell keyword
      $ help time
      time: time [-p] pipeline
          Report time consumed by pipeline's execution.
      $ /usr/bin/time --help
      Usage: /usr/bin/time [-apvV] [-f format] [-o file] [--append] [--verbose]
             [--portability] [--format=format] [--output=file] [--version]
             [--quiet] [--help] command [arg...]
  • 10. Hey, wait, how about using xz?!
      $ mongoexport -d pseudo -c games | head -c 10000 | tee >( ( /usr/bin/time -f 'gz: %E real, %U user, %S sys' gzip -9 | wc -c | sed 's/^/gz: /' ) 1>&2) | /usr/bin/time -f 'xz: %E real, %U user, %S sys' xz -9 | wc -c | sed 's/^/xz: /'
      connected to: 127.0.0.1
      gz: 0:00.24 real, 0.00 user, 0.00 sys
      gz: 2078
      xz: 0:00.31 real, 0.02 user, 0.04 sys
      xz: 2012
  • 11. Better to limit by time than by bytes. But how do I stop it?...
      $ { ( mongoexport -d pseudo -c games | tee >(wc -c | sed 's/^/raw: /' 1>&2) | tee >( ( /usr/bin/time -f 'gz: %E real, %U user, %S sys' gzip -9 | wc -c | sed 's/^/gz: /' ) 1>&2) | /usr/bin/time -f 'xz: %E real, %U user, %S sys' xz -9 | wc -c | sed 's/^/xz: /' ) & } ; sleep 9 ; kill -INT %%
      [1] 12289
      connected to: 127.0.0.1
      $ raw: 14623779
      gz: 0:08.99 real, 0.75 user, 0.00 sys
      gz: 2706591
      Command terminated by signal 2
      xz: 0:09.02 real, 8.78 user, 0.17 sys
  • 12. Ah, there's the missing number!
      $ { ( mongoexport -d pseudo -c games | tee >(wc -c | sed 's/^/raw: /' 1>&2) | tee >( ( /usr/bin/time -f 'gz: %E real, %U user, %S sys' gzip -9 | wc -c | sed 's/^/gz: /' ) 1>&2) | /usr/bin/time -f 'xz: %E real, %U user, %S sys' xz -9 | wc -c | sed 's/^/xz: /' ) & } ; sleep 9 ; kill -INT $(pstree -p $! | sed -n 's/.*mongoexport(\([0-9]*\)).*/\1/p')
      [1] 12724
      connected to: 127.0.0.1
      $ raw: 14497402
      gz: 0:09.10 real, 0.73 user, 0.01 sys
      gz: 2695783
      xz: 0:09.18 real, 8.73 user, 0.16 sys
      xz: 2021928
  • 13. But let me format that better...
      $ ( { ( { mongoexport -d pseudo -c games 2>/dev/null | tee >(wc -c | sed 's/^/s_raw=/' 1>&2) | tee >( ( /usr/bin/time -f 't_gz=%U+%S' gzip -9 | wc -c | sed 's/^/s_gz=/') 1>&2) | /usr/bin/time -f 't_xz=%U+%S' xz -9 | wc -c | sed 's/^/s_xz=/' ; echo 'scale=2; print "\nxz is ", t_xz/t_gz; r_sz=s_gz/s_xz; r_gz=s_gz/s_raw; r_xz=s_xz/s_raw; scale=0; print " times slower and ", (r_sz-1)/.01,"% smaller\ncompression ratios: gz=",(1-r_gz)/.01,"%, xz=",(1-r_xz)/.01,"%\n"' ; } 2>&1 | bc ) & } ; sleep 9 ; kill -INT $(pstree -p $! | sed -n 's/.*mongoexport(\([0-9]*\)).*/\1/p') 2>/dev/null )
      xz is 12.02 times slower and 33% smaller
      compression ratios: gz=82%, xz=87%
  • 14. Ouch... :/ is there some other compression level that's worth it?
      $ ( { ( { { echo -n 'mongoexport -d pseudo -c games 2>/dev/null | tee' ; for z in 1 2 ; do for l in $(seq 0 9); do test $z = 2 -a $l = 0 || echo -n " >(/usr/bin/time -f 't[$z$l]=%U+%S' $(test $z == 2 && echo gzip || echo xz) -$l | wc -c | sed 's/^/s[$z$l]=/' 1>&2)" ; done ; done ; echo -n ' | cat > /dev/null' ; } | bash 2>&1 ; echo 'scale=4; for(p=1;p<=13;p+=6) { for(x=1;x<=2;x++) { for(l=x-1;l<=9;l++) { i=x*10+l; s=s[i]/s[26]; t=t[i]/t[26]; print s^p*t," (s^",p,"*t) "; if(x==2) print "gzip" else print "xz"; print" -",l," is ",t,"x time and ",s,"x size of gzip -6\n" } } }' ; } | bash <( echo -n 'BC_LINE_LENGTH=0 bc | tee ' ; for p in $(seq 1 6 13) ; do echo -n " >(grep 's^$p.t' | sort -n | head -1 | cut -d ' ' -f 2- 1>&2)" ; done ; echo -n ' > /dev/null' ) ) & } ; sleep 9 ; kill -INT $(pstree -p $! | sed -n 's/.*mongoexport(\([0-9]*\)).*/\1/p'); while wait -n ; do : ; done )
      (s^13*t) xz -6 is 17.2978x time and .5913x size of gzip -6
      (s^7*t) xz -1 is 2.4042x time and .7802x size of gzip -6
      (s^1*t) gzip -2 is .5957x time and 1.1931x size of gzip -6
  • 15. pstree of the commands in the previous slide
      bash─┬─bash───bash─┬─cat
           │             ├─mongoexport
           │             └─tee─┬─10*[bash─┬─sed]
           │                   │          ├─time───xz]
           │                   │          └─wc]
           │                   └─9*[bash─┬─sed]
           │                             ├─time───gzip]
           │                             └─wc]
           └─bash─┬─bc
                  └─tee───3*[bash─┬─cut]
                                  ├─grep]
                                  ├─sort]
                                  └─tail]
  • 16. bash code generated by the first fragment
      mongoexport -d pseudo -c games 2>/dev/null | tee \
        >(/usr/bin/time -f 't[11]=%U+%S' xz -1 | wc -c | sed 's/^/s[11]=/' 1>&2) \
        >(/usr/bin/time -f 't[12]=%U+%S' xz -2 | wc -c | sed 's/^/s[12]=/' 1>&2) \
        >(/usr/bin/time -f 't[13]=%U+%S' xz -3 | wc -c | sed 's/^/s[13]=/' 1>&2) \
        >(/usr/bin/time -f 't[14]=%U+%S' xz -4 | wc -c | sed 's/^/s[14]=/' 1>&2) \
        >(/usr/bin/time -f 't[15]=%U+%S' xz -5 | wc -c | sed 's/^/s[15]=/' 1>&2) \
        >(/usr/bin/time -f 't[16]=%U+%S' xz -6 | wc -c | sed 's/^/s[16]=/' 1>&2) \
        >(/usr/bin/time -f 't[17]=%U+%S' xz -7 | wc -c | sed 's/^/s[17]=/' 1>&2) \
        >(/usr/bin/time -f 't[18]=%U+%S' xz -8 | wc -c | sed 's/^/s[18]=/' 1>&2) \
        >(/usr/bin/time -f 't[19]=%U+%S' xz -9 | wc -c | sed 's/^/s[19]=/' 1>&2) \
        >(/usr/bin/time -f 't[21]=%U+%S' gzip -1 | wc -c | sed 's/^/s[21]=/' 1>&2) \
        >(/usr/bin/time -f 't[22]=%U+%S' gzip -2 | wc -c | sed 's/^/s[22]=/' 1>&2) \
        >(/usr/bin/time -f 't[23]=%U+%S' gzip -3 | wc -c | sed 's/^/s[23]=/' 1>&2) \
        >(/usr/bin/time -f 't[24]=%U+%S' gzip -4 | wc -c | sed 's/^/s[24]=/' 1>&2) \
        >(/usr/bin/time -f 't[25]=%U+%S' gzip -5 | wc -c | sed 's/^/s[25]=/' 1>&2) \
        >(/usr/bin/time -f 't[26]=%U+%S' gzip -6 | wc -c | sed 's/^/s[26]=/' 1>&2) \
        >(/usr/bin/time -f 't[27]=%U+%S' gzip -7 | wc -c | sed 's/^/s[27]=/' 1>&2) \
        >(/usr/bin/time -f 't[28]=%U+%S' gzip -8 | wc -c | sed 's/^/s[28]=/' 1>&2) \
        >(/usr/bin/time -f 't[29]=%U+%S' gzip -9 | wc -c | sed 's/^/s[29]=/' 1>&2) \
        | cat > /dev/null
  • 17. bc code, output of the first and second fragments
      t[29]=0.93+0.03
      s[29]=1725598
      t[28]=1.09+0.00
      s[28]=1728327
      t[27]=0.53+0.01
      s[27]=1779913
      t[26]=0.42+0.01
      s[26]=1817520
      t[25]=0.40+0.06
      s[25]=1868768
      t[24]=0.28+0.06
      s[24]=2001968
      t[23]=0.26+0.01
      s[23]=2068779
      t[22]=0.20+0.00
      s[22]=2167255
      t[21]=0.29+0.01
      s[21]=2257515
      scale=4; for(p=1;p<=13;p+=6) { for(x=1;x<=2;x++) { for(l=x-1;l<=9;l++) { i=x*10+l; s=s[16]/s[i]; t=t[i]/t[16]; print s^p/t," (s^",p,"/t) "; if(x==2) print "gzip" else print "xz"; print" -",l," is ",t," slower and ",s," smaller than gzip -6\n" } } }
      t[19]=8.87+0.25
      s[19]=1067488
      t[18]=8.75+0.31
      s[18]=1067488
      t[17]=8.67+0.25
      s[17]=1067488
      t[16]=8.56+0.25
      s[16]=1072596
      t[15]=4.75+0.15
      s[15]=1336496
      t[14]=3.12+0.12
      s[14]=1465184
      t[13]=2.92+0.10
      s[13]=1306504
      t[12]=1.60+0.04
      s[12]=1350200
      t[11]=1.06+0.04
      s[11]=1419720
      t[10]=0.84+0.00
      s[10]=1612572
  • 18. Input of the last bash script
      1.7356 (s^1*t) xz -0 is 1.9565x time and .8871x size of gzip -6
      1.7317 (s^1*t) xz -1 is 2.2173x time and .7810x size of gzip -6
      2.9388 (s^1*t) xz -2 is 3.9565x time and .7428x size of gzip -6
      3.9534 (s^1*t) xz -3 is 5.5000x time and .7188x size of gzip -6
      5.6251 (s^1*t) xz -4 is 6.9782x time and .8061x size of gzip -6
      7.9124 (s^1*t) xz -5 is 10.7608x time and .7353x size of gzip -6
      11.5197 (s^1*t) xz -6 is 19.5217x time and .5901x size of gzip -6
      11.5289 (s^1*t) xz -7 is 19.6304x time and .5873x size of gzip -6
      11.6693 (s^1*t) xz -8 is 19.8695x time and .5873x size of gzip -6
      11.7460 (s^1*t) xz -9 is 20.0000x time and .5873x size of gzip -6
      .6479 (s^1*t) gzip -1 is .5217x time and 1.2420x size of gzip -6
      .4924 (s^1*t) gzip -2 is .4130x time and 1.1923x size of gzip -6
      .7174 (s^1*t) gzip -3 is .6304x time and 1.1381x size of gzip -6
      .8379 (s^1*t) gzip -4 is .7608x time and 1.1014x size of gzip -6
      1.0056 (s^1*t) gzip -5 is .9782x time and 1.0281x size of gzip -6
      1.0000 (s^1*t) gzip -6 is 1.0000x time and 1.0000x size of gzip -6
      1.4688 (s^1*t) gzip -7 is 1.5000x time and .9792x size of gzip -6
      2.1495 (s^1*t) gzip -8 is 2.2608x time and .9508x size of gzip -6
      2.5589 (s^1*t) gzip -9 is 2.6956x time and .9493x size of gzip -6
      .8457 (s^7*t) xz -0 is 1.9565x time and .8871x size of gzip -6
      .3929 (s^7*t) xz -1 is 2.2173x time and .7810x size of gzip -6
      .4933 (s^7*t) xz -2 is 3.9565x time and .7428x size of gzip -6
      .5450 (s^7*t) xz -3 is 5.5000x time and .7188x size of gzip -6
      1.5428 (s^7*t) xz -4 is 6.9782x time and .8061x size of gzip -6
      1.2504 (s^7*t) xz -5 is 10.7608x time and .7353x size of gzip -6
      .4860 (s^7*t) xz -6 is 19.5217x time and .5901x size of gzip -6
      .4730 (s^7*t) xz -7 is 19.6304x time and .5873x size of gzip -6
      .4788 (s^7*t) xz -8 is 19.8695x time and .5873x size of gzip -6
      .4820 (s^7*t) xz -9 is 20.0000x time and .5873x size of gzip -6
      2.3783 (s^7*t) gzip -1 is .5217x time and 1.2420x size of gzip -6
      1.4146 (s^7*t) gzip -2 is .4130x time and 1.1923x size of gzip -6
      1.5591 (s^7*t) gzip -3 is .6304x time and 1.1381x size of gzip -6
      1.4958 (s^7*t) gzip -4 is .7608x time and 1.1014x size of gzip -6
      1.1875 (s^7*t) gzip -5 is .9782x time and 1.0281x size of gzip -6
      1.0000 (s^7*t) gzip -6 is 1.0000x time and 1.0000x size of gzip -6
      1.2946 (s^7*t) gzip -7 is 1.5000x time and .9792x size of gzip -6
      1.5879 (s^7*t) gzip -8 is 2.2608x time and .9508x size of gzip -6
      1.8726 (s^7*t) gzip -9 is 2.6956x time and .9493x size of gzip -6
      .4120 (s^13*t) xz -0 is 1.9565x time and .8871x size of gzip -6
      .0891 (s^13*t) xz -1 is 2.2173x time and .7810x size of gzip -6
      .0826 (s^13*t) xz -2 is 3.9565x time and .7428x size of gzip -6
      .0748 (s^13*t) xz -3 is 5.5000x time and .7188x size of gzip -6
      .4228 (s^13*t) xz -4 is 6.9782x time and .8061x size of gzip -6
      .1969 (s^13*t) xz -5 is 10.7608x time and .7353x size of gzip -6
      .0195 (s^13*t) xz -6 is 19.5217x time and .5901x size of gzip -6
      .0176 (s^13*t) xz -7 is 19.6304x time and .5873x size of gzip -6
      .0178 (s^13*t) xz -8 is 19.8695x time and .5873x size of gzip -6
      .0180 (s^13*t) xz -9 is 20.0000x time and .5873x size of gzip -6
      8.7297 (s^13*t) gzip -1 is .5217x time and 1.2420x size of gzip -6
      4.0640 (s^13*t) gzip -2 is .4130x time and 1.1923x size of gzip -6
      3.3880 (s^13*t) gzip -3 is .6304x time and 1.1381x size of gzip -6
      2.6702 (s^13*t) gzip -4 is .7608x time and 1.1014x size of gzip -6
      1.4024 (s^13*t) gzip -5 is .9782x time and 1.0281x size of gzip -6
      1.0000 (s^13*t) gzip -6 is 1.0000x time and 1.0000x size of gzip -6
      1.1413 (s^13*t) gzip -7 is 1.5000x time and .9792x size of gzip -6
      1.1731 (s^13*t) gzip -8 is 2.2608x time and .9508x size of gzip -6
      1.3704 (s^13*t) gzip -9 is 2.6956x time and .9493x size of gzip -6
  • 19. bash code generated by the third fragment
      BC_LINE_LENGTH=0 bc | tee >(grep 's^1.t' | sort -n | head -1 | cut -d ' ' -f 2- 1>&2) >(grep 's^7.t' | sort -n | head -1 | cut -d ' ' -f 2- 1>&2) >(grep 's^13.t' | sort -n | head -1 | cut -d ' ' -f 2- 1>&2) > /dev/null
  • 20. How do I get rid of the race condition?
      $ while true ; do { { { for d in {1,2}{1,2}{0..9} ; do echo -n " { { echo -n "$d: " ; ls /dev/fd/ ; } 1>&2 ; " ; done ; echo -n 'seq 2000 2>/dev/null | tee' ; for z in 1 2 ; do for l in $(seq 0 9); do test $z = 2 -a $l = 0 || echo -n " >(/usr/bin/time -f 't[$z$l]=%U+%S' 2>&2$z$l $(test $z == 2 && echo gzip || echo xz) -$l | wc -c | sed 's/^/s[$z$l]=/' 1>&1$z$l)" ; done ; done ; echo -n ' >/dev/null' ; for d in {1,2}{1,2}{0..9} ; do echo -n " ; } $d>&1 | grep '' " ; done ; } | tee /dev/fd/2 | bash ; echo 123 ; } | sed 's/^/b:/' | grep =.*= ; } 2>/dev/null ; done
  • 21. More consistency when testing the compression parameters
      $ ( { { { { for d in {1,2}{1,2}{0..9} ; do echo -n " { " ; done ; echo -n 'mongoexport -d pseudo -c games 2>/dev/null | tee' ; for z in 1 2 ; do for l in $(seq 0 9); do test $z = 2 -a $l = 0 || echo -n " >(/usr/bin/time -f 't[$z$l]=%U+%S' 2>&2$z$l $(test $z == 2 && echo gzip || echo xz) -$l | wc -c | sed 's/^/s[$z$l]=/' 1>&1$z$l)" ; done ; done ; echo -n ' > /dev/null' ; for d in {1,2}{1,2}{0..9} ; do echo -n " ; } $d>&1 | sed '' " ; done ; } | bash ; echo 'scale=4; for(p=1;p<=13;p+=6) { for(x=1;x<=2;x++) { for(l=x-1;l<=9;l++) { i=x*10+l; s=s[i]/s[26]; t=t[i]/t[26]; print s^p*t," (s^",p,"*t) "; if(x==2) print "gzip" else print "xz"; print" -",l," is ",t,"x time and ",s,"x size of gzip -6\n" } } }' ; } | bash <( echo -n 'BC_LINE_LENGTH=0 bc | { tee ' ; for p in $(seq 1 6 13) ; do echo -n " >(grep 's^$p.t' | sort -n | tail -1 | cut -d ' ' -f 2- 1>&2)" ; done ; echo -n ' >/dev/null ; } 2>&1 | grep "" ' ) ; } & } ; A=$! ; { { sleep 9 ; kill -INT $(pstree -laAp $A | sed -n 's/^[^,]*\<mongoexport,\([0-9][0-9]*\)\>.*/\1/p') ; } & } ; B=$! ; wait $B || { echo "finishing..." ; kill -INT $( { pstree -laAp $A ; pstree -laAp $B ; } | sed -n 's/^[^,]*,\([0-9][0-9]*\)\>.*/\1/p') ; } ; while wait -n ; do : ; done) #xz-vs-gz
  • 22. Fine, but back to the database dump; with xz it looks like this:
      $ ( export H=localhost D=pseudo C=games; export F="dump-$H-$D-$C-$(date +%Y%m%d_%H%M%S)"; mongoexport -h $H -d $D -c $C 2> $F.log | pv -c -N raw | pv -c -N rows -l -s "$( echo -e "use $D;\ndb.$C.find().count();" | mongo $H 2>> $F.log | grep "switched to db $D" -A1 | tail -1 )" | xz | pv -c -N xz | tee $F.xz | md5sum > $F.md5 )
      raw: 54.7MB 0:00:38 [ 1.4MB/s] [ <=> ]
      xz: 7.85MB 0:00:38 [ 176kB/s] [ <=> ]
      rows: 32.7k 0:00:38 [ 818/s ] [=> ] 44% ETA 0:00:47
  • 23. Since the machine is idle and xz is the bottleneck, the thing to do is parallelize somehow
      top - 12:15:16 up 53 days, 13:13, 27 users, load average: 0.69, 0.24, 0.29
      Tasks: 338 total, 3 running, 334 sleeping, 0 stopped, 1 zombie
      Cpu0 : 99.7%us, 0.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
      Cpu1 : 2.3%us, 1.3%sy, 0.0%ni, 96.4%id, 0.0%wa, 0.0%hi, 0.0%si,
      Cpu2 : 18.3%us, 1.7%sy, 0.0%ni, 79.4%id, 0.7%wa, 0.0%hi, 0.0%si,
      Cpu3 : 2.0%us, 1.3%sy, 0.0%ni, 96.7%id, 0.0%wa, 0.0%hi, 0.0%si,
      Mem: 4107620k total, 1334364k used, 2773256k free, 24868k buffers
      Swap: 4192960k total, 2373236k used, 1819724k free, 259280k cached
        PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
      20507 zed      20  0 99420  93m  736 R  100  2.3  1:06.74 xz
      20504 zed      20  0 17688 4304 2880 S   17  0.1  0:11.88 mongoexport
      20506 zed      20  0  4056  780  576 S    2  0.0  0:01.45 pv
      20505 zed      20  0  4056  732  576 S    0  0.0  0:00.25 pv
  • 24. pixz or parallel?
      $ git clone https://github.com/vasi/pixz.git
      $ cd pixz/
      $ make
      pixz.h:1: fatal error: lzma.h: No such file or directory
      $ sudo apt-get install lzma-dev
      Setting up lzma-dev (4.43-14ubuntu2) ...
      $ make
      pixz.h:1: fatal error: lzma.h: No such file or directory
      $ grep lzma README
      * liblzma 4.999.9-beta-212 or later (from the xz distribution)
  • 25. parallel? No?..
      $ parallel
      The program 'parallel' is currently not installed. You can install it by typing:
      sudo apt-get install moreutils
      $ apt-get install moreutils
      ...
      $ parallel --help
      parallel: invalid option -- '-'
      parallel [OPTIONS] command -- arguments
          for each argument, run command with argument, in parallel
      parallel [OPTIONS] -- commands
          run specified commands in parallel
  • 26. Welcome to the real world...
      $ wget http://ftp.gnu.org/gnu/parallel/parallel-20110822.tar.bz2
      $ tar xvf parallel-20110822.tar.bz2
      $ cd parallel-20110822
      $ ./configure && make -j 4
      $ ./src/parallel --help
      Usage:
      parallel [options] [command [arguments]] < list_of_arguments
      parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
      cat ... | parallel --pipe [options] [command [arguments]]
      -j n Run n jobs in parallel
      -k Keep same order
  • 27. How many jobs for the compression?...
      $ uptime ; for j in $(seq $(( $(grep -c ^bogomips /proc/cpuinfo) * 4 )) ) ; do echo "$j: $( ( time ( mongoexport -d pseudo -c games 2>/dev/null | head -20000 | parallel -j $j --pipe nice xz | xz -t ) ) 2>&1 | tr "\n" "\t" )" ; done ; uptime
      14:46:08 up 53 days, ... load average: 0.07, 1.29, 2.23
      1: real 0m18.759s user 0m26.602s sys 0m2.448s
      2: real 0m11.673s user 0m24.746s sys 0m1.916s
      3: real 0m9.033s user 0m23.553s sys 0m1.460s
      4: real 0m8.561s user 0m23.093s sys 0m1.468s
      ...
      8: real 0m7.645s user 0m23.173s sys 0m1.356s
      ...
      16: real 0m7.087s user 0m22.153s sys 0m1.260s
      14:48:27 up 53 days, ... load average: 6.49, 2.88, 2.66
  • 28. Achievement Unlocked
      $ ( export H=localhost D=pseudo C=games; export F="dump-$H-$D-$C-$(date +%Y%m%d_%H%M%S)"; mongoexport -h $H -d $D -c $C 2> $F.log | pv -c -N raw | pv -c -N rows -l -s "$( echo -e "use $D;\ndb.$C.find().count();" | mongo $H 2>> $F.log | grep "switched to db $D" -A1 | tail -1 )" | parallel --pipe nice xz | pv -c -N xz | tee $F.xz | md5sum > $F.md5 ) #dump-mongo-collection
      raw: 48.9MB 0:00:10 [5.64MB/s] [ <=> ]
      xz: 6.38MB 0:00:10 [ 678kB/s] [ <=> ]
      rows: 29.6k 0:00:10 [3.27k/s] [==> ] 40% ETA 0:00:14
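
A side note on the ranking used in slides 14 to 19: each candidate gets the score s^p*t, where t and s are its time and size relative to gzip -6, and the lowest score wins. The sketch below redoes that ranking with awk in place of bc, on a handful of rows copied from slide 18. The function name best_by_score and the four-column input layout are assumptions made for this illustration, and awk works in floating point rather than truncating to four decimals like bc with scale=4, so intermediate scores can differ slightly from the slide.

```bash
#!/usr/bin/env bash
# Read "tool level time_ratio size_ratio" lines on stdin and print the
# tool and level with the lowest score = size_ratio^p * time_ratio.
# best_by_score is a hypothetical helper, not part of the original slides.
best_by_score() {
  awk -v p="$1" '
    { score = ($4 ^ p) * $3
      if (NR == 1 || score < best) { best = score; name = $1 " " $2 } }
    END { print name }'
}

# A few rows copied from slide 18 (ratios relative to gzip -6).
rows='xz -0 1.9565 .8871
xz -1 2.2173 .7810
xz -3 5.5000 .7188
xz -6 19.5217 .5901
xz -7 19.6304 .5873
gzip -2 .4130 1.1923
gzip -6 1.0000 1.0000
gzip -9 2.6956 .9493'

# Rank the same rows under the three exponents used in the slides.
for p in 1 7 13; do
  printf 's^%s*t: %s\n' "$p" "$(printf '%s\n' "$rows" | best_by_score "$p")"
done
```

A larger p weighs size more heavily than time, which is why the winner drifts from a fast gzip level toward the heavier xz levels as p grows.
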
  • 29. 2
  • 30. What if I want to send it over the net?
      $ scp dump.xz pseudogames.com:
      dump.xz 100% 97MB 32.3MB/s 00:03
  • 31. Damn, but what about integrity?
      $ ssh pseudogames.com "xz -vd > dump" < dump-*.xz
      (stdin): 96.9 MiB / 228.5 MiB = 0.424, 22 MiB/s, 0:10
  • 32. But I didn't want to decompress it over there...
      $ F="dump.xz"; ssh pseudogames.com "tee >(xz -tv) > '$F' " < "$F"
  • 33. Huh?.. where's the output?
      $ F="dump.xz"; ssh pseudogames.com "tee >(xz -tv) > '$F' " < "$F"
      $
  • 34. Ugh, stderr didn't make it through ssh, of course... but that's fine.
      $ time ( F="dump.xz"; ssh pseudogames.com "{ tee >(xz -vt) > '$F' ; } 2>&1" < "$F" )
      (stdin): 96.9 MiB / 228.5 MiB = 0.424, 22 MiB/s, 0:10
      real 0m10.580s
      user 0m2.963s
      sys 0m0.243s
  • 35. But decompressing the xz just to validate it is overkill... how about using md5sum?
      $ F=dump.xz; pv "$F" | tee >(printf 'size:%-10d\n' $(wc -c) ) >(echo "md5:$(md5sum)") | ssh pseudogames.com 'FIFO=$(mktemp -d); mkfifo $FIFO/{a,b}; trap "rm $FIFO/[ab]; rmdir $FIFO" 0; tee >(tail -c 56 | tee >( sed -n "s/size:\(.*\)/\1/p" >$FIFO/a ) | sed -n "s/md5:\(.*\)/\1/p" >$FIFO/b ) | head -c $(cat <$FIFO/a) | tee '"$F"' | md5sum -c $FIFO/b | cut -c 4-'
  • 36. Damn, deadlock! Now what?...
      $ F=dump.xz; pv "$F" | tee >(printf 'size:%-10d\n' $(wc -c) ) >(echo "md5:$(md5sum)") | ssh pseudogames.com 'FIFO=$(mktemp -d); mkfifo $FIFO/{a,b}; trap "rm $FIFO/[ab]; rmdir $FIFO" 0; tee >(tail -c 56 | tee >( sed -n "s/size:\(.*\)/\1/p" >$FIFO/a ) | sed -n "s/md5:\(.*\)/\1/p" >$FIFO/b ) | head -c $(cat <$FIFO/a) | tee '"$F"' | md5sum -c $FIFO/b | cut -c 4-'
      ^C
  • 37. Haha, there's a much simpler way!
      $ time ( F=dump.xz; pv "$F" | { tee >(md5sum 1>&2) | ssh pseudogames.com 'tee '"$F"' | md5sum' ; } 2>&1 | sort -u | wc -l | grep -qx 1 && echo OK || echo ERR )
      96.9MB 0:00:03 [28.7MB/s] [===========>] 100%
      OK
      real 0m3.212s
      user 0m2.373s
      sys 0m0.307s
      $ ### time ssh + xz:
      $ #real 0m10.580s
      $ #user 0m2.963s
      $ #sys 0m0.243s
  • 38. It's already much lighter; how about dropping ssh too?
      $ { F="dump.xz" ; H="pseudogames.com" ; P=9999 ; pv "$F" | tee >(md5sum 1>&2) >(wc -c 1>&2) >( sed -n '' | ssh "$H" "{ netcat \$(set | grep [S]SH_CLIENT | sed \"s/.*'\([^ ]*\) .*/\1/\") $P | tee '$F' >(wc -c 1>&2) | md5sum ; } 2>&1" 1>&2 ) >(netcat -l $P) >/dev/null ; } 2>&1 | tee >( sort -u | wc -l | grep -qx 2 && echo OK || echo ERR )
      19405636
      a84a3cd315f4ee235c5e15d82872ab5d  -
      19335168
      85d99c783b541885b05ef263c1f35cdc  -
      ERR
  • 39. It died before the transfer finished; the receiver will have to authorize the end
      $ { F="dump.xz" ; H="pseudogames.com" ; P=$(printf "39%03d" $RANDOM | cut -c 1-5) ; { ssh "$H" "{ netcat \$(set | grep [S]SH_CLIENT | sed \"s/.*'\([^ ]*\) .*/\1/\") $P | tee '$F' >(wc -c 1>&2) | md5sum ; } 2>&1" 1>&2 ; } | { < "$F" tee >(md5sum 1>&2) >(wc -c 1>&2) >(netcat -l $P) >/dev/null ; } ; } 2>&1 | tee >( sort -u | wc -l | grep -qx 2 && echo OK || echo ERR )
      19405636
      a84a3cd315f4ee235c5e15d82872ab5d  -
      19405636
      a84a3cd315f4ee235c5e15d82872ab5d  -
      OK
  • 40. Taking care with netcat
      $ { F="dump.xz" ; H=pseudogames.com ; P=$(printf "39%03d" $RANDOM | cut -c 1-5) ; S=$(stat -c %s "$F") ; { { pv "$F" | { tee >(md5sum 1>&2) | { netcat -l -p $P 2>/dev/null || netcat -l $P ; } ; } 2>&1 ; } & } ; A=$! ; ssh $H "{ H=\$(set | grep [S]SH_CLIENT | sed \"s/.*'\([^ ]*\) .*/\1/\") ; { netcat -w 5 -c \$H $P 2>/dev/null || netcat -w 5 \$H $P ; } | tee '$F' >(md5sum 1>&2) | pv -n -s $S 2>&1 >/dev/null | while read X ; do test \"\$X\" -ge 100 && kill -INT \$(pstree -laAp \$\$ | sed -n 's/^[^,]*\<netcat,\([0-9][0-9]*\)\>.*/\1/p') &>/dev/null ; done ; } 2>&1" ; kill -INT $(pstree -laAp $A | sed -n 's/^[^,]*\<netcat,\([0-9][0-9]*\)\>.*/\1/p') &>/dev/null ; } | uniq | wc -l | grep -qx 1 && echo OK || echo ERR #netcat-send
      537MB 0:00:04 [ 110MB/s] [=================>] 100%
      OK
  • 41. But was there any advantage?
      $ time !?#netcat-send
      537MB 0:00:04 [ 110MB/s] [============>] 100%
      real 0m5.298s
      user 0m0.063s
      sys 0m0.003s
      OK
      $ time !?scp
      dump.xz 100% 537MB 35.8MB/s 00:15
      real 0m14.658s
      user 0m11.123s
      sys 0m1.130s
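
The check from slide 37 boils down to hashing the stream on both ends in a single pass and then requiring that all digests collapse to one unique line. A minimal sketch, assuming bash and md5sum are available; `bash -c` stands in for the ssh hop so the example runs on one machine, and the names all_same and send_and_verify are made up for this illustration.

```bash
#!/usr/bin/env bash
# Print OK when stdin holds one or more identical lines, ERR otherwise.
all_same() {
  if [ "$(sort -u | wc -l)" -eq 1 ]; then echo OK; else echo ERR; fi
}

# Copy SRC to DST in a single pass while hashing at both ends.
# "bash -c" is a local stand-in (an assumption) for the ssh hop of slide 37.
send_and_verify() {
  local src=$1 dst=$2
  {
    # Sender side: tee feeds a local md5sum (reported on stderr) and the pipe.
    tee >(md5sum 1>&2) < "$src" |
      # Receiver side: store the stream and hash what was actually received.
      bash -c 'tee "$1" | md5sum' _ "$dst"
  } 2>&1 | all_same
}
```

Because the sender's digest travels on stderr and is merged with `2>&1`, the final `sort -u` only sees end of input once both md5sum processes have finished, so neither digest is lost.
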
  • 42. 3
  • 43. What if there are several files?
      $ find dumps/ -type f -printf "%10s %p\n"
       101632892 dumps/b/dump.xz.2
           65536 dumps/b/a/dump.xz.1
               0 dumps/a/3688
               0 dumps/a/8958
      (... 10000 files in total under dumps/a/)
               0 dumps/a/4308
               0 dumps/a/3338
      $ scp -r dumps/* pseudogames.com:dump-backup/
      (...)
      3338 100% 0 0.0KB/s 00:00
      dump.xz.2 100% 97MB 48.5MB/s 00:02
      dump.xz.1 100% 64KB 64.0KB/s 00:00
  • 44. But I want finer selection
      $ tar c -C dumps/ --exclude=./a . | ssh pseudogames.com "tar xv -C dump-backup/"
      ./
      ./b/
      ./b/dump.xz.2
      ./b/a/
      ./b/a/dump.xz.1
  • 45. Progress...
      $ TAR="tar c -C dumps/ --exclude=./a ." ; $TAR | pv -s $($TAR | wc -c) | ssh pseudogames.com "tar x -C dump-backup/"
      97MB 0:00:02 [32.8MB/s] [================>] 100%
  • 46. Running tar twice won't do...
      $ S="dumps" ; D="dump-backup" ; X="--exclude=a" ; H=pseudogames.com ; tar c -C "$S" $X . | pv -s $(cd "$S" && du --apparent-size --block-size=1 $X -s . | cut -f 1) | ssh $H "tar x -C '$D' "
      96.9MB 0:00:02 [33.5MB/s] [==============>] 100%
      $ S="dumps" ; D="dump-backup" ; X="" ; H=pseudogames.com ; tar c -C "$S" $X . | pv -s $(cd "$S" && du --apparent-size --block-size=1 $X -s . | cut -f 1) | ssh $H "tar x -C '$D' "
      102MB 0:00:03 [33.7MB/s] [================] 104%
  • 47. But this way it's very imprecise... :/
      $ S="dumps" ; D="dump-backup" ; X="--exclude=./a --exclude=./b/d*" ; H=pseudogames.com ; tar c -C "$S" $X . | pv -s $(cd "$S" && du --apparent-size --block-size=1 $X -s . | cut -f 1) | ssh $H "tar x -C '$D' "
      70kB 0:00:00 [ 544kB/s] [====> ] 29%
      $ S="dumps" ; D="dump-backup" ; X="--exclude=./b/d*" ; H=pseudogames.com ; tar c -C "$S" $X . | pv -s $(cd "$S" && du --apparent-size --block-size=1 $X -s . | cut -f 1) | ssh $H "tar x -C '$D' "
      4.95MB 0:00:00 [7.77MB/s] [================] 1267%
  • 48. I'll try to factor the tar overhead into the total size calculation
      $ for X in "('./a/*' './b/a/*')" "()" "('./a/*' './b/d*')" "('./b/d*')" ; do eval "X=$X" ; E=""; N=""; for x in "${X[@]}" ; do test -n "$N" && N="$N -and" ; N="$N -not -path '$x'" ; E="$E --exclude='$x'" ; done ; echo "${X[@]} => $E => $N"; S="dumps" ; D="dump-backup" ; H=pseudogames.com ; eval "tar c -C '$S' $E ." | pv -s $(cd "$S" && eval "find . $N -printf '%y %s %p\n'" | awk 'BEGIN { sum=0 } { s=length($0)-length($2)-1+($1 ~ /f/ ? $2 : 0) ; c=512; r=s%c; s=s+(r?c-r:0) ; sum+=s } END { chunk=10240; remain=sum%chunk; print sum+(remain?chunk-remain:0) }') | ssh $H "tar x -C '$D' " ; done #tar-pipe-ssh
      96.9MB 0:00:02 [34.7MB/s] [====================>] 100%
      102MB 0:00:02 [41.5MB/s] [====================>] 100%
      70kB 0:00:00 [ 835kB/s] [====================>] 100%
      4.95MB 0:00:00 [21.2MB/s] [====================>] 100%
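
Slide 48 folds the tar overhead into the size handed to pv. Here is a simplified sketch of the same idea under stated assumptions: GNU find and GNU tar, only regular files and directories, names shorter than 100 bytes, and the default 10240-byte record size. Each entry costs one 512-byte header plus its data padded to 512 bytes, the archive ends with two zero blocks, and the total is padded to a full record. The function name tar_stream_size is hypothetical, and the arithmetic is not the slide's awk expression, which approximates the header from the path length.

```bash
#!/usr/bin/env bash
# Estimate the byte size of the stream produced by "tar c -C DIR .".
# Assumptions: GNU find, plain files and directories only, short names,
# GNU tar's default blocking factor (records of 10240 bytes).
tar_stream_size() {
  local dir=$1
  ( cd "$dir" && find . -printf '%y %s\n' ) | awk '
    { sum += 512                                         # one header block per entry
      if ($1 == "f") sum += int(($2 + 511) / 512) * 512  # file data padded to 512
    }
    END { sum += 1024                                    # two zero blocks end the archive
          rec = 10240
          print int((sum + rec - 1) / rec) * rec }'      # pad to a whole record
}
```

It would slot into the pipeline of slide 45 as `pv -s "$(tar_stream_size dumps)"`; handling `--exclude` patterns would still need matching find predicates, as slide 48 does with `-not -path`.
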
  • 49. 0
  • 50. Don't try this at home
      In this fantastic world of one-liners you can use the history as a command repository; just comment the commands with memorable tags and make the history really, really big:
      $ export HISTSIZE=999999999 HISTFILESIZE=999999999
      $ echo !! >> $(ls -S ~/.{,bash}{rc,{,_}profile} 2>/dev/null | head -1)
      $ ls -ltr #lista-recentes
      ...
      $ !?#lista-re
      ls -ltr #lista-recentes
      ...
  • 51. Reference
      Ubuntu 10.10
      GNU bash, version 4.1.5(1)-release (i686-pc-linux-gnu)
      Copyright (C) 2009 Free Software Foundation, Inc.
      Arch Linux 2011-09
      GNU bash, version 4.2.10(2)-release (i686-pc-linux-gnu)
      Copyright (C) 2011 Free Software Foundation, Inc.
      Windows 10.0.14393 (WSL)
      GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)
      Copyright (C) 2013 Free Software Foundation, Inc.
  • 53. Thank you
      ( export M=localhost D=pseudo H=pseudogames.com ; export F="dump-$M-$D-$(date +%Y%m%d_%H%M%S)" P=$(printf "39%03d" $RANDOM | cut -c 1-5) ; { { { { echo -e "use $D;\nprint(db.getCollectionNames().reduce(function(a,b){ return a+db[b].find().count()+1; }, 0)) ; db.getCollectionNames().forEach(function(a){print(a)});" | mongo $M 2> /dev/null | grep -v " \|^\(system.indexes\|bye\)$" | { read N ; echo $N ; while read C ; do echo "--- $C ---" ; mongoexport -h $M -d $D -c $C 2> /dev/null ; done ; } ; } | { read N ; pv -c -l -s $N ; } | parallel --pipe nice xz | { tee >(md5sum 1>&2) | { netcat -l -p $P 2>/dev/null || netcat -l $P ; } ; } 2>&1 ; } & } ; A=$! ; ssh $H "{ H=\$(set | grep [S]SH_CLIENT | sed \"s/.*'\([^ ]*\) .*/\1/\") ; { netcat -w 5 -c \$H $P 2>/dev/null || netcat -w 5 \$H $P ; } | tee '$F.xz' >(md5sum 1>&2) | xz -d | pv -l -n -s $N 2>&1 >/dev/null | while read X ; do test \"\$X\" -ge 100 && kill -INT \$(pstree -laAp \$\$ | sed -n 's/^[^,]*\<netcat,\([0-9][0-9]*\)\>.*/\1/p') &>/dev/null ; done ; } 2>&1" ; kill -INT $(pstree -laAp $A | sed -n 's/^[^,]*\<netcat,\([0-9][0-9]*\)\>.*/\1/p') &>/dev/null ; } | uniq | wc -l | grep -qx 1 && echo "OK" || echo "ERR" ) #remote-database-backup
      http://tldp.org/LDP/abs/html/
      http://bashoneliners.com/
      http://bash.org/