Monitoring and server load status
References
- http://en.wikipedia.org/wiki/Load_(computing)
- http://www.cyberciti.biz/tips/top-linux-monitoring-tools.html
- http://www.lunarforums.com/vps_hosting_at_lunarpages/useful_linux_scripts_lsof_ps_fuser_netstat-t41474.0.html
Walkthrough
The /proc/stat file: monitoring CPU usage
- http://colby.id.au/node/39
- /proc/stat explained: http://www.linuxhowtos.org/System/procstat.htm
The /proc/stat file provides various information about the state of the kernel. The values it shows are cumulative since the machine was booted.
The following script uses the information in /proc/stat to monitor CPU usage.
$ cat /proc/stat | grep '^cpu '
cpu 3341144 9453 5733615 44755881 143881 4769 58286 0 0
$ cat /proc/stat | grep '^cpu '
cpu 3341252 9453 5733627 44756742 143882 4769 58286 0 0
$ cat /proc/stat | grep '^cpu '
cpu 3341298 9453 5733648 44757640 143887 4769 58286 0 0
script cpu_usage.sh:
#!/bin/bash
# by Paul Colby (http://colby.id.au), no rights reserved ;)

PREV_TOTAL=0
PREV_IDLE=0

while true; do
  CPU=(`cat /proc/stat | grep '^cpu '`) # Get the total CPU statistics.
  unset CPU[0]                          # Discard the "cpu" prefix.
  IDLE=${CPU[4]}                        # Get the idle CPU time.

  # Calculate the total CPU time.
  TOTAL=0
  for VALUE in "${CPU[@]}"; do
    let "TOTAL=$TOTAL+$VALUE"
  done

  # Calculate the CPU usage since we last checked.
  let "DIFF_IDLE=$IDLE-$PREV_IDLE"
  let "DIFF_TOTAL=$TOTAL-$PREV_TOTAL"
  let "DIFF_USAGE=(1000*($DIFF_TOTAL-$DIFF_IDLE)/$DIFF_TOTAL+5)/10"
  echo -en "\rCPU: $DIFF_USAGE% \b\b"

  # Remember the total and idle CPU times for the next check.
  PREV_TOTAL="$TOTAL"
  PREV_IDLE="$IDLE"

  # Wait before checking again.
  sleep 1
done
$ ./cpu_usage.sh
CPU: 15%
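The calculation in cpu_usage.sh can be checked by hand against two of the /proc/stat samples shown earlier. The following is a minimal sketch, not part of the original script: cpu_usage_between is a hypothetical helper name, and it uses awk instead of bash arithmetic so that it runs in any POSIX shell. It applies the same rounding formula to two "cpu" lines passed as arguments.

```shell
# Hypothetical helper: CPU usage (%) between two "cpu ..." lines of /proc/stat.
# $1 is the earlier sample, $2 the later one.
cpu_usage_between() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum every counter after the "cpu" label; field 5 is the idle time.
            total = 0
            for (i = 2; i <= NF; i++) total += $i
            idle = $5
        }
        NR == 1 { t0 = total; i0 = idle }
        NR == 2 {
            dt = total - t0
            di = idle - i0
            if (dt <= 0) { print 0; exit }   # no ticks elapsed between samples
            print int((1000 * (dt - di) / dt + 5) / 10)
        }'
}

# First two samples from the transcript above: 982 ticks elapsed, 861 of them idle.
cpu_usage_between \
    "cpu 3341144 9453 5733615 44755881 143881 4769 58286 0 0" \
    "cpu 3341252 9453 5733627 44756742 143882 4769 58286 0 0"
# prints 12
```

Taking both samples as arguments keeps the arithmetic separate from the sampling loop, so the formula can be verified without waiting on a live system.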
uptime: system reliability and load average
$ man uptime
UPTIME(1)                Linux User’s Manual                UPTIME(1)
NAME
       uptime - Tell how long the system has been running.
SYNOPSIS
       uptime
       uptime [-V]
DESCRIPTION
       uptime gives a one line display of the following information. The current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.
$ uptime
 15:56:45 up 15 min, 3 users, load average: 0.47, 0.89, 0.71
Script examples
- script PHP: http://www.4webhelp.net/scripts/php/uptime.php
- http://linux-101.org/script/bash-script-check-uptime
The script above sends an email if the server has had a problem and has rebooted.
This script uses gawk:
If you are like many computer users, you would frequently like to make changes in various text files wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest. To write a program to do this in a language such as C or Pascal is a time-consuming inconvenience that may take many lines of code. The job is easy with awk, especially the GNU implementation: gawk.
$ gawk -F . '{ print $1 }' /proc/uptime
2079388
joan@joan-servidor:~$ gawk -F . '{ print $1 }' /proc/uptime
2079395
When the computer reboots, this value drops below the previously recorded one, and an email is sent to the administrator.
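The comparison described here can be sketched as follows. This is not the linked script, only an illustration of the idea: check_reboot is a hypothetical helper, and the state file path and mail address in the usage comment are placeholders.

```shell
# Hypothetical helper: compare the current uptime (whole seconds) with the
# value saved by the previous run, then save the current value.
# Prints "rebooted" when the uptime went backwards, "ok" otherwise.
check_reboot() {
    state_file=$1
    current=$2
    previous=0
    if [ -r "$state_file" ]; then
        previous=$(cat "$state_file")
    fi
    previous=${previous:-0}          # treat an empty state file as "no history"
    echo "$current" > "$state_file"
    if [ "$current" -lt "$previous" ]; then
        echo "rebooted"
    else
        echo "ok"
    fi
}

# Possible use from cron (placeholder path and address):
#   now=$(awk -F . '{ print $1 }' /proc/uptime)
#   if [ "$(check_reboot /var/tmp/uptime.last "$now")" = "rebooted" ]; then
#       echo "Server rebooted" | mail -s "Reboot detected" admin@example.com
#   fi
```

The first run has no saved value, so it reports "ok" and only records the baseline.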
top i htop: overall system view
- http://www.devdaily.com/linux/unix-linux-top-command-cpu-memory
- http://unstableme.blogspot.com.es/2008/12/redirect-top-command-output-to-file.html
$ man top
TOP(1)                   Linux User’s Manual                   TOP(1)
NAME
       top - display Linux tasks
SYNOPSIS
       top -hv | -bcHisS -d delay -n iterations -p pid [, pid ...]
       The traditional switches ’-’ and whitespace are optional.
DESCRIPTION
       The top program provides a dynamic real-time view of a running system. It can display system summary information as well as a list of tasks currently being managed by the Linux kernel. The types of system summary information shown and the types, order and size of information displayed for tasks are all user configurable and that configuration can be made persistent across restarts.
$ top
top - 16:16:52 up 35 min, 3 users, load average: 0.58, 0.37, 0.38
Tasks: 177 total, 2 running, 175 sleeping, 0 stopped, 0 zombie
Cpu(s): 16.9%us, 2.2%sy, 0.0%ni, 80.3%id, 0.6%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2571184k total, 2085352k used, 485832k free, 678816k buffers
Swap: 2441840k total, 0k used, 2441840k free, 888564k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8138 root      20   0  259m 106m  24m S   28  4.2   2:32.26 firefox
 6359 root      20   0  382m  50m  10m R    5  2.0   0:55.47 Xorg
 4974 postgres  20   0 18672 4536  484 S    1  0.2   0:28.95 postgres
 7380 joan      20   0 57376  21m  16m S    1  0.9   0:02.84 gnome-panel
 7379 joan      20   0 27520  18m 6732 S    1  0.7   0:08.88 compiz.real
 7423 joan      20   0 21876  11m 8484 S    1  0.4   0:01.30 gtk-window-deco
10443 joan      20   0 56728  19m  12m S    1  0.8   0:02.87 gedit
18608 joan      20   0  2580 1216  912 R    1  0.0   0:00.10 top
 4973 postgres  20   0 56016 5900  708 S    0  0.2   0:08.93 postgres
...
top has many options. Among them, in the version shown here you can sort by RAM usage (uppercase O, then n for the %MEM field; uppercase M is a shortcut) or by CPU usage (uppercase O, then k for the %CPU field; uppercase P is a shortcut).
For system administrators, though, it is often more useful to run top in batch (non-interactive) mode:
$ top -b -n 1 > top.out.$(date +%s)
joan@ubuntu-bbdd:~$ cat top.out.1334069520
top - 16:52:00 up 1:10, 4 users, load average: 0.94, 0.62, 0.39
Tasks: 180 total, 1 running, 179 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.5%us, 2.8%sy, 0.1%ni, 82.5%id, 7.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2571184k total, 2464536k used, 106648k free, 670396k buffers
Swap: 2441840k total, 0k used, 2441840k free, 1214524k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8138 root      20   0  298m 136m  29m S    2  5.5   6:39.04 firefox
32219 joan      20   0  2572 1116  824 R    2  0.0   0:00.02 top
    1 root      20   0  3084 1888  564 S    0  0.1   0:01.24 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.05 migration/0
    4 root      15  -5     0    0    0 S    0  0.0   0:00.56 ksoftirqd/0
    5 root      RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/0
    6 root      RT  -5     0    0    0 S    0  0.0   0:00.05 migration/1
...
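Batch output is plain text, so it can be post-processed with the usual tools. As a small sketch (load_averages is a hypothetical helper name, not something top provides), the three load averages can be pulled out of the first header line of a snapshot; the same filter also works on the output of uptime.

```shell
# Hypothetical helper: read a top batch snapshot (or uptime output) on stdin
# and print the 1, 5 and 15 minute load averages separated by spaces.
load_averages() {
    awk -F 'load average: ' 'NF > 1 { gsub(/, /, " ", $2); print $2; exit }'
}

# Example with the header line from the snapshot above:
echo 'top - 16:52:00 up 1:10, 4 users, load average: 0.94, 0.62, 0.39' | load_averages
# prints 0.94 0.62 0.39

# On a live system:
#   top -b -n 1 | load_averages
```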
iftop: network traffic monitor
It works much like top, but shows network traffic.
$ sudo apt-get install iftop
$ man iftop
IFTOP(8)                                                      IFTOP(8)
NAME
       iftop - display bandwidth usage on an interface by host
SYNOPSIS
       iftop -h | [-nNpbBP] [-i interface] [-f filter code] [-F net/mask]
DESCRIPTION
       iftop listens to network traffic on a named interface, or on the first interface it can find which looks like an external interface if none is specified, and displays a table of current bandwidth usage by pairs of hosts. iftop must be run with sufficient permissions to monitor all network traffic on the interface; see pcap(3) for more information, but on most systems this means that it must be run as root.
$ iftop -h
iftop: display bandwidth usage on an interface by host

Synopsis: iftop -h | [-npbBP] [-i interface] [-f filter code] [-N net/mask]

   -h                  display this message
   -n                  don't do hostname lookups
   -N                  don't convert port numbers to services
   -p                  run in promiscuous mode (show traffic between other hosts on the same network segment)
   -b                  don't display a bar graph of traffic
   -B                  Display bandwidth in bytes
   -i interface        listen on named interface
   -f filter code      use filter code to select packets to count (default: none, but only IP packets are counted)
   -F net/mask         show traffic flows in/out of network
   -P                  show ports as well as hosts
   -m limit            sets the upper limit for the bandwidth scale
   -c config file      specifies an alternative configuration file
$ sudo iftop -i eth0
                12.5Kb          25.0Kb          37.5Kb          50.0Kb     62.5K
└───────────────┴───────────────┴───────────────┴───────────────┴──────────────
joan-servidor.local   => 85.192.112.45                 4.59Kb  4.03Kb  4.03Kb
                      <=                                416b    648b    648b
joan-servidor.local   => 66.249.71.228                 3.28Kb  1.76Kb  1.76Kb
                      <=                               1.62Kb   948b    948b
joan-servidor.local   => 94.102.48.116                    0b   1.05Kb  1.05Kb
                      <=                                  0b   1.24Kb  1.24Kb
joan-servidor.local   => 65.1.216.87.static.jazzte        0b    568b    568b
                      <=                                  0b    890b    890b
joan-servidor.local   => 224.0.0.251                    568b    852b    852b
                      <=                                  0b      0b      0b
───────────────────────────────────────────────────────────────────────────────
TX:             cumm:  4.11KB   peak:  8.43Kb   rates:  8.43Kb  8.23Kb  8.23Kb
RX:                    1.83KB          5.31Kb           2.02Kb  3.67Kb  3.67Kb
TOTAL:                 5.95KB          13.3Kb           10.5Kb  11.9Kb  11.9Kb
iotop
$ sudo apt-get install iotop
$ man iotop
IOTOP(1)                                                      IOTOP(1)
NAME
       iotop - simple top-like I/O monitor
SYNOPSIS
       iotop [OPTIONS]
DESCRIPTION
       iotop watches I/O usage information output by the Linux kernel (requires 2.6.20 or later) and displays a table of current I/O usage by processes or threads on the system. At least the CONFIG_TASK_DELAY_ACCT and CONFIG_TASK_IO_ACCOUNTING options need to be enabled in your Linux kernel build configuration, these options depend on CONFIG_TASKSTATS.
To see how file writes to the hard disk are reported, we can start downloading the Ubuntu 12.04 beta:
$ wget http://ftp.heanet.ie/pub/ubuntu-cdimage/releases/12.04/beta-2/ubuntu-12.04-beta2-dvd-i386.iso
$ iotop
Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
  TID  PRIO  USER     DISK READ   DISK WRITE  SWAPIN     IO>    COMMAND
17174 be/4 joan        0.00 B/s  629.88 K/s  ?unavailable?  wget http://ftp.heanet.ie/pub/ub~2/ubuntu-12.04-beta2-dvd-i386.iso
    1 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  init
    2 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [kthreadd]
    3 rt/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [migration/0]
    4 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [ksoftirqd/0]
    5 rt/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [watchdog/0]
    6 rt/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [migration/1]
    7 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [ksoftirqd/1]
    8 rt/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [watchdog/1]
    9 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [events/0]
   10 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [events/1]
   11 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [cpuset]
   12 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [khelper]
   13 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [netns]
netstat: network statistics
$ man netstat
NETSTAT(8)             Linux Programmer's Manual             NETSTAT(8)
NAME
       netstat - Print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships
...
$ netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 joan-servidor.local:www 123.126.68.31:50510 SYN_RECV
tcp 0 0 joan-servidor.local:www 109.75.79.188.dyn:54067 TIME_WAIT
tcp 0 48 joan-servidor.local:ssh ies-jaumebalmes.xt:2066 ESTABLISHED
tcp 0 0 joan-servidor.loc:45749 192.168.1.1:netbios-ssn ESTABLISHED
tcp 0 0 joan-servidor.local:www 109.75.79.188.dyn:45553 TIME_WAIT
tcp 0 0 joan-servidor.loc:45757 192.168.1.1:netbios-ssn ESTABLISHED
tcp 0 0 joan-servidor.local:www 109.75.79.188.dyn:59906 TIME_WAIT
tcp 0 0 joan-servidor.local:www 109.75.79.188.dyn:43134 ESTABLISHED
tcp 0 0 joan-servidor.loc:45742 192.168.1.1:netbios-ssn ESTABLISHED
tcp 0 0 joan-servidor.loc:45745 192.168.1.1:netbios-ssn ESTABLISHED
tcp 0 0 joan-servidor.loc:45740 192.168.1.1:netbios-ssn ESTABLISHED
tcp 0 2879 joan-servidor.local:www 123.126.68.31:34351 FIN_WAIT1
tcp 0 0 joan-servidor.local:www 109.75.79.188.dyn:49590 ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags Type State I-Node Path
unix 2 [ ] DGRAM 2631 @/org/kernel/udev/udevd
unix 2 [ ] DGRAM 6727 @/org/freedesktop/hal/udev_event
unix 15 [ ] DGRAM 3988 /dev/log
unix 3 [ ] STREAM CONNECTED 750664
unix 3 [ ] STREAM CONNECTED 750663
unix 3 [ ] STREAM CONNECTED 750520 /var/run/dbus/system_bus_socket
unix 3 [ ] STREAM CONNECTED 750519
unix 2 [ ] DGRAM 750518
unix 3 [ ] STREAM CONNECTED 12104 @/tmp/.ICE-unix/1805
unix 3 [ ] STREAM CONNECTED 12103
unix 3 [ ] STREAM CONNECTED 12075 @/tmp/.X11-unix/X0
...
To list the open (listening) ports:
$ netstat --listen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 *:www *:* LISTEN
tcp 0 0 *:ssh *:* LISTEN
tcp 0 0 localhost:ipp *:* LISTEN
tcp 0 0 localhost:smtp *:* LISTEN
tcp 0 0 localhost:mysql *:* LISTEN
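On a busy server the raw netstat listing is long, and a per-state summary is often easier to read. The following sketch assumes the column layout shown in the listings above, where the state is the last field of each tcp row; count_tcp_states is a hypothetical helper name.

```shell
# Hypothetical helper: read netstat output on stdin and print how many TCP
# connections are in each state, most frequent state first.
count_tcp_states() {
    awk '$1 ~ /^tcp/ { count[$NF]++ }
         END { for (state in count) print count[state], state }' | sort -rn
}

# Possible use on a live system (-t: TCP only, -n: numeric addresses):
#   netstat -tn | count_tcp_states
```

A large number of SYN_RECV or TIME_WAIT entries compared with ESTABLISHED ones is usually worth a closer look.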
sysstat package: a collection of utilities
The sysstat package contains utilities for monitoring system performance and activity. Its tools can be run periodically from cron to collect activity and performance data for later analysis.
- iostat(1) reports CPU statistics and input/output statistics for devices, partitions and network filesystems.
- mpstat(1) reports individual or combined processor related statistics.
- pidstat(1) reports statistics for Linux tasks (processes) : I/O, CPU, memory, etc.
- sar(1) collects, reports and saves system activity information (CPU, memory, disks, interrupts, network interfaces, TTY, kernel tables,etc.)
- sadc(8) is the system activity data collector, used as a backend for sar.
- sa1(8) collects and stores binary data in the system activity daily data file. It is a front end to sadc designed to be run from cron.
- sa2(8) writes a summarized daily activity report. It is a front end to sar designed to be run from cron.
- sadf(1) displays data collected by sar in multiple formats (CSV, XML, etc.) This is useful to load performance data into a database, or import them in a spreadsheet to make graphs.
- nfsiostat(1) reports input/output statistics for network filesystems (NFS).
- cifsiostat(1) reports CIFS statistics.
$ sudo apt-get install sysstat
- mpstat - Report processors related statistics.
$ mpstat
Linux 2.6.32-38-generic (joan-servidor) 10/04/12 _i686_ (2 CPU)

17:33:05  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
17:33:05  all   0,78   0,01   0,12    0,19   0,05   0,00   0,00   0,00  98,85
It can also show the breakdown for each processor in the machine:
$ mpstat -P ALL
Linux 2.6.32-38-generic (joan-servidor) 10/04/12 _i686_ (2 CPU)

17:35:11  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
17:35:11  all   0,78   0,01   0,12    0,19   0,05   0,00   0,00   0,00  98,85
17:35:11    0   0,63   0,01   0,12    0,38   0,02   0,00   0,00   0,00  98,85
17:35:11    1   0,93   0,01   0,11    0,01   0,08   0,00   0,00   0,00  98,86
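Note that this machine's locale prints decimal commas, which matters when the output is fed to a script. As a sketch only (mpstat_busy is a hypothetical helper, and it assumes the 24-hour timestamp layout shown above, where the CPU name is the second field), the busy percentage per processor can be derived as 100 minus %idle:

```shell
# Hypothetical helper: read mpstat output on stdin and print, for each row,
# the CPU name and its busy percentage (100 - %idle).
mpstat_busy() {
    LC_ALL=C awk '
        # Header row: remember which column holds %idle.
        /%idle/ { for (i = 1; i <= NF; i++) if ($i == "%idle") col = i; next }
        col && NF >= col {
            idle = $col
            sub(/,/, ".", idle)      # accept decimal commas such as 98,85
            printf "%s %.2f\n", $2, 100 - idle
        }'
}

# Possible use on a live system:
#   mpstat -P ALL | mpstat_busy
```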
tload, xload
- tload load average graph for terminal
- xload load average graph for X
$ tload
$ xload
ps
Naturally, ps can also be used to extract useful information about the system.
PS(1)
NAME
       ps - report a snapshot of the current processes.
To see the 10 processes that use the most CPU, sort the ps output by column 3 and keep the first 10 rows (the header is printed as well):
$ ps aux | head -1; ps aux | sort -rn -k+3 | head -10
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 8138 11.3 6.5 322324 169008 pts/0 Rl+ 15:50 14:13 /usr/lib/firefox-3.0.15/firefox
root 6359 2.9 2.4 147860 62956 tty7 Rs+ 15:48 3:44 /usr/X11R6/bin/X :0 -br -audit 0 -auth /var/lib/gdm/:0.Xauth -nolisten tcp vt7
postgres 4974 1.6 0.1 18672 4536 ? Ss 15:46 2:06 postgres: stats collector process
www-data 21937 0.0 0.2 52236 6572 ? S 16:25 0:00 /usr/sbin/apache2 -k start
www-data 14654 0.0 0.2 52236 6572 ? S 16:07 0:00 /usr/sbin/apache2 -k start
www-data 14653 0.0 0.2 52572 6860 ? S 16:07 0:00 /usr/sbin/apache2 -k start
www-data 14652 0.0 0.2 52236 6572 ? S 16:07 0:00 /usr/sbin/apache2 -k start
www-data 14651 0.0 0.7 63400 19344 ? S 16:07 0:00 /usr/sbin/apache2 -k start
www-data 14650 0.0 0.2 52236 6572 ? S 16:07 0:00 /usr/sbin/apache2 -k start
If we sort by RAM usage instead, we see that the processes that consume the most are Firefox, Oracle and Tomcat:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 8138 11.4 6.5 322324 169016 pts/0 Sl+ 15:50 14:41 /usr/lib/firefox-3.0.15/firefox
root 6359 2.9 2.4 146028 62952 tty7 Ss+ 15:48 3:51 /usr/X11R6/bin/X :0 -br -audit 0 -auth /var/lib/gdm/:0.Xauth -nolisten tcp vt7
oracle 5275 0.0 2.1 330412 54616 ? Ss 15:47 0:01 ora_mmon_BBDD
oracle 5270 0.0 2.4 328392 62852 ? Ss 15:47 0:01 ora_smon_BBDD
root 6762 0.0 1.2 218228 33092 ? Sl 15:48 0:07 /usr/lib/jvm/java-6-sun/bin/java -Djava.util.logging.config.file=/usr/share/tomcat6/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/usr/share/tomcat6/endorsed -classpath /usr/share/tomcat6/bin/bootstrap.jar -Dcatalina.base=/usr/share/tomcat6 -Dcatalina.home=/usr/share/tomcat6 -Djava.io.tmpdir=/usr/share/tomcat6/temp org.apache.catalina.startup.Bootstrap start
oracle 6049 0.0 1.3 328400 34516 ? Ss 15:48 0:00 ora_cjq0_BBDD
oracle 5264 0.0 1.3 327460 33688 ? Ss 15:47 0:01 ora_dbw0_BBDD
...
We can see that the process using the most CPU is the web browser.
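Both rankings follow the same pattern: keep the header, sort the remaining rows numerically on one column, and cut the list. A small wrapper makes that reusable; this is only a sketch, and top_by_column is a hypothetical name. In ps aux output, column 3 is %CPU and column 4 is %MEM.

```shell
# Hypothetical helper: read a table with a header row on stdin and print the
# header followed by the $2 rows with the highest value in column $1.
top_by_column() {
    column=$1
    count=$2
    input=$(cat)
    printf '%s\n' "$input" | head -n 1
    printf '%s\n' "$input" | tail -n +2 | sort -rn -k "$column,$column" | head -n "$count"
}

# Possible use on a live system:
#   ps aux | top_by_column 3 10    # top 10 by CPU
#   ps aux | top_by_column 4 10    # top 10 by memory
```

Reading ps once and reusing the captured text avoids the header and the rows coming from two different snapshots.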
Submission
Remember the rules for submitting assignments to Moodle: ASIX-M11-SAD#Normativa_d.27entrega_de_les_pr.C3.A0ctiques_al_Moodle
Created by Joan Quintana Compte, March 2012