Monitoring servers and their load status


References

Development

The /proc/stat file: monitoring CPU usage

The /proc/stat file provides various information about the state of the kernel. The values shown are cumulative since the machine was booted.

The script further down uses the information in /proc/stat to monitor CPU usage. First, note how the aggregate "cpu" line changes between successive reads:

$ cat /proc/stat | grep '^cpu '
cpu  3341144 9453 5733615 44755881 143881 4769 58286 0 0
$ cat /proc/stat | grep '^cpu '
cpu  3341252 9453 5733627 44756742 143882 4769 58286 0 0
$ cat /proc/stat | grep '^cpu '
cpu  3341298 9453 5733648 44757640 143887 4769 58286 0 0

The cpu_usage.sh script:

#!/bin/bash
# by Paul Colby (http://colby.id.au), no rights reserved ;)
 
PREV_TOTAL=0
PREV_IDLE=0
 
while true; do
  CPU=(`cat /proc/stat | grep '^cpu '`) # Get the total CPU statistics.
  unset CPU[0]                          # Discard the "cpu" prefix.
  IDLE=${CPU[4]}                        # Get the idle CPU time.
 
  # Calculate the total CPU time.
  TOTAL=0
  for VALUE in "${CPU[@]}"; do
    let "TOTAL=$TOTAL+$VALUE"
  done
 
  # Calculate the CPU usage since we last checked.
  let "DIFF_IDLE=$IDLE-$PREV_IDLE"
  let "DIFF_TOTAL=$TOTAL-$PREV_TOTAL"
  let "DIFF_USAGE=(1000*($DIFF_TOTAL-$DIFF_IDLE)/$DIFF_TOTAL+5)/10"
  echo -en "\rCPU: $DIFF_USAGE%  \b\b"
 
  # Remember the total and idle CPU times for the next check.
  PREV_TOTAL="$TOTAL"
  PREV_IDLE="$IDLE"
 
  # Wait before checking again.
  sleep 1
done

$ ./cpu_usage.sh
CPU: 15%
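
The percentage printed by the script can be reproduced by hand from two of the /proc/stat samples shown earlier. The following is a minimal sketch, not part of the original script: the function names cpu_totals and cpu_usage_between are hypothetical, and it assumes that the first four values after the "cpu" label are user, nice, system and idle, which is how the script above reads them.

```shell
# cpu_totals "<cpu line>"
# Prints "TOTAL IDLE" for one aggregate "cpu ..." line from /proc/stat.
# (Hypothetical helper, written for illustration only.)
cpu_totals() {
  set -- $1            # split the line into fields; $1 is now the "cpu" label
  shift                # discard the label, leaving only the counters
  idle=$4              # user, nice, system, idle: idle is the fourth counter
  total=0
  for value in "$@"; do
    total=$((total + value))
  done
  echo "$total $idle"
}

# cpu_usage_between "<first cpu line>" "<second cpu line>"
# Prints the rounded CPU usage percentage between the two samples,
# using the same formula as cpu_usage.sh.
cpu_usage_between() {
  set -- $(cpu_totals "$1") $(cpu_totals "$2")
  diff_total=$(( $3 - $1 ))
  diff_idle=$(( $4 - $2 ))
  # Guard against two identical samples (division by zero).
  [ "$diff_total" -gt 0 ] || return 1
  echo $(( (1000 * (diff_total - diff_idle) / diff_total + 5) / 10 ))
}
```

With the first two samples shown earlier, the total grows by 982 ticks and the idle counter by 861, so cpu_usage_between prints 12 (that is, 12% of the interval was spent doing work).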

uptime: system reliability and load average

$ man uptime

UPTIME(1)                     Linux User’s Manual                    UPTIME(1)

NAME
       uptime - Tell how long the system has been running.

SYNOPSIS
       uptime
       uptime [-V]

DESCRIPTION
       uptime gives a one line display of the following information.  The cur‐
       rent time, how long the system has been running,  how  many  users  are
       currently  logged  on,  and the system load averages for the past 1, 5,
       and 15 minutes.
$ uptime
 15:56:45 up 15 min,  3 users,  load average: 0.47, 0.89, 0.71
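
The load averages reported by uptime are also exposed by the kernel in /proc/loadavg, which is easier to read from a script than the uptime output. A minimal sketch, assuming a hypothetical function name load_exceeds and a threshold chosen by the administrator:

```shell
# load_exceeds THRESHOLD [LOADAVG_FILE]
# Succeeds (exit status 0) when the 1-minute load average is above THRESHOLD.
# The function name and the optional file argument are hypothetical; the
# second argument only exists so the function can be tried on a saved sample.
load_exceeds() {
  threshold=$1
  loadavg_file=${2:-/proc/loadavg}
  # /proc/loadavg starts with the 1, 5 and 15 minute load averages.
  read -r one five fifteen rest < "$loadavg_file"
  # The shell has no floating point arithmetic, so awk does the comparison.
  awk -v load="$one" -v limit="$threshold" \
    'BEGIN { exit !(load + 0 > limit + 0) }'
}
```

It could be used, for example, as `load_exceeds 2 && echo "High load on $(hostname)"`. What counts as a high load depends on the number of CPUs, so the value 2 here is only an illustration.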

Script examples

The script for this example sends an email when a problem on the server has caused it to reboot.

This script uses gawk:

If you are like many computer users, you would frequently like to make changes in various text files wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest. To write a program to do this in a language such as C or Pascal is a time-consuming inconvenience that may take many lines of code. The job is easy with awk, especially the GNU implementation: gawk. 
$ gawk -F . '{ print $1 }' /proc/uptime
2079388
joan@joan-servidor:~$ gawk -F . '{ print $1 }' /proc/uptime
2079395

When the computer reboots, the value shown becomes lower than the previous one, and an email is sent to the administrator.
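
The reboot check described above can be sketched as follows. This is not the original script, only an illustration of the idea: the function name check_reboot and the state file are hypothetical, and plain awk is used in place of gawk on the assumption that `-F .` and `print $1` behave the same in both.

```shell
# check_reboot STATE_FILE [UPTIME_FILE]
# Prints "rebooted" when the uptime is lower than the one saved by the
# previous run, "ok" otherwise, and saves the current uptime for next time.
# (Hypothetical helper; the second argument exists only for testing.)
check_reboot() {
  state_file=$1
  uptime_file=${2:-/proc/uptime}
  # Whole seconds since boot: the part of /proc/uptime before the first dot.
  current=$(awk -F . '{ print $1 }' "$uptime_file")
  # On the first run there is no saved value yet, so fall back to 0.
  if [ -s "$state_file" ]; then
    previous=$(cat "$state_file")
  else
    previous=0
  fi
  echo "$current" > "$state_file"
  if [ "$current" -lt "$previous" ]; then
    echo "rebooted"
  else
    echo "ok"
  fi
}
```

Run periodically from cron, the result could be used to notify the administrator, for instance `[ "$(check_reboot /var/tmp/uptime.last)" = rebooted ] && echo "Server rebooted" | mail -s "Reboot detected" admin@example.com`. The state file path, the availability of the mail command and the address are all assumptions. The state file must be kept somewhere that survives a reboot.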

top and htop: overall system view

$ man top

TOP(1)                        Linux User’s Manual                       TOP(1)

NAME
       top - display Linux tasks

SYNOPSIS
       top -hv | -bcHisS -d delay -n iterations -p pid [, pid ...]

       The traditional switches ’-’ and whitespace are optional.

DESCRIPTION
       The  top program provides a dynamic real-time view of a running system.
       It can display system summary information as well as a  list  of  tasks
       currently  being managed by the Linux kernel.  The types of system sum‐
       mary information shown and the types, order  and  size  of  information
       displayed  for  tasks  are all user configurable and that configuration
       can be made persistent across restarts.

$ top

top - 16:16:52 up 35 min,  3 users,  load average: 0.58, 0.37, 0.38
Tasks: 177 total,   2 running, 175 sleeping,   0 stopped,   0 zombie
Cpu(s): 16.9%us,  2.2%sy,  0.0%ni, 80.3%id,  0.6%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2571184k total,  2085352k used,   485832k free,   678816k buffers
Swap:  2441840k total,        0k used,  2441840k free,   888564k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 8138 root      20   0  259m 106m  24m S   28  4.2   2:32.26 firefox            
 6359 root      20   0  382m  50m  10m R    5  2.0   0:55.47 Xorg               
 4974 postgres  20   0 18672 4536  484 S    1  0.2   0:28.95 postgres           
 7380 joan      20   0 57376  21m  16m S    1  0.9   0:02.84 gnome-panel        
 7379 joan      20   0 27520  18m 6732 S    1  0.7   0:08.88 compiz.real        
 7423 joan      20   0 21876  11m 8484 S    1  0.4   0:01.30 gtk-window-deco    
10443 joan      20   0 56728  19m  12m S    1  0.8   0:02.87 gedit              
18608 joan      20   0  2580 1216  912 R    1  0.0   0:00.10 top                
 4973 postgres  20   0 56016 5900  708 S    0  0.2   0:08.93 postgres    
...

top has many options. Among them, in the version of top shown here, sorting by RAM usage (uppercase O, then n) or by CPU usage (uppercase O, then k); the single keys M and P (also uppercase) give the same two orderings.

However, it is perhaps more useful for system administrators to run top in batch (non-interactive) mode:

$ top -b -n 1 > top.out.$(date +%s)
joan@ubuntu-bbdd:~$ cat top.out.1334069520 
top - 16:52:00 up  1:10,  4 users,  load average: 0.94, 0.62, 0.39
Tasks: 180 total,   1 running, 179 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.5%us,  2.8%sy,  0.1%ni, 82.5%id,  7.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2571184k total,  2464536k used,   106648k free,   670396k buffers
Swap:  2441840k total,        0k used,  2441840k free,  1214524k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                   
 8138 root      20   0  298m 136m  29m S    2  5.5   6:39.04 firefox                                   
32219 joan      20   0  2572 1116  824 R    2  0.0   0:00.02 top                                       
    1 root      20   0  3084 1888  564 S    0  0.1   0:01.24 init                                      
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd                                  
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.05 migration/0                               
    4 root      15  -5     0    0    0 S    0  0.0   0:00.56 ksoftirqd/0                               
    5 root      RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/0                                
    6 root      RT  -5     0    0    0 S    0  0.0   0:00.05 migration/1 
...
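
A snapshot saved in batch mode is plain text, so it can be post-processed with the usual tools. As a minimal sketch (the function name top_processes is hypothetical, and the column positions are an assumption based on the header shown in the snapshot above), the following prints the PID, %CPU and COMMAND of the first processes in a saved file:

```shell
# top_processes SNAPSHOT_FILE COUNT
# Prints PID, %CPU and COMMAND for the first COUNT processes in a file
# saved with "top -b -n 1". Column positions ($9 = %CPU, $12 = COMMAND)
# are an assumption based on the header layout shown in this section.
top_processes() {
  awk -v n="$2" '
    in_list && shown < n + 0 { print $1, $9, $12; shown++ }
    /^ *PID +USER/           { in_list = 1 }  # process rows follow this header
  ' "$1"
}
```

For the snapshot above, `top_processes top.out.1334069520 2` would list firefox and top, each with 2% CPU.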

iftop: network traffic monitor

It works much like top, but shows network traffic.

$ sudo apt-get install iftop
$ man iftop

IFTOP(8)                                                              IFTOP(8)

NAME
       iftop - display bandwidth usage on an interface by host

SYNOPSIS
       iftop -h | [-nNpbBP] [-i interface] [-f filter code] [-F net/mask]

DESCRIPTION
       iftop  listens to network traffic on a named interface, or on the first
       interface it can find which looks like an external interface if none is
       specified,  and displays a table of current bandwidth usage by pairs of
       hosts.  iftop must be run with sufficient permissions  to  monitor  all
       network traffic on the interface; see pcap(3) for more information, but
       on most systems this means that it must be run as root.
$ iftop -h
iftop: display bandwidth usage on an interface by host

Synopsis: iftop -h | [-npbBP] [-i interface] [-f filter code] [-N net/mask]

   -h                  display this message
   -n                  don't do hostname lookups
   -N                  don't convert port numbers to services
   -p                  run in promiscuous mode (show traffic between other
                       hosts on the same network segment)
   -b                  don't display a bar graph of traffic
   -B                  Display bandwidth in bytes
   -i interface        listen on named interface
   -f filter code      use filter code to select packets to count
                       (default: none, but only IP packets are counted)
   -F net/mask         show traffic flows in/out of network
   -P                  show ports as well as hosts
   -m limit            sets the upper limit for the bandwidth scale
   -c config file      specifies an alternative configuration file
$ sudo iftop -i eth0

                12.5Kb          25.0Kb          37.5Kb          50.0Kb    62.5K
└───────────────┴───────────────┴───────────────┴───────────────┴──────────────
joan-servidor.local        => 85.192.112.45              4.59Kb  4.03Kb  4.03Kb
                           <=                             416b    648b    648b
joan-servidor.local        => 66.249.71.228              3.28Kb  1.76Kb  1.76Kb
                           <=                            1.62Kb   948b    948b
joan-servidor.local        => 94.102.48.116                 0b   1.05Kb  1.05Kb
                           <=                               0b   1.24Kb  1.24Kb
joan-servidor.local        => 65.1.216.87.static.jazzte     0b    568b    568b
                           <=                               0b    890b    890b
joan-servidor.local        => 224.0.0.251                 568b    852b    852b
                           <=                               0b      0b      0b


───────────────────────────────────────────────────────────────────────────────
TX:             cumm:  4.11KB   peak:   8.43Kb  rates:   8.43Kb  8.23Kb  8.23Kb
RX:                    1.83KB           5.31Kb           2.02Kb  3.67Kb  3.67Kb
TOTAL:                 5.95KB           13.3Kb           10.5Kb  11.9Kb  11.9Kb

iotop

$ sudo apt-get install iotop

$ man iotop
IOTOP(1)                                                              IOTOP(1)

NAME
       iotop - simple top-like I/O monitor

SYNOPSIS
       iotop [OPTIONS]

DESCRIPTION
       iotop  watches  I/O  usage  information  output  by  the  Linux  kernel
       (requires 2.6.20 or later) and displays a table of current I/O usage by
       processes or threads on the system. At least the CONFIG_TASK_DELAY_ACCT
       and CONFIG_TASK_IO_ACCOUNTING options need to be enabled in your  Linux
       kernel build configuration, these options depend on CONFIG_TASKSTATS.

To see how file writes to the hard disk are reported, we can start downloading the Ubuntu 12.04 beta:

$ wget http://ftp.heanet.ie/pub/ubuntu-cdimage/releases/12.04/beta-2/ubuntu-12.04-beta2-dvd-i386.iso
$ iotop

Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND        
17174 be/4 joan        0.00 B/s  629.88 K/s  ?unavailable?  wget http://ftp.heanet.ie/pub/ub~2/ubuntu-12.04-beta2-dvd-i386.iso
    1 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  init
    2 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [kthreadd]
    3 rt/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [migration/0]
    4 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [ksoftirqd/0]
    5 rt/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [watchdog/0]
    6 rt/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [migration/1]
    7 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [ksoftirqd/1]
    8 rt/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [watchdog/1]
    9 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [events/0]
   10 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [events/1]
   11 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [cpuset]
   12 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [khelper]
   13 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [netns]

netstat: network statistics

$ man netstat

NETSTAT(8)                                       Linux Programmer's Manual                                      NETSTAT(8)

NAME
       netstat  -  Print  network connections, routing tables, interface statistics, masquerade connections, and multicast
       memberships
...
$ netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 joan-servidor.local:www 123.126.68.31:50510     SYN_RECV   
tcp        0      0 joan-servidor.local:www 109.75.79.188.dyn:54067 TIME_WAIT  
tcp        0     48 joan-servidor.local:ssh ies-jaumebalmes.xt:2066 ESTABLISHED
tcp        0      0 joan-servidor.loc:45749 192.168.1.1:netbios-ssn ESTABLISHED
tcp        0      0 joan-servidor.local:www 109.75.79.188.dyn:45553 TIME_WAIT  
tcp        0      0 joan-servidor.loc:45757 192.168.1.1:netbios-ssn ESTABLISHED
tcp        0      0 joan-servidor.local:www 109.75.79.188.dyn:59906 TIME_WAIT  
tcp        0      0 joan-servidor.local:www 109.75.79.188.dyn:43134 ESTABLISHED
tcp        0      0 joan-servidor.loc:45742 192.168.1.1:netbios-ssn ESTABLISHED
tcp        0      0 joan-servidor.loc:45745 192.168.1.1:netbios-ssn ESTABLISHED
tcp        0      0 joan-servidor.loc:45740 192.168.1.1:netbios-ssn ESTABLISHED
tcp        0   2879 joan-servidor.local:www 123.126.68.31:34351     FIN_WAIT1  
tcp        0      0 joan-servidor.local:www 109.75.79.188.dyn:49590 ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags       Type       State         I-Node   Path
unix  2      [ ]         DGRAM                    2631     @/org/kernel/udev/udevd
unix  2      [ ]         DGRAM                    6727     @/org/freedesktop/hal/udev_event
unix  15     [ ]         DGRAM                    3988     /dev/log
unix  3      [ ]         STREAM     CONNECTED     750664   
unix  3      [ ]         STREAM     CONNECTED     750663   
unix  3      [ ]         STREAM     CONNECTED     750520   /var/run/dbus/system_bus_socket
unix  3      [ ]         STREAM     CONNECTED     750519   
unix  2      [ ]         DGRAM                    750518   
unix  3      [ ]         STREAM     CONNECTED     12104    @/tmp/.ICE-unix/1805
unix  3      [ ]         STREAM     CONNECTED     12103    
unix  3      [ ]         STREAM     CONNECTED     12075    @/tmp/.X11-unix/X0
...

To list the open (listening) ports:

$ netstat --listen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 *:www                   *:*                     LISTEN     
tcp        0      0 *:ssh                   *:*                     LISTEN     
tcp        0      0 localhost:ipp           *:*                     LISTEN     
tcp        0      0 localhost:smtp          *:*                     LISTEN     
tcp        0      0 localhost:mysql         *:*                     LISTEN 
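
On a busy server the full connection listing is long, and a count per TCP state is often easier to read. A minimal sketch, assuming the column layout shown in the listings above (state in the sixth column) and a hypothetical function name count_tcp_states:

```shell
# count_tcp_states
# Reads netstat output on standard input and prints one "STATE COUNT"
# line per TCP connection state. (Hypothetical helper for illustration.)
count_tcp_states() {
  awk '$1 ~ /^tcp/ && NF >= 6 { states[$6]++ }
       END { for (s in states) print s, states[s] }' | LC_ALL=C sort
}
```

It can be fed directly, for example with `netstat -tn | count_tcp_states`. For the TCP lines in the first netstat listing it would report 8 ESTABLISHED, 1 FIN_WAIT1, 1 SYN_RECV and 3 TIME_WAIT.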

The sysstat package: a collection of utilities

The sysstat package contains utilities for monitoring system performance and activity. Its tools can be run periodically with cron, collecting activity and performance data to be analysed later.

  • iostat(1) reports CPU statistics and input/output statistics for devices, partitions and network filesystems.
  • mpstat(1) reports individual or combined processor related statistics.
  • pidstat(1) reports statistics for Linux tasks (processes): I/O, CPU, memory, etc.
  • sar(1) collects, reports and saves system activity information (CPU, memory, disks, interrupts, network interfaces, TTY, kernel tables, etc.)
  • sadc(8) is the system activity data collector, used as a backend for sar.
  • sa1(8) collects and stores binary data in the system activity daily data file. It is a front end to sadc designed to be run from cron.
  • sa2(8) writes a summarized daily activity report. It is a front end to sar designed to be run from cron.
  • sadf(1) displays data collected by sar in multiple formats (CSV, XML, etc.) This is useful to load performance data into a database, or import them in a spreadsheet to make graphs.
  • nfsiostat(1) reports input/output statistics for network filesystems (NFS).
  • cifsiostat(1) reports CIFS statistics.
$ sudo apt-get install sysstat
  • mpstat - Report processors related statistics.
$ mpstat
Linux 2.6.32-38-generic (joan-servidor) 	10/04/12 	_i686_	(2 CPU)

17:33:05     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
17:33:05     all    0,78    0,01    0,12    0,19    0,05    0,00    0,00    0,00   98,85

The breakdown for each processor in the machine can also be shown:

$ mpstat -P ALL
Linux 2.6.32-38-generic (joan-servidor) 	10/04/12 	_i686_	(2 CPU)

17:35:11     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
17:35:11     all    0,78    0,01    0,12    0,19    0,05    0,00    0,00    0,00   98,85
17:35:11       0    0,63    0,01    0,12    0,38    0,02    0,00    0,00    0,00   98,85
17:35:11       1    0,93    0,01    0,11    0,01    0,08    0,00    0,00    0,00   98,86
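
Since mpstat reports the idle percentage in its last column, the busy percentage per processor is simply 100 minus that value. A minimal sketch, assuming the 24-hour timestamp layout shown above (CPU identifier in the second column) and a hypothetical function name mpstat_busy; it also accepts the decimal comma that appears in this output:

```shell
# mpstat_busy
# Reads mpstat output on standard input and prints "CPU BUSY%" per row,
# where BUSY% is 100 minus the %idle column. (Hypothetical helper.)
mpstat_busy() {
  LC_ALL=C awk '$2 == "all" || $2 ~ /^[0-9]+$/ {
    idle = $NF
    sub(/,/, ".", idle)      # accept a decimal comma as well as a dot
    printf "%s %.2f\n", $2, 100 - idle
  }'
}
```

It could be used as `mpstat -P ALL | mpstat_busy`.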

tload, xload

  • tload: load average graph for the terminal
  • xload: load average graph for X
$ tload
$ xload

ps

Of course, ps can also be used to extract useful information about the system.

PS(1)

NAME
       ps - report a snapshot of the current processes.

To see the 10 processes that consume the most CPU, we sort the ps output by column 3 and keep the first 10 rows (we also show the header):

$ ps aux | head -1; ps aux | sort -rn -k+3 | head -10
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      8138 11.3  6.5 322324 169008 pts/0   Rl+  15:50  14:13 /usr/lib/firefox-3.0.15/firefox
root      6359  2.9  2.4 147860 62956 tty7     Rs+  15:48   3:44 /usr/X11R6/bin/X :0 -br -audit 0 -auth /var/lib/gdm/:0.Xauth -nolisten tcp vt7
postgres  4974  1.6  0.1  18672  4536 ?        Ss   15:46   2:06 postgres: stats collector process                                                                                           
www-data 21937  0.0  0.2  52236  6572 ?        S    16:25   0:00 /usr/sbin/apache2 -k start
www-data 14654  0.0  0.2  52236  6572 ?        S    16:07   0:00 /usr/sbin/apache2 -k start
www-data 14653  0.0  0.2  52572  6860 ?        S    16:07   0:00 /usr/sbin/apache2 -k start
www-data 14652  0.0  0.2  52236  6572 ?        S    16:07   0:00 /usr/sbin/apache2 -k start
www-data 14651  0.0  0.7  63400 19344 ?        S    16:07   0:00 /usr/sbin/apache2 -k start
www-data 14650  0.0  0.2  52236  6572 ?        S    16:07   0:00 /usr/sbin/apache2 -k start

If we sort by RAM usage instead, we see that the processes consuming the most are Firefox, Oracle and Tomcat:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      8138 11.4  6.5 322324 169016 pts/0   Sl+  15:50  14:41 /usr/lib/firefox-3.0.15/firefox
root      6359  2.9  2.4 146028 62952 tty7     Ss+  15:48   3:51 /usr/X11R6/bin/X :0 -br -audit 0 -auth /var/lib/gdm/:0.Xauth -nolisten tcp vt7
oracle    5275  0.0  2.1 330412 54616 ?        Ss   15:47   0:01 ora_mmon_BBDD
oracle    5270  0.0  2.4 328392 62852 ?        Ss   15:47   0:01 ora_smon_BBDD
root      6762  0.0  1.2 218228 33092 ?        Sl   15:48   0:07 /usr/lib/jvm/java-6-sun/bin/java -Djava.util.logging.config.file=/usr/share/tomcat6/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/usr/share/tomcat6/endorsed -classpath /usr/share/tomcat6/bin/bootstrap.jar -Dcatalina.base=/usr/share/tomcat6 -Dcatalina.home=/usr/share/tomcat6 -Djava.io.tmpdir=/usr/share/tomcat6/temp org.apache.catalina.startup.Bootstrap start
oracle    6049  0.0  1.3 328400 34516 ?        Ss   15:48   0:00 ora_cjq0_BBDD
oracle    5264  0.0  1.3 327460 33688 ?        Ss   15:47   0:01 ora_dbw0_BBDD
...

We see that the process consuming the most CPU is the web browser.
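
The command used for the RAM listing is not shown. One possible way to get either ordering with a single helper is sketched below; the function name top_by_column is hypothetical, and it assumes the `ps aux` layout shown above, where column 3 is %CPU and column 4 is %MEM.

```shell
# top_by_column COLUMN COUNT
# Reads "ps aux" output on standard input, prints the header unchanged and
# then the COUNT rows with the highest value in COLUMN.
# (Hypothetical helper for illustration.)
top_by_column() {
  column=$1
  count=$2
  # The first line of "ps aux" is the header: pass it through unsorted.
  IFS= read -r header
  printf '%s\n' "$header"
  # Sort the remaining rows numerically, highest first, on that column only.
  LC_ALL=C sort -rn -k "$column,$column" | head -n "$count"
}
```

For example, `ps aux | top_by_column 3 10` lists the top CPU consumers and `ps aux | top_by_column 4 10` the top RAM consumers.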

Submission

Remember the rules for submitting the practical assignments to Moodle: ASIX-M11-SAD#Normativa_d.27entrega_de_les_pr.C3.A0ctiques_al_Moodle


Created by Joan Quintana Compte, March 2012