ASI-C6-ASGBD Practice: Scripts with PostgreSQL


Objectives: Automating PostgreSQL Tasks

We are going to automate administrative tasks with different techniques and scripts. Specifically, you will use scripts written in SQL, bash and Perl, together with various options of the psql command. This practice is an example of how system administrators and DBAs need to know the console, and not be afraid of working in it, in order to automate tasks and save themselves work.

Practice development

Filling the PRODUCTS table with values supplied in CSV format

To get started, we first have to create the products table in PostgreSQL. The example uses the postgres database and the postgres user. Use the script creacio_taula_productes.sql.

We now have the table. The data are in the file products.csv, in CSV format. In this case there are only three rows, but there could be hundreds or thousands. To convert the CSV format into the INSERT INTO format that PostgreSQL understands, we use a script written in Perl: create_input_sql.pl. We integrate this script into the bash script load_new_products.sh, so that we now have the whole process: the source is the file products.csv, and at the destination we have to verify that the data have been imported into the PRODUCTS table.
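The contents of products.csv are not reproduced in this practice. A minimal example consistent with what create_input_sql.pl expects (a header line that the script discards, then the fields id, name, description and price, with double quotes around any field that contains commas) could be the following; the product values are invented for illustration:

```shell
# Hypothetical products.csv: file name from the practice, values invented.
# The first line is a header that create_input_sql.pl reads and discards.
cat > products.csv <<'EOF'
id,name,description,price
234,The Signature Widget,"A widget, signed by hand",19.95
235,The Bigger Widget,,29.95
236,The Nuclear Widget,"Glows in the dark",49.95
EOF
head -1 products.csv
# prints: id,name,description,price
```

The empty description in the second row exercises the null-handling path of the Perl script, and the quoted description in the first row exercises its comma-in-field logic.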

$ sudo ./load_new_products.sh

which runs the command:

./create_input_sql.pl products.csv

To verify that the process worked correctly, we can inspect the files load_new_products.log and runme.postgresql.
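As a reference for that check, this is the general shape of the generated runme.postgresql. The block below is illustrative only (it writes an example file rather than the real one, for a hypothetical one-row products.csv): the structure is reconstructed from the logic of create_input_sql.pl, and the actual insert lines depend on your CSV.

```shell
# Illustrative only: the kind of file create_input_sql.pl produces.
# Note the ON_ERROR_STOP guard at the top and the single begin/commit
# transaction wrapped around all the inserts.
cat > runme.postgresql.example <<'EOF'
\set ON_ERROR_STOP 1
begin;
insert into products (id, name, description, price) values (234, 'The Signature Widget', null, 19.95);
commit;
EOF
head -1 runme.postgresql.example
```

If your runme.postgresql is missing the begin/commit pair or the ON_ERROR_STOP line, the Perl script did not run to completion.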

Generating the sales reports

First of all, some sample data for the sales table:

insert into sales (first_name, last_name, city, state, zip, email_address, product_id, product_name, product_sale_date) values ('John', 'Hancock', 'Medford', 'MA', '02145', 'john_hancock@pretendmail.com', 234, 'The Signature Widget', '2004-10-13');

insert into sales (first_name, last_name, city, state, zip, email_address, product_id, product_name, product_sale_date) values ('Bob', 'Smith', 'Boston', 'MA', '02123', 'bob_smith@pretendmail.com', 234, 'The Bigger Widget', '2004-10-14');

insert into sales (first_name, last_name, city, state, zip, email_address, product_id, product_name, product_sale_date) values ('Jane', 'Doe', 'Somerville', 'MA', '02144', 'jane_doe@pretendmail.com', 234, 'The Nuclear Widget', '2004-10-15');

insert into sales (first_name, last_name, city, state, zip, email_address, product_id, product_name, product_sale_date) values ('John', 'Adams', 'Cambridge', 'MA', '02122', 'john_adams@pretendmail.com', 234, 'The Beer Widget', '2004-10-16');

insert into sales (first_name, last_name, city, state, zip, email_address, product_id, product_name, product_sale_date) values ('Joe', 'Random', 'New York', 'NY', '02122', 'john_smith@pretendmail.com', 234, 'The Stylish Widget', '2004-10-17');

Now we can generate the reports. We will do it with a script that takes two parameters, the start and end dates for the report. For example:

./gen_sales_report_param.sh 2004/10/14 2004/10/16

where gen_sales_report_param.sh is:

#!/bin/sh

# gen_sales_report_param.sh
PGUSER=postgres
PGPASSWORD=postgres
PGHOST=localhost
PGPORT=5432

export PGUSER PGPASSWORD PGHOST PGPORT

psql --set start_date=\'$1\' \
     --set end_date=\'$2\' \
     -f generate_monthly_sales_report.postgresql postgres

#reset PGUSER and PGPASSWORD  
PGUSER=""
PGPASSWORD=""
PGHOST=""
PGPORT=""
export PGUSER PGPASSWORD PGHOST PGPORT  
 
#End

where the file generate_monthly_sales_report.postgresql is:

-- generate_monthly_sales_report.postgresql
-- 
-- Outputs one month's worth of sales to a CSV file that
-- is easily imported into spreadsheet software.

\o report.csv
\a
\f ,
select product_id, product_name, product_sale_date from sales where 
product_sale_date >= :start_date and product_sale_date <= :end_date;
\o

The report you should obtain in the file report.csv is similar to the following (the exact rows depend on the date range you pass and on any data already in the table):

 product_id,product_name,product_sale_date
 234,The Big Widget,2004-10-13
 234,The Signature Widget,2004-10-13
 234,The Bigger Widget,2004-10-14
 234,The Nuclear Widget,2004-10-15
 234,The Beer Widget,2004-10-16
 234,The Stylish Widget,2004-10-17
(6 rows)

We choose the CSV format because it is the natural format for importing the information into spreadsheets.

To finish, cron

Finally, running these scripts and generating the reports can be automated so that they execute periodically. For example, one problem to tackle would be the automatic generation of a weekly sales report. We would have to modify the previous script and create a cron job that generates the report for us. Instead of being called report.csv, it should be called report_ddmmyyyy.csv (where ddmmyyyy is replaced by the day, month and year).
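A sketch of the two pieces involved, under the assumption that the modified script is called gen_weekly_sales_report.sh and lives in /home/postgres/scripts (both names are hypothetical): `date +%d%m%Y` produces the ddmmyyyy stamp, and a crontab entry runs the script weekly.

```shell
#!/bin/sh
# Inside the modified report script: build the date-stamped output name
# in the report_ddmmyyyy.csv form the statement asks for.
REPORT="report_$(date +%d%m%Y).csv"
echo "$REPORT"   # e.g. report_13102004.csv

# Hypothetical crontab entry (install it with "crontab -e"): run the
# modified report script every Monday at 06:00.
#   0 6 * * 1 /home/postgres/scripts/gen_weekly_sales_report.sh
```

Inside the script you would then pass $REPORT to \o (or substitute it into the .postgresql file with another --set variable) instead of hard-coding report.csv.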

Submission

Moodle

Resources

the script creacio_taula_productes.sql:

drop sequence products_row_num_seq;
 
create sequence products_row_num_seq cycle;
 
drop table products;
 
create table products (
  row_num int not null default nextval('products_row_num_seq'),
  id int primary key,
  name varchar(35) not null,
  description varchar(1024) null,
  price numeric(8, 2) not null);
 
comment on table products is 'Records of products sold at the bookstore.';
comment on column products.row_num is 'For de-duping purposes.';

drop sequence sales_row_num_seq;

create sequence sales_row_num_seq cycle;

drop table sales;

create table sales (
  row_num int not null default nextval('sales_row_num_seq'),
  first_name varchar(35) not null,
  middle_name varchar(35) null,
  last_name varchar(35) not null,
  city varchar(35) not null,
  state char(2) not null,
  zip varchar(5) not null,
  email_address varchar(50) not null,
  product_id int not null,
  product_name varchar(50) not null,
  product_sale_date date not null);

comment on table sales is 'Contains records of books sold from the bookstore.';
comment on column sales.row_num is 'For de-duping purposes.';

-- PLEASE don't regard this as good database design; customers should be
-- foreign keys, as should products.

Perl script to create the inserts automatically: create_input_sql.pl:

#!/usr/bin/perl

# create_input_sql.pl --> Create PostgreSQL sql statements from
# .csv file for products table, and put them in a file called
# runme.postgresql.

# by Manni Wood

use strict;

my $csv_file_name = $ARGV[0];
unless ($csv_file_name) {
    print STDERR "Please enter the name of a CSV file.\n";
    print STDERR "Aborting.\n";
    exit 1;
}
my $output_file_name = "runme.postgresql";

open (IN, $csv_file_name) || die "Could not read \"$csv_file_name\": $!";
open (OUT, ">$output_file_name") || die "Could not write \"$output_file_name\": $!";
# The first line of this file is the field names, which we do *not*
# want to insert into the database. So read and discard the first line.
$_ = <IN>;

# Instructions to psql to stop on the first error.
print OUT "\\set ON_ERROR_STOP 1\n";
# Do this all as one transaction.
print OUT "begin;\n";

while (<IN>) {
    # Blank lines can creep into .csv files. Skip them.
    next if (m/^\s*$/);
    chomp;
    my ($id, $name, $description, $price) = parse_csv($_);

    # Make all fields SQL-friendly
    $id = nullify_field($id);
    $name = quote_or_nullify_field($name);
    $description = quote_or_nullify_field($description);
    $price = nullify_field($price);

    print OUT "insert into products (id, name, description, price) values (" .
    $id . ", " .
    $name . ", " .
    $description . ", " .
    $price . ");\n";
}
print OUT "commit;\n";
close (OUT);
close (IN);

sub parse_csv {
    # The goal here is to just go ahead and split on commas,
    # and then find chunks that start with a " and assume that we've
    # broken apart a field containing commas; re-join the chunks to
    # the chunk beginning with " until we find a chunk ending with ".
    my @chunks;
    my $chunk;
    my @fields;
    my $field;
    my $line = shift;

    @chunks = split(/,/, $line);

    my $i;
    for ($i = 0; $i <= $#chunks; ++$i) {
        $chunk = $chunks[$i];
        $field = $chunk;
        # If chunk starts with a double-quote but does not end with one,
        if (substr($chunk, 0, 1) eq '"' && substr($chunk, -1, 1) ne '"') {
            # join the current chunk with the next chunk, replacing the
            # comma that got eliminated during the split on commas
            ++$i;
            $chunk = $chunks[$i];
            $field .= "," . $chunk;
            # and continue to do so until you find a chunk that ends with "
            # or you have run out of chunks.
            while (substr($chunk, -1, 1) ne '"' && $i <= $#chunks) {
                ++$i;
                $chunk = $chunks[$i];
                $field .= "," . $chunk;
            }
            # Our goal is to get rid of all field delimiters, so
            # get rid of the leading double-quote
            $field = substr($field, 1);
            # and the trailing double-quote.
            chop($field);
        }
        # If the chunk starts and ends with a double-quote,
        if (substr($chunk, 0, 1) eq '"' && substr($chunk, -1, 1) eq '"') {
            # get rid of the leading double-quote
            $field = substr($field, 1);
            # and the trailing double-quote.
            chop($field);
        }
        # In CSV files, double quotes are escaped by doubling them up,
        # so un-double them.
        $field =~ s/""/"/g;
        # Finally, we have a field that is completely usable, so add it to the
        # array of fields we will return at the end of this subroutine.
        push(@fields, $field);
        # Clear the field for re-use in the next iteration of this loop.
        $field = "";
    }
    return @fields;
}

sub nullify_field {
    # For SQL fields that do not need to be wrapped in single quotes.
    # If a field is empty, replace it with the string "null",
    # which can safely be used in sql insert statements.
    my $field = $_[0];
    if ($field) {
        return $field;
    } else {
        return "null";
    }
}

sub quote_or_nullify_field {
    # For SQL fields that need to be wrapped in single quotes.
    # If a field is empty, replace it with the string "null",
    # which can safely be used in sql insert statements.
    # If it is not empty, replace any single quote with double single
    # quotes (that's how you escape them in SQL) and wrap the whole thing
    # in single quotes (for use in a SQL insert statement).
    my $field = $_[0];
    if ($field) {
        # Remove backslashes: they cause havoc in SQL.
        $field =~ s/\\//g;
        # Repeat single quotes: this is how they are escaped in SQL.
        $field =~ s/'/''/g;
        $field = "'" . $field . "'";
    } else {
        $field = "null";
    }
    return $field;
}

load_new_products.sh:

#!/bin/sh

# load_new_products.sh
#
# Wrapper script that transforms products.csv into runme.postgresql
# and feeds runme.postgresql to psql.

PGUSER=postgres
PGPASSWORD=postgres
PGHOST=localhost
PGPORT=5432

export PGUSER PGPASSWORD PGHOST PGPORT

./create_input_sql.pl products.csv  # outputs runme.postgresql

psql -f runme.postgresql postgres 1> load_new_products.log 2>&1

#reset PGUSER and PGPASSWORD  
PGUSER=""
PGPASSWORD=""
PGHOST=""
PGPORT=""
export PGUSER PGPASSWORD PGHOST PGPORT  
 
#End
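One detail worth noting in the script above: `1> load_new_products.log 2>&1` sends both psql's standard output and its error output to the log, which is why errors also show up there. A minimal illustration of the same redirection (demo.log is a throwaway name):

```shell
#!/bin/sh
# Redirecting stdout first (1>) and then pointing stderr at stdout (2>&1)
# sends both streams to the same log file.
{ echo "ok line"; echo "error line" >&2; } 1> demo.log 2>&1
grep -c "line" demo.log
# prints: 2
```

The order matters: writing `2>&1 1> demo.log` instead would leave stderr on the terminal.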

Script to generate the reports: gen_sales_report_param.sh:

#!/bin/sh

# gen_sales_report_param.sh
PGUSER=postgres
PGPASSWORD=postgres
PGHOST=localhost
PGPORT=5432

export PGUSER PGPASSWORD PGHOST PGPORT

psql --set start_date=\'$1\' \
     --set end_date=\'$2\' \
     -f report_mensual_param.sql postgres

#reset PGUSER and PGPASSWORD  
PGUSER=""
PGPASSWORD=""
PGHOST=""
PGPORT=""
export PGUSER PGPASSWORD PGHOST PGPORT  
 
#End

The script report_mensual_param.sql:

-- report_mensual_param.sql
-- 
-- Outputs one month's worth of sales to a CSV file that
-- is easily imported into spreadsheet software.

\o report.csv
\a
\f ,
select product_id, product_name, product_sale_date from sales where
product_sale_date >= :start_date and product_sale_date <= :end_date;
\o

Using the \o option

to send the result of a query to a file:

\o [FILE]      send all query results to file or |pipe
postgres# \o report.txt
postgres# select product_id, product_name, product_sale_date from sales where 
product_sale_date >= '2004-10-01' and product_sale_date <= '2004-10-31';
postgres# \o 

To obtain a result that can be used in Excel or OpenOffice.org:

\o report.csv
\a -> toggle between unaligned and aligned output mode
\f , -> \f [STRING]    show or set field separator for unaligned query output (in this case, a comma)
postgres# select product_id, product_name, product_sale_date from sales where 
product_sale_date >= '2004-10-01' and product_sale_date <= '2004-10-31';
postgres# \o

Using the \set option

\set [NAME [VALUE]] -> set internal variable, or list all if no parameters
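This is the mechanism gen_sales_report_param.sh relies on: variables defined with --set (the command-line form of \set) are interpolated wherever :name appears in the SQL file. The backslash-escaped quotes in --set start_date=\'$1\' make the shell hand psql a value that already includes the single quotes, so :start_date expands to a quoted SQL literal. A quick shell illustration, with echo standing in for psql:

```shell
#!/bin/sh
# What psql actually receives as its --set argument: the backslash-escaped
# quotes survive shell expansion, so the variable value is already a quoted
# SQL literal by the time :start_date is interpolated.
d=2004/10/14
echo --set start_date=\'$d\'
# prints: --set start_date='2004/10/14'
```

Without those escaped quotes, :start_date would expand to a bare 2004/10/14 and the date comparison in the SQL file would fail.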

Duration

2 hours